Measuring in double precision

$ /home/t_azu/NVIDIA_GPU_Computing_SDK/C/bin/linux/release/nbody -benchmark -fp64
Run "nbody -benchmark [-n=<numBodies>]" to measure perfomance.
	-fullscreen (run n-body simulation in fullscreen mode)
	-fp64       (use double precision floating point values for simulation)

> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> Compute 2.1 CUDA device: [GeForce GT 430]
2048 bodies, total time for 10 iterations: 80.215 ms
= 0.523 billion interactions per second
= 15.687 double-precision GFLOP/s at 30 flops per interaction

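That is about 16 GFLOPS. I recall the theoretical single-to-double precision performance ratio being 12:1, but the measured ratio comes out to roughly 6:1.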
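As a sanity check, the reported figures can be reproduced by hand from the benchmark output. The sketch below assumes the benchmark counts numBodies × numBodies interactions per iteration, which is consistent with the numbers printed above. The function name `nbody_throughput` is made up for this illustration and is not part of the SDK.

```python
def nbody_throughput(num_bodies, iterations, elapsed_ms, flops_per_interaction=30):
    """Return (interactions per second, GFLOP/s) for an all-pairs n-body run.

    Assumes every body interacts with every body (n * n pairs) in each iteration.
    """
    elapsed_s = elapsed_ms / 1000.0
    interactions = num_bodies * num_bodies * iterations
    interactions_per_second = interactions / elapsed_s
    gflops = interactions_per_second * flops_per_interaction / 1e9
    return interactions_per_second, gflops


# Values taken from the benchmark output above.
ips, gflops = nbody_throughput(num_bodies=2048, iterations=10, elapsed_ms=80.215)
print(f"{ips / 1e9:.3f} billion interactions per second")
print(f"{gflops:.2f} double-precision GFLOP/s")
```

The result agrees with the benchmark's own report to within rounding of the printed elapsed time (about 0.523 billion interactions per second and roughly 15.69 GFLOP/s).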