nbody benchmark on the GTX Titan
To start with, here is nbody.
float
[t_azu@linux]$ ./nbody -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure perfomance.
    -fullscreen (run n-body simulation in fullscreen mode)
    -fp64 (use double precision floating point values for simulation)
    -hostmem (stores simulation data in host memory)
    -benchmark (run benchmark to measure performance)
    -numbodies=<N> (number of bodies (>= 1) to run in simulation)
    -device=<d> (where d=0,1,2.... for the CUDA device to use)
    -numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
    -compare (compares simulation results running once on the default GPU and once on the CPU)
    -cpu (run n-body simulation on the CPU)
    -tipsy=<file.bin> (load a tipsy model file for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GTX TITAN" with compute capability 3.5

> Compute 3.5 CUDA device: [GeForce GTX TITAN]
14336 bodies, total time for 10 iterations: 30.587 ms
= 67.193 billion interactions per second
= 1343.858 single-precision GFLOP/s at 20 flops per interaction
double
[t_azu@linux]$ ./nbody --benchmark -fp64
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure perfomance.
    -fullscreen (run n-body simulation in fullscreen mode)
    -fp64 (use double precision floating point values for simulation)
    -hostmem (stores simulation data in host memory)
    -benchmark (run benchmark to measure performance)
    -numbodies=<N> (number of bodies (>= 1) to run in simulation)
    -device=<d> (where d=0,1,2.... for the CUDA device to use)
    -numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
    -compare (compares simulation results running once on the default GPU and once on the CPU)
    -cpu (run n-body simulation on the CPU)
    -tipsy=<file.bin> (load a tipsy model file for simulation)

> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GTX TITAN" with compute capability 3.5

> Compute 3.5 CUDA device: [GeForce GTX TITAN]
14336 bodies, total time for 10 iterations: 103.493 ms
= 19.858 billion interactions per second
= 595.751 double-precision GFLOP/s at 30 flops per interaction
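As a sanity check on the numbers above, here is a rough Python sketch that recomputes the throughput from the body count, the iteration count and the elapsed time. It assumes the benchmark counts N*N pairwise interactions per iteration (the all-pairs count); that assumption is mine, though it does reproduce the reported figures to within the rounding of the printed time. The function name `nbody_throughput` is made up for this illustration and is not part of the CUDA sample.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction):
    """Return (billion interactions per second, GFLOP/s).

    Assumes N*N pairwise interactions per iteration (all-pairs count).
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    g_interactions = interactions / seconds / 1e9
    return g_interactions, g_interactions * flops_per_interaction


# Values taken from the two benchmark runs above.
sp_inter, sp_gflops = nbody_throughput(14336, 10, 30.587, 20)   # float
dp_inter, dp_gflops = nbody_throughput(14336, 10, 103.493, 30)  # double

print(f"float : {sp_inter:.3f} G interactions/s, {sp_gflops:.1f} GFLOP/s")
print(f"double: {dp_inter:.3f} G interactions/s, {dp_gflops:.1f} GFLOP/s")
print(f"double/float time ratio  : {103.493 / 30.587:.2f}")
print(f"float/double GFLOP/s ratio: {sp_gflops / dp_gflops:.2f}")
```

Note that the two GFLOP/s figures use different flop counts per interaction (20 for float, 30 for double), so the wall-clock ratio between the runs is larger than the GFLOP/s ratio suggests.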