bandwidthTest
[t_azu@linux]$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GT 640
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 6407.4
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 6385.8
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 25084.9
nbody
[t_azu@linux]$ ./nbody -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure perfomance.
    -fullscreen        (run n-body simulation in fullscreen mode)
    -fp64              (use double precision floating point values for simulation)
    -hostmem           (stores simulation data in host memory)
    -benchmark         (run benchmark to measure performance)
    -numbodies=<N>     (number of bodies (>= 1) to run in simulation)
    -device=<d>        (where d=0,1,2.... for the CUDA device to use)
    -numdevices=<i>    (where i=(number of CUDA devices > 0) to use for simulation)
    -compare           (compares simulation results running once on the default GPU and once on the CPU)
    -cpu               (run n-body simulation on the CPU)
    -tipsy=<file.bin>  (load a tipsy model file for simulation)
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GT 640" with compute capability 3.0
> Compute 3.0 CUDA device: [GeForce GT 640]
2048 bodies, total time for 10 iterations: 3.795 ms
= 11.052 billion interactions per second
= 221.043 single-precision GFLOP/s at 20 flops per interaction
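
The last three lines of the run above can be checked by hand. The sketch below assumes the benchmark counts every body interacting with every body (N x N pairs) in each iteration; that assumption is not stated in the output, but it reproduces the printed figures. The function name `nbody_throughput` is made up for this illustration and is not part of the sample.

```python
def nbody_throughput(num_bodies, iterations, total_ms, flops_per_interaction):
    """Return (billions of interactions per second, GFLOP/s).

    Assumes an all-pairs simulation: num_bodies * num_bodies
    interactions are evaluated in every iteration.
    """
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1000.0
    interactions_per_second = interactions / seconds
    gflops = interactions_per_second * flops_per_interaction / 1e9
    return interactions_per_second / 1e9, gflops


# Values taken from the single-precision run above.
bips, gflops = nbody_throughput(2048, 10, 3.795, 20)
print(f"{bips:.3f} billion interactions per second")
print(f"{gflops:.3f} single-precision GFLOP/s")
```

Because the printed total time is already rounded to three decimals, the recomputed GFLOP/s figure can differ from the program's own value in the last digit.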
[t_azu@linux]$ ./nbody -benchmark -fp64 -benchmark
> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GT 640" with compute capability 3.0
> Compute 3.0 CUDA device: [GeForce GT 640]
2048 bodies, total time for 10 iterations: 59.168 ms
= 0.709 billion interactions per second
= 21.266 double-precision GFLOP/s at 30 flops per interaction
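
To compare the two nbody runs, the short sketch below derives the slowdown from the numbers printed above. The dictionary names are only illustrative. Note that the two GFLOP/s figures use different flop counts per interaction (20 for single precision, 30 for double), so the wall-clock ratio and the GFLOP/s ratio are not the same number.

```python
# Figures copied from the two benchmark runs above (2048 bodies, 10 iterations).
single = {"ms": 3.795, "gflops": 221.043, "flops_per_interaction": 20}
double = {"ms": 59.168, "gflops": 21.266, "flops_per_interaction": 30}

# How much longer the double-precision run took in wall-clock time.
time_ratio = double["ms"] / single["ms"]

# How much lower the reported double-precision GFLOP/s figure is.
gflops_ratio = single["gflops"] / double["gflops"]

# The two ratios differ by the flop-count factor (20 / 30).
flop_factor = single["flops_per_interaction"] / double["flops_per_interaction"]

print(f"double precision took {time_ratio:.2f}x longer")
print(f"single precision reports {gflops_ratio:.2f}x the GFLOP/s")
print(f"time ratio scaled by flop factor: {time_ratio * flop_factor:.2f}")
```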