Basic CUDA performance of the ELSA GT640 LP (bandwidthTest, nbody)

bandwidthTest

[t_azu@linux]$ ./bandwidthTest 
[CUDA Bandwidth Test] - Starting...
Running on...

 Device 0: GeForce GT 640
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			6407.4

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			6385.8

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(MB/s)
   33554432			25084.9
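Each bandwidth figure is the number of bytes moved divided by the elapsed time. It can therefore be converted back into an approximate time per copy of the 33554432-byte buffer (2^25 bytes, 32 MiB). The Python sketch below rests on a few assumptions:

- It assumes 1 MB = 10^6 bytes, which may not match the unit the bandwidthTest sample uses internally.
- It covers only the two host/device directions.
- The helper name transfer_time_ms is hypothetical.

```python
TRANSFER_BYTES = 33554432  # 2**25 bytes (32 MiB), the Quick Mode transfer size above

# Pinned-memory bandwidth figures copied from the bandwidthTest output (MB/s)
measured_mb_per_s = {
    "host_to_device": 6407.4,
    "device_to_host": 6385.8,
}


def transfer_time_ms(size_bytes: int, bandwidth_mb_per_s: float) -> float:
    """Approximate duration of one copy in ms, ASSUMING 1 MB = 10**6 bytes."""
    bytes_per_s = bandwidth_mb_per_s * 1e6
    return size_bytes / bytes_per_s * 1e3


for direction, bw in measured_mb_per_s.items():
    print(f"{direction}: ~{transfer_time_ms(TRANSFER_BYTES, bw):.2f} ms per 32 MiB copy")
```

With these inputs, both directions come out at roughly 5.2 ms per copy. That is why the two pinned-memory figures are nearly identical.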

nbody

[t_azu@linux]$ ./nbody -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure perfomance.
	-fullscreen       (run n-body simulation in fullscreen mode)
	-fp64             (use double precision floating point values for simulation)
	-hostmem          (stores simulation data in host memory)
	-benchmark        (run benchmark to measure performance) 
	-numbodies=<N>    (number of bodies (>= 1) to run in simulation) 
	-device=<d>       (where d=0,1,2.... for the CUDA device to use)
	-numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
	-compare          (compares simulation results running once on the default GPU and once on the CPU)
	-cpu              (run n-body simulation on the CPU)
	-tipsy=<file.bin> (load a tipsy model file for simulation)

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GT 640" with compute capability 3.0

> Compute 3.0 CUDA device: [GeForce GT 640]
2048 bodies, total time for 10 iterations: 3.795 ms
= 11.052 billion interactions per second
= 221.043 single-precision GFLOP/s at 20 flops per interaction
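The "20 flops per interaction" figure refers to one body-body force evaluation. The sketch below writes that evaluation out in Python, modeled on the usual all-pairs formulation with a softening term eps2. Several details are assumptions for illustration:

- the function names;
- a gravitational constant of 1;
- the default eps2 value.

The flop tallies in the comments follow the common counting convention. They were not measured on the GT 640.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def body_body_interaction(acc: Vec3, pos_i: Vec3, pos_j: Vec3,
                          mass_j: float, eps2: float) -> Vec3:
    """Add the acceleration that body j exerts on body i (gravitational constant = 1)."""
    # r_ij = pos_j - pos_i: 3 flops
    rx, ry, rz = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1], pos_j[2] - pos_i[2]
    # dist_sqr = dot(r_ij, r_ij) + eps^2 (softening): 6 flops
    dist_sqr = rx * rx + ry * ry + rz * rz + eps2
    # inv_dist_cube = 1 / dist_sqr**1.5: 4 flops (sqrt, reciprocal, 2 multiplies)
    inv_dist = 1.0 / math.sqrt(dist_sqr)
    inv_dist_cube = inv_dist * inv_dist * inv_dist
    # s = m_j * inv_dist_cube: 1 flop
    s = mass_j * inv_dist_cube
    # acc_i += s * r_ij: 6 flops (3 multiplies, 3 adds) -> 20 flops in total
    return (acc[0] + rx * s, acc[1] + ry * s, acc[2] + rz * s)


def all_pairs_accelerations(positions: Sequence[Vec3], masses: Sequence[float],
                            eps2: float = 1e-4) -> List[Vec3]:
    """O(N^2) sweep; eps2 = 1e-4 is a hypothetical softening value.

    The self pair (j == i) has r_ij = 0, so it adds nothing as long as eps2 > 0.
    """
    accs = []
    for p_i in positions:
        acc = (0.0, 0.0, 0.0)
        for p_j, m_j in zip(positions, masses):
            acc = body_body_interaction(acc, p_i, p_j, m_j, eps2)
        accs.append(acc)
    return accs
```

Every body is paired with every body, so one step over N bodies costs N*N interactions. This matches the count that the benchmark figures are consistent with.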
[t_azu@linux]$ ./nbody -benchmark -fp64 -benchmark

> Windowed mode
> Simulation data stored in video memory
> Double precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GT 640" with compute capability 3.0

> Compute 3.0 CUDA device: [GeForce GT 640]
2048 bodies, total time for 10 iterations: 59.168 ms
= 0.709 billion interactions per second
= 21.266 double-precision GFLOP/s at 30 flops per interaction
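The nbody figures can be reproduced from the printed run parameters in two steps:

1. Count N^2 interactions per iteration.
2. Multiply by the flops-per-interaction constant shown in each run: 20 for single precision, 30 for double.

The Python sketch below does that arithmetic. The N^2 counting is inferred from the fact that it matches the output. nbody_throughput is a hypothetical helper name.

```python
def nbody_throughput(num_bodies: int, iterations: int, total_ms: float,
                     flops_per_interaction: int):
    """Return (billion interactions/s, GFLOP/s), counting num_bodies**2 interactions per iteration."""
    interactions = num_bodies * num_bodies * iterations
    seconds = total_ms / 1e3
    giga_interactions_per_s = interactions / seconds / 1e9
    return giga_interactions_per_s, giga_interactions_per_s * flops_per_interaction


single = nbody_throughput(2048, 10, 3.795, 20)   # default single-precision run
double = nbody_throughput(2048, 10, 59.168, 30)  # -fp64 run
print(f"fp32: {single[0]:.3f} G interactions/s, {single[1]:.3f} GFLOP/s")
print(f"fp64: {double[0]:.3f} G interactions/s, {double[1]:.3f} GFLOP/s")
print(f"fp32/fp64 interaction-rate ratio: {single[0] / double[0]:.1f}x")
```

The results agree with the printed values to within the rounding of the printed times. For example, the sketch gives 221.044 GFLOP/s versus the printed 221.043. They also show the interaction rate dropping by roughly 15.6x when switching to -fp64.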