==35449== NVPROF is profiling process 35449, command: python test.py
CUDA: 0.0146556854248
Cubic (CPU): 0.102976298332
==35449== Profiling application: python test.py
==35449== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 92.21%  160.40ms     11  14.582ms  14.573ms  14.591ms  cudaPy_eval_5F_cubic_5F_cuda_2E_vec_5F_eval_5F_cubic_5F_spline_5F_3_24_1_2E_array_28_float64_2C__20_1d_2C__20_A_29__2E_array_28_float64_2C__20_1d_2C__20_A_29__2E_array_28_int32_2C__20_1d_2C__20_A_29__2E_array_28_float64_2C__20_3d_2C__20_A_29__2E_array_28_float64_2C__20_2d_2C__20_A_29__2E_array_28_float64_2C__20_1d_2C__20_A_29__2E_array_28_float64_2C__20_2d_2C__20_A_29__2E_array_28_float64_2C__20_2d_2C__20_A_29_
  5.96%  10.371ms      8  1.2963ms     864ns  7.7480ms  [CUDA memcpy HtoD]
  1.83%  3.1808ms      1  3.1808ms  3.1808ms  3.1808ms  [CUDA memcpy DtoH]

==35449== API calls:
Time(%)      Time  Calls       Avg       Min       Max  Name
 51.83%  186.86ms      1  186.86ms  186.86ms  186.86ms  cuCtxCreate
 43.08%  155.32ms      2  77.660ms  14.494ms  140.83ms  cuCtxSynchronize
  3.08%  11.115ms      8  1.3893ms  13.512us  7.9849ms  cuMemcpyHtoD
  1.10%  3.9722ms      1  3.9722ms  3.9722ms  3.9722ms  cuMemcpyDtoH
  0.23%  826.08us      1  826.08us  826.08us  826.08us  cuLinkAddData
  0.17%  616.49us      8  77.060us  10.158us  185.57us  cuMemAlloc
  0.13%  481.89us      8  60.235us  7.0910us  195.87us  cuMemFree
  0.13%  472.65us      1  472.65us  472.65us  472.65us  cuModuleLoadDataEx
  0.08%  295.02us      1  295.02us  295.02us  295.02us  cuLinkComplete
  0.06%  223.88us     11  20.352us  15.322us  47.780us  cuLaunchKernel
  0.05%  196.30us      1  196.30us  196.30us  196.30us  cuModuleUnload
  0.01%  48.672us      1  48.672us  48.672us  48.672us  cuLinkCreate
  0.01%  43.044us      1  43.044us  43.044us  43.044us  cuDeviceGetName
  0.01%  26.410us     30     880ns     491ns  3.0850us  cuCtxGetCurrent
  0.00%  3.8480us      5     769ns     440ns  1.3050us  cuFuncGetAttribute
  0.00%  2.9880us      2  1.4940us     846ns  2.1420us  cuDeviceGetCount
  0.00%  1.8360us      1  1.8360us  1.8360us  1.8360us  cuModuleGetFunction
  0.00%  1.3050us      1  1.3050us  1.3050us  1.3050us  cuLinkDestroy
  0.00%  1.3020us      2     651ns     601ns     701ns  cuDeviceGet
  0.00%  1.2380us      1  1.2380us  1.2380us  1.2380us  cuDeviceGetAttribute
  0.00%  1.1530us      1  1.1530us  1.1530us  1.1530us  cuDeviceComputeCapability
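
As a rough way to read these numbers, the sketch below redoes the arithmetic on figures copied from the profile. It assumes the `CUDA:` and `Cubic (CPU):` values printed by `test.py` are elapsed times in seconds, which the output itself does not state; the variable names are only illustrative.

```python
# Figures copied from the nvprof "Profiling result" section.
kernel_total_ms = 160.40   # total time spent in the spline kernel
kernel_calls = 11          # number of kernel launches
htod_ms = 10.371           # [CUDA memcpy HtoD], 8 copies in total
dtoh_ms = 3.1808           # [CUDA memcpy DtoH], 1 copy

# Assumption: these two values printed by test.py are seconds.
cuda_wall_s = 0.0146556854248
cpu_wall_s = 0.102976298332

kernel_avg_ms = kernel_total_ms / kernel_calls   # time per kernel launch
transfer_ms = htod_ms + dtoh_ms                  # all host<->device copies
speedup = cpu_wall_s / cuda_wall_s               # CPU time / CUDA time

print(f"average kernel time: {kernel_avg_ms:.3f} ms")
print(f"printed CUDA time:   {cuda_wall_s * 1e3:.3f} ms")
print(f"total memcpy time:   {transfer_ms:.3f} ms")
print(f"CPU / CUDA ratio:    {speedup:.2f}x")
```

Under that assumption, the per-launch kernel time (about 14.58 ms) is nearly the same as the printed CUDA figure (about 14.66 ms), while the memory copies add up to roughly another 13.55 ms across the whole run.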