Hello Sir,
My tracer works well with rodinia_2.0-ft, so I assume it satisfies the environment requirements.
Now I am trying to see what is going on behind the scenes, using the LeNet example given
here. I generate the executable with the following command, since
compute_20 is not supported in CUDA 11:
$ nvcc -arch=sm_60 *.cu -lcublas -o lenet
When I launch the application on a real GPU, I get the following output:
millisecond : 0.003392
millisecond : 0.017792
millisecond : 0.012160
millisecond : 0.017056
millisecond : 0.012288
millisecond : 0.039968
millisecond : 0.023648
millisecond : 0.012480
Learning
error: 6.247417e-01, time_on_gpu: 10.856199
Time - 10.856199
Error Rate: 22.60%
But when I launch it for trace generation, I get this output:
millisecond : 0.003232
millisecond : 0.061184
millisecond : 0.037984
millisecond : 0.039712
millisecond : 0.037376
millisecond : 0.061504
millisecond : 0.049216
millisecond : 0.038272
Learning
error: -nan, time_on_gpu: 0.000000
Time - 0.000000
Error Rate: -nan%
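The time_on_gpu of 0.000000 together with the nan values makes me suspect the kernels may be failing silently under the tracer, so one check I plan to add is explicit error checking after each kernel launch. A minimal sketch (the CHECK macro name and the dummy kernel are my own placeholders, not from the LeNet code):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// CHECK is a helper macro I made up for this sketch: it prints the
// CUDA error string and the location if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
        }                                                             \
    } while (0)

// Stand-in for one of LeNet's kernels.
__global__ void dummy_kernel() {}

int main() {
    dummy_kernel<<<1, 1>>>();
    CHECK(cudaGetLastError());       // catches launch-time failures
    CHECK(cudaDeviceSynchronize());  // catches asynchronous execution failures
    return 0;
}
```

If a launch fails under the tracer (for example, an unsupported architecture or an injection problem), this should print the actual error instead of silently producing nan results.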
My guess is that during trace generation the application is not reading the dataset, which is why the error and Error Rate come out as nan. Does the tracer tool require a specific data format? Or am I missing a tracer-specific argument when generating the executable with nvcc? Kindly help. Thank you.