To generate traces for the DNN application


Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Mar 31, 2021, 11:18:46 AM
to accel-sim
Hello,
I want to generate traces for a DNN application such as the one given below:

The above LeNet architecture is trained and tested on the MNIST dataset in the given code. May I get the detailed procedure to generate the traces and run them in the Accel-Sim framework? What should the directory structure be for the CUDA code and the data files? Which commands should be used? Is there any crucial strategy to avoid memory problems?
I really appreciate any help you can provide. Thanks.

Mahmoud Khairy

<khairy2011@gmail.com>
Mar 31, 2021, 11:32:51 AM
to accel-sim
Hello,

I would highly encourage you to read through the Accel-Sim README tutorial and the NVBit tracer tutorial listed below:

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 3, 2021, 5:11:12 PM
to accel-sim
Hi,
I am reading a tutorial on generating traces for a specific individual application:

If our application, e.g., vectoradd, requires data, how do we specify it in the following command?
LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/vectoradd/vectoradd

Mahmoud Khairy

<khairy2011@gmail.com>
Apr 3, 2021, 5:23:02 PM
to accel-sim
Just add it as a normal command-line argument: run the program as usual and put LD_PRELOAD at the front of the command. For example, if vectoradd takes an argument value of 1000:

LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/vectoradd/vectoradd 1000

You can google and read more about the LD_PRELOAD trick. It is a feature of the Linux dynamic linker (ld.so).
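As a minimal sketch of the point above: LD_PRELOAD is an ordinary environment variable set for a single command, and everything after the program name is passed through as arguments unchanged. The stand-in program (`sh -c ...`), the empty `TRACER` value, and the `data/input.txt` argument are illustrative assumptions so that the snippet runs on any machine; in real use `TRACER` would be the path to tracer_tool.so and the program would be your application.

```shell
# TRACER is left empty so the sketch runs anywhere; in real use it would be
# the path to tracer_tool.so (hypothetical variable name).
TRACER=""

# Stand-in for the application: a tiny shell program that prints its arguments.
# "app" becomes $0; "1000" and "data/input.txt" are the program's arguments.
out=$(LD_PRELOAD="$TRACER" sh -c 'echo "args: $*"' app 1000 data/input.txt)
echo "$out"

# The VAR=value prefix applies only to that one command, not to this shell.
echo "LD_PRELOAD in this shell: ${LD_PRELOAD:-<unset>}"
```

The same pattern applies to any number of arguments, including paths to input files.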





Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 3, 2021, 5:27:08 PM
to accel-sim
It's simple! Thank you very much.

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 3, 2021, 9:15:13 PM
to accel-sim
Hi Mahmoud Sir,
Now, I want to generate traces for the training and testing of LeNet on the MNIST dataset. I built the application executable, and the same directory has a /data subdirectory containing the training and testing datasets (images, labels). The code works well on the real GPU, but when I issue the following command, it does not generate the traces.
$ LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/LeNet5_Training_Testing/lenet
It generates a 'traces' directory with a kernelslist file (containing CUDA memcpy commands only) and an empty 'stats.csv' file.
Please guide me on how to get the traces for this application. Thank you.
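One way to confirm the symptom described above is to count how many kernelslist entries are not memcpy commands and to check whether stats.csv is empty. The sketch below builds a hypothetical sample directory so it can run without a GPU; the entry format (`MemcpyHtoD,<address>,<size>`) and the sample values are assumptions for illustration and may not match your tracer version.

```shell
# Hypothetical sample standing in for the 'traces' directory; in real use,
# set TRACE_DIR to the directory the tracer produced.
TRACE_DIR=$(mktemp -d)
printf 'MemcpyHtoD,0x00007f0000000000,4096\nMemcpyHtoD,0x00007f0000001000,4096\n' \
    > "$TRACE_DIR/kernelslist"
: > "$TRACE_DIR/stats.csv"   # empty file, as reported in the message above

# Count entries that are not memcpy commands (assumed to be kernel traces).
# grep exits non-zero when the count is 0, hence the "|| true".
kernel_entries=$(grep -vc '^Memcpy' "$TRACE_DIR/kernelslist" || true)
echo "kernel entries: $kernel_entries"

# [ -s FILE ] is true only when FILE exists and is non-empty.
if [ -s "$TRACE_DIR/stats.csv" ]; then stats_state="non-empty"; else stats_state="empty"; fi
echo "stats.csv is $stats_state"
```

A count of zero kernel entries together with an empty stats.csv would suggest that no kernel launch was traced at all.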

Mahmoud Khairy

<khairy2011@gmail.com>
Apr 4, 2021, 9:20:47 AM
to accel-sim
Hi,

1- Have you set either of the kernel-limit environment variables, DYNAMIC_KERNEL_LIMIT_START or DYNAMIC_KERNEL_LIMIT_END? If so, please make sure to clear them, or start from scratch in a new terminal window.

2- This is the code of the NVBit tracer, and this is where we trace kernels. You can add checkpoint print statements there to see what is going on.
3- Please make sure you meet the NVBit requirements listed here:

4- If you have any errors or questions regarding NVBit, you can ask the NVBit team, as they know NVBit better than we do.
You can report an issue to them here:
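For the first item in the list above, a minimal sketch of checking for and clearing the kernel-limit variables in the current shell. The values 5 and 10 are arbitrary placeholders that simulate a leftover setting; they are not recommended values.

```shell
# Simulate leftover settings from an earlier experiment (placeholder values).
export DYNAMIC_KERNEL_LIMIT_START=5
export DYNAMIC_KERNEL_LIMIT_END=10

# Show any kernel-limit variables currently set in this shell.
env | grep '^DYNAMIC_KERNEL_LIMIT' || echo "no kernel limits set"

# Clear them so the tracer falls back to its defaults.
unset DYNAMIC_KERNEL_LIMIT_START DYNAMIC_KERNEL_LIMIT_END

# Confirm nothing is left; grep -c prints 0 (and exits non-zero) on no match.
remaining=$(env | grep -c '^DYNAMIC_KERNEL_LIMIT' || true)
echo "remaining kernel-limit variables: $remaining"
```

Opening a fresh terminal has the same effect, provided the variables are not exported from a shell startup file.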

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 5, 2021, 10:09:26 AM
to accel-sim
Hello Sir,
My tracer works well with rodinia_2.0-ft, so I guess the environment requirements are satisfied.
So, I am trying to see what is going on behind the scenes. I am using the LeNet example given here. I generate the executable with the following command, since compute_20 is not supported in CUDA 11:
$ nvcc -arch=sm_60 *.cu -lcublas -o lenet

When I launch the application on real GPU, I get output as:
millisecond : 0.003392
millisecond : 0.017792
millisecond : 0.012160
millisecond : 0.017056
millisecond : 0.012288
millisecond : 0.039968
millisecond : 0.023648
millisecond : 0.012480
Learning
error: 6.247417e-01, time_on_gpu: 10.856199

 Time - 10.856199
Error Rate: 22.60%


But when I launch it for trace generation then I get output as:
millisecond : 0.003232
millisecond : 0.061184
millisecond : 0.037984
millisecond : 0.039712
millisecond : 0.037376
millisecond : 0.061504
millisecond : 0.049216
millisecond : 0.038272
Learning
error: -nan, time_on_gpu: 0.000000

 Time - 0.000000
Error Rate: -nan%


I guess that the dataset is not being read during trace generation, which is why I get nan for both error and Error Rate. Does the tracer tool require any particular data format? Or am I missing a tracer-specific argument when building the executable with nvcc? Kindly help. Thank you.

Mahmoud Khairy

<khairy2011@gmail.com>
Apr 5, 2021, 10:18:35 AM
to Ajinkya Bankar, accel-sim
What command did you use on real HW without the tracer, and what command did you use for tracing?
Have you tried digging into the tracer code yourself and adding checkpoint print statements, as mentioned in my last email, to see how the process/kernel launches happen in the tracer?


--
Thanks!
-Mahmoud

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 5, 2021, 10:26:33 AM
to accel-sim
Hello,
Thanks for the reply. I generate the application executable with:
$ nvcc -arch=sm_60 *.cu -lcublas -o lenet
and use $ ./lenet to execute it on the real HW without the tracer. I use the following command to get the traces:
$ LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/LeNet5_Training_Testing/lenet
I have kept the /data directory inside /LeNet5_Training_Testing, where my application executable is located.
It's difficult to understand the tracer code, so I haven't yet tested how the kernel launches happen in the tracer.

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 5, 2021, 11:14:38 AM
to accel-sim

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 7, 2021, 11:55:14 AM
to accel-sim
Hi,
I figured out that the problem was that the /data directory path was not specified properly when launching the trace generation process.
But after specifying it correctly, I get the following error before it starts generating the traces:
lenet: arch/gm10x_hal.cpp:181: void set_imm_relative_control_flow(uint64_t*, int64_t): Assertion `!IS_LARGER_THAN_24BIT(imm)' failed.
Aborted (core dumped)


Please help me debug the problem. Thanks.
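The data-path problem mentioned at the start of this message can be sketched as follows, assuming the application opens its dataset through a path relative to the current working directory (the thread does not show how lenet actually opens its files). The directory layout, the `app.sh` stand-in, and the `train.txt` file name are all hypothetical.

```shell
# Hypothetical layout standing in for the LeNet directory from the thread.
APP_DIR=$(mktemp -d)
mkdir -p "$APP_DIR/data"
echo "sample" > "$APP_DIR/data/train.txt"

# Stand-in for the application: it looks for its data via a relative path.
cat > "$APP_DIR/app.sh" <<'EOF'
if [ -r data/train.txt ]; then echo "data found"; else echo "data missing"; fi
EOF

# Launched from another directory, the relative path resolves against
# the wrong working directory.
from_elsewhere=$(cd / && sh "$APP_DIR/app.sh")

# Launched from inside the application directory, it resolves correctly.
from_app_dir=$(cd "$APP_DIR" && sh ./app.sh)

echo "from elsewhere: $from_elsewhere"
echo "from app dir:   $from_app_dir"
```

For tracing, this would mean changing into the application directory before launching and giving LD_PRELOAD an absolute path to tracer_tool.so, because a relative tracer path is resolved against the current directory as well.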

Mahmoud Khairy

<khairy2011@gmail.com>
Apr 7, 2021, 12:22:33 PM
to accel-sim
What GPU hardware platform do you have?

Mahmoud Khairy

<khairy2011@gmail.com>
Apr 7, 2021, 1:10:34 PM
to accel-sim
Please see this:
It seems this is an NVBit bug on Kepler and Maxwell cards. If you move to Volta, Turing, or Ampere cards, this bug may go away.

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 7, 2021, 2:32:03 PM
to accel-sim
I have an NVIDIA GeForce 1080Ti card. It can generate traces for other applications but gives this error for LeNet. So, can we say that it's a card problem?

Mahmoud Khairy

<khairy2011@gmail.com>
Apr 7, 2021, 2:45:14 PM
to Ajinkya Bankar, accel-sim
Yes, it is a hardware problem, as the NVBit issue below shows. You can follow up with the NVBit team on this, or you can move to one of the other hardware platforms that I mentioned.

Do you have multiple GPUs in your system? Please run the "nvidia-smi" command to see how many GPUs you have, and make sure you are using the right one.
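A minimal sketch of the check suggested above: list the GPUs with nvidia-smi, then restrict CUDA programs launched from the shell to one device with the CUDA_VISIBLE_DEVICES environment variable. The index 0 is only an example, and the guard around nvidia-smi is there so the snippet also runs on a machine without the NVIDIA driver.

```shell
# List the GPUs the driver can see (skipped if nvidia-smi is not installed).
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || true
else
    echo "nvidia-smi not found on this machine"
fi

# Restrict CUDA programs started from this shell to a single GPU.
# The index 0 is only an example; pick the device you actually want.
export CUDA_VISIBLE_DEVICES=0
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```

Note that the CUDA runtime's default device ordering may differ from the order nvidia-smi prints; setting CUDA_DEVICE_ORDER=PCI_BUS_ID is one way to make the two agree.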



--
Thanks!
-Mahmoud

Ajinkya Bankar

<ajinkyasbankar@gmail.com>
Apr 7, 2021, 2:48:48 PM
to accel-sim
I have two GPUs on my machine. Unfortunately, they are both the same model. I will try other options.
Thank you for the help.