Does Accel-Sim + latest NVBit work with the A100 processor?


HT MICRO

<htmicro2020@gmail.com>
Nov 28, 2022, 3:02:19 PM
to accel-sim
When I run the provided NVBit tracer on MNIST applications built with the latest cuDNN on an A100-based system (DGX / HGX), I hit an assert (related to handling LDGSTS with two memory references).
If I bypass the assert, there are issues with post-processing the generated traces. (This may be expected, but I wanted to try the workflow described.)

This is my first time using Accel-Sim. I am looking for any guidance on using NVBit / Accel-Sim on A100 / H100 systems.

Appreciate your help.

-Hemant

Rodrigo Huerta Gañan

<rodrigo.huerta.ganan@upc.edu>
Nov 29, 2022, 4:15:03 AM
to accel-sim
I don't know whether the A100 is supported without any changes. You may need to change the hardcoded limit on the number of SMs, and possibly something more. As for getting traces, that problem is solved in the tracer in the dev branch. I tried it and successfully got traces for a 3080 Ti and DeepBench that were crashing in the release branch.

Mahmood

<mahmood.nt@gmail.com>
Nov 29, 2022, 7:19:58 AM
to Rodrigo Huerta Gañan, accel-sim

See this commit for the tracer, which records LDGSTS:

https://github.com/accel-sim/accel-sim-framework/commit/e02c99dbadefc0b9dc95317100be2a446eec142c

However, even if you trace this instruction, its detailed implementation is not ready yet. Hence, you will get an error about the source (or destination) memory space.
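
One way to check in advance whether an application is affected is to scan its SASS for LDGSTS opcodes. The following is only a minimal sketch, assuming the SASS has already been dumped to a text file (for example with `cuobjdump -sass`) and that opcodes appear as dot-separated tokens such as `LDGSTS.E.BYPASS.128`. The function name `count_ldgsts` is hypothetical and is not part of Accel-Sim or NVBit.

```python
import re
from collections import Counter

# Matches an opcode token that starts with LDGSTS, together with any
# dot-separated modifiers (assumed form, e.g. LDGSTS.E.BYPASS.128).
LDGSTS_RE = re.compile(r"\bLDGSTS\b(?:\.\w+)*")


def count_ldgsts(lines):
    """Count each LDGSTS opcode variant found in an iterable of text lines.

    `lines` can be an open text file or a list of strings. This is a
    hypothetical helper for illustration, not an Accel-Sim or NVBit API.
    """
    counts = Counter()
    for line in lines:
        for match in LDGSTS_RE.finditer(line):
            counts[match.group(0)] += 1
    return counts
```

To use it, open the dumped SASS file and pass the file object to `count_ldgsts`. An empty result suggests the binary does not use LDGSTS, so the missing implementation may not matter for that workload.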

 

Regards,

Mahmood


HT MICRO

<htmicro2020@gmail.com>
Nov 29, 2022, 1:34:24 PM
to accel-sim
Hello Mahmood,

Thank you for explaining the LDGSTS issue and for the link to the commit.
How does one contribute to this project to help complete the "detailed implementation of instructions" needed to use A100-based systems? In particular, how big is that effort or, put another way, how far is this project from being able to trace A100 binaries?

Rodrigo mentioned above that the dev branch is already able to get traces for a 3080 Ti and DeepBench.

Best Regards,
Hemant


Mahmood

<mahmood.nt@gmail.com>
Nov 29, 2022, 1:38:47 PM
to HT MICRO, accel-sim

I am not part of the Accel-Sim team. However, you can create your own branch, implement or update the missing parts, and open pull requests, as with any other GitHub project.

Regarding the dev branch, I encountered a problem with the dev code, and it turned out that the loop control part is not correct. See the diff in Connie120's pull request.

 

Regards,

Mahmood
