Support tensor core instruction for Turing architecture

sjlee

<sunjung900407@gmail.com>

Jul 1, 2021, 10:16:06 PM

to accel-sim

Hi all,

First, thank you for your work; I think Accel-sim is a powerful tool for simulating GPU environments.

I have already verified that Accel-sim supports cuDNN and tensor cores (the HMMA.884 instruction) on the Volta generation, using both the code you provide (such as DeepBench) and the latest CUTLASS code from NVIDIA.

However, when I run the latest CUTLASS code with HMMA.1688 or IMMA.8816, the FP16 and integer tensor core instructions for the Turing generation, I get the following error:

--------------------------------------------------------------------------------

accel-sim.out: abstract_hardware_model.cc:316: void warp_inst_t::generate_mem_accesses(): Assertion `0' failed.

./accel-sim/accel-sim-framework/util/job_launching/../../sim_run_11.0/cutlass_profiler/4096_4096_1024/RTX2060/slurm.sim: line 51: 7199 Aborted (core dumped) ./accel-sim/accel-sim-framework/util/job_launching/../../sim_run_11.0/gpgpu-sim-builds/accelsim-commit-4c2bf09a79d6b57bb10fe1898700930a5dd5531f_modified_0.0/accel-sim.out -config ./gpgpusim.config -trace ./traces/kernelslist.g

--------------------------------------------------------------------------------

So I would like to know whether Accel-sim supports the HMMA.1688 and IMMA.8816 instructions.

I would also like to confirm whether Accel-sim supports these instructions but not the newly added LDSM instruction, as you mentioned in other conversations. If so, I will try replacing the LDSM instructions with LDS instructions.
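Before rewriting anything, it can help to confirm which of these instructions actually appear in the traces. The following is a minimal sketch, assuming each trace line is whitespace-separated text that contains the SASS opcode as one token. The function name `count_opcodes` and the sample lines are hypothetical and are not the exact Accel-sim trace format.

```python
from collections import Counter


def count_opcodes(trace_lines, prefixes=("LDSM", "HMMA", "IMMA")):
    """Count tokens that start with one of the given opcode prefixes.

    Assumption: every instruction line is whitespace-separated and the
    opcode (for example "HMMA.1688.F16") appears as a single token.
    """
    prefixes = tuple(prefixes)  # str.startswith needs a tuple, not a list
    counts = Counter()
    for line in trace_lines:
        for token in line.split():
            if token.startswith(prefixes):
                counts[token] += 1
    return counts


if __name__ == "__main__":
    # Made-up lines for illustration only; a real trace will look different.
    sample = [
        "0000 ffffffff R2 LDSM.16.M88.4 R8",
        "0010 ffffffff R4 HMMA.1688.F16 R2 R6",
        "0020 ffffffff R6 LDS.U.32 R10",
    ]
    for opcode, n in sorted(count_opcodes(sample).items()):
        print(opcode, n)
```

If no LDSM token shows up in the traces of the failing kernel, the assertion probably has a different cause and replacing LDSM would not help.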

Mahmoud Khairy

<khairy2011@gmail.com>

Jul 6, 2021, 10:55:44 AM

to accel-sim

HMMA, IMMA, and LDSM are all supported for the Turing architecture, as shown in the Turing ISA definition file below. They may not be modeled 100% correctly, or there may be some corner cases that are not handled correctly.

https://github.com/accel-sim/accel-sim-framework/blob/release/gpu-simulator/ISA_Def/turing_opcode.h
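To check a local checkout for these mnemonics, a small script along the following lines could be used. It is a sketch under the assumption that the header lists each opcode mnemonic as a quoted string literal; the helper name `mnemonics_in_header` and the sample text are hypothetical.

```python
import re


def mnemonics_in_header(header_text, wanted=("HMMA", "IMMA", "LDSM")):
    """Report which wanted mnemonics occur as quoted strings in the text.

    Assumption: each opcode is written as a string literal such as "HMMA".
    """
    found = set(re.findall(r'"([A-Z0-9_.]+)"', header_text))
    return {name: name in found for name in wanted}


if __name__ == "__main__":
    # Made-up snippet for illustration; it is not copied from the real file.
    sample = '{"HMMA", op_a},\n{"IMMA", op_b},\n{"LDS", op_c},\n'
    print(mnemonics_in_header(sample))
```

For the real file, read its contents first, for example with `open("gpu-simulator/ISA_Def/turing_opcode.h").read()` from the repository root, and pass the text to the function. A mnemonic being listed only shows that the opcode is recognized, not that every variant of it is modeled.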

I think the error you are seeing comes from LDSM. Yes, please try replacing LDSM with LDS and see whether that fixes the issue. But remember that replacing it with just one instruction (1:1) may generate an unequal number of memory requests.
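One rough way to reason about the 1:1 caveat is by data volume. The sketch below assumes that LDSM corresponds to PTX `ldmatrix`, which loads one, two, or four 8x8 matrices of 16-bit elements per warp, and that an LDS moves 32, 64, or 128 bits per thread. The function name `lds_per_ldsm` is hypothetical, and the result only matches bytes moved, not the exact number of memory requests the simulator generates.

```python
import math

WARP_SIZE = 32
MATRIX_BYTES = 8 * 8 * 2  # one 8x8 matrix of 16-bit elements


def lds_per_ldsm(num_matrices, lds_width_bits):
    """Estimate how many LDS instructions move as many bytes as one LDSM.

    Assumptions: LDSM loads num_matrices 8x8 16-bit matrices per warp,
    and every one of the 32 threads takes part in each LDS.
    """
    if num_matrices not in (1, 2, 4):
        raise ValueError("num_matrices must be 1, 2, or 4")
    if lds_width_bits not in (32, 64, 128):
        raise ValueError("lds_width_bits must be 32, 64, or 128")
    ldsm_bytes = num_matrices * MATRIX_BYTES
    lds_bytes = WARP_SIZE * lds_width_bits // 8
    return math.ceil(ldsm_bytes / lds_bytes)


if __name__ == "__main__":
    for matrices in (1, 2, 4):
        print(matrices, "matrices ->", lds_per_ldsm(matrices, 32), "x 32-bit LDS")
```

Under these assumptions, a four-matrix LDSM moves 512 bytes per warp, which is the volume of four 32-bit LDS instructions or one 128-bit LDS, so a 1:1 swap with a 32-bit LDS would undercount the shared-memory traffic.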

sjlee

<sunjung900407@gmail.com>

Jul 8, 2021, 8:38:25 PM

to accel-sim

Thank you for your reply.

I will check the source code you referred to and replace all of the LDSM-related code, taking your advice into account!

On Tuesday, July 6, 2021 at 11:55:44 PM UTC+9, khair...@gmail.com wrote:
