Scoreboard deadlock detected


Yue Chen

<c370797515@gmail.com>
Apr 2, 2022, 11:31:28 AM
to accel-sim
Hello, I am running a custom workload on GPGPU-Sim and got the following error log.

---------------------------
Release register - warp:3, reg: 1428
Reserved Register - warp:3, reg: 617
Reserved register - warp:3, reg: 617
warp_inst: Register Released - warp:3, reg: 1428
Release register - warp:3, reg: 1428
Reserved Register - warp:3, reg: 617
Reserved register - warp:3, reg: 617


GPGPU-Sim uArch: ERROR ** deadlock detected: last writeback core 2 @ gpu_sim_cycle 47478 (+ gpu_tot_sim_cycle 4294867296) (52522 cycles ago)
GPGPU-Sim uArch: DEADLOCK  shader cores no longer committing instructions [core(# threads)]:
GPGPU-Sim uArch: DEADLOCK  0(128) 1(128) 2(128) 3(96)
------------------------------

Then I used gdb and the macros in shader_trace.h to look into the cause. I found that GPGPU-Sim stays around the function checkCollision() and does not issue any instructions.

The instructions are as follows:

ld.const.u8 %r593, [%rd380];
ld.const.u8 %r594, [%rd372];
ld.const.u8 %r595, [%rd384];
shl.b32 %r596, %r595, 24;

The destination registers of all three ld.const.u8 instructions are never released, so shl.b32 stays blocked on a RAW hazard. It looks like ld.const.u8 causes this, but I can't work out why (maybe another instruction leads to the deadlock?).

My GPGPU-Sim environment runs in Docker:

CUDA 10.1
GPGPU-Sim 4.2.0 dev branch (the bug also reproduces in 4.0.1)
Ubuntu 18.04.4 LTS
Config: SM75_RTX2060

How to reproduce it:

I set up a trivial environment in GPGPU-Sim 4.0 and can still reproduce it. You can reproduce it with this simple code; when the macro max_n is changed to 4, the bug disappears.

#include <cuda_runtime.h>
#include <cstdint>
#include <iostream>

// 16x16 byte table in constant memory; the contents do not matter for the repro.
__constant__ uint8_t const_memory[16][16];

// Changing max_n to 4 makes the deadlock disappear.
#define max_n 8

__global__
void test(uint32_t threads, uint32_t* output) {
    const uint32_t thread = (blockDim.x * blockIdx.x + threadIdx.x);
    if(thread < threads)
    {
        // Byte-sized constant load; this becomes ld.const.u8 in PTX.
        uint8_t temp1 = const_memory[thread % max_n][thread % max_n];
        output[thread & 0xf] = temp1;
    }
}


int main(int argc, char* argv[]) {
    uint32_t threads = 1024;
    uint32_t threadsperblock = 128;
    uint32_t* d_output = nullptr;
    cudaMalloc(&d_output, 20 * sizeof(uint32_t));

    dim3 grid((threads + threadsperblock - 1) / threadsperblock);
    dim3 block(threadsperblock);
    test<<<grid, block>>>(threads, d_output);
    cudaDeviceSynchronize();  // wait for the kernel to finish before exiting
    cudaFree(d_output);
    return 0;
}


Thank you!



Junrui Pan

<panjunrui100@gmail.com>
Apr 2, 2022, 11:18:08 PM
to accel-sim
Hi, 

Thank you for your interest. Have you tried other configs?
I'm assuming this is PTX mode?
There could be many reasons for this kind of issue. Could you please try turning on trace mode? Uncomment these lines in the config file.

Please let me know how it looks.

Thanks,
Junrui

Yue Chen

<c370797515@gmail.com>
Apr 3, 2022, 12:26:05 AM
to accel-sim
Thanks for your reply.

Yes, I am running in PTX mode.

I tried your config with the simple code I mentioned before.
I have uploaded a truncated log (the scoreboard failures keep looping) along with the PTX generated by GPGPU-Sim.

gpgpusim-log
ptxfile

Junrui Pan

<panjunrui100@gmail.com>
Apr 4, 2022, 12:03:51 AM
to accel-sim
It looks odd that your PTX file is sm_30 while you are running the RTX 2060 config. See this line:
Extracting PTX file and ptxas options    1: vec_add.1.sm_30.ptx -arch=sm_30

It looks like you compiled the binary with only the sm_30 compute capability. Could you please check that?
Ideally, GPGPU-Sim extracts the PTX for every compute capability in the fatbin (fat binary). Since it only extracted sm_30, I assume the binary was compiled only for sm_30. Please confirm that and try adding the correct version for the RTX 2060.

Thanks,
Junrui

Yue Chen

<c370797515@gmail.com>
Apr 5, 2022, 12:02:09 PM
to accel-sim
Extracting sm_30 by default may be related to the CUDA version. I just compile with the common flag --cudart shared.

I recently moved my workstation from CUDA 10.1 to CUDA 11.0. Now it extracts sm_52 by default, but the problem still occurs.

Could you tell me whether this program runs normally on your machine? I think the problem is caused by GPGPU-Sim.

In addition, I have uploaded new files from the new setup:
ptxfile
gpgpusim.config
gpgpusim-log

Junrui Pan

<panjunrui100@gmail.com>
Apr 5, 2022, 9:18:42 PM
to accel-sim
Let me see if I have time.

But to be honest, sm_52 could still cause problems.
Turing should be sm_75.

Yue Chen

<c370797515@gmail.com>
Apr 6, 2022, 1:34:03 AM
to accel-sim
I have fixed it; the problem was in constant_cycle(). Can I make a pull request to the dev branch?

Junrui Pan

<panjunrui100@gmail.com>
Apr 6, 2022, 9:17:51 PM
to accel-sim
Glad to hear that!

Please do so, and thank you for your contribution. We will review it.

Thanks

Ardhi Wiratama Baskara Yudha

<ardhy.yudha@gmail.com>
Dec 10, 2023, 12:24:48 AM
to accel-sim
Hello, 

How did you solve this issue? I didn't see any PR related to this. Thanks!
