Printing from Kernel


Daniel Gerzhoy

Aug 1, 2017, 3:13:57 PM8/1/17
to gem5-gpu Developers List
Hi, I'm attempting to debug an issue with a benchmark (more complex than the one included below), and I have added prints to my kernel to determine what is going on. However, no matter what the print is, the simulator segfaults. Before I go digging into the code to see what's wrong, I want to know if there is a proper way to print from a kernel in gem5-gpu. Thanks.

The program hello.cu:

#include <stdio.h>
#include <cutil.h>


__global__ void Hello()
{
    printf("Hello from kernel!\n");
}

int main(int argc, char *argv[])
{
    printf("Hello from main\n");

    Hello<<<1,1>>>();

    return 0;
}
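
(As an aside, separate from the simulator segfault: on real hardware, device-side printf output is buffered and only flushed at synchronization points, so a program that returns from main immediately after the launch may print nothing even when printf works. A minimal variant of the program above that forces the flush would be:)

```cuda
#include <cstdio>

__global__ void Hello()
{
    // Device-side printf requires compute capability 2.0+ (sm_20, as used here).
    printf("Hello from kernel!\n");
}

int main(int argc, char *argv[])
{
    printf("Hello from main\n");

    Hello<<<1,1>>>();

    // Block until the kernel finishes; this is one of the synchronization
    // points at which the device printf buffer is flushed to stdout.
    cudaDeviceSynchronize();

    return 0;
}
```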

Output of make (note: the -u ceil etc. flags come from the more complicated benchmark, which uses math functions):

nvcc  -O3 -c -arch sm_20 --keep --compiler-options -fno-strict-aliasing \
                -DGEM5_FUSION -I. -I/homes/dgerzhoy/cuda/include/ -I/homes/dgerzhoy/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
                 -I../../../gem5/util/m5 -I../../libcuda -L/homes/dgerzhoy/NVIDIA_GPU_Computing_SDK/C/lib -lcutil -DUNIX hello.cu -o hello
python ../../common/sizeHack.py -f hello.cu.cpp -t sm_20
the place is 561864
change will be:
ub.c"
static void __sti____cudaRegisterAll_13_hello_cpp1_ii_acb85650(void) { __cudaFatCubinHandle = __cudaRegisterFatBinary2((void*)(&__fatDeviceText), sizeof(__deviceText_$compute_20$)); atexit(__cudaUnregisterBinaryUtil); __cudaRegisterFunction(__cudaFatCubinHandle, (const char*)((void ( *)(void))
g++-4.4 -DOUTPUT -DDEBUG  -O3 -g -c hello.cu.cpp -o hello.cu_o
g++-4.4 -DOUTPUT -DDEBUG  -O3 -DGEM5_FUSION hello.cu_o -L../../libcuda -lcuda \
                -L/homes/dgerzhoy/NVIDIA_GPU_Computing_SDK/C/lib \
                -lz -static -static-libgcc -o gem5_fusion_hello -L/usr/lib64 -lcutil_x86_64 -lm5op_x86 -lm -lc  -u sin -u ceil -u log -u sqrt -u floor -u exp2 -u exp -u log2 -u log1p -u remquo
make: warning:  Clock skew detected.  Your build may be incomplete.

Run command (from the benchmark directory):
../../../gem5/build/X86_VI_hammer_GPU/gem5.opt ../../../gem5-gpu/configs/se_fusion.py --access-host-pagetable -n 4 -c ./gem5_fusion_hello

GDB Backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000c721b6 in st_impl (pI=0xf475c80, thread=0x5596c40) at build/X86_VI_hammer_GPU/gpgpu-sim/cuda-sim/instructions.cc:3551
3551              mem->write(addr,size/8,&data.s64,thread,pI);
(gdb) bt
#0  0x0000000000c721b6 in st_impl (pI=0xf475c80, thread=0x5596c40) at build/X86_VI_hammer_GPU/gpgpu-sim/cuda-sim/instructions.cc:3551
#1  0x0000000000c63585 in ptx_thread_info::ptx_exec_inst (this=0x5596c40, inst=..., lane_id=0) at build/X86_VI_hammer_GPU/gpgpu-sim/cuda-sim/opcodes.def:106
#2  0x0000000000c483ca in core_t::execute_warp_inst_t (this=0x459d480, inst=..., warpId=0) at build/X86_VI_hammer_GPU/gpgpu-sim/abstract_hardware_model.cc:772
#3  0x0000000000d83c7a in func_exec_inst (inst=..., this=0x459d480) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:688
#4  shader_core_ctx::issue_warp (this=0x459d480, pipe_reg_set=..., next_inst=next_inst@entry=0xf475c80, active_mask=..., warp_id=warp_id@entry=0) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:707
#5  0x0000000000d844b8 in scheduler_unit::cycle (this=0x6457630) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:865
#6  0x0000000000d84981 in issue (this=0x459d480) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:721
#7  cycle (this=0x459d480) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:2546
#8  simt_core_cluster::core_cycle (this=0x581d1e0) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:3219
#9  0x0000000000d62ed0 in gpgpu_sim::core_cycle_start (this=0x5793000) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/gpu-sim.cc:1180
#10 0x0000000000c453b2 in GPGPUSimComponentWrapper::componentCycleStart (this=0x22ca700) at /import/RaidHome/dgerzhoy/gem5-gpu_cct_lat_debug/gem5-gpu/src/gpu/gpgpu-sim/cuda_gpu.hh:122
#11 0x0000000000b517a1 in EventQueue::serviceOne (this=0x3a80500) at build/X86_VI_hammer_GPU/sim/eventq.cc:228
#12 0x0000000000b73bc8 in doSimLoop (eventq=0x3a80500) at build/X86_VI_hammer_GPU/sim/simulate.cc:218
#13 0x0000000000b7410b in simulate (num_cycles=<optimized out>) at build/X86_VI_hammer_GPU/sim/simulate.cc:131
#14 0x00000000008db7fc in _wrap_simulate__SWIG_0 (args=<optimized out>) at build/X86_VI_hammer_GPU/python/swig/event_wrap.cc:5611
#15 _wrap_simulate (self=<optimized out>, args=<optimized out>) at build/X86_VI_hammer_GPU/python/swig/event_wrap.cc:5660
#16 0x00007ffff775c188 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#17 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#18 0x00007ffff775c690 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#19 0x00007ffff775d3bb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#20 0x00007ffff775d3bb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#21 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#22 0x00007ffff771c9e2 in PyEval_EvalCode () from /usr/lib/libpython2.7.so.1.0
#23 0x00007ffff775c3cf in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#24 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#25 0x00007ffff775c690 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#26 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#27 0x00007ffff771c9e2 in PyEval_EvalCode () from /usr/lib/libpython2.7.so.1.0
#28 0x00007ffff771ca7c in PyRun_StringFlags () from /usr/lib/libpython2.7.so.1.0
#29 0x0000000000b5a8af in m5Main (argc=<optimized out>, argv=<optimized out>) at build/X86_VI_hammer_GPU/sim/init.cc:221
#30 0x00000000007a1033 in main (argc=7, argv=0x7fffffffe958) at build/X86_VI_hammer_GPU/sim/main.cc:58


Thanks,

Dan Gerzhoy

Jason Lowe-Power

Aug 1, 2017, 3:38:15 PM8/1/17
to Daniel Gerzhoy, gem5-gpu Developers List
Hi Dan,

As far as I know we never implemented print in GPU kernels. I know NVIDIA kind of supports it, but I don't know how they've implemented it.

If I were to try to make it work, I think I would modify GPGPU-Sim to just print it to host stdout (not implement it "correctly" to print through guest stdout). I wonder if there is just some special function in PTX that you could grab in GPGPU-Sim and implement the "print" there?

Hope this helps!

Jason