Printing from Kernel


Daniel Gerzhoy

Aug 1, 2017, 3:13:57 PM8/1/17
to gem5-gpu Developers List
Hi, I'm attempting to debug an issue with a benchmark (more complex than the one included below), and I have added prints to my kernel to determine what is going on. However, no matter what the print is, the simulator segfaults. Before I go digging into the code to see what's wrong, I want to know if there is a proper way to print from a kernel in gem5-gpu. Thanks.

The program hello.cu:

#include <stdio.h>
#include <cutil.h>


__global__ void Hello()
{
    printf("Hello from kernel!\n");
}

int main(int argc, char *argv[])
{
    printf("Hello from main\n");

    Hello<<<1,1>>>();

    return 0;
}
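
(As an aside, separate from the simulator segfault: on real hardware, device-side printf output is buffered and only flushed at synchronization points, so a program that returns from main immediately after the launch may print nothing even when printf works. A minimal variant of the program above that forces the flush would be:)

```cuda
#include <cstdio>

__global__ void Hello()
{
    // Device-side printf requires compute capability 2.0+ (sm_20, as used here).
    printf("Hello from kernel!\n");
}

int main(int argc, char *argv[])
{
    printf("Hello from main\n");

    Hello<<<1,1>>>();

    // Block until the kernel finishes; this is one of the synchronization
    // points at which the device printf buffer is flushed to stdout.
    cudaDeviceSynchronize();

    return 0;
}
```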

Output of make (note: the -u ceil etc. flags come from the more complicated benchmark, which uses math functions):

nvcc  -O3 -c -arch sm_20 --keep --compiler-options -fno-strict-aliasing \
                -DGEM5_FUSION -I. -I/homes/dgerzhoy/cuda/include/ -I/homes/dgerzhoy/NVIDIA_GPU_Computing_SDK/C/common/inc/ \
                 -I../../../gem5/util/m5 -I../../libcuda -L/homes/dgerzhoy/NVIDIA_GPU_Computing_SDK/C/lib -lcutil -DUNIX hello.cu -o hello
python ../../common/sizeHack.py -f hello.cu.cpp -t sm_20
the place is 561864
change will be:
ub.c"
static void __sti____cudaRegisterAll_13_hello_cpp1_ii_acb85650(void) { __cudaFatCubinHandle = __cudaRegisterFatBinary2((void*)(&__fatDeviceText), sizeof(__deviceText_$compute_20$)); atexit(__cudaUnregisterBinaryUtil); __cudaRegisterFunction(__cudaFatCubinHandle, (const char*)((void ( *)(void))
g++-4.4 -DOUTPUT -DDEBUG  -O3 -g -c hello.cu.cpp -o hello.cu_o
g++-4.4 -DOUTPUT -DDEBUG  -O3 -DGEM5_FUSION hello.cu_o -L../../libcuda -lcuda \
                -L/homes/dgerzhoy/NVIDIA_GPU_Computing_SDK/C/lib \
                -lz -static -static-libgcc -o gem5_fusion_hello -L/usr/lib64 -lcutil_x86_64 -lm5op_x86 -lm -lc  -u sin -u ceil -u log -u sqrt -u floor -u exp2 -u exp -u log2 -u log1p -u remquo
make: warning:  Clock skew detected.  Your build may be incomplete.

Run command (from the benchmark directory):
../../../gem5/build/X86_VI_hammer_GPU/gem5.opt ../../../gem5-gpu/configs/se_fusion.py --access-host-pagetable -n 4 -c ./gem5_fusion_hello

GDB Backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000c721b6 in st_impl (pI=0xf475c80, thread=0x5596c40) at build/X86_VI_hammer_GPU/gpgpu-sim/cuda-sim/instructions.cc:3551
3551              mem->write(addr,size/8,&data.s64,thread,pI);
(gdb) bt
#0  0x0000000000c721b6 in st_impl (pI=0xf475c80, thread=0x5596c40) at build/X86_VI_hammer_GPU/gpgpu-sim/cuda-sim/instructions.cc:3551
#1  0x0000000000c63585 in ptx_thread_info::ptx_exec_inst (this=0x5596c40, inst=..., lane_id=0) at build/X86_VI_hammer_GPU/gpgpu-sim/cuda-sim/opcodes.def:106
#2  0x0000000000c483ca in core_t::execute_warp_inst_t (this=0x459d480, inst=..., warpId=0) at build/X86_VI_hammer_GPU/gpgpu-sim/abstract_hardware_model.cc:772
#3  0x0000000000d83c7a in func_exec_inst (inst=..., this=0x459d480) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:688
#4  shader_core_ctx::issue_warp (this=0x459d480, pipe_reg_set=..., next_inst=next_inst@entry=0xf475c80, active_mask=..., warp_id=warp_id@entry=0) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:707
#5  0x0000000000d844b8 in scheduler_unit::cycle (this=0x6457630) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:865
#6  0x0000000000d84981 in issue (this=0x459d480) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:721
#7  cycle (this=0x459d480) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:2546
#8  simt_core_cluster::core_cycle (this=0x581d1e0) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/shader.cc:3219
#9  0x0000000000d62ed0 in gpgpu_sim::core_cycle_start (this=0x5793000) at build/X86_VI_hammer_GPU/gpgpu-sim/gpgpu-sim/gpu-sim.cc:1180
#10 0x0000000000c453b2 in GPGPUSimComponentWrapper::componentCycleStart (this=0x22ca700) at /import/RaidHome/dgerzhoy/gem5-gpu_cct_lat_debug/gem5-gpu/src/gpu/gpgpu-sim/cuda_gpu.hh:122
#11 0x0000000000b517a1 in EventQueue::serviceOne (this=0x3a80500) at build/X86_VI_hammer_GPU/sim/eventq.cc:228
#12 0x0000000000b73bc8 in doSimLoop (eventq=0x3a80500) at build/X86_VI_hammer_GPU/sim/simulate.cc:218
#13 0x0000000000b7410b in simulate (num_cycles=<optimized out>) at build/X86_VI_hammer_GPU/sim/simulate.cc:131
#14 0x00000000008db7fc in _wrap_simulate__SWIG_0 (args=<optimized out>) at build/X86_VI_hammer_GPU/python/swig/event_wrap.cc:5611
#15 _wrap_simulate (self=<optimized out>, args=<optimized out>) at build/X86_VI_hammer_GPU/python/swig/event_wrap.cc:5660
#16 0x00007ffff775c188 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#17 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#18 0x00007ffff775c690 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#19 0x00007ffff775d3bb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#20 0x00007ffff775d3bb in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#21 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#22 0x00007ffff771c9e2 in PyEval_EvalCode () from /usr/lib/libpython2.7.so.1.0
#23 0x00007ffff775c3cf in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#24 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#25 0x00007ffff775c690 in PyEval_EvalFrameEx () from /usr/lib/libpython2.7.so.1.0
#26 0x00007ffff771c6b5 in PyEval_EvalCodeEx () from /usr/lib/libpython2.7.so.1.0
#27 0x00007ffff771c9e2 in PyEval_EvalCode () from /usr/lib/libpython2.7.so.1.0
#28 0x00007ffff771ca7c in PyRun_StringFlags () from /usr/lib/libpython2.7.so.1.0
#29 0x0000000000b5a8af in m5Main (argc=<optimized out>, argv=<optimized out>) at build/X86_VI_hammer_GPU/sim/init.cc:221
#30 0x00000000007a1033 in main (argc=7, argv=0x7fffffffe958) at build/X86_VI_hammer_GPU/sim/main.cc:58


Thanks,

Dan Gerzhoy

Jason Lowe-Power

Aug 1, 2017, 3:38:15 PM8/1/17
to Daniel Gerzhoy, gem5-gpu Developers List
Hi Dan,

As far as I know we never implemented print in GPU kernels. I know NVIDIA kind of supports it, but I don't know how they've implemented it.

If I were to try to make it work, I think I would modify GPGPU-Sim to just print it to host stdout (not implement it "correctly" to print through guest stdout). I wonder if there is just some special function in PTX that you could grab in GPGPU-Sim and implement the "print" there?

Hope this helps!

Jason