SASSI GPU Instrumentation Tool

Dan Johnson

Nov 2, 2015, 12:03:50 PM
to gpuo...@googlegroups.com
A new tool from NVIDIA Research may be of interest to the same audience as Ocelot:

NVIDIA Research has released SASSI, a compiler-based instrumentation tool that enables users to collect user-specified, fine-grained statistics (using custom handler functions written in CUDA) for GPU kernels -- running at near hardware speeds.

At MICRO-48 this year, we will be offering an interactive tutorial on using this new tool to build custom instrumentation libraries for GPU programs.

https://github.com/NVlabs/SASSI/wiki/MICRO48-Tutorial

More information:



Regards,

Daniel Johnson and Mark Stephenson
NVIDIA Research

Fanny Nina Paravecino

Dec 2, 2015, 9:36:07 AM
to gpuocelot
Hi Dan,

I have been working with SASSI for the last few weeks and have run into an issue with my application. I am trying to compile a .cu file and link it with a .cpp file. To do that, I first compile with the -dc flag, then use -dlink to create the device-link object, and finally link everything with the .cpp file to create the executable. The executable builds, but when I run it I get an error on a memory copy. I am injecting memory analysis using the default SASSI handlers.

You can find my implementation on GitHub: https://github.com/fninaparavecino/mcx.git. The build commands were:

/usr/local/sassi7/bin/nvcc -c -gencode arch=compute_35,code=sm_35 -Xptxas --sassi-inst-before="memory" -Xptxas --sassi-before-args="mem-info" -g -O3 -dc -o mcx_core.o  mcx_core.cu
/usr/local/sassi7/bin/nvcc -I~/Desktop/SASSI/example/inc -c -gencode arch=compute_35,code=sm_35 -g -O3 -dlink -o mcx_core_dlink.o  mcx_core.o
/usr/local/sassi7/bin/nvcc -I/usr/local/sassi7/include -g -O3  -c -o mcx_utils.o  mcx_utils.c
/usr/local/sassi7/bin/nvcc -I/usr/local/sassi7/include -g -O3  -c -o mcx_shapes.o  mcx_shapes.c
/usr/local/sassi7/bin/nvcc -I/usr/local/sassi7/include -g -O3  -c -o tictoc.o  tictoc.c
/usr/local/sassi7/bin/nvcc -I/usr/local/sassi7/include -g -O3  -c -o mcextreme.o  mcextreme.c
/usr/local/sassi7/bin/nvcc -I/usr/local/sassi7/include -g -O3  -c -o cjson/cJSON.o  cjson/cJSON.c
/usr/local/sassi7/bin/nvcc mcx_core.o mcx_core_dlink.o mcx_utils.o mcx_shapes.o tictoc.o mcextreme.o cjson/cJSON.o -o ../bin/mcx -L/usr/local/sassi7/lib64 -lcudart -lm -lstdc++ -m64 -L~/Desktop/SASSI/instlibs/lib -L/usr/local/sassi7/extras/CUPTI/lib64 -lcupti -lcudadevrt

When I run the executable, it segfaults (signal 11).

Running it under valgrind gives the following output:

...
lauching MCX simulation for time window [0.00e+00ns 5.00e+00ns] ...
simulation run# 1 ...     ==17665== Invalid write of size 8
==17665==    at 0x413A98: ???
==17665==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==17665==
==17665==
==17665== Process terminating with default action of signal 11 (SIGSEGV)
==17665==  Access not within mapped region at address 0x0
==17665==    at 0x413A98: ???
==17665==  If you believe this happened as a result of a stack
==17665==  overflow in your program's main thread (unlikely but
==17665==  possible), you can try to increase the size of the
==17665==  main thread stack using the --main-stacksize= flag.
==17665==  The main thread stack size used in this run was 8388608.
==17665==
==17665== HEAP SUMMARY:
==17665==     in use at exit: 77,616,868 bytes in 273,680 blocks
==17665==   total heap usage: 291,353 allocs, 17,673 frees, 80,843,756 bytes allocated

The error occurs at this line:
     cudaMemcpyToSymbol(gproperty, cfg->prop,  cfg->medianum*sizeof(Medium), 0, cudaMemcpyHostToDevice);
     mcx_cu_assess(cudaGetLastError(),__FILE__,__LINE__); // -> triggers a cuda error
     cudaMemcpyToSymbol(gdetpos, cfg->detpos,  cfg->detnum*sizeof(float4), 0, cudaMemcpyHostToDevice);

cudaGetLastError() reported the following CUDA error:

     MCX ERROR(-13):invalid device symbol in unit mcx_core.cu:1172
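
For anyone trying to narrow this down: below is a minimal sketch, not taken from the MCX sources, of checking the status that cudaMemcpyToSymbol itself returns, so that the failing copy is identified directly instead of being inferred from a later cudaGetLastError() call. The names gproperty_demo and check are hypothetical stand-ins introduced only for illustration; the real symbol (gproperty) and its element type (Medium) are defined in mcx_core.cu.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a __constant__ array such as gproperty in
// mcx_core.cu; the real declaration and element type are assumptions here.
__constant__ float4 gproperty_demo[4];

// Hypothetical helper: print the status returned by a runtime call and
// report whether it succeeded.
static bool check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    float4 host[4] = {};  // zero-initialized host data to copy

    // cudaMemcpyToSymbol returns its own cudaError_t, so each copy can be
    // checked individually and by name.
    bool ok = check(cudaMemcpyToSymbol(gproperty_demo, host, sizeof(host), 0,
                                       cudaMemcpyHostToDevice),
                    "cudaMemcpyToSymbol(gproperty_demo)");
    return ok ? 0 : 1;
}
```

Building this small file once with the plain nvcc flags and once with the same -dc / -dlink and SASSI flags as above may help show whether the "invalid device symbol" error depends on the instrumentation or on the separate-compilation steps.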

Do you have any idea why SASSI is not able to inject the memory analysis properly for this kernel? It seems that adding the SASSI memory handler causes the executable to misbehave.

Any advice would be highly appreciated.

Fanny.
