Can someone introduce general debugging techniques on the AC cluster?

Xiaowei Zhan

Aug 5, 2010, 4:42:41 PM

to vscse-many-core...@googlegroups.com

Hello,

I have tried using "fprintf()" and "assert()", but neither one works.
So I am looking for advice on how people in this course debug their code.

Xiaowei

Xiao-Long Wu

Aug 5, 2010, 5:12:55 PM

to vscse-many-core...@googlegroups.com

Hi Xiaowei,

As I said previously, you can split your kernel into stages and copy the data back to the host after each stage for checking. Once you have narrowed down where the results go wrong, it's usually easy to find the bug. This is how I usually debug my CUDA code.

Using printf() or fprintf() is probably not practical for big thread blocks because you'll just see hundreds or thousands of messages. cuda-gdb or gdb can be helpful, but to be honest I only use them to check whether memory accesses are legal. Mostly, I use a pencil and a piece of paper to sketch out the idea.
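One way to keep the message count down, on hardware and toolkits that support printf() in device code (compute capability 2.0 or later, which may not match the cluster discussed here), is to let a single thread print. The following is a minimal sketch under that assumption; the kernel `add_one_verbose` and the driver `run_add_one_verbose` are hypothetical names, not part of any lab code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to every element and reports one value.
__global__ void add_one_verbose(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    data[i] += 1.0f;

    // Only one thread in the whole grid prints, so the output stays readable.
    if (blockIdx.x == 0 && threadIdx.x == 0)
        printf("block 0, thread 0: data[0] = %f\n", data[i]);
}

// Hypothetical host driver: returns 0 on success, 1 on a CUDA error.
int run_add_one_verbose(float *h_data, int n)
{
    float *d_data = NULL;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&d_data, bytes) != cudaSuccess) return 1;
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    add_one_verbose<<<blocks, threads>>>(d_data, n);

    // Device-side printf output is flushed at synchronization points.
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

Calling `run_add_one_verbose(h, n)` from main() on a host array prints one line regardless of how many threads are launched. Changing the guard to a different block or thread index lets you inspect another part of the grid.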

Hope this helps.

Xiao-Long

Brad Hittle

Aug 5, 2010, 7:18:08 PM

to vscse-many-core...@googlegroups.com

One way I debug my problems is to set up temporary buffers and write to them on a per-thread basis. For instance, for an image processing algorithm I'd use a texture of the same size as the input/output to write out temporary values, and compare these against some sort of gold standard, such as a CPU implementation.
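A rough sketch of the gold-standard comparison, assuming a plain global-memory buffer instead of a texture and a trivial squaring operation standing in for the real algorithm. The names `square_kernel`, `square_gold`, and `compare_with_gold` are made up for illustration.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical per-element operation: square each value.
__global__ void square_kernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

// CPU "gold" implementation of the same operation.
void square_gold(const float *in, float *out, int n)
{
    for (int i = 0; i < n; ++i) out[i] = in[i] * in[i];
}

// Runs both versions and returns the number of elements that differ by more
// than tol (printing the first one), or -1 if the device-to-host copy fails.
int compare_with_gold(const float *h_in, int n, float tol)
{
    size_t bytes = n * sizeof(float);
    float *h_gpu = (float *)malloc(bytes);
    float *h_gold = (float *)malloc(bytes);
    float *d_in = NULL, *d_out = NULL;
    cudaMalloc((void **)&d_in, bytes);
    cudaMalloc((void **)&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    int threads = 128;
    square_kernel<<<(n + threads - 1) / threads, threads>>>(d_in, d_out, n);
    cudaError_t err = cudaMemcpy(h_gpu, d_out, bytes, cudaMemcpyDeviceToHost);

    square_gold(h_in, h_gold, n);

    int mismatches = 0;
    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) {
            if (fabsf(h_gpu[i] - h_gold[i]) > tol) {
                if (mismatches == 0)
                    printf("first mismatch at %d: gpu=%f gold=%f\n",
                           i, h_gpu[i], h_gold[i]);
                ++mismatches;
            }
        }
    }

    cudaFree(d_in);
    cudaFree(d_out);
    free(h_gpu);
    free(h_gold);
    return err == cudaSuccess ? mismatches : -1;
}
```

The index of the first mismatch usually points at the block or boundary case that is wrong. The tolerance is there because GPU and CPU floating-point results can differ slightly for longer computations.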

-brad

Junjie Wu

Aug 5, 2010, 11:24:20 PM

to VSCSE Many-core Processors 2010

If you want to use cuda-gdb, you need to compile the code with debug
flags. I know how to do this for the official SDK (google it), but I
don't know how to do it with Parboil.

Once you have a debug build, you can use cuda-gdb. I bet it won't be
as useful as you expect, though. I used qsub -V -I to request a node
for debugging during the intro-to-CUDA course, but I just read a thread
saying that we shouldn't debug on the cluster because cuda-gdb will
occupy the GPU exclusively. I haven't tried debugging on my own
computer yet. I only have one GPU, and I don't want to kill my X
server.

- Junjie

Keith Callenberg

Aug 6, 2010, 12:03:21 PM

to VSCSE Many-core Processors 2010

On Aug 5, 5:12 pm, Xiao-Long Wu <xiaol...@illinois.edu> wrote:
> Like I said previously, you can cut your kernel into stages and copy data back to host side for checking. When you locate the possible places, it's usually easy to find the bug. This is the usual way I debug my CUDA code.

I guess this is probably very basic to implement, but could someone
show an example of debugging like this? Maybe with the Lab 1 stencil
code? The way I am imagining this seems like it would be a hassle.

Thanks,

Keith

Kai Song

Aug 6, 2010, 12:29:15 PM

to vscse-many-core...@googlegroups.com

The way I debugged Lab 1 was to allocate a block-sized chunk of memory on the device and pass that pointer to the kernel. So as not to affect the original code, I copied the kernel and changed its name: "__global__ void block2D_opt_3_debug(......, float *debug_buffer)". Then I can let each thread in any specific block save the frame into debug_buffer. Once the kernel returns, I use cudaMemcpy() to transfer debug_buffer to a local (host) buffer and print it out on the host.
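A minimal sketch of this debug-buffer pattern. It does not use the actual Lab 1 stencil kernel; the kernel `scale_debug`, the driver `run_scale_debug`, the `debug_block` parameter, and the sizes N and THREADS are all hypothetical stand-ins chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 256        // total elements (assumed for this sketch)
#define THREADS 64   // threads per block, also the debug buffer size

// Debug copy of a hypothetical kernel. Threads in block `debug_block`
// store their intermediate value into debug_buffer[threadIdx.x].
__global__ void scale_debug(const float *in, float *out, int n,
                            float *debug_buffer, int debug_block)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float intermediate = in[i] * 2.0f;   // the value we want to inspect
    if (blockIdx.x == debug_block)
        debug_buffer[threadIdx.x] = intermediate;

    out[i] = intermediate + 1.0f;
}

// Launches the debug kernel, copies the debug buffer back into h_debug
// (which must hold THREADS floats), and prints it. Returns 0 on success.
int run_scale_debug(float *h_debug, int debug_block)
{
    float h_in[N];
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in = NULL, *d_out = NULL, *d_debug = NULL;
    cudaMalloc((void **)&d_in, N * sizeof(float));
    cudaMalloc((void **)&d_out, N * sizeof(float));
    cudaMalloc((void **)&d_debug, THREADS * sizeof(float));
    cudaMemset(d_debug, 0, THREADS * sizeof(float));
    cudaMemcpy(d_in, h_in, N * sizeof(float), cudaMemcpyHostToDevice);

    scale_debug<<<N / THREADS, THREADS>>>(d_in, d_out, N, d_debug, debug_block);

    // This blocking copy waits for the kernel on the default stream to finish.
    cudaError_t err = cudaMemcpy(h_debug, d_debug, THREADS * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    if (err == cudaSuccess)
        for (int t = 0; t < THREADS; ++t)
            printf("block %d, thread %d: %f\n", debug_block, t, h_debug[t]);

    cudaFree(d_in);
    cudaFree(d_out);
    cudaFree(d_debug);
    return err == cudaSuccess ? 0 : 1;
}
```

Because the debug kernel is a separate copy with an extra parameter, the original kernel stays untouched. Changing `debug_block` selects which block's values are captured.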

This might not be the best way, but it's how I get around not having cuda-gdb on AC. I think in the long run, it's good to port the lab to your local cluster or get an interactive node later on to use cuda-gdb.

Kai
