Please help to debug wrong behavior on Kepler

Dmitry N. Mikushin

Aug 3, 2012, 7:44:49 PM
to asf...@googlegroups.com
Dear colleagues,

Unfortunately I was too quick to get a 680M: there is currently no
debuggable developer driver for it, probably until September. But I
need to get our dynamic code loader (joint work with Yunqing) running
on Kepler. The kernel loads and runs successfully the first time, but
when I try to run it again with an instruction modified, it fails with
error 700. This status can mean a wide range of issues, and it's hard
to guess what the problem is. A debugger would help, but until September
my 680M is useless for that.

Is it correct that the desktop 680 is already debuggable? If so,
and you have one at hand, I'd kindly ask you to run the following
test and post the results here:

$ svn co http://asfermi.googlecode.com/svn/branches/libasfermi libasfermi
$ cd libasfermi/tests/dyloader/
$ make test1
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.:../.. ./test1 100

The test should fail. Please relaunch it under cuda-gdb (this only works
with the developer driver):

$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.:../.. cuda-gdb ./test1
(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) r 100
(cuda-gdb) c
(cuda-gdb) disass

That is: break on every user kernel launch, run, continue past the
first break, and disassemble at the point of the crash. It would be
important to know:
1) What the top part of the kernel looks like
2) At what address the program crashed

Thanks,
- Dima.

Hou Yunqing

Aug 4, 2012, 12:48:25 AM
to asf...@googlegroups.com
Hi D.,

Are you using CUDA toolkit 5 and the associated driver? I'm under the impression that Kepler can already be debugged with those.

As for the 700 failure, have you confirmed that the instruction address is simply LEPC+0x100000000? Also, Kepler may be using a stricter memory access policy now, so the old method may not work...

Yunqing


Dmitry N. Mikushin

Aug 4, 2012, 9:03:27 AM
to asf...@googlegroups.com
Hi Yunqing,

> Are you using CUDA toolkit 5 and associated driver? I'm under the impression that Kepler already can be debugged with those.

It depends on the model. The 680 should be debuggable, while the 680M
is not, because support for it was added after the last release of the
developer driver.

> As for the 700 failure, have you confirmed that instruction address is simply LEPC+0x100000000? Also Kepler may be using a stricter memory access policy now so the old method may not work...

That is exactly what I could confirm with the debugger I'm trying to get access to.

- D.

2012/8/4 Hou Yunqing <hyq.n...@gmail.com>:

Dmitry N. Mikushin

Aug 5, 2012, 5:46:52 PM
to asf...@googlegroups.com
Hi again,

I fixed the problem: as of r760, dyloader should be compatible with
Kepler. Changes:

1) I had forgotten to change the constant bank index from 0x2 to 0x3
for the constant data used in the launch stubs that emulate different
register counts
2) I added a gap of free space between the loader's initial code
and the starting point of the dynamically written code

This is probably still not enough; so far we have only tested very simple cases.

Thanks,
- D.

2012/8/4 Dmitry N. Mikushin <maem...@gmail.com>:

Hou Yunqing

Aug 6, 2012, 9:19:27 AM
to asf...@googlegroups.com
Hi D.,

Good to see that! Actually, I was thinking of helping when I saw the
first email, but my laptop has been dead for some time and I wasn't able
to get a working computer. Anyway, at least we now know that the old
approach should still work.

Cheers,
Yunqing