Minor issue


Steve Worley

Oct 30, 2009, 6:58:01 AM
to gpuocelot
Not even a bug, just a "gotcha".

If you compile a kernel with nvcc --device-emulation and link to the
Ocelot libraries, the resulting program will segfault when run.
Obviously you shouldn't do that (it's nonsensical) but it can happen
when you have makefile issues that accidentally combine the options.
It took me a while to diagnose.
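For anyone else who trips over this: the bad combination looks something like the lines below. (The file names and the Ocelot library name are just placeholders for whatever your own makefile uses, not a recipe.)

  nvcc --device-emulation -c kernel.cu -o kernel.o   # emulation-mode object code...
  g++ kernel.o -locelot -o myapp                     # ...then linked against Ocelot's runtime

Drop --device-emulation whenever you link against Ocelot and the segfault goes away.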

I'll use this thread for other issues I find as I keep testing my
older CUDA programs.
I'm really really pleased with Ocelot as an emulator with its memory
checking, it's just fantastic for finding bugs you never knew you
had.





Gregory Frederick Diamos

Oct 30, 2009, 2:33:40 PM
to gpuo...@googlegroups.com
Steve Worley wrote:
> Not even a bug, just a "gotcha".
>
> If you compile a kernel with nvcc --device-emulation and link to the
> Ocelot libraries, the resulting program will segfault when run.
> Obviously you shouldn't do that (it's nonsensical) but it can happen
> when you have makefile issues that accidentally combine the options.
> It took me a while to diagnose.

I am actually surprised that the program links correctly at all. I believe
that nvcc silently links --device-emulation programs against a different
library than libcudart.so.

I may take a look at what happens when a device-emulated program starts
up; if it is easy to tell that a program is in fact an emulated build, I
may be able to print a more useful error message.
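Something like the sketch below is the shape of the check I have in mind; it is not the actual CudaRuntime code, and the function name and message text are made up:

  #include <cstdlib>
  #include <iostream>

  // Hypothetical early sanity check when a fat binary is registered with the
  // runtime: if an emulation-mode build hands over a bad (null) handle, catch
  // that and print something more helpful than a segfault.
  static void checkFatBinary(const void* binary)
  {
      if (binary == 0)
      {
          std::cerr << "==Ocelot== Registered fat binary is null. Was this "
                    << "program built with --device-emulation? Rebuild without "
                    << "that flag to run under Ocelot." << std::endl;
          std::exit(-1);
      }
  }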

>
> I'll use this thread for other issues I find as I keep testing my
> older CUDA programs.
> I'm really really pleased with Ocelot as an emulator with its memory
> checking, it's just fantastic for finding bugs you never knew you
> had.

Glad to hear that it is useful for people doing CUDA development. :)


Steve Worley

Nov 8, 2009, 5:55:51 PM
to gpuocelot
Ocelot is always a fun challenge to compile.. it's a good workout of
your system configuration!

I'm now building Ocelot on my main machine for the first time. With the
latest SVN (224) I get an interesting error:

In file included from /usr/include/c++/4.3/bits/stl_algo.h:65,
                 from /usr/include/c++/4.3/algorithm:67,
                 from /usr/local/include/boost/random/mersenne_twister.hpp:20,
                 from hydrazine/interface/Test.h:14,
                 from hydrazine/interface/Test.cpp:11:
/usr/include/c++/4.3/cstdlib:132: error: ‘::realloc’ has not been declared

The culprit is #include <cstdlib>. It may have to do with autoconf not
finding a glibc-compatible realloc.. some googling suggests this is a
problem in many projects. I'll dig deeper and try to figure it out, but
I'm posting this here just to log it.. even if it's not an Ocelot-specific
issue, it may come up for others as well.

Interestingly, this did not come up on my build last week, but that was a
different PC running 32-bit Ubuntu with SVN 221 of Ocelot; this is my main
64-bit machine, and it's possible I have different libraries, etc. [It's
awesome how active Ocelot development is!]






Steve Worley

Nov 8, 2009, 7:21:09 PM
to gpuocelot
Following up: I solved the problem.. but I'm unsure how. :-)
I did (several times) do make clean and the full magic invocation of

libtoolize; aclocal; autoconf; automake; ./configure; make

I kept getting the ::realloc error.. until I didn't. Heh..
As I said, a fun challenge to compile. I probably did a make clean in some
other directory that caused a stale file to get wiped.
Anyway, Ocelot is now compiled on this machine as well. :-)

Steve Worley

Nov 9, 2009, 1:48:20 AM
to gpuocelot

Next small report:

On one of my CUDA apps, after successful compile and link, I get an
Ocelot crash. GDB shows the failure:


Starting program: /home/spworley/cap/gpusim
[Thread debugging using libthread_db enabled]
[New Thread 0x7f23b5fed790 (LWP 1275)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f23b5fed790 (LWP 1275)]
cuda::CudaRuntime::registerFatBinary (this=0x11388a8, binary=@0x0)
at ocelot/cuda/implementation/CudaRuntime.cpp:927
927 StringMap::iterator name = _binaryNames.find( binary.ident );


Now the real cause of this is not Ocelot, it's MY mistake.. I had left
--device-emulation on in the nvcc options.
Obviously this isn't correct! Perhaps Ocelot could detect when the user
screws up like this and print a graceful error message and exit instead
of crashing. I've done this before, too...

I often flip between GPU compile, device emulation, and Ocelot builds,
so it's inevitable for me to screw up again.



BTW: Ocelot found ANOTHER memory bug in my code. Thanks again! It was
such a subtle error I didn't believe Ocelot even after studying it for
a while... but that's why the mem checker tool is so darned useful.

Steve Worley

Nov 22, 2009, 9:31:26 PM
to gpuocelot

During emulation, Ocelot does a great job identifying memory issues as
well as syncthreads() deadlocks.

Almost always it's been easy to track these issues down when Ocelot
identifies them, but unfortunately I'm currently working with a 5000-line
(!!!) Frankenstein kernel (using the ugly hack of a block-ID switch to
dispatch among various tasks), so there is a lot of code to search.
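For context, by "block-ID hack" I mean the usual pattern of one giant kernel that dispatches on the block index, roughly like the sketch below (the task bodies are invented for illustration and have nothing to do with my real code):

  __global__ void frankensteinKernel(float* data)
  {
      // One launch covers several unrelated jobs; which job a block performs
      // is decided entirely by its block index.
      const int tid = blockIdx.x * blockDim.x + threadIdx.x;
      switch (blockIdx.y)
      {
          case 0:  data[tid] = 0.0f;  break;   // "task A": clear
          case 1:  data[tid] *= 2.0f; break;   // "task B": scale
          default: data[tid] += 1.0f; break;   // "task C": everything else
      }
  }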

Currently I have a synchronization bug in my code. I get an Ocelot
error like:

==Ocelot== Emulator failed to run kernel
"_Z4walkILi4EEvP11GPUdatabaseP14NetAccumulatorjPKi" with exception:
==Ocelot== [PC 1380] [thread 0] [cta 0] bar.sync 0 - barrier deadlock
Cuda error: Fail in walk kernel launch in file './src/kernel.cu' in line 2387 : Kernel launch failure..


The line number in Ocelot's output is just the source line of the
kernel launch, not of the syncthreads() problem.

Compiling with nvcc --device-emulation also identifies the syncthreads()
issue, but not where.

What's the best strategy to figure out WHICH __syncthreads() in my
kernel is causing the issue?
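To be precise about the class of bug: a barrier deadlock means a __syncthreads() sits inside divergent control flow, so only some threads of the block ever reach it. A made-up minimal example (nothing like my actual kernel):

  __global__ void divergentBarrier(const int* flags, int* out)
  {
      const int tid = threadIdx.x;
      if (flags[tid] > 0)       // threads take different branches...
      {
          __syncthreads();      // ...so only some of them arrive at this barrier
      }                         // and the block deadlocks: exactly what Ocelot's
      out[tid] = tid;           // "bar.sync 0 - barrier deadlock" message flags
  }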

Steve Worley

Nov 23, 2009, 12:42:05 AM
to gpuocelot
Still experimenting with this, I compiled my tool using the nvcc -G
switch. Normally this adds extra per-line debug info to the compiled
code for cuda-gdb, and I was just curious whether Ocelot could use any of
those annotations. If you try to run the generated code with Ocelot, you
get an interesting exception:

terminate called after throwing an instance of
'parser::PTXParser::Exception'
what(): ./src/kernel.cu (79159, 6): syntax error, unexpected
TOKEN_PREDICATE_IDENTIFIER
Aborted


This likely isn't an Ocelot bug, but just an indirect message saying
"don't use that nvcc -G option with Ocelot".


By the way, I'm still trying to figure out how to identify my
troublesome syncthreads().

- In regular compiles on the GPU, I just get a CUDA error (failed
kernel) when I cudaThreadSynchronize().
- In Ocelot, I get the "barrier deadlock" message but no location info.
- With --device-emulation, I get an "incorrect use of __syncthreads()"
message with no location info.
- With cuda-gdb, I am not sure. It's been running for 6 hours and hasn't
reached the "trouble spot" yet. The previous three methods all take less
than 20 seconds to fail.

Steve Worley

Nov 23, 2009, 2:45:43 AM
to gpuocelot
I did solve my syncthreads() problem (it was a subtle one!).
I used the crude method of a billion printf()s in --device-emulation
mode.. by printing the thread ID and "about to enter sync" before every
__syncthreads(), it was easy to take the output and identify where the
stall happened. Inelegant, but it worked.
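For anyone who wants to copy the trick, it boiled down to roughly the sketch below (the macro name and messages are made up; under --device-emulation the kernel runs on the host, so plain printf works):

  #include <cstdio>

  // Print a tag before every barrier; the last tag printed before the hang
  // points at the __syncthreads() that deadlocked.
  #define SYNC_TRACE(tag)                                                        \
      do {                                                                       \
          printf("thread %d about to enter sync %s\n", (int)threadIdx.x, (tag)); \
          __syncthreads();                                                       \
      } while (0)

  __global__ void walkKernel(const int* flags, int* out)
  {
      const int tid = threadIdx.x;
      out[tid] = flags[tid];
      SYNC_TRACE("A");
      out[tid] += flags[0];
      SYNC_TRACE("B");
  }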

Gregory Frederick Diamos

Nov 23, 2009, 3:34:11 AM
to gpuo...@googlegroups.com
It should be fairly straightforward to get Ocelot to print out the line
that generated the error. I'll take a look at it...

Gregory Frederick Diamos

Nov 23, 2009, 4:20:24 AM
to gpuo...@googlegroups.com
Sorry that it didn't land in time to help solve your problem, but I've
added support for printing the line number for most exceptions generated
by the emulator.

It should be in r251.

Greg