Separable Compile with Emulation (CUDA 6.5 + Ubuntu 14.04 via VirtualBox)

Gabriel Hare

Feb 4, 2016, 11:38:50 PM
to gpuocelot
I'm trying to use Ocelot when performing a separable compilation, as described here:

I can compile successfully with Ocelot included in the build, and running the result does something different from the usual (non-crashing) failure that I encounter.

Unfortunately, the "something different" is a crash:

LociMain: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

Aborted (core dumped)


For comparison here is the "usual failure", running in Ubuntu 14.04 via VirtualBox:

Loci-build$ ./LociMain 

Constructor particleFilter

ERROR: cudaMalloc

ERROR: cudaMemcpyHostToDevice

AutoFilter particleFilter start

ERROR: advanceParticles

ERROR: cudaMemcpyDeviceToHost

Average movement of 1000000 particles is |(0.000000, 0.000000, 0.000000)| = 0.000000


The commands used to build and run LociMain are as follows:

$ cd ImageAnalysis

$ nvcc ../../Loci/ImageAnalysis/v3.cpp -x=cu -dc -o v3.cpp.o -ccbin /usr/bin/c++ -m64 -Xcompiler ,\"-fvisibility=hidden\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" -arch=compute_20 -code=sm_20 -code=compute_20 -DNVCC -I/usr/local/cuda-6.5/include -I/usr/local/cuda-6.5/include

$ nvcc ../../Loci/ImageAnalysis/particle.cpp -x=cu -dc -o particle.cpp.o -ccbin /usr/bin/c++ -m64 -Xcompiler ,\"-fvisibility=hidden\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" -arch=compute_20 -code=sm_20 -code=compute_20 -DNVCC -I/usr/local/cuda-6.5/include -I/usr/local/cuda-6.5/include

$ nvcc ../../Loci/ImageAnalysis/particleFilter.cpp -x=cu -dc -o particleFilter.cpp.o -ccbin /usr/bin/c++ -m64 -Xcompiler ,\"-fvisibility=hidden\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" -arch=compute_20 -code=sm_20 -code=compute_20 -DNVCC -I/usr/local/cuda-6.5/include -I/usr/local/cuda-6.5/include

$ nvcc -arch=compute_20 -code=sm_20 -code=compute_20 -m64 -ccbin "/usr/bin/c++" -dlink v3.cpp.o particle.cpp.o particleFilter.cpp.o -o intermediate_link.o -Xcompiler -fPIC

$ ar qc libImageAnalysis.a  particleFilter.cpp.o particle.cpp.o v3.cpp.o intermediate_link.o

$ ranlib libImageAnalysis.a

$ cd ../

$ c++ -I/usr/local/cuda-6.5/include -I/home/ghare/Documents/Loci/ImageAnalysis -isystem /home/ghare/Documents/autowiring/contrib/autoboost -isystem /home/ghare/Documents/autowiring  -std=c++11 -fvisibility=hidden  -fPIC -O3 -DNDEBUG   -o Main.cpp.o -c ../Loci/Main.cpp

$ c++ -std=c++11 -fvisibility=hidden  -fPIC -O3 -DNDEBUG   Main.cpp.o  -o LociMain /home/ghare/Documents/autowiring-build/lib/libAutowiring.a ImageAnalysis/libImageAnalysis.a /usr/local/cuda-6.5/lib64/libcudart_static.a -lpthread /usr/l/checkout/gpuocelot/ocelot/build_local/lib/libocelot.so /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so

$ ./LociMain


As you can see above, the only place where Ocelot is included is during the final linking of the executable.
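Since the final link line names both libcudart_static.a and libocelot.so, it may be worth checking which of the two actually supplies the CUDA runtime entry points in the resulting executable. The following is only a minimal sketch of such a check, using the standard `ldd` and `nm` tools; the helper name `cuda_runtime_origin` is hypothetical, and `cudaMalloc` is used merely as a representative runtime symbol.

```shell
#!/bin/sh
# Hypothetical helper: report how a binary obtains the CUDA runtime API.
cuda_runtime_origin() {
    bin="$1"
    # Shared libraries the dynamic loader would pull in for this binary.
    if ldd "$bin" 2>/dev/null | grep -q 'libocelot'; then
        echo "dynamic: libocelot"
    fi
    if ldd "$bin" 2>/dev/null | grep -q 'libcudart'; then
        echo "dynamic: libcudart"
    fi
    # A defined text symbol (T) named cudaMalloc suggests the runtime
    # was linked into the binary statically.
    if nm "$bin" 2>/dev/null | grep -q ' T cudaMalloc$'; then
        echo "static: cudaMalloc defined in binary"
    else
        echo "static: cudaMalloc not defined in binary"
    fi
}
```

It would be invoked as `cuda_runtime_origin ./LociMain`. If `cudaMalloc` turns out to be defined inside the executable, calls would presumably resolve to the statically linked runtime rather than to Ocelot, though that is an assumption to verify rather than a confirmed diagnosis.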

Incidentally, the "autowiring" project mentioned above is open source, but requires C++11:

https://github.com/leapmotion/autowiring


Finally, the configure.ocelot file is identical to the one defined in several online tutorials, such as this one:

https://www.udacity.com/wiki/cs344/llvm-ocelot-dev


Suggestions, questions, and guidance on debugging would all be appreciated!


-Gabriel



P.S. Although the "Loci" project is private, I can share the code compiled above if it would help. It is simply a modification of the separable compilation tutorial code, changed so that it is called from an "AutoFilter" method defined in the autowiring project.
