LociMain: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
Aborted (core dumped)
Loci-build$ ./LociMain
Constructor particleFilter
ERROR: cudaMalloc
ERROR: cudaMemcpyHostToDevice
AutoFilter particleFilter start
ERROR: advanceParticles
ERROR: cudaMemcpyDeviceToHost
Average movement of 1000000 particles is |(0.000000, 0.000000, 0.000000)| = 0.000000
The commands used to build and run LociMain are as follows:
$ cd ImageAnalysis
$ nvcc ../../Loci/ImageAnalysis/v3.cpp -x=cu -dc -o v3.cpp.o -ccbin /usr/bin/c++ -m64 -Xcompiler ,\"-fvisibility=hidden\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" -arch=compute_20 -code=sm_20 -code=compute_20 -DNVCC -I/usr/local/cuda-6.5/include -I/usr/local/cuda-6.5/include
$ nvcc ../../Loci/ImageAnalysis/particle.cpp -x=cu -dc -o particle.cpp.o -ccbin /usr/bin/c++ -m64 -Xcompiler ,\"-fvisibility=hidden\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" -arch=compute_20 -code=sm_20 -code=compute_20 -DNVCC -I/usr/local/cuda-6.5/include -I/usr/local/cuda-6.5/include
$ nvcc ../../Loci/ImageAnalysis/particleFilter.cpp -x=cu -dc -o particleFilter.cpp.o -ccbin /usr/bin/c++ -m64 -Xcompiler ,\"-fvisibility=hidden\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" -arch=compute_20 -code=sm_20 -code=compute_20 -DNVCC -I/usr/local/cuda-6.5/include -I/usr/local/cuda-6.5/include
$ nvcc -arch=compute_20 -code=sm_20 -code=compute_20 -m64 -ccbin "/usr/bin/c++" -dlink v3.cpp.o particle.cpp.o particleFilter.cpp.o -o intermediate_link.o -Xcompiler -fPIC
$ ar qc libImageAnalysis.a particleFilter.cpp.o particle.cpp.o v3.cpp.o intermediate_link.o
$ ranlib libImageAnalysis.a
$ cd ../
$ c++ -I/usr/local/cuda-6.5/include -I/home/ghare/Documents/Loci/ImageAnalysis -isystem /home/ghare/Documents/autowiring/contrib/autoboost -isystem /home/ghare/Documents/autowiring -std=c++11 -fvisibility=hidden -fPIC -O3 -DNDEBUG -o Main.cpp.o -c ../Loci/Main.cpp
$ c++ -std=c++11 -fvisibility=hidden -fPIC -O3 -DNDEBUG Main.cpp.o -o LociMain /home/ghare/Documents/autowiring-build/lib/libAutowiring.a ImageAnalysis/libImageAnalysis.a /usr/local/cuda-6.5/lib64/libcudart_static.a -lpthread /usr/l/checkout/gpuocelot/ocelot/build_local/lib/libocelot.so /usr/lib/x86_64-linux-gnu/librt.so /usr/lib/x86_64-linux-gnu/libdl.so
$ ./LociMain
As the commands above show, Ocelot is only brought in at the final link step of the executable.
Incidentally, the "autowiring" project mentioned above is open source, but requires C++11:
https://github.com/leapmotion/autowiring
Finally, the configure.ocelot file is identical to the one given in several online tutorials, such as this one:
https://www.udacity.com/wiki/cs344/llvm-ocelot-dev
Suggestions, questions, and guidance on debugging would all be appreciated!
-Gabriel
P.S. Although the "Loci" project is private, I can share the code compiled above if it would help. It is simply a modification of the separable compilation tutorial code to call it from an "AutoFilter" method defined in the autowiring project.