Hi Leiming,
Before I dive into your Docker image, please check whether the following applies to you:
In the Makefile, nvcc is given the options "-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20". The "code=sm_xx" part generates device binary code for the real sm_xx GPU architecture and discards the PTX assembly, so Ocelot cannot extract the PTX from the output executable.
If possible, change the options to "-gencode arch=compute_10,code=compute_10 -gencode arch=compute_20,code=compute_20", so that the executable keeps the PTX assembly for the virtual compute_xx architecture. This will probably fix the problem. Let me know whether it works.
Otherwise, please send me your executable and your compilation script, such as the Makefile.
Regards,
Jin
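The flag change suggested above can be sketched as a small shell helper. This is only an illustration: the function name to_ptx_gencode is made up here, and the flag string is the one quoted from the Makefile in this thread, so adapt it to however your Makefile actually stores the options.

```shell
#!/bin/sh
# Hypothetical helper (not part of MCX or Ocelot): rewrite every
# "code=sm_NN" target into "code=compute_NN" so that nvcc embeds PTX for
# the virtual architecture instead of only real-architecture binary code.
to_ptx_gencode() {
    printf '%s\n' "$1" | sed -E 's/code=sm_([0-9]+)/code=compute_\1/g'
}

# Flags as quoted from the Makefile discussed in this thread.
FLAGS='-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20'

# Print the rewritten flags; paste them back into the Makefile by hand.
to_ptx_gencode "$FLAGS"
```

After rebuilding, running "cuobjdump -ptx" on the resulting executable is one way to check whether PTX was actually embedded, assuming cuobjdump from the CUDA toolkit is on your PATH.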
root@d72d012ace48:/home/test_gpuocelot/mcx/src# make fermi BACKEND=ocelot
nvcc -c -g -Xcompiler -fopenmp -m64 -DUSE_ATOMIC -use_fast_math -DSAVE_DETECTORS -DUSE_CACHEBOX -use_fast_math -gencode arch=compute_20,code=sm_20 -DMCX_TARGET_NAME='"Fermi MCX"' -o mcx_core.o mcx_core.cu
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99 -m64 -fopenmp -c -o mcx_utils.o mcx_utils.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99 -m64 -fopenmp -c -o mcx_shapes.o mcx_shapes.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99 -m64 -fopenmp -c -o tictoc.o tictoc.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99 -m64 -fopenmp -c -o mcextreme.o mcextreme.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99 -m64 -fopenmp -c -o cjson/cJSON.o cjson/cJSON.c
cc mcx_core.o mcx_utils.o mcx_shapes.o tictoc.o mcextreme.o cjson/cJSON.o -o ../bin/mcx -L/usr/local/lib `OcelotConfig -l` -ltinfo -fopenmp
GPU=1 (Ocelot Multicore CPU Backend (LLVM-JIT)) threadph=0 extra=1 np=1 nthread=8192 maxgate=1 repetition=1
Hi Leiming,
It looks like PTXToLLVMTranslator does not recognize the syntax of one instruction: a predicate value is used as an immediate operand.
GPU=6 (Ocelot Multicore CPU Backend (LLVM-JIT)) threadph=1220 extra=5760 np=10000000 nthread=8192 maxgate=1 repetition=1
initializing streams ... init complete : 253 ms
requesting 5120 bytes of shared memory
lauching MCX simulation for time window [0.00e+00ns 5.00e+00ns] ...
simulation run# 1 ...
(8.467481) PTXToLLVMTranslator.cpp:1023: Assertion message: PTXOperand datatype pred not supported for immediate operand.
mcx: ocelot/translator/implementation/PTXToLLVMTranslator.cpp:1023: ir::LLVMInstruction::Operand translator::PTXToLLVMTranslator::_translate(const ir::PTXOperand&): Assertion `false' failed.
Aborted (core dumped)
You may want to pinpoint the instruction that causes the error and modify the LLVM backend.
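One rough way to start pinpointing the instruction is to dump the kernel's PTX to a text file and search it for predicate-typed instructions that end in a numeric literal. The sketch below is a heuristic only: the function name find_pred_immediates is made up, the three sample PTX lines are hypothetical and not taken from MCX, and the real offending instruction may look different, so treat an empty result as inconclusive.

```shell
#!/bin/sh
# Hypothetical helper: list (with line numbers) PTX instructions typed
# .pred whose last operand is a bare integer literal, e.g. "..., 1;".
# This is a text heuristic, not a PTX parser.
find_pred_immediates() {
    grep -nE '\.pred[[:space:]]+[^;]*,[[:space:]]*-?[0-9]+[[:space:]]*;' "$1"
}

# Self-contained demo on made-up PTX lines. For a real run, replace the
# sample with the PTX of your kernel (obtaining that file is up to you).
sample=$(mktemp)
cat > "$sample" <<'EOF'
setp.eq.s32 %p1, %r1, 0;
and.pred %p3, %p1, %p2;
mov.pred %p4, 1;
EOF

# Only the third sample line combines a .pred type with an integer literal.
find_pred_immediates "$sample"
rm -f "$sample"
```

Any line the search reports is a candidate to compare against the operand handling around PTXToLLVMTranslator.cpp:1023.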
git clone https://github.com/fangq/mcx.git
cd mcx
git checkout ocelot
cd src
make fermi BACKEND=ocelot AR=g++
mcx -L
devices: [ nvidia, llvm, emulated, amd ],
cd mcx/example/quicktest
../../bin/mcx -A -g 10 -n 1e7 -f qtest.inp -s qtest -r 1 -a 0 -b 0 -G #
For the emulator backend, try debugging with the Ocelot logging feature to see whether you can locate the reason your program hangs: https://github.com/gtcasl/gpuocelot/wiki/Debugging
GPU=5 (Ocelot PTX Emulator) threadph=0 extra=100 np=100 nthread=1024 maxgate=1 repetition=1
initializing streams ... init complete : 254 ms
requesting 5120 bytes of shared memory
lauching MCX simulation for time window [0.00e+00ns 5.00e+00ns] ...
simulation run# 1 ...
terminate called after throwing an instance of 'executive::RuntimeException'
what(): barrier deadlock:
context at: [PC: 320] mcx_core.cu:493:1 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
context at: [PC: 429] mcx_core.cu:551:1 00000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
context at: [PC: 318] mcx_core.cu:813:2 11111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Aborted (core dumped)
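When reading a barrier deadlock report like the one above, it can help to count how many threads each context mask covers. The helper below is a small sketch under an assumption that is not confirmed by the thread: that each character of a mask stands for one thread and that '1' marks a thread active in that context. The name count_active and the short 8-character masks are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: count the '1' characters in a thread mask string.
# Assumes one character per thread, '1' = active (an assumption, see above).
count_active() {
    printf '%s' "$1" | tr -cd '1' | wc -c | tr -d '[:space:]'
}

# Made-up 8-thread masks; paste the real masks from the log instead.
for mask in 11111111 00001111 11110000; do
    printf '%s active=%s\n' "$mask" "$(count_active "$mask")"
done
```

If the counts show disjoint groups of threads stopped at different source lines, checking whether those lines sit at barriers reached under divergent control flow is a reasonable next step.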