ocelot/ocelot/ir/implementation/Module.cpp:238: void ir::Module::loadNow(): Assertion `_ptxPointer != 0' failed.

Leiming Yu

25.04.2016, 08:19:09
to gpuocelot
I am running mcx using gpuocelot inside a docker image (https://hub.docker.com/r/leimingy/ubuntu1204_gpuoce) and got the following error.

GPU=1 (Ocelot PTX Emulator) threadph=9765 extra=640 np=10000000 nthread=1024 maxgate=1 repetition=1
initializing streams ... mcx: ocelot/ocelot/ir/implementation/Module.cpp:238: void ir::Module::loadNow(): Assertion `_ptxPointer != 0' failed.
Aborted (core dumped)

Not sure how to fix it. Any suggestions?

Jin Wang

25.04.2016, 09:46:09
to gpuocelot
Hi Leiming,

Which CUDA version were you using? Ocelot is supposed to work with CUDA versions up to 5.0. The PTX format changed after 5.0 and might not be recognized by Ocelot.
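
For example, you can check the toolkit version with:

$ nvcc --version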

Otherwise, please post your gdb traceback so we can help you better.
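
A typical way to capture it (assuming mcx is the binary and it was built with -g) would be:

$ gdb --args ./mcx <your usual arguments>
(gdb) run
(gdb) bt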

Jin

--
You received this message because you are subscribed to the Google Groups "gpuocelot" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gpuocelot+...@googlegroups.com.
To post to this group, send email to gpuo...@googlegroups.com.
Visit this group at https://groups.google.com/group/gpuocelot.
For more options, visit https://groups.google.com/d/optout.

Leiming Yu

28.04.2016, 14:08:30
to gpuocelot
I am using CUDA 4.2.

The docker image is here. 

Jin Wang

28.04.2016, 14:25:19
to gpuocelot

Hi Leiming,

Before I dive into your Docker image, please check whether the following applies to you:

In the Makefile, nvcc takes the options "-gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_20". The "code=sm_xx" part generates device binary code for the real sm_xx architecture and discards the PTX assembly. Therefore, Ocelot is not able to extract the PTX from the output executable.

If possible, change it to "-gencode arch=compute_10,code=compute_10 -gencode arch=compute_20,code=compute_20"; the executable will then keep the PTX assembly for the virtual compute_xx architecture. This will probably fix the problem. Let me know whether it works.
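
For example, the two variants of the compile line differ only in the -gencode option (other flags omitted):

# real sm_20 binary code only; the PTX is discarded, so Ocelot finds nothing:
nvcc -gencode arch=compute_20,code=sm_20 -c mcx_core.cu -o mcx_core.o
# PTX for the virtual compute_20 architecture is kept, so Ocelot can extract it:
nvcc -gencode arch=compute_20,code=compute_20 -c mcx_core.cu -o mcx_core.o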

Otherwise please send me your executable and compilation script such as Makefile.

Regards,

Jin

Leiming Yu

02.05.2016, 14:05:28
to gpuocelot
Jin,

    Thanks for helping me out.
    I compiled the mcx program by specifying sm_20. Below are the compilation messages.

root@d72d012ace48:/home/test_gpuocelot/mcx/src# make fermi BACKEND=ocelot
nvcc -c -g  -Xcompiler -fopenmp -m64 -DUSE_ATOMIC -use_fast_math -DSAVE_DETECTORS -DUSE_CACHEBOX -use_fast_math -gencode arch=compute_20,code=sm_20 -DMCX_TARGET_NAME='"Fermi MCX"' -o mcx_core.o  mcx_core.cu
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99  -m64 -fopenmp -c -o mcx_utils.o  mcx_utils.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99  -m64 -fopenmp -c -o mcx_shapes.o  mcx_shapes.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99  -m64 -fopenmp -c -o tictoc.o  tictoc.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99  -m64 -fopenmp -c -o mcextreme.o  mcextreme.c
cc -I/usr/local/cuda/include -g -Wall -O3 -std=c99  -m64 -fopenmp -c -o cjson/cJSON.o  cjson/cJSON.c
cc mcx_core.o mcx_utils.o mcx_shapes.o tictoc.o mcextreme.o cjson/cJSON.o -o ../bin/mcx -L/usr/local/lib `OcelotConfig -l` -ltinfo  -fopenmp


      Here are some steps to download and run the mcx test.
      [1] download the docker image
      $docker pull leimingy/ubuntu1204_gpuocelot:init

      [2] run
      $docker run -it  leimingy/ubuntu1204_gpuocelot:init

      [3] find the mcx and compile the project
       $source /etc/profile
       $cd /home/test_gpuocelot/mcx/src
       $make fermi BACKEND=ocelot
       $cd ../example/quicktest/
       $./run_qtest.sh     (I hit the error at this stage)

     Let me know if you have any questions.

Best,
Leiming

Jin Wang

02.05.2016, 14:14:32
to gpuocelot
Hi Leiming,

As I explained in my previous email, the solution to your problem would be to replace "-gencode arch=compute_20,code=sm_20" with "-gencode arch=compute_20,code=compute_20", or simply "-arch=sm_20" in your Makefile that invokes the nvcc compiler.
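
One quick way to verify that the PTX is then embedded in the binary (assuming cuobjdump from the CUDA toolkit is on your PATH):

$ cuobjdump -ptx ../bin/mcx

If the PTX was kept, this should print the embedded PTX assembly; if not, no PTX will be listed.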

Jin

Leiming Yu

02.05.2016, 21:05:13
to gpuocelot
Hello Jin,

     Your suggestion works!
     Here are the follow-up issues:
     [1] Using the emulated option for devices in configure.ocelot:
     ./run_qtest.sh hangs, even with -n 1 (simulating 1 photon).
     [2] Using the llvm option for devices in configure.ocelot:
     I got the following error when simulating 1 photon migration:
GPU=1 (Ocelot Multicore CPU Backend (LLVM-JIT)) threadph=0 extra=1 np=1 nthread=8192 maxgate=1 repetition=1
initializing streams ... init complete : 146 ms
requesting 5120 bytes of shared memory
lauching MCX simulation for time window [0.00e+00ns 5.00e+00ns] ...
simulation run# 1 ...  (2.966550) PTXToLLVMTranslator.cpp:1023: Assertion message: PTXOperand datatype pred not supported for immediate operand.
mcx: ocelot/ocelot/translator/implementation/PTXToLLVMTranslator.cpp:1023: ir::LLVMInstruction::Operand translator::PTXToLLVMTranslator::_translate(const ir::PTXOperand&): Assertion `false' failed.
Aborted (core dumped)


     Any suggestions?

Best,
Leiming

Jin Wang

03.05.2016, 10:58:06
to gpuocelot
Hi Leiming,

It looks like the syntax of an instruction is not recognized by PTXToLLVMTranslator (predicate value used as an immediate). You may want to pinpoint the instruction that causes the error and modify the LLVM backend.
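
As a rough starting point (just a sketch, assuming cuobjdump from the CUDA toolkit is available), you could dump the embedded PTX and inspect the instructions that use predicate operands:

$ cuobjdump -ptx ../bin/mcx > mcx_core.ptx
$ grep -n "\.pred" mcx_core.ptx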

For the emulator backend, try debugging with the Ocelot logging feature and see if you can locate the reason your program hangs: https://github.com/gtcasl/gpuocelot/wiki/Debugging

jin

Qianqian Fang

02.08.2016, 14:33:47
to gpuocelot
On Tuesday, May 3, 2016 at 10:58:06 AM UTC-4, Jin Wang wrote:
Hi Leiming,

It looks like the syntax of an instruction is not recognized by PTXToLLVMTranslator (predicate value used as an immediate).


hi Jin

Leiming and I are working on a benchmark comparing the GPU Ocelot backend with OpenCL, using MCX/MCXCL.

Over 5 years ago, I was indeed able to compile MCX with GPUOcelot and run simulations on NVIDIA GPUs, AMD GPUs and Intel CPUs. The MCX v0.5 source code was also included as part of the test suite in the Ocelot package (https://github.com/gtcasl/gpuocelot/tree/master/tests/unstructured/mcx/mcxlab). Our hope is to resurrect this feature using the latest MCX code and Ocelot.

After some online troubleshooting, I was able to successfully compile libocelot.so using gcc 4.8 on an Ubuntu 14.04 box. I was then able to reproduce the two issues Leiming reported earlier:

1. infinite loop when running on the nvidia backend, and
2. the "PTXToLLVMTranslator.cpp" error when running on the llvm CPU backend.

For the first issue, I managed to trace it down to a few updates over the past years that caused the hanging (it does not happen when using CUDA). I committed the following simple fix,


and now the latest mcx code runs well using the nvidia backend with Ocelot (it only works on Fermi and Kepler; on Maxwell, it dumped "cuModuleLoadDataEx() - returned 300. Failed to JIT module - mcx_core.cu using NVIDIA JIT with error").

The next step we would like to take is to get the code running on the CPU and AMD GPU. However, for the llvm backend, I am getting the same error as Leiming:

GPU=6 (Ocelot Multicore CPU Backend (LLVM-JIT)) threadph=1220 extra=5760 np=10000000 nthread=8192 maxgate=1 repetition=1
initializing streams ... init complete : 253 ms
requesting 5120 bytes of shared memory
lauching MCX simulation for time window [0.00e+00ns 5.00e+00ns] ...
simulation run# 1 ... (8.467481) PTXToLLVMTranslator.cpp:1023: Assertion message: PTXOperand datatype pred not supported for immediate operand.
mcx: ocelot/translator/implementation/PTXToLLVMTranslator.cpp:1023: ir::LLVMInstruction::Operand translator::PTXToLLVMTranslator::_translate(const ir::PTXOperand&): Assertion `false' failed.
Aborted (core dumped)

I tried both LLVM 3.4 and LLVM 3.6; the error is the same.

 
You may want to pinpoint the instruction that causes the error and modify the LLVM backend.


I am curious whether you can elaborate on this more. Is there any tool or debugging option we can use to pinpoint the "bad" instruction?

This error should be quite easy to reproduce. If you run Ubuntu 12.04 or 14.04, you can simply run:

git clone https://github.com/fangq/mcx.git
cd mcx
git checkout ocelot
cd src
make fermi BACKEND=ocelot AR=g++

then you can list all supported devices by running the binary produced under the mcx/bin folder:

mcx -L


On my box, the first 4 are NVIDIA GPUs (980Ti, 590 core 1 and core 2, and 730), the 5th is the Ocelot emulator, and the 6th is the LLVM multi-core CPU backend.

My configure.ocelot has the following line in the "executive" section:

devices: [ nvidia, llvm, emulated, amd ],



Once you know the index of the llvm backend, you can run the test script:

cd mcx/example/quicktest
../../bin/mcx -A -g 10 -n 1e7 -f qtest.inp -s qtest -r 1 -a 0 -b 0 -G #



where # is replaced by the device id of the llvm backend (on my system, it is 6).


For emulator backend, try to debug using the ocelot logging feature and see if you can locate the reason why your program hangs. https://github.com/gtcasl/gpuocelot/wiki/Debugging

I also tested with the emulator backend, and I got the following error:

GPU=5 (Ocelot PTX Emulator) threadph=0 extra=100 np=100 nthread=1024 maxgate=1 repetition=1
initializing streams ... init complete : 254 ms
requesting 5120 bytes of shared memory
lauching MCX simulation for time window [0.00e+00ns 5.00e+00ns] ...
simulation run# 1 ...
terminate called after throwing an instance of 'executive::RuntimeException'
  what():  barrier deadlock:
context at: [PC: 320] mcx_core.cu:493:1 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
context at: [PC: 429] mcx_core.cu:551:1 00000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
context at: [PC: 318] mcx_core.cu:813:2 11111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Aborted (core dumped)

I am not exactly sure what this error means. Since the kernel does run with the nvidia backend, I am wondering if there are simple fixes that I can apply to make the code compatible with the latest Ocelot.

Any input would be highly appreciated.

Qianqian

Jin Wang

06.09.2016, 20:28:56
to gpuocelot
Hi Qianqian,

Ocelot uses the REPORT_BASE macro to turn logging messages on and off: https://github.com/gtcasl/gpuocelot/wiki/Debugging

In PTXToLLVMTranslator.cpp, you may want to change the value of "REPORT_BASE" to 1 and rebuild Ocelot. Ocelot should then print out information about the PTX instructions as it translates them to LLVM IR.
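
For example, assuming the file still has the usual "#define REPORT_BASE 0" near the top (the exact line and the build invocation may differ in your tree), the change and rebuild would be roughly:

$ sed -i 's/#define REPORT_BASE 0/#define REPORT_BASE 1/' ocelot/ocelot/translator/implementation/PTXToLLVMTranslator.cpp
$ python build.py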

You may also want to build Ocelot in debug mode (build.py -d) and use gdb to see where your program hangs.

jin