running an OpenCL app in Apptainer


Sandeep Sarangi

Sep 12, 2023, 12:51:39 PM
to discuss
Hello,

I'm trying to run the software provided here


using Apptainer. I was able to use the developer's provided Dockerfile (Dockerfile.nvidia) to get the GPU version of the app to work inside a Docker container. So I pushed the Docker image to Docker Hub and tried to create a corresponding Apptainer image by doing

sudo apptainer pull mdt.sif docker://stillill/mdt:latest

However the GPU version of the app does not work in the Apptainer container. When I shell into the Apptainer container by doing

apptainer shell --nv mdt.sif

and run the same tests as in the Docker container I get an error

pyopencl.cffi_cl.RuntimeError: clBuildProgram failed: BUILD_PROGRAM_FAILURE -

I suspect the error is related to the version of OpenCL in the Apptainer container, but I'm not sure. I say this because running "clinfo" in the Docker container and in the Apptainer container returns slightly different results. Would anyone have an idea why the app works fine with Docker but not Apptainer? The CPU version of the app works fine in both Docker and Apptainer, but I'm interested in using the GPU version. In case it matters, the Apptainer version is 1.2.2 and the Docker version is 24.0.5.

Thanks!

David Godlove

Sep 13, 2023, 10:44:04 AM
to Sandeep Sarangi, discuss
It's kind of hard for me to say what the problem might be without looking at the container itself and trying to run it. The error suggests to me that your code might be trying to write to the container file system. Maybe you could try adding an overlay, or just run the container with the --writable-tmpfs option to create an ephemeral writable overlay in memory.

There is an additional gotcha when running OpenCL apps: you need to use the --bind /etc/OpenCL option/argument pair. See this section of the docs for more info. But I don't think this is your issue.
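Putting those together, something like this (a sketch; mdt.sif is the image name from your message, and the two flags can of course also be tried separately):

```shell
# --writable-tmpfs gives the container an ephemeral in-memory writable overlay;
# --bind /etc/OpenCL exposes the host's OpenCL vendor ICD files to the container.
apptainer shell --nv --writable-tmpfs --bind /etc/OpenCL mdt.sif
```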


Sandeep Sarangi

Sep 14, 2023, 4:14:45 PM
to discuss, David Godlove
Thanks so much for these suggestions! I wasn't familiar with these options. I tried each individually and then both together, but I still get the same pyopencl error as before. I don't think the problem is specific to the MDT app in the container, either: in the Apptainer container I can't run the hello-world PyOpenCL program (demo.py) given here: https://documen.tician.de/pyopencl/. demo.py runs fine in the Docker container, though.

David Godlove

Sep 18, 2023, 12:23:16 PM
to Sandeep Sarangi, discuss
Hi again. I downloaded your container and played around with it a bit, but I could not piece together enough information to figure out how to run the tests you mentioned. I was able to get the demo.py script to run without issue through the container on my system, but I think it is using the CPU. If you can provide detailed instructions for reproducing the issue you are seeing (without making assumptions about the system or the user's access to specialized data), I can try again. Thanks.

Sandeep Sarangi

Sep 19, 2023, 6:08:01 PM
to discuss, David Godlove
Hi David,

Thanks so much for testing this out! Regarding demo.py, that also works in Apptainer for me on the CPU but it fails on the GPU. When you run demo.py do you get prompted which device to run on as I do? I'm attaching a screenshot of what I see when running demo.py on the GPU in Apptainer. The error I get running demo.py is similar to what I get when running the developer example for MDT.
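In case it helps you reproduce the GPU path: I believe the device prompt comes from pyopencl.create_some_context(), and it can be skipped by preselecting a device through the PYOPENCL_CTX environment variable (the '0:0' platform:device pair below is just an example; clinfo shows the right indices for a given machine):

```shell
# Answer the create_some_context() prompt ahead of time; 0:0 means
# platform 0, device 0 - substitute the indices clinfo reports for the GPU.
export PYOPENCL_CTX='0:0'
python3 demo.py
```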

To run the developer MDT example I clone the MDT repo to my home folder, i.e.,

cd ~
git clone https://github.com/robbert-harms/MDT.git

Then I launch the Apptainer container (called mdt.sif) from my home folder:

apptainer shell --nv mdt.sif

Then once in the container I go to the MDT/tests folder and run the test_example_data.py Python script, i.e.,

cd MDT/tests
python3 test_example_data.py

I've also attached some screenshots of running the MDT test. Let me know if you need anything else.

Thanks again!
mdt3.png
demo.png
mdt1.png
mdt2.png

David Godlove

Sep 21, 2023, 5:52:16 PM
to Sandeep Sarangi, discuss
Unfortunately, when I try this it hangs forever after trying to import imp (see the log below). Have you seen this? Do you know how I can proceed so I can replicate the original issue you reported?

$ apptainer shell --nv mdt.sif
Apptainer> cd MDT/tests/
Apptainer> python3 test_example_data.py
[2023-09-21 15:46:42,595] [INFO] [mdt] [fit_model] - Preparing BallStick_r1 with the cascaded initializations.
[2023-09-21 15:46:42,596] [INFO] [mdt.lib.processing.model_fitting] [fit_composite_model] - Using MDT version 1.2.6
[2023-09-21 15:46:42,596] [INFO] [mdt.lib.processing.model_fitting] [fit_composite_model] - Preparing for model BallStick_r1
[2023-09-21 15:46:42,608] [INFO] [mdt.models.composite] [_prepare_input_data] - No volume options to apply, using all 103 volumes.
[2023-09-21 15:46:42,608] [INFO] [mdt.utils] [estimate_noise_std] - Trying to estimate a noise std.
[2023-09-21 15:46:42,608] [INFO] [mdt.utils] [estimate_noise_std] - Estimated global noise std 19.613178253173828.
[2023-09-21 15:46:42,609] [INFO] [mdt.lib.processing.model_fitting] [_model_fit_logging] - Fitting BallStick_r1 model
[2023-09-21 15:46:42,609] [INFO] [mdt.lib.processing.model_fitting] [_model_fit_logging] - The 4 parameters we will fit are: ['S0.s0', 'w_stick0.w', 'Stick0.theta', 'Stick0.phi']
[2023-09-21 15:46:42,609] [INFO] [mdt.lib.processing.model_fitting] [fit_composite_model] - Saving temporary results in /tmp/tmpb1gn4dhlmdt_example_data_test/mdt_example_data/b1k_b2k/output/b1k_b2k_example_slices_24_38_mask/BallStick_r1/tmp_results.
/usr/lib/python3/dist-packages/mot/lib/utils.py:148: DeprecationWarning: pyopencl.array.vec is deprecated. Please use pyopencl.cltypes for OpenCL vector and scalar types
  return getattr(cl_array.vec, vector_type)
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.processing_strategies] [_process_chunk] - Computations are at 0.00%, processing next 8865 voxels (8865 voxels in total, 0 processed). Time spent: 0:00:00:00, time left: ? (d:h:m:s).
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.model_fitting] [_process] - Starting optimization
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.model_fitting] [_process] - Using MOT version 0.11.3
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.model_fitting] [_process] - We will use a single precision float type for the calculations.
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.model_fitting] [_process] - Using device 'CPU - pthread-12th Gen Intel(R) Core(TM) i7-12700H (Portable Computing Language)'.
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.model_fitting] [_process] - Using compile flags: ('-cl-denorms-are-zero', '-cl-mad-enable', '-cl-no-signed-zeros')
[2023-09-21 15:46:42,650] [INFO] [mdt.lib.processing.model_fitting] [_process] - We will use the optimizer Powell with optimizer settings {'patience': 2}
/usr/lib/python3/dist-packages/pytools/py_codegen.py:146: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp

David Godlove

Oct 11, 2023, 1:03:20 PM
to Sandeep Sarangi, discuss
I apologize for the delay in responding. I think I was able to replicate this and figure out what was wrong. I created a new version of your container with strace installed and used it to track which libraries are opened. It looks like the container wants libnvidia-nvvm.so and can't find it. That is not one of the libraries that is bind-mounted into the container by default with the --nv option. If you are able to edit the configuration file on your system called nvliblist.conf, you can just add libnvidia-nvvm.so to the file and it should work.
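For example (a sketch; /etc/apptainer is the usual location for that file, but it depends on the installation prefix on your system):

```shell
# Append the missing library name to the list of files the --nv option
# bind-mounts; the /etc/apptainer path is an assumption - adjust as needed.
echo "libnvidia-nvvm.so" | sudo tee -a /etc/apptainer/nvliblist.conf
```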

If you can't edit that file (because you are not the administrator on the system, for instance), things become a little more complicated. You can create an environment variable that will do the same thing as the --nv option and add the nvvm library yourself. But that is going to depend on the system where you are running. In my case, this is the correct command to bind-mount the library and its symlink reference into the container:

export APPTAINER_BINDPATH=/lib64/libnvidia-nvvm.so.525.85.05:/.singularity.d/libs/libnvidia-nvvm.so.525.85.05,/lib64/libnvidia-nvvm.so.525.85.05:/.singularity.d/libs/libnvidia-nvvm.so.4

I got this information by running the following command to search for the library on my host system:

$ ldconfig --print-cache | grep nvvm
libnvidia-nvvm.so.4 (libc6,x86-64) => /lib64/libnvidia-nvvm.so.4
libnvidia-nvvm.so.4 (libc6) => /lib/libnvidia-nvvm.so.4
libnvidia-nvvm.so (libc6,x86-64) => /lib64/libnvidia-nvvm.so
libnvidia-nvvm.so (libc6) => /lib/libnvidia-nvvm.so


If you look at the 64-bit version of the library in lib64, you can see that it's a symlink to the actual library, which is tagged with the driver version:

$ ll /lib64/libnvidia-nvvm.so.4
lrwxrwxrwx. 1 root root 27 Jul 14 09:33 /lib64/libnvidia-nvvm.so.4 -> libnvidia-nvvm.so.525.85.05


You will need to use a similar method to figure out where the libraries are on your host and what the driver version is. Then you can use a command like the one I've listed above to set the bind path.
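To spell out how I assembled that export (a sketch; the /lib64 path and the 525.85.05 driver version are from my host and will differ on yours):

```shell
# Build the APPTAINER_BINDPATH value for libnvidia-nvvm.
# 'real' is what `readlink -f /lib64/libnvidia-nvvm.so.4` reported on my host;
# substitute the path and version your own ldconfig/readlink output shows.
real=/lib64/libnvidia-nvvm.so.525.85.05
dest=/.singularity.d/libs   # the directory the --nv option populates with driver libraries
export APPTAINER_BINDPATH="$real:$dest/$(basename "$real"),$real:$dest/libnvidia-nvvm.so.4"
echo "$APPTAINER_BINDPATH"   # sanity-check the value before launching the container
```

This binds the real, versioned file into the container twice: once under its versioned name and once under the .so.4 name that the loader resolves.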

Let me know if that works and if you need any more help.  