[attn Pedro]: JIDT GPU

Simon Watson

Feb 1, 2022, 6:52:59 AM
to Java Information Dynamics Toolkit (JIDT) discussion
Hi Pedro,

Joe advised that you might be the person to speak to about issues using JIDT GPU functionality.

I have an automated routine that compiles JIDT using a PyTorch container as the base installation.
The CPU version works fine, but the GPU version fails. I believe this is likely due to a mismatch between my CUDA library and my driver: the container uses CUDA 11.5 in compatibility mode with NVIDIA driver 470.86, which should provide the functionality of CUDA 11.4. (As per the nvidia-smi output below, this does appear to be working as expected.)

Assuming this is the issue, can you advise:
  • Is it possible to make use of this 'compatibility mode' when compiling JIDT for GPU functionality? If so, how?
    Or is it preferable that I add another CUDA library (say 11.4)? If so, how do I point the JIDT compilation process at that new library? (I tried setting GPU_LIBRARY_PATH via an environment variable, but I wasn't sure what directory level to point it at, and I didn't see any obvious change in behavior.)
  • If I am completely wrong, do you know what the issue might be?
  • Finally, I'm keen to become competent at GPU programming. Do you have any tips on good books or sites that will get me up the curve quickly? (I'm good at Python and reasonable at C++.)
Full details are below - please advise if any other information might be of benefit.

Regards,

Simon
-------------------------

In demos/java, I execute:  ./example10GPUBenchmark.sh
The error I get is as follows:
gpuKnnLibrary.c(38) : CUDA Runtime API error 222: the provided PTX was compiled with an unsupported toolchain..
plotExample10BenchmarkResults.py:20: UserWarning: loadtxt: Empty input file: "WhiteNoise.txt"
  res1 = np.loadtxt(filename1)
Traceback (most recent call last):
  File "plotExample10BenchmarkResults.py", line 21, in <module>
    res2 = np.loadtxt(filename2)
  File "/opt/conda/lib/python3.8/site-packages/numpy/lib/npyio.py", line 1067, in loadtxt
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/opt/conda/lib/python3.8/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/opt/conda/lib/python3.8/site-packages/numpy/lib/_datasource.py", line 533, in open
    raise IOError("%s not found." % path)
OSError: Correlated2D.txt not found.
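
The Python warning and traceback above look like a downstream symptom: the plotting script is reading result files that are empty or were never written, presumably because the GPU benchmark stopped at the CUDA error. A minimal sketch of a pre-check that makes this explicit is shown here. The helper name `unusable_result_files` is hypothetical and not part of JIDT; the two file names are the ones that appear in the output above.

```python
import os


def unusable_result_files(paths):
    """Return the paths that are missing or empty (zero bytes), in input order."""
    bad = []
    for path in paths:
        # A file is unusable for plotting if it does not exist or has no content.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # File names taken from the warning and traceback in this post.
    expected = ["WhiteNoise.txt", "Correlated2D.txt"]
    bad = unusable_result_files(expected)
    if bad:
        print("Benchmark produced no usable output for:", ", ".join(bad))
```

Running a check like this before plotting would report the missing benchmark output directly instead of failing inside `np.loadtxt`.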

The application environment is as follows:
  OS:             Ubuntu 20.04 (Focal Fossa)
  CUDA:           NVIDIA CUDA 11.5.0
  cuBLAS:         11.7.3.1
  cuDNN:          8.3.1.96
  NCCL:           2.11.4
  NVIDIA Driver:  470.86
470.86 has support for CUDA 11.4. However, this container was built with CUDA 11.5 and is run in Minor Version Compatibility mode.
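
The mismatch described here (a CUDA 11.5 toolkit against a driver that reports CUDA 11.4) can be checked mechanically. The following is a minimal Python sketch, not part of JIDT. The function names are hypothetical, and it assumes the version strings have the shape of the nvidia-smi header shown in this post and of the `release X.Y` line printed by `nvcc --version`. It only flags the case where the toolkit is newer than the CUDA version the driver reports, which is the situation in which PTX built by the toolkit may be rejected by the driver's JIT compiler.

```python
import re


def parse_driver_cuda_version(smi_header):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def parse_toolkit_version(nvcc_output):
    """Return (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found")
    return int(match.group(1)), int(match.group(2))


def toolkit_newer_than_driver(driver_version, toolkit_version):
    """True when the toolkit version exceeds the CUDA version the driver reports."""
    # Tuples compare element by element, so (11, 5) > (11, 4) is True.
    return toolkit_version > driver_version
```

For the environment in this post, the driver side parses to `(11, 4)` from the nvidia-smi header and the toolkit is 11.5, so the check would flag a mismatch.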

Note that all of this is inside a container, and the container environment matches the specification above (the container base is nvcr.io/nvidia/pytorch 21.11-py3). PyTorch itself works fully with my machine's GPU.

The output from nvidia-smi is as follows:
Tue Feb  1 21:06:26 2022      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   64C    P0    37W /  N/A |   1151MiB /  8111MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
