Help using Blackwell GPUs with EMAN2

Marcell Zimanyi

May 28, 2025, 5:08:45 PM
to EMAN2
Hi All,

I have a new workstation with a Blackwell GPU, and I am having some trouble using GPU acceleration for GMM jobs. My workstation has an RTX 5090, an AMD 9950X3D processor, 128 GB of RAM, and a 4 TB SSD, and runs Ubuntu 24.04. Overall, I'm quite new to Linux, so I'm sorry if this is a basic question. I pasted the output of some version check commands at the end of this message. It seems to me that TensorFlow is built for CUDA version 12.6, whereas nvidia-smi reports CUDA version 12.8 for my GPU. Is it possible to make my GPU compatible with EMAN2 at this point, or is it still too new and lacking support from major Python packages? Is anyone else using a Blackwell GPU?

Thanks,

Marcell

```
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm/gmm_model_01$ conda list | grep tensorflow
tensorflow                2.18.0          cuda126py312h5379a72_200    conda-forge
tensorflow-base           2.18.0          cuda126py312hfb0ba9c_200    conda-forge
tensorflow-estimator      2.18.0          cuda126py312hd49ae37_200    conda-forge
tensorflow-gpu            2.18.0          cuda126py312h418687c_200    conda-forge
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm/gmm_model_01$ e2version.py
EMAN 2.99.67 ( GITHUB: 2025-05-07 13:09 - commit: NOT-INSTALLED-FROM-GIT-REPO )
Your EMAN2 is running on: Linux-6.11.0-25-generic-x86_64-with-glibc2.39 6.11.0-25-generic
Your Python version is: 3.12.10
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm/gmm_model_01$ nvidia-smi
Wed May 28 13:07:51 2025      
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:01:00.0  On |                  N/A |
|  0%   30C    P8             11W /  575W |     363MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1958      G   /usr/lib/xorg/Xorg                      153MiB |
|    0   N/A  N/A            2718      G   /usr/bin/gnome-shell                     34MiB |
|    0   N/A  N/A            3252      G   ...exec/xdg-desktop-portal-gnome         23MiB |
|    0   N/A  N/A            3916      G   .../6103/usr/lib/firefox/firefox         66MiB |
|    0   N/A  N/A          226022      G   /usr/bin/nautilus                        16MiB |
+-----------------------------------------------------------------------------------------+
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm/gmm_model_01$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
```

Steve Ludtke

May 28, 2025, 5:12:07 PM
to em...@googlegroups.com
Hi Marcell,
Looking at this, it appears that you have a current NVIDIA driver installed, which supports up to CUDA 12.8. You also have a CUDA 12.6 build of TensorFlow installed in your EMAN environment, which is also fine. I suspect your issue is the last check you did, which shows that the version of CUDA installed at the operating system level (not the Python libraries in conda) is 12.0, not 12.6.

--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

To view this discussion visit https://groups.google.com/d/msgid/eman2/fe64f818-6c23-4980-b52f-40226b1225e4n%40googlegroups.com.

Marcell Zimanyi

May 28, 2025, 5:19:49 PM
to EMAN2
Hi Steve,

Thanks for taking a look. I will let my current jobs finish, then I will update the CUDA toolkit and reboot my workstation.

Best wishes,

Marcell

Marcell Zimanyi

May 28, 2025, 8:16:11 PM
to EMAN2
Hi Steve,

I updated the CUDA toolkit to the latest version (12.9) and restarted my workstation, but I am still running into a 'CUDA_ERROR_INVALID_HANDLE' error. I'll share the full command and error in one block here, then another with the updated checks. Do you have any other suggestions for GPU compatibility?

Thanks,

Marcell

(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ e2gmm_model_fit.py --path gmm_model_01 --map J689_Class_One_volume_map.mrc --resolution 2.6 --writetxt --rebuild_rotamer
2025-05-28 17:11:24.970886: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1748477484.979531 4736 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748477484.982902 4736 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Input model: gmm_model_01/model_input.pdb
586 residues, 4444 atoms.
Using existing projection file gmm_model_01/map_projections.hdf...
Loading 1297 particles of box size 360. shrink to 206
1000/1297 R 1297/1297
Data read complete
Image size: (1297, 206, 104)
W0000 00:00:1748477491.662231 4736 gpu_device.cc:2433] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1748477491.666785 4736 gpu_device.cc:2433] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1748477491.762614 4736 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29162 MB memory: -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:01:00.0, compute capability: 12.0
Initializing...
Shape of CA model: (586, 5)
2344 atoms in backbone
building decoder with 4444 Gaussian, using 64 anchor points
Traceback (most recent call last):
  File "/home/marcell/anaconda3/envs/eman2/bin/e2gmm_model_fit.py", line 884, in <module>
    main()
  File "/home/marcell/anaconda3/envs/eman2/bin/e2gmm_model_fit.py", line 293, in main
    gen_model=build_decoder_CA(pts[None,...], icls, meanzero=False, freeamp=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/bin/e2gmm_model_fit.py", line 37, in build_decoder_CA
    tf.keras.layers.Dropout(.2),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/layers/regularization/dropout.py", line 53, in __init__
    self.seed_generator = backend.random.SeedGenerator(seed)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/random/seed_generator.py", line 87, in __init__
    self.state = self.backend.Variable(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/common/variables.py", line 186, in __init__
    self._initialize_with_initializer(initializer)
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 48, in _initialize_with_initializer
    self._initialize(lambda: initializer(self._shape, dtype=self._dtype))
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 39, in _initialize
    self._value = tf.Variable(
                  ^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 48, in <lambda>
    self._initialize(lambda: initializer(self._shape, dtype=self._dtype))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/random/seed_generator.py", line 84, in seed_initializer
    return self.backend.convert_to_tensor([seed, 0], dtype=dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 139, in convert_to_tensor
    return tf.cast(x, dtype)
           ^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:

(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ nvidia-smi
Wed May 28 17:14:37 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:01:00.0  On |                  N/A |
|  0%   32C    P8             15W /  575W |     650MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1964      G   /usr/lib/xorg/Xorg                      151MiB |
|    0   N/A  N/A            2681      G   /usr/bin/gnome-shell                    111MiB |
|    0   N/A  N/A            3530      G   .../6103/usr/lib/firefox/firefox        314MiB |
+-----------------------------------------------------------------------------------------+
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Apr__9_19:24:57_PDT_2025
Cuda compilation tools, release 12.9, V12.9.41
Build cuda_12.9.r12.9/compiler.35813241_0
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ e2version.py
EMAN 2.99.67 ( GITHUB: 2025-05-07 13:09 - commit: NOT-INSTALLED-FROM-GIT-REPO )
Your EMAN2 is running on: Linux-6.11.0-26-generic-x86_64-with-glibc2.39 6.11.0-26-generic
Your Python version is: 3.12.10
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ conda list | grep tensorflow
tensorflow                2.18.0          cuda126py312h5379a72_200    conda-forge
tensorflow-base           2.18.0          cuda126py312hfb0ba9c_200    conda-forge
tensorflow-estimator      2.18.0          cuda126py312hd49ae37_200    conda-forge
tensorflow-gpu            2.18.0          cuda126py312h418687c_200    conda-forge

Steve Ludtke

May 28, 2025, 8:51:22 PM
to em...@googlegroups.com
Hi Marcell,
To be clear, the problems you're struggling with here have nothing to do with EMAN itself; they come down to getting a self-consistent version of TensorFlow working within your Anaconda environment. So if you don't get a satisfying answer here, you can look more broadly for solutions, but I'll offer what advice I can.
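
One way to confirm that the failure is independent of EMAN2 is to run a tiny TensorFlow operation on its own. The following is only a sketch: the function name `tensorflow_gpu_smoke_test` is made up for illustration, and it assumes it is run inside the same conda environment as EMAN2. It uses a small `tf.cast` because that is the operation that failed in the traceback above.

```python
def tensorflow_gpu_smoke_test():
    """Try one tiny TensorFlow op on the GPU, with no EMAN2 code involved.

    Hypothetical helper for illustration; returns a dict describing what it found.
    """
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None}

    report = {"tensorflow": tf.__version__}

    # CUDA version this TensorFlow build was compiled against (absent on CPU builds).
    report["built_for_cuda"] = tf.sysconfig.get_build_info().get("cuda_version")

    gpus = tf.config.list_physical_devices("GPU")
    report["gpus"] = [gpu.name for gpu in gpus]

    if gpus:
        try:
            # Ask TensorFlow to run a small cast on the first GPU.
            with tf.device("/GPU:0"):
                value = tf.cast(tf.constant([1.0, 0.0]), tf.int32)
            report["cast_ok"] = value.numpy().tolist() == [1, 0]
        except Exception as err:  # e.g. an InternalError raised by the CUDA runtime
            report["cast_ok"] = False
            report["error"] = f"{type(err).__name__}: {err}"
    return report


if __name__ == "__main__":
    for key, val in tensorflow_gpu_smoke_test().items():
        print(f"{key}: {val}")
```

If this fails with the same error, the problem can be reported or searched for as a plain TensorFlow/CUDA issue.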

There are 2 components that need to be installed at the operating system level:
- NVIDIA driver
- CUDA

Then for Python in Anaconda:
- CUDA (the python part)
- TensorFlow
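
As a rough sketch, the operating-system side of this list can be collected in one place by parsing the same `nvidia-smi` and `nvcc --version` output shown earlier in the thread. The function names here are hypothetical, and the regular expressions assume the output format seen in this thread. Note that inside an activated conda environment, `nvcc` resolves to whichever copy is first on `PATH`, which may or may not be the system one.

```python
import re
import shutil
import subprocess


def parse_nvidia_smi(text):
    """Return (driver_version, highest_supported_cuda) from nvidia-smi output."""
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    return (match.group(1), match.group(2)) if match else (None, None)


def parse_nvcc(text):
    """Return the toolkit release (e.g. '12.0') from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def run(command):
    """Run a command and return its stdout, or '' if the tool is not on PATH."""
    if shutil.which(command[0]) is None:
        return ""
    return subprocess.run(command, capture_output=True, text=True, check=False).stdout


if __name__ == "__main__":
    driver, driver_cuda = parse_nvidia_smi(run(["nvidia-smi"]))
    print("NVIDIA driver:", driver)
    print("Highest CUDA the driver supports:", driver_cuda)
    print("CUDA toolkit (nvcc):", parse_nvcc(run(["nvcc", "--version"])))
```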

The only trick is getting compatible versions of each of these packages. Generally speaking, if you install the NVIDIA driver and CUDA using your operating system's package manager, they should be consistent with each other. The one caveat is that some operating systems don't stay very up to date with these things, so the system-provided packages may be too old to be compatible. Still, in many cases things are self-consistent automatically and just work.

If you look at one of the lines from your Anaconda packages:
tensorflow-gpu 2.18.0 cuda126py312h418687c_200 conda-forge
you can see that this is version 2.18.0 of TensorFlow, built against CUDA 12.6 for Python 3.12 (the rest is just a hash). EMAN2 requires at least Python 3.12 now (because of other dependencies), and to support Python 3.12 you need at least TensorFlow 2.18, which requires at least CUDA 12.6. So a whole house of cards...
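
To illustrate how that build string breaks down, here is a small sketch. The function name `parse_tf_build` is hypothetical, and the rule that the last digit after `cuda` is the minor version is an assumption based on the build strings in this thread, not a documented conda-forge guarantee.

```python
import re


def parse_tf_build(build):
    """Split a build string like 'cuda126py312h418687c_200' into versions.

    Returns (cuda_version, python_version) as strings, or None if the
    string does not look like a CUDA build.
    """
    match = re.match(r"cuda(\d+)py(\d)(\d+)", build)
    if match is None:
        return None
    digits = match.group(1)                      # "126"
    cuda = f"{digits[:-1]}.{digits[-1]}"         # assumed: last digit is the minor version
    python = f"{match.group(2)}.{match.group(3)}"  # "3" and "12" -> "3.12"
    return cuda, python


print(parse_tf_build("cuda126py312h418687c_200"))
```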

If you need Python CUDA 12.6, then ideally you'd like the operating system to also have CUDA 12.6, though you _might_ be OK if the OS has something newer than 12.6. CUDA 12.6 also has a minimum NVIDIA driver requirement, which I think is 560.

You mentioned installing CUDA 12.9, but if you look at the nvidia-smi output, you'll see that your driver only claims to support up to CUDA 12.8...
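
The rules of thumb above can be written down as a quick check. This is only a sketch of the reasoning in this message, not NVIDIA's official compatibility policy; `check_cuda_stack` and its parameter names are invented for illustration.

```python
def as_tuple(version):
    """Turn '12.8' into (12, 8) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def check_cuda_stack(driver_max_cuda, toolkit_cuda, tf_cuda):
    """Return a list of likely mismatches between the three CUDA versions.

    driver_max_cuda: 'CUDA Version' reported by nvidia-smi
    toolkit_cuda:    release reported by `nvcc --version`
    tf_cuda:         CUDA version the TensorFlow build targets
    """
    problems = []
    if as_tuple(toolkit_cuda) > as_tuple(driver_max_cuda):
        problems.append(
            f"CUDA toolkit {toolkit_cuda} is newer than the {driver_max_cuda} "
            "the driver claims to support"
        )
    if as_tuple(toolkit_cuda) < as_tuple(tf_cuda):
        problems.append(
            f"CUDA toolkit {toolkit_cuda} is older than the CUDA {tf_cuda} "
            "that TensorFlow was built against"
        )
    return problems


# The two situations from this thread: first toolkit 12.0, then toolkit 12.9.
for toolkit in ("12.0", "12.9"):
    print(toolkit, check_cuda_stack("12.8", toolkit, "12.6"))
```

An empty list only means these two rules found nothing; it does not prove that the stack works.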


Marcell Zimanyi

May 29, 2025, 12:05:27 PM
to EMAN2
Hi Steve,

I really appreciate your advice. I apologize, I'm new at managing my own workstation, so I'm still getting the hang of the basics. This is very helpful and gives me a lot to work with. I will try to harmonize all my drivers and packages.

Best wishes,

Marcell