Hi Steve,
I updated the CUDA toolkit to the latest version (12.9) and restarted my workstation, but I am still getting the 'CUDA_ERROR_INVALID_HANDLE' error. Below is the full command and error output, followed by a second block with the checks I reran after the update. Do you have any other suggestions for resolving this GPU compatibility problem?
Thanks,
Marcell
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ e2gmm_model_fit.py --path gmm_model_01 --map J689_Class_One_volume_map.mrc --resolution 2.6 --writetxt --rebuild_rotamer
2025-05-28 17:11:24.970886: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1748477484.979531 4736 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1748477484.982902 4736 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Input model: gmm_model_01/model_input.pdb
586 residues, 4444 atoms.
Using existing projection file gmm_model_01/map_projections.hdf...
Loading 1297 particles of box size 360. shrink to 206
1000/1297 R 1297/1297
Data read complete
Image size: (1297, 206, 104)
W0000 00:00:1748477491.662231 4736 gpu_device.cc:2433] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
W0000 00:00:1748477491.666785 4736 gpu_device.cc:2433] TensorFlow was not built with CUDA kernel binaries compatible with compute capability 12.0. CUDA kernels will be jit-compiled from PTX, which could take 30 minutes or longer.
I0000 00:00:1748477491.762614 4736 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 29162 MB memory: -> device: 0, name: NVIDIA GeForce RTX 5090, pci bus id: 0000:01:00.0, compute capability: 12.0
Initializing...
Shape of CA model: (586, 5)
2344 atoms in backbone
building decoder with 4444 Gaussian, using 64 anchor points
Traceback (most recent call last):
  File "/home/marcell/anaconda3/envs/eman2/bin/e2gmm_model_fit.py", line 884, in <module>
    main()
  File "/home/marcell/anaconda3/envs/eman2/bin/e2gmm_model_fit.py", line 293, in main
    gen_model=build_decoder_CA(pts[None,...], icls, meanzero=False, freeamp=False)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/bin/e2gmm_model_fit.py", line 37, in build_decoder_CA
    tf.keras.layers.Dropout(.2),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/layers/regularization/dropout.py", line 53, in __init__
    self.seed_generator = backend.random.SeedGenerator(seed)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/random/seed_generator.py", line 87, in __init__
    self.state = self.backend.Variable(
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/common/variables.py", line 186, in __init__
    self._initialize_with_initializer(initializer)
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 48, in _initialize_with_initializer
    self._initialize(lambda: initializer(self._shape, dtype=self._dtype))
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 39, in _initialize
    self._value = tf.Variable(
                  ^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 48, in <lambda>
    self._initialize(lambda: initializer(self._shape, dtype=self._dtype))
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/random/seed_generator.py", line 84, in seed_initializer
    return self.backend.convert_to_tensor([seed, 0], dtype=dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/marcell/anaconda3/envs/eman2/lib/python3.12/site-packages/keras/src/backend/tensorflow/core.py", line 139, in convert_to_tensor
    return tf.cast(x, dtype)
           ^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: {{function_node __wrapped__Cast_device_/job:localhost/replica:0/task:0/device:GPU:0}} 'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, 0, reinterpret_cast<CUstream>(stream), params, nullptr)' failed with 'CUDA_ERROR_INVALID_HANDLE' [Op:Cast] name:
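The two warnings earlier in this log say the TensorFlow build has no CUDA kernel binaries for compute capability 12.0 and falls back to JIT-compiling PTX, and the crash then happens while launching a `Cast` kernel on the GPU. That points to a possible architecture mismatch, although the log alone does not prove it. The sketch below compares the GPU's compute capability with the architectures recorded in TensorFlow's build info. It assumes `tf.sysconfig.get_build_info()` exposes a `cuda_compute_capabilities` list in `sm_XY` / `compute_XY` form, which can vary between builds; `has_native_kernels`, `has_ptx_fallback`, and `report_from_tensorflow` are hypothetical helper names.

```python
from __future__ import annotations


def has_native_kernels(device_cc: tuple[int, int], build_caps: list[str]) -> bool:
    """True if the build ships SASS ('sm_XY') that can run on this device.

    A cubin built for X.y runs only on devices with the same major version X
    and a minor version >= y. 'compute_XY' entries are PTX only and are skipped.
    """
    major, minor = device_cc
    for cap in build_caps:
        kind, _, num = cap.partition("_")
        if kind != "sm" or not num.isdigit():
            continue  # PTX-only or arch-specific entries such as 'sm_90a'
        if int(num[:-1]) == major and int(num[-1]) <= minor:
            return True
    return False


def has_ptx_fallback(device_cc: tuple[int, int], build_caps: list[str]) -> bool:
    """True if some 'compute_XY' PTX entry is old enough to be JIT-compiled for this device."""
    for cap in build_caps:
        kind, _, num = cap.partition("_")
        if kind == "compute" and num.isdigit():
            if (int(num[:-1]), int(num[-1])) <= tuple(device_cc):
                return True
    return False


def report_from_tensorflow() -> str:
    """Hypothetical helper: inspect the local TensorFlow build and GPU 0."""
    import tensorflow as tf  # imported lazily so the helpers above work without TF

    build = tf.sysconfig.get_build_info()
    # Assumption: key name and 'sm_XY' entry format may differ between builds.
    caps = list(build.get("cuda_compute_capabilities", []))
    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return "TensorFlow sees no GPU"
    details = tf.config.experimental.get_device_details(gpus[0])
    cc = tuple(details["compute_capability"])
    return (
        f"{details.get('device_name')} (compute capability {cc[0]}.{cc[1]}); "
        f"build lists {caps}; native kernels: {has_native_kernels(cc, caps)}; "
        f"PTX fallback: {has_ptx_fallback(cc, caps)}"
    )
```

Running `print(report_from_tensorflow())` inside the `eman2` environment would show whether the installed build lists any `sm_12x` entry. Kernels for compute capability 12.0 can only be generated by CUDA 12.8 or newer, so a build that lacks them would need a different TensorFlow package rather than a newer system toolkit.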
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ nvidia-smi
Wed May 28 17:14:37 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:01:00.0  On |                  N/A |
|  0%   32C    P8             15W /  575W |     650MiB /  32607MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1964      G   /usr/lib/xorg/Xorg                      151MiB |
|    0   N/A  N/A            2681      G   /usr/bin/gnome-shell                    111MiB |
|    0   N/A  N/A            3530      G   .../6103/usr/lib/firefox/firefox        314MiB |
+-----------------------------------------------------------------------------------------+
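The nvidia-smi table shows the driver and the GPU model but not the compute capability. Recent drivers can report it directly through `nvidia-smi --query-gpu`. The sketch below assumes the `name`, `compute_cap`, and `driver_version` query fields are available in this driver (`nvidia-smi --help-query-gpu` lists the supported fields); `parse_gpu_csv` and `query_gpus` are hypothetical helper names.

```python
import csv
import io
import subprocess

# Assumed query fields; check `nvidia-smi --help-query-gpu` on the installed driver.
QUERY_FIELDS = ["name", "compute_cap", "driver_version"]


def parse_gpu_csv(text: str) -> list[dict[str, str]]:
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(QUERY_FIELDS, row)) for row in rows if row]


def query_gpus() -> list[dict[str, str]]:
    """Run nvidia-smi and return the queried fields for every GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

For the RTX 5090 in this log, `compute_cap` should read 12.0, the same capability TensorFlow reports in its warnings.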
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Apr__9_19:24:57_PDT_2025
Cuda compilation tools, release 12.9, V12.9.41
Build cuda_12.9.r12.9/compiler.35813241_0
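The `CUDA Version: 12.8` in the nvidia-smi banner is the newest CUDA version the 570.133.07 driver supports, not the installed toolkit, while `nvcc` reports the 12.9 toolkit. So the system toolkit is newer than what the driver advertises. A small check for that mismatch, parsing the two outputs in this log (the function names are hypothetical):

```python
import re


def _version(pattern: str, text: str) -> tuple[int, int]:
    """Return (major, minor) from the first match of `pattern`, or raise ValueError."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found")
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(smi_output: str) -> tuple[int, int]:
    """Newest CUDA version the driver supports ('CUDA Version: X.Y' in nvidia-smi)."""
    return _version(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)


def toolkit_version(nvcc_output: str) -> tuple[int, int]:
    """Installed toolkit version ('release X.Y' in `nvcc --version`)."""
    return _version(r"release\s+(\d+)\.(\d+)", nvcc_output)


def toolkit_newer_than_driver(smi_output: str, nvcc_output: str) -> bool:
    """True when the toolkit's version exceeds what the driver reports supporting."""
    return toolkit_version(nvcc_output) > driver_cuda_version(smi_output)
```

A mismatch like this does not by itself explain the error, especially because a conda-installed TensorFlow may load its own CUDA libraries instead of the system toolkit.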
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ e2version.py
EMAN 2.99.67 ( GITHUB: 2025-05-07 13:09 - commit: NOT-INSTALLED-FROM-GIT-REPO )
Your EMAN2 is running on: Linux-6.11.0-26-generic-x86_64-with-glibc2.39 6.11.0-26-generic
Your Python version is: 3.12.10
(eman2) marcell@turul:/mnt/raid5/HCMV_pr/gmm$ conda list | grep tensorflow
tensorflow                2.18.0          cuda126py312h5379a72_200  conda-forge
tensorflow-base           2.18.0          cuda126py312hfb0ba9c_200  conda-forge
tensorflow-estimator      2.18.0          cuda126py312hd49ae37_200  conda-forge
tensorflow-gpu            2.18.0          cuda126py312h418687c_200  conda-forge
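All four TensorFlow build strings start with `cuda126`, which suggests these packages were built against CUDA 12.6 (reading the version from the build-string prefix is an assumption about the conda-forge naming convention). CUDA 12.6 predates support for compute capability 12.0, which first arrived in CUDA 12.8, and conda-forge CUDA builds normally use CUDA libraries installed inside the environment, so upgrading the system toolkit to 12.9 likely does not change what TensorFlow loads. A sketch that extracts the encoded version from `conda list` output (`cuda_from_build` and `tensorflow_cuda_builds` are hypothetical names):

```python
from __future__ import annotations

import re

# Assumed convention: 'cuda' + major version + one minor digit, followed by 'py'.
_BUILD_RE = re.compile(r"cuda(\d+)(\d)py")


def cuda_from_build(build: str) -> tuple[int, int] | None:
    """'cuda126py312h5379a72_200' -> (12, 6); None for non-CUDA builds."""
    match = _BUILD_RE.match(build)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def tensorflow_cuda_builds(conda_list: str) -> dict[str, tuple[int, int]]:
    """Map each tensorflow* package in `conda list` output to its CUDA build version."""
    found: dict[str, tuple[int, int]] = {}
    for line in conda_list.splitlines():
        fields = line.split()  # name, version, build, channel
        if len(fields) >= 3 and fields[0].startswith("tensorflow"):
            cuda = cuda_from_build(fields[2])
            if cuda is not None:
                found[fields[0]] = cuda
    return found
```

Applied to the listing above, every tensorflow package maps to (12, 6), which is consistent with the compute capability 12.0 warning in the first block.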