Hi Muyuan and Steve,
Thanks for the suggestions.
Here is the output of test_tensorflow.py:
python ./test_tensorflow.py
2023-10-20 13:58:09.285796: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Testing basic operations...
2023-10-20 13:58:10.127359: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.127517: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.140237: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.140426: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.140562: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.140687: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.140961: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-20 13:58:10.264079: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.264236: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.264365: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.264479: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.264592: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.264704: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.269799: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.269945: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.270077: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.270203: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.270327: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.270440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 22276 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:41:00.0, compute capability: 8.9
2023-10-20 13:58:10.270711: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-10-20 13:58:10.270818: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 21385 MB memory: -> device: 1, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:61:00.0, compute capability: 8.9
1 + 1 = 2
Testing matrix multiplication...
2023-10-20 13:58:10.414032: I tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:630] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
tf.Tensor(
[[ 2. 22. 16.]
[ 6. 23. 17.]
[ 0. 12. 4.]], shape=(3, 3), dtype=float32)
Testing convolution...
2023-10-20 13:58:10.425522: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:428] Loaded cuDNN version 8800
tf.Tensor(
[[18. 28. 27. 14.]
[17. 39. 27. 5.]
[25. 31. 23. 5.]
[12. 18. 9. 3.]], shape=(4, 4), dtype=float32)
Testing training set...
tf.Tensor([ 0.04707954 -0.12480973 0.15579893], shape=(3,), dtype=float32)
tf.Tensor([ 0.14167915 -0.05361223 0.06201855], shape=(3,), dtype=float32)
tf.Tensor([ 0.1726551 -0.11010747 -0.55970323], shape=(3,), dtype=float32)
tf.Tensor([-0.12425488 0.173307 -0.06009084], shape=(3,), dtype=float32)
4
Testing training...
2023-10-20 13:58:10.623839: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x55d0318e3e80 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-20 13:58:10.623861: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (0): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2023-10-20 13:58:10.623866: I tensorflow/compiler/xla/service/service.cc:181] StreamExecutor device (1): NVIDIA GeForce RTX 4090, Compute Capability 8.9
2023-10-20 13:58:10.626268: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-10-20 13:58:10.690681: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
iter 0, loss 26.09, mean grad 18.62
iter 5, loss 11.54, mean grad 11.93
iter 10, loss 3.57, mean grad 5.71
iter 15, loss 1.02, mean grad 2.29
iter 20, loss 1.20, mean grad 2.62
iter 25, loss 1.47, mean grad 3.74
iter 30, loss 1.03, mean grad 3.16
iter 35, loss 0.41, mean grad 1.68
iter 40, loss 0.11, mean grad 0.72
iter 45, loss 0.11, mean grad 0.94
iter 50, loss 0.12, mean grad 1.27
Truth:
[[3. 3.]
[2. 4.]]
Estimate:
[[2.969382 2.8553815]
[2.0285828 3.9924746]]
As for the EMAN2 version I am currently using: I installed it yesterday with the standard binary installer (previously I had installed EMAN2 via Miniconda).
EMAN 2.99.47 ( GITHUB: 2023-03-04 13:31 - commit: 3f313008c3185410fe859663e763dffb9c0b6fcc )
Your EMAN2 is running on: Linux-5.15.0-76-generic-x86_64-with-glibc2.37 5.15.0-76-generic
Your Python version is: 3.9.16
For CUDA, I installed 11.8 because during the installation of the binary package it looked like EMAN2 needs CUDA 11.8. Please correct me if I used the wrong CUDA version.
The nvidia-smi output is:
Oct 20 14:10:22 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
### CUDA 11.8 is still compatible with this driver; I checked that MotionCor2 and AreTomo are functional with CUDA 11.8.
Again, because CPU mode also produces a blank training output, I suspect the issue is not related to the CUDA version.
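In case it is useful for double-checking the CUDA side, this is a minimal snippet (nothing EMAN2-specific, just the standard tf.sysconfig/tf.config calls) that prints which CUDA/cuDNN versions the installed TensorFlow wheel was built against and which GPUs it can see at runtime:

import tensorflow as tf  # run in the same environment as test_tensorflow.py

# CUDA / cuDNN versions this TensorFlow wheel was built against
info = tf.sysconfig.get_build_info()
print("built with CUDA:", info.get("cuda_version"))
print("built with cuDNN:", info.get("cudnn_version"))

# GPUs TensorFlow can actually see at runtime
print("visible GPUs:", tf.config.list_physical_devices("GPU"))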
Regarding the training commands (the tomogram has been preprocessed by EMAN2):
e2tomoseg_convnet.py --trainset=particles/L29_002_tomo3d_xyz_trimed_preproc__mt_trainset.hdf --nettag=mt --learnrate=0.01 --niter=10 --ncopy=1 --batch=20 --nkernel=40,40,1 --ksize=15,15,15 --poolsz=2,1,1 --trainout --training --device=cpu
e2tomoseg_convnet.py --trainset=particles/L29_002_tomo3d_xyz_trimed_preproc__mt_trainset.hdf --nettag=mt --learnrate=0.01 --niter=10 --ncopy=1 --batch=20 --nkernel=40,40,1 --ksize=15,15,15 --poolsz=2,1,1 --trainout --training --device=gpu
### These are just the CPU-mode and GPU-mode cases; both outputs contain blanks (attached in the previous email as for_steve_1.hdf).
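If it helps, the blankness can also be checked numerically with something like the sketch below (using the EMAN2 Python API; for_steve_1.hdf is the output attached to the previous email, and the attribute names are the standard EMData header statistics):

from EMAN2 import EMData, EMUtil

fname = "for_steve_1.hdf"  # training output from the previous email
n = EMUtil.get_image_count(fname)
for i in range(n):
    e = EMData(fname, i)
    # mean/sigma/min/max per image; values all near zero would confirm a truly blank output
    print(i, e.get_attr("mean"), e.get_attr("sigma"),
          e.get_attr("minimum"), e.get_attr("maximum"))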
Much appreciated,
Victor