Errors while using the EMAN2 CNN

wjnic...@gmail.com

Dec 14, 2021, 1:11:40 PM
to EMAN2
Hello,

I recently had to do a clean install of Ubuntu 18.04. I want to train a CNN and apply it to some of my data.

First off, I've been getting lots of "Segmentation fault (core dumped)" errors, and EMAN2 crashes while opening windows or doing seemingly random operations. I can't find a pattern. I also now see this message when I start EMAN2: "Failed to establish dbus connection"

Then, although it still seems to do the job, I am getting a lot of odd messages while training and applying the CNNs.

This is the output while training:
NOT Writing notes, ppid=-2
Using CPU...
2021-12-14 09:52:44.786526: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
loading particles...
2493 particles loaded, 2493 in training set, 0 in validation set
(2493, 64, 64)
Std of particles:  0.9237895
Setting up model...
2021-12-14 09:52:52.322633: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-14 09:52:52.334993: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-12-14 09:52:52.378053: E tensorflow/stream_executor/cuda/cuda_driver.cc:328] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2021-12-14 09:52:52.378121: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: caliban
2021-12-14 09:52:52.378144: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: caliban
2021-12-14 09:52:52.378325: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 495.29.5
2021-12-14 09:52:52.378389: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 495.29.5
2021-12-14 09:52:52.378411: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 495.29.5
2021-12-14 09:52:52.379529: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-14 09:52:52.380367: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Training...
2021-12-14 09:52:52.936772: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-12-14 09:52:52.940989: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3092590000 Hz
NOT Writing notes, ppid=-2
Using GPU...
2021-12-14 09:53:35.423203: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
loading particles...
2493 particles loaded, 2493 in training set, 0 in validation set
(2493, 64, 64)
Std of particles:  0.9237895
Setting up model...
2021-12-14 09:53:46.616442: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-14 09:53:46.617794: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-12-14 09:53:46.663840: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:46.664263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:05:00.0 name: NVIDIA GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2021-12-14 09:53:46.664391: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-12-14 09:53:46.714065: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-12-14 09:53:46.714329: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-12-14 09:53:46.741835: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-12-14 09:53:46.749068: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-12-14 09:53:46.800952: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-12-14 09:53:46.808427: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-12-14 09:53:46.900610: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-12-14 09:53:46.900878: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:46.901358: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:46.901697: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-12-14 09:53:46.902123: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-12-14 09:53:46.902767: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-12-14 09:53:46.902911: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:46.903271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:05:00.0 name: NVIDIA GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2021-12-14 09:53:46.903391: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-12-14 09:53:46.903462: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
2021-12-14 09:53:46.903521: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.10
2021-12-14 09:53:46.903578: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-12-14 09:53:46.903639: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-12-14 09:53:46.903698: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-12-14 09:53:46.903764: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.10
2021-12-14 09:53:46.903824: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-12-14 09:53:46.903941: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:46.904352: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:46.904683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-12-14 09:53:46.916412: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
2021-12-14 09:53:49.424055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-12-14 09:53:49.424176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-12-14 09:53:49.424193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-12-14 09:53:49.424521: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:49.424779: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:49.424993: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-12-14 09:53:49.425175: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7214 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1080, pci bus id: 0000:05:00.0, compute capability: 6.1)
Training...
2021-12-14 09:53:50.393479: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-12-14 09:53:50.415877: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3092590000 Hz
2021-12-14 09:53:50.560290: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.7
2021-12-14 09:53:53.143770: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.10
iteration 0, cost -2.658
iteration 1, cost -2.735
iteration 2, cost -2.770
iteration 3, cost -2.805
iteration 4, cost -2.836
iteration 0, cost -2.642
iteration 5, cost -2.841
iteration 6, cost -2.885
iteration 7, cost -2.907
iteration 8, cost -2.947
iteration 9, cost -2.996
iteration 10, cost -3.053
iteration 11, cost -3.096
iteration 12, cost -3.122
iteration 13, cost -3.153
iteration 14, cost -3.161
iteration 15, cost -3.198
iteration 16, cost -3.221
iteration 17, cost -3.241
iteration 18, cost -3.260
iteration 19, cost -3.288
Writting network output of training set to neuralnets/trainout_nnet_save__good.hdf...
Saving the trained net to neuralnets/nnet_save__good.hdf...
Done

This is the output while applying:
NOT Writing notes, ppid=-2
Using GPU...
2021-12-14 10:06:04.534113: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Loading the Neural Net...
Traceback (most recent call last):
  File "/home/wjnicol/miniconda3/envs/eman2/bin/e2tomoseg_convnet.py", line 512, in <module>
    main()
  File "/home/wjnicol/miniconda3/envs/eman2/bin/e2tomoseg_convnet.py", line 119, in main
    convnet=StackedConvNet_tf.load_network(options.from_trained, imgsz=tsz, bsz=1)
  File "/home/wjnicol/miniconda3/envs/eman2/bin/e2tomoseg_convnet.py", line 426, in load_network
    hdr=EMData(fname,0)
  File "/home/wjnicol/miniconda3/envs/eman2/lib/python3.7/site-packages/EMAN2.py", line 2906, in db_emd_init
    self.__initc(*parms)
Boost.Python.ArgumentError: Python argument types in
    EMData.__init__(EMData, NoneType, int)
did not match C++ signature:
    __init__(_object*, int nx, int ny)
    __init__(_object*, int nx, int ny, int nz)
    __init__(_object*, int nx, int ny, int nz, bool is_real)
    __init__(_object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > filename)
    __init__(_object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > filename, int image_index)
    __init__(_object*, EMAN::EMData that)
    __init__(_object*)
iteration 10, cost -3.026
iteration 11, cost -3.047
iteration 12, cost -3.060
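
In the traceback above, `EMData.__init__` receives `NoneType` as its first argument, which suggests that the filename handed to `load_network` (taken from `options.from_trained`) was never set when the script ran in applying mode. A minimal sketch of a check that would surface this earlier with a clearer message is shown here. `require_network_path` is a hypothetical helper, not part of EMAN2, and the example path is only illustrative.

```python
import os


def require_network_path(path):
    """Hypothetical helper (NOT part of EMAN2).

    Validate the trained-network filename before it is passed on to a
    loader, so that a missing option fails with a readable message
    instead of a Boost.Python signature mismatch.
    """
    if path is None:
        raise ValueError(
            "No trained network was given; check the option that fills "
            "options.from_trained."
        )
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Trained network file not found: {path}")
    return path


if __name__ == "__main__":
    # Illustrative only: a None value reproduces the situation in the traceback.
    try:
        require_network_path(None)
    except ValueError as err:
        print(err)
```

Calling such a check on the value of `options.from_trained` before the network is loaded would distinguish "option not passed" from "file missing", which the original error message does not.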

Ludtke, Steven J.

Dec 14, 2021, 11:58:43 PM
to em...@googlegroups.com
We'd probably need to see the complete output from e2version.py, and it would be useful to know the complete specs of your machine. The full output of lsmod may also help.
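
One way to gather the requested information into a single block of text for a reply is sketched below. It assumes `e2version.py` and `lsmod` are on the `PATH` of the active environment; the function names `run_command` and `collect_report` are made up for this example, and the `platform` line is an addition that only partly covers "complete specs".

```python
import platform
import shutil
import subprocess


def run_command(cmd):
    """Return combined stdout/stderr of cmd, or a note if it is not installed."""
    if shutil.which(cmd[0]) is None:
        return f"[{cmd[0]} not found on PATH]"
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.stdout


def collect_report(commands):
    """Build a plain-text report: platform summary plus each command's output."""
    sections = [f"platform: {platform.platform()}"]
    for cmd in commands:
        sections.append("$ " + " ".join(cmd))
        sections.append(run_command(cmd).rstrip())
    return "\n".join(sections)


if __name__ == "__main__":
    # The two commands named in the reply above; others can be appended.
    print(collect_report([["e2version.py"], ["lsmod"]]))
```

A command that is not installed is reported as such rather than raising, so the rest of the report is still produced.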


--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine 
Charles C. Bell Jr., Professor of Structural Biology
Dept. of Biochemistry and Molecular Biology                      (www.bcm.edu/biochem)
Academic Director, CryoEM Core                                        (cryoem.bcm.edu)
Co-Director CIBR Center                                    (www.bcm.edu/research/cibr)