Train the neural network fails with 'Xbyak::Error'

Tracy Nixon

Apr 18, 2019, 12:50:02 PM
to EMAN2
Using EMAN2.22, I successfully imported a tilt series, generated a tomogram, picked good and bad 'particles', and segmented them. The next step, "Train the neural network", fails whether CPU or GPU is chosen as the device.

Using GPU #0..
terminate called after throwing an instance of 'Xbyak::Error'
  what():  internal error
Aborted
NOT Writing notes, ppid=-2
Using CPU...
terminate called after throwing an instance of 'Xbyak::Error'
  what():  internal error
Aborted

I tried installing a fresh version of EMAN 2.22 and that went to completion. Same error.

Any help understanding and overcoming this error is greatly appreciated.

Tracy

Muyuan Chen

Apr 18, 2019, 1:09:27 PM
to EMAN2
I googled a bit and it seems to be a tensorflow bug...
What is your system? Are you sure your GPU and CUDA are working? If you are using Linux, try "nvidia-smi" or "nvcc --version".
You can also try reinstalling tensorflow. Just make sure you are calling the conda inside EMAN, then run "conda remove" and "conda install".
If you are willing to give up the GPU, you can also try installing "tensorflow" instead of "tensorflow-gpu-base".

Tracy Nixon

Apr 18, 2019, 1:35:34 PM
to EMAN2
I am running Debian Jessie Linux on an AMD Threadripper (32 threads, 128 GB RAM) with four GTX 1080 GPUs. The GPUs work well with RELION, cryoSPARC, and Amber; I have to 'module load' those programs to set their environment variables. EMAN2.22 also runs under 'module load' to set its environment, and it has worked before on all other tasks. I tried installing a fresh EMAN2.22 outside of the module system, and that gave the same error. During the fresh install of EMAN2.22, tensorflow-gpu-base appeared to install without problems.

nvidia-smi shows:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    1      8415      C   pmemd.cuda.MPI                              1747MiB |
|    3      9534      G   /usr/bin/Xorg                                533MiB |
|    3     10265      G   /usr/bin/gnome-shell                         388MiB |
|    3     11195      G   ...quest-channel-token=6004185231539653442   343MiB |
|    3     56132      G   /opt/chimera/1.11.2/bin/python2.7             32MiB |
|    3     83022      G   /opt/chimera/1.11.2/bin/python2.7             64MiB |
+-----------------------------------------------------------------------------+

nvcc --version shows:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

Usually I would 'module load cuda' to set its environment, but does EMAN2.22 need me to do that?
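When collecting details for a report like this one, the toolkit release can be pulled out of the `nvcc --version` text quoted above. This is only a minimal sketch: the name `parse_nvcc_release` is made up for illustration, and it assumes nothing beyond the "release X.Y" wording shown in that output.

```python
import re

# Sample text copied from the nvcc output quoted in this message.
SAMPLE_NVCC = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61"""


def parse_nvcc_release(text):
    """Return the CUDA release as a (major, minor) tuple, or None if absent.

    Hypothetical helper: it only looks for the 'release X.Y' phrase.
    """
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Calling `parse_nvcc_release(SAMPLE_NVCC)` gives the system-wide toolkit release; whether that toolkit is the one tensorflow actually loads is a separate question that this sketch does not answer.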

Muyuan Chen

Apr 18, 2019, 2:53:09 PM
to EMAN2
The neural network part of EMAN does not directly communicate with CUDA. All GPU operations are done through the tensorflow package. In this case, it seems that the program crashed when it tries to import tensorflow.
Can you trigger the same error with
     python -c "import tensorflow"
? If so, re-installing the package or switching to a different version might help.
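Because the failure reported above is a C++ `terminate` that aborts the whole interpreter, a `try/except` around the import would not catch it. A hedged sketch that runs the import in a child process and classifies the outcome follows. `probe_import` is a hypothetical helper name, and `python_exe` is assumed to point at whichever Python has tensorflow installed (EMAN's own interpreter in this thread). The helper itself needs Python 3.5 or later, but the interpreter it probes can be any version.

```python
import subprocess
import sys


def probe_import(module, python_exe=sys.executable):
    """Try `import <module>` in a child process and classify the outcome.

    Returns (status, returncode, stderr_text), where status is one of
    "ok", "import-error", or "crashed".
    """
    proc = subprocess.run(
        [python_exe, "-c", "import " + module],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a
        # signal, which is what happens when the process aborts.
        status = "crashed"
    else:
        # An ordinary Python exception (e.g. ImportError) exits with code 1.
        status = "import-error"
    return status, proc.returncode, proc.stderr.strip()
```

To use it, pass the path of the interpreter to test as `python_exe`, for example `probe_import("tensorflow", python_exe)`. A "crashed" status with the 'Xbyak::Error' text in the captured stderr would match the symptom in this thread, while "import-error" would point to a missing or broken package instead.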

Tracy Nixon

Apr 18, 2019, 4:08:47 PM
to EMAN2
Yes, same error. I'll try re-installing the package or another version.

Tracy Nixon

Apr 18, 2019, 5:34:00 PM
to EMAN2
'conda remove tensorflow-gpu-base' removed tensorflow.
'conda install tensorflow-gpu-base' installed tensorflow 1.5, but it also upgraded PyQt from version 4 to 5.6, so now e2projectmanager.py fails to run, reporting that there is no PyQt4.

I tried other versions of tensorflow, but each install wanted to update Qt and several other packages, as well as downgrade tensorflow and remove eman-deps.

So, although reinstalling tensorflow fixed the python -c "import tensorflow" test, it created a new problem. Ugh!

Other ideas?

Tracy Nixon

Apr 18, 2019, 6:59:44 PM
to EMAN2
Hi Muyuan,

I removed EMAN2.2 and installed the eman2.21.linux64.centos7.sh and it is working now, with the gpu. :)

thanks for your help!

Tracy

Muyuan Chen

Apr 18, 2019, 7:36:29 PM
to EMAN2
EMAN2.21 uses the Theano backend, so it is not surprising that it works. However, this is not an ideal solution, as there have been many upgrades since then.
You can also try the continuous build (the unstable release version), which uses Qt5.

Tracy Nixon

Apr 21, 2019, 8:14:23 AM
to EMAN2
I tried the continuous build and got the same error shown at the top of this thread. I will now try to reinstall tensorflow-gpu-base.

Steve Ludtke

Apr 21, 2019, 11:39:56 AM
to em...@googlegroups.com
Could it be a version problem with the underlying Nvidia driver?

--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine 
Charles C. Bell Jr., Professor of Structural Biology
Dept. of Biochemistry and Molecular Biology                      (www.bcm.edu/biochem)
Academic Director, CryoEM Core                                        (cryoem.bcm.edu)
Co-Director CIBR Center                                    (www.bcm.edu/research/cibr)



--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to the Google Groups "EMAN2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tracy Nixon

Apr 26, 2019, 3:55:21 PM
to EMAN2
I just downloaded and installed EMAN2.3 and got the same issue. I'm now looking into removing and reinstalling tensorflow-gpu-base, and conda proposes the following. Is it OK to continue with the uninstall and then install, or do I have to worry about what is being removed and updated in addition to tensorflow-gpu-base?

The following packages will be REMOVED:

  eman-deps-14.1-0
  tensorflow-gpu-1.5.0-0
  tensorflow-gpu-base-1.5.0-py27had95abb_0

The following packages will be UPDATED:

  cryptography                         2.3.1-py27hc365091_0 --> 2.6.1-py27h1ba5d50_0
  libarchive                               3.3.3-h7d0bbab_0 --> 3.3.3-h5d8350f_5
  libpng                                  1.6.34-hb9fc6fc_0 --> 1.6.37-hbc83047_0
  openssl                                 1.0.2r-h7b6447c_0 --> 1.1.1b-h7b6447c_1
  python                                 2.7.14-h1571d57_31 --> 2.7.16-h9bab390_0
  qt                                       5.9.6-h8703b6f_2 --> 5.9.7-h5867ecd_1


Proceed ([y]/n)? n

AND, the nvidia-smi result is:
nixon@BTN1-3:~/EMAN2$ nvidia-smi
Fri Apr 26 15:48:26 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.48                 Driver Version: 390.48                    |
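
The question of what else a `conda remove` would touch can be answered from the plan text quoted earlier in this message. A minimal sketch, assuming the plan layout shown there (section headings that start with "The following" and one package per line); `parse_conda_plan` and `SAMPLE_PLAN` are names introduced here for illustration and are not part of conda.

```python
# Plan text copied from the conda output quoted in this message.
SAMPLE_PLAN = """The following packages will be REMOVED:

  eman-deps-14.1-0
  tensorflow-gpu-1.5.0-0
  tensorflow-gpu-base-1.5.0-py27had95abb_0

The following packages will be UPDATED:

  cryptography                         2.3.1-py27hc365091_0 --> 2.6.1-py27h1ba5d50_0
  libarchive                               3.3.3-h7d0bbab_0 --> 3.3.3-h5d8350f_5
  libpng                                  1.6.34-hb9fc6fc_0 --> 1.6.37-hbc83047_0
  openssl                                 1.0.2r-h7b6447c_0 --> 1.1.1b-h7b6447c_1
  python                                 2.7.14-h1571d57_31 --> 2.7.16-h9bab390_0
  qt                                       5.9.6-h8703b6f_2 --> 5.9.7-h5867ecd_1


Proceed ([y]/n)? n
"""


def parse_conda_plan(text):
    """Group the first token of each package line under its action heading.

    Only headings that end in an upper-case action word (REMOVED, UPDATED,
    INSTALLED, ...) are tracked; the download table is skipped.
    """
    plan = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Proceed"):
            break
        if line.startswith("The following") and line.endswith(":"):
            action = line[:-1].split()[-1]
            current = action if action.isupper() else None
            if current is not None:
                plan.setdefault(current, [])
            continue
        if current is not None:
            plan[current].append(line.split()[0])
    return plan
```

With the parsed plan, a check such as `any(p.startswith("eman-deps") for p in plan.get("REMOVED", []))` flags that the EMAN dependency metapackage is about to be removed, which is the side effect worth noticing before answering `y`.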

Steve Ludtke

Apr 26, 2019, 4:34:16 PM
to em...@googlegroups.com
Hi Tracy,
I can now replicate this problem on one computer, but have had limited success so far figuring out why one works and the other doesn't. The main difference that stood out was that the failing machine had Cuda 10.1 installed and the working machine had Cuda 10.0, but downgrading the 10.1 machine to 10.0 didn't seem to fully resolve the problem. I'm using Arch linux on these machines which gives fairly fine-grained control over packages, but I'm still a little mystified by what's going on, because Tensorflow was fine on both machines prior to a small update, but rolling back doesn't seem to fix it.  Very odd.

I'll report more if I figure anything out!
 


Tracy Nixon

Apr 29, 2019, 1:14:54 PM
to EMAN2
I have now been able to run 'conda remove tensorflow-gpu-base' and 'conda install tensorflow-gpu-base', and e2projectmanager.py now works. It can use the GPU for segmentation jobs. Yea!

$ conda remove tensorflow-gpu-base
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/nixon/EMAN2

  removed specs:
    - tensorflow-gpu-base


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    libpng-1.6.37              |       hbc83047_0         364 KB
    qt-5.9.7                   |       h5867ecd_1        85.9 MB
    ------------------------------------------------------------
                                           Total:        86.2 MB

The following packages will be REMOVED:

  eman-deps-14.1-0
  tensorflow-gpu-1.5.0-0
  tensorflow-gpu-base-1.5.0-py27had95abb_0

The following packages will be UPDATED:

  cryptography                         2.3.1-py27hc365091_0 --> 2.6.1-py27h1ba5d50_0
  libarchive                               3.3.3-h7d0bbab_0 --> 3.3.3-h5d8350f_5
  libpng                                  1.6.34-hb9fc6fc_0 --> 1.6.37-hbc83047_0
  openssl                                 1.0.2r-h7b6447c_0 --> 1.1.1b-h7b6447c_1
  python                                 2.7.14-h1571d57_31 --> 2.7.16-h9bab390_0
  qt                                       5.9.6-h8703b6f_2 --> 5.9.7-h5867ecd_1


Proceed ([y]/n)? y


Downloading and Extracting Packages
qt-5.9.7             | 85.9 MB   | ################################################### | 100% 
libpng-1.6.37        | 364 KB    | ################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

$ conda install tensorflow-gpu-base
Collecting package metadata: done
Solving environment: done

## Package Plan ##

  environment location: /home/nixon/EMAN2

  added / updated specs:
    - tensorflow-gpu-base


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    astor-0.7.1                |           py27_0          42 KB
    c-ares-1.15.0              |       h7b6447c_1          98 KB
    gast-0.2.2                 |           py27_0         138 KB
    grpcio-1.16.1              |   py27hf8bcb03_1         1.0 MB
    tensorflow-gpu-base-1.7.0  |   py27h5b7bae4_1       134.3 MB
    termcolor-1.1.0            |           py27_1           7 KB
    ------------------------------------------------------------
                                           Total:       135.6 MB

The following NEW packages will be INSTALLED:

  astor              pkgs/main/linux-64::astor-0.7.1-py27_0
  c-ares             pkgs/main/linux-64::c-ares-1.15.0-h7b6447c_1
  gast               pkgs/main/linux-64::gast-0.2.2-py27_0
  grpcio             pkgs/main/linux-64::grpcio-1.16.1-py27hf8bcb03_1
  tensorflow-gpu-ba~ pkgs/main/linux-64::tensorflow-gpu-base-1.7.0-py27h5b7bae4_1
  termcolor          pkgs/main/linux-64::termcolor-1.1.0-py27_1


Proceed ([y]/n)? y


Downloading and Extracting Packages
astor-0.7.1          | 42 KB     | ################################################### | 100% 
c-ares-1.15.0        | 98 KB     | ################################################### | 100% 
grpcio-1.16.1        | 1.0 MB    | ################################################### | 100% 
gast-0.2.2           | 138 KB    | ################################################### | 100% 
termcolor-1.1.0      | 7 KB      | ################################################### | 100% 
tensorflow-gpu-base- | 134.3 MB  | ################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


Tracy

Steve Ludtke

Apr 29, 2019, 1:19:36 PM
to em...@googlegroups.com
That's great!  Thanks for reporting. We'll have to see if this helps others with a similar OS version.
