A possible error in training for segmentation in the newest version


digvij...@gmail.com

Feb 20, 2021, 6:29:41 PM
to EMAN2
Hi all,

I noticed that training on the training set for segmentation fails in the newest version with the error shown below, but it runs and succeeds in older versions of EMAN2. Does EMAN2 install its own Keras and tensorflow libraries, or does it use the system defaults?

Here is my latest EMAN2 version:

EMAN 2.91 ( GITHUB: 2021-02-19 22:23 - commit: dfb041b )
Your Python version is: 3.7.9


Error when training with gpu1 or gpu0 or gpun with the above version:

Using GPU #1..
loading particles...
5875 particles loaded, 5875 in training set, 0 in validation set
(5875, 64, 64)
Std of particles:  1.131026
Setting up model...
WARNING:tensorflow:From /data/software/repo/eman2/unstable/bin/e2tomoseg_convnet.py:329: The name tf.keras.initializers.TruncatedNormal is deprecated. Please use tf.compat.v1.keras.initializers.TruncatedNormal instead.

WARNING:tensorflow:From /home/dsingh/.local/lib/python3.7/site-packages/tensorflow_core/python/keras/initializers.py:94: calling TruncatedNormal.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/dsingh/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Training...
Traceback (most recent call last):
  File "/data/software/repo/eman2/unstable/bin/e2tomoseg_convnet.py", line 512, in <module>
    main()
  File "/data/software/repo/eman2/unstable/bin/e2tomoseg_convnet.py", line 172, in main
    convnet.do_training(data, labels, shuffle=False, learnrate=options.learnrate, niter=options.niter)
  File "/data/software/repo/eman2/unstable/bin/e2tomoseg_convnet.py", line 368, in do_training
    for image, label in dataset:
  File "/home/dsingh/.local/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2034, in __iter__
    return iter(self._dataset)
  File "/home/dsingh/.local/lib/python3.7/site-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 343, in __iter__
    raise RuntimeError("__iter__() is only supported inside of tf.function "
RuntimeError: __iter__() is only supported inside of tf.function or when eager execution is enabled.

Muyuan Chen

Feb 20, 2021, 8:12:20 PM
to em...@googlegroups.com
I am on the latest master branch right now, and I cannot trigger the error. I have not tested this on the actual 2.9 release, only a continuous build a few days before the release, and it seems to work too. 
Did you compile from source? Maybe you are on a later tensorflow version than I am? I have tensorflow 2.1, cudnn 7.6.5, and cuda 11. The environment that comes with the binary should be as old or slightly older. 
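
For context on the error message: iterating directly over a `tf.data.Dataset` needs eager execution, which is on by default in tensorflow 2.x but not in 1.x. The `tf.compat.v1` deprecation hints in the log are consistent with a 1.x install, though that is an inference from the log and not something confirmed in this thread. A minimal sketch of a version check follows; `major_version` and `eager_is_default` are hypothetical helper names, not part of EMAN2 or tensorflow.

```python
def major_version(version):
    """Return the leading integer of a version string such as '2.1.0'."""
    return int(version.split(".")[0])


def eager_is_default(version):
    """Eager execution is the default from tensorflow 2.0 onward."""
    return major_version(version) >= 2


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("tensorflow is not importable from this Python")
    else:
        print("tensorflow", tf.__version__)
        print("eager by default:", eager_is_default(tf.__version__))
        # tf.executing_eagerly() reports the current state of the session.
        print("eager right now:", tf.executing_eagerly())
```

Run it with the same Python interpreter that runs `e2tomoseg_convnet.py`, so that it sees the same libraries.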


digvij...@gmail.com

Feb 20, 2021, 9:18:42 PM
to EMAN2
Yes, mine is compiled from source.
Initially, I thought the issue might be with my GPU architecture. But the older versions of EMAN2 worked well, so I thought the issue might be with the newest version(s).

Muyuan Chen

Feb 20, 2021, 9:25:39 PM
to em...@googlegroups.com
When you say the older version works, did you check out older versions from git and compile again, or just download older binaries? It may have to do with your conda libraries (tensorflow etc.) rather than the code itself.

digvij...@gmail.com

Feb 20, 2021, 10:41:22 PM
to EMAN2
A few older versions still exist on our machines, so the old version was neither recently recompiled from old source code nor installed from older binaries.

The older version was activated by loading its environment module (module load eman2/2.39). Here's the older version that did the training using gpu without any errors. 

EMAN 2.39 (GITHUB: 2020-05-07 14:43 - commit: 0b3cbc2 )
Your Python version is: 3.7.7

Muyuan Chen

Feb 20, 2021, 10:56:06 PM
to em...@googlegroups.com
So, as I said, it is probably a library-related issue. Check the conda environment of your latest version, and make sure it uses a compatible tensorflow version. Hopefully it will work...
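
One thing worth checking here: the traceback earlier in the thread shows `tensorflow_core` being imported from `/home/dsingh/.local/lib/python3.7/site-packages`, a per-user directory that Python searches before the environment's own site-packages. The sketch below reports where a module would be imported from and whether that location is the per-user site directory. `module_origin` and `in_user_site` are hypothetical helper names, and this is only a diagnostic sketch, not an EMAN2 tool.

```python
import importlib.util
import os
import site


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def in_user_site(path, user_site=None):
    """True if path lies under the per-user site-packages directory."""
    if path is None:
        return False
    user_site = os.path.abspath(user_site or site.getusersitepackages())
    path = os.path.abspath(path)
    return os.path.commonpath([path, user_site]) == user_site


if __name__ == "__main__":
    origin = module_origin("tensorflow")
    print("tensorflow found at:", origin)
    print("from per-user site-packages:", in_user_site(origin))
```

If it reports a per-user copy, setting the environment variable `PYTHONNOUSERSITE=1` for a test run makes Python skip `~/.local`, which shows whether that copy is the one getting in the way.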

digvij...@gmail.com

Feb 21, 2021, 12:33:35 AM
to EMAN2
Thank you, I will check it. 

digvij...@gmail.com

Feb 27, 2021, 11:29:22 PM
to EMAN2
I noticed one more issue with the segmentation in the newest version. 

Applying a net built with the older version using the new EMAN2 produces weird results. I am attaching two pictures: one shows the neat segmentation produced by the older version, the other the weird segmentation produced by the newest version.

Are the nets and training datasets built using the old version of EMAN2 not applicable to new versions of EMAN2? 

Cheers
New.png
Old.png

Muyuan Chen

Feb 27, 2021, 11:51:21 PM
to em...@googlegroups.com
Depending on how old the old version is, this could be the case. This is mostly because tensorflow, the backend library for the neural networks, keeps being upgraded. To support the latest hardware, we have to keep updating the program along with the library, and the result is not necessarily backward compatible. It might be easier to get the newest version working than to support older versions...

digvij...@gmail.com

Feb 27, 2021, 11:57:02 PM
to EMAN2
I see. 

But the old manually annotated dataset would still be valid for re-training in the newer versions? 

Muyuan Chen

Feb 28, 2021, 12:30:17 AM
to em...@googlegroups.com
Yes. The format of the dataset should be unchanged.

digvij...@gmail.com

Feb 28, 2021, 12:36:07 AM
to EMAN2
Cool. Thanks