error with e2gmm.py Train GMM


Madeline Rollins

Jul 10, 2024, 5:56:40 PM
to EMAN2
Hi all,
I am testing out the e2gmm.py GUI and was wondering if anyone could help me troubleshoot an error that occurs during the "Train GMM" step and aborts the run. I am using EMAN version 2.99 and python version 2.7.2 on SBGrid.

Here is the command and error message:
e2gmm_refine_point.py --model gmm_00/run_01_model_gmm.txt --decoderin gmm_00/run_01_decoder.h5 --decoderout gmm_00/run_01_decoder.h5 --ptclsin gmm_00/particles.lst --ptclrepin gmm_00/run_01_ptrep_46.hdf --heter  --sym c1 --maxboxsz 46 --niter 10  --nmid 4 --modelreg 0.0 --perturb 0.25 --pas 110 --ptclsclip 384 --minressz 7  --encoderin gmm_00/run_01_encoder.h5 --encoderout gmm_00/run_01_encoder.h5 --chunk 0,5
241 gaussian in the model
/programs/x86_64-linux/eman2/2.99/eman2_extlib/eman2-2.99-7rk3/lib/python3.9/site-packages/keras/initializers/initializers_v2.py:120: UserWarning: The initializer HeNormal is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
/programs/x86_64-linux/eman2/2.99/eman2_extlib/eman2-2.99-7rk3/lib/python3.9/site-packages/keras/initializers/initializers_v2.py:120: UserWarning: The initializer RandomNormal is unseeded and being called multiple times, which will return identical values  each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initalizer instance more than once.
  warnings.warn(
Loading 257557 particles of box size 384. shrink to 48
 51000/51511 R       done      
identified 1257 3-D particles, merging gradients over each 3-D particle
Data read complete
Ptcl rep shape:  (51511, 241)
Traceback (most recent call last):
  File "/programs/x86_64-linux/eman2/2.99/bin/e2gmm_refine_point.py", line 1215, in <module>
    main()
  File "/programs/x86_64-linux/eman2/2.99/bin/e2gmm_refine_point.py", line 340, in main
    encode_model=tf.keras.models.load_model(f"{options.encoderin}",compile=False)
  File "/programs/x86_64-linux/eman2/2.99/eman2_extlib/eman2-2.99-7rk3/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/programs/x86_64-linux/eman2/2.99/eman2_extlib/eman2-2.99-7rk3/lib/python3.9/site-packages/keras/saving/legacy/save.py", line 227, in load_model
    raise IOError(
OSError: No file or directory found at gmm_00/run_01_encoder.h5

Ludtke, Steven J.

Jul 10, 2024, 6:12:40 PM
to em...@googlegroups.com
Need to see the full output from e2version.py. 2.99 isn't enough information  :^)    e2gmm has been under very heavy development after the first 2.99 release so the version you're using may have very different problems than another "2.99".  Would be useful to see a screenshot of the GUI as well, since that helps date it.

-----------------------------------------
Steven Ludtke, slud...@gmail.com 


----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry 
Deputy Director, Advanced Technology Cores                  and Molecular Pharmacology
Academic Director, CryoEM Core
Co-Director CIBR Center


Madeline Rollins

Jul 11, 2024, 10:11:58 AM
to EMAN2
Yes, of course. I realized that I provided the wrong python version in my first post - that was the output from running sbgrid-info -L python.
Here is the output from e2version.py:

EMAN 2.99.47 ( GITHUB: 2023-03-04 13:31 - commit: 3f313008c3185410fe859663e763dffb9c0b6fcc )
Your EMAN2 is running on: Linux-4.18.0-305.57.1.el8_4.x86_64-x86_64-with-glibc2.28 4.18.0-305.57.1.el8_4.x86_64
Your Python version is: 3.9.16

I attached a screenshot of the GUI as well.
e2gmm_gui_screenshot.png

Steve Ludtke

Jul 11, 2024, 11:41:27 AM
to em...@googlegroups.com
Thanks, that helps a lot. Your version is a little out of date, but not as bad as I feared. While I would recommend updating to the latest snapshot, to gain some very useful new features for interpretation of results, you can reasonably run with that version.

Assuming your screenshot came from the folder where you ran into issues, I suspect you just missed a step. You created a GMM folder (gmm_00), but you didn’t create a “Run” within that folder. 

The general sequence is as follows (I realize this isn’t intuitive; there is a somewhat dated YouTube video demonstrating it for this reason):
1. “New GMM” - select the refinement folder.
2. “Create Run” - give it a descriptive name. “Runs” share the same data within a gmm_XX folder, but have completely independent neural networks.
3. “Resolution” - creates the initial pattern of Gaussians. Adjust the two text boxes adjacent to the button to get the desired number and distribution of Gaussians.
4. Adjust any parameters you need to now! You cannot change things like Pos/Amp or the number of latent dimensions for this run after this step without starting from scratch!
5. “Train Neutral” - initializes the neural network to the “neutral” locations of the Gaussians you just created (usually takes just a couple of minutes).
6. “Train GMM” - this takes the bulk of the time. The number of iterations required to achieve good results varies widely with the project: 20-30 is a minimum, and big projects may need 100.
   You can run a small number of iterations, then press “Train GMM” again to continue, but the only parameters you can change when doing that are “Train Iter”, “Model Reg”, and “Model Perturb”. That is, if you run 25 iterations and find that the results haven’t converged to a good answer yet, you can change “Train Iter” to 50 and hit “Train GMM” again, and it will continue from after the 25.
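
Before pressing “Train GMM”, it can help to confirm that the files the training command expects are actually on disk. The sketch below is only an illustration: the file name patterns are copied from the e2gmm_refine_point.py command quoted in the first post, the helper name missing_run_files is hypothetical, and this thread does not say which GUI step writes which file.

```python
from pathlib import Path

# File name patterns copied from the e2gmm_refine_point.py command quoted
# earlier in this thread; the helper itself is hypothetical, not part of EMAN2.
RUN_FILE_PATTERNS = (
    "{run}_model_gmm.txt",
    "{run}_decoder.h5",
    "{run}_encoder.h5",
)


def missing_run_files(gmm_dir, run_name, project_dir="."):
    """Return the expected run files that do not exist.

    Paths are resolved against project_dir, because the command refers to
    them relative to the directory it is launched from (e.g. gmm_00/...).
    """
    base = Path(project_dir).resolve() / gmm_dir
    expected = [base / p.format(run=run_name) for p in RUN_FILE_PATTERNS]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    for path in missing_run_files("gmm_00", "run_01"):
        print("missing:", path)
```

Running it from the project folder (the one that contains gmm_00) prints nothing if all three files are present. Because the paths in the command are relative, running the same check from a different directory would report the files as missing even though they exist.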



Madeline Rollins

Jul 11, 2024, 12:42:25 PM
to EMAN2
Thanks so much! I've attached an additional screenshot of the GUI once I click on the GMM folder (gmm_00) that shows the run I created in that folder (run_01).

Based on the steps you provided, I think the error that I included in the first post (OSError: No file or directory found at gmm_00/run_01_encoder.h5) might be happening at the "Train Neutral" step? When I tried it yesterday, this step took much longer than a couple of minutes (maybe 1.5 hours or so). When I check the gmm_00 folder, I can see a file for run_01_encoder.h5, but I'm not sure why it says that the file is missing.

e2gmm_GUI_screenshot_02.png

Steve Ludtke

Jul 18, 2024, 7:36:06 AM
to em...@googlegroups.com
Hi Madeline,
Very sorry for not getting back to you sooner. I left for vacation and a few messages got buried.

If you launch the program with a gmm_xx folder and a “run” created, but have not yet run the complete “Train Neutral” and “Train GMM” sequence, you will get some warning messages on the console as soon as you select the “run”. They just mean that the program is trying to show you the results of that run, which you haven’t fully run yet. You can ignore those messages.

“Train Neutral” shouldn’t normally take 90 minutes to run; usually it’s more like 10 minutes. If it takes a long time, that suggests your machine may not be set up to use the GPU (I assume you’re running on a machine with a decent Nvidia GPU?). If “Train Neutral” runs for a while and doesn’t produce any obvious error messages at the very end of the run, you can safely proceed with the final step. Just be warned that if you aren’t running on a GPU, the final step could take a REALLY long time to run.
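
One quick way to check the GPU question is to ask TensorFlow which devices it can see. This is a minimal sketch, assuming it is run with the same Python that EMAN2 uses; the helper name visible_gpus is made up for illustration, while tf.config.list_physical_devices is the standard TensorFlow 2.x call.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow
    cannot be imported in the current Python environment."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # tf.config.list_physical_devices is the TensorFlow 2.x device query.
    return [dev.name for dev in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not importable from this Python")
    elif not gpus:
        print("TensorFlow sees no GPU; training would fall back to the CPU")
    else:
        print("TensorFlow sees:", ", ".join(gpus))
```

An empty list on a machine that does have an Nvidia card usually points to a driver or CUDA library problem rather than to anything in the GMM settings.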


Madeline Rollins

Jul 23, 2024, 10:55:10 AM
to EMAN2
Hi Steve,
No worries at all! I tried re-running the job on our cluster using an interactive job (with 1 Nvidia A40 or A100 GPU) and "Train Neutral" does run much faster. However, the "Train GMM" step doesn't get past the first chunk. I attached the full error output from the terminal for reference, but I noticed that this TensorFlow message occurred early in the run:

tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__TensorScatterAdd_device_/job:localhost/replica:0/task:0/device:CPU:0}} indices[341] = [31, 48] does not index into shape [48,48] [Op:TensorScatterAdd]

The e2gmm_refine_point.py command then restarts, but fails with "OSError: No file or directory found at gmm_00/run_01_encoder.h5". The Train GMM sequence restarts a few times until it eventually terminates.

I'm not sure which of my inputs from the earlier steps are causing this issue and couldn't find anything about it in previous posts to the group. Do you have any advice on how to troubleshoot this?
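
For reference, the TensorFlow message reports that the index pair [31, 48] was used on an array of shape [48,48]. Valid indices in each dimension run from 0 to 47, so the second coordinate is one past the edge. The log in the first post reports "shrink to 48" while the command passes --maxboxsz 46; whether the two are related is not established in this thread. The sketch below only illustrates the bounds rule behind the message in plain Python; the function name out_of_bounds is hypothetical and nothing here is taken from the e2gmm code.

```python
def out_of_bounds(indices, shape):
    """Return (position, index) pairs where index does not fit in shape.

    An index is valid only if 0 <= index[d] < shape[d] for every dimension d.
    """
    bad = []
    for pos, idx in enumerate(indices):
        if len(idx) != len(shape) or any(
            not 0 <= i < n for i, n in zip(idx, shape)
        ):
            bad.append((pos, tuple(idx)))
    return bad


if __name__ == "__main__":
    # The pair reported in the message, plus one in-range pair for contrast.
    print(out_of_bounds([[31, 47], [31, 48]], (48, 48)))  # [(1, (31, 48))]
```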
e2gmm_error_messages