e2gmm.py error message


Maia Azubel

Sep 7, 2023, 1:23:39 PM
to EMAN2
Hi Steve and Muyuan,

I've been testing e2gmm.py with tomography data.

First, I used ~10% of my dataset and it ran without problems. The #Ptcl Batches automatically assigned was 7.

I noticed, though, that if I wanted to change the resolution and/or threshold for gaussian generation I needed to start a New GMM.

After testing different iteration numbers and Model Perturbation values, I decided to test on my entire dataset.

Here's the version of EMAN2 I'm using:

e2version.py
EMAN 2.99.52 ( GITHUB: 2023-05-03 19:22 - commit: c218addd4a0bb0223d3be501adde8eaceb6f0433 )
Your EMAN2 is running on: Linux-4.18.0-425.10.1.el8_7.x86_64-x86_64-with-glibc2.28 4.18.0-425.10.1.el8_7.x86_64
Your Python version is: 3.9.16

I run everything from the GUI using
e2gmm.py

For the entire dataset I chose a Resolution of 15 Å and a threshold of 0.4.

I got
147 gaussians - (64 pos; 83 neg) 

The #Ptcl Batches automatically assigned = 73

I got a popup with the following message:

Error running e2gmm_refine_point see console for details. If memory exhausted, increase batches.

Console error message:

Loading 3685466 particles of box size 256. shrink to 44
 50000/50485 R       done      
identified 1329 3-D particles, merging gradients over each 3-D particle
Data read complete
Ptcl rep shape:  (50485, 147)
Traceback (most recent call last):
  File "/programs/x86_64-linux/eman2/nightly/bin/e2gmm_refine_point.py", line 1230, in <module>
    main()
  File "/programs/x86_64-linux/eman2/nightly/bin/e2gmm_refine_point.py", line 343, in main
    encode_model=tf.keras.models.load_model(f"{options.encoderin}",compile=False)
  File "/programs/x86_64-linux/eman2/nightly/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/programs/x86_64-linux/eman2/nightly/lib/python3.9/site-packages/keras/layers/merging/concatenate.py", line 98, in build
    raise ValueError(
ValueError: A `Concatenate` layer should be called on a list of at least 1 input. Received: input_shape=(None, 4)

It is unclear to me whether increasing #Ptcl Batches is the solution here. If it is, what number would you recommend? If not, is there anything else I should try?

Thanks!
Maia

Maia Azubel

Sep 11, 2023, 3:55:49 PM
to em...@googlegroups.com
Hi Steve and Muyuan,

Following up on my previous email regarding the error encountered when running e2gmm.py: I increased #Ptcl Batches ten-fold, to 730, but still got the following error message in the console:

Loading 3685466 particles of box size 256. shrink to 44
 5000/5048 R       done      
identified 133 3-D particles, merging gradients over each 3-D particle
Data read complete
Ptcl rep shape:  (5048, 147)
Traceback (most recent call last):
  File "/programs/x86_64-linux/eman2/nightly/bin/e2gmm_refine_point.py", line 1230, in <module>
    main()
  File "/programs/x86_64-linux/eman2/nightly/bin/e2gmm_refine_point.py", line 343, in main
    encode_model=tf.keras.models.load_model(f"{options.encoderin}",compile=False)
  File "/programs/x86_64-linux/eman2/nightly/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/programs/x86_64-linux/eman2/nightly/lib/python3.9/site-packages/keras/layers/merging/concatenate.py", line 98, in build
    raise ValueError(
ValueError: A `Concatenate` layer should be called on a list of at least 1 input. Received: input_shape=(None, 4)
Neutral gaussian model missing (gmm_07/15-0306-25-100-01_model_gmm.txt)
Traceback (most recent call last):
  File "/programs/x86_64-linux/eman2/nightly/bin/e2gmm.py", line 2270, in sel_run
    self.decoder = tf.keras.models.load_model(f"{self.gmm}/{self.currunkey}_decoder.h5",compile=False)
  File "/programs/x86_64-linux/eman2/nightly/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/programs/x86_64-linux/eman2/nightly/lib/python3.9/site-packages/keras/saving/legacy/save.py", line 227, in load_model
    raise IOError(
OSError: No file or directory found at gmm_07/15-0306-25-100-01_decoder.h5
Run gmm_07 -> 15-0306-25-100-01 results incomplete. No stored decoder found. <__main__.EMGMM object at 0x7fdafb8900d0>

Any suggestions?
Thanks,
Maia


Steve Ludtke

Sep 11, 2023, 5:22:56 PM
to em...@googlegroups.com
Hi Maia,
sorry, this one is on me, not Muyuan, and I'm in South Korea at IMC this week, hence my sluggish replies (I was in the middle of 40 hours of travel to get here when your first message arrived). 

The batch mechanism you're referring to here is not really ideal, and it sits on top of the particle-level batching used for the training, but it's the only strategy I've come up with for dealing with very large datasets in the current architecture. It does work for me; it's just slow and inefficient. The number of batches being suggested would seem to imply a much larger dataset than I've ever tested on before. To comment, I need to know a little more about the project:
- number of particles in the full set
- tilt range and step
- A/pix
- box size at full sampling (I see what it says in the log, but it's unclear if that is the size with the extra padding)

The whole batch idea is still a work in progress, as are several of the other new features. It's great to have people trying to test some of the new ideas on their own data, but if you're going to live at the bleeding edge, you'll probably need to update your EMAN2 version frequently, as e2gmm continues to evolve fairly rapidly.

You should be able to press the 'resolution' button multiple times without creating a new gmm folder. It's only once you've started the actual training process that you can't easily go back to that step. You should even be able to change the number of Gaussians after "Train Neutral", but if you do, you need to run "Train Neutral" again. Also, you need to make sure that before you run any training, you have selected the Position/Amplitude settings you plan to use. Once you train with a particular GMM you can't change those.

Maia Azubel

Sep 11, 2023, 6:27:36 PM
to em...@googlegroups.com
Thanks, Steve. 

To answer your questions:
~100K subtomos
±60° tilt with 3° increments
1.34 Å/pixel
box size = 256

I'll work on updating the version I'm currently using. 

Again, I want to thank you (and apologize if my second email sounded as if I was impatient); the support you and Muyuan give is wonderful!

Maia 


 

 

Steve Ludtke

Sep 18, 2023, 10:59:35 AM
to em...@googlegroups.com
Ok, severely jetlagged, but at least back in the US again :^)

Any luck on the newer version?  

That is quite a lot of particles (and I could have seen those numbers in the original run output you sent, but didn't quite believe what I was seeing). You're talking about 4 million subtilt images with a pretty significant box size. Just running the numbers, it seems like even with a pretty high particle density in a 4k tomogram, you'd be talking about something like 500 tomograms worth of data here!? While I have seen a couple of projects of this scope, they certainly aren't very common yet. Also, you still didn't specify: is that box size of 256 the padded or the unpadded box size? i.e., when you extract subtilt series, you normally pad the boxes by an extra 2x to help eliminate some of the material above/below the particle. Is 256 the unpadded particle?

I think for a dataset that large to work in any reasonable way, we may need to switch to some sort of stochastic sampling of the particles for the training process. That wouldn't be exceptionally difficult to implement, but it doesn't exist right now. Having said that, I suspect the training would probably work just as well with only 10,000 particles, even if the final embedded distribution won't be as smooth. Were you seeing results with 10% of the data which indicated features in the latent space that you needed 100k to completely fill out? It should be possible to run the full 100k particles through the network trained on 10k particles, without having to train on the complete set. There is no tool in the GUI to do this, but it should be doable from the command line; a rough, untested sketch is below. Let me know if you want to try that and I'll see if I can work out the best approach…. If you really need to do the training with the full 100k dataset, a stochastic approach or waiting a really long time would be the two options.
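(Very roughly, I'd expect that to look something like the e2gmm_refine_point.py call e2gmm itself issues, pointed at the full-set .lst with the already-trained networks passed in via --encoderin/--decoderin:

e2gmm_refine_point.py --ptclsin gmm_XX/particles_full.lst --model gmm_XX/<run>_model_gmm.txt --encoderin gmm_XX/<run>_encoder.h5 --decoderin gmm_XX/<run>_decoder.h5 --heter --nmid 4 --niter 0

where gmm_XX/<run> and particles_full.lst are placeholders, and using --niter 0 to skip further training is a guess I'd have to verify.)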


Maia Azubel

Sep 18, 2023, 6:11:39 PM
to em...@googlegroups.com
Thanks, Steve!!

Yes to the new version, and I did a fresh update a couple of hours ago.

e2version.py
EMAN 2.99.52 ( GITHUB: 2023-09-18 13:04 - commit: f52fc311f )
Your EMAN2 is running on: Linux-4.18.0-338.el8.x86_64-x86_64-with-centos-8 4.18.0-338.el8.x86_64

I get the following message when I start e2gmm.py:

Traceback (most recent call last):
  File "/data/maia2/miniconda3/envs/eman2/bin/e2gmm.py", line 1472, in sel_run
    self.decoder = tf.keras.models.load_model(f"{self.gmm}/{self.currunkey}_decoder.h5",compile=False)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 168, in load_model_from_hdf5
    custom_objects=custom_objects)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/model_config.py", line 55, in model_from_config
    return deserialize(config, custom_objects=custom_objects)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/layers/serialization.py", line 106, in deserialize
    printable_module_name='layer')
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/utils/generic_utils.py", line 292, in deserialize_keras_object
    config, module_objects, custom_objects, printable_module_name)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/utils/generic_utils.py", line 250, in class_and_config_for_serialized_keras_object
    raise ValueError('Unknown ' + printable_module_name + ': ' + class_name)
ValueError: Unknown layer: Functional
Run gmm_00 -> 15-25-150 results incomplete. No stored decoder found. <__main__.EMGMM object at 0x7fb550c58230>


I ignored it, though, as it looks to be related to gmm_00.

I can now press resolution multiple times without having to create a new run. I see several differences.

In this new version I don't see either #Ptcls Batches or Input Res. Also, to the left of Train Neutral New there is an option for Train Neutral Model, and to the left of New Dynamics an option for Run Dynamics. I don't know if/when to use Train Neutral Model and Run Dynamics. I'm attaching a screenshot of the e2gmm.py GUI to make this clearer, but also because I was surprised not to see any of the options to display the Neutral Map, Neutral Model, Dynamic Map, etc. in the lower right corner (as there were in the previous version).

For now, I'm trying to reproduce what I was doing before with the new version. 
>New GMM
>Create a run
Press resolution (15A - 0.3)
Got: Resolution=15.0 -> Ngauss=87  (4-40-01)
I don't see any info on the console for positive and negative gaussians as before.
>Train Neutral New
...
e2make3dpar.py --input gmm_08/4-40-01_model_projs.hdf --output gmm_08/4-40-01_model_recon.hdf --pad 320 --mode trilinear --keep 1 --threads 24
e2make3dpar.py
833  input images
Using 833 images
3D Fourier dimensions are 322 320 320
3D Fourier subvolume is 322 320 320
You will require approximately 0.198 GB of memory to reconstruct this volume
Warning: no radial correction applied for this mode
Exiting
>New Dynamics (still running)

To your questions: I have more than 1k particles per tomogram. Originally I had a smaller box size (190), but I was seeing some artifacts; per Muyuan's suggestion I used Rng XYZ to display the volume, and it was clear that I was cutting off some of the density.

I used
e2spt_extract.py --jsonali spt_72/aliptcls3d_02.lst --mindist 100 --keep 0.99 --newlabel 256ncp --boxsz_unbin 256 --parallel=thread:24

So, if I understand your question correctly, 256 is the unpadded particle size.

For the 10% dataset I mentioned in my previous email I used --keep 0.1. Regarding the features I saw for 10K, I'm attaching one example (819 gaussians, 459 pos).

I'm interested in seeing a representation of the entire dataset, in part because, if I'm not mistaken, the way I extracted the 10% subset is not completely random. By using --keep 0.1 from a previous refinement, I extracted the 'best' particles (which are not necessarily the 'most dynamic' particles, right?). But to be honest, a main goal is to be able to classify the entire dataset. I was thinking that I might be able to use the different maps I got after running Kmeans and Build Map from the e2gmm GUI as references in e2spt_refinemulti_new.py. But I'd really like to hear what your recommendation is, and I'm happy to be more diligent about updating the version and trying new things as they become available.

Many thanks again!
Maia





Attachments: e2gmm-GUI20230918.png, iter80ModelPertub0.05.png

Ludtke, Steven J.

Sep 18, 2023, 6:25:07 PM
to em...@googlegroups.com
Something clearly isn't right there. The e2gmm interface you're seeing is a very old version, not a new one, though your e2version output seems correct.
Type "which e2gmm.py" and make sure it's pointing to the copy in your new installation... Something very strange is clearly happening.
---
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry 
Deputy Director, Advanced Technology Cores                  and Molecular Pharmacology
Academic Director, CryoEM Core
Co-Director CIBR Center



Maia Azubel

Sep 18, 2023, 7:45:33 PM
to em...@googlegroups.com
which e2gmm.py 
~/miniconda3/envs/eman2/bin/e2gmm.py

Should I remove this installation and start from scratch?

Attaching a screenshot from today's update.

Maia

Attachment: Screenshot 2023-09-18 at 4.43.46 PM.png

Ludtke, Steven J.

Sep 18, 2023, 8:47:21 PM
to em...@googlegroups.com
Hi Maia,
I'm very puzzled. You clearly have the current version checked out based on e2version. I wouldn't suggest completely deleting and starting from scratch without a bit more debugging first.
To confirm, you did:

cd <source>
git pull
cd <build>
make clean
cmake <source>  -DENABLE_OPTIMIZE_MACHINE=ON
make -j 8 install

?



Maia Azubel

Sep 18, 2023, 10:14:59 PM
to em...@googlegroups.com
Yes, I did that.
I've copied and pasted the info from the console in case I'm missing anything.

Maia

Attachment: eman-build.txt

Ludtke, Steven J.

Sep 18, 2023, 10:40:23 PM
to em...@googlegroups.com
Ahh, there is an error in there:

/data/maia2/eman2/libEM/glutil.cpp:55:13: fatal error: GL/glu.h: No such file or directory
    #include "GL/glu.h"
             ^~~~~~~~~~

You may be missing the OpenGL Mesa header files. Maybe try:
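installing the Mesa GLU development headers; on a RHEL/CentOS 8 system like yours that would presumably (an assumption; check your distribution's package names) be:

sudo dnf install mesa-libGLU-devel   # assumed package providing GL/glu.h on RHEL/CentOS 8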

Maia Azubel

Sep 19, 2023, 7:26:09 PM
to em...@googlegroups.com
The lib was already installed on the three systems I work with.

I repeated the build and noticed that last time I didn't do
>make clean
I did >make clean this time and didn't get any errors.


On one of the systems everything is running smoothly. However, on the other two systems I get
"Illegal instruction (core dumped)"
for any EMAN2 command. I'm trying to figure out the problem from our end, and I know this is beyond the help you can provide, but I wanted to let you know where things stand.

Thanks for your help,
Maia




Ludtke, Steven J.

Sep 19, 2023, 11:36:00 PM
to em...@googlegroups.com
Hi Maia,
did you build EMAN2 on one machine and try to run the same binaries on another machine? i.e., are you using a shared home directory on multiple machines or some such? When you build EMAN2 from source with the suggested "-DENABLE_OPTIMIZE_MACHINE=ON" option, it compiles taking advantage of the features available to the specific CPU on that computer. The resulting binary will be forward compatible in almost every case, but won't be backwards compatible in many cases. That is, if you compile on a machine built in 2022 and then try to run the binaries on a machine from 2017, the binaries may not work (illegal instruction). However, if you compile on the 2017 machine, it will likely run fine on the 2022 machine, with the caveat that it might be slightly slower than it could be. When we distribute compiled binaries, we generally target a machine 5-7 years old, so they will be fairly well optimized but still run on most computers in use.
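(A hedged sketch of the workaround, assuming -DENABLE_OPTIMIZE_MACHINE accepts OFF like an ordinary CMake boolean: rebuild on the oldest machine you need to support, or rebuild without the machine-specific optimization:

cd <build>
make clean
cmake <source> -DENABLE_OPTIMIZE_MACHINE=OFF
make -j 8 install

at the cost of some speed on the newer CPUs.)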
 



Maia Azubel

Sep 20, 2023, 3:05:02 PM
to em...@googlegroups.com
I initially built EMAN2 on our oldest computer.
After your email, I was advised to try building on one of the other computers, as the oldest has an Intel chipset and the newer ones are AMD. Now it's working on all three systems.
Thanks again for your help!

Maia


Maia Azubel

Jan 16, 2024, 6:30:50 PM
to em...@googlegroups.com
Hi Steve, 
I have a few questions regarding e2gmm.py

I'm using 
e2version.py
EMAN 2.99.54 ( GITHUB: 2023-12-05 12:49 - commit: 4e38f2525 )
Your EMAN2 is running on: Linux-4.18.0-338.el8.x86_64-x86_64-with-centos-8 4.18.0-338.el8.x86_64
Your Python version is: 3.7.9

I have a large dataset (more info below). With the newer version the program runs (in most cases*). I've been playing with the number of gaussians, input res, train iter, and model perturb parameters. (I haven't explored changing latent dim (I'm using 4) or model reg (using 0).)

I'm using Kmeans to generate subsets of particles and Build Map to get the corresponding maps.   
  1. Would it make sense to use e2refine_new.py to try to improve the maps? 
  2. Is there a way to get a *.lst file from the generated subsets (to be used for --ptcls)?
  3. Is there a way to plot the scatterplot in 3D? In other words, I can use Matlab to make a 3-D plot, but which file should I use?
  4. If I see significant motion in a specific area, would it make sense to use the Mask option?
  5. When do you recommend increasing the number of latent dim, or model reg?

As a side (less important) question, 
Is it possible to make Symbol (instead of Contour) the default for the scatter plot (or at least keep it from switching back to Contour every time a new run is selected)?

Many thanks, 
Maia


*For most of the runs I get the message
...
xxx gaussians in the model
in 2 groups
...

However, in some cases I don't get the 'in 2 groups' line, and then I get a memory-exhausted error message.

(Full error from the console:

Data read complete
Ptcl rep shape:  (50485, 283)
Traceback (most recent call last):
  File "/data/maia2/miniconda3/envs/eman2/bin/e2gmm_refine_point.py", line 1231, in <module>
    main()
  File "/data/maia2/miniconda3/envs/eman2/bin/e2gmm_refine_point.py", line 344, in main
    encode_model=tf.keras.models.load_model(f"{options.encoderin}",compile=False)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 171, in load_model_from_hdf5
    load_weights_from_hdf5_group(f['model_weights'], model.layers)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 677, in load_weights_from_hdf5_group
    ' layers.')
ValueError: You are trying to load a weight file containing 4 layers into a model with 0 layers.
e2gmm_refine_point.py --model gmm_09/10-037075-25-100-4-160-01_model_gmm.txt --decoderin gmm_09/10-037075-25-100-4-160-01_decoder.h5 --decoderout gmm_09/10-037075-25-100-4-160-01_decoder.h5 --ptclsin gmm_09/particles.lst --ptclrepin gmm_09/10-037075-25-100-4-160-01_ptrep_26.hdf --heter  --sym c1 --maxboxsz 68 --niter 26  --nmid 4 --modelreg 0.0 --perturb 0.1 --pas 110 --ptclsclip 256 --minressz 6  --encoderin gmm_09/10-037075-25-100-4-160-01_encoder.h5 --encoderout gmm_09/10-037075-25-100-4-160-01_encoder.h5 --chunk 0,73
283 gaussians in the model
Loading 3685466 particles of box size 256. shrink to 72
 50000/50485 R       done      
identified 1329 3-D particles, merging gradients over each 3-D particle
Data read complete
Ptcl rep shape:  (50485, 283)
Traceback (most recent call last):
  File "/data/maia2/miniconda3/envs/eman2/bin/e2gmm_refine_point.py", line 1231, in <module>
    main()
  File "/data/maia2/miniconda3/envs/eman2/bin/e2gmm_refine_point.py", line 344, in main
    encode_model=tf.keras.models.load_model(f"{options.encoderin}",compile=False)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 171, in load_model_from_hdf5
    load_weights_from_hdf5_group(f['model_weights'], model.layers)
  File "/data/maia2/miniconda3/envs/eman2/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 677, in load_weights_from_hdf5_group
    ' layers.')
ValueError: You are trying to load a weight file containing 4 layers into a model with 0 layers.)

Steve Ludtke

Jan 17, 2024, 3:03:45 PM
to em...@googlegroups.com
On Jan 16, 2024, at 5:30 PM, Maia Azubel <maz...@stanford.edu> wrote:

I'm using Kmeans to generate subsets of particles and Build Map to get the corresponding maps.   
  1. Would it make sense to use e2refine_new.py to try to improve the maps? 
Yes. The quick reconstructions done within e2gmm, while "correct", don't allow for orientation adjustments. I wouldn't entirely trust any features emerging without doing re-refinements on at least a fraction of the subsets.

  2. Is there a way to get a *.lst file from the generated subsets (to be used for --ptcls)?
Yes, the "Save Set" button in the lower right corner below the 2-D plot window will produce a .lst file with fairly obvious naming convention in the appropriate gmm_xx folder. Note that this will save a single .lst file containing particles for all of the selected sets if multiple sets are selected, not an individual file for each selected set.

  3. Is there a way to plot the scatterplot in 3D? In other words, I can use Matlab to make a 3-D plot, but which file should I use?
The gmm_XX/*_aug.txt files contain the information plotted in the 2-D plot as a simple multicolumn text file. You can use external analysis software to add additional columns to this file, which e2gmm will then allow you to use for classification, plotting, etc., or you can use other multidimensional visualization software to look at the contents of the file. For example, if you wanted to do a 5-D PCA instead of the basic 2-D PCA, or wanted to try some other more time-intensive algorithm, you can just add any new results at the end, after the "latent" columns. I think the wiki actually discusses one way to do this?
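For a quick 3-D look in Python rather than Matlab, a minimal sketch (the gmm_00 path is hypothetical, and the assumption that the first three columns are the ones you want should be checked against your own file):

import numpy as np
import matplotlib.pyplot as plt

# load the multicolumn text file; add delimiter="," if your file is comma-separated
data = np.loadtxt("gmm_00/00_aug.txt")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
# plot the first three columns; adjust the indices to the columns you want
ax.scatter(data[:, 0], data[:, 1], data[:, 2], s=1, alpha=0.3)
ax.set_xlabel("col 0"); ax.set_ylabel("col 1"); ax.set_zlabel("col 2")
plt.show()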

  4. If I see significant motion in a specific area, would it make sense to use the Mask option?
Using multiple masks is generally intended for the situation where there is large motion in one domain and smaller, but important, motions in other regions. If you don't use any masks, most of the latent space will likely be used to characterize the single large motion, and/or you may find a single parameter incorporating aspects of the motion in two different domains. By specifying a list of masks, you can ensure a mapping of each region to a specific latent subspace. This also allows you to look for correlations in the motions between masked regions. Having said that, this is still a fairly new feature, so I'm not sure we ourselves understand the best way to utilize it yet. I have a student working on a system with pretty complex motion right now, trying to answer some of these questions.

  5. When do you recommend increasing the number of latent dim, or model reg?
The smaller the number of latent dimensions you can use while still characterizing the motions you're interested in, the better. Note that if you use multiple masks, that requires N latent dims for each mask; i.e., if you have 3 masks, the latent dim would be a multiple of 4 (there is another region defined by anything not inside a mask), with 8 effectively being the smallest practical value.


As a side (less important) question, 
Is it possible to make Symbol (instead of Contour) the default for the scatter plot (or at least keep it from switching back to Contour every time a new run is selected)?

That's automatically determined based on the number of points right now, and I don't think the strategy takes into account the redundancies in tomographic particles. I find this behavior a bit annoying as well, but it hasn't irritated me enough yet to find the time to add a GUI widget to let you control it.

If you want to hack this behavior in your own copy of EMAN2 (which will get overwritten the next time you update, of course), the line responsible for this decision is in a file called emplot2d.py. At line 322 you should see:

if len(data)<4 and (diff(self.data[key][self.axes[key][0]])>=0).all() : doline,linetype,dosym,symtype,docontour=1,0,0,0,0
elif len(self.data[key][0])>10000 :
    doline,linetype,dosym,symtype,docontour=0,0,0,0,1
    contoursteps=min(max(int(sqrt(len(self.data[key][0])//100)),15),100)
else : doline,linetype,dosym,symtype,docontour=0,0,1,0,0

If the first condition isn't met and the number of points is >10,000, it defaults to a contour plot with no lines or symbols. You could easily alter the threshold or the logic to your own personal tastes. Just be aware that changes there will propagate to any other EMAN2 programs using 2-D plots :^)
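For instance (a hedged tweak, reusing the names from the snippet above), raising that threshold to a million would keep symbol plots for any set you're likely to load:

elif len(self.data[key][0])>1000000 :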
