Problem with Particle Orientation Refinement


Andrea Dallapé

May 14, 2024, 12:36:32 PM
to EMAN2
Hi!

I want to try GMM Particle Orientation Refinement. I tried the tutorial some months ago and it worked fine, but now I'm completely lost and I would like to ask for your help.

First of all, my version is the continuous one:

EMAN 2.99.52 ( GITHUB: 2023-05-03 19:46 - commit: c218addd4a0bb0223d3be501adde8eaceb6f0433 )

Your EMAN2 is running on: Mac OS 14.0 x86_64

Your Python version is: 3.9.16


I have a refinement from Relion (version 4). I moved the run_data.star and the Extract folder containing the .mrcs files from the workstation to another computer (Mac M2).

I tried the command to convert the Relion refinement into EMAN:
e2convertrelion.py run_data.star --voltage 300 --cs 2.7 --apix 0.832 --amp 10 --skipheader 26 --onestack particles/particles_all.hdf --make3d --sym c1

At the beginning it complained about make3d and sym as arguments (I had the same problem months ago and solved it by installing the continuous build, but it reappeared), so I ran the command without those two arguments.
The error I get is:

something wrong with Extract/job189/Rawdata/FoilHole_13689793_Data_13692596_13692598_20240315_110803_fractions.mrcs image 0


I tried removing that .mrcs file, but another one appears on the next run.

What am I doing wrong?
Could you help me, please?

Thank you for your kind attention.
Best,
Andrea



Muyuan Chen

May 14, 2024, 12:44:37 PM
to em...@googlegroups.com
The continuous build you have is from a year ago, so you may still want to update it.

Are you sure the image mentioned in the error exists in that directory? It is often just a wrong folder name, or mrc vs mrcs.

Muyuan

--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to the Google Groups "EMAN2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/eman2/ccf98373-2c06-4c0f-b7e2-2861711b63ccn%40googlegroups.com.

Andrea Dallapé

May 14, 2024, 7:11:04 PM
to em...@googlegroups.com
Hi Muyuan,

Thank you for your prompt response.

I'll try to update the build and let you know.

Yes, the first thing I checked was the names and paths; I'm quite sure about that.

Thank you,
Andrea

Muyuan Chen

May 14, 2024, 7:16:47 PM
to em...@googlegroups.com
You can also run some EMAN programs on that file to make sure it is not corrupted. Something like:
e2proc2d.py xxx/xxx.mrcs test.hdf
Maybe it will give more meaningful error messages.


Andrea Dallapé

May 16, 2024, 6:46:37 AM
to em...@googlegroups.com
Hi Muyuan,

thank you for the suggestion. I tried as you said:

e2proc2d.py bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs test.hdf

and I got this error:
5 images, processing 0-4 stepping by 1
Traceback (most recent call last):
  File "/Users/mac2024/eman2-sphire-sparx/bin/e2proc2d.py", line 1173, in <module>
    main()
  File "/Users/mac2024/eman2-sphire-sparx/bin/e2proc2d.py", line 608, in main
    d.read_image(infile, i, False, None, False, img_type)
  File "/Users/mac2024/eman2-sphire-sparx/lib/python3.9/site-packages/EMAN2.py", line 2953, in db_read_image
    return self.read_image_c(fsp, *parms, **kparms)
RuntimeError

Does this make sense?
Thank you,
Andrea

Steve Ludtke

May 16, 2024, 6:57:37 AM
to em...@googlegroups.com
Run 
e2iminfo.py -as bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs
and
ls -al bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs

and post the output for us to look at. My initial suspicion would be a damaged/incomplete image file.

Andrea Dallapé

May 16, 2024, 7:01:15 AM
to em...@googlegroups.com
Hi Steve,

thank you for your reply.

e2iminfo.py -as bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs

bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs 5 images in MRC format 420 x 420
5 total images

ls -al bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs

-rw-r--r--@ 1 mac2024  staff  1765024 May 14 19:06 bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs

Thank you for all your help.
Best,
Andrea


Steve Ludtke

May 16, 2024, 7:06:36 AM
to em...@googlegroups.com
That was the full output from e2iminfo.py? There should have been a line for each of the 5 images. The file size looks correct if the images are 16-bit integers.
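That size check is simple arithmetic; a quick sketch, assuming a standard 1024-byte MRC main header and no extended header:

```python
# Expected size of a 5-image MRC stack of 420x420 pixels stored as
# 16-bit (2-byte) integers: a 1024-byte main header plus the raw data.
n_images, nx, ny, bytes_per_pixel = 5, 420, 420, 2
expected_size = 1024 + n_images * nx * ny * bytes_per_pixel
print(expected_size)  # 1765024, matching the ls output above
```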

Andrea Dallapé

May 16, 2024, 7:08:22 AM
to em...@googlegroups.com
Yes, it is the full output. I just ran it again to be sure, and it is.

Best,
Andrea

Steve Ludtke

May 16, 2024, 7:10:08 AM
to em...@googlegroups.com
That is quite odd. It should have produced a line with image statistics for each image, or an error message. Can you post the full output of e2version.py?


Andrea Dallapé

May 16, 2024, 7:15:01 AM
to em...@googlegroups.com
Here is the output of e2version.py:

EMAN 2.99.52 ( GITHUB: 2023-05-03 19:46 - commit: c218addd4a0bb0223d3be501adde8eaceb6f0433 )
Your EMAN2 is running on: Mac OS 14.0 x86_64
Your Python version is: 3.9.16

After Muyuan's suggestion, I deleted the old eman folder (and the related lines from the .zshrc file) and re-installed the continuous build from https://cryoem.bcm.edu/cryoem/downloads/view_eman2_version/25

Andrea

Steve Ludtke

May 16, 2024, 7:26:52 AM
to em...@googlegroups.com
Hmm… it seems like the continuous build you're getting is very old for some reason, and may also be partially broken (I'm not sure why you aren't getting output from e2iminfo). If you want to play with Muyuan's recent developments you will definitely want an up-to-date version. We will investigate what's going on with the continuous build, but you may want to consider just installing from source:

since this works directly from the current files on GitHub. It isn’t difficult, even if you aren’t a developer, and makes it very easy to quickly update or switch to the version from any point in history.

Andrea Dallapé

May 16, 2024, 7:31:44 AM
to em...@googlegroups.com
I had the same impression that something was wrong with the continuous build; thank you for pointing me to the source installation page.
I will try and let you know.

Thank you for all the help,
Andrea

Andrea Dallapé

May 17, 2024, 9:56:25 AM
to em...@googlegroups.com
Hi Steve,

I installed EMAN2 from source as you suggested, on a workstation (Ubuntu 20.04.6 LTS).

e2version.py output is this:
EMAN 2.99.55 ( GITHUB: 2024-05-17 15:15 - commit: 2aa2b781e )
Your EMAN2 is running on: Linux-5.15.0-107-generic-x86_64-with-glibc2.31 5.15.0-107-generic
Your Python version is: 3.11.9

That seems more reasonable than the previous one.

With the e2convertrelion command I got the same error:

e2convertrelion.py 213_run_data.star --voltage 300 --cs 2.7 --apix 0.832 --amp 10 --skipheader 26 --onestack particles/particles_all.hdf --make3d --sym c1

['_rlnCoordinateX #1 ', '_rlnCoordinateY #2 ', '_rlnImageName #3 ', '_rlnMicrographName #4 ', '_rlnOpticsGroup #5 ', '_rlnCtfMaxResolution #6 ', '_rlnCtfFigureOfMerit #7 ', '_rlnDefocusU #8 ', '_rlnDefocusV #9 ', '_rlnDefocusAngle #10 ', '_rlnCtfBfactor #11 ', '_rlnCtfScalefactor #12 ', '_rlnPhaseShift #13 ', '_rlnAngleRot #14 ', '_rlnAngleTilt #15 ', '_rlnAnglePsi #16 ', '_rlnOriginXAngst #17 ', '_rlnOriginYAngst #18 ', '_rlnClassNumber #19 ', '_rlnNormCorrection #20 ', '_rlnLogLikeliContribution #21 ', '_rlnMaxValueProbDistribution #22 ', '_rlnNrOfSignificantSamples #23 ', '_rlnGroupNumber #24 ', '_rlnRandomSubset #25 ']
key _rlnOriginX does not exist
key _rlnOriginY does not exist
##################
found keys:
name _rlnImageName 2
dfu _rlnDefocusU 7
dfv _rlnDefocusV 8
dfang _rlnDefocusAngle 9
psi _rlnAnglePsi 15
tilt _rlnAngleTilt 14
rot _rlnAngleRot 13
txa _rlnOriginXAngst 16
tya _rlnOriginYAngst 17
class _rlnRandomSubset 24
translation in angstrom
something wrong with Extract/job189/Rawdata/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs image 0


I then copied the mrcs into a folder (bad) and ran e2iminfo.py:

e2iminfo.py -as bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs

bad/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs 5 images in MRC format 420 x 420
Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2iminfo.py", line 231, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2iminfo.py", line 179, in main
    if options.stat : d.read_image(imagefile,i)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2961, in db_read_image
    return self.read_image_c(fsp, *parms, **kparms)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

Does this error make more sense?
I tried completely removing the mrcs and running e2convertrelion again, and I got
Extract/job189/Rawdata/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs does not exist
five times (which I fully expected), but then
something wrong with Extract/job189/Rawdata/FoilHole_13689792_Data_13692596_13692598_20240315_110558_fractions.mrcs image 0

so other mrcs files seem to have the same problem. I suspect something broke when I copied the files.
I'm copying the Extract folder again and will retry; it seems the only thing I can do.

Best,
Andrea

Ludtke, Steven J.

May 17, 2024, 10:02:52 AM
to em...@googlegroups.com
Ok, even if the problem isn't fully resolved yet, you are definitely better off using the current version. Would you be willing to directly email me 
Extract/job189/Rawdata/FoilHole_13689792_Data_13688830_13688832_20240315_110651_fractions.mrcs
?

It's only 1.7M, so it should go through email fine. I can use some other tools to try to figure out why it's causing issues.


---
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry 
Deputy Director, Advanced Technology Cores                  and Molecular Pharmacology
Academic Director, CryoEM Core
Co-Director CIBR Center


Ludtke, Steven J.

May 17, 2024, 10:45:34 AM
to em...@googlegroups.com, Sjors Scheres
Ahh, ok, that explains it. This is a "mode 12" MRC file, which is not part of the official MRC specification. Having never seen this before, I did a little digging, and apparently the Relion developers decided the MRC file format wasn't #$@#$ up enough already, and implemented a new MRC mode without any community discussion (as far as I can see). It is, at least, vaguely documented.

From the Relion documentation:

The mode 12 is RELION’s extension proposed in 2021 to save disk space. It uses IEEE754, binary16 encoding. Motion Correction (with RELION’s own implementation, not UCSF MotionCor2), Extract, Polish and Subtract jobs can write them with the --float16 option. All RELION programs can read mode 12 MRC files. External programs that use recent versions of mrcfile Python package can read them but other programs might not be able to read them.

As of 2021 March,

  • GCTF does not support float16.

  • CTFFIND does not support float16 but RELION’s motion correction job always writes power spectra in float32, which can be used in CTFFIND.

  • Topaz does not support float16 but RELION’s wrapper performs preprocessing from float16 to float32. Thus, as long as you use Topaz via RELION’s wrapper, training and picking work fine.

We can implement support for this, but it doesn't exist right now.



Andrea Dallapé

May 17, 2024, 10:54:46 AM
to em...@googlegroups.com, Sjors Scheres
Thank you so much, Steve, for going through this.
I hope you can implement support in a future version of EMAN2. For now, apparently, I will have to postpone Particle Orientation Refinement and play more with the tutorial data.

@Sjors: if I set "Write output in float16" to "No" during Relion's motion correction job, can I avoid this problem? Is disk space the only thing affected, or are there other things I should keep in mind in the next steps?

Thank you!
Andrea

Ludtke, Steven J.

May 17, 2024, 1:29:04 PM
to em...@googlegroups.com
We'll see if we can get this implemented pretty quickly. I had to double check something, but it doesn't seem like it will be a major effort to support.


Ludtke, Steven J.

May 17, 2024, 5:15:35 PM
to em...@googlegroups.com, Sjors Scheres
Ok, read support for mode 12 float16 MRC images has been added to EMAN2 on GitHub. If you have a source installation, you should be able to just do a "git pull" and "make install" (the instructions page has more details if you aren't familiar with the process). You will need to install from source if you want this new capability for now. We have no immediate plans to support writing in this mode, since we already have a far more efficient storage scheme for reduced precision (PMC9645247).

This was only tested on the one sample image you (Andrea) sent this morning, but appears to be working correctly. Please report any problems you encounter.

cheers


Andrea Dallapé

May 17, 2024, 5:18:14 PM
to em...@googlegroups.com, Sjors Scheres
Wow Steve! That’s amazing, thank you so much!

I will try in the next few days and I will let you know.
Thank you again.
Best,
Andrea

Andrea Dallapé

May 18, 2024, 6:39:54 AM
to em...@googlegroups.com, Sjors Scheres
Hi Steve,

I just pulled the new version from GitHub and I'm running e2convertrelion.
The "something wrong with Extract/job189/Rawdata/*.mrcs image 0" error did not appear and it is running smoothly.

Thank you for the prompt addition to EMAN2.
Best,
Andrea


Andrea Dallapé

May 18, 2024, 1:50:44 PM
to em...@googlegroups.com
Hi Steve,

I was too excited to wait before trying the tool.
e2convertrelion worked perfectly, with no errors, and the reconstruction is good.

I then ran
e2gmm_refine_iter.py r3d_00/threed_00.hdf --startres 3.3 --initpts r3d_00/threed_00_seg.pdb --sym c1

But it finished with an error:
FileNotFoundError: gmm_01/fsc_maskedtight_01.txt not found.

Before that, there were several warnings and errors, like:

/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])

or

WARNING:tensorflow:You are casting an input of type complex64 to an incompatible dtype float32.  This will discard the imaginary part and may not be what you intended.

I'm sending you a file with the complete run because the errors are very long.
If you could help me once again, I would be grateful.

Best,
Andrea
error_gmm.rtf

Muyuan Chen

May 18, 2024, 2:42:33 PM
to em...@googlegroups.com
Your GPU is out of memory…

OOM when allocating tensor with shape[16,20000,212] and type complex64 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node gradients/loop_body/MatMul_1/pfor/MatMul_grad/MatMul_1}}]]

Make sure there are no other jobs running on that GPU. It looks like the size of the particles shouldn't be an issue with a new enough GPU, but if this issue persists, maybe specifying a smaller batch size would help.
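For scale, the tensor named in that OOM message can be sized by hand (tensor_mb is a hypothetical helper for illustration; actual GPU usage is several times this, since gradients and other buffers are alive at the same time):

```python
# The failing tensor has shape [batch, 20000, 212] and dtype complex64,
# i.e. 8 bytes per element; its size scales linearly with the batch size.
def tensor_mb(batch, n_points=20000, n_cols=212, bytes_per_elem=8):
    return batch * n_points * n_cols * bytes_per_elem / 1e6

print(tensor_mb(16))  # 542.72 MB with the default --batchsize 16
print(tensor_mb(8))   # 271.36 MB with --batchsize 8
```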

Muyuan

Andrea Dallapé

May 18, 2024, 3:04:04 PM
to em...@googlegroups.com
Hi Muyuan,

thank you for your reply.
I have two GTX 1080 Ti cards, so not new enough. Nothing else is running at the moment.
Is there an option to reduce the batch size, or do I have to re-extract with a smaller box size?

Thank you for your help,
Andrea

Muyuan Chen

May 18, 2024, 3:15:52 PM
to em...@googlegroups.com
There is a --batchsize option for e2gmm_refine_iter. Default is 16. Maybe try 8 first?

This is not the particle image size, but the number of particles processed at the same time by the GPU. There is also a --chunksize option that controls CPU memory usage. Check the help for details.

Muyuan

Andrea Dallapé

May 18, 2024, 3:48:54 PM
to em...@googlegroups.com
Thank you Muyuan,

Trying with --batchsize 8.

Regarding the warnings:
WARNING:tensorflow:You are casting an input of type complex64 to an incompatible dtype float32.  This will discard the imaginary part and may not be what you intended.
and
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

should I worry, or are they related to the GPU running out of memory?

Best,
Andrea

Muyuan Chen

May 18, 2024, 4:21:20 PM
to em...@googlegroups.com
Don’t worry about the warnings. They shouldn’t matter much in most cases…


Andrea Dallapé

May 20, 2024, 11:47:08 AM
to em...@googlegroups.com
Hi Steve, hi Muyuan,

I need to ask for more details and help.
I now want to run the focused refinement. To be honest, the approach with e2filtertool is not trivial for me, so I preferred to use a mask created with ChimeraX/molmap.

I used the command 
e2proc3d mask.mrc mask.hdf:10:f

to convert the .mrc to .hdf. Can I use this mask for the focused refinement?

Thank you for your help,
Andrea

Muyuan Chen

May 20, 2024, 12:22:56 PM
to em...@googlegroups.com
You don't have to convert. MRC masks work fine, as long as you are sure they overlap with the density map in the EMAN2 browser.


Andrea Dallapé

May 21, 2024, 7:51:04 AM
to em...@googlegroups.com
Hi Muyuan,

thank you for your reply.
I was looking at the result of the global refinement, before the focused refinement, and something is strange. The result is not as I expected.

After e2convertrelion, the reconstruction (r3d_00/threed_raw.hdf) is good. But after the global refinement, the resulting map (gmm_00/threed_5.hdf) is very low resolution (even though the fsc_unmasked, masked, and maskedtight resolutions are ~3.28, ~2.93, and ~2.69, respectively). The gmm_00/threed_raw_even/odd.hdf maps are noisy, of course, but much better and similar to r3d_00/threed_raw.hdf.

Am I doing something wrong? 

For e2guess and e2_refine_iter I used r3d_00/threed_00.hdf, as explained in the tutorial. The map is low res. Should I use r3d_00/threed_raw.hdf instead?

Sorry for these misunderstandings and thank you for your patience.
Best,
Andrea

Muyuan Chen

May 21, 2024, 12:23:20 PM
to em...@googlegroups.com
I am not entirely sure this is the case, since you don't show any images. But if you get a reasonable reported resolution and the features look blurry, it is probably just a sharpening issue. One easy thing to try is
e2spt_structfac.py --even gmm_00/threed_raw_even.hdf --sharp --res 2

It should generate a threed_rawsf.hdf file with sharper features, and an sf.txt file that will be used for future sharpening. If you want to update the final reconstruction of the gmm refinement, you will need to copy threed_raw_even/odd to threed_05_even and repeat the last e2refine_postprocess command.

You can also use any other map, like one simulated from a PDB, to compute the structure factor for sharpening. Just use e2proc3d.py --calcsf sf.txt

Muyuan

Andrea Dallapé

May 21, 2024, 1:15:01 PM
to em...@googlegroups.com
Yes, sorry, here are some images to explain better.

Threed_raw is from r3d after e2convertrelion.
Threed_05 and threed_raw_even/odd are from after e2gmm_refine_iter.
Threed_raw_even/odd look good, so maybe I completely misunderstood. Should threed_05 be a reconstruction of even/odd?

Andrea



threed_raw.png
threed_05.png
threed_raw_even:odd.png

Muyuan Chen

May 21, 2024, 1:23:11 PM
to em...@googlegroups.com
Yes, it is likely a sharpening issue. The suggestion in my last email should work. Threed_05 is essentially a filtered average of even/odd. It is just over-filtered due to a wrong estimation of the structure factor (the Fourier radial amplitude profile). The default version in EMAN is generated from a gallery of proteins, and it can sometimes be sub-optimal… it shouldn't affect anything other than visualization, though.
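To make "structure factor" concrete: it is the rotationally averaged Fourier amplitude of the map. A NumPy sketch for a cubic volume (illustrative only, not EMAN2's implementation):

```python
import numpy as np

def radial_amplitude_profile(vol, nbins=32):
    """Rotationally averaged Fourier amplitude (structure factor curve)
    of a cubic volume. Illustrative sketch only."""
    n = vol.shape[0]
    amp = np.abs(np.fft.fftshift(np.fft.fftn(vol)))
    # distance of each voxel from the Fourier-space origin
    coords = np.indices(vol.shape) - n // 2
    r = np.sqrt((coords ** 2).sum(axis=0))
    # assign each voxel to a radial bin out to the Nyquist radius n//2
    bins = np.minimum((r / (n // 2) * nbins).astype(int), nbins - 1)
    totals = np.bincount(bins.ravel(), weights=amp.ravel(), minlength=nbins)
    counts = np.bincount(bins.ravel(), minlength=nbins)
    return totals / np.maximum(counts, 1)
```

Sharpening then rescales each Fourier shell so the map's profile matches a target profile, such as the one stored in sf.txt.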


Andrea Dallapé

May 21, 2024, 1:37:09 PM
to em...@googlegroups.com
OK, I ran the command, thank you!

"…will need to copy threed_raw_even/odd to threed_05_even and repeat the last e2refine_postprocess command."
I didn't understand this point, could you clarify, please?

Thank you so much,
Andrea


Muyuan Chen

May 21, 2024, 2:18:14 PM
to em...@googlegroups.com
Do you see reasonable features in threed_rawsf.hdf?

Essentially, 
cp gmm_xx/threed_raw_even.hdf gmm_xx/threed_05_even.hdf
cp gmm_xx/threed_raw_odd.hdf gmm_xx/threed_05_odd.hdf
e2refine_postprocess.py --even gmm_xx/threed_05_even.hdf --setsf sf.txt --tophat localwiener …

You can find the same e2refine_postprocess command used by the refinement program with "grep refine_post .eman2log.txt".

Muyuan

Andrea Dallapé

May 21, 2024, 2:42:06 PM
to em...@googlegroups.com
Yes, definitely: e2spt_structfac drastically improved the quality of the map.

I ran the command you wrote,

e2refine_postprocess.py --even gmm_00/threed_05_even.hdf --res 2.3124714152838948 --tophat localwiener --sym c1 --thread 32 --setsf sf.txt --align

but I get this error:
Use iteration number 5.
Tue May 21 20:27:13 2024: e2proc3d.py gmm_00/threed_05_even.hdf gmm_00/tmp0.hdf --process=filter.lowpass.gauss:cutoff_freq=0.11499999999999999
Tue May 21 20:27:16 2024: e2proc3d.py gmm_00/threed_05_odd.hdf gmm_00/tmp1.hdf --process=filter.lowpass.gauss:cutoff_freq=0.11499999999999999 --alignref=gmm_00/tmp0.hdf --align=refine_3d
refine_3d  didn't find a starting transform, using identity matrix.

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2proc3d.py", line 853, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2proc3d.py", line 610, in main
    data=data.align(alignername,alignref, param_dict)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Andrea

KeyboardInterrupt

threed_rawsf.png

Muyuan Chen

May 21, 2024, 2:55:17 PM
to em...@googlegroups.com
This one is a little strange, especially since the error messages are somewhat broken on the C++ side. Removing --align from the command will probably skip the error. Simply deleting all tmp* files in the folder and running the same command again might work too.


Andrea Dallapé

May 22, 2024, 4:23:40 AM
to em...@googlegroups.com
Hi Muyuan,

the command worked (without removing --align or deleting the tmp* files); maybe the workstation was doing something else and was overwhelmed?

I got this map, with a lot of artifacts around...
Do you have any suggestions?

Andrea
Screenshot 2024-05-22 at 10.21.51.png

Muyuan Chen

May 22, 2024, 11:44:49 AM
to em...@googlegroups.com
Looks like a masking issue. Maybe you ran refine_postprocess twice without copying threed_raw to threed_05 again? Although at this point it might be easier to just re-run the same gmm_refine_iter command…


Andrea Dallapé

May 22, 2024, 3:53:33 PM
to em...@googlegroups.com
Hi Muyuan,

I'm quite sure I followed your instructions correctly. I will try to run the global refinement from scratch.
The focused refinement gave extremely good results and there are no errors in the reconstruction.

I then followed the tutorial, continuing with the refinement from a DNN heterogeneity analysis:
e2gmm_heter_refine.py gmm_00/threed_05.hdf --mask maskforEman.mrc --maxres 3.3 --minres 80

gmm_00 being the global refinement.

I got this error:
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_batch.py", line 189, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_batch.py", line 130, in main
    midall.append(np.loadtxt(tmp[2]))
                  ^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 1338, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 975, in _read
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 533, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError: gmm_02/tmp_mid_000.txt not found.
e2gmm_eval.py --pts gmm_02/midall_00_odd.txt --pcaout gmm_02/mid_pca.txt --ptclsin gmm_02/ptcls_00_odd.lst --ptclsout gmm_02/ptcls_odd_cls_00.lst --mode regress --ncls 4 --nptcl 5000 --axis 0

/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.

  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_eval.py", line 209, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_eval.py", line 43, in main
    pts=np.loadtxt(options.pts)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 1338, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 975, in _read
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 533, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError: gmm_02/midall_00_odd.txt not found.
e2proc3d.py gmm_02/ptcls_odd_cls_00.hdf gmm_02/ptcls_odd_cls_00.hdf --process filter.lowpass.gauss:cutoff_freq=0.30303030303030304 --process normalize.edgemean
the image is a 3D stack - I will process images from 0 to 3
e2gmm_heter_ali.py --path gmm_02 --mask maskforEman.mrc
2024-05-22 21:27:59.112363: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.

  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
Realigning even...
2024-05-22 21:28:02.556141: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-05-22 21:28:02.556191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10498 MB memory:  -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_heter_ali.py", line 185, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_heter_ali.py", line 94, in main
    midraw=np.loadtxt(f"{path}/midall_{itr:02d}_{eo}.txt")[:,1:]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 1338, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 975, in _read
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 533, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError: gmm_02/midall_00_even.txt not found.
e2refine_postprocess.py --even gmm_02/threed_01_even.hdf --res 3.3 --tophat localwiener --thread 32 --setsf sf.txt --align
Use iteration number 1.

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2refine_postprocess.py", line 576, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2refine_postprocess.py", line 116, in main
    hdr=EMData(evenfile,0,1)
        ^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2888, in db_emd_init
    self.read_image(*parms)
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2940, in db_read_image
    fsp, idxs = parse_infile_arg(fsp)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 1095, in parse_infile_arg
    raise Exception(f"'{fname}' is not an existing regular file!")
Exception: 'gmm_02/threed_01_even.hdf' is not an existing regular file!
e2segment3d.py gmm_02/threed_01.hdf --pdb gmm_02/model_01.pdb --process=segment.kmeans:nseg=20000:thr=3

/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.

  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2segment3d.py", line 225, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2segment3d.py", line 80, in main
    volume=EMData(args[0],0)
           ^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2888, in db_emd_init
    self.read_image(*parms)
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2940, in db_read_image
    fsp, idxs = parse_infile_arg(fsp)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 1095, in parse_infile_arg
    raise Exception(f"'{fname}' is not an existing regular file!")
Exception: 'gmm_02/threed_01.hdf' is not an existing regular file!
e2gmm_refine_iter.py gmm_02/threed_01.hdf --startres 3.3 --initpts gmm_02/model_01.pdb --mask maskforEman.mrc --masksigma --path gmm_02 --startiter 2 --niter 5

/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.

  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_iter.py", line 187, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_iter.py", line 50, in main
    er=EMData(fname,0, True)
       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2888, in db_emd_init
    self.read_image(*parms)
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2940, in db_read_image
    fsp, idxs = parse_infile_arg(fsp)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 1095, in parse_infile_arg
    raise Exception(f"'{fname}' is not an existing regular file!")
Exception: 'gmm_02/threed_01.hdf' is not an existing regular file!

at the end of batch 2/2 odd.

I saw that the error:
Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 2086, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 2013, in main
    trainset=tf.data.Dataset.from_tensor_slices((allgrds, dcpx[0], dcpx[1], xfsnp))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 830, in from_tensor_slices
    return from_tensor_slices_op._from_tensor_slices(tensors, name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py", line 25, in _from_tensor_slices
    return _TensorSliceDataset(tensors, name=name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/data/ops/from_tensor_slices_op.py", line 33, in __init__
    element = structure.normalize_element(element)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/data/util/structure.py", line 133, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/framework/ops.py", line 1642, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 48, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py", line 268, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py", line 280, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py", line 305, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/framework/constant_op.py", line 103, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

occurred more than once.
Sorry to keep bothering you, but can you help me?

Andrea

Muyuan Chen

unread,
May 22, 2024, 4:21:04 PM5/22/24
to em...@googlegroups.com
If the focused refinement works, it probably just means something was messed up during the global refinement at some point. Hopefully re-running it will fix it. Again, the sharpening issue should only affect visualization, and won't have much impact on the internal calculations. So even if you just pretend it is good and continue, it won't be too problematic...

As for the heter_refine, this again is the GPU memory issue, since it throws the error message when trying to copy data from your CPU to your GPU. However, the problem here is that I hard-coded the program to use the default batch size, without providing a --batchsz option for users... I just made a quick little change on GitHub, so if you pull and install again, you should be able to specify --batchsz=8 or something smaller to get past that. I did not test it since I am running other things, so let me know if it breaks anything...

Note there is some risk in using a smaller batch size for the DNN training in heter_refine, since some DNN regularization relies on the statistical power of individual batches. Alternatively, you can also specify a much lower --maxres (maybe 10?), which would also reduce memory usage. This would be the maximum resolution of the DNN heterogeneity analysis and the starting resolution of the subsequent refinement. It is not ideal, but the hope is you won't need the DNN heterogeneity analysis if the movement within the system is small. The focused refinement should be able to handle small-scale rigid-body motion already anyway.

Muyuan


Andrea Dallapé

unread,
May 22, 2024, 4:26:15 PM5/22/24
to em...@googlegroups.com
Yes, I was thinking something like that (but hoping for something different). I forgot to change the --batchsize argument. With the workstation I can use now, I had to use --batchsize 4 for both the global and focused refinements...
The movement I expect is quite small. I will pull the new GitHub version, thank you for the improvement. I will try --maxres 10.

Thank you for all your help,
Andrea

Andrea Dallapé

unread,
May 25, 2024, 8:54:23 AM5/25/24
to em...@googlegroups.com
Hi Muyuan,

I need your help and suggestions once again.
After the amazing results obtained with the global and focused refinements, I decided to move on to another, more interesting map.

In the e2gmm_guess_n part, I have some problems and questions.
First of all, for the --thr (threshold) parameter I used 4 (and --maxres 3.0), as in the tutorial, but how should I decide the optimal value? And for --startn, I understood it should be more or less the number of non-hydrogen atoms in the map. From a PDB model I obtained ~60000 atoms, but I used --startn 10000 and let EMAN2 decide.
I obtained 20000:
N:    dist
10000 2.7771
...
...
20000 2.1487

After that, the program went OOM:

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 2086, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 1817, in main
    train_decoder(gen_model, trainset, params, options, pts)
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 898, in train_decoder
    imgs_cpx=pts2img(pout, xf)
             ^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:

Detected at node 'loop_body/transpose_1/pfor/Transpose' defined at (most recent call last):

    File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 2086, in <module>
      main()
    File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 1817, in main
      train_decoder(gen_model, trainset, params, options, pts)
    File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 898, in train_decoder
      imgs_cpx=pts2img(pout, xf)
    File "/home/agwilson/miniconda3/envs/eman2/bin/e2gmm_refine_new.py", line 256, in pts2img
      img=tf.vectorized_map(pts2img_one_sig, (pts, angs))
Node: 'loop_body/transpose_1/pfor/Transpose'
OOM when allocating tensor with shape[32,113,20000] and type complex64 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node loop_body/transpose_1/pfor/Transpose}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__forward_pts2img_7043]
e2spa_make3d.py --input r3d_00/threed_00_tmp_model_projections.hdf --output r3d_00/threed_00_gmm.hdf --thread 32

/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2spa_make3d.py", line 449, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2spa_make3d.py", line 63, in main
    tmp=EMData(options.input,0,True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2888, in db_emd_init
    self.read_image(*parms)
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 2940, in db_read_image
    fsp, idxs = parse_infile_arg(fsp)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 1095, in parse_infile_arg
    raise Exception(f"'{fname}' is not an existing regular file!")
Exception: 'r3d_00/threed_00_tmp_model_projections.hdf' is not an existing regular file!
e2proc3d.py r3d_00/threed_00_gmm.hdf r3d_00/threed_00_gmm.hdf --process mask.soft:outer_radius=-16 --matchto r3d_00/threed_00.hdf

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2proc3d.py", line 853, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2proc3d.py", line 297, in main
    else : nimg = EMUtil.get_image_count(infile)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 3248, in db_get_image_count
    return EMUtil.get_image_count_c(fsp)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 1: invalid start byte
e2proc3d.py r3d_00/threed_00_gmm.hdf r3d_00/threed_00_fsc.txt --calcfsc r3d_00/threed_00.hdf

Traceback (most recent call last):
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2proc3d.py", line 853, in <module>
    main()
  File "/home/agwilson/miniconda3/envs/eman2/bin/e2proc3d.py", line 297, in main
    else : nimg = EMUtil.get_image_count(infile)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/miniconda3/envs/eman2/lib/python3.11/site-packages/EMAN2.py", line 3248, in db_get_image_count
    return EMUtil.get_image_count_c(fsp)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 1: invalid continuation byte
final pdb model: r3d_00/threed_00_seg.pdb
final GMM in text file: r3d_00/threed_00_gmm.txt
final GMM in density map: r3d_00/threed_00_gmm.hdf
map-model FSC: r3d_00/threed_00_fsc.txt

and it created only the threed_00_seg.pdb file. Can I try the global refinement, since it needs the threed_00.hdf and threed_00_seg.pdb files?

I'm a little bit concerned because I was checking the memory usage (with nvidia-smi) and I didn't see any usage of the two GTX 1080 Ti cards. I suppose e2gmm does not use multiple GPUs in parallel, but why don't I see it using at least one GPU? Did I miss something during the installation (I followed the install-from-source guide), or do I have a problem with TensorFlow?

Thank you for your help,
Andrea

Muyuan Chen

unread,
May 25, 2024, 12:30:37 PM5/25/24
to em...@googlegroups.com
For the guess_n program, as long as you set an isosurface threshold at which all the protein density of interest is visible, it should be fine. Having 10%-20% more or fewer Gaussian functions generally doesn't have much impact on the result.
There is also a --maxn that controls the largest N to be tested, and the default is 20000. You can change that, but ultimately the upper bound is still the GPU memory.

As for the OOM, I may be less useful here. First, if it throws the GPU OOM error, it is certainly using the GPU. Your GPU memory monitor may not capture it, because it might only reach the peak value for a short moment before it crashes.

The largest one I played with has ~30k atoms (and Gaussian functions). If you have a protein with 60k atoms, I assume it will also need a bigger box size at 3A. Even though you may avoid the OOM here by reducing the batch size, you may still run into problems downstream. 
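To get a rough sense of why the batch size matters here, the tensor named in the OOM message above (shape [32, 113, 20000], complex64) can be sized with simple arithmetic. This is a back-of-the-envelope sketch, not part of the reply; the only assumption is that one complex64 element occupies 8 bytes:

```python
import numpy as np

# Shape and dtype taken from the OOM message above:
# "OOM when allocating tensor with shape[32,113,20000] and type complex64"
batch, ny, ngauss = 32, 113, 20000
bytes_per_elem = np.dtype(np.complex64).itemsize  # 8 bytes per complex64

size_gib = batch * ny * ngauss * bytes_per_elem / 2**30
print(f"one such tensor: {size_gib:.2f} GiB")  # roughly 0.54 GiB
```

Since the first dimension is the batch, this intermediate shrinks proportionally with --batchsz, which is presumably why a smaller batch size (8, or even 4) can fit on an 11 GB card that runs out of memory at 32; the number of Gaussians (20000 here) scales it in the same linear way.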

I think one solution here would be to get to focused refinement before using the GMM, so you only need the Gaussian representation of a smaller region of the protein. That said, I only did this maybe once with a customized script, and there isn't an established protocol for this in EMAN2 (only a sub-particle-GMM protocol for SPT). Doing this through particle subtraction in other software before importing into EMAN may be easier.

Of course getting a more expensive GPU will probably help. In theory there is some optimization of the program that can be done to reduce memory usage too. I haven’t worked on those large proteins for a while, so I am not very motivated to deal with that at the moment…

Muyuan


Andrea Dallapé

unread,
May 25, 2024, 2:57:09 PM5/25/24
to em...@googlegroups.com
Hi Muyuan.

Thank you for your reply.

Yes, you are right, I tried updating the nvidia-smi every second, and it uses all the GPU memory (11 GB!) in one second before it crashes...

Sorry, but I still have some doubts. When you refer to the isosurface, you mean the threed_00.hdf file in a program like Chimera/ChimeraX, right? Because in ChimeraX I get a very high threshold value (compared to usual maps from other software), on the order of 1/1.5, to visualize all the surface I'm interested in. And trying different --thr values, I had the opposite impression: with a higher threshold (--thr 8 or 12) the guess finished very early with a very low number of Gaussians, while with --thr 2 it created many more Gaussians, so I'm a little bit confused...

The particle subtracted map approach seems reasonable and interesting. I can use e2convertrelion on the reconstructed volume and the subtracted folder, right? I will try in the next days, thank you for the suggestion.

Yes, I know it would be good to have a better GPU. I have a workstation with an NVIDIA A5000 (the same you used for the Nature Methods paper, right?), but I'm failing to install EMAN from source because of some Python problems. I have to ask a colleague (now on vacation) who is more expert than me. I hope I can solve it in the next few days.

Thank you for all your help!
Best,
Andrea

Muyuan Chen

unread,
May 25, 2024, 7:40:35 PM5/25/24
to em...@googlegroups.com
Use show3d from the e2display browser instead of Chimera to decide the isosurface threshold. EMAN2 sometimes writes compressed images that are not read properly by Chimera, and you will get oddly high values.

Yes, different threshold values give you different numbers of Gaussian functions. My point is that this won't have much impact on the final resolution of the map. The extra Gaussians would get an amplitude of near zero during the refinement. So, with memory usage in mind, choosing a high threshold that just covers all regions of interest seems to be the better approach.

Muyuan


Steve Ludtke

unread,
May 28, 2024, 8:26:08 AM5/28/24
to em...@googlegroups.com
Just to clarify that point about Chimera. EMAN2 now writes virtually all images with some level of bit compression, which is why file sizes are dramatically lower than in Relion, etc. (see PMC9645247). The values are stored as integers in the HDF file, which are automatically rescaled when the files are opened in EMAN2. Chimera/X can read EMAN2-style HDF files but doesn't support the rescaling, so the volumes will read correctly, but the isosurface threshold values will be large integers which need to be rescaled if you plan to use them in EMAN2. You can look at the header of the image if you want to figure out the translation:

e2iminfo.py -H r3d_01/threed_03.hdf

...

stored_renderbits: 12

stored_rendermax: 36.565887451171875

stored_rendermin: -22.33937454223632



0 -> rendermin
2^bits - 1 -> rendermax

Detailed explanation in the paper referenced above. Muyuan’s suggestion of just identifying isosurface levels within EMAN2 is probably the easiest solution. :^)
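As a worked illustration of that translation (a hypothetical helper, not an EMAN2 function): the stored integers map linearly onto the density range, with 0 going to rendermin and 2^bits - 1 going to rendermax, so a Chimera/X integer threshold can be converted back to an EMAN2 density value like this:

```python
def chimera_to_eman(stored, rendermin, rendermax, bits):
    """Linearly map a stored integer threshold (what Chimera/X shows for a
    bit-compressed EMAN2 HDF volume) back to the original density value:
    0 -> rendermin, 2**bits - 1 -> rendermax."""
    return rendermin + stored * (rendermax - rendermin) / (2**bits - 1)

# header values from the e2iminfo.py output above
bits = 12
rmin = -22.33937454223632
rmax = 36.565887451171875
print(chimera_to_eman(0, rmin, rmax, bits))            # gives rendermin
print(chimera_to_eman(2**bits - 1, rmin, rmax, bits))  # gives rendermax
```

The same linear map run in reverse would convert an EMAN2 density value into the integer threshold Chimera/X displays.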

Andrea Dallapé

unread,
May 28, 2024, 8:31:07 AM5/28/24
to em...@googlegroups.com
Hi Steve,

thank you for the clarification. Yes, I noticed that ChimeraX's thresholds are completely insane. Muyuan's suggestion worked perfectly. I was not aware of this inconsistency and was trying e2gmm_guess_n with ChimeraX values, without success. Little by little I'm understanding how things work, thank you.

Thank you for all the help and explanations.
Andrea 

Andrea Dallapé

unread,
Jul 16, 2024, 1:49:31 PM7/16/24
to em...@googlegroups.com
Hi!

I'm sorry but I would like to ask for your help once again.

e2gmm_heter_refine (DNN heterogeneity analysis) returned an expected and interesting movement around the masked part of the map. How can I obtain the particles belonging to one of the four maps so I can run a refinement on them?

Thank you so much for all your help,
Andrea

Muyuan Chen

unread,
Jul 16, 2024, 2:19:04 PM7/16/24
to em...@googlegroups.com
Simply, you can run 
e2proclst.py gmm_xx/ptcls_cls_00.lst --create gmm_xx/ptcls_cls_00_id_0.lst --getclass 0

However, note that the gmm_heter_refine program only uses a subset of particles to show you the motion at this point, so there may be fewer particles than you need. This can be adjusted by re-running the e2gmm_eval program that was run by gmm_heter_refine. You should find the command in the terminal output or .eman2log.txt. Increase --nptcl to the largest number at which you can still see the movement, so you don't waste your particles. With the e2gmm_eval program you can also look at different eigenvectors, in case there are interesting secondary movements.

Also, gmm_heter_refine processes the even/odd halves separately. The first frame of the eigen-movement trajectory in the even half may not be the same as in the odd half, so if you merge the two and run a gold-standard refinement you may get a worse result. I ran into a similar situation a few weeks ago, so I added a "nogoldstandard" mode for this (I also forgot to push the change until a moment ago). I don't know if this is a good idea either, since more model bias would be introduced. Maybe it takes some manual work to find the frame in the odd trajectory that matches the frame in the even one?
Muyuan

Andrea Dallapé

unread,
Jul 16, 2024, 2:29:03 PM7/16/24
to em...@googlegroups.com
Hi Muyuan,

thank you for your prompt reply and help.

e2proclst.py gmm_xx/ptcls_cls_00.lst --create gmm_xx/ptcls_cls_00_id_0.lst --getclass 0

I suppose that --getclass in this case is the class index. Regarding this, is there a way to increase the number of classes in the heterogeneity analysis to more than 4?


note the gmm_heter_refine program only uses a subset of particles to show you the motion at this point, so there may be fewer particles than you need

I suspected something like that. I'm running e2spa_trajfromrefine.py, and I noticed that every class has the same number of particles, which concerned me a little. I will try as you suggested, thank you.


Also gmm_heter_refine does the even/odd half separately. The first frame of eigen-movement trajectory in the even half may not be the same as the odd half

Yep, I saw some subtle differences in ChimeraX... if the difference is small (smaller than the difference between classes) do you think it could be a problem?


Thank you so much.
Andrea




Muyuan Chen

unread,
Jul 16, 2024, 2:37:23 PM7/16/24
to em...@googlegroups.com
Specifying a larger --ncls in e2gmm_eval will give you more classes. Note the neighboring classes can overlap too, especially when you have a small dataset and ask for a large --nptcl. 
It should be fine, at least better than not refining them I assume. I don't have a good way to align the conformational spaces yet so there would always be some differences. I might work on that sometime later...
Muyuan

Andrea Dallapé

unread,
Jul 16, 2024, 3:16:07 PM7/16/24
to em...@googlegroups.com
Sorry Muyuan,

could you help me with e2gmm_eval? Unfortunately, e2spa_trajfrommotion overwrote the .eman2log.txt and I'm not able to find the command from the terminal output of heter_refine. I tried -h but I didn't understand how to use it...

Andrea

Muyuan Chen

unread,
Jul 16, 2024, 3:19:16 PM7/16/24
to em...@googlegroups.com
If you run 
grep gmm_eval .eman2log.txt
Do you get the e2gmm_eval command? In principle the programs never overwrite the log file, only append to it.

Andrea Dallapé

unread,
Jul 16, 2024, 3:25:42 PM7/16/24
to em...@googlegroups.com
I'm getting this error:

/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.

  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:500: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.

  setattr(self, word, getattr(machar, word).flat[0])
/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.

  return self._float_to_str(self.smallest_subnormal)
Traceback (most recent call last):
  File "/home/agwilson/Software/miniconda3/envs/eman2/bin/e2gmm_eval.py", line 209, in <module>
    main()
  File "/home/agwilson/Software/miniconda3/envs/eman2/bin/e2gmm_eval.py", line 43, in main
    pts=np.loadtxt(options.pts)
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 1338, in loadtxt
    arr = _read(fname, dtype=dtype, comment=comment, delimiter=delimiter,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/npyio.py", line 975, in _read
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agwilson/Software/miniconda3/envs/eman2/lib/python3.11/site-packages/numpy/lib/_datasource.py", line 533, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError:  not found.

even though .eman2log.txt is present. If I open the file with a text editor, I can see only the last trajfromrefine command.

Muyuan Chen

unread,
Jul 16, 2024, 3:31:37 PM7/16/24
to em...@googlegroups.com
The "grep" command is not part of EMAN; it is just a basic Linux command that searches files. It should not produce any EMAN errors, I think... It is also quite strange that the .eman2log.txt file got emptied. Make sure you are looking in the same project directory where you ran the previous commands.
Can you find the e2gmm_heter_refine command you ran then? It should be in the 0_xxx.json file in the corresponding gmm_xx folder, I assume. 

--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to the Google Groups "EMAN2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2+un...@googlegroups.com.

Andrea Dallapé

unread,
Jul 16, 2024, 3:36:29 PM7/16/24
to em...@googlegroups.com
Yes, the 0_gmm_heter_params.json is present.

Andrea

Muyuan Chen

unread,
Jul 16, 2024, 3:39:37 PM7/16/24
to em...@googlegroups.com
Can you paste the command in the json file here? I can only figure out the gmm_eval command from the gmm_heter_refine command you ran...

Andrea Dallapé

unread,
Jul 16, 2024, 3:42:39 PM7/16/24
to em...@googlegroups.com
Yep:

{
  "batchsz": 16,
  "cmd": "/home/agwilson/Software/miniconda3/envs/eman2/bin/e2gmm_heter_refine.py gmm_01/threed_05.hdf --mask mask_EMAN_focus.mrc --maxres 5 --minres 50",
  "expandsym": null,
  "help_to_html": false,
  "mask": "mask_EMAN_focus.mrc",
  "maxres": 5.0,
  "minres": 50.0,
  "ngauss": 8000,
  "niter": 5,
  "path": "gmm_04",
  "positionalargs": ["gmm_01/threed_05.hdf"],
  "remodel": false,
  "rigidbody": false
}


Muyuan Chen

unread,
Jul 16, 2024, 3:51:48 PM7/16/24
to em...@googlegroups.com
Should be something like this since the options you use are mostly default. 
e2gmm_eval.py --pts gmm_04/midall_00_even.txt --pcaout gmm_04/mid_pca.txt --ptclsin gmm_04/ptcls_00_even.lst --ptclsout gmm_04/ptcls_even_cls_00.lst --mode regress --ncls 4 --nptcl 8000 --axis 0

Change even to odd for the other half set, change --ncls for more classes, change --nptcl to use more particles, and change --axis to explore other eigen-motions. Remember to back up your existing files or change --ptclsout to another name. You may also need to filter the output maps to the target resolution with this afterwards:
e2proc3d.py gmm_04/ptcls_even_cls_00.hdf gmm_04/ptcls_even_cls_00.hdf --process filter.lowpass.gauss:cutoff_freq=0.2 --process normalize.edgemean
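If you end up running this for both half sets and several axes, a small helper like the one below can keep --ptclsout from overwriting earlier results. This is hypothetical string building, not EMAN code; the gmm_04 file names simply mirror the example command above, so adjust them to your own folder.

```python
def eval_cmd(path, half, axis, ncls=4, nptcl=8000):
    """Build one e2gmm_eval.py command line; output names encode half set and axis."""
    return (f"e2gmm_eval.py --pts {path}/midall_00_{half}.txt "
            f"--pcaout {path}/mid_pca_{half}_ax{axis}.txt "
            f"--ptclsin {path}/ptcls_00_{half}.lst "
            f"--ptclsout {path}/ptcls_{half}_cls_ax{axis}.lst "
            f"--mode regress --ncls {ncls} --nptcl {nptcl} --axis {axis}")

# One command per (half set, axis) combination
cmds = [eval_cmd("gmm_04", h, ax) for h in ("even", "odd") for ax in (0, 1)]
```

Printing the list and pasting the lines into a shell gives one run per half set and axis, with distinct output names.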


Andrea Dallapé

unread,
Jul 17, 2024, 7:58:55 AM7/17/24
to em...@googlegroups.com
Ok, thank you. I’m trying as you suggested.

I was not aware I could run e2gmm_heter_refine and only later modify the number of classes, very good to know.

So if I have 130000 particles, does it make sense to use --nptcl 65000 (divided into even/odd halves)?

Does it make sense to try different axes of the eigenvectors? I never understood how this modifies the analysis. I can see that it "changes" the axis of the movement, but heter_refine already gave me the movement of two connected ligands, and the movements are not on the same axis…

Andrea

Muyuan Chen

unread,
Jul 17, 2024, 1:08:34 PM7/17/24
to em...@googlegroups.com
You need to play with different particle numbers yourself. With a small --nptcl you obviously get worse SNR, but the population will be homogeneous. With more particles per class, the classes may become identical and the conformation information is lost.

A different axis does not modify the analysis; it shows you another mode of movement. It is not a real-space movement axis. Consider a system with two degrees of freedom (the left/right arm moves, or the arm and hand move): all possible poses can be decomposed into a 2D space, and each axis there describes one movement mode. 

Muyuan


Andrea Dallapé

unread,
Jul 19, 2024, 5:46:03 AM7/19/24
to em...@googlegroups.com
Sorry, this is unclear to me. How is the number of particles handled by e2gmm_eval? If I run with --nptcl 10000 and --ncls 5, those 10000 particles are "divided" into the 5 classes, right? Even and odd, so 20000 particles are used for the analysis. But how can I subsequently run the command on the remaining particles (let's say 100k particles total) to divide the rest of them into the classes? Or is there a way to assign all the other particles to one of the classes by similarity? Of course this could introduce a bias, but at least I could place every particle into the most similar class (something equivalent to a normal 3D classification in other tools).

Second, how can I relate the particles of a class along one movement axis (e.g. 0) to those of a class along another (e.g. 1)? I have two connected ligands, A and B; A is moving (movement 1), and the movement propagates to the interaction between A and B (movement 2). I ran e2gmm_eval twice (same parameters, the only difference being --axis, 0 or 1). With --axis 0 I can see both movements (A alone, movement 1, and A-B, movement 2); with --axis 1 I can see only movement 2, but much better than with --axis 0. Is there a way to check whether the two classes related to movements 1 and 2 are really composed of the same particles, which would explain a biological movement?

Thank you so much and sorry for all these questions,
Andrea

Muyuan Chen

unread,
Jul 19, 2024, 12:36:29 PM7/19/24
to em...@googlegroups.com
The heterogeneity analysis, the e2gmm_heter_refine you ran in the first place, maps particles to a low-dimensional space based on their conformations, so each point in that space (midall_xx in the gmm folder) represents one particle. e2gmm_eval then takes that information, groups particles based on their conformation, and reconstructs individual maps. So when you specify --ncls 5 and --nptcl 10000, the program picks 5 evenly spaced points along the first eigenvector of that conformation space; then, for each point, it groups the 10000 particles closest to that point and makes a reconstruction from them. So if you have fewer than 50000 particles in total, the classes will overlap. 
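For intuition, here is a small Python sketch of the grouping rule described above (an illustration, not EMAN source code): pick --ncls evenly spaced centers along the first eigenvector, then take the --nptcl particles closest to each center. With 200 synthetic particles and classes of 120, neighboring classes necessarily share members.

```python
import random

random.seed(0)
# 1-D conformation coordinates for 200 synthetic "particles"
coords = [random.gauss(0.0, 1.0) for _ in range(200)]

def regress_classes(coords, ncls, nptcl):
    """Mimic --mode regress: ncls evenly spaced centers, nptcl nearest particles each."""
    lo, hi = min(coords), max(coords)
    step = (hi - lo) / (ncls + 1)
    centers = [lo + step * (i + 1) for i in range(ncls)]
    classes = []
    for c in centers:
        # Indices of particles sorted by distance to this center
        order = sorted(range(len(coords)), key=lambda i: abs(coords[i] - c))
        classes.append(set(order[:nptcl]))
    return classes

classes = regress_classes(coords, ncls=4, nptcl=120)
# 4 classes x 120 particles > 200 total, so adjacent classes must overlap
overlap = len(classes[0] & classes[1])
```

The pigeonhole argument is the same as in the message above: with nptcl per class summing past the total particle count, overlap between neighboring classes is unavoidable.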

By default, e2gmm_eval decomposes the conformation space by PCA, so the different movement axes should be more or less orthogonal. However, even for a distribution in a 2D space, there are infinitely many ways to decompose it along two orthogonal axes. For example, if you consider two ligands (A, B) moving independently, each along a linear trajectory, the optimal decomposition is along A and B, but the space could also be decomposed as (A+B, A-B) or even (0.2A+0.8B, 0.8A-0.2B), and those two axes would also be orthogonal. Since the program only looks at the conformation space, which decomposition you get depends on the population of particles and is often quite random. Finding the exact two vectors of A and B from the conformation space can be quite tricky, and determining whether they are independent requires some statistical tests...
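A tiny numerical sketch of this ambiguity (an illustration, not EMAN code): if A and B are the "true" independent motion axes, the rotated pair (A+B)/sqrt(2) and (A-B)/sqrt(2) is just as orthonormal, so nothing in the conformation space alone distinguishes the two decompositions.

```python
import math

# "True" independent motion axes in a 2-D conformation space
A = (1.0, 0.0)
B = (0.0, 1.0)

def dot(p, q):
    return p[0] * q[0] + p[1] * q[1]

# Rotated alternative basis: (A+B)/sqrt(2), (A-B)/sqrt(2)
s = 1.0 / math.sqrt(2.0)
u = (s * (A[0] + B[0]), s * (A[1] + B[1]))
v = (s * (A[0] - B[0]), s * (A[1] - B[1]))
# Both bases are orthonormal, so PCA could return either one
```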
[attachment: image.png]

Muyuan



Andrea Dallapé

unread,
Jul 19, 2024, 4:06:01 PM7/19/24
to em...@googlegroups.com
Thank you for the explanation.

Now I see the problem of using a big --nptcl and overlapping classes. Would it be possible in the future to cluster the particles (using PCA and algorithms like k-means) to obtain the clusters, instead of using a number of particles decided by the user?
That could be useful and more straightforward.

Andrea

Muyuan Chen

unread,
Jul 19, 2024, 4:31:34 PM7/19/24
to em...@googlegroups.com
If you want classification, use --mode classify instead of regress. I find it often less useful unless the latent space shows obvious clustering.


Andrea Dallapé

unread,
Sep 12, 2024, 9:38:47 AM9/12/24
to em...@googlegroups.com
Hi Muyuan,

I would like to ask for your advice once again.

I'm playing with e2gmm_heter_refine with --nogoldstandard. I think it is an excellent addition, thank you. 
To improve the analysis, does it make sense to increase --niter, --ngauss (the default is 8000, but according to the help it is just for the initial analysis), and --batchsz (24 or 32 instead of the default 16; does it improve the DNN training)?

I found a state much less populated than the others, hence it is limiting for e2gmm_eval if I increase --nptcl. Could you suggest a way to extract/remove these particles from the batch, so I can better analyze the remaining particles, cluster all of them, create a class, and refine them individually?

Thank you again for your help.
Best,
Andrea

Muyuan Chen

unread,
Sep 12, 2024, 12:54:05 PM9/12/24
to em...@googlegroups.com
Hi Andrea,

I generally don't change those options myself. --ngauss and --batchsz are mostly there because of GPU memory limitations. If your GPU memory is sufficient, you can set --ngauss to the actual number of Gaussians in the GMM used for alignment. A smaller --batchsz costs less memory but might be less stable. 

I think in theory, if you have a list like gmm_xx/ptcls_cls_00.lst, you can run "e2proclst.py gmm_xx/ptcls_cls_00.lst --create gmm_xx/ptcls_cls_00_03.lst --getclass 3" to get the particles associated with the 3rd map in gmm_xx/ptcls_cls_00.hdf. Then just treat it as a regular particle set and refine it. 

There is no explicit option to exclude a class, but you can just run --getclass for every class you want to keep, merge the results with "e2proclst.py a.lst b.lst c.lst --create abc.lst", and then do another round of analysis on the combined set. 
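As a sketch, that exclude-one-class workflow can be scripted by building the command lines in a loop. The file names below are hypothetical; only the e2proclst.py --getclass / --create usage from the message above is assumed.

```python
ncls, exclude = 4, 3  # keep classes 0-2, drop the underpopulated class 3
src = "gmm_04/ptcls_cls_00.lst"
keep = [i for i in range(ncls) if i != exclude]

# One extraction command per kept class
cmds = [f"e2proclst.py {src} --create gmm_04/cls{i}.lst --getclass {i}"
        for i in keep]

# Merge the kept classes into one set for another round of analysis
merged = " ".join(f"gmm_04/cls{i}.lst" for i in keep)
cmds.append(f"e2proclst.py {merged} --create gmm_04/keep.lst")
```

Print the list and run the lines in a shell, adjusting the paths to your own gmm folder.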

Or just go back to traditional classification with the references you get from the GMM analysis. It takes masks and other options, and sometimes works too:
e2spa_refine_multi.py ref1.hdf ref2.hdf --ptcl r3d_xx/ptcls_yy.lst

Muyuan

Andrea Dallapé

unread,
Sep 16, 2024, 3:44:35 PM9/16/24
to em...@googlegroups.com
Thank you so much for the suggestion Muyuan!

I like the multi-reference classification approach and I'm trying to use it to sort the particles. No good results so far, but I probably have to tune some parameters, or maybe the variability is not big enough. I'm trying with a mask. 

I noticed that, using the --sgd parameter (no matter the value), I get the error in the attached file. At the end of the 10th iteration (or at the end in any case, with a different --niter), it crashes while searching for the threed_50_00.hdf file. The threed file name is the same in every case.
This occurs with or without references, when --sgd is used. 

Best,
Andrea

error_sgd.rtf

Muyuan Chen

unread,
Sep 16, 2024, 3:51:58 PM9/16/24
to em...@googlegroups.com
Oh, I only fixed the --sgd error last month, so you probably just need to upgrade. 
That said, if you use the results from the GMM analysis as references, I thought you wouldn't need the --sgd part? That was mostly for de novo classification. If the classification gradually diverges from the given references, something else might be wrong...
Muyuan

Andrea Dallapé

unread,
Sep 16, 2024, 4:07:07 PM9/16/24
to em...@googlegroups.com
Ah! Sorry, my fault: on this workstation I didn't pull the latest version!
I'm running patch-by-patch refinement on another GPU, hoping that, with e2spa_trajfromrefine.py, I can solve all my troubles related to this subtle movement.

Thank you for your help!
Andrea

Andrea Dallapé

unread,
Sep 20, 2024, 10:26:39 AM9/20/24
to em...@googlegroups.com
Hi!

The workstation on which I was running patch-by-patch refinement crashed. I saw there is a --recover option, but I didn't understand what argument it wants: I tried passing the gmm folder of the crashed run, but I get an unrecognized arguments error.
Could you kindly guide me to the right argument?

Thank you so much for your help,
Andrea

Muyuan Chen

unread,
Sep 20, 2024, 12:09:52 PM9/20/24
to em...@googlegroups.com
I think you just specify --path=gmm_xx, which is the existing refinement, and then simply add --recover. There is no guarantee how much it can recover, but it is worth trying. 

Andrea Dallapé

unread,
Sep 20, 2024, 12:12:25 PM9/20/24
to em...@googlegroups.com
Hi Muyuan,

yes, I tried different things, and --path gmm_xx --recover is (hopefully) working. At least it restarted from the last finished iteration.

Thank you,
Andrea 
