e2spt_classify.py crashing


Olivier Le Bihan

Jan 6, 2022, 12:38:35 PM
to EMAN2
Hi all,

I have a problem running multi-reference refinement with e2spt_classify.py.


## EMAN2 version:

e2version.py
EMAN 2.91 final ( GITHUB: 2021-03-08 11:36 - commit: 81caed2 )
Your Python version is: 3.7.9


## Command:

e2spt_classify.py sets/fil_spt_113_it06_re256_bin2.lst --refs=spt_113/threed_06_addnoise01.hdf,spt_113/threed_06_addnoise02.hdf,spt_113/threed_06_addnoise03.hdf --mask=spt_124/mask_spt124_modelinput_large.hdf --niter=8 --sym=c57 --tarres=40.0 --threads=12 --verbose=5 --path=spt_180


It crashes towards the end of the first iteration, at the averaging step that creates the even and odd averages. The alignment step worked fine.


## It only creates these files:

classmx_01.txt   init_ref_02.hdf               particle_parms_01_cls02.json
init_ref_00.hdf  particle_parms_01_cls00.json
init_ref_01.hdf  particle_parms_01_cls01.json


## Partial command output (starting where the error pops up):

X 0.422 -0.563  0.422   0
X 0.422 0.422   0.422   0
0.422   0.422   0.422   130.835 42.427  175.907 (-0.048)  7     8
X 0.422 -0.563  -0.563  1
X 0.422 0.422   -0.563  1
X 0.422 0.422   0.422   1
Pariticles in each class:  41, 72, 71
e2spt_average.py --threads 12 --path spt_180 --sym c57 --iter 1
0  threads
Traceback (most recent call last):
  File "/opt/eman2/eman2-2.9.1/bin/e2spt_average.py", line 421, in <module>
    main()
  File "/opt/eman2/eman2-2.9.1/bin/e2spt_average.py", line 338, in main
    ave.process_inplace("xform.applysym",{"averager":"mean.tomo","sym":options.sym})
AttributeError: 'NoneType' object has no attribute 'process_inplace'
e2refine_postprocess.py --even spt_180/threed_01_even.hdf --odd spt_180/threed_01_odd.hdf --output spt_180/threed_01.hdf --iter 1 --mass 500 --restarget 40.0 --threads 12 --sym c57  --automask3d mask.fromfile:filename=spt_124/mask_spt124_modelinput_large.hdf
Wed Jan  5 19:57:38 2022: e2proc3d.py spt_180/threed_01_even.hdf spt_180/fsc_unmasked_01.txt --calcfsc=spt_180/threed_01_odd.hdf
Traceback (most recent call last):
  File "/opt/eman2/eman2-2.9.1/lib/python3.7/site-packages/EMAN2db.py", line 657, in db_get_image_count
    ret = EMUtil.get_image_count_c(fsp)
RuntimeError: FileAccessException at /home/eman2/miniconda3/conda-bld/eman2_1615224802990/work/libEM/io/hdfio2.cpp:464: error with 'spt_180/threed_01_even.hdf': 'cannot access file 'spt_180/threed_01_even.hdf'' caught


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/eman2/eman2-2.9.1/bin/e2proc3d.py", line 890, in <module>
    main()
  File "/opt/eman2/eman2-2.9.1/bin/e2proc3d.py", line 308, in main
    else : nimg = EMUtil.get_image_count(infile)
  File "/opt/eman2/eman2-2.9.1/lib/python3.7/site-packages/EMAN2db.py", line 660, in db_get_image_count
    raise Exception(fsp)
Exception: spt_180/threed_01_even.hdf
Error running:  e2proc3d.py spt_180/threed_01_even.hdf spt_180/fsc_unmasked_01.txt --calcfsc=spt_180/threed_01_odd.hdf
e2spt_average.py --threads 12 --path spt_180 --sym c57 --iter 1
0  threads
Traceback (most recent call last):
  File "/opt/eman2/eman2-2.9.1/bin/e2spt_average.py", line 421, in <module>
    main()
  File "/opt/eman2/eman2-2.9.1/bin/e2spt_average.py", line 338, in main
    ave.process_inplace("xform.applysym",{"averager":"mean.tomo","sym":options.sym})
AttributeError: 'NoneType' object has no attribute 'process_inplace'
e2refine_postprocess.py --even spt_180/threed_01_even.hdf --odd spt_180/threed_01_odd.hdf --output spt_180/threed_01.hdf --iter 1 --mass 500 --restarget 40.0 --threads 12 --sym c57  --automask3d mask.fromfile:filename=spt_124/mask_spt124_modelinput_large.hdf
Wed Jan  5 19:57:39 2022: e2proc3d.py spt_180/threed_01_even.hdf spt_180/fsc_unmasked_01.txt --calcfsc=spt_180/threed_01_odd.hdf
Traceback (most recent call last):
  File "/opt/eman2/eman2-2.9.1/lib/python3.7/site-packages/EMAN2db.py", line 657, in db_get_image_count
    ret = EMUtil.get_image_count_c(fsp)
RuntimeError: FileAccessException at /home/eman2/miniconda3/conda-bld/eman2_1615224802990/work/libEM/io/hdfio2.cpp:464: error with 'spt_180/threed_01_even.hdf': 'cannot access file 'spt_180/threed_01_even.hdf'' caught

######

Is there a known bug with the program in the latest stable version?

Best regards,

Olivier Le Bihan


MuyuanChen

Jan 6, 2022, 12:53:52 PM
to em...@googlegroups.com
It is a known bug, but I don't really have a good way to fix it. It happens when one class has zero particles. That said, I have not touched the program for a while, and am surprised it still runs. e2spt_refinemulti.py is better maintained and generally safer. If you have an existing single-model refinement, e2spt_refinemulti_noali.py with --ncls can generate initial references in a better way than adding noise, and may avoid the issue. If you can upgrade to the continuous-build version, there is a new protocol that I use more often lately.
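
For context, the traceback shows that the class average came back as None because no particles were assigned to that class, so the following process_inplace call fails. A minimal sketch of the kind of guard that would avoid the crash (not the actual EMAN2 fix; the avgr name is an assumption standing in for the averager used in e2spt_average.py):

# the averager yields None when a class received zero particles (hypothetical names)
ave = avgr.finish()
if ave is not None:
    ave.process_inplace("xform.applysym", {"averager": "mean.tomo", "sym": options.sym})
else:
    print("Skipping empty class")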



Olivier Le Bihan

Jan 6, 2022, 1:19:12 PM
to EMAN2
Thanks for your reply and advice.

I think I will install the latest release then. I wanted to try it anyway because of the very interesting e2spt_refine_new.py approach.

I have been using e2spt_tiltrefine with some good results, but was wondering how to use the result for subsequent refinement with e2spt_refine. For that to work, e2spt_tiltrefine would need to recreate 3D particles to be used by e2spt_refine. Is there actually a way to do that? I mean, if I use a limited tilt range (let's say +/- 40 degrees) for e2spt_tiltrefine, how could it create refined 3D particles containing the full range of tilts? Anyway, I guess the new continuous-build version handles that problem, so I am going to upgrade.

Sorry for the stupid question, but I am not very familiar with conda environments: can I have two versions of EMAN2 installed at the same time on my system? I don't see why not, since conda environments are isolated, but I seem to remember reading somewhere the advice to uninstall previous EMAN2 versions before installing a new one, which is why I am asking.

Would this work: (1) setting the PATH to the bin/ directory of the version I want to work with, and (2) running source /path/to/eman2/version/bin/activate base?

Best,

Olivier Le Bihan

shadow walker

Jan 6, 2022, 1:29:55 PM
to em...@googlegroups.com
If you want to keep your previous installation, you can keep it. When you install the new EMAN2 binary, skip the conda initialization step at the end of the installation. Then you can activate the new environment by path.
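
For example (a sketch only; the install paths below are hypothetical and depend on where each binary installer was unpacked):

source /opt/eman2-2.91/bin/activate    # switch to the old installation
source /opt/eman2-new/bin/activate     # switch to the new installation

As far as I know, each binary installation carries its own self-contained conda environment, so sourcing the activate script under the corresponding bin/ directory is enough to switch between them.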

--
shadow_walker

MuyuanChen

Jan 6, 2022, 1:58:24 PM
to em...@googlegroups.com
The e2spt_refine_new pipeline should meet the requirement you mentioned. Specifically, the --keep option now takes 3 values separated by ','. This lets you throw away bad particles based on the overall 3D particle quality (false-positive particle picking), the 2D subtilt quality (radiation damage in later tilt images, particles blocked by ice, etc.), and the scale of 2D particle drift. Personally, I have found excluding by 2D subtilt quality a better approach than excluding by tilt range. They basically correlate, but the former also deals with stage jumping during data collection and all kinds of tilt schemes. There is also a new tool, e2spt_evalsubtlt.py, with which you can visualize the impact and decide on the strategy for bad-particle removal in your project.
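
Something along these lines, for example (a sketch only: sets/xx.lst and ref.hdf are placeholders, and the exact option names should be checked against e2spt_refine_new.py --help for your build):

e2spt_refine_new.py sets/xx.lst --ref ref.hdf --iters p,p,t --keep 0.9,0.95,0.9

where the three comma-separated --keep values would control the 3D particle, 2D subtilt, and drift-based exclusion, in the order listed above.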


Olivier Le Bihan

Jan 7, 2022, 7:36:17 AM
to EMAN2
Thanks a lot for including some tips in your answer, very much appreciated!

Best,

Olivier

Mikey Grunst

Jan 5, 2026, 5:30:47 PM
to EMAN2
Dear Muyuan,

Following up on this thread - I have a few questions about --keep:

-When running e2spt_refine_new.py with the --keep option (say --keep=0.9), where can we find how many 2D/3D particles are included in the final map (for reporting purposes)?

-Is --keep only using the best particles to update the reference (while keeping all particles for alignment)? Could you please clarify the differences between using --keep during refinement and discarding junk particles by other means?

-If you had a strong alignment from e2spt_refine_new.py and the --keep option, and still wanted to remove junk particles, would a reasonable approach be to use e2spt_refinemulti_noali.py (with a single reference)?

Thanks for the help!

EMAN 2.99.66 ( GITHUB: 2024-11-08 01:10 - commit: NOT-INSTALLED-FROM-GIT-REPO )
Your EMAN2 is running on: Linux-6.11.0-29-generic-x86_64-with-glibc2.39 6.11.0-29-generic
Your Python version is: 3.12.9

-Mikey

Muyuan Chen

Jan 5, 2026, 5:50:26 PM
to em...@googlegroups.com
The particle count is reported in the command-line output of the make3d commands. For reporting, you can simply multiply the particle count that goes into the refinement by 0.9 (--keep), since my understanding is that they only ask for the 3D particles used, not the 2D ones.

In most of my tests, --keep does not help that much. The 3D reconstruction is already weighted by the alignment score of each particle, so the "bad" particles are weighted near zero anyway. Of course, this also depends on how many particles you have and whether your resolution is limited by particle count or by particle quality.
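
A toy numpy illustration of why score weighting already suppresses junk (a sketch, not EMAN2's reconstruction code; the score-to-weight mapping here is made up, though EMAN2 alignment scores are indeed more negative for better particles):

import numpy as np

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(-0.8, 0.1, 900),   # "good" particles: strongly negative scores
                         rng.normal(-0.1, 0.1, 100)])  # junk: scores near zero
weights = np.clip(-scores, 0.0, None) ** 2             # junk maps to near-zero weight
print(weights[:900].mean(), weights[900:].mean())      # junk weight is far smaller

Since each particle contributes to a weighted average in proportion to its weight, a modest amount of junk has little effect on the final map.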

It largely depends on your definition of "junk" particles. Multi-reference refinement (I use e2spt_refinemulti_new.py now) is most useful for separating particles of different structures, not for removing actual "junk" (ice debris, pure noise, etc.), since there is no reason one piece of junk should look like another. If there is not too much non-particle junk, it is normally fine, since it will be down-weighted in the reconstruction. Ideally, don't pick those non-particle things in the first place. If there are too many and you want to remove them, I have a script called e2classifycnn.py that should more or less work. Basically, run "e2classifycnn.py set/xx.lst", select a few obvious non-particle things in the image stack view, and click train. It will sort particles based on similarity to those things, and you can specify a threshold to remove particles. It was originally designed for single-particle analysis but also works on SPT using projections. You might need to downsample the particles to something like a 64-cube box for it to work, though.
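
If downsampling is needed, one option is a short loop with the EMAN2 Python API (a sketch under assumptions: the file names are placeholders, and the shrink factor of 4 assumes 256-pixel boxes; pick whatever factor lands near a 64 cube):

from EMAN2 import EMData, EMUtil

# write a mean-shrunk copy of each particle in the set
n = EMUtil.get_image_count("sets/xx.lst")
for i in range(n):
    e = EMData("sets/xx.lst", i)
    e.process_inplace("math.meanshrink", {"n": 4})
    e.write_image("sets/xx_bin4.hdf", i)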

Muyuan

Mikey Grunst

Jan 5, 2026, 11:53:33 PM
to EMAN2
Thank you Muyuan for the clarification!

Unfortunately I don't have a saved terminal output for the e2spt_refine_new.py run, but I believe the terminal output only explicitly states the 2D particles remaining after --keep. Yes, they ask only for the 3D particles at the end.

Maybe an example would be helpful: 
If I ran in spt_01/ 10,000 input particles in e2spt_refine_new.py using p,p,t iterations with --keep = 0.95, I would report 9,500 3d particles?

If I then continued from spt_01/ in spt_02/ and ran p,t with --keep = 0.9, I would report 9,500 x 0.9 = 8,550 3d particles?

Is this percent removal multiplied across e2spt_refine_new.py runs, across each iteration within a run, or neither?

Thank you also for your advice about weighting and for this handy program! I will give it a try. The ConvNet autopicker also did a pretty good job from the start.

-Mikey

Muyuan Chen

Jan 5, 2026, 11:59:38 PM
to em...@googlegroups.com
- If I ran 10,000 input particles in spt_01/ through e2spt_refine_new.py using p,p,t iterations with --keep=0.95, would I report 9,500 3D particles?
Yes.

- If I then continued from spt_01/ in spt_02/ and ran p,t with --keep=0.9, would I report 9,500 x 0.9 = 8,550 3D particles?
No. You can check the number of entries in particle_info_3d.json in the corresponding spt_xx folder; they should be the same. The reported count should always be that number multiplied by --keep. The particle exclusion only happens at the reconstruction step: you don't lose particles by continuing from an existing refinement, or over many iterations. It does not actually remove any particles; it just sets their weights to zero in that reconstruction run. The information is not carried over to the next iteration either.
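
For example, a quick check of the entry count (a minimal sketch; it assumes particle_info_3d.json is plain JSON, and spt_02 is a placeholder path):

import json

# each entry corresponds to one 3D particle seen by the refinement
with open("spt_02/particle_info_3d.json") as f:
    info = json.load(f)
print(len(info), "3D particles")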




Steve Ludtke

Jan 6, 2026, 7:43:13 AM
to em...@googlegroups.com
Particles excluded by --keep are excluded temporarily, during a single iteration. If keep=0.9, each iteration will use the "best" 90% of the particles. These sets of "best" particles are likely to overlap significantly as you approach convergence, but will not be identical from one iteration to the next due to the high noise level present.
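
A toy numpy illustration of that iteration-to-iteration churn (a sketch, not EMAN2 code; the noise magnitude is made up):

import numpy as np

rng = np.random.default_rng(1)
true_quality = rng.normal(0.0, 1.0, 10000)   # per-particle intrinsic quality

def kept(noise=1.0):
    # observed score = true quality + fresh noise each iteration
    score = true_quality + rng.normal(0.0, noise, 10000)
    return set(np.argsort(score)[-9000:])    # keep the "best" 90%

a, b = kept(), kept()
print(len(a & b) / 9000)                     # large overlap, but below 1.0

With realistic noise the kept sets overlap heavily but never coincide, so some genuinely good particles fall into the discarded 10% in any given iteration.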

It is critical to understand that in both CryoEM and CryoET there are really 3 different types of particles in any given iteration. There are "good" particles, which similarity metrics show agree well with the structure; "bad" particles, which are physically damaged or not even the correct molecular species; and "statistically bad" particles, which are the fraction of particles that belong in the "good" category but, due to noise, agree worse with the 3-D structure than other particles in a specific iteration.

From a scientific perspective you should do whatever you can to remove "bad" particles, as they can only degrade the structure, but "statistically bad" particles are a much trickier issue, and mathematically the "bad" and "statistically bad" particles overlap. Take, for example, two micrographs, one with thin ice and one with thick ice. The particles from the thin-ice image will each contribute more signal. If orientations could be determined with perfect accuracy, then it would be appropriate to keep all of the particles from both micrographs and weight them according to their SNR. However, if the thick ice means the orientations are not as accurate (usually true), then these particles may actually degrade the structure. And there are other issues: for example, the particles in thicker ice may have a more isotropic orientation distribution, and thus be critical to include. Not a simple problem.

Generally there are 2 accepted approaches in the field for handling this per-particle classification uncertainty, neither of which is really ideal: 1) at each iteration you throw away the "worst" particles, hoping that you catch at least almost all of the "bad" particles, and you live with the contribution of the "statistically bad" particles that remain; or 2) you do an N-way classification during refinement and hope that the "bad" particles go into one group and the "best" particles into another.

Approach (2) is the standard one used in Relion refinements, and it undeniably leads to higher-resolution structures, even if it is mathematically _very_ shaky. The strategy is mathematically solid IF you have a system in 3 different physical states and you classify the particles into 3 groups: you will have some per-particle misclassification, but you should generally wind up with pretty good sets for additional refinement (or you can marginalize rather than classify). However, if what you have is 2 different physical states plus a bunch of particles which are all "bad", why should the "bad" particles look more like some random Frankenstein map than like one of the two "good" maps? You'll get rid of some of the bad particles this way, but you'll still leave a good chunk of them associated with the "good" maps, and you will include a decent subset of "good" particles in the "bad" set.


Mikey Grunst

Jan 12, 2026, 12:04:26 PM
to em...@googlegroups.com
Thank you Steve and Muyuan for your suggestions and insights! It makes sense that random junk would not sort out into a single subclass. Thank you for clarifying the --keep option as well.

Best,

-Mikey
