cryoET 3D classification


Esther Bullitt

Apr 7, 2026, 10:29:29 AM
to EMAN2
Hi,
Is there somewhere I can read about how to choose a 3D classification scheme for a cryoET dataset? EMAN2 includes ortho-proj (which uses MSA), PCA-based classification, and multi-reference 3D classification.

Do I try them all, or is there a way to make an informed decision?

thanks very much,
Esther

Muyuan Chen

Apr 7, 2026, 11:49:22 AM
to em...@googlegroups.com
Unfortunately I don’t think there is one... A main reason is that I keep making new programs and changing existing ones.

I often start from e2spt_sgd_new.py. It has the --ncls and --classify options that perform classification, and it can also load existing aligned particles. More recently it also supports --realspace and --mask, which let you focus classification on local regions. Note that it only produces structures for multiple classes but does not assign every particle to a class. For that you need e2spt_refinemulti_new.py. This one also works with or without initial alignment, with or without a mask, and it splits the particles into their corresponding classes. It also takes subtilt refinement results for higher-resolution classification. However, its convergence is much slower, so it is better to run it with references. There is also now e2gmm_spt_heter_refine.py, which is probably better for continuous motion of local regions.

Muyuan



--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to the Google Groups "EMAN2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/eman2/310239d2-6a55-4d89-93fb-373aeebaf10an%40googlegroups.com.

Steve Ludtke

Apr 7, 2026, 12:42:23 PM
to em...@googlegroups.com
You left out the GMM-based dynamics classification. That's actually probably the one I'd use as a first attempt nowadays, unless you are confident that your data should divide into N<5 discrete classes (and you know N). Its use on cryoET data is still a bit experimental, but it worked pretty well in some testing I did.

Esther Bullitt

Apr 7, 2026, 2:51:41 PM
to EMAN2
The GMM-based dynamics classification looks promising.
For those interested, here is the documentation website:  https://blake.bcm.edu/emanwiki/doku.php?id=eman2:e2gmm

Thanks!

Steve Ludtke

Apr 7, 2026, 3:11:16 PM
to em...@googlegroups.com
I'd actually recommend starting with the video tutorial, which explains some of the interface. It's a few years old now and probably needs an update, but it will at least cover the basics:

The earlier videos in that series also have good tutorials on tomography and single particle work (as of 2022). Most of it is still relevant.

Bullitt, Esther

Apr 14, 2026, 8:05:31 AM
to em...@googlegroups.com
  1.  $ e2version.py
EMAN 2.99.72 ( GITHUB: 2026-03-06 12:57 - commit: 1bb4904d8 )
Your EMAN2 is running on: Linux-6.5.0-28-generic-x86_64-with-glibc2.39 6.5.0-28-generic
Your Python version is: 3.12.13

  2.  e2spt_refine_new.py --ptcls=sets/extracted_tomobox.lst --ref=clip72_CS20_mask.mrc --startres=30.0 --goldstandard --sym=H1:3:111.8:8.95 --iters=p3,t2,p,t,r,d --keep=0.95 --localrefine --maxres=0.0 --minres=0.0 --parallel=thread:4 --threads=6

  3.  tomography, ~500K particles, 72 pix^3

We have been having problems with the computer hanging, and needing to be rebooted by pressing the power button (it works on smaller datasets).
The first problem was a full SSD drive, from old cached data (NOT eman2), which we deleted.

Our IT person wrote me to say:
Something is wrong with your eman2 installation or how you are running. You were running out of RAM just for test I created 200gb swap file and already over 100gb used. Eman2 suppose to create it is own cache but it does not.

He created a swap file on the SSD, but I don't know how to get EMAN2 to use it. Is there a configuration file somewhere that I can edit?

Thanks!
Sincerely,
Esther
-- 

Esther Bullitt, PhD

Professor of Pharmacology, Physiology & Biophysics

Director, Cryogenic Electron Microscopy Core

Boston University Chobanian & Avedisian School of Medicine

700 Albany St, W-302

Boston, MA. 02118

Steve Ludtke

Apr 14, 2026, 4:07:13 PM
to em...@googlegroups.com
Ok, it's hard to say with absolute certainty what's causing the excessive memory issue from this information. Is the problem happening during the orientation refinement or is it happening during the 3-D reconstruction at the end of each iteration?  

Some things to consider:
- If the problem is happening at the 3-D reconstruction stage, try adding the --m3dthread option

- Helical symmetry can sometimes cause some peculiar problems. With your specifications, it looks like you would have roughly 3*72/8.95 = 24 symmetric copies. If you have a 2 degree tilt step, this effectively means you're working with 500,000 * 60 * 24 = 720,000,000 2-D particle images. 
- This is far in excess of the 7 digit precision supported by single precision floating point numbers, and can potentially produce severe mathematical artifacts.
- This is far more particles than is likely to usefully contribute to a 3-D structure, ignoring the previous point, without subdividing into many classes. This is one of the reasons why Muyuan's picking approach focused on random sites along a filament. Once you exceed a certain number of particles there just isn't anything useful to be gained (other than noise bias). 

- It is conceivable that there is a bug somewhere, possibly relating to the helical symmetry, causing memory leaks of some sort.  To help figure out what's going on, launch the job, then run "top" in a separate window, and press "M". This will sort in order of memory usage. Watch the job as it starts to run. If you see the memory usage gradually rising (after the initial program has fully started running) then we need to know which program it is. 

- I would consider an initial run using maybe 10,000 particles to start and seeing what happens with that. eg-
head -10000 sets/extracted_tomobox.lst > sets/test_10k.lst
then run e2spt_refine_new.py with --ptcls=sets/test_10k.lst

- I would omit the "r" and "d" iterations at least initially. They rarely do anything very useful, and are very time consuming.
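The arithmetic behind the symmetric-copy warning above can be checked in a few lines. This is just a sketch reproducing Steve's stated estimate; the 60-tilt figure and the 3*72/8.95 formula are taken directly from his message, not independently derived:

```python
# Reproduce the effective 2-D particle-count estimate from the thread.
# Assumed inputs: 500k 3-D particles, ~60 tilt images per particle, and
# roughly 3*72/8.95 symmetric copies per box under H1:3:111.8:8.95.
box_size = 72          # particle box size in pixels
helical_rise = 8.95    # rise from the symmetry specification
copies = round(3 * box_size / helical_rise)
n_2d = 500_000 * 60 * copies

print(copies)          # -> 24 symmetric copies
print(n_2d)            # -> 720000000 effective 2-D images
# float32 has a 24-bit mantissa, i.e. ~7 significant digits:
print(n_2d > 2**24)    # -> True: well past single-precision exactness
```

This is why naive accumulation over that many images risks the "severe mathematical artifacts" mentioned above.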



Bullitt, Esther

Apr 15, 2026, 10:50:53 AM
to em...@googlegroups.com, Steve Ludtke
Hi Steve,

Below is what Raven found when running top while the job was running.

The very slow behavior begins right away with ‘gathering metadata’, which took hours to complete. Although it has not crashed the computer again, it did not finish one iteration in 24 hours.


image.jpeg


  • There are 4 jobs in total with CPU% 100 and MEM% 20 that are marked 'R', and each of those jobs has 31 jobs of the same name, each with CPU% 0 and MEM% 20, marked 'S'
  • Each of these jobs is for the e2spt_align_subtlt.py step in e2spt_refine_new.py
  • Parameters used for the above: --parallel=thread:4, --threads=6
  • Previous parameters that were equally unsuccessful and had the same CPU% and MEM%: --parallel=thread:4, --threads=10
  • Previous parameters that worked with fewer particles (~10,000), for both c1 and helical symmetry: --parallel=thread:4, --threads=10
The slowing-to-a-halt issue has been occurring at the very beginning, when the program is gathering metadata, but the CPU% and MEM% at that step are 0 and 0.1, respectively, I believe.

Info about ssd/swap before IT created the swap file(s):
image.jpeg
After IT created the swap file(s) and performed an unknown test:
image.jpeg


Currently we cannot start with a ‘clean slate’, because after killing yesterday’s new refinement job in the e2projectmanager.py task manager, htop still shows the same amount of CPU% and MEM% in use. This is also true after closing e2projectmanager.py entirely.

image.jpeg

Once the CPU% and MEM% return to baseline, I can provide screenshots of the exact CPU% and MEM% when fewer particles are used for the new refinement job. The setup has 128 GB RAM and 32 CPU cores.

Thanks!
Esther Bullitt and -Raven Gonsoulin

Steve Ludtke

Apr 15, 2026, 12:12:51 PM
to Bullitt, Esther, em...@googlegroups.com
Ok, good information, but some mysteries present here...

First, a general point: killing the parent process won't always immediately kill the child processes. Usually the child processes will die when they finish an iteration on their own, but if things are stuck in some fashion, you may have to kill them manually (use killall -9 to terminate them).
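A quick way to spot leftover workers, sketched here with example process names (match the grep pattern to whatever top/ps actually shows on your machine):

```shell
# List EMAN2-related processes sorted by memory use; '|| true' keeps the
# pipeline from failing when nothing matches.
ps aux --sort=-%mem | grep -E 'e2spt|e2parallel' | grep -v grep || true

# If workers survive after the parent was killed, terminate them by name,
# e.g. (the name below is an example; use the one ps reports):
#   killall -9 e2spt_align_subtlt.py
```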

The fact that you see dozens of e2parallel jobs running at once, but have requested --parallel=thread:4 means that something very strange is happening. With that option you should never see more than 5 e2parallel jobs running at once. I would have said that somehow you have a bunch of child processes from previous attempts which haven't been killed, but looking at the process report, all of them have:

-taskin=/tmp/e2tmp.344795263/0000000

which implies that they all came from the same parent job!?   I do not understand what's causing that behavior if the command-line you provided is accurate.

Could you 
grep e2spt_align_subtlt .eman2log.txt
and post the last line (assuming the last run was a failure)? 



Raven Gonsoulin

Apr 15, 2026, 1:55:11 PM
to em...@googlegroups.com, Bullitt, Esther
  • grep e2spt_align_subtlt .eman2log.txt
image.jpeg
Context: 'spt_20' is one of the previous jobs that slowed and froze the computer. We hard reset the system and IT created the swap files. After, job 'spt_21' (same parameters as spt_20) did not seem to crash the system. However, its efficiency was very poor ('Gathering metadata...' took a couple hours and the first refinement iteration was still running overnight into the next day). 'spt_21' was manually killed before running 'spt_22' (same parameters except 10,000 particles rather than 500,000 particles) which is what is pictured below.
  • e2spt_refine_new.py --ptcls=sets/test_10k.lst --ref=clip72_CS20_mask.mrc --startres=30.0 --goldstandard --sym=H1:3:111.8:8.95 --iters=p3,t2,p,t --keep=0.95 --localrefine --maxres=0.0 --minres=0.0 --parallel=thread:4 --threads=6
"Gathering metadata..." step was pretty quick this time
"Loading 3D particles - 9997 particles from 2 tomograms" this took a long time
image.jpeg
After 20 minutes at this step, CPU% = 0 and MEM% = 0.1
Eventually proceeded to:
image.png
where CPU% and MEM% became:
image.jpeg
then iter 1 subtlt step had CPU% and MEM% :
image.jpeg
-Raven




Tim Schulte

Apr 16, 2026, 8:05:17 AM
to EMAN2
Hej there,
I think I had a similar issue on my workstation.
At least that is what I have understood from your IT person's comment:
"""
He created a swap file on the SSD, but I dont know how to get EMAN2 to use it — is there a configuration file somewhere that I can edit? 
"""
Maybe this is related to the following error that I got previously? On top of that error msg I also got an unresponsive workstation. 
"""
Fix Eman2 [Errno 28] error OSError: [Errno 28] No space left on device when running a conda program.
"""
With the help of ChatGPT I got the following work-around.
Open a new bash shell and then try the following:

sudo -E unshare -m bash
mount --bind /scratch/tmp /tmp
conda activate Eman2
# ...run your EMAN2 commands here...
exit

That worked for me, and I have not encountered any issues on my workstation. However, maybe consult your IT expert before trying this ;)

That way you force EMAN2 to use the new temporary folder on scratch or an external hard drive.

/Tim




Steve Ludtke

Apr 16, 2026, 10:04:28 AM
to em...@googlegroups.com
Ok, we're mixing up terms here.

swap is what happens when the computer runs out of memory during a job: the OS shuffles some less-used memory temporarily to disk in a swap file. While this was once a fairly common thing to do on Linux boxes, for most modern machine configurations and uses it is no longer really useful, and it should not be necessary if you're running jobs appropriate for your machine. It will make the performance of the swapping job abysmal, and if you swap to an SSD, you may prematurely kill the drive.
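To see whether a machine is actually dipping into swap, /proc/meminfo is sufficient (swapon just lists the backing files). A minimal check, assuming a standard Linux box:

```shell
# SwapTotal vs SwapFree shows how much swap exists and how much is in use.
grep -E '^Swap(Total|Free)' /proc/meminfo
# Optionally list the active swap files/partitions:
swapon --show 2>/dev/null || true
```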

caching on the other hand is something that a job may do to improve performance. For example, if your data files live on a large but slow RAID array, and you have a 5 GB/s SSD in your computer, if the data needs to be accessed multiple times, it makes sense to copy the data to the SSD, then access it from there. There is also temporary caching during parallelism to allow different processes to communicate with each other.

When you run an EMAN2 command with --parallel=thread:N, it will, by default, use /tmp for caching. On many modern Linux machines, /tmp is mounted as a ramdisk; that is, it is a chunk of your system memory pretending to be a hard drive. As long as /tmp is only used for very small files, this works fine. Your mount command shifts /tmp to a scratch hard drive, which is one way to solve the problem.

An alternative solution, which is how EMAN2 was actually designed to be used, is

--parallel=thread:N:/scratch/tmp

which specifies where it should put the scratch space (instead of /tmp).
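To check whether /tmp on a given machine is RAM-backed before relying on it for caching (the /scratch/tmp path below is an example):

```shell
# If the Type column reports "tmpfs", /tmp lives in RAM and large caches
# there will consume system memory.
df -T /tmp
# In that case, give EMAN2 a real-disk scratch path instead, e.g.:
#   e2spt_refine_new.py ... --parallel=thread:16:/scratch/tmp
```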

This will not solve Esther's problem, which was exhausting RAM, unfortunately. But a very good discussion!



Steve Ludtke

Apr 19, 2026, 5:02:33 PM
to em...@googlegroups.com, Bullitt, Esther
Hi Raven,
sorry, I just realized this reply was buried under another message and reply. I thought I had replied to either you or Esther again about this issue, but I don't see the reply if so...

The command you're using seems to be some sort of enhanced "top" not the standard one, so I'm not 100% sure it's displaying what I think it is. I don't see anything here at least which seems to be using a lot of RAM.

I see that you are using 10k particles this time, which should help. For 10,000 particles, assuming a 2 degree tilt, that means that your subtilt data is 10000*60*144*144*4 = 50 GB of data. IF you have the data on fast storage, like an M.2 SSD for the refinement, then it would take about 10 s to read this much data, and you shouldn't have severe performance issues. However, if you had the data on, say, an NFS filesystem mounted remotely on a 1 Gb network connection, just reading the image data (for the 10k set) would take ~10 minutes. If you were working with 500k particles, that would rise to ~8 hours just to read the data one time.  So, answers on some of these questions depend on a lot of specific technological issues.
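Those throughput numbers can be reproduced directly. The 144-pixel float32 subtilt images (2x the 72-pixel 3-D box) and the 60-tilt scheme are assumptions carried over from the message above; the ~8 h figure for 500k presumably includes filesystem overhead on top of the raw read time computed here:

```python
# Back-of-envelope read-time estimates for the subtilt stack.
n_3d, n_tilts, box_2d = 10_000, 60, 144              # 10k subset, 144x144 images
bytes_total = n_3d * n_tilts * box_2d * box_2d * 4   # float32 = 4 bytes/pixel

print(round(bytes_total / 1e9))                 # -> 50 (GB for the 10k subset)
print(round(bytes_total / 5e9))                 # -> 10 (s at ~5 GB/s M.2 SSD)
print(round(bytes_total / 125e6 / 60))          # -> 7 (min at 1 Gb/s ~ 125 MB/s)
print(round(50 * bytes_total / 125e6 / 3600))   # -> 6 (h raw read for 500k)
```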

Having said that, I do not understand at all how you are getting so many processes running at once. Since they have sequential PIDs (process id's), that does imply that they were all launched from the same parent process, but I don't see anything in e2spt_refine_new which would cause this to happen.

You're just running this command straight from the command line, not using some sort of BQS or 'mpirun' or something, which might be automatically launching many copies of the job, right?   This is very mysterious behavior.




Steve Ludtke

Apr 19, 2026, 7:07:47 PM
to em...@googlegroups.com, Bullitt, Esther, Muyuan Chen
Ok, I stand corrected. I discovered where there might be a large number of processes running temporarily. When it goes through the "collecting metadata" and "Loading 3D particles" steps, during processing the metadata for the header it actually does have a step where it runs on 32 cores at once while it builds the relationships between all of the 2-D and 3-D particles.  That would explain why there is a point where there are MANY (32) processes showing up in top. 

However, this, in and of itself, should not be exhausting your RAM. I suspect the reason he hardcoded the number of threads at 32 here was that it just didn't matter that much for what he's doing in that step? Muyuan would have to answer that.

Anyway, with 500,000 3-D particles -> 30,000,000 2-D images (assuming 60 tilts), my suspicion is that just keeping track of all of the particle metadata may be exhausting the RAM on this machine. 
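A loudly hedged sanity check on that suspicion: the per-record byte count below is a pure assumption (typical Python object overhead), not anything measured from EMAN2:

```python
# If each 2-D subtilt record carries even ~1 KB of Python-side metadata
# (dicts, transforms, file references), the full set is sizeable.
n_records = 500_000 * 60      # 3-D particles x assumed 60 tilts
assumed_bytes_each = 1024     # ASSUMPTION, not measured from EMAN2
print(n_records)                                      # -> 30000000
print(round(n_records * assumed_bytes_each / 1e9))    # -> 31 (GB, before workers)
```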

Did you determine the maximum amount of RAM it used when processing the 10k particle data set? 

If you run with a larger, but still <500k particle set, do you see an improvement in the reconstruction?

How much overlap are you using with helical picking? Since you are imposing helical symmetry, you really get no benefit from overlapping particles. It just costs you a lot of processing time. ie - if you have 500,000 particles but used a lot of overlap, rather than using the first ~50k overlapping particles, you should use 50k non- (or minimally) overlapping particles.

If you REALLY believe that using 500k particles is going to give you a better structure, I suggest processing the data in whatever size chunks will run (25k should work?), then you can simply average together all of the maps at the end without running into any mathematical inaccuracy due to the large 2-D particle count.

It's also worth mentioning that GSFSC isn't a valid test when you use overlapping helical particles...

Muyuan Chen

Apr 19, 2026, 8:13:05 PM
to Steve Ludtke, em...@googlegroups.com, Esther Bullitt
Yes, the gather-metadata step runs in parallel by default. It only reads the particle header (and only a few entries of the header), not the images, so I assume the memory usage should be manageable. It’s been there for a while and I haven’t run into a case that causes memory issues at this step. That said, even with the parallelization, the speed here is largely limited by hard-drive I/O. I hardcoded 32 because the CPU is never fully used by the workers anyway. It’s a few times faster than without it, but never 32x faster…

Maybe there is some file corruption, so one of the workers cannot finish its job? Normally it has better error messages, but I don’t know why it’s like this here.

There is a --spliteo option in the build-set step that splits the half sets by particle location. This should prevent overlap from affecting the FSC.

Still, splitting the particles is probably a good idea. It would also help identify the problem if there is file corruption…


Steve Ludtke

Apr 19, 2026, 9:36:20 PM
to Muyuan Chen, em...@googlegroups.com, Esther Bullitt
The issue is that the machine is running out of RAM somewhere in the job, not necessarily in the collecting-metadata step. However, this is a much larger data set than I've ever tried (500,000 3-D particles selected from filaments, using helical symmetry).

Muyuan Chen

Apr 19, 2026, 10:36:50 PM
to Steve Ludtke, em...@googlegroups.com, Esther Bullitt
But if it got past the gather-metadata step and then died, shouldn’t the subprocesses be gone already? Also, during the subtomogram alignment, each worker should only read the 3D and 2D particles it is working on, not the metadata of all particles. I have run large datasets with many particles, but only on cluster nodes with sufficient memory…


Steve Ludtke

Apr 19, 2026, 11:45:06 PM
to Muyuan Chen, em...@googlegroups.com, Esther Bullitt
I was trying to help them debug, and had them do a 'top' and saw a large number of processes, which I wasn't expecting. I believe now that this was just the metadata gathering, and unrelated to the OOM crash.

Bullitt, Esther

Apr 20, 2026, 9:59:46 AM
to Steve Ludtke, Muyuan Chen, em...@googlegroups.com
Hi Steve and Muyuan,

I think the memory issue is okay now, with the swap file designated in the command, and the lower number of particles (10K).

Is it normal for this to take 2.5 days to run to completion with 128 GB RAM and 32 cores (using 16 of them to test things out)?

e2spt_refine_new.py --ptcls=sets/test_10k.lst --ref=clip72_CS20_mask.mrc --startres=30.0 --goldstandard --sym=H5:1:111.8:8.95 --iters=p3,t2,p,t --keep=0.95 --localrefine --maxres=0.0 --minres=0.0 --parallel=thread:16:/ssd_scratch/eman_tmp --threads=16

The result is bad, with a large hollow center and a thin shell of protein, but at least it seems to be running smoothly (translucent is the mask, purple is the refinement result).
Any suggestions are welcome.
image.png
Thank you!
Sincerely,
Esther

Steve Ludtke

Apr 20, 2026, 11:53:50 AM
to Bullitt, Esther, Muyuan Chen, em...@googlegroups.com
Hi Esther,
I need to know more details about the project to offer resolution suggestions or comment on the reasonableness of the timing:

- What is your tilt-sequence (tilt range, step, total dose)
- A/pix at the sampling you're using
- What sort of storage is the project folder on:
  - local spinning-platter single drive
  - local SSD (M.2 or SATA?)
  - local RAID (number of drives in the RAID?)
  - remote NFS drive (network speed?)
- How much overlap are you using for your particles?
- Exact CPU model on the machine 

grep "model name" /proc/cpuinfo|head -1


- Maybe consider starting out without imposing helical symmetry to see what you get? It is possible you may wind up with missing-wedge alignment issues if you do this, but it could be useful to try... 



Raven Gonsoulin

Apr 20, 2026, 1:36:08 PM
to em...@googlegroups.com
Hi everyone,

- Tilt range, step, total dose : 120, 3, 122
- A/pix : 1.55
- Project folder storage : EHD
- Particles were picked using the convolutional neural network (convnet.py) so I do not believe there was designated overlap of particles (unless particle overlap/spacing is automatically dictated by box size in convnet.py, in which case the box size is 72)
- I won't have access to the machine until tomorrow, but I can check if I have the CPU info saved somewhere else in the meantime

- Some important context: the structure we are interested in resolving is a complex of filaments formed by small-peptide binding. In particular, we are interested in where the peptide binds to the filaments (and possibly between filaments, linking them together). I have tried using c1 symmetry previously, but helical symmetry did seem to slightly improve the resolution. I have been looking into the advanced tomography tutorial (https://blake.bcm.edu/emanwiki/doku.php?id=eman2:e2tomo_p22) because I suspect that the peptide introduces breaks in the symmetry, though the filaments themselves should keep their helical symmetry. Because of the heterogeneity of the macromolecular complexes, I prioritized having more particles for 3D classification. However, I can instead proceed with filament picking/extraction/refinement if that would be better.

Best,
Raven Gonsoulin
PhD Candidate
Boston University | Graduate Medical Sciences
Pharmacology, Physiology, and Biophysics



Steve Ludtke

Apr 20, 2026, 2:27:46 PM
to em...@googlegroups.com

On Apr 20, 2026, at 12:35 PM, 'Raven Gonsoulin' via EMAN2 <em...@googlegroups.com> wrote:

Hi everyone,

- Tilt range, step, total dose : 120, 3, 122
That is a LOT of dose for subtomogram averaging targeting subnanometer resolution. The historically determined total dose limits for single particle analysis also apply to subtomogram averaging, eg- ~30-40 e-/A^2 for subnanometer resolution and maybe 15-20 e-/A^2 for higher resolutions. Effectively this means the higher tilts in these tomograms are useful only for 3-D localization, and need to be removed for STA. Assuming you did dose-symmetric acquisition, this would mean using only ~±20 degree tilt data...

Unlike single particle analysis, where you can use dose-weighting with higher dose movies to exclude the high resolution content from the higher dose frames, this is largely useless for STA, since the _only_ information at high resolution at high tilt is coming from the high dose frames, basically turning the downweighting into a low-pass filter. While this does preserve low resolution information at high tilt, you already have plenty of low resolution information from the other particles, so it's pretty pointless.
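The same dose logic in numbers. This sketch assumes the 122 e-/A^2 is spread evenly across the tilts in a dose-symmetric scheme, and uses 35 e-/A^2 (the midpoint of the 30-40 range quoted above) as the budget:

```python
# Estimate how many low-dose tilts fit inside a subnanometer dose budget.
tilt_range_deg, step_deg, total_dose = 120, 3, 122   # Raven's acquisition
n_tilts = tilt_range_deg // step_deg + 1             # 41 tilt images
dose_per_tilt = total_dose / n_tilts                 # ~3.0 e-/A^2 each
budget = 35                                          # e-/A^2 (assumed midpoint)
usable = int(budget // dose_per_tilt)                # earliest ~11 tilts
half_range = (usable - 1) // 2 * step_deg            # degrees per side

print(n_tilts, round(dose_per_tilt, 1))   # -> 41 3.0
print(usable, half_range)                 # -> 11 15
```

With these assumptions only roughly the central ±15-20 degrees of tilt data stay inside the budget, consistent with the ~±20 degree figure above.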

- A/pix : 1.55
- Project folder storage : EHD
Umm?  EHD?  External hard drive? Meaning a USB-attached drive?

- Particles were picked using the convolutional neural network (convnet.py) so I do not believe there was designated overlap of particles (unless particle overlap/spacing is automatically dictated by box size in convnet.py, in which case the box size is 72)
Ok, that's fine then. 500,000 is a LOT of particles...

- I won't have access to the machine until tomorrow, but I can check if I have the CPU info saved somewhere else in the meantime
ok


- Some important context: the structure we are interested in resolving is a complex of filaments formed by small-peptide binding. In particular, we are interested in where the peptide binds to the filaments (and possibly between filaments, linking them together). I have previously tried c1 symmetry, but helical symmetry did seem to improve the resolution slightly. I have been looking into the advanced tomography tutorial (https://blake.bcm.edu/emanwiki/doku.php?id=eman2:e2tomo_p22) because I suspect the peptide is introducing breaks in the symmetry, though the filaments themselves should keep their helical symmetry. Because of the heterogeneity of the macromolecular complexes, I prioritized having more particles for 3D classification. However, I can instead proceed with filament picking/extraction/refinement if that would be better.
Hmm. So, a lot is going to depend on the level of heterogeneity present, e.g., how many different classes would you have to use to achieve structures self-consistent to some target resolution? If the filaments have consistent symmetry and are self-consistent within the length of one box, then this effectively becomes a discrete classification problem. How many potential binding sites might be present in the length of a single box size?

If the helical core is sufficient to dominate the particle alignment, then individual refinements with a fraction of the particles all against the same starting model should produce pretty self-consistent orientations, which should still be usable with post-hoc classification like the GMM. However, if there are TOO many discrete classes, you might have to do some sort of targeted classification. Further, if the helix IS impacted by the binding, the problem may become very tricky indeed...
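The "how many discrete classes" question above can be put in rough numbers. Assuming (a big assumption) that n binding sites per box are independent and each is simply occupied or empty, the class count grows as 2**n, and the ~500,000 particles from this dataset get split accordingly:

```python
# Back-of-envelope only: assumes independent, binary (occupied/empty) sites.
n_particles = 500_000  # approximate particle count from the thread

for n_sites in (1, 2, 3, 4, 6, 8):
    n_classes = 2 ** n_sites
    print(f"{n_sites} sites -> {n_classes:4d} classes, "
          f"~{n_particles // n_classes:>7,d} particles/class")
```

Even at 6-8 sites per box the per-class particle count drops into the thousands, which is where targeted classification starts to look necessary.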

Bullitt, Esther

unread,
Apr 20, 2026, 3:24:09 PMApr 20
to em...@googlegroups.com
model name  : Intel(R) Core(TM) i9-14900KF


Steve Ludtke

unread,
Apr 20, 2026, 3:40:19 PMApr 20
to em...@googlegroups.com
That's a very strange CPU for scientific computing... It is not a 32-core processor. It's an 8P+16E design: 8 performance cores and 16 efficiency cores (24 physical cores), with 32 hardware threads via hyperthreading on the P-cores. In essence you really have only 8 fast cores on the machine. How much you can get out of the "efficiency" cores is hard to guess. The number of threads a CPU supports is a nearly useless measure for scientific computing.

Bottom line is, it is almost impossible to predict effective performance. You could try running a job with 8, 16 and 24 threads and see how long each takes to run...
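The suggestion above can be scripted. A minimal timing sketch, assuming you substitute your actual refinement command for the echo placeholder (the command shown is not actually executed as a real job here):

```python
import subprocess
import time

# Hypothetical benchmark: run the same job at several --threads settings
# and compare wall-clock times. Replace the echo placeholder with the
# real command you want to test.
results = {}
for n in (8, 16, 24):
    cmd = f"echo e2spt_refine_new.py --threads={n}"  # placeholder command
    t0 = time.monotonic()
    subprocess.run(cmd, shell=True, check=True, capture_output=True)
    results[n] = time.monotonic() - t0
    print(f"{n:2d} threads: {results[n]:.2f} s")
```

With a hybrid P/E-core CPU, don't be surprised if 16 threads barely beats 8, or if 24 is slower than 16; measuring is the only reliable way to find out.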


On Apr 20, 2026, at 2:24 PM, 'Bullitt, Esther' via EMAN2 <em...@googlegroups.com> wrote:

Core(TM) i9-14900KF

Steve Ludtke

unread,
Apr 20, 2026, 4:09:52 PMApr 20
to em...@googlegroups.com
Let me reiterate this question:

- Project folder storage : EHD
Umm?  EHD?  External hard drive? Meaning a USB-attached drive?

Bullitt, Esther

unread,
Apr 21, 2026, 8:54:01 AM (14 days ago) Apr 21
to em...@googlegroups.com
Hi,

I am having trouble extracting particles for cryoET subtomogram averaging.
This is a different computer from the one in Raven's questions.

1) output from e2version.py
EMAN 2.99.47 ( GITHUB: 2023-03-04 13:31 - commit: 3f313008c3185410fe859663e763dffb9c0b6fcc )
Your EMAN2 is running on: Linux-4.18.0-553.89.1.el8_10.x86_64-x86_64-with-glibc2.28 4.18.0-553.89.1.el8_10.x86_64
Your Python version is: 3.9.16
2) exact command you ran to produce the failure, and if possible, the output from that command
e2spt_extract.py tomograms/ts_10__bin4.hdf --boxsz_unbin=512 --label=curve_00 --newlabel=test_bin4_nooverlap --threads=24 --maxtilt=100 --padtwod=2.0 --shrink=4.0 --tltkeep=1.0 --rmbeadthr=-1.0 --curves=0 --curves_overlap=0.0 --compressbits=8
3) if it relates to a single particle reconstruction, box size in pixels and microscope voltage
It is cryoET, and the tomograms are bin4, info files have individual particles as well as curve data,
I am requesting bin4 particles from the curves, and size in original tilt series would be 512
4) if running programs from the GUI, are you using e2workflow.py (deprecated) or e2projectmanager.py ?
e2projectmanager.py

Error message:

NOT Writing notes, ppid=-2
Reading from tomograms/ts_10__bin4.hdf...
CTF information exists. Will do phase flipping..
Extracting from tilt series tiltseries/ts_10.lst...
### scale by 1.0
Reading particle location from a tomogram...
Generating particles along curves...
Traceback (most recent call last):
  File "/spshared/apps/EMAN2/2.99/bin/e2spt_extract.py", line 1063, in <module>
    main()
  File "/spshared/apps/EMAN2/2.99/bin/e2spt_extract.py", line 162, in main
    do_extraction(a, options)
  File "/spshared/apps/EMAN2/2.99/bin/e2spt_extract.py", line 347, in do_extraction
    bxs=np.vstack(bxs)
  File "<__array_function__ internals>", line 5, in vstack
  File "/spshared/apps/EMAN2/2.99/lib/python3.9/site-packages/numpy/core/shape_base.py", line 283, in vstack
    return _nx.concatenate(arrs, 0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
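For context on what this error means: np.vstack raises exactly this ValueError whenever it is handed an empty list, i.e. when no boxes were generated along any curve. A minimal reproduction and a defensive pattern (an illustration only, not the EMAN2 fix):

```python
import numpy as np

# Reproduce the failure mode: no particles generated along the curves
# leaves an empty list, and np.vstack cannot concatenate zero arrays.
bxs = []
try:
    np.vstack(bxs)
except ValueError as err:
    print(err)  # need at least one array to concatenate

# Defensive pattern for scripts of your own: check before stacking.
arr = np.vstack(bxs) if bxs else np.empty((0, 3))
print(arr.shape)
```

So the traceback itself says less about EMAN2 than about the input: the curve produced zero boxes at the requested box size and overlap settings.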

In case it is useful, here is the curves section in the info/ts_10_info.json file:

"curves": [[1822.0,-526.0,148.0,0.0,0.0],
[1722.9313133254313,-560.9654188263182,148.0,0.0,0.0],
[1623.8626266508631,-595.9308376526365,148.0,0.0,0.0],
[1524.7198412774264,-630.5771967094447,149.35429421728077,0.0,0.0],
[1425.359920638713,-664.2885983547224,154.67714710864038,0.0,0.0],
[1326.0,-698.0,160.0,0.0,0.0]
],

Thanks,

Raven Gonsoulin

unread,
Apr 21, 2026, 10:05:57 AM (14 days ago) Apr 21
to EMAN2
Yes, the project folder is located on a 20T external HDD

-Raven

Steve Ludtke

unread,
Apr 21, 2026, 11:25:27 AM (14 days ago) Apr 21
to em...@googlegroups.com
Hi Raven,
Ok, again, details matter. Not all external hard drives are the same. You were asking about processing speed, and the difference in storage speed could mean a 100x difference in processing time. Rather than trying to sort out the technical details, you can get some information about your storage speed with:

cd <project folder>
dd if=/dev/zero of=x bs=1G count=10 oflag=direct
produces:
10737418240 bytes (11 GB, 10 GiB) copied, 14.9565 s, 718 MB/s

dd if=x of=/dev/null bs=1G count=10 iflag=direct
produces:
10737418240 bytes (11 GB, 10 GiB) copied, 27.3744 s, 392 MB/s

This shows that my external RAID array, mounted via NFS over a 10 Gb ethernet connection, can write at 718 MB/s and read at 392 MB/s.

If I do the same on my fast internal M.2 SSD, I get:
10737418240 bytes (11 GB, 10 GiB) copied, 2.18365 s, 4.9 GB/s
10737418240 bytes (11 GB, 10 GiB) copied, 1.48944 s, 7.2 GB/s

About 7x faster writes and 20x faster reads.

So, try this on whatever drive you're using.

I'll add that if it IS a single spinning-platter USB-connected hard drive, you are not only costing yourself a _lot_ of time, but also risking your entire project. Individual spinning-platter hard drives are NOT reliable. They are fine for carrying your data from the microscope to the computer, or for sitting on a shelf with your raw data for emergencies. Unfortunately, they have a _very_ high failure rate in general, particularly if you carry them around, and when they fail, 90% of the time they fail in a (nearly) unrecoverable way.

IF it is an external, USB-C connected SSD (not spinning platter), this is different, and you can generally get pretty decent data rates and fairly low failure rates (you still need backups). However, drives like this are quite pricey, e.g. (https://www.amazon.com/SanDisk-Portable-Backwards-Compatible-Resistance/dp/B0DN6DK3X4/ref=sr_1_1_sspa)

It is very, very important that when you do your data processing you have the data stored on the fastest available internal storage on the computer, and have a strategy for backups. This is a lesson that every student/postdoc learns at some point, when their external hard drive suddenly fails and loses them weeks or months of work...


--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to the Google Groups "EMAN2" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eman2+un...@googlegroups.com.

Steve Ludtke

unread,
Apr 21, 2026, 11:39:25 AM (14 days ago) Apr 21
to em...@googlegroups.com
Hi Esther, 
My usual response to seeing a 3-year-old install of EMAN2 is to suggest updating to something newer to see if that fixes the problem, but I'm not sure that's what's going on here.

The "curve" code in EMAN2 isn't among the better-tested workflows because it's used pretty infrequently. Just looking quickly at your output: you're asking for a 512-pixel box with 2x padding and no overlap, so 1k x 1k extracted subtilts, which are then downsampled 4x to 256. Looking at your curve, the total curve length (at full sampling) is on the order of 500 pixels. That is, the box is bigger than the entire curve? The program could easily fail in that situation, as there is at most one box to create... That's more of a particle-picking exercise than a path-tracing exercise.
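The curve-length estimate is easy to check from the JSON points posted above (dropping the two trailing zeros from each point; whether the stored coordinates are at full sampling or bin-4 is an assumption here, but either way the curve is on the order of a single box):

```python
import numpy as np

# Piecewise length of the curve from info/ts_10_info.json (x, y, z only).
curve = np.array([
    [1822.00, -526.00, 148.00],
    [1722.93, -560.97, 148.00],
    [1623.86, -595.93, 148.00],
    [1524.72, -630.58, 149.35],
    [1425.36, -664.29, 154.68],
    [1326.00, -698.00, 160.00],
])
segments = np.diff(curve, axis=0)
length = np.linalg.norm(segments, axis=1).sum()
print(f"piecewise curve length: {length:.0f} px")  # ~525 px
# With --boxsz_unbin=512 and --curves_overlap=0.0, at most one box fits
# on a curve this short, so an empty box list is entirely plausible.
```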

Anyway, not 100% sure that's the problem, but that's my first guess.


Steve Ludtke

unread,
Apr 21, 2026, 11:43:44 AM (14 days ago) Apr 21
to em...@googlegroups.com
Just so I don't get yelled at by some reader, IF you are working at a facility which has centrally managed storage, with high-bandwidth connections to the workstations, then that's fine. It doesn't absolutely have to be internal storage on the machine. However, even most centers with high-speed central storage cannot get close to the bandwidth provided by a gen 5 M.2 SSD directly on the machine, so local scratch storage can still be the way to go  :^)

Raven Gonsoulin

unread,
Apr 21, 2026, 4:21:12 PM (13 days ago) Apr 21
to em...@googlegroups.com
Wow, thank you for these very in-depth explanations. We do keep several backups of the spinning-platter hard drives and thankfully have not run into any failures yet. It seems the primary issue is with the computer itself: I checked the processing speeds of the external drives, SSD, and RAID, but all show similarly slow speeds of 100-200 MB/s. For now, we are moving the data to the internal storage of a newer machine that should handle the RAM problems. Thank you very much for your help!

-Raven

