Questions About Classification


黄振

Nov 1, 2022, 3:20:58 AM
to EMAN2
Dear EMAN2 Community,

I am trying to use classification to separate good particles from bad ones. My expected result is that one or two classes would contain most of the bad particles while the other classes contain screened particles, so that I can get a better result and a higher-resolution structure after refinement.

However, I did not get a desirable result using e2spt_refinemulti_new.py or e2spt_classify_byproj.py. I am still confused about the classification methods, and here are my questions:

1) When I use e2spt_refinemulti_new.py to classify my refined result and exclude bad particles, the particles are divided into five classes, but the number of particles in each class is almost identical. I am wondering whether the parameters I set are inappropriate.

My command is:

e2spt_refinemulti_new.py --ptcls spt_07/aliptcls3d_05.lst --niter 10 --maxres 15 --nref 5 --loadali3d --parallel thread:40

My result is:

截屏2022-11-01 15.06.57.png

They look almost the same.

2) I have tried e2spt_classify_byproj.py for PCA-based classification. The result seems better than with e2spt_refinemulti_new.py, because the number of particles differs clearly between classes. In addition, when I open classes_sec_05_00.hdf, the three results presented are markedly different.

WechatIMG1856.jpeg

However, I am confused about the HDF files the program generates, such as classes_sec_xx_00.hdf, and I do not know how to review and refine the result. (I cannot open aliptcls_xx_00_xx.hdf.) Could someone explain them in detail?

My command is:

e2spt_classify_byproj.py --path=spt_07 --ncls=5 --sym=c1 --shrink=1 --threads=40 --write3d --verbose=9 --lp=25 --saveali

My result is:

截屏2022-11-01 15.18.38.png


Thanks in advance for your kind reply!

Zhen Huang

Zhejiang University



Steve Ludtke

Nov 1, 2022, 9:57:37 AM
to em...@googlegroups.com
The point of traditional "classification" is to identify groups of particles which are similar to each other. This is true regardless of which software package you use, as they all tend to implement iterative classification in a similar way. So whether classification is a good solution depends on what you mean by "bad particles". It has become common practice in Relion to classify your data, then throw away all of the particles in the "bad" classes, but in most cases this discards large numbers of particles containing useful information about the solution variability of your particle along with the "junk". The key thing about true "bad particles", i.e., particles which are either not the particle of interest at all, or are severely corrupted by something like contamination, is that they don't look like each other. Each one is unique. So there is really no reason for them to fall together into a single class.

Now, if by "bad particles" you mean something like dissociated large subunits in an 80s ribosome prep, then yes, that type of "bad particle" can, indeed, be extracted by classification. True junk is fairly effectively eliminated through use of the various "keep" parameters in EMAN2 refinements, which allow you to discard the worst particles (temporarily) during refinement. In this situation "worst" means dissimilar by some metric to the model being refined. Unlike classification, this method will discard outliers which don't need to look like each other in any way. 
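The "keep" idea is simple to state: score every particle against the current model, then temporarily drop the worst-scoring fraction. A minimal Python sketch of that logic (illustrative only, not EMAN2's actual code; it assumes the EMAN2 convention that lower similarity scores are better, and the scores are made up):

```python
import numpy as np

def keep_best(scores, keep=0.9):
    """Return sorted indices of the best-scoring particles.

    Assumes lower (more negative) scores are better, as with EMAN2
    similarity metrics. Illustrative sketch, not EMAN2's own code.
    """
    n_keep = int(round(len(scores) * keep))
    order = np.argsort(scores)      # ascending: best scores first
    return np.sort(order[:n_keep])

# toy scores for ten particles; indices 2 and 6 are outliers with poor scores
scores = np.array([-0.9, -0.8, -0.1, -0.85, -0.7, -0.95, 0.2, -0.75, -0.88, -0.6])
print(keep_best(scores, keep=0.8))  # [0 1 3 4 5 7 8 9]
```

Note the dropped particles need not resemble each other at all; each is excluded purely for being an outlier relative to the model, which is exactly why this handles true junk better than classification.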

Having said that, emerging methods like the GMM tool in EMAN2 can map your particles to a manifold where clustering algorithms could actually be used to do some level of elimination of "bad" particles, particularly in cases where the particles themselves are reasonably structurally homogeneous. The GMM tool is now capable of working with tomography data (in the original publication it only worked with single particle data, http://www.ncbi.nlm.nih.gov/pmc/articles/pmc8363932/).

I am currently running weekly tutorial sessions on EMAN2, which are being posted to YouTube each week. We just finished the basic single particle analysis tutorial, and will be doing subtomogram averaging over the next couple of weeks, followed by the GMM tools. I will introduce the new GUI tool for the GMMs then. The current GMM tool is focused on running various scripts in conjunction with Jupyter Lab for some things. The new GUI tool is fully integrated, unless you need to do something out of the box, but I haven't actually written the tutorial for it yet.
----
Ok, I realize I haven't actually answered your original question. e2spt_classify_byproj performs subtomogram classification by running PCA on sets of 3 orthogonal slices through each (aligned) subtomogram. You can control how thick the slices are. This does not eliminate problems with the missing wedge, of course, but it does tend to reduce them a bit to permit PCA. Both the GMM and the e2spt_refinemulti approaches do a better job with the missing wedge. Indeed, the inability of PCA to deal well with missing information was a major motivation for the GMM development.
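For intuition, the by-projection idea can be sketched in a few lines of numpy: concatenate the three central orthogonal slices of each aligned volume into a feature vector and run PCA on the stack. This is a synthetic-data illustration only; the real program also handles slice thickness, filtering, and classification of the resulting factors:

```python
import numpy as np

def orthogonal_slices(vol):
    """Concatenate the three central orthogonal slices of a cubic volume."""
    c = vol.shape[0] // 2
    return np.concatenate([vol[c, :, :].ravel(),
                           vol[:, c, :].ravel(),
                           vol[:, :, c].ravel()])

def pca_project(feats, ncomp=1):
    """Project mean-centered feature vectors onto the leading principal components."""
    X = feats - feats.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:ncomp].T

# two synthetic "classes": noise volumes, half with extra density in one slab
rng = np.random.default_rng(0)
vols, labels = [], []
for i in range(20):
    vol = rng.normal(0.0, 0.1, (8, 8, 8))
    if i % 2 == 0:
        vol[4, :, :] += 1.0        # "class A" carries an extra central slab
    labels.append(i % 2)
    vols.append(vol)

feats = np.stack([orthogonal_slices(v) for v in vols])
proj = pca_project(feats)[:, 0]
labels = np.array(labels)
# the two groups fall on opposite sides of the first principal axis
print(proj[labels == 0].mean() * proj[labels == 1].mean() < 0)  # True
```

With real subtomograms the factors are of course noisier, and the missing wedge contaminates the slices, which is the limitation discussed above.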

The typical outputs from e2spt_classify_byproj in the spt_XX folder will look like:
spt_01/alisecs_02_04.hdf 19997 images in HDF5 format (swap) 252 x 84
spt_01/classes_02_04.hdf 3 images in HDF5 format (swap) 84 x 84 x 84
spt_01/classes_basis_02_04.hdf 4 images in HDF5 format (swap) 252 x 84
spt_01/classes_sec_02_04.hdf 3 images in HDF5 format (swap) 252 x 84

alisecs contains the orthogonal slices for each individual aligned particle
classes_basis contains the results of the PCA analysis on alisecs
classes_sec contains the slices through the class-averages
classes contains the actual 3-D class-averages

You can optionally ask it to save the individual aligned particles as well, if you like, but there is relatively little point in doing that.

The program ALSO generates a new file in sets/ for each class, numbered in a way that corresponds to the results above. This is probably what you're looking for...
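In case it helps, the outputs above can be inspected with e2iminfo.py, and each per-class list in sets/ can seed its own single-model refinement. A hypothetical sketch (the set and reference names are placeholders; the flags are the ones used elsewhere in this thread):

```shell
# list image counts and dimensions of the classification outputs
e2iminfo.py spt_01/classes_02_04.hdf
e2iminfo.py spt_01/classes_sec_02_04.hdf

# refine one class subset on its own; file names below are placeholders
e2spt_refine_new.py --ptcls=sets/<class0_set>.lst \
    --ref=spt_01/classes_02_04.hdf \
    --startres=30 --goldstandard --sym=c1 --parallel=thread:40
```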



--------------------------------------------------------------------------------------
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry 
Deputy Director, Advanced Technical Cores                   and Molecular Biology
Academic Director, CryoEM Core
Co-Director CIBR Center




Reza Khayat

Nov 2, 2022, 9:03:33 AM
to EMAN2
Hi Steve,

Can you please post a schedule for the YouTube STA tutorials? Thanks.

Best wishes,
Reza

Ludtke, Steven J.

Nov 2, 2022, 10:41:43 AM
to em...@googlegroups.com
I haven't broadly publicized these tutorial sessions for active participation, trying to keep the number of participants low enough for active discussion. Having said that, I've had limited success in getting people to actually volunteer to discuss their own results publicly, so I suppose I can at least open it up to people on this list for the remaining tutorials. The point of these sessions is to discuss the results of following the tutorials, so limited time is spent actually going through the process of running them. The expectation is that each user will complete the appropriate tutorial in advance of the discussion session.

The recorded videos are being posted here as they occur:

These are the tutorials. Note that the subtomogram averaging tutorial has been heavily revised to discuss new software developed over the last 2 years:

The planned (and past) schedule is Monday mornings 9:30 - ~10:30, Houston time (CST):
10/17/2022 Introduction lecture to single particle analysis and micrograph quality assessment
10/24/2022 Discussion of the single particle tutorial through 2-D class averaging
10/31/2022 Discussion of the single particle tutorial through 3-D refinement, including results of the new refinement program not in the tutorial
11/07/2022 Discussion of the new subtomogram averaging tutorial, session 1
11/14/2022 Hiatus due to travel
11/21/2022 Discussion of the subtomogram averaging tutorial, session 2
11/28/2022 Thanksgiving hiatus
12/05/2022 Discussion of the new GUI tool for GMM variability analysis (new tutorial isn't posted yet)
12/12/2022 May or may not be necessary?

Finally, if you wish to participate in person, this is the Zoom link. Please don't cross-post to other mailing lists, but you can share it directly with acquaintances.



David Boyer

May 24, 2023, 9:39:20 PM
to EMAN2
Hello all,

I am also running e2spt_refinemulti_new.py and have a question. I ran the following command:

e2spt_refinemulti_new.py --ptcls spt_class000/aliptcls3d_05.lst --niter 5 --nref 3 --maxres 15 --parallel=thread:96 --threads=96 --path spt_class000/K3/ --loadali3d

My question is: with --loadali3d (and without --skipali), does the program only do local alignments?

The reason I ask is because thus far I have done the following:
-I started processing my subtomograms by running the new initial model generator and asking for 5 classes. 
-I then ran 5 independent refinements of all my subtomograms against each of the 5 initial models. One of the 5 refinements (class000) clearly went further than the other 4 and I want to investigate it further. 
-I am fairly certain I have heterogeneity in my particle set, with particles belonging to different structures present
-Therefore, I tried the above classification approach

I am wondering if my above approach was appropriate for being able to classify all the particles that do belong to the same structure (class000) into the same class and separating them from the other particles that may come from an unknown number of other structures. I presume that local alignment may not allow for such a coarse classification of my particles. 

Thanks in advance,

David

Muyuan Chen

May 24, 2023, 9:53:48 PM
to em...@googlegroups.com
Yes, it only does local alignment with --loadali3d. You can also try it without --loadali3d, but it will be slow and may have worse convergence. It may be safer to do single-model refinements after the classification to get the orientations right.
Muyuan


David Boyer

May 25, 2023, 4:29:40 PM
to EMAN2
Thank you for your answer.

I have another question: does e2spt_refinemulti_new.py use the --keep idea at all? I notice that it still prints out the lines:

Excluding particles on 3D particle score. Now 100.0%  left
Excluding particles on 2D score. Now 100.0%  left

but there is no --keep option. I presume --keep was deemed unhelpful for multi-reference refinement/classification?

Thanks,

David

Muyuan Chen

May 25, 2023, 4:36:05 PM
to em...@googlegroups.com
No, --keep is not used. It just seemed strange to have particles not assigned to any class. I haven't really tested the option myself…
Muyuan


David Boyer

May 25, 2023, 5:23:12 PM
to em...@googlegroups.com
Thank you.

I have another question. I am now running alignment and classification for all particles against all five of the initial models generated. My command is below:

e2spt_refinemulti_new.py sptsgd_03/output_cls0.hdf sptsgd_03/output_cls1.hdf sptsgd_03/output_cls2.hdf sptsgd_03/output_cls3.hdf sptsgd_03/output_cls4.hdf --ptcls=sets/tomobox_3.lst --niter 15  --maxres 15 --parallel=thread:96 --threads=96 --path Class3D/InitialModel_K5/

If I want to stop and restart the job for any reason (e.g., to adjust --maxres or switch workstations), how would I do so?

Thanks,

David


Muyuan Chen

May 25, 2023, 5:35:29 PM
to em...@googlegroups.com
I only wrote resume options for spt_refine_new, not for the multi version. I normally only add such an option when I encounter a crash myself.

That said, I think you can just use the maps of each class at the last iteration as references, and it should work almost as well.


David Boyer

Jun 1, 2023, 8:08:58 PM
to em...@googlegroups.com
Got it, thanks.

I recently ran a multi-reference refinement with 3 classes and now I would like to refine each class individually using the corresponding particles that were assigned to each class from the multi-reference refinement. Below is a command that would take the particles and reference from class00 (the first of the three classes from the previous multi-reference refinement) and perform goldstandard refinement.

e2spt_refine_new.py --ptcls=spt_class000/K3_global/aliptcls3d_07_01.lst --ref=spt_class000/K3_global/threed_07_00.hdf --startres=50.0 --goldstandard --sym=c1 --iters=p3,t,p2,t,r --keep=0.95 --parallel=thread:96 --threads=96 --ppid=-2 --path spt_class000/K3_global/class00

My question is: does this refinement take any orientations from the previous refinements, or just the particle IDs? I notice there are options for both --loadali3d and --loadali2d, but I wasn't sure whether it is typical to use them for gold-standard refinement, or whether they would trigger local searches only.

Thanks,

David

Muyuan Chen

Jun 1, 2023, 8:26:20 PM
to em...@googlegroups.com
Make sure to use the particles and reference from the same class, i.e. aliptcls3d_07_01.lst and threed_07_01.hdf. If you do gold-standard refinement, you probably don't need --loadali2d or --loadali3d. But the local search will not trigger unless you specify --localrefine.
Muyuan

David Boyer

Jun 1, 2023, 8:45:27 PM
to em...@googlegroups.com
The particles/reference mismatch was a typo — thanks for catching it!

I have another question. How do the --goldcontinue and --continuefrom options work together in e2spt_refine_new.py? I can see in the tutorial that --goldcontinue can be used, in conjunction with specifying the appropriate reference, 3d and 2d orientations, and new --iters, to continue from the last iteration. What would be an example of how --continuefrom would be used, say, if one wanted to continue from an iteration other than the last one?

David

Muyuan Chen

Jun 1, 2023, 8:59:11 PM
to em...@googlegroups.com
--continuefrom is mainly used to rescue a crashed refinement run. In newer versions it can be used in combination with e2spt_gathermeta.py to inherit alignment information from other particles. 

David Boyer

Jun 1, 2023, 10:23:09 PM
to em...@googlegroups.com
Another question about continuing a refinement run. (This is for an e2spt_refine_new.py job, so it may be a bit off topic for this thread, which started with classification questions.) I notice that despite my previous iteration having a gold-standard resolution of ~15 A, when I continue refinement the resolution heads back down to ~30 A. Looking at the JSON file, the startres is still set to 50. Do I have to manually change the startres to 15 A? I assumed the program would automatically recognize the resolution from the iteration it is picking up from. The commands for the initial and continued refinements are below:

Initial Refinement
e2spt_refine_new.py --ptcls=spt_class000/K3/aliptcls3d_05_00.lst --ref=spt_class000/K3/threed_05_00.hdf --startres=50.0 --goldstandard --sym=c1 --iters=p3,t,p2,t,r,d --keep=0.95 --maxres=0.0 --minres=0.0 --parallel=thread:96 --threads=96 --ppid=-2 --path spt_class000/K3/class00

Continued Refinement
e2spt_refine_new.py --ptcls=spt_class000/K3/class00/aliptcls3d_06.lst --ref=spt_class000/K3/class00/threed_09.hdf --goldcontinue --sym=c1 --iters=r,p,t,r,p,t --keep=0.95 --parallel=thread:96 --threads=96 --ppid=-2 --path spt_class000/K3/class00/continue_09 --loadali3d --loadali2d spt_class000/K3/class00/aliptcls2d_09.lst

Also, I noticed that the continued refinement re-starts the numbering of the output files from 00, so I made a new directory for the output of the continued refinement so it would not overwrite the files from the initial refinement. Is there any way to get the program to output the files to continue the numbering scheme from its parent refinement?

Thanks,

David

Muyuan Chen

Jun 1, 2023, 10:28:53 PM
to em...@googlegroups.com
Yes, you need to set --startres to the resolution from the previous run manually. It does not read the number from the FSC files automatically.
--goldcontinue assumes you start a new directory. Only --continuefrom works in the same directory, since it expects a crashed refinement...
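Since the number isn't read automatically, one way to get it is to find the 0.143 crossing of the FSC curve yourself. A small sketch, assuming (as with EMAN2's fsc_*.txt outputs) a two-column spatial-frequency/FSC table; the curve below is synthetic:

```python
import numpy as np

def res_at_143(freqs, fsc):
    """Resolution (in A) where the FSC curve first drops below 0.143,
    using linear interpolation between samples. Returns None if the
    curve never crosses the threshold."""
    for i in range(1, len(fsc)):
        if fsc[i] < 0.143 <= fsc[i - 1]:
            # interpolate the crossing frequency, then invert to Angstroms
            f = freqs[i - 1] + (freqs[i] - freqs[i - 1]) * \
                (fsc[i - 1] - 0.143) / (fsc[i - 1] - fsc[i])
            return 1.0 / f
    return None

# synthetic FSC curve: smooth falloff crossing 0.143 around 1/12.5 A
freqs = np.linspace(0.005, 0.1, 20)
fsc = 1.0 / (1.0 + np.exp((freqs - 0.065) * 120.0))
print(round(res_at_143(freqs, fsc), 1))  # 12.5
```

For a real run you would load the two columns with something like `freqs, fsc = np.loadtxt("fsc_maskedtight_05.txt", unpack=True)` (the exact FSC file name varies by refinement step and version) and feed the result to --startres.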

David Boyer

Jun 1, 2023, 10:33:15 PM
to em...@googlegroups.com
Ok, I believe I have a much clearer picture now. Thank you!

David Boyer

Jul 20, 2023, 9:37:05 PM
to em...@googlegroups.com
Hi again,

I recently encountered a crash using e2spt_refinemulti_new.py (more specifically, my job ran for too long and the cluster system cancelled it; I've fixed that now). I am "re-starting" from the last iteration and wanted to make sure I did things as you might recommend. Essentially, I used e2proclst.py to gather all of the last iteration's aliptcls3d lst files into one lst file, then started the program again using the last iteration's references as input.

Initial Command

e2spt_refinemulti_new.py InitialModel/K5/output_cls0.hdf InitialModel/K5/output_cls1.hdf InitialModel/K5/output_cls2.hdf InitialModel/K5/output_cls3.hdf InitialModel/K5/output_cls4.hdf --ptcls=sets/tomobox_2_bin2.lst --niter 15  --maxres 15 --parallel=threads:64 --threads 64 --path Class3D/all_bin2_R1_K5/

Continue Command

e2spt_refinemulti_new.py Class3D/all_bin2_R1_K5/threed_01_00.hdf Class3D/all_bin2_R1_K5/threed_01_01.hdf Class3D/all_bin2_R1_K5/threed_01_02.hdf Class3D/all_bin2_R1_K5/threed_01_03.hdf Class3D/all_bin2_R1_K5/threed_01_04.hdf --ptcls Class3D/all_bin2_R1_K5/all.lst --niter 14  --maxres 15 --parallel=threads:64 --threads 64 --path Class3D/all_bin2_R1_K5/ct01/

As you can see, I only made it through one iteration (~17,000 subtomograms (3d box size 84, 2d box size 168) and 5 classes will do that, I guess!). I did not use --loadali3d because I still want to do global searches. But my question is: is there really no use in taking the alignment info for the 3d/2d particles from previous iterations when "re-starting"? I guess this is because there are no "priors" on the orientations?

Thanks!

Muyuan Chen

Jul 20, 2023, 9:49:33 PM
to em...@googlegroups.com
If you use --loadali3d, it will read the prior orientation from the particle list and also do a local alignment search. Without it, the program will do a global search at every iteration, so the orientation from the previous iteration will not be used even when provided. So using the original list from sets/ would be the same as using the list you gathered with e2proclst. Also, since it aligns each particle to all references, the process is slow.


Ludtke, Steven J.

Jul 20, 2023, 9:56:12 PM
to em...@googlegroups.com
Typically when I've looked at data sets like this, I first try a single-model refinement to get all of the particles into a fairly self-consistent orientation irrespective of class, then use e2spt_refinemulti_noali or e2gmm to classify (very fast) without alignment, then do individual refinements on each of the extracted subsets to get the orientations right. It's a much more computationally efficient process, and works well for many, but not all, problems. If your particles won't align into a fairly self-consistent orientation with a single reference, then it may not work, and the much more computationally intensive full multi-model refinement may be necessary.
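Expressed with the commands used earlier in this thread, that workflow might look roughly like the following. Every path here is a placeholder, and the --skipali multireference run stands in for the no-alignment classification step:

```shell
# 1) single-model refinement to get self-consistent orientations (placeholder paths)
e2spt_refine_new.py --ptcls=sets/tomobox.lst --ref=initmodel.hdf \
    --startres=50 --goldstandard --parallel=thread:64

# 2) classify without re-alignment, reusing the orientations from step 1
e2spt_refinemulti_new.py --ptcls=spt_XX/aliptcls3d_YY.lst \
    --nref=3 --niter=10 --maxres=15 --loadali3d --skipali --parallel=thread:64

# 3) refine each extracted class subset separately to fix up the orientations
e2spt_refine_new.py --ptcls=multi_path/aliptcls3d_ZZ_00.lst \
    --ref=multi_path/threed_ZZ_00.hdf \
    --startres=30 --goldstandard --parallel=thread:64
```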


David Boyer

Jul 21, 2023, 4:33:59 PM
to em...@googlegroups.com
Hi Muyuan,

Thanks for clarifying. I suspected that using the original list from sets/ would be the same, but now I am sure.

Steve, 

I understand your suggestion and I will give it a try - it is quite possible all the species have a similar shape and your strategy will work.

David Boyer

Jul 28, 2023, 9:27:01 AM
to EMAN2
Hi,

I am trying to do classification without alignment after a gold-standard refinement job. Any idea what went wrong?

I used the following command:

e2spt_refinemulti_new.py Class3D/all_bin2_R1_K5_cputest2/class_00/threed_08.hdf --ptcls=Class3D/all_bin2_R1_K5_cputest2/class_00/aliptcls3d_06.lst --maxres=15.0 --sym=c1 --niter=100 --nref=3  --parallel=threads:64  --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --loadali3d --skipali

The program had an error during the second iteration:

Traceback (most recent call last):
  File "/home/davboyer/cryoem/miniconda3/envs/eman2/bin/e2spt_refinemulti_new.py", line 267, in <module>
    main()
  File "/home/davboyer/cryoem/miniconda3/envs/eman2/bin/e2spt_refinemulti_new.py", line 196, in main
    save_lst_params(ali3dpms[i], ali3d[i])
  File "/home/davboyer/cryoem/miniconda3/envs/eman2/lib/python3.9/site-packages/EMAN2.py", line 2812, in save_lst_params
    if len(lst)==0: raise(Exception,"ERROR: save_lst_params with empty list")
TypeError: exceptions must derive from BaseException

And here is the tail end of the log file:

######################
#### iteration 2
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_00.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 2 --parallel threads:64  --skipali --maxres 15.0
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_01.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 2 --parallel threads:64  --skipali --maxres 15.0
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_02.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 2 --parallel threads:64  --skipali --maxres 15.0
  class 0 - 3442 particles
66.8% particles changed classes
  class 0 - 33.21% particle overlap; 0.00% particle changed orientation

Muyuan Chen

Jul 28, 2023, 10:39:43 AM
to em...@googlegroups.com
Removing the reference (Class3D/all_bin2_R1_K5_cputest2/class_00/threed_08.hdf) should work. You set --nref=3 but gave it only one reference. Either give it multiple references, or omit the references and set only --nref.
Muyuan

David Boyer

Jul 28, 2023, 12:52:54 PM
to em...@googlegroups.com
Hmm, I tried that and it actually gave me the same error. Any ideas?

Command:
e2spt_refinemulti_new.py --ptcls=Class3D/all_bin2_R1_K5_cputest2/class_00/aliptcls3d_06.lst --maxres=15.0 --sym=c1 --niter=100 --nref=3  --parallel=threads:64  --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --loadali3d --skipali

Error:
Traceback (most recent call last):
  File "/home/davboyer/cryoem/miniconda3/envs/eman2/bin/e2spt_refinemulti_new.py", line 267, in <module>
    main()
  File "/home/davboyer/cryoem/miniconda3/envs/eman2/bin/e2spt_refinemulti_new.py", line 196, in main
    save_lst_params(ali3dpms[i], ali3d[i])
  File "/home/davboyer/cryoem/miniconda3/envs/eman2/lib/python3.9/site-packages/EMAN2.py", line 2812, in save_lst_params
    if len(lst)==0: raise(Exception,"ERROR: save_lst_params with empty list")
TypeError: exceptions must derive from BaseException

Log:
######################
#### iteration 1
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_00.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 1 --parallel threads:64  --skipali --maxres 15.0
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_01.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 1 --parallel threads:64  --skipali --maxres 15.0
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_02.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 1 --parallel threads:64  --skipali --maxres 15.0
  class 0 - 1153 particles
  class 1 - 1147 particles
  class 2 - 1142 particles
e2spa_make3d.py --input Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls2d_01_00.lst --output Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_00.hdf --keep 1  --parallel threads:64 --outsize 84 --pad 168 --sym c1 --no_wt
e2proc3d.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_00.hdf Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_00.hdf  --process filter.lowpass.gauss:cutoff_freq=0.06666666666666667 --process normalize.edgemean
e2spa_make3d.py --input Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls2d_01_01.lst --output Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_01.hdf --keep 1  --parallel threads:64 --outsize 84 --pad 168 --sym c1 --no_wt
e2proc3d.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_01.hdf Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_01.hdf  --process filter.lowpass.gauss:cutoff_freq=0.06666666666666667 --process normalize.edgemean
e2spa_make3d.py --input Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls2d_01_02.lst --output Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_02.hdf --keep 1  --parallel threads:64 --outsize 84 --pad 168 --sym c1 --no_wt
e2proc3d.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_02.hdf Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/threed_01_02.hdf  --process filter.lowpass.gauss:cutoff_freq=0.06666666666666667 --process normalize.edgemean

######################
#### iteration 2
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_00.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 2 --parallel threads:64  --skipali --maxres 15.0
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_01.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 2 --parallel threads:64  --skipali --maxres 15.0
e2spt_align_subtlt.py Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliptcls3d_00.lst Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align/aliref_02.hdf --path Class3D/all_bin2_R1_K5_cputest2/class_00/skip_align --iter 2 --parallel threads:64  --skipali --maxres 15.0
  class 0 - 3442 particles
66.5% particles changed classes
  class 0 - 33.50% particle overlap; 0.00% particle changed orientation

Muyuan Chen

Jul 28, 2023, 1:02:30 PM
to em...@googlegroups.com
Can you check the results from the 1st iteration and see if there is anything abnormal in the density maps? It is quite strange that the program assigned all particles to the first class even without a reference. This does not happen normally, and it is hard to guess what went wrong from the error message alone.

David Boyer

Jul 28, 2023, 6:02:22 PM
to em...@googlegroups.com
Hi,

The only thing I notice is that the references for iteration 1 seem to be all 100% identical (density overlaps perfectly in chimera). On the other hand, the starting references seem to have some (subtle) variation in them.

Muyuan Chen

Jul 28, 2023, 6:07:57 PM
to em...@googlegroups.com
Oh. Can you update your EMAN2 to the continuous build? It might be a problem that was solved earlier.


David Boyer

Jul 29, 2023, 6:18:58 AM
to em...@googlegroups.com
Using the continuous build allows the program to run to completion (in this case 100 iterations). However, I noticed that the particles split equally among the three classes. This is expected at the beginning, but the split remained pretty much equal for the entire job. In the first few iterations <1% of particles changed classes, and after that 0% changed classes. I suppose this might not be unusual, because the refinement I am taking the particles from only went to 15 A (hence I set --maxres to 15 for the skip-align job) and maybe the particles look mostly similar at that resolution, but I wanted to get your thoughts on whether this is surprising or not.

Things I can try next:
-run e2spt_refine_new on the three sets of particles output from the skip-align job (~1,100 particles per class) to see if one set is better than the others
-try e2gmm to classify particles instead of e2spt_refinemulti_new w/ --skipali, then run e2spt_refine_new refinements on each class
-try e2spt_refinemulti_new w/ local orientation refinement (instead of --skipali) to encourage separation, then run subsequent e2spt_refine_new on each class

Muyuan Chen

unread,
Jul 29, 2023, 1:39:04 PM7/29/23
to em...@googlegroups.com
It depends on what you are expecting…

If you believe some particles are not your target proteins, you may be able to simply get rid of them by specifying a smaller --keep in the single-model refinement. The PCA-based refinement may in theory also help. You can also just take the central slices of the particles and use some single-particle methods.

If you expect large-scale structural differences, maybe go back to the initial model generation step and specify a larger --ncls. You can also specify a reference through --ref so it starts from the current result. If some class converges to something different, use that as the reference for classification later.

If there are flexible domains in your protein (lower occupancy in some regions of the density map), craft a mask to target that domain and use it for classification (--maskali in e2spt_refinemulti_new). It may also be helpful to specify a lower resolution (30 A?) to see if there are some larger-scale differences.

Muyuan

David Boyer

unread,
Jul 30, 2023, 3:01:01 AM7/30/23
to em...@googlegroups.com
Helpful ideas - will definitely keep exploring.

Ludtke, Steven J.

unread,
Jul 31, 2023, 7:59:09 AM7/31/23
to em...@googlegroups.com
Note that the e2gmm GUI is now compatible with e2spt refinements, and it can be very useful in classifying continuous or discrete variability. There is a little bit of a learning curve to get started, but it can be very powerful, and reveal relationships that classification completely misses.

---
Steven Ludtke, Ph.D. <slu...@bcm.edu>                      Baylor College of Medicine
Charles C. Bell Jr., Professor of Structural Biology        Dept. of Biochemistry 
Deputy Director, Advanced Technology Cores                  and Molecular Pharmacology
Academic Director, CryoEM Core
Co-Director CIBR Center

David Boyer

unread,
Aug 11, 2023, 4:02:37 PM8/11/23
to em...@googlegroups.com
Hi Steven,

I'm getting close to trying out gmm - will follow up soon I'm sure.

I have a different question at the moment. I was pursuing your advice to align all particles to one consensus model before pursuing strategies for dealing with heterogeneity. So I (1) performed ab initio with all particles and one class, (2) ran a gold-standard refinement with all particles against that model, and (3) ran 20 rounds of e2spt_refinemulti_new w/ local search with the command below, taking the orientations from the 8-iteration gold-standard refinement in step (2):

e2spt_refinemulti_new.py --ptcls=Class3D/all_bin2_R1_K1/aliptcls3d_06.lst --maxres=10.0 --sym=c1 --niter=20 --nref=3 --parallel=mpi:192:/eisenberg1/scratch/davboyer/  --path Class3D/all_bin2_R1_K1/K3_localrefine --loadali3d --loadali2d Class3D/all_bin2_R1_K1/aliptcls2d_08.lst

In the last few iterations, I noticed that particles were beginning to partition into different classes, as opposed to all classes having equal particle numbers in the first ~15 iterations. I wanted to continue the job to see if the partitioning would continue. I used e2proclst.py to gather the aliptcls3d from all three classes into a new file, but I wasn't sure what to do with the aliptcls2d files, since for every class the aliptcls2d file seems to contain all the 2D particles. If I combine them, I get a .lst file with 3x the number of 2D particles. My goal is to continue from the latest 3D and 2D orientations, to mimic continuing the run as if it never stopped.

I'm curious why the 2d lst files contain all particles while the 3d lst files contain only the particles for a given class. I'm probably missing something simple...
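(An aside for anyone hitting the same 3x-duplication issue: merging the per-class .lst files while keeping only the first occurrence of each particle can be sketched in a few lines of plain Python. This is a hypothetical helper, not part of EMAN2, and it assumes the simple one-data-line-per-particle "index<TAB>source<TAB>extra" LST layout with '#'-prefixed header lines.)

```python
# Hypothetical helper (not an EMAN2 function): merge several LST
# files and drop duplicate (index, source) entries, keeping the
# first occurrence.  Assumes one data line per particle of the form
# "index<TAB>source<TAB>extra", with '#'-prefixed header/comment lines.
def merge_lst_lines(lst_contents):
    seen = set()
    merged = []
    for lines in lst_contents:
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip headers and blank lines
            fields = line.split("\t")
            key = (fields[0], fields[1])  # (particle index, source file)
            if key not in seen:
                seen.add(key)
                merged.append(line)
    return merged

# Two per-class files that both list particle 0 collapse to a single
# entry for it, plus the unique entries from each class:
class_a = ["#LSX", "0\tparticles/x.hdf\t{}", "1\tparticles/x.hdf\t{}"]
class_b = ["#LSX", "0\tparticles/x.hdf\t{}", "2\tparticles/x.hdf\t{}"]
print(len(merge_lst_lines([class_a, class_b])))
```

The real .lst files carry per-particle alignment JSON in the extra field, which this sketch preserves untouched.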

Thanks in advance!



Ludtke, Steven J.

unread,
Aug 15, 2023, 9:58:27 AM8/15/23
to em...@googlegroups.com, Muyuan Chen
Hi David, sorry for the slow reply. Son is heading off to college for the first time today and things have been a bit hectic. I do see your point here. Personally, I would probably just use the resulting maps as a set of starting models for a new run rather than "continuing", but you're right that it seems like there should be a way to continue.  May need to ask Muyuan about what he intended here...


David Boyer

unread,
Aug 17, 2023, 6:09:17 PM8/17/23
to em...@googlegroups.com
Hi Steven,

No problem, I hope everything went well!

In the meantime, I started playing with e2gmm.py. I have a few questions on that.

I initially ran a job (17k subtomos, ~25 tilts each) using pretty default parameters. I had 124 total gaussian blobs (I think ~80 positive and ~40 negative). And several other ~default parameters (shown in the attachment pg. 1). I ran into the problem you call out in the YouTube video of having too many points along a straight line - so I followed your advice and re-ran "Train GMM" with 40 iterations and 0.3 Model Perturb (as opposed to 10 and 0.02 (?)). The points have spread out quite well now (see attachment), but there are still a few areas where there is a straight line in the plot.

I'm wondering, should I try to train the same run further? And, in general, if I click "Train GMM" more than once for a given run - does that start from scratch or does it start from where a previous training left off? For instance, in my example, I think the training may not need another 40 iterations (perhaps 50 iterations with model perturb 0.3 would have done the job). Should I train for 10 more iterations, or 50?

Further, you mention that you may have the model perturb set high to get the points spread out at the beginning, but then lower the model perturb (?) and run more iterations to improve things. I wasn't sure how to go about doing that sort of iterative process. Is that all in the same "run", starting with high model perturb and # of iterations and then running more iterations with a lower model perturb? Or with different runs?

Thanks in advance,

David

GMM_Notes.pptx

Ludtke, Steven J.

unread,
Aug 17, 2023, 8:02:38 PM8/17/23
to em...@googlegroups.com
Hi David

On Aug 17, 2023, at 5:08 PM, David Boyer <davb...@g.ucla.edu> wrote:

Hi Steven,

No problem, I hope everything went well!

In the meantime, I started playing with e2gmm.py. I have a few questions on that.

I initially ran a job (17k subtomos, ~25 tilts each) using pretty default parameters. I had 124 total gaussian blobs (I think ~80 positive and ~40 negative).
The ratio of negative to positive blobs is a bit higher than I would usually target. Don't have a really good feel for the impact of this parameter, but I would usually target more like 10-20%. It may be harmless other than speed.

And several other ~default parameters (shown in the attachment pg. 1). I ran into the problem you call out in the YouTube video of having too many points along a straight line - so I followed your advice and re-ran "Train GMM" with 40 iterations and 0.3 Model Perturb (as opposed to 10 and 0.02 (?)). The points have spread out quite well now (see attachment), but there are still a few areas where there is a straight line in the plot.
The number of iterations to converge well seems to vary widely by project, and sometimes just from one run to the next due to how it was initialized. I've seen cases where 100-150 iterations were needed to give something I was really happy with, but most of the time 20-50 seems to be fine.


I'm wondering, should I try to train the same run further? And, in general, if I click "Train GMM" more than once for a given run - does that start from scratch or does it start from where a previous training left off?
As long as you DON'T hit the Resolution or Train Neutral buttons again, the GMM training will continue from where it left off. It is important not to change any of the parameters other than the number of iterations when doing this, though, or you may run into a crash. If you run Train GMM with 50 iterations and hit the button again with 50 iterations, it will run for another 50, for a total of 100. You could probably also change the regularization parameters, but you can't mess with the masks or the selection of Translation/Amplitude, etc., or it will fail.

If you press Train Neutral, you will be starting again from scratch, but if doing this intentionally, I would suggest creating a new GMM rather than overwriting an existing one. 


For instance, in my example, I think the training may not need another 40 iterations (perhaps 50 iterations with model perturb 0.3 would have done the job). Should I train for 10 more iterations, or 50?
If you run 50 iterations and are still getting strong lines, I would go for another 50. There are certain combinations of parameters which tend to make this sort of training failure more likely, but it's hard to describe all of the various parameters, and I'm not sure we understand all of the possible causes ourselves. Generally speaking it's an indication of having too many degrees of freedom available for the variability present in the system.


Further, you mention that you may have the model perturb set high to get the points spread out at the beginning, but then lower the model perturb (?) and run more iterations to improve things. I wasn't sure how to go about doing that sort of iterative process. Is that all in the same "run", starting with high model perturb and # of iterations and then running more iterations with a lower model perturb? Or with different runs?
That is one of the few parameters you can alter between "Train GMM" presses.



David Boyer

unread,
Aug 18, 2023, 4:36:26 PM8/18/23
to em...@googlegroups.com
Thanks Steven - very helpful!

I was also thinking, using the little intuition I have, that a higher positive:negative ratio might be better. I'll play with this.

I'm wondering, is it possible to run Train GMM from the command line? My workstation is good for using the GUI and short- to medium-load jobs, but it might be nice to leverage one of our cluster nodes for Train GMM.

David

--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2

---
You received this message because you are subscribed to a topic in the Google Groups "EMAN2" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/eman2/f3YFCEo0mJI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to eman2+un...@googlegroups.com.

Ludtke, Steven J.

unread,
Aug 18, 2023, 5:35:07 PM8/18/23
to em...@googlegroups.com
On Aug 18, 2023, at 3:36 PM, David Boyer <davb...@g.ucla.edu> wrote:

Thanks Steven - very helpful!

I was also thinking, using the little intuition I have, that a higher positive:negative ratio might be better. I'll play with this.

I'm wondering, is it possible to run Train GMM from the command line? My workstation is good for using the GUI and short- to medium-load jobs, but it might be nice to leverage one of our cluster nodes for Train GMM.
Maybe? But unless you have a very high-end single GPU on the node, it won't help a lot. i.e., it's GPU-limited, and can use only a single GPU device (though some configurations merge several GPUs into a single "virtual" GPU).

If you look at ".eman2log.txt" in the project directory, you should find the e2gmm_refine_point commands it's actually running. The same command is used for multiple steps, so you'll want to look at the last two commands after using the button. However:

It won't work easily if the number of batches > 1. If it is, you have to run multiple commands (one per batch).

The _aug file is generated internally by e2gmm.py, and I'm not sure if there is a good way to trigger this right now if you do the runs outside the program...
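(Pulling those commands out of the log can be done with a trivial filter; a hedged sketch below, assuming each logged command sits on its own line in .eman2log.txt. `last_commands` is just an illustrative name, not an EMAN2 utility.)

```python
# Illustrative sketch: grab the most recent e2gmm_refine_point
# invocations from an .eman2log.txt-style log.  Assumes each logged
# command sits on its own line; adjust if your log wraps commands.
def last_commands(log_text, needle="e2gmm_refine_point", n=2):
    hits = [ln.strip() for ln in log_text.splitlines() if needle in ln]
    return hits[-n:]

log = """e2spt_refine_new.py --ptcls foo.lst
e2gmm_refine_point --model a.txt --niter 20
e2gmm_refine_point --model a.txt --niter 40
"""
print(last_commands(log))
```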



David Boyer

unread,
Aug 18, 2023, 7:36:01 PM8/18/23
to em...@googlegroups.com
Got it - I'll keep things simple for now then.

I ran 60 iterations with perturb 0.3 and was able to eliminate lines. What do you think of the result and would you advise running more iterations with a lower perturb value?

image.png


David Boyer

unread,
Aug 18, 2023, 9:17:50 PM8/18/23
to em...@googlegroups.com
I also have a follow-up question.

I went ahead and built 4 sets using k-means clustering (w/ axes 2-5). (The first two sets seen on the right were manually made from the cluster of particles at the top of the plot.) I then hit "build map" for all sets. Sets 2-5 have a lot of particles (~25 tilts per subtomo), and they give artificially high resolution estimates and correspondingly very "oversharpened"-looking maps. The pixel size right now is 3.032 and the box size is 84 (everything is bin2 at the moment), so a resolution of 2.2 is even beyond Nyquist. Any idea what's going on? I'm thinking of running a gold-standard refinement after saving each set to see what things look like - open to other suggestions as well.

Thanks!

image.png

Ludtke, Steven J.

unread,
Aug 18, 2023, 11:19:57 PM8/18/23
to em...@googlegroups.com
That looks pretty good, ie - it looks like it has found some order to the variability. If you drag the cursor around the edge of the ring where the population seems higher, does the dynamic model seem to change in an interesting way?

You're also only showing us the PCA axes (0/1). More features might be present if you look at axes 2 vs 3 vs 4 vs 5 in various ways. 



On Aug 18, 2023, at 8:17 PM, David Boyer <davb...@g.ucla.edu> wrote:

I also have a follow-up question.

I went ahead and built 4 sets using k-means clustering (w/ axes 2-5). (The first two sets seen on the right were manually made from the cluster of particles at the top of the plot). I then hit "build map" for all sets. Sets 2-5 have a lot of particles (~25 tilts per subtomo) and they give artificially high resolution estimates and correspondingly very "oversharpened"-looking maps. The pixel size right now is 3.032 and box size is 84 (everything is bin2 at the moment), resolution of 2.2 is even beyond the Nyquist. Any idea what's going on? Thinking of running gold-standard refinement after saving each set to see what things look like - open to other suggestions as well.

It looks like you are using the last release version?  The SPT pipeline hasn't changed a lot since then, but e2gmm has been very heavily developed this year. At the time of the last release we had just started doing the reconstructions, and I think the "resolutions" might have been computed in pixels? Even now, the resolution column is not a proper "gold standard" resolution, as there is no independent refinement of the orientations in the two halves of the data. It does give some relative indication of the amount of variability in the various classes. If you want a true resolution from one of the subsets you need to run a regular SPT refinement using the subset.  Regardless, as with any of the tools we're actively developing, you'll get a bunch of bugfixes and several new features if you upgrade to a current snapshot version. 



David Boyer

unread,
Aug 19, 2023, 5:29:56 PM8/19/23
to em...@googlegroups.com
Hi Steven,

I updated to EMAN 2.99.52 ( GITHUB: 2023-08-19 13:08 - commit: 040cadc1f ). I had been using the continuous build on our cluster as per Muyuan's suggestion to deal with a different problem, but forgot to do so on my own workstation.

However, I still see the same problem with the resolution estimates during build map. I know these resolution estimates and maps aren't definitive, but they do seem helpful for analyzing gmm results. I wonder if there is something else I might be doing wrong?

Also, I do indeed see clustering in other plots comparing different axes - for instance, the one below. However, I see another line popping up in this view, so I should perhaps run some more training iterations (I'm at 60 total with --perturb 0.3 at the moment). The variability when hovering over different regions does seem interesting, i.e., there do appear to be different morphologies of the structure around a more or less consistent central core. I put a second plot below as well that shows a pretty similar pattern. None of the other axis views are radically different from the patterns I've sent so far. From what I can tell, the variability seems legitimate and worth pursuing further until proven otherwise. Very cool program, btw!

image.png
image.png


Ludtke, Steven J.

unread,
Aug 19, 2023, 6:39:14 PM8/19/23
to em...@googlegroups.com

On Aug 19, 2023, at 4:29 PM, David Boyer <davb...@g.ucla.edu> wrote:

Hi Steven,

I updated to EMAN 2.99.52 ( GITHUB: 2023-08-19 13:08 - commit: 040cadc1f ). I had been using the continuous build on our cluster as per Muyuan's suggestion to deal with a different problem, but forgot to do so on my own workstation.

However, I still see the same problem with the resolution estimates during build map. I know these resolution estimates and maps aren't definitive, but they do seem helpful for analyzing gmm results. I wonder if there is something else I might be doing wrong?
If the resolution number is coming up past Nyquist, that would seem to imply a problem with the A/pix value somewhere. If you use e2iminfo -H or the Info window in the browser to look at the per-particle tilt series, does the A/pix value show correctly?

If the value is showing up at exactly Nyquist (or possibly Nyquist*sqrt(2)), then the FSC may just not be falling below the 0.5 threshold. Again, these values date back to the old pre-gold-standard refinements everyone was doing in the 2000s, where model bias was allowed to influence the resolution estimates. While the particles are split even-odd, they are not split even-odd in the same way as they were when the orientations were refined. This can produce significant model bias, which will be exacerbated by the classification pulling in more similar particles. Don't read too much into it. If you want a resolution, do a new gold-standard refinement with an extracted subset of particles.
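(For reference, the arithmetic behind the Nyquist check is simply resolution limit = 2 * apix, so any estimate better than that has to be an artifact. A trivial sketch:)

```python
# The best resolution representable at a given sampling is 2 * apix
# (the Nyquist limit).  With the 3.032 A/pix in this thread, nothing
# better than ~6.06 A can be a genuine resolution estimate, so a
# reported 2.2 A is necessarily an artifact.
def nyquist_limit(apix):
    return 2.0 * apix

print(nyquist_limit(3.032))  # ~6.06 A
```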


Also, I do indeed see clustering in other plots comparing different axes. For instance, the one below. However, I see another line popping up in this view. So I should perhaps run some more training iteration (I'm at 60 total iterations with --perturb 0.3 at the moment). The variability when hovering over different regions does seem interesting, i.e., there do appear to be different morphologies of the structure around a more or less consistent central core. I put a second plot below as well that shows a pretty similar pattern. None of the other axis views are that radically different from the patterns I've sent here so far. From what I can tell, the variability seems legitimate and worth pursuing further until proven otherwise. Very cool program btw!

<image.png>
<image.png>


I wouldn't worry too much about little bits of lines like that involving a few particles. It does look like you are getting some decent separation. Note that in the newer version, when making sets manually, instead of only using a fixed radius you can also use a fixed number of particles. It can also generate sets along lines. K-means and other classification methods are good because they can span higher-dimensional spaces, but manual classification may better help associate visible features in the latent space with structural changes. Specifying a number rather than a radius will give reconstructions with more comparable "resolutions".

David Boyer

unread,
Aug 19, 2023, 8:32:14 PM8/19/23
to em...@googlegroups.com
The A/pix seems to be correct in the particle tilt series; the result of e2iminfo.py -H on a 3D particle is below. Nyquist is ~6 A, and the build-map result consistently says 2.2 A - although that is not the case for sets 0 and 1, which are much smaller.

I looked at the header info of all images in the set_maps.hdf, set_even.hdf, and set_odd.hdf files. I was wondering whether, because I did both quick map and build map, there may have been confusion somewhere. I don't think I found anything that explains much, but for some of the maps I get a strange combination of box size and pixel size (box size 56 and apix 6.064). Is this because quick map may use a different binning/box-size combination (i.e., bin X may not give 1/X the original box size)?
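(One quick way to spot that kind of mismatch - a hedged sanity check, not an EMAN2 routine: binning by a factor f should divide the box size by f and multiply apix by f, so the physical box in Angstroms stays constant.)

```python
# Sanity check: binning by f maps (nx, apix) -> (nx / f, apix * f),
# so nx * apix (the physical box size in Angstroms) should not change.
def physical_box(nx, apix):
    return nx * apix

print(physical_box(84, 3.032))  # ~254.7 A, the unbinned box
print(physical_box(42, 6.064))  # ~254.7 A, a consistent bin-2 box
print(physical_box(56, 6.064))  # ~339.6 A, the inconsistent combination seen here
```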

Also, I checked the ratio of particles with class "0" or "1" in the sets I created to make sure there wasn't anything strange. There's pretty much 50/50 class 0/1, so I don't think that's the problem.

Also, in the particles.lst file in the gmm folder, I notice all the "defocus" values are 0. Is this because the defocus values are retrieved from the tomogram JSON files in the info directory (or from somewhere else)?


HostEndian: little
ImageEndian: big
all_int: 0
apix_x: 3.0320000648498535
apix_y: 3.0320000648498535
apix_z: 3.0320000648498535
changecount: 0
class_ptcl_idxs: [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74]
class_ptcl_src: particles/98__tomobox_2_bin2.hdf
datatype: 2
file_twod: particles/98__tomobox_2_bin2.hdf
is_complex: 0
is_complex_ri: 0
is_complex_x: 0
is_fftodd: 0
is_fftpad: 0
maximum: 4.902544021606445
mean: -0.005161418579518795
mean_nonzero: -0.005161418579518795
minimum: -5.806265354156494
model_id: 0
npad: 1
nx: 84
ny: 84
nz: 84
origin_x: 42.0
origin_y: 42.0
origin_z: 42.0
ptcl_source_coord: [416.0, -144.0, -32.0]
render_compress_level: 1
sigma: 0.9999999403953552
sigma_nonzero: 0.9999999403953552
source_n: 0
source_path: 98__tomobox_2_bin2.hdf
square_sum: 592718.75
stored_renderbits: 8
stored_rendermax: 4.250835418701172
stored_rendermin: -4.284306526184082
stored_truncated: 53


Ludtke, Steven J.

unread,
Aug 19, 2023, 9:10:48 PM8/19/23
to em...@googlegroups.com
Were the strange numbers "quick" reconstructions or regular reconstructions?



David Boyer

unread,
Aug 19, 2023, 9:19:32 PM8/19/23
to em...@googlegroups.com
The quick.

Steve Ludtke

unread,
Aug 19, 2023, 9:25:08 PM8/19/23
to em...@googlegroups.com
Probably a bug related to the downsampling then...

-----------------------------------------
Steven Ludtke, slud...@gmail.com 

