Removing bad particles


ashuthe...@gmail.com

Jan 26, 2017, 12:40:32 PM
to EMAN2
Hi all,

I am new to both the theory and the practice of cryo-EM, so I apologize in advance if my questions are trivial.

I am working towards the structure of a 300 kDa protein. Using ~20K particles, I made 2-D classes in ISAC and used those classes to get an initial model with VIPER. I am now refining this initial model in EMAN2 against ~50K particles. However, this way I have no opportunity to discard bad particles. One of the EMAN2 tutorials I am following suggests marking bad particles at the 2-D class level. How do I do this when I am using ISAC for the 2-D classes? Also, what is the best way to discard bad particles: visual inspection, or are there numerical criteria for this purpose?

Thanks a lot.

Ashu

Paul Penczek

Jan 26, 2017, 12:46:45 PM
to em...@googlegroups.com
Hi

Have you tried following the tutorial found here:

--
--
----------------------------------------------------------------------------------------------
You received this message because you are subscribed to the Google
Groups "EMAN2" group.
To post to this group, send email to em...@googlegroups.com
To unsubscribe from this group, send email to eman2+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/eman2


ashuthe...@gmail.com

Jan 26, 2017, 1:05:53 PM
to EMAN2
Thanks, Pawel! I haven't tried the tutorial from there. I will follow it.

Thanks,
Ashu

Paul Penczek

Jan 26, 2017, 1:12:23 PM
to em...@googlegroups.com
There you will find very specific step-by-step instructions, all GUI-driven (including ISAC and VIPER), ending with maximum-likelihood structure refinement, along with many hints on how to deal with bad particles.

Regards,
Pawel

Steve Ludtke

Jan 28, 2017, 12:48:59 AM
to em...@googlegroups.com
I should add that if your goal is to run a 3-D refinement with EMAN, you don't use class-averages, you use the original particles. You can use a starting model from viper if you like, of course, but the refinement itself needs to use the EMAN CTF phase-flipped particles.

Also, if you follow any of the EMAN tutorials from the last year, you will see that we have a new bad-particle removal process which has worked extremely well in general. Again, though, you have to be refining from particles, not class-averages for this to work.

----------------------------------------------------------------------------
Steven Ludtke, Ph.D.
Professor, Dept. of Biochemistry and Mol. Biol.                Those who do
Co-Director National Center For Macromolecular Imaging            ARE
Baylor College of Medicine                                     The converse
slu...@bcm.edu  -or-  ste...@alumni.caltech.edu               also applies
http://ncmi.bcm.edu/~stevel


ashuthe...@gmail.com

Jan 28, 2017, 7:46:27 PM
to EMAN2

Hi Steve,

Thanks for your response. I am sorry again if I am asking something really basic and trivial.

Using ~20K particles, I generated 2-D classes in ISAC and used those class-averages to generate an initial model in VIPER. The initial model looks good. Then I used ~50K EMAN CTF phase-flipped particles to refine the initial model. The refinement runs fine, but the resolution is stuck at ~10 Å. Since all of the particles are used during refinement, I was wondering whether there is a way to exclude the bad ones. I am following the EMAN tutorial from Oct 2015. If I understood correctly, it describes how to discard bad particles by making class-averages and then discarding the bad class-averages. (Is that right?) The e2version.py output is as follows:

EMAN 2.12 (CVS 2015/10/19 09:00:04)
Your EMAN2 is running on:  Linux-2.6.32-642.11.1.el6.x86_64-x86_64-with-redhat-6.8-Santiago 2.6.32-642.11.1.el6.x86_64 x86_64
Your Python version is:  2.7.3



I have another question. I am running the refinement described above on a local system. However, if I do the CTF correction and phase-flipping with e2ctf_auto.py, which is available on a cluster, the only file written to the sets folder is all_ptcls.lst. It does generate the individual low-pass (lp) filtered files, but it doesn't make a list of them. And if I use all_ptcls.lst to refine the same initial model against ~50K particles, I just get background: straight rods! The e2version.py output on the cluster is:
EMAN 2.12 (GITHUB: BUILD_DATE)
Your EMAN2 is running on: Linux-3.10.0-327.el7.x86_64-x86_64-with-redhat-7.2-Maipo 3.10.0-327.el7.x86_64
Your Python version is: 2.7.3

What could be going wrong here? I now have ~100K particles, and I would like to use all of them for the refinement, which I want to run on the cluster rather than on the local system.

Thanks so much

Steve Ludtke

Jan 28, 2017, 8:43:26 PM
to em...@googlegroups.com
Hi. "Bad particles" fall into different types. Say you have a lot of ice contamination, or a second molecular species present in your set of boxed particles. These sorts of particles can generally be classified in 2-D reasonably well, and then the particles in the identified bad classes discarded. That is the method described in the 2015 tutorial. This method does nothing about getting rid of "pure noise" bad particles, or other types of bad particles with few features in common. In 2-D classification, these particles generally get randomly classified with other views (though turning off the normproj checkbox in 2-D class-averaging can help).
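
As a rough illustration of the class-based discard described above, here is a minimal Python sketch. It assumes you already have, for each particle, the number of the 2-D class it was assigned to, and a set of class numbers you judged to be bad by eye. The function name and the list layout are hypothetical and are not an EMAN2 or ISAC file format or API.

```python
def drop_particles_in_bad_classes(class_of_particle, bad_classes):
    """Return the indices of particles whose 2-D class is not flagged as bad.

    class_of_particle: list where entry i is the class number of particle i
                       (hypothetical layout, not an EMAN2 file format).
    bad_classes: set of class numbers judged bad by visual inspection.
    """
    return [i for i, cls in enumerate(class_of_particle) if cls not in bad_classes]


# Toy example: 8 particles assigned to 3 classes; class 2 looks like ice.
class_of_particle = [0, 2, 1, 1, 2, 0, 1, 2]
kept = drop_particles_in_bad_classes(class_of_particle, bad_classes={2})
print(kept)  # [0, 2, 3, 5, 6]
```

Note that a noise particle that was randomly assigned to a good class survives this filter, which is the limitation mentioned above.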

The 2016 summer tutorial (and the new EMAN2.2 tutorial) adds a new method for identifying bad particles, which is basically automatic and gets rid of pretty much any type of bad particle with a high degree of accuracy, UNLESS your particle is too small (<100-200 kDa) or your ice is way too thick. The problem historically with identifying "noise" particles is that as you get closer to focus, the high resolution data gets better, but the particles look more like pure noise. This means that using any single similarity metric to identify bad particles based on weak contrast will start throwing away your best data along with the stronger noise.

The new approach uses a multi-resolution approach (explained in the tutorial) which we have found to do an extremely good job at getting rid of particles which don't contribute to a high resolution refinement, without producing model bias. This approach did not become available until summer, 2016, and was not automated until just before the new 2.2 release version.
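
To make the reasoning concrete, here is a toy Python sketch of why scoring particles in more than one resolution band can help. It is not the EMAN2 algorithm (the tutorial is the authority on that). The per-particle scores are invented numbers, the two "bands" are hypothetical, the 0.6 cutoff is arbitrary, and plain 2-means clustering is swapped in only as the simplest way to split particles into two groups.

```python
def two_means(points, iterations=20):
    """Split points (tuples of per-band scores) into two groups with k-means, k=2.

    Returns one label per point: 1 for the group seeded from the
    highest-scoring particle, 0 for the group seeded from the lowest.
    """
    # Deterministic seeding: lowest and highest total score.
    centres = [list(min(points, key=sum)), list(max(points, key=sum))]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min((0, 1), key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centres[k])))
            for pt in points
        ]
        # Update step: move each centre to the mean of its members.
        for k in (0, 1):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented (low-resolution score, high-resolution score) pairs, higher = better.
scores = [
    (0.90, 0.50), (0.85, 0.45), (0.80, 0.55),  # good particles, strong contrast
    (0.50, 0.60),                              # good particle taken close to focus
    (0.30, 0.05), (0.25, 0.00), (0.35, 0.10),  # noise / contamination
]

# A single low-resolution cutoff (0.6 is arbitrary) drops the close-to-focus particle.
single_metric_keep = [low > 0.6 for low, _high in scores]
# Using both bands together keeps it.
multi_band_keep = [label == 1 for label in two_means(scores)]

print(single_metric_keep[3], multi_band_keep[3])  # False True
```

The close-to-focus particle has weak low-resolution contrast but good high-resolution agreement, so only the two-band view groups it with the good particles.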

Indeed, 2.2 has not been officially announced yet because so far we have only managed to get the optimized Linux 64 bit binaries on the website. If you are compiling from source, you can, of course, download from GitHub. We expect to get the rest of the binaries completed over the next week, and then there will be a formal announcement. There are a LOT of improvements since the 2.12 release, which was over a year ago. The new bad particle mechanism is perhaps the least of these.

As to the question about e2ctf_auto, it seems very likely that the version you have on the cluster was checked out from GitHub (unfortunately e2version doesn't work correctly in this case, so we can't tell exactly when that copy was downloaded). It is very likely that this version predates the completion of e2ctf_auto. When you download nightly snapshots, some of the programs you get may still be works-in-progress. We don't hide any of our code, so when someone starts working on a new program it appears almost immediately in github. It won't normally get included in the bin/ folder until it's considered complete, but there will normally be a 'beta testing' period, where the new program is in bin, but not 100% functional yet...



ashuthe...@gmail.com

Jan 28, 2017, 10:48:23 PM
to EMAN2
Hi Steve,

Thanks so much for your detailed reply. I am working on a 275 kDa protein. It is a monomer and has no molecular symmetry. I also have severe ice problems, and between the flexibility of the molecule (as predicted from the sequence) and the ice contamination, swarm wasn't working very effectively, so I ended up picking ~100K particles manually. Since all of the particles were picked manually, there is likely to be very little ice contamination among them. However, I am concerned about contaminants with similar or partially similar projections.

I have now found the EMAN2.2 tutorial and am going through it.

Thanks a lot !
Ashu

Steve Ludtke

Jan 28, 2017, 11:21:48 PM
to em...@googlegroups.com
I should also add that there is a new e2boxer in EMAN2.2. It currently has 3 different autopickers, one of which is neural network based. This one, in particular, seems to do a very nice job on tricky boxing problems. You train it with both good particles and various regions which do not contain particles. This allows you to train it to skip things like ice contamination. Alas we don't have a tutorial for it yet, but you may be able to figure it out.  Of course if you've already picked them all, this may not be a priority :^)

Note that, particularly for smaller particles or those with flexibility, these things are key:
1) the ice must be as thin as you can possibly make it without disturbing the particles
2) you should be using a K2 detector in counting mode. While this may not produce images which are a lot better than other direct detectors, for a tricky project it can be the difference between success and failure
3) If you have some micrographs where the particles have marginal contrast, _throw them away_!  Do not be tempted to keep them "because they may help". What they will actually do is make any flexibility analysis more ambiguous and make your structures look worse. 

The new bad particle detection method also can do a good job at identifying particle sets which have marginal contrast. When you make the plot shown in the tutorial, a data set with very good contrast will show some separation between the good and bad lobes in the plot. A marginal data set will show a kink in the distribution but there may not be a clear separation between the regions. If you see a horizontal distribution with no clear bend and no clear separation, that's an indication that you are in dangerous territory, and you will need to be very cautious in interpretation. In this domain you may be able to get a single reliable map out still, but dynamics analysis may not be very trustworthy.
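
If you want a number to go with the visual impression of "separated lobes", one possible sketch in Python is below. It assumes you have already split a per-particle score into a "good" and a "bad" group. The function name, the score values, and the choice of statistic (gap between the group means in units of their average spread) are all assumptions for illustration. This is not a quantity reported by EMAN2, and it does not replace looking at the plot.

```python
from statistics import mean, pstdev


def lobe_separation(good_scores, bad_scores):
    """Gap between the two group means, in units of their average spread.

    Hypothetical diagnostic, not something EMAN2 computes. Larger values
    mean the two groups overlap less.
    """
    spread = (pstdev(good_scores) + pstdev(bad_scores)) / 2
    if spread == 0:
        return float("inf")  # both groups have zero width
    return abs(mean(good_scores) - mean(bad_scores)) / spread


# Invented scores: tight, distant groups versus broad, overlapping ones.
well_separated = lobe_separation([0.80, 0.85, 0.90], [0.20, 0.25, 0.30])
marginal = lobe_separation([0.50, 0.60, 0.70], [0.40, 0.50, 0.60])
print(well_separated > marginal)  # True
```

A small value here corresponds to the "kink but no clear separation" case described above, where interpretation needs more caution.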

