Q: I refined my structure in (EMAN1, BSOFT, SPIDER, ...) and got subnanometer resolution, but when I refine using the new "gold standard" in EMAN2.1, I'm only getting 15 Å. My structure looks right, and I think I see alpha helices. What's wrong with EMAN2.1!?!?
A: I have gotten variations of this question from several different people recently. Not all were subnanometer, not all of the discrepancies were this large, and most people were more tactful than I've been to myself above, but it's a very important question right now, so I have a pretty extensive answer:
{preach mode on} Note that there have been several controversial structures published recently in the cryo-EM community (one specific one will probably come to mind), and these are beginning to give the entire field a pretty negative public perception. Despite the fact that probably 90% of published structures are correct, people outside the field are beginning to question whether anything done by cryo-EM is trustworthy. While many other fields probably also need a similar 'housecleaning', in general I think the biological sciences need to do everything they can to verify the accuracy of their scientific conclusions before publishing, rather than just going for the splashy press release you think will help you get funded. {preach mode off}
When you have a structure which appears correct and can be reasonably interpreted at the resolution you believe you have achieved, it can be very disturbing when some new-fangled algorithm tells you there is something wrong, whether that is the gold standard, tilt validation, or even MolProbity scores. In this rather lengthy post, I'm not going to explain the gold standard again, but rather discuss some specific reasons why your resolution may not behave as you expect.
In the original IP3R structure we solved ~12 years ago, we convinced ourselves we were correct, and by all of the measures in common use at that time, we were. However, it was later demonstrated (by ourselves, thankfully) to be complete rubbish. When we published the second (correct overall) structure of IP3R at ~10 Å resolution, we also had docked crystal structures and came up with biological stories that made excellent logical sense. However, when we later went back to reproduce and validate that structure, we showed conclusively that it was really only valid to a lower (17 Å) resolution, due to internal flexibility. While we self-corrected here, it was still very painful to effectively have to retract some portion of our earlier results.

Don't get me wrong, I believe the vast majority of our published structures and interpretations have been fully correct and reasonable over the years, but there have been a few where we found that we had pushed too hard. In each case where I've gone back and checked, the new validation methods have correctly distinguished the earlier results that were correct from those that later proved to be overinterpretations. In many cases, the 'gold standard' in fact produced almost exactly the same resolution as we had originally published. In other cases, the old method produced a slight resolution exaggeration which would have had no impact on the interpretation of the map. In a few cases, the exaggeration was significant. This means that the single particle structures published in the 2000's (by everyone, not just us) are of highly variable reliability. A majority of structures really do clearly have approximately the resolution they claim, whereas others are clearly overstated, in some cases significantly. I believe the tide has finally turned now, and it will become increasingly difficult to publish anything that does not have fairly robust validation and use a well justified resolution criterion (as it should be). The fact that many people now have microscopes capable of producing easy subnanometer structures, and occasional sub-5 Å resolution structures, even with these robust criteria certainly helps :^)
Let's consider 4 scenarios:
1) Homogeneous data of similar quality, doubling particles: Say you collect 10,000 particles on a particular scope, then go back and collect another 10,000 on a grid of identical quality, with similar microscope alignment and a similar range of defocuses. Regardless of the size of the particle, refining the first 10,000 particles, then refining the full set of 20,000 particles, will produce only a modest resolution improvement. Resolution improvement with fixed data quality requires a worse than exponential increase in the number of particles. That is, doubling the number of particles in this situation will give some improvement, but it's going to be fairly marginal. If you have a data set where this behavior is not observed, i.e. doubling the particle count seems to provide a dramatic improvement in the structure, you should be VERY suspicious, and you need to figure out why (see case 3 for a possible explanation). The argument that there is a nonlinear process going on, and that the structure simply can't 'click in' correctly with fewer particles, is a difficult one to make. While this sort of thing may be possible, it would only happen with very small numbers of particles (10's or 100's, not thousands).
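To put some numbers behind the "doubling is marginal" claim, here is a quick back-of-the-envelope sketch in Python. It is only an illustration using the familiar Rosenthal & Henderson style B-factor model, in which the number of particles required grows roughly as exp(B/(2d^2)); the B-factor values below are made up for demonstration, not measured from any real data set:

    # Toy calculation (not EMAN2 code): how much does doubling the particle
    # count improve resolution if the number of particles needed scales as
    # exp(B/(2*d^2)) ?  The B-factor values below are purely illustrative.
    import math

    def resolution_after_scaling(d_start, n_factor, bfactor):
        """Resolution (A) after multiplying the particle count by n_factor,
        starting from d_start (A), for an assumed effective B-factor (A^2)."""
        inv_d2 = 1.0 / d_start**2 + 2.0 * math.log(n_factor) / bfactor
        return 1.0 / math.sqrt(inv_d2)

    for b in (300.0, 600.0, 1200.0):      # assumed effective B-factors (A^2)
        print("B=%5.0f A^2 : 10.0 A -> %.1f A after doubling the particles"
              % (b, resolution_after_scaling(10.0, 2.0, b)))

The worse the effective B-factor limiting your data, the less each doubling buys you, which is why well-behaved real data sets essentially never show a dramatic jump from simply doubling the particle count.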
2) Heterogeneity: Say you have 20,000 particles, but the structure is moving internally with ~15 Å motions. Further say that the power spectrum of this data extends well past 10 Å. If you refine such a data set with "old" methods, you will almost always achieve a structure which is assessed to be better than 10 Å resolution. However, if you repeat the refinement several times with different starting models, you will find that you may produce several different structures, each of which assesses at better than 10 Å resolution, BUT which agree among themselves only to ~15 Å (a quick way to compute such map-vs-map FSCs yourself is sketched below). What is the correct interpretation in this situation? The "gold standard" says that you are only entitled to interpret such data at 15 Å. However, can you make an argument that there is some validity to each of these separate structures? Unfortunately this is on very shaky ground, since each of the structures ostensibly incorporates the same raw data. What you CAN do is subclassify the data, in which case you CAN achieve a "gold standard" resolution better than 15 Å, since the data in each subset is more homogeneous. Doing this may require collecting larger data sets, of course.
In this situation halving the number of particles by subclassification actually IMPROVES the resolution of each structure. On the other hand, if you simply randomly split the data in half, each half will produce a structure with marginally worse resolution than the original set.
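If you want to measure agreement between maps yourself, the FSC is easy to compute outside of any particular package. Here is a bare-bones numpy sketch of the standard formula (not the EMAN2 code path); it assumes the two maps are already aligned, cubic, and on the same grid:

    # Minimal FSC sketch in numpy, for sanity-checking map-vs-map agreement.
    import numpy as np

    def fsc(map1, map2, apix):
        """Return (spatial frequency in 1/A, FSC) per Fourier shell for two
        aligned cubic maps of identical size, with voxel size apix (A)."""
        n = map1.shape[0]
        f1 = np.fft.fftshift(np.fft.fftn(map1))
        f2 = np.fft.fftshift(np.fft.fftn(map2))
        grid = np.indices(map1.shape) - n // 2
        r = np.rint(np.sqrt((grid**2).sum(axis=0))).astype(int)  # shell index per voxel
        nshells = n // 2
        curve = np.zeros(nshells)
        for s in range(nshells):
            sel = (r == s)
            num = (f1[sel] * np.conj(f2[sel])).sum().real
            den = np.sqrt((np.abs(f1[sel])**2).sum() * (np.abs(f2[sel])**2).sum())
            curve[s] = num / den if den > 0 else 0.0
        return np.arange(nshells) / (n * apix), curve

    # usage sketch (vol_a, vol_b are numpy arrays you loaded yourself):
    # freqs, curve = fsc(vol_a, vol_b, apix=2.0)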
3) Data quality variations: As an extreme example, consider collecting 10,000 particles on an old 200 keV LaB6 scope, then collecting 10,000 more on a good 300 keV FEG scope. If you reconstruct the first 10,000 particles alone, then add the second 10,000, you will naturally find the second structure (with 20,000) has much better resolution than the first. If you then refine ONLY the second set of 10,000, i.e. completely exclude the lower quality data, you will generally find that the structure does not degrade at all. Due to the ~Gaussian falloff of the envelope function, and the ~exponential falloff of the SSNR, using data of mixed quality is almost always pointless. That is, even if you have only 10,000 good particles and 50,000 'less good' particles, using only the 10,000 good particles will almost always produce a better structure than using the full set (see the toy numbers below).
People REALLY hate to hear this one, and fight against it all the time. I cannot tell you how many times I've been asked how to combine data collected on two different microscopes. Try to think like a crystallographer. You spend a lot of time screening your crystals on a tabletop machine in your lab. When you finally get that really good crystal, of course you're going to send it to the beamline, and it would be silly to try to merge your tabletop data with the beamline data. Once you have good grids, image them on the best scope you can get access to. Data collected on a 'lesser' scope will be useless if you later use a better scope.
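To put some toy numbers behind the "mixed quality is pointless" claim, here is a small illustration. The parameters below are invented purely for demonstration, modeling the per-particle SSNR as a Gaussian falloff; the point is only that at the resolutions you actually care about, the combined signal is completely dominated by the good subset:

    # Toy illustration with invented numbers: per-particle SSNR modeled as
    # ssnr(s) = A * exp(-B * s^2 / 2).  The averaged map's SSNR is roughly the
    # sum of the per-particle SSNRs (N * ssnr) for each subset, assuming
    # statistically optimal weighting of the two subsets.
    import math

    def ssnr(s, amp, bfac):
        return amp * math.exp(-bfac * s * s / 2.0)

    good = dict(n=10000, amp=0.05, bfac=300.0)    # "good scope" subset (made up)
    poor = dict(n=50000, amp=0.05, bfac=1500.0)   # "lesser scope" subset (made up)

    for res in (20.0, 10.0, 7.0):                 # resolution in A
        s = 1.0 / res
        g = good["n"] * ssnr(s, good["amp"], good["bfac"])
        p = poor["n"] * ssnr(s, poor["amp"], poor["bfac"])
        print("%5.1f A : good subset SSNR %8.1f, poor subset adds %10.3g" % (res, g, p))

With these made-up numbers the poor data genuinely helps at 20 Å, but contributes essentially nothing at 10 Å and below, and that is with optimal weighting; with plain uniform weighting the poorer particles can actively hurt the high resolution shells rather than merely failing to help.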
4) Preferred orientation/anisotropic resolution: Some preferred-orientation situations are harmless, meaning the orientation distribution still manages to fill Fourier space reasonably uniformly. However, others can produce strange effects. For example, if you look at ribosome structures in the EMDB, you will find that some (but not all!) of them appear to have a funny 'directional smearing' in their structures. This is due to a preferred orientation problem which sometimes occurs with ribosomes, where specific regions in Fourier space are less well sampled than others. That is, in one direction in Fourier space you may have 50,000 particles contributing to the structure, and in another direction there may be only 1,000 particles contributing. In this situation, doubling the number of particles will have an unusual, largely visual, effect on the map. The directions that already had 50,000 particles contributing will often see almost no benefit from suddenly having 100,000, as you may have already reached the alignment-limited resolution of the data. However, the direction that had only 1,000 particle contributions will now have 2,000, and the improvement may be quite significant. That is, doubling the number of particles will have only a slight effect on the FSC-measured resolution, but the visual quality of the structure (due to filling in the 'bad' directions) may improve significantly.
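If you want a rough feel for how evenly your own data covers Fourier space, you can do a crude count directly from the refined orientations: by the projection theorem, a particle with view direction v contributes the central plane perpendicular to v, so a Fourier direction d is sampled by the particles whose view vectors are nearly perpendicular to d. This is just a sketch, and it assumes you have already converted your refined Euler angles into unit view vectors:

    # Crude Fourier-coverage check (a sketch, not an EMAN2 tool).  'views' is
    # an (N,3) array of unit view vectors from your refined orientations; here
    # we fake a strongly preferred-orientation data set to show the effect.
    import numpy as np

    def coverage(views, ndirs=200, tol_deg=5.0):
        """Count contributing particles for ndirs random Fourier directions."""
        rng = np.random.default_rng(0)
        dirs = rng.normal(size=(ndirs, 3))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        tol = np.sin(np.radians(tol_deg))
        # |v . d| < sin(tol)  <=>  v lies within tol of the plane normal to d
        return dirs, (np.abs(views @ dirs.T) < tol).sum(axis=0)

    rng = np.random.default_rng(1)
    views = rng.normal(size=(20000, 3))
    views[:, :2] *= 0.2                    # squash toward z -> strong preferred views
    views /= np.linalg.norm(views, axis=1, keepdims=True)
    dirs, counts = coverage(views)
    print("best sampled direction: %d particles, worst: %d" % (counts.max(), counts.min()))

Directions that come out with very low counts are exactly the ones that benefit most from more particles, which is why the map can look much better after doubling even when the nominal FSC resolution barely moves.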
Even with isotropic data, it's worth remembering that there is generally some limiting resolution for any given data set. That is, there is a resolution past which you cannot move regardless of how many particles you have acquired, due to limitations in alignment accuracy. However, even in these cases, adding more particles can still be worthwhile, as it will improve the quality of the reconstruction even if the resolution ceases to improve. That is, additional data will reduce the noise level in the map near the limiting resolution and improve the interpretability of the structure.
---
If you believe that you have a subnanometer resolution structure, but the gold standard says you don't, the first test to try is:
Use exactly the same data, and exactly the same refinement command you used to produce the subnanometer resolution structure. Take your starting model and phase randomize it beyond, say, 20 Å. This will preserve the quaternary structure, but scramble the high resolution details. In a well-behaved data set, the correct subnanometer structure will very quickly re-emerge, since the quaternary structure dominates the particle alignment (note that you are using the full data here, not 1/2 as in the gold standard). This is a test of model bias, not resolution, and a variant of it was often performed before these concepts were merged into the "gold standard". The phase randomization can be achieved with:
e2proc3d.py old_model.hdf new_model.hdf --process filter.lowpass.randomphase:cutoff_freq=0.05
Repeat this process with at least 2 or 3 different starting models, then compute FSCs among the various refined results. If all of the refinements produce basically the same high resolution structure, then you can move forward with more confidence. (If you find this in a case where using 1/2 the data produces a substantially worse result, I would very much like to see it as a test case.) If, on the other hand, you find that you get different 'high resolution' structures out, then you have either a case of heterogeneity (which can be identified via improvement upon subclassification) or model bias (meaning the apparent high resolution is just incorrect).
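For the curious, conceptually all the phase randomization does is keep the Fourier amplitudes of the map and replace the phases beyond the cutoff with random (but Friedel-symmetric) values; the 0.05 in the command above corresponds to the ~20 Å cutoff mentioned earlier. Here is a bare numpy sketch of the same idea, purely as illustration (use the e2proc3d.py command for real work):

    # Conceptual sketch of phase randomization beyond a cutoff frequency.
    # 'vol' is a cubic numpy array, apix in A/voxel, cutoff in 1/A.
    import numpy as np

    def randomize_phases(vol, apix, cutoff, seed=0):
        """Keep Fourier amplitudes; beyond 'cutoff' (1/A), replace phases with
        those of a random noise volume (noise phases are Friedel-symmetric,
        so the result stays real)."""
        n = vol.shape[0]
        f = np.fft.fftshift(np.fft.fftn(vol))
        rng = np.random.default_rng(seed)
        noise_f = np.fft.fftshift(np.fft.fftn(rng.normal(size=vol.shape)))
        rand_phase = noise_f / np.abs(noise_f)           # unit-modulus phases
        grid = np.indices(vol.shape) - n // 2
        s = np.sqrt((grid**2).sum(axis=0)) / (n * apix)  # frequency per voxel (1/A)
        f = np.where(s > cutoff, np.abs(f) * rand_phase, f)
        return np.fft.ifftn(np.fft.ifftshift(f)).real

    # usage sketch: scrambled = randomize_phases(vol, apix=2.0, cutoff=0.05)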
Anyway, I hope this long discussion helps someone, and will help convince you that the solution to EMAN2.1 giving you a worse resolution value isn't to switch to some other program (even an older version of EMAN) which gives you a better-looking number. Just keep in mind that even if the software doesn't do it automatically, you can do gold standard resolution tests in EMAN1, BSOFT, SPIDER, etc. manually without much effort.
cheers