> Well, I don't think anything right now is great at high-resolution
> taxonomic classification, so I think pplacer would do just as well as
> (or better than) something like RDP. No one in our group has done a
> thorough investigation of how well pplacer works with GG, but from a
> few test cases it certainly seems better than something like RDP.
>
> I guess you are worried that pplacer might not know the exact
> placement due to so much data. Also it seems that the large number of
> gaps and thus the location of informative positions would be a problem
> with any tree building program and yet GG and SILVA are making these
> large trees.
Yes, they are making these large trees. It's not always clear that their
overall goal is phylogenetic accuracy. Members of the Knight group have
made it clear to me that they aren't so bothered about accuracy, as long
as it doesn't change their UniFrac estimates. You can see this in, e.g.
http://pynast.wordpress.com/2010/04/06/pynast-1-1-released-better-alignments-and-a-15x-speed-increase/
That makes perfect sense to me: we really just need to scale the method
to the application, as I said for 16S estimates.
> Considering the large demand for decent taxonomic classification using
> 16S and that most people seem happy settling for answers from the
> quick and dirty RDP classification it seems like it might be worth
> pursuing this a bit more? I have built a pipeline (similar to the one
> outlined in Steve Kembel's recent paper) that takes 16S reads, aligns
> them to the GG alignment using PyNAST, does the trimming, etc., so that
> pplacer can be used. Conor (previous email) in our group has been
> using this quite a bit for taxonomic classification so he can probably
> comment more on successes on that front.
Yes, we also have something like that (not surprisingly). However, the
direction we have headed is to break a big alignment into
sub-alignments, and first "bin" reads into the sub-alignments using a
naive Bayes classifier. Placement comes after that. We're currently
validating this approach, but are as usual chasing our tails with
taxonomic things.
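
To make the binning step concrete, here is a rough sketch of how a naive
Bayes classifier over k-mers could assign a read to a sub-alignment. It
is only an illustration under stated assumptions, not our actual code:
the k-mer length, the uniform prior over bins, the add-one smoothing and
the names train_bins and bin_read are all hypothetical. The placement
step itself is not shown.

```python
import math
from collections import Counter

K = 4  # assumed k-mer length, chosen only for illustration


def kmers(seq, k=K):
    """Return all overlapping k-mers of an ungapped sequence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_bins(bins):
    """Count k-mers per bin.

    bins maps a bin (sub-alignment) name to a list of ungapped
    reference sequences belonging to that sub-alignment.
    """
    return {
        name: Counter(km for seq in seqs for km in kmers(seq))
        for name, seqs in bins.items()
    }


def bin_read(read, counts, k=K):
    """Pick the bin with the highest multinomial naive Bayes score.

    Assumes a uniform prior over bins and add-one smoothing over the
    4**k possible DNA k-mers.
    """
    n_possible = 4 ** k
    best_name, best_score = None, -math.inf
    for name, c in counts.items():
        total = sum(c.values())
        score = sum(
            math.log((c[km] + 1) / (total + n_possible))
            for km in kmers(read, k)
        )
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy reference sequences standing in for two sub-alignments.
counts = train_bins({
    "binA": ["AAAAAAAAAAAA", "AAAAAAAATAAA"],
    "binB": ["GCGCGCGCGCGC", "GCGCGGCGCGCG"],
})
print(bin_read("AAAAAAAA", counts))
print(bin_read("GCGCGCGC", counts))
```

Each read would then be placed only against the reference tree for the
bin it landed in, which keeps every placement problem small.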
> Also, I know that Jonathan Eisen's group is working on a taxonomic
> classification using pplacer on protein coding genes (for WGS
> metagenomics). However, maybe it would be worth discussing a brief
> project for 16S data. If interested we could chat about this offline.
Yes. I'm not saying that using pplacer on big trees and alignments is
the worst thing ever, but I wouldn't say that using it on the SILVA tree
can be recommended without reservation.
Once when I congratulated Morgan Price on FastTree 2.1, he commented
"yeah, we can build these big trees now, so now we need to figure out if
they mean anything."
I should make it clear in my note of caution, though, that I think that
the GG and SILVA folks are doing great work on a hard problem.