refpkg without tree stats

deth...@stanford.edu

Mar 19, 2018, 6:13:27 PM
to pplacer users
Hi Folks,

I'm new to pplacer but excited about what it looks like it can do.  I'm hoping to build a package using the Silva 16S alignment and reference tree, but it seems that taxit expects a tree stats file.  Since I didn't build the reference tree myself I don't have this info.

1. Is this optional, as it seems like it might be from taxit create documentation?
2. If so, what am I missing or how am I limited without this info?
3. Is there a way to 'cheat', i.e. to back-calculate or estimate values from the tree and/or seqs that might not be as good as the actual RAxML output, but would be better than nothing?

I've also seen some discussion in this group that pplacer can't realistically work with a database the size of Greengenes...and Silva is even larger.  What would be a reasonable ballpark for the number of seqs that pplacer can work with directly on a reasonably powerful server?

Les Dethlefsen
Stanford University


Erick Matsen

Mar 19, 2018, 6:48:36 PM
to pplace...@googlegroups.com
Hello Les--

I'm honored that you'd consider using pplacer.

However, I'm afraid that pplacer is not a good fit for Silva. You need these stats files, but worse, pplacer won't be able to handle such a big tree. If you can't build a tree on it using RAxML or FastTree, then you can't place on it. (It's worse than that, in fact, but that's an upper bound.)

Sorry!

Erick

--
You received this message because you are subscribed to the Google Groups "pplacer users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pplacer-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Frederick "Erick" Matsen, Associate Member
Fred Hutchinson Cancer Research Center
http://matsen.fredhutch.org/

Les Dethlefsen

Mar 21, 2018, 11:07:14 PM
to pplace...@googlegroups.com
Thanks for your response, Erick!  

I’m not entirely surprised, but I’m still looking into the possibility of using SEPP with Silva, which I know you're familiar with. I think SEPP only needs the RAxML info file, or the analogous info from FastTree, in order to pass it into pplacer. That is, whatever info pplacer needs about the tree model parameters is sufficient for SEPP.

As a fallback option, I could use a dramatically smaller dataset, such as the Living Tree Project (LTP) that’s part of Silva…perhaps augmented with sequences from particular clades that are important to me but poorly represented in the LTP at present.  That would be a small enough dataset I could actually run FastTree on it myself, and then if I follow your suggestions from the pplacer documentation, I assume the resulting FastTree logfile will be adequate for SEPP and pplacer.
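The fallback workflow described above (FastTree on a reduced alignment, then a reference package built from the tree and its log file) could be scripted roughly as follows. This is only a sketch: the file names and the locus label are hypothetical, the helper functions are invented for illustration, and the flags should be checked against `FastTree -help` and `taxit create --help` for the installed versions. The helpers only build the command lines; they do not run anything.

```python
import shlex


def fasttree_cmd(aln_fasta, log_path):
    """Build a FastTree command for a nucleotide GTR tree that keeps its log file.

    FastTree writes the tree to stdout, so the caller is expected to
    redirect stdout to a Newick file.
    """
    return ["FastTree", "-nt", "-gtr", "-log", log_path, aln_fasta]


def taxit_create_cmd(refpkg, locus, aln_fasta, tree_file, tree_stats):
    """Build a taxit create command that bundles alignment, tree, and tree stats."""
    return [
        "taxit", "create",
        "-P", refpkg,              # name of the reference package directory
        "-l", locus,               # free-text locus label
        "--aln-fasta", aln_fasta,
        "--tree-file", tree_file,
        "--tree-stats", tree_stats,
    ]


if __name__ == "__main__":
    # Hypothetical file names for an LTP-sized alignment.
    ft = fasttree_cmd("ltp.fasta", "ltp.log")
    print(shlex.join(ft) + " > ltp.nwk")
    tc = taxit_create_cmd("ltp.refpkg", "16S", "ltp.fasta", "ltp.nwk", "ltp.log")
    print(shlex.join(tc))
```

The printed lines can be pasted into a shell or passed to `subprocess.run` once the flags have been confirmed.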

But I’m not yet ready to give up on the more ambitious plan of SEPP + Silva.  I’ve reached out to the Silva folks to see if they could provide an actual RAxML file from their backbone tree estimation a number of years back, which as I recall involved 20,000 hours on some European supercomputer.  In the event they don’t have the actual file, but have recorded the model parameters from the run, can you help me understand what information pplacer needs?

Is it only the alpha parameter for the gamma distribution of rates, and the 6 base rates for the time-reversible transitions/transversions?

Les Dethlefsen
Relman Lab
Stanford University
deth...@stanford.edu

Erick Matsen

Mar 22, 2018, 11:39:03 AM
to pplace...@googlegroups.com
Hello Les--


If you are using SEPP, then this is a completely separate question. If I recall correctly, SEPP places on subtrees of the primary tree, which should be totally feasible. You should check in with the author Siavash about that. And yes, all pplacer needs is the info/log file from one of those two programs.

Yes, pplacer just needs those quantities and you could insert them directly into a reference package.
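For illustration, here is a minimal sketch of what writing those quantities down by hand might look like. It assumes the model parameters live in a small JSON file inside the reference package, with the field names below (`subs_model`, `ras_model`, `gamma`, `subs_rates`, and so on); that layout is an assumption and should be compared against the model file in a package that `taxit create` built from a real RAxML info file. The helper name and the numbers are placeholders, not Silva's actual parameters.

```python
import json

# Assumed key names for the six GTR substitution rates.
RATE_KEYS = ("ac", "ag", "at", "cg", "ct", "gt")


def build_phylo_model(alpha, rates, n_cats=4, program="parameters entered by hand"):
    """Assemble a GTR + gamma model description from recorded parameters.

    alpha -- shape parameter of the gamma distribution of rates
    rates -- six substitution rates in the order given by RATE_KEYS
    """
    if len(rates) != len(RATE_KEYS):
        raise ValueError("expected six substitution rates")
    if alpha <= 0:
        raise ValueError("gamma alpha must be positive")
    return {
        "datatype": "DNA",
        "subs_model": "GTR",
        "empirical_frequencies": True,  # base frequencies taken from the alignment
        "program": program,
        "ras_model": "gamma",
        "gamma": {"alpha": float(alpha), "n_cats": int(n_cats)},
        "subs_rates": dict(zip(RATE_KEYS, (float(r) for r in rates))),
    }


if __name__ == "__main__":
    # Placeholder values only.
    model = build_phylo_model(0.5, [1.0, 2.0, 1.0, 1.0, 2.0, 1.0])
    print(json.dumps(model, indent=2))
```

The resulting JSON would then replace the model file that `taxit create` normally derives from the tree stats.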


Thank you,

Erick

Les Dethlefsen

Mar 22, 2018, 1:17:02 PM
to pplace...@googlegroups.com
Thanks again, Erick!

Best wishes,
Les 
