RAxML-EPA

Yaqin Guo

Jan 4, 2021, 4:45:15 AM
to raxml
Hi everyone,
I am a beginner in this topic. How do I run RAxML-EPA on CIPRES? I need to use EPA to build a tree, but I couldn't find this option on the CIPRES web portal. Could anyone help me out? Thanks in advance!

Grimm

Jan 4, 2021, 10:17:40 AM
to raxml
Hi Yaqin,

Maybe be more specific in your question: "I need to use EPA to build a tree". Berger et al.'s evolutionary placement algorithm uses a pre-defined (backbone) tree and "only" optimises the probable position of one query (sequence) at a time within that backbone tree. It doesn't optimise the positions of the queries relative to each other, nor does it re-optimise the tree. The input tree is static; we test where we would expect the additional leaf for each individual query.

One can test multiple queries at a time (thousands, in fact), and the output includes a guide tree that comprises all queries at once. See the attached PDF for an iTOL-enhanced example of a RAxML EPA output tree: the red bubbles at the branches represent the number of individual queries assigned to a branch; here's the related paper, paywalled; free-access preprint here; data can be found at figshare.

But the EPA placement output tree is not a phylogenetic tree; it is a (simplifying) graphical representation of the individual placements with the highest probability. Alternatives can exist: depending on the signal in the queries and the resolution capacity/decisiveness of the backbone tree used, you may have cases in which two or more placements get the same probability, and just one placement will be represented in the tree-graph output. The full information (EPA result) about the best and alternative placements is included in the jplace file.
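
To look at the alternatives yourself, the jplace file can be read as ordinary JSON. The following is a minimal sketch, assuming the usual jplace layout (a "fields" list naming the columns, and a "placements" list whose entries carry the rows under "p" and the query names under "n" or "nm"). The sample document, the query names and the helper functions are made up for illustration and do not come from a real EPA run.

```python
import json

# Tiny hand-made jplace document (hypothetical data, for illustration only).
SAMPLE_JPLACE = """
{
  "version": 3,
  "tree": "((A:0.1{0},B:0.2{1}):0.05{2},C:0.3{3}){4};",
  "fields": ["edge_num", "likelihood", "like_weight_ratio",
             "distal_length", "pendant_length"],
  "placements": [
    {"n": ["query1"],
     "p": [[0, -1234.5, 0.70, 0.01, 0.02],
           [2, -1235.4, 0.30, 0.02, 0.02]]},
    {"n": ["query2"],
     "p": [[3, -1300.0, 0.50, 0.05, 0.01],
           [1, -1300.0, 0.50, 0.03, 0.01]]}
  ],
  "metadata": {"invocation": "hypothetical"}
}
"""


def query_names(placement):
    """Collect query names from either the "n" or the "nm" key."""
    names = list(placement.get("n", []))
    names += [entry[0] for entry in placement.get("nm", [])]
    return names


def placements_by_query(doc):
    """Map each query name to its placement rows, best weight ratio first."""
    fields = doc["fields"]
    result = {}
    for placement in doc["placements"]:
        rows = [dict(zip(fields, row)) for row in placement["p"]]
        rows.sort(key=lambda r: r["like_weight_ratio"], reverse=True)
        for name in query_names(placement):
            result[name] = rows
    return result


def tied_best(rows, tol=1e-9):
    """Return the edge numbers that share the highest weight ratio."""
    best = rows[0]["like_weight_ratio"]
    return [r["edge_num"] for r in rows
            if abs(r["like_weight_ratio"] - best) <= tol]


doc = json.loads(SAMPLE_JPLACE)
table = placements_by_query(doc)
for name, rows in table.items():
    print(name, "best edge(s):", tied_best(rows))
```

In this toy example query2 has two placements with the same weight ratio, which is the situation where the tree-graph output would show only one of them.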

Cheers, Guido
circular_nobranch_lenght-1.pdf

Pierre Barbera

Jan 4, 2021, 10:22:03 AM
to ra...@googlegroups.com

Dear Yaqin Guo,

EPA does not actually construct a tree, but rather finds the most likely placements of a given set of query sequences, on a given tree. It does not actually extend the given tree by the given sequences in some sort of iterative tree building, but rather tests each query sequence against the existing tree individually.

If you have some other use for EPA, CIPRES does offer EPA-ng as an available tool. It is the re-implementation of the old EPA algorithm that is present in RAxML, and I would recommend you use that. We will be happy to assist you with that either here or on the dedicated phylogenetic placement Google group (https://groups.google.com/forum/#!forum/phylogenetic-placement).

All the Best,
Pierre


Yaqin Guo

Jan 6, 2021, 4:22:04 AM
to raxml
Thanks, Guido. Thanks very much for your explanation and for sharing. Your paper is impressive.

Yaqin Guo

Jan 6, 2021, 4:29:33 AM
to raxml
Dear Pierre,
Thanks for your reply. Yes, I understand that EPA does not construct a tree, but I would like to use this algorithm for species delimitation. I noticed that RAxML-HPC v.8 on XSEDE (8.2.12) at the CIPRES Science Gateway lets me enter a backbone tree. Could I use this to place my query sequences into it? How does this differ from EPA-ng on CIPRES?
Screen Shot 2021-01-06 at 10.26.04 AM.png
Thanks again,
Yaqin

Pierre Barbera

Jan 6, 2021, 8:26:36 AM
to ra...@googlegroups.com

Dear Yaqin,

Backbone tree in the case of RAxML is something different: it is used to specify a partially resolved tree, for example a taxonomy, to constrain the tree search to all possible resolutions of that backbone tree. So no use to you. (Also, please consider using the much improved, and actually maintained, RAxML-NG for running tree searches.)

For sure if you want to do placement on CIPRES, EPA-NG is the way to go. Going from placement to species delimitation is a bit more advanced though, we recently had a publication on this:  https://onlinelibrary.wiley.com/doi/full/10.1111/1755-0998.13255
Basically the approach there is to 1) place queries on the tree, 2) extract the queries that landed on a given branch, 3) make a tree out of those queries, and 4) run phylogenetic species delimitation (mPTP) on that tree. However, be warned that because of the nature of query sequences (often very short, often from genes with disputed value in terms of species delimitation) there is some limitation here depending on the data used. Also, SCRAPP, the tool from that publication, is definitely not available on CIPRES, and you would have to install and run it yourself somewhere, which of course we will be happy to assist you with.
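
Step 2 of that approach can be illustrated with a small sketch. This is not SCRAPP's actual implementation: it only groups queries by the branch of their best placement, assuming the placements have already been read into a dictionary. The query names, edge numbers and the `min_lwr` cut-off are hypothetical.

```python
from collections import defaultdict

# Hypothetical placement results:
# query name -> list of (edge_num, like_weight_ratio) candidates.
placements = {
    "q1": [(7, 0.90), (3, 0.10)],
    "q2": [(7, 0.55), (8, 0.45)],
    "q3": [(3, 0.99), (7, 0.01)],
}


def group_by_best_edge(placements, min_lwr=0.0):
    """Group queries by the edge of their highest-weight placement.

    Queries whose best weight ratio is below min_lwr are left out
    (min_lwr is an illustrative threshold, not a recommended value).
    """
    groups = defaultdict(list)
    for query, candidates in placements.items():
        edge, lwr = max(candidates, key=lambda c: c[1])
        if lwr >= min_lwr:
            groups[edge].append(query)
    return dict(groups)


groups = group_by_best_edge(placements)
confident = group_by_best_edge(placements, min_lwr=0.8)
print(groups)
print(confident)
```

Each resulting group would then be the input for steps 3 and 4 (building a tree from those queries and running mPTP on it).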

Also I just noticed that CIPRES has the wrong citation for EPA-NG, the correct one is here: https://doi.org/10.1093/sysbio/syy054

Happy Placement,
Pierre

Yaqin Guo

Jan 6, 2021, 10:50:15 AM
to raxml
Dear Pierre,

Thanks a lot for your reply. 

I would like to try EPA-NG first, but I am also confused. It is not very clear what I need to provide as input; please see the screenshot:
Screen Shot 2021-01-06 at 4.34.39 PM.png
Could you please check for me what I need as input, especially the RAxML info file? Note that this is on CIPRES.

One more thing: thanks for pointing me to the more advanced method. I really appreciate your efforts, but I need to read and understand more first. I will try it later.

Thanks again,
Yaqin

Pierre Barbera

Jan 6, 2021, 1:42:46 PM
to ra...@googlegroups.com

Dear Yaqin,

You can leave the binary file empty. ref-msa is the reference alignment, i.e. the alignment that belongs to the reference tree.

The "Info file" can either be the .info file created by raxml 8.x, or the .bestModel file coming from raxml-ng when creating the tree. It is used to pass the model parameters associated with the tree to EPA-NG. If you don't have this file anymore, you can re-run raxml(-ng) purely to re-evaluate the likelihood on the existing tree: https://github.com/Pbdas/epa-ng#setting-the-model-parameters
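
For the raxml-ng route, the re-evaluation could look roughly like the sketch below, which only assembles the command line. It assumes raxml-ng's `--evaluate` mode with `--msa`, `--tree`, `--model` and `--prefix`; the file names, the `GTR+G` model and the `info` prefix are placeholders, and the model should match the one used to infer the tree.

```python
import shlex


def raxml_ng_evaluate_cmd(ref_msa, tree, model="GTR+G", prefix="info"):
    """Build the argv list that re-evaluates a fixed tree with raxml-ng."""
    return [
        "raxml-ng", "--evaluate",
        "--msa", ref_msa,
        "--tree", tree,
        "--model", model,
        "--prefix", prefix,
    ]


def best_model_file(prefix):
    """Name of the model file raxml-ng writes for a given prefix."""
    return f"{prefix}.raxml.bestModel"


# Placeholder file names; substitute your own alignment and tree.
cmd = raxml_ng_evaluate_cmd("reference.fasta", "reference.newick")
print(shlex.join(cmd))
print("model file to upload:", best_model_file("info"))
```

The printed command can be run in a shell where raxml-ng is installed; the resulting .bestModel file is what goes into the "Info file" field.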

The query file is, as you surmised, the FASTA file of the query sequences.

For a bit of an overview (though it's meant for using EPA-NG on the command line), maybe this page will help: https://github.com/Pbdas/epa-ng/wiki/Full-Stack-Example (as well as the readme I linked before).
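
For comparison with the CIPRES form, a command-line run maps the same four inputs onto EPA-NG options. The sketch below assumes the `--ref-msa`, `--tree`, `--query` and `--model` options described in the EPA-NG readme; all file names are placeholders, and `run_if_installed` is a hypothetical helper that does nothing unless the `epa-ng` executable is found on PATH.

```python
import shlex
import shutil
import subprocess


def epa_ng_cmd(ref_msa, tree, query, model):
    """Return the argv list for an EPA-NG placement run."""
    return [
        "epa-ng",
        "--ref-msa", ref_msa,   # reference alignment
        "--tree", tree,         # reference tree
        "--query", query,       # query sequences (FASTA)
        "--model", model,       # .info or .bestModel file
    ]


def run_if_installed(cmd):
    """Run cmd only if its executable is on PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Placeholder file names; substitute your own inputs.
cmd = epa_ng_cmd("reference.fasta", "reference.newick",
                 "query.fasta", "info.raxml.bestModel")
print(shlex.join(cmd))
```

Calling `run_if_installed(cmd)` would start the placement on a machine where EPA-NG is installed; here the command is only printed.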

Pierre
