RAxML-EPA: likelihood weight ratio vs accumulated likelihood weight ratio


Pam

Nov 1, 2015, 8:16:55 PM
to raxml
Hi,
I am trying to understand which value should be used when interpreting the uncertainty of a placement: the likelihood weight ratio or the accumulated likelihood weight ratio.

In the RAxML_classificationLikelihoodWeights file generated after EPA, each query has a likelihood weight ratio in the 3rd column and an accumulated likelihood weight ratio in the 4th column. For every query, the final accumulated likelihood weight ratio always reaches >0.9 before the file moves on to the next query, yet each individual insertion for that query often has a likelihood weight ratio of only ~0.1. So when saying that the program places a query onto a certain branch with an uncertainty of X, should X be the likelihood weight ratio or the accumulated likelihood weight ratio?

I don't fully understand what the accumulated likelihood weight ratio indicates. Based on the output data and my understanding of the corresponding EPA paper, the program places the query onto various branches, calculates the likelihood weight ratio, and then tries another placement if the accumulated likelihood weight ratio has not reached 0.9. At first I thought the accumulated value was a way to narrow down the region of the reference tree that the query should go into. But for some of the queries, I see insertions on many different reference branches that are very distant from each other, which suggests the accumulated ratio is simply the sum over all the trials.

I would really appreciate it if you could explain this to me; I've run out of ways to work it out myself.

Pam

Alexandros Stamatakis

Nov 2, 2015, 3:01:16 AM
to ra...@googlegroups.com
Hi Pam,

> I am trying to understand which value should be used when interpreting
> the uncertainty of the placement, the likelihood weight ratio or the
> accumulated likelihood weight ratio.
>
> In the RAxML_classificationLikelihoodWeights file generated after EPA,
> for each query there will be a likelihood weight ratio at the 3rd column
> and an accumulated likelihood weight ratio at the 4th column. While for
> every query, the final accumulated likelihood weight ratio will also
> reach >0.9 before moving on to the next query, each insertion for that
> query often has likelihood weight ratio that is only ~0.1. So when
> saying the program places a query onto a certain branch with an
> uncertainty of X, should the X be the likelihood weight ratio or the
> accumulated likelihood weight ratio?

That should be the likelihood weight ratio.

> I don't fully understand what the accumulated likelihood weight
> ratio indicates. Based on the output data and my understanding of the
> corresponding paper for EPA, what the program does is it would place the
> query onto various branches, calculate the likelihood weight ratio and
> then try another placement if the accumulated likelihood weight ratio
> has not reached 0.9.

No, it actually calculates the likelihoods and likelihood weights for
all possible placements. Then, only the placements with the highest
likelihood weights are printed, in decreasing order, until an
accumulated likelihood weight ratio threshold is reached. I think you
are confusing the likelihood weights with the heuristics for
accelerating placement computations described in the paper.
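
To illustrate, here is a minimal Python sketch of that procedure. It is not RAxML's actual code. It assumes that the likelihood weight ratio of a branch is the likelihood of the placement on that branch divided by the sum of the likelihoods over all branches. The function name, the branch labels and the log-likelihood values are made up for the example.

```python
import math

def likelihood_weight_ratios(log_likelihoods, threshold=0.95):
    """Return (branch, LWR, accumulated LWR) rows, best first, cut at threshold.

    log_likelihoods: dict mapping a branch label to the log-likelihood of
    the tree with the query inserted on that branch (hypothetical input).
    """
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    best = max(log_likelihoods.values())
    weights = {b: math.exp(ll - best) for b, ll in log_likelihoods.items()}
    total = sum(weights.values())

    rows = []
    accumulated = 0.0
    # Every branch has been scored; only the top-weighted ones are reported.
    for branch, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        lwr = w / total
        accumulated += lwr
        rows.append((branch, lwr, accumulated))
        if accumulated >= threshold:
            break
    return rows

# Made-up log-likelihoods for one query on five branches.
example = {"I1": -1000.0, "I2": -1000.5, "I3": -1001.0, "I4": -1010.0, "I5": -1020.0}
for branch, lwr, acc in likelihood_weight_ratios(example):
    print(f"{branch}\t{lwr:.4f}\t{acc:.4f}")
```

Here three branches are reported: the third one pushes the accumulated weight past 0.95, so the two remaining low-weight branches are scored but never printed.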

> At first I thought the accumulated value is a way
> to narrow down the branching region on the reference tree that the query
> should go into.

That's correct. The idea is to use, say, a 0.95 accumulated likelihood
weight to determine the region of the reference tree into which a read
falls, that is, to find the LCA (lowest common ancestor) of the subtree
containing 95% of the accumulated likelihood weight.
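
As a rough sketch of that idea in Python (not part of RAxML): the tree, node names and weights below are invented, the reference tree is assumed to be rooted, and each placement branch is identified by the node directly below it, which is a simplification.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def lowest_common_ancestor(nodes, parent):
    """Deepest node that is an ancestor of (or equal to) every node in `nodes`."""
    paths = [path_to_root(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first node, the first shared node is the deepest one.
    return next(n for n in paths[0] if n in common)

# Hypothetical rooted reference tree as child -> parent; the root has no entry.
parent = {"A": "n1", "B": "n1", "C": "n2", "n1": "n2", "D": "root", "n2": "root"}

# Placements of one query: (node below the placement branch, LWR), best first.
placements = [("A", 0.40), ("B", 0.35), ("C", 0.21), ("D", 0.04)]

# Keep the top placements until 95% of the likelihood weight is covered.
accumulated, kept = 0.0, []
for node, lwr in placements:
    kept.append(node)
    accumulated += lwr
    if accumulated >= 0.95:
        break

print(lowest_common_ancestor(kept, parent))
```

With these numbers the branches above A, B and C together carry 96% of the weight, so the read is assigned to the subtree below n2 even though no single branch has a weight above 0.40.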

> But for some of the queries, I see the insertion
> happening at many different reference branches that are very distant to
> each other, which means the accumulated ratio is just a simple add up of
> all the trials.

This might indicate that there are some reads in your sample that are
very distant from the reference sequences, i.e., either some sort of
contamination or something new.

As a first step, I'd try to BLAST the sequences that scatter all over
the tree.

Hope this helps,

Alexis

>
> I would really appreciate it if you could explain this to me; I've run
> out of ways to work it out myself.
>
> Pam
>

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Pam

Nov 2, 2015, 6:57:40 AM
to raxml

Hi Alexis,

Thanks for your answers, even though I am not sure that I understand them completely. 


So for a case like the following, how would you interpret the placement?

  • The printed likelihood weight ratios for all the placements of a query are all <= 0.1, and all the placements are close to each other around a certain subtree.

The accumulated likelihood weight ratio is of course >= 0.95. Does this result mean that there is over 0.95 certainty that the query should be placed within this subtree, but that the exact placement is uncertain because all the likelihood weight ratios are very low (<= 0.1)?

Alexandros Stamatakis

Nov 3, 2015, 2:54:16 AM
to ra...@googlegroups.com
Hi Pam,

> Thanks for your answers, even though I am not sure that I understand
> them completely.
>
>
> So for a case like the following, how would you interpret the placement?
>
> * The printed likelihood weight ratios for all the placements of a
> query are all <= 0.1, and all the placements are close to each other
> around a certain subtree.
>
> The accumulated likelihood weight ratio is of course >= 0.95. Does this
> result mean that there is over 0.95 certainty that the query should be
> placed within this subtree, but that the exact placement is uncertain
> because all the likelihood weight ratios are very low (<= 0.1)?

Exactly :-)

Alexis

Alexandros Stamatakis

Nov 3, 2015, 2:57:11 AM
to ra...@googlegroups.com
Hi Pam,


> In the RAxML_classificationLikelihoodWeights file, the first
> placement for every query is always the one with the highest likelihood
> weight ratio (LWR) and the lowest accumulated likelihood weight
> ratio (ALWR). On the contrary, the last placement is always the one with
> the lowest LWR and the highest ALWR. I can understand the last placement
> will have the highest ALWR because it is accumulated. But it is always
> the first placement, the one with the highest LWR that will be used as
> the final placement.

That's all correct.

> So what is the reason to keep trying to place the
> query when that will only result in a lower and lower LWR? Simply to have
> the ALWR reach the 0.95 threshold?

Yes, to get a notion of how widely the placements of a read are spread
around the tree. Since we calculate the likelihood score for the
insertion of each read into all branches of the reference tree anyway,
these scores are computationally cheap to obtain; essentially they come
for free.

Alexis
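
For anyone who wants to summarize the file per query, here is a small Python sketch. It relies on the LWR being in the 3rd column and the ALWR in the 4th, as stated in the thread. That the columns are whitespace-separated, with the query name in the 1st column and a branch label in the 2nd, is an assumption to check against your own output. The sample rows are made up and are not real RAxML output.

```python
import io
from collections import defaultdict

def summarize_placements(handle):
    """Group rows by query and report the best placement and the spread."""
    by_query = defaultdict(list)
    for line in handle:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query, branch = fields[0], fields[1]  # assumed column meanings
        by_query[query].append((branch, float(fields[2]), float(fields[3])))

    summary = {}
    for query, rows in by_query.items():
        # The placement with the highest LWR is the one used as the final placement.
        best_branch, best_lwr, _ = max(rows, key=lambda r: r[1])
        summary[query] = {
            "best_branch": best_branch,
            "best_lwr": best_lwr,
            "n_branches": len(rows),  # how many branches share the weight
            "accumulated": max(r[2] for r in rows),
        }
    return summary

# Made-up rows for illustration only.
sample = io.StringIO(
    "read1 I7 0.90 0.90\n"
    "read1 I8 0.07 0.97\n"
    "read2 I3 0.30 0.30\n"
    "read2 I4 0.25 0.55\n"
    "read2 I9 0.20 0.75\n"
    "read2 I12 0.15 0.90\n"
    "read2 I20 0.06 0.96\n"
)

summary = summarize_placements(sample)
for query, info in summary.items():
    print(query, info["best_branch"], info["best_lwr"], info["n_branches"])
```

In this invented sample, read1 is placed confidently on one branch, while read2 needs five branches to reach the threshold, so its best single placement is much less certain.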