Hi Pam,
> I am trying to understand which value should be used when interpreting
> the uncertainty of the placement: the likelihood weight ratio or the
> accumulated likelihood weight ratio.
>
> In the RAxML_classificationLikelihoodWeights file generated by EPA,
> each query has a likelihood weight ratio in the 3rd column and an
> accumulated likelihood weight ratio in the 4th column. For every query,
> the final accumulated likelihood weight ratio reaches >0.9 before the
> file moves on to the next query, yet each individual insertion for that
> query often has a likelihood weight ratio of only ~0.1. *So when saying
> that the program places a query onto a certain branch with an
> uncertainty of X, should X be the likelihood weight ratio or the
> accumulated likelihood weight ratio?*
That should be the likelihood weight ratio.
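For illustration, here is a minimal Python sketch of pulling that per-branch value out of the file for each query. It assumes (this is not confirmed in the thread) that every line holds four whitespace-separated columns: query name, branch label, likelihood weight ratio, and accumulated likelihood weight ratio. The function name best_placements and the sample lines and branch labels are hypothetical.

```python
def best_placements(lines):
    """Return {query: (branch, lwr)} for the highest-LWR placement of each query.

    Assumed line layout: query, branch, likelihood weight ratio, accumulated ratio.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        query, branch = fields[0], fields[1]
        lwr = float(fields[2])  # 3rd column: likelihood weight ratio of this branch
        if query not in best or lwr > best[query][1]:
            best[query] = (branch, lwr)
    return best


# Hypothetical sample lines; real files come from RAxML's EPA run.
example = [
    "read1 I12 0.62 0.62",
    "read1 I13 0.31 0.93",
    "read2 I40 0.11 0.11",
]
print(best_placements(example))  # {'read1': ('I12', 0.62), 'read2': ('I40', 0.11)}
```

The value reported per query here is the 3rd-column weight of its best branch, not the running total in the 4th column.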
> I don't fully understand what the accumulated likelihood weight ratio
> indicates. Based on the output data and my understanding of the
> corresponding EPA paper, the program places the query onto various
> branches, calculates the likelihood weight ratio, and then tries
> another placement if the accumulated likelihood weight ratio has not
> yet reached 0.9.
No, it actually calculates the likelihoods and likelihood weights for
all possible placements. Then, only the placements with the highest
likelihood weights are printed out, until an accumulated likelihood
weight ratio threshold is reached. I think you are confusing the
likelihood weights with the heuristics for accelerating placement
computations described in the paper.
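A minimal sketch of that two-step procedure, assuming one log-likelihood per candidate branch is already available: the likelihoods are normalized so the weights sum to 1, then the placements are sorted by weight and emitted until the running sum reaches a threshold. This is not RAxML's code; the function names, the toy log-likelihoods, and the 0.9 default threshold (borrowed from the value mentioned in the question) are assumptions.

```python
import math


def likelihood_weight_ratios(log_likelihoods):
    """Normalize per-branch log-likelihoods into weights that sum to 1."""
    m = max(log_likelihoods.values())  # subtract the max so exp() does not underflow
    weights = {b: math.exp(ll - m) for b, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


def reported_placements(log_likelihoods, threshold=0.9):
    """Sort placements by weight; stop once the accumulated weight reaches threshold."""
    lwrs = likelihood_weight_ratios(log_likelihoods)
    reported, accumulated = [], 0.0
    for branch, lwr in sorted(lwrs.items(), key=lambda kv: kv[1], reverse=True):
        accumulated += lwr
        reported.append((branch, lwr, accumulated))
        if accumulated >= threshold:
            break
    return reported


# Toy log-likelihoods chosen so the weights come out as 0.5, 0.3, 0.15, 0.05.
lls = {"A": math.log(0.5), "B": math.log(0.3), "C": math.log(0.15), "D": math.log(0.05)}
for branch, lwr, acc in reported_placements(lls):
    print(branch, round(lwr, 3), round(acc, 3))  # reports A, B, C; D is never printed
```

All placements are scored first; the threshold only controls how many of them are printed, which is why the accumulated column is a running total over the sorted list rather than a property of any single branch.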
> At first I thought the accumulated value is a way
> to narrow down the branching region on the reference tree that the query
> should go into.
That's correct: the idea is to use, say, a 0.95 accumulated likelihood
weight to determine the region of the reference tree into which a read
falls, that is, to find the LCRA for the subtree containing 95% of the
accumulated likelihood weight.
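One way to sketch this in Python, under stated assumptions: the reference tree is treated as rooted and given as a hypothetical child-to-parent map, each branch is identified by its child node, and the region is taken as the lowest common ancestor of the highest-weight branches whose weights accumulate to 0.95. This is only an illustration of the idea, not how RAxML or any downstream tool computes the LCRA; all names and the toy tree are hypothetical.

```python
def ancestors(node, parent):
    """Path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def credible_set(lwrs, mass=0.95):
    """Highest-weight branches until their accumulated weight reaches mass."""
    chosen, acc = [], 0.0
    for branch, w in sorted(lwrs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(branch)
        acc += w
        if acc >= mass:
            break
    return chosen


def lowest_common_ancestor(nodes, parent):
    """Deepest node that lies on the root-ward path of every node in nodes."""
    paths = [ancestors(n, parent) for n in nodes]
    common = set(paths[0]).intersection(*paths[1:])
    return next(n for n in paths[0] if n in common)


# Hypothetical toy tree: root -> X -> {A, B}, root -> Y -> {C, D}.
parent = {"A": "X", "B": "X", "X": "root", "C": "Y", "D": "Y", "Y": "root"}
lwrs = {"A": 0.60, "B": 0.36, "C": 0.03, "D": 0.01}
print(lowest_common_ancestor(credible_set(lwrs), parent))  # X
```

Here 96% of the weight falls on A and B, so the read is assigned to the subtree below X rather than to a single branch.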
> But for some of the queries, I see insertions at many different
> reference branches that are very distant from each other, which means
> the accumulated ratio is simply a sum over all the trials.
In this case, this might indicate that there are some reads in your
sample that are very distant from the reference sequences, i.e., either
some sort of contamination or something new.
As a first step, I would try to BLAST the sequences that scatter all
over the tree.
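A hypothetical heuristic for shortlisting such reads before BLASTing them: count the edges separating the branches reported for each query and flag queries whose branches lie more than max_edges apart. This is not a RAxML feature; the child-to-parent tree map (which assumes a rooted tree), the function names, and the cutoff of 4 edges are all illustrative assumptions.

```python
from itertools import combinations


def path_to_root(node, parent):
    """Path from node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def edge_distance(u, v, parent):
    """Number of edges between nodes u and v in a rooted tree (child -> parent map)."""
    pu, pv = path_to_root(u, parent), path_to_root(v, parent)
    depth_in_pv = {n: i for i, n in enumerate(pv)}
    for i, n in enumerate(pu):
        if n in depth_in_pv:  # first shared node is the lowest common ancestor
            return i + depth_in_pv[n]
    raise ValueError("nodes are not in the same tree")


def scattered_queries(placements, parent, max_edges=4):
    """Return queries whose reported branches lie more than max_edges apart."""
    flagged = []
    for query, branches in placements.items():
        spread = max(
            (edge_distance(a, b, parent) for a, b in combinations(branches, 2)),
            default=0,  # a single reported branch has no spread
        )
        if spread > max_edges:
            flagged.append(query)
    return flagged


# Hypothetical toy tree and placements (branches named by their child node).
parent = {"A": "X", "B": "X", "X": "P", "C": "Y", "D": "Y", "Y": "P",
          "P": "root", "E": "Z", "F": "Z", "Z": "root"}
placements = {"read1": ["A", "B"], "read2": ["A", "F"]}
print(scattered_queries(placements, parent))  # ['read2']
```

The flagged names could then be used to pull the corresponding reads for a BLAST search.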
Hope this helps,
Alexis
>
> I would really appreciate it if you could explain it to me; I've run
> out of ways to solve it myself.
>
> Pam
>
> --
> You received this message because you are subscribed to the Google
> Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to raxml+un...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
--
Alexandros (Alexis) Stamatakis
Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson
www.exelixis-lab.org