Ancestral reconstruction


Christian A

Mar 6, 2012, 3:16:59 AM
to ra...@googlegroups.com
Hi,

I searched the internet and this forum, but I cannot find any real information about the marginal ancestral reconstruction feature that was added in version 7.3.0. Can someone give me a few details about it: how does the reconstruction work, and is it already working seamlessly? Specifically, I'd like to know whether it is possible to reconstruct ancestral states for all inner nodes, given the ML tree and the sequence data. Which options do I have to use in RAxML to do that? Any help or link would be appreciated!

Thanks,
Christian

Alexis

Mar 6, 2012, 6:26:33 AM
to raxml
Hi Christian,

It seems to be stable, but please use the latest GIT version since I
just fixed a bug in there.
The usage should be described in the previous thread on here about
ancestral states.

For background reading, see Section 4.4 pp119-126 of Ziheng Yang's
textbook on computational molecular evolution.

Alexis

Christian A

Mar 6, 2012, 7:07:39 AM
to ra...@googlegroups.com
Thanks Alexis,

Sorry that I missed your previous post; I did search for ancestral reconstruction but did not find anything in the forum here. I'll try it, thanks for coding & answering :)

Christian

Alexis

Mar 7, 2012, 3:24:34 AM
to raxml

Christian A

Apr 10, 2012, 7:34:44 AM
to ra...@googlegroups.com

Hi Alexis,

Sorry for re-opening this thread, but I used the ancestral reconstruction feature on a few datasets and got some, let's say, questionable results. Imagine some deep nodes near the root of the tree (see the example with A, B, and C, all of which are actually internal nodes).


     -------  B
     |
A---
     |
     -------- C

Now, we found that the ancestral reconstruction of these three nodes gives weird results. All three branches have a clearly non-zero length in the ML tree. However, when comparing the reconstructed sequences, either A and B or A and C have identical reconstructed sequences, while the remaining node has, as expected, a different reconstructed sequence. One would expect three different sequences for these nodes, given the ML tree and the non-zero branch lengths. It appears as if the ancestral state algorithm "decides" on one of the sequences, regardless of the estimated branch length from the ML tree.

I can give you a real example if you like, but maybe this is a good start to figure out why that happens.


Thanks,
Christian

Alexandros Stamatakis

Apr 10, 2012, 5:02:46 PM
to ra...@googlegroups.com
Hi Christian,

Where is your example tree rooted?

What you display there is an unrooted tree; for ancestral state reconstruction, however, the tree needs to be rooted.

I can't really help you with an abstract description such as the one you provided.

Please send me (via email, directly to me) an input dataset and tree where you observed this, a priori, indeed weird
behavior.

Cheers,

Alexis

--
Dr. Alexandros Stamatakis
www.exelixis-lab.org

Alexis

May 31, 2012, 8:17:32 AM
to ra...@googlegroups.com, Alexandros...@gmail.com
Hi Christian,

I looked at your files. 

Actually the RAxML ancestral reconstruction algorithm outputs two files:

1. RAxML_marginalAncestralStates
2. RAxML_marginalAncestralProbabilities

File 1 just contains the ancestral sequences that have been obtained from the marginal ancestral probabilities.
For each site it always contains the character with the highest probability, e.g., 'C' if P(C) = 0.9, and occasionally
'?' if all marginal ancestral probabilities for that site differ by less than 0.000001.

So what you are seeing in file RAxML_marginalAncestralStates (the ancestral sequences for two different nodes being identical)
is a result of this discretization of the probabilities contained in RAxML_marginalAncestralProbabilities.

If you check the marginal ancestral probability vectors in file RAxML_marginalAncestralProbabilities for the identical sequences in file RAxML_marginalAncestralStates, you will see
that they are indeed (slightly) different.

Thus, if you need a more fine-grained/detailed ancestral sequence, you should use the ancestral probabilities and not the ancestral states that have been inferred from these.
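
The discretization described above can be illustrated with a minimal Python sketch. This is not RAxML's code: the function names, the state order "ACGT", the reading of the tie rule as "largest minus smallest probability below 0.000001", and the example probabilities are all assumptions made for illustration. The probability rows are given directly as lists rather than parsed from RAxML_marginalAncestralProbabilities, since the file layout is not described in this thread.

```python
def call_state(probs, alphabet="ACGT", tie_eps=1e-6):
    """Pick the most probable state for one site, or '?' for a tie."""
    # Assumed reading of the tie rule: every probability lies within
    # tie_eps of every other one, so no state stands out.
    if max(probs) - min(probs) < tie_eps:
        return "?"
    best = max(range(len(probs)), key=lambda i: probs[i])
    return alphabet[best]


def call_sequence(site_probs):
    """Discretize one node: one called character per site."""
    return "".join(call_state(p) for p in site_probs)


# Hypothetical marginal probabilities (one row per site, order A, C, G, T)
# for two different internal nodes.
node_a = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
node_b = [
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.85, 0.05, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]

# Both nodes yield the same called sequence although their
# probability vectors differ.
print(call_sequence(node_a), call_sequence(node_b))
```

In this toy input the two nodes differ clearly in their probabilities, yet the called sequences are the same, which is the effect discussed in this thread.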

Cheers,

Alexis

Adam Witney

Jan 21, 2013, 6:14:06 AM
to ra...@googlegroups.com, Alexandros...@gmail.com
Hi Alexis,

Sorry for resurrecting this old thread, but I have come across the same thing trying to estimate ancestral states on my tree. When I look at the RAxML_marginalAncestralStates file at the node that splits into two of my strains, the estimated sequence is often identical to one of my strains, but the branch length is not zero.

You describe below how the RAxML_marginalAncestralStates file is determined from the RAxML_marginalAncestralProbabilities file by selecting the bases with the highest marginal probabilities, but then say that if I want a more detailed / fine-grained ancestral sequence I need to look at these probabilities. I am not sure how this will help me: I look at the 4 state probabilities and look for the highest, thus coming up with the same answer as in the RAxML_marginalAncestralStates file. Are you saying that if the ancestral sequence is identical to the descendant sequence, then the data do not provide sufficient resolution to distinguish them?

Thanks for any help

Adam

Alexandros Stamatakis

Jan 21, 2013, 6:33:01 AM
to ra...@googlegroups.com
Hi Adam,

> Sorry for resurrecting this old thread, but I have come across the same
> thing trying to estimate ancestral states on my tree. When i look at
> the RAxML_marginalAncestralStates file at the node that splits to two of my
> strains then the estimated sequence is often identical to one of my
> strains, but the branch length is not zero.

Yes, but I am afraid that this is just a property of ancestral state
reconstruction: it just displays the character for which the signal is
strongest, even if the br-len is > 0.

> You describe below how the RAxML_marginalAncestralStates file is determined
> from the RAxML_marginalAncestralProbabilities file by selecting the bases
> with the highest marginal probabilities.

Exactly.

> But then say if I want a more
> detailed / fine grained ancestral sequence then I need to look at these
> probabilities.

Yes.

> But I am not sure how this will help me? I look at the 4
> state probabilities and look for the highest, thus coming up with the same
> answer as in the RAxML_marginalAncestralStates file.

Well, you may think about quantifying the difference between state
probabilities and using some cutoff: say A has 0.4 and C has 0.3, then
you may want to use the respective ambiguous IUPAC character state, M
for instance.
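
One way to sketch this cutoff idea in Python is shown below. It is only an illustration, not something RAxML does: the function name, the state order "ACGT", the rule "keep every state whose probability is within the cutoff of the top state", and the default cutoff of 0.15 are assumptions chosen for the example. The lookup table holds the standard IUPAC nucleotide ambiguity codes.

```python
# Standard IUPAC nucleotide codes, keyed by the sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def ambiguous_state(probs, alphabet="ACGT", cutoff=0.15):
    """Return an IUPAC code covering all states close to the best one."""
    top = max(probs)
    # Assumed rule: keep every state whose probability is within
    # `cutoff` of the highest probability at this site.
    kept = "".join(
        sorted(s for s, p in zip(alphabet, probs) if top - p <= cutoff)
    )
    return IUPAC[kept]


# The example from the post: A has 0.4 and C has 0.3.
print(ambiguous_state([0.4, 0.3, 0.2, 0.1]))      # M
# A clearly dominant state is still called unambiguously.
print(ambiguous_state([0.9, 0.05, 0.03, 0.02]))   # A
```

The cutoff is a free parameter; a smaller value makes the calls more decisive, while a larger one yields more ambiguity codes.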

I am afraid that this is just a property of ancestral state
reconstruction.

> Are you saying that if
> the ancestral sequence is identical as the descendant sequence then the
> data does not provide sufficient resolution to distinguish them?

It's not really identical, the prob for A in the ancestral state will be
something like 0.9 while in the extant sequence it will always be 1.0.

Nonetheless, the most likely state at the ancestor will still be A.

Maybe you will have to just move further up in the tree to see some
changes.

Don't know if this helps, I am afraid that this is just the way it is.

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Adam Witney

Jan 21, 2013, 7:13:28 AM
to ra...@googlegroups.com

Thanks for your quick response Alexis, this does clarify it for me a lot.

Actually, what I am trying to achieve is to estimate the number of actual changes on the branch, and I was going to use the ancestral states for this. But could I also simply take the branch length and multiply it by the total number of sites in the alignment to get the same estimate? Or is there a better way to do this?

Thanks again

Adam

Alexandros Stamatakis

Jan 21, 2013, 2:18:01 PM
to ra...@googlegroups.com
Hi Adam,

The branch length actually is an estimate of the mean number of changes
per site along the branch.
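
As a rough illustration of that arithmetic, here is a small Python sketch. The function names and the numbers are made up for the example. Note that the product is a model-based expectation that also counts multiple substitutions at the same site, so it need not match the number of differences counted between two called sequences.

```python
def expected_changes(branch_length, n_sites):
    """Expected number of substitutions on a branch over the alignment.

    branch_length is in expected substitutions per site.
    """
    return branch_length * n_sites


def observed_differences(seq_a, seq_b):
    """Count sites at which two equally long called sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical values: a branch of length 0.002 on a 5000-site alignment
# corresponds to roughly 10 expected substitutions.
print(expected_changes(0.002, 5000))
# Two called sequences that differ at one site.
print(observed_differences("ACGT", "ACGA"))
```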

If you want something more elaborate you will have to look into trait
evolution methods and the corresponding bibliography.

Alexis

Adam Witney

Jan 21, 2013, 2:53:40 PM
to ra...@googlegroups.com, Alexandros Stamatakis
On 21/01/2013 19:18, Alexandros Stamatakis wrote:
> Hi Adam,
>
> The branch length actually is an estimate of the mean number of changes
> per site along the branch
Yes, this was my logic for multiplying by the number of sites: to get the
mean number of changes over the whole alignment on each branch.

> If you want something more elaborate you will have to look into trait
> evolution methods and the corresponding bibliography.

Thanks , I will take a look.

Thanks again for your help

Adam