Identical sequences in a dataset: to remove or to keep?


Guan

Apr 16, 2008, 12:03:55 PM
to beast-users
Dear Everyone,

I have a dataset in which several taxa have identical sequences. I
assumed that these sequences would be clustered together in a polytomy
after the BEAST analysis. However, I obtained a completely resolved
consensus tree in which all of these identical sequences were placed
together as a fully bifurcating clade with different node ages.

There seemed to be no problem with the alignment (34 taxa, 491
characters). The substitution model is GTR+gamma+Inv, and the dataset
was partitioned using the SRD06 model. I used a relaxed lognormal
clock. Dates were calibrated with two pairs of sequences at the
bird-mammal and toad-mammal branching points.

I plan to keep only one of the identical sequences in the dataset in
future analyses, but I wonder what could have happened.

Did I not run the MCMC chain for enough generations?

Any explanation or comments?

Thanks,
Guan

Paolo Zanotto

Apr 16, 2008, 12:58:09 PM
to zhu...@gmail.com, beast-users
Hi Guan,

To my mind it depends on the taxonomic scaling (level), doesn't it?

I guess that if those samples come from a single population under
study (this may become problematic for clonal organisms), then
identical sequences are informative about the dynamics, and by
keeping only the unique "haplotypes" you will lose an important
component of a "truer" growth signal.

However, if the "identical ones" belong to distinct taxonomic ranks,
then you may be mixing scales ("levels of description"), and it would
be better to adjust the scaling and, say, use a single sequence from
each species if you are working at the genus level, etc.

Best,
Paolo

alexei....@gmail.com

Apr 18, 2008, 10:58:18 PM
to beast-users
BEAST is a method for sampling all trees that have a reasonable
probability given the data. One of the assumptions underlying the
BEAST program is that there is a binary tree that has generated the
data. Just because (for example) three taxa have identical sequences
doesn't mean that they are equally closely related in the true tree -
it just means that there were no mutations (in the sampled part of the
genome) down the ancestral history of those three taxa. In this case,
BEAST would sample all three trees with equal probability ((A,B),C),
(A,(B,C)), ((A,C),B).
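
As a rough illustration of the three-taxon case above (this is not BEAST code; the function names are made up for the sketch, and equal weight on each topology is simply assumed because the identical sequences cannot favour one over another), the possible resolutions and the support each sister pair would receive can be enumerated like this:

```python
from fractions import Fraction
from itertools import combinations


def rooted_three_taxon_trees(taxa):
    """List the rooted binary topologies for three taxa.

    Each topology is written as (sister_pair, outgroup), so
    ({A, B}, C) stands for ((A,B),C).
    """
    trees = []
    for pair in combinations(taxa, 2):
        outgroup = next(t for t in taxa if t not in pair)
        trees.append((frozenset(pair), outgroup))
    return trees


def pair_support(trees):
    """Support for each sister pair, ASSUMING equal weight per topology."""
    weight = Fraction(1, len(trees))
    support = {}
    for pair, _outgroup in trees:
        support[pair] = support.get(pair, Fraction(0)) + weight
    return support


trees = rooted_three_taxon_trees(("A", "B", "C"))
support = pair_support(trees)
for pair, value in support.items():
    print(sorted(pair), value)
```

Each of the three pairings gets one third of the weight, so none of them would look well supported on its own.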

This is what is happening in your analysis, with the identical
sequences. Because you are summarizing the BEAST output as a single
tree (presumably using TreeAnnotator, which picks a specific tree from
the trace that is representative and annotates it with posterior
probabilities of clades), you will see some particular resolution of
the identical sequences, based on the selected representative tree.
But the posterior probability for that particular resolution will
probably be low, since many other resolutions have also been sampled
in the chain.

One of the results of the way that BEAST analyzes the data is that you
get an estimate of how closely related these sequences are, even if
the sequences are identical. This is possible because BEAST is
essentially determining how old the common ancestor of these sequences
could be given that no mutations were observed in the ancestral
history of the identical sequences, and given the estimated
substitution rate and sequence length.
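
A back-of-the-envelope version of this reasoning, which is a deliberate simplification and not what BEAST actually computes (it ignores the tree prior, among-site rate variation and the relaxed clock), treats substitutions as a Poisson process along the two lineages leading from the common ancestor to a pair of identical sequences. The rate below is a made-up placeholder; only the 491 sites come from the original question.

```python
import math


def prob_no_substitutions(rate, n_sites, tmrca, n_lineages=2):
    """Poisson probability of zero substitutions.

    rate: substitutions per site per unit time (assumed strict clock)
    tmrca: age of the common ancestor, so the total branch length is
           n_lineages * tmrca
    """
    total_branch_length = n_lineages * tmrca
    return math.exp(-rate * n_sites * total_branch_length)


def max_tmrca(rate, n_sites, tail=0.05, n_lineages=2):
    """Age at which the chance of seeing no substitutions drops to `tail`."""
    return -math.log(tail) / (rate * n_sites * n_lineages)


ASSUMED_RATE = 1e-8  # hypothetical rate per site per year, NOT from the thread
N_SITES = 491        # alignment length mentioned in the question

t95 = max_tmrca(ASSUMED_RATE, N_SITES)
print(f"identical sequences become unlikely beyond ~{t95:.0f} years")
```

Longer sequences or a faster rate pull this bound toward the present, which is why identical sequences still carry information about how recent their common ancestor must be.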

So in general: you should ignore *all* splits in the consensus tree
that have low support. In terms of the identical sequences, the only
node with the possibility of significant support would be the common
ancestor of the identical sequences. If this is the case then you can
confidently report the age of this node, but should not try to make
any statements about relationships or divergence times within the
group of identical sequences.
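
To make the rule of thumb concrete, here is a small sketch of how clade support could be tallied from a posterior sample and then filtered. It is only an illustration: the trees are hand-written nested tuples rather than real BEAST output, the helper names are hypothetical, and the 0.5 cut-off is an arbitrary choice for the example.

```python
from collections import Counter


def collect_clades(tree):
    """Return (leaves under this node, list of all clades at or below it).

    A tree is a nested tuple of taxon-name strings, e.g. (("A", "B"), "C").
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = collect_clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def clade_posteriors(sampled_trees):
    """Fraction of sampled trees that contain each clade."""
    counts = Counter()
    for tree in sampled_trees:
        _leaves, found = collect_clades(tree)
        counts.update(set(found))
    return {clade: n / len(sampled_trees) for clade, n in counts.items()}


# Toy sample: A, B and C are identical sequences, D is distinct.
sample = [
    ((("A", "B"), "C"), "D"),
    (("A", ("B", "C")), "D"),
    ((("A", "C"), "B"), "D"),
]

posteriors = clade_posteriors(sample)
THRESHOLD = 0.5  # arbitrary cut-off for this example
supported = {c: p for c, p in posteriors.items() if p >= THRESHOLD}
for clade, p in supported.items():
    print(sorted(clade), round(p, 2))
```

In this toy sample the clade containing A, B and C appears in every tree, while each pairing inside it appears in only a third of them and is filtered out.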

I hope this helps.

Cheers
Alexei

Guan

Apr 19, 2008, 12:33:16 AM
to beast-users
Dear Alexei and Paolo,

Many thanks for your replies, which are very helpful and have cleared
up my major concerns.

Also kudos to the BEAST team - you have been doing a wonderful job for
the community.

Best wishes and have a great weekend!
Guan

nate...@gmail.com

Jul 26, 2022, 3:36:50 PM
to beast-users
Alexei! Reading in 2022! Thank you so much for this wonderful explanation! 
All the best
