ascertainment (coding) bias

marcelo_weksler

Jun 7, 2012, 11:29:00 AM

to ra...@googlegroups.com

Hi Alexis:

An earlier thread asked about this issue, but I could not find an answer: what is RAxML's default option for ascertainment (coding) bias when the Mk or GTR models are applied to morphological characters? I am especially worried about the absence of invariable and autapomorphic characters in morphological matrices. Can we modify the coding (ascertainment) bias setting, and is it worth the worry?

Below is how MrBayes deals with it in a Bayesian framework:

Thanks for the great software. Best, Marcelo Weksler

MrBayes manual:
Page 30:
Typically morphological data matrices do not include all types of characters. Specifically, morphological data matrices do not usually include any constant (invariable) characters. Sometimes, autapomorphies are not included either, and the matrix is restricted to parsimony-informative characters. For MrBayes to calculate the probability of the data correctly, we need to inform it of this ascertainment (coding) bias. By default, MrBayes assumes that standard data sets include all variable characters but no constant characters. If necessary, one can change this setting using lset coding.

Page 39:
A problem with some binary data sets, notably restriction sites, is that there is an ascertainment (coding) bias such that certain characters will always be missing from the observed data. It is impossible, for instance, to detect restriction sites that are absent in all of the studied taxa. MrBayes corrects for this bias by calculating likelihoods conditional on the unobservable characters being absent (Felsenstein, 1992). The ascertainment (coding) bias is selected using lset coding. There are five options: (1) there is no bias, all types of characters could, in principle, be observed (lset coding=all); (2) characters that are absent (state 0) in all taxa cannot be observed (lset coding=noabsencesites); (3) characters that are present (state 1) in all taxa cannot be observed (lset coding=nopresencesites); (4) characters that are constant (either state 0 or 1) in all taxa cannot be observed (lset coding=variable); and (5) only characters that are parsimony-informative have been scored (lset coding=informative). For restriction sites it is typically true that all-absence sites cannot be observed, so the correct coding bias option is noabsencesites.
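The conditioning described in the excerpt can be written as a formula. This is a sketch of the standard formulation in notation introduced here (not copied from the MrBayes manual): $x_i$ is the observed pattern at character $i$ of $n$, $\tau$ the tree with branch lengths, $\theta$ the model parameters, and $\mathcal{U}$ the set of patterns that the coding option declares unobservable. Each site likelihood is divided by the probability that a character is observable at all:

$$
L^{*}(\tau,\theta) \;=\; \prod_{i=1}^{n} \frac{P(x_i \mid \tau,\theta)}{1 - \sum_{y \in \mathcal{U}} P(y \mid \tau,\theta)},
\qquad
\ln L^{*} \;=\; \sum_{i=1}^{n} \ln P(x_i \mid \tau,\theta) \;-\; n \,\ln\!\Big(1 - \sum_{y \in \mathcal{U}} P(y \mid \tau,\theta)\Big)
$$

Under this reading, coding=all gives $\mathcal{U} = \varnothing$ (no correction), noabsencesites gives the all-0 pattern, nopresencesites the all-1 pattern, variable both constant patterns, and informative every parsimony-uninformative pattern. Because the denominator depends on $\tau$ and $\theta$, the correction can shift the maximum-likelihood estimates, not just rescale the likelihood.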

Page 40:
When is the correction for ascertainment bias important? This is strongly dependent on the size of the tree (the sum of the branch lengths on the tree). The larger the tree, the less important the correction for ascertainment bias becomes. In our experience, when there are more than 20-30 taxa, even the most severe bias (only informative characters observed) is associated with an insignificant correction of the likelihood values.
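To see why the correction fades on larger trees, here is a minimal sketch, assuming the symmetric Mk model with k states, a star tree whose tips all hang off the root by branches of equal length t, and a uniform root state. All function names are hypothetical and are not taken from RAxML or MrBayes. It enumerates every pattern, sums the probability of the unobservable ones (constant patterns for "variable" coding, parsimony-uninformative patterns for "informative" coding), and returns the term added to the log-likelihood.

```python
import itertools
import math


def mk_transition(k, t):
    """(p_same, p_diff) for the symmetric Mk model with k states over branch length t."""
    e = math.exp(-k * t / (k - 1))
    p_same = 1.0 / k + (k - 1) / k * e
    p_diff = 1.0 / k - e / k
    return p_same, p_diff


def pattern_prob_star(pattern, k, t):
    """Probability of one character pattern on a star tree (equal branches, uniform root)."""
    p_same, p_diff = mk_transition(k, t)
    total = 0.0
    for root in range(k):
        prod = 1.0
        for tip_state in pattern:
            prod *= p_same if tip_state == root else p_diff
        total += prod / k
    return total


def is_constant(pattern):
    return len(set(pattern)) == 1


def is_informative(pattern):
    # Parsimony-informative: at least two states each occur in at least two taxa.
    counts = [pattern.count(s) for s in set(pattern)]
    return sum(1 for c in counts if c >= 2) >= 2


def unobservable_mass(n_taxa, k, t, coding):
    """Total probability of patterns that the given coding says cannot be in the matrix."""
    if coding not in ("variable", "informative"):
        raise ValueError("coding must be 'variable' or 'informative'")
    mass = 0.0
    for pattern in itertools.product(range(k), repeat=n_taxa):
        hidden = is_constant(pattern) if coding == "variable" else not is_informative(pattern)
        if hidden:
            mass += pattern_prob_star(pattern, k, t)
    return mass


def log_correction(n_chars, n_taxa, k, t, coding):
    """Term added to the uncorrected log-likelihood: -n * ln(1 - P(unobservable))."""
    return -n_chars * math.log(1.0 - unobservable_mass(n_taxa, k, t, coding))
```

For example, `log_correction(100, 4, 2, 0.05, "variable")` is larger than `log_correction(100, 4, 2, 0.5, "variable")`: as branches lengthen, constant patterns become less probable, so the denominator approaches 1 and the correction shrinks, consistent with the manual's remark. The "informative" correction is always larger than the "variable" one at the same t, since it also removes autapomorphies. Brute-force enumeration grows as k^n, so this is only for small illustrative trees; real implementations compute the unobservable-pattern probabilities by pruning instead.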

Alexandros Stamatakis

Jun 11, 2012, 6:28:04 AM

to ra...@googlegroups.com

Hi Marcelo,

There has been a previous discussion about this. I believe that what the
MrBayes manual refers to is also known as the Lewis correction.

This is *not* implemented in RAxML, although it has been on my to-do list for ages.

I am not sure whether it is worth the worry, but it may well be: I seem to remember
a discussion with J. Huelsenbeck in which he said that I should definitely
implement it.

It would be a great service to the user community if you could run your analyses
with both MrBayes and RAxML and let us know here whether it makes a difference, at least in your case.

My honest answer is: I don't know.

Alexis

--
Dr. Alexandros Stamatakis
Research Group Leader HITS, Heidelberg
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University of Arizona at Tucson
www.exelixis-lab.org

marcelo_weksler

Jun 15, 2012, 8:48:27 AM

to ra...@googlegroups.com, Alexandros...@gmail.com

Thanks, Alexis. I will analyze a dataset in which I did score invariable and autapomorphic morphological characters, and will let you know of any interesting results.
