Re: Sampling bias

Luiz Max de Carvalho

Jun 20, 2012, 11:33:05 AM

to beast...@googlegroups.com

Hi Bernd,

I'm also concerned about sampling bias, and I think your question has no answer yet; at least, there's no empirical evidence to my knowledge. I'm planning to do some research on the effects of sampling bias on the estimation of phylogeographic parameters in BEAST and on ways to cope with it. So far I have contacted Nuno Faria (PhD candidate, KU Leuven University, Belgium) and Joe Parker (Postdoc, University of London) to try to set up a working group on the issue. I'm also doing a literature search to see what people actually do about it. As you said, there's random stratified sampling (which, in my opinion, would be best under a homogeneity assumption), and there's prior modification to incorporate the effects of using a biased sample (see Faria et al., 2012, Journal of General Virology).

So I think that (i) there's no solid answer to your question yet and (ii) if you're interested, we could investigate it together.

Best,

Luiz, PAHO/WHO Reference Lab on Foot-and-Mouth Disease, Brazil.

On Wednesday, June 20, 2012 7:58:39 AM UTC-3, Bernd Haenfling wrote:

My question is more about experimental design than program usage.

Has anybody explored the effect of sampling bias on MRCA estimation?

I have seen published statements that a data set was pruned to avoid unequal taxonomic representation, but no reference was given. Intuitively this seems sensible, but is there evidence?

In my case this is relevant for two reasons:

1) I'm planning to use calibration points from a sister group (also estimated using BEAST).

The question is whether I should add the entire data set for this sister group to my data, or whether it is sufficient to include the minimum number of taxa that represent the nodes for which MRCA estimates exist.

2) In my own data set, a number of species are represented by multiple haplotypes extracted from large phylogeographic data sets, whereas other species are represented by only a single individual. Should I prune the data set so that each species and clade is equally represented?
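
As an illustration of the pruning option in point 2, here is a minimal Python sketch that keeps at most a fixed number of randomly chosen haplotypes per species. The function name, the (sequence ID, species) input format, and the example labels are all hypothetical; this is not part of BEAST, and it says nothing about whether pruning actually reduces bias in the estimates.

```python
import random
from collections import defaultdict


def subsample_per_species(records, max_per_species=1, seed=0):
    """Keep at most `max_per_species` randomly chosen sequences per species.

    `records` is a list of (sequence_id, species) tuples (hypothetical format).
    Returns the list of retained sequence IDs.
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible

    # Group sequence IDs by the species they belong to.
    by_species = defaultdict(list)
    for seq_id, species in records:
        by_species[species].append(seq_id)

    kept = []
    for species in sorted(by_species):
        ids = by_species[species]
        k = min(max_per_species, len(ids))
        # Sample without replacement within each species (stratum).
        kept.extend(sorted(rng.sample(ids, k)))
    return kept


# Made-up example: one species with three haplotypes, one with a single
# individual, one with two haplotypes.
records = [
    ("hapA1", "sp_A"), ("hapA2", "sp_A"), ("hapA3", "sp_A"),
    ("hapB1", "sp_B"),
    ("hapC1", "sp_C"), ("hapC2", "sp_C"),
]

kept = subsample_per_species(records, max_per_species=1)
print(kept)
```

Repeating the subsampling with several seeds and comparing the resulting estimates would be one way to check how sensitive the results are to which haplotypes are retained.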

Thanks
Bernd
