Segregating sites, model selection and parameter interpretation

carooo...@gmail.com

Jan 14, 2015, 2:42:06 PM

to dadi...@googlegroups.com

Dear Ryan and dadi users,

I am working on cryptic mosquito species and trying to find the best demographic model for each population. My populations range from 12 to 72 individuals. Other indices, such as a negative genome-wide Tajima’s D and a skewed allele frequency spectrum, strongly suggest that the populations are expanding, so I was expecting to find a model that confirms this trend.

I have tried all the models implemented in dadi.Demographics1D with different initial parameters, and I have several questions regarding segregating sites, model selection and parameter interpretation. I have read almost all the related topics on this forum but did not find answers to my questions.

1. Segregating sites: In the manual you indicate that « As a rule of thumb, we often choose our projection to maximize the number of segregating sites in our final fs (assessed via fs.S()), although we have not formally tested whether this maximizes statistical power. » I have tried different sample sizes; the number of segregating sites varies a little, but it doesn’t seem to affect the model selection. Is it better to choose, for each population, the sample size that maximizes the number of segregating sites, or can I project down to the same sample size for almost all the populations? For the group of 72 individuals, fs.S() is highest for a sample size of 104. I have never seen such a large sample size in the analyses I have found. Does that sound correct?

2. Model selection: For some populations, all the models implemented in dadi.Demographics1D give about the same log-likelihood (LL), and the AIC values are also very close. But when I look at the residual plots (see the attached document), they seem to show a trend for all the models except one (bottlegrowth). Can I choose this model, based on its LL (the highest) and its residual plot, and use it for simulations with ms?

3. Parameters: If I choose the bottlegrowth model, I am wondering how to interpret the parameters. nuB is equal to 5.7 and nuF is equal to 1.5. My understanding is that the population had an instantaneous size change in which the population size was multiplied by 5.7. Then the population started to grow exponentially, increasing again by 1.5 times the ancient population size. Does that make sense? I have seen a recent publication with the same kind of parameters for a bottlegrowth model: the first parameter, nuB, was > 1, meaning expansion, and the nuF parameter was < 1, corresponding to a bottleneck. To me the model looked more like a growth-bottleneck model. Is it possible to interpret the parameters in this manner?

Sorry for the long email, and thank you very much for your help and for this forum, which helps me a lot with dadi.

Best,

Caroline

DADI models.pdf

Gutenkunst, Ryan N - (rgutenk)

Jan 21, 2015, 5:59:48 PM

to dadi...@googlegroups.com

Hi Caroline,

My apologies for the slow reply. Responses are below.

On Jan 14, 2015, at 12:42 PM, carooo...@gmail.com wrote:

> 1. Segregating sites: In the manual you indicate that « As a rule of thumb, we often choose our projection to maximize the number of segregating sites in our final fs (assessed via fs.S()), although we have not formally tested whether this maximizes statistical power. » I have tried different sample sizes; the number of segregating sites varies a little, but it doesn’t seem to affect the model selection. Is it better to choose, for each population, the sample size that maximizes the number of segregating sites, or can I project down to the same sample size for almost all the populations? For the group of 72 individuals, fs.S() is highest for a sample size of 104. I have never seen such a large sample size in the analyses I have found. Does that sound correct?

That rule of thumb is for the case when many SNPs have missing data (individuals with missing calls). A SNP can only be included in an SFS of size N if at least N individuals have been called for that SNP. But if you don't have much missing data, you should just use the full-sized SFS for each population you have. There's no need to project down to the same sample size for all populations. (I did it in the first PLoS Genetics paper, but I didn't need to, and in retrospect it would have been slightly better not to.)
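To make the trade-off concrete, here is a minimal pure-Python sketch of the arithmetic behind that rule of thumb. It is not dadi's own code: the helper names `project_site` and `segregating_sites` and the toy SNP list are made up for illustration, and sizes are assumed to be counted in chromosomes (as the 104 quoted for 72 diploid individuals suggests). Each SNP is projected down hypergeometrically, and SNPs with fewer calls than the target size are dropped.

```python
from math import comb

def project_site(n_called, derived, n_proj):
    """Hypergeometric projection of one SNP down to n_proj samples.

    Returns a list p where p[j] is the probability of seeing j derived
    alleles in a random subsample of n_proj out of n_called samples,
    or None if the SNP has too few calls to enter an SFS of that size.
    """
    if n_called < n_proj:
        return None
    total = comb(n_called, n_proj)
    return [comb(derived, j) * comb(n_called - derived, n_proj - j) / total
            for j in range(n_proj + 1)]

def segregating_sites(snps, n_proj):
    """Expected number of segregating sites after projecting to n_proj."""
    S = 0.0
    for n_called, derived in snps:
        p = project_site(n_called, derived, n_proj)
        if p is not None:
            # A site stays segregating unless the subsample is monomorphic.
            S += 1.0 - p[0] - p[n_proj]
    return S

# Hypothetical data: (samples called, derived-allele count) per SNP.
snps = [(20, 1), (20, 5), (18, 1), (18, 9), (16, 3), (14, 2), (12, 6), (10, 1)]

for n in range(2, 21, 2):
    print(n, round(segregating_sites(snps, n), 2))
```

Large projections keep rare variants segregating but discard SNPs with missing calls; small projections keep every SNP but lose rare variants. With complete data nothing is discarded, so the full sample size always gives the most segregating sites.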

> 2. Model selection: For some populations, all the models implemented in dadi.Demographics1D give about the same log-likelihood (LL), and the AIC values are also very close. But when I look at the residual plots (see the attached document), they seem to show a trend for all the models except one (bottlegrowth). Can I choose this model, based on its LL (the highest) and its residual plot, and use it for simulations with ms?

The bottle-growth model does fit slightly better, but only very slightly. I doubt you would find a significant difference if you did a likelihood ratio test, so I would stick with the simpler two-epoch model. In my experience, if you over-fit by choosing too complex a model, the results become harder to interpret.
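A likelihood ratio test of this kind can be sketched in a few lines of plain Python. The two log-likelihoods below are hypothetical placeholders, not values from the attached fits. The sketch relies on two-epoch (nu, T) being the special case nuF = nuB of bottlegrowth (nuB, nuF, T), so the models differ by one free parameter. It also assumes unlinked SNPs; with linked SNPs the likelihood is composite and this plain chi-square p-value comes out too small.

```python
from math import erfc, sqrt

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def lrt_pvalue_1df(ll_simple, ll_complex):
    """p-value of a likelihood ratio test with one extra free parameter."""
    D = max(2.0 * (ll_complex - ll_simple), 0.0)
    # Chi-square survival function with 1 degree of freedom:
    # P(X > D) = erfc(sqrt(D / 2))
    return erfc(sqrt(D / 2.0))

# Hypothetical log-likelihoods, for illustration only.
ll_two_epoch = -1250.4     # 2 parameters: nu, T
ll_bottlegrowth = -1249.6  # 3 parameters: nuB, nuF, T

p_value = lrt_pvalue_1df(ll_two_epoch, ll_bottlegrowth)
print("LRT p-value:", round(p_value, 3))
print("AIC two_epoch:", aic(ll_two_epoch, 2))
print("AIC bottlegrowth:", aic(ll_bottlegrowth, 3))
```

With a gain of less than one log-likelihood unit, as in these made-up numbers, the extra parameter is not supported by either criterion.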

> 3. Parameters: If I choose the bottlegrowth model, I am wondering how to interpret the parameters. nuB is equal to 5.7 and nuF is equal to 1.5. My understanding is that the population had an instantaneous size change in which the population size was multiplied by 5.7. Then the population started to grow exponentially, increasing again by 1.5 times the ancient population size. Does that make sense? I have seen a recent publication with the same kind of parameters for a bottlegrowth model: the first parameter, nuB, was > 1, meaning expansion, and the nuF parameter was < 1, corresponding to a bottleneck. To me the model looked more like a growth-bottleneck model. Is it possible to interpret the parameters in this manner?

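Yes, the second interpretation is correct. If nuB = 5.7 and nuF = 1.5, the population would first grow (an instantaneous jump to 5.7 times the ancient size) and then decrease (to 1.5 times the ancient size at present).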
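The size trajectory implied by these parameters can be sketched as follows. This is a standalone illustration, not a call into dadi: the function name `bottlegrowth_size` is made up, and the value of T is a hypothetical placeholder because no T was reported in the question. Sizes are relative to the ancient population size, and t runs from 0 (the instantaneous change) to T (the present).

```python
def bottlegrowth_size(t, nuB, nuF, T):
    """Relative population size at time t after the instantaneous change.

    The size starts at nuB (t = 0) and changes exponentially to nuF (t = T).
    """
    return nuB * (nuF / nuB) ** (t / T)

nuB, nuF = 5.7, 1.5  # values from the question
T = 0.1              # hypothetical duration, in dadi's scaled time units

sizes = [bottlegrowth_size(T * i / 5, nuB, nuF, T) for i in range(6)]
for i, size in enumerate(sizes):
    print(f"t = {T * i / 5:.2f}: {size:.2f} x ancient size")
```

Because nuF < nuB, the exponential phase is a decline: the population jumps to 5.7 times the ancient size and then shrinks steadily to 1.5 times the ancient size.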

Best,

Ryan

Ryan Gutenkunst
Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520) 626-0569
http://gutengroup.mcb.arizona.edu

carooo...@gmail.com

Jan 23, 2015, 4:02:51 PM

to dadi...@googlegroups.com

Hi Ryan,

Thank you very much for your help!

Best,

Caroline
