Input data assumptions and ANGSD to dadi


Yafei Mao

Jun 26, 2016, 9:45:27 PM
to dadi-user
Hi All,

Could you help me understand dadi better?

1. Do I need to filter my SNP data before running dadi? I usually filter on missing data, MAF, HWE, linked sites, etc. to get a set of independent 'neutral' sites. However, I suspect MAF and HWE filtering affect rare alleles. Does that matter for dadi? In short, which filters should I apply to my SNP data before running dadi?

2. I usually use ANGSD for my analyses, but I have also tried GATK for genotype calling and PLINK for filtering. Since my samples have only 5-8x coverage, ANGSD should be more accurate. Could I use ANGSD's SFS estimates directly in dadi? If so, is there anything else I need to take care of?

3. My samples (>200 individuals) from different locations appear to be a single population according to ADMIXTURE and fastSTRUCTURE, and Fst among locations is very small (the largest is 0.008). Should I split this one population into a few 'subpopulations' or 'clusters' based on PCA or ADMIXTURE? Can dadi precisely estimate the demographic history of such 'subpopulations'?

Thanks a lot

Best, 
Yafei   

Gutenkunst, Ryan N - (rgutenk)

Jun 27, 2016, 7:08:25 PM
to dadi...@googlegroups.com
Hello Yafei,


On Jun 26, 2016, at 6:45 PM, Yafei Mao <yaf...@gmail.com> wrote:
Could you help me understand dadi better?

1. Do I need to filter my SNP data before running dadi? I usually filter on missing data, MAF, HWE, linked sites, etc. to get a set of independent 'neutral' sites. However, I suspect MAF and HWE filtering affect rare alleles. Does that matter for dadi? In short, which filters should I apply to my SNP data before running dadi?

Yes, MAF filtering will definitely bias the frequency spectrum, and HWE filtering likely will as well. The key for dadi is an accurate estimate of the frequency spectrum, so be careful about any filtering step that is biased against rare alleles.
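To make the effect of a MAF cutoff concrete, here is a small sketch in plain Python (no dadi calls). It assumes the standard neutral expectation E[ξ_i] = θ/i for the unfolded SFS of n sampled chromosomes, folds it onto minor-allele counts, and zeroes every bin below a hypothetical MAF threshold. With 20 diploid individuals (n = 40) and MAF = 0.05, only the singleton class is removed, yet under this model that class holds roughly a quarter of all segregating sites.

```python
from typing import List


def expected_neutral_sfs(n: int, theta: float = 1.0) -> List[float]:
    """Unfolded SFS expected under the standard neutral model, E[xi_i] = theta / i.

    Index i is the number of sampled chromosomes (out of n) carrying the derived
    allele; the monomorphic entries 0 and n are left at zero.
    """
    return [0.0] + [theta / i for i in range(1, n)] + [0.0]


def fold(sfs: List[float]) -> List[float]:
    """Collapse an unfolded SFS onto minor-allele counts 0..n//2."""
    n = len(sfs) - 1
    folded = [0.0] * (n // 2 + 1)
    for i, value in enumerate(sfs):
        folded[min(i, n - i)] += value
    return folded


def apply_maf_filter(folded: List[float], n: int, maf: float) -> List[float]:
    """Zero out every minor-allele-count bin j with j / n < maf (what a MAF filter does)."""
    return [0.0 if j / n < maf else value for j, value in enumerate(folded)]


def fraction_removed(before: List[float], after: List[float]) -> float:
    """Share of segregating sites (bins j >= 1) lost to the filter."""
    return 1.0 - sum(after[1:]) / sum(before[1:])


if __name__ == "__main__":
    n = 40  # hypothetical: 20 diploid individuals
    folded = fold(expected_neutral_sfs(n))
    filtered = apply_maf_filter(folded, n, maf=0.05)
    print(f"fraction of segregating sites removed: {fraction_removed(folded, filtered):.3f}")
```

Real data will not follow the neutral expectation exactly, but the point carries over: the bins a MAF filter removes are the most populated ones, and dadi would then be fitting a spectrum that no unfiltered model predicts.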

2. I usually use ANGSD for my analyses, but I have also tried GATK for genotype calling and PLINK for filtering. Since my samples have only 5-8x coverage, ANGSD should be more accurate. Could I use ANGSD's SFS estimates directly in dadi? If so, is there anything else I need to take care of?

Yes, ANGSD results can be analyzed by dadi.
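As a rough illustration of the hand-off, the sketch below turns a one-population realSFS output line into the plain-text spectrum layout that, to our understanding, `dadi.Spectrum.from_file` reads: optional `#` comment lines, a header giving the array length and `unfolded`, one line of data, and one mask line (1 = masked). It assumes the realSFS line holds 2N + 1 expected site counts for N diploid individuals, ordered by derived-allele count (an unfolded SFS, i.e. ANGSD was given an ancestral sequence). Both layouts are assumptions to check against the ANGSD and dadi documentation for your versions; the numbers are made up.

```python
from typing import List


def parse_realsfs_line(text: str) -> List[float]:
    """Parse a one-population SFS written by realSFS as whitespace-separated numbers.

    Assumed layout: 2N + 1 expected site counts for N diploid individuals,
    ordered by derived-allele count 0, 1, ..., 2N (an unfolded SFS).
    """
    values = [float(token) for token in text.split()]
    if len(values) < 3:
        raise ValueError("expected at least 3 SFS entries (one diploid individual)")
    return values


def to_dadi_fs_text(sfs: List[float], comment: str = "") -> str:
    """Render an unfolded 1D SFS in (our understanding of) dadi's text spectrum layout.

    The monomorphic classes (index 0 and index n) are masked, since
    they carry no polymorphism information for a demographic fit.
    """
    n = len(sfs) - 1
    lines = []
    if comment:
        lines.append("# " + comment)
    lines.append(f"{len(sfs)} unfolded")  # header: array length, then polarization
    lines.append(" ".join(str(value) for value in sfs))  # data line
    lines.append(" ".join("1" if i in (0, n) else "0" for i in range(len(sfs))))  # mask line
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical realSFS output for 2 diploid individuals (5 entries).
    realsfs_output = "9000.5 500.25 120.0 40.0 10.0"
    print(to_dadi_fs_text(parse_realsfs_line(realsfs_output), comment="from realSFS"), end="")
```

Note that ANGSD reports expected (non-integer) site counts rather than hard-called genotypes; the sketch keeps them as floats. A folded ANGSD spectrum would need a different header and mask, so this sketch should not be reused for that case unchanged.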

3. My samples (>200 individuals) from different locations appear to be a single population according to ADMIXTURE and fastSTRUCTURE, and Fst among locations is very small (the largest is 0.008). Should I split this one population into a few 'subpopulations' or 'clusters' based on PCA or ADMIXTURE? Can dadi precisely estimate the demographic history of such 'subpopulations'?

You could try fitting a model to your subpopulations. With such low divergence, you may find it challenging to fit. In particular, you may find dadi inferring a very long “divergence” time with high gene flow. Those two parameters are confounded, however, so both estimates will have very low precision.
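A back-of-the-envelope way to see the confounding, assuming Wright's island model at migration-drift equilibrium (an assumption not made in this thread, and not a dadi computation): at equilibrium Fst ≈ 1/(1 + 4Nm), which depends only on the number of migrants Nm and not on how long ago the populations split. A long split with high migration therefore produces the same small Fst over a wide range of split times. Under that approximation, the largest reported Fst of 0.008 corresponds to 4Nm ≈ 124, i.e. Nm ≈ 31 migrants per generation.

```python
def island_model_4Nm(fst: float) -> float:
    """Solve Wright's equilibrium island-model approximation Fst ~ 1 / (1 + 4Nm) for 4Nm.

    At equilibrium this Fst no longer depends on the split time, which is one
    way to see why split time and gene flow are hard to separate.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return 1.0 / fst - 1.0


if __name__ == "__main__":
    four_nm = island_model_4Nm(0.008)  # largest pairwise Fst reported in the thread
    print(f"4Nm ~ {four_nm:.0f}, Nm ~ {four_nm / 4:.0f} migrants per generation")
```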

Best,
Ryan


--
Ryan Gutenkunst
Assistant Professor of Molecular and Cellular Biology, University of Arizona
phone: (520) 626-0569, office: LSS 325, web: http://gutengroup.mcb.arizona.edu

Latest papers: "Triallelic population genomics for inferring correlated fitness effects of same site nonsynonymous mutations"
"Whole genome sequence analyses of Western Central African Pygmy hunter-gatherers reveal a complex demographic history and identify candidate genes under positive natural selection"

Yafei Mao

Jun 28, 2016, 7:35:42 AM
to dadi-user
Hi Gutenkunst,

Thanks for your reply. May I ask a few more questions here? ^^

In my case, with such low divergence among 'subpopulations', it will be challenging to fit a model, as you said. However, divergence time is not a big deal for me; effective population size is what matters for my analysis.

1. Do you think dadi is powerful enough to estimate the effective population size of a single 'subpopulation', so that I could tell whether a bottleneck occurred in the past? I think it should work.

2. Do you think filtering out rare alleles is a good idea for my dadi analysis?

Thanks a lot~
Best, 
Yafei 

Gutenkunst, Ryan N - (rgutenk)

Jun 28, 2016, 12:48:41 PM
to dadi...@googlegroups.com
Hello Yafei,


On Jun 28, 2016, at 4:35 AM, Yafei Mao <yaf...@gmail.com> wrote:
In my case, with such low divergence among 'subpopulations', it will be challenging to fit a model, as you said. However, divergence time is not a big deal for me; effective population size is what matters for my analysis.

1. Do you think dadi is powerful enough to estimate the effective population size of a single 'subpopulation', so that I could tell whether a bottleneck occurred in the past? I think it should work.

It’s worth a try. I assume you would see similar patterns for the different ‘subpopulations’ if they’ve had lots of migration between them.
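If a single 'subpopulation' is fit with a bottleneck model (for example a three-epoch size-change model such as the one in `dadi.Demographics1D`), the fitted parameters come out in relative units. Under dadi's conventions, θ = 4·Nref·μ·L, population sizes ν are relative to Nref, and times T are in units of 2·Nref generations. The minimal conversion sketch below assumes those conventions; the mutation rate μ, the sequence length L (all sites the SFS was built from, monomorphic ones included), the generation time, and every number in the example are hypothetical placeholders, not values from this thread.

```python
from typing import Dict, List, Sequence


def nref_from_theta(theta: float, mu: float, length: float) -> float:
    """Solve theta = 4 * Nref * mu * L for the reference effective size Nref."""
    return theta / (4.0 * mu * length)


def to_physical_units(theta: float, mu: float, length: float,
                      nus: Sequence[float], ts: Sequence[float],
                      generation_time: float = 1.0) -> Dict[str, object]:
    """Convert relative sizes (nu) and durations (T, in 2*Nref generations) to absolute values."""
    nref = nref_from_theta(theta, mu, length)
    sizes: List[float] = [nu * nref for nu in nus]
    # T * 2 * Nref gives generations; multiply by generation time for years.
    durations: List[float] = [t * 2.0 * nref * generation_time for t in ts]
    return {"Nref": nref, "sizes": sizes, "durations": durations}


if __name__ == "__main__":
    # Hypothetical fit of a bottleneck-then-recovery model:
    # nuB = 0.1 (bottleneck size), nuF = 1.5 (current size),
    # TB = 0.05 (bottleneck duration), TF = 0.01 (time since recovery).
    result = to_physical_units(theta=2000.0, mu=1e-8, length=1e7,
                               nus=[0.1, 1.5], ts=[0.05, 0.01])
    print(result)
```

Because Nref scales directly with the assumed μ and L, the absolute sizes inherit any uncertainty in those inputs; the relative sizes (for example how far ν drops during a bottleneck) do not depend on them.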

2. Do you think filtering out rare alleles is a good idea for my dadi analysis?

In general, no; you lose a lot of power. It’s better to try to model or correct for any biases in rare alleles.
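A minimal sketch of why removing rare alleles from the data alone is a problem, assuming dadi's Poisson composite likelihood over SFS entries (reimplemented here in plain Python for illustration; this is not dadi's code, and the tiny spectra are made up). dadi lets you mask entries of a Spectrum (to our understanding, e.g. `fs.mask[1] = True`) so they are excluded from both data and model. The example drops minor-allele singletons (derived counts 1 and n-1) from the data, as a MAF filter would: if the model still predicts those bins, the best-fitting scaling collapses; if the same bins are masked, the remaining entries stay consistent with the model.

```python
import math
from typing import Optional, Sequence


def poisson_ll(model: Sequence[float], data: Sequence[float],
               mask: Optional[Sequence[bool]] = None) -> float:
    """Composite Poisson log-likelihood summed over unmasked SFS entries."""
    total = 0.0
    for i, (m, d) in enumerate(zip(model, data)):
        if mask is not None and mask[i]:
            continue  # masked entries are ignored in both model and data
        total += -m + d * math.log(m) - math.lgamma(d + 1.0)
    return total


def optimal_scale(model: Sequence[float], data: Sequence[float],
                  mask: Optional[Sequence[bool]] = None) -> float:
    """Model scaling that maximizes the Poisson likelihood: sum(data) / sum(model) over unmasked bins."""
    keep = [mask is None or not mask[i] for i in range(len(model))]
    data_sum = sum(d for d, k in zip(data, keep) if k)
    model_sum = sum(m for m, k in zip(model, keep) if k)
    return data_sum / model_sum


if __name__ == "__main__":
    # Hypothetical unfolded spectra for n = 5 chromosomes (entries 0..5).
    model = [0.0, 100.0, 50.0, 100.0 / 3.0, 25.0, 0.0]  # neutral shape, theta = 100
    filtered = [0.0, 0.0, 50.0, 33.0, 0.0, 0.0]         # data after dropping minor-allele singletons
    corners = [True, False, False, False, False, True]   # mask only monomorphic entries
    minor1 = [True, True, False, False, True, True]      # also mask the filtered bins
    print("scale, bins not masked:", round(optimal_scale(model, filtered, corners), 3))
    print("scale, bins masked:    ", round(optimal_scale(model, filtered, minor1), 3))
    print("log-likelihood, masked:", round(poisson_ll(model, filtered, minor1), 3))
```

Masking still throws away the information in those bins, which is the loss of power mentioned above; it only avoids the additional bias of comparing filtered data to an unfiltered model.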

Best,
Ryan

Yafei Mao

Jun 29, 2016, 8:12:43 PM
to dadi-user
Hi Ryan,

Thanks a lot, I will try it and let you know how it goes ^^

Best, 
Yafei 