putative mutation rate from sequence divergence

bio...@gmail.com

Oct 28, 2016, 4:44:14 AM
to dadi-user


Hi all,

I am new to demographic inference. After reading your paper (Inferring the Joint Demographic History of Multiple Populations from Multidimensional SNP Frequency Data), I noticed that you derived a putative mutation rate from your data, but I cannot reproduce the calculation myself from the numbers given in this passage: "The human-chimp divergence in the data is 1.13%. We assumed a divergence time of 6 My [45] and a generation time of 25 years. This yielded an estimated neutral mutation rate of u = 2.35*10^-8 per site per generation, which is comparable to direct estimates [46]." In addition, how did you get the divergence between human and chimp? Can I just use the following formula to get the average divergence from SNP data?

I would appreciate a description of how you obtained the mutation rate.


Many thanks.

Best

Zheng Zhuqing

Gutenkunst, Ryan N - (rgutenk)

Oct 28, 2016, 1:43:30 PM
to dadi...@googlegroups.com
Hello Zheng,

We got the divergence between human and chimp just by counting differences between the aligned reference genomes. Because we have an estimate of the divergence time in years and the generation time, we could convert that to a per-generation mutation rate.
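
As a rough illustration of the counting step described above, here is a minimal Python sketch. It assumes two already-aligned sequences of equal length given as plain strings. The function name `pairwise_divergence`, the toy sequences, and the choice to skip gaps and ambiguous bases are illustrative assumptions, not the procedure used in the paper.

```python
def pairwise_divergence(seq_a, seq_b):
    """Fraction of comparable aligned sites at which two sequences differ.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    valid = set("ACGT")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differences / compared


# Toy alignment (hypothetical data, not real human/chimp sequence).
human = "ACGTACGT-ACGNACGT"
chimp = "ACGTACGA-ACGTACGT"
divergence = pairwise_divergence(human, chimp)
print(divergence)
```

On whole genomes the same count would be accumulated over all aligned blocks rather than a single pair of strings.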

If you don’t have such data for your species, you’ll need to estimate the mutation rate some other way, perhaps by assuming it is similar to that of a closely related species.

Best,
Ryan

--
Ryan Gutenkunst
Assistant Professor of Molecular and Cellular Biology, University of Arizona
phone: (520) 626-0569, office: LSS 325, web: http://gutengroup.mcb.arizona.edu


bio...@gmail.com

Oct 29, 2016, 10:14:06 AM
to dadi-user
Hi Ryan,

Thanks for your response. I now see how to do the calculation with your data, 0.0113*25/(2*6*1000000), but I do not understand why you divide by 2. Can you explain it?

In addition, my SNP data set is huge, with more than 40M loci. Can I use just a subset of these loci for demographic inference, for example by pruning the SNPs to remove linkage disequilibrium, or by using only intergenic sites to construct a folded SFS? I would appreciate any suggestions on selecting sites from whole-genome resequencing data.

Thanks.
Best

Zheng Zhuqing



Gutenkunst, Ryan N - (rgutenk)

Oct 29, 2016, 11:27:42 AM
to dadi...@googlegroups.com
Hello Zheng,

On Oct 29, 2016, at 7:14 AM, bio...@gmail.com wrote:
> Thanks for your response. I now see how to do the calculation with your data, 0.0113*25/(2*6*1000000), but I do not understand why you divide by 2. Can you explain it?

The factor of two is from the definition of the population genetic parameter theta.
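
For concreteness, the arithmetic from this thread can be written out as a short Python sketch. The numbers are the ones quoted from the paper earlier in the thread; the function and argument names are illustrative. One standard reading of the factor of two, under a simple molecular-clock assumption, is that differences accumulate along both lineages since the split, so divergence is roughly 2 × (rate per generation) × (generations since the split). That is offered here as an interpretation, not as the paper's stated derivation.

```python
def mutation_rate_per_generation(divergence, divergence_time_years,
                                 generation_time_years):
    """Per-site, per-generation rate implied by a pairwise divergence.

    Uses the formula from the thread: divergence * g / (2 * T),
    where g is the generation time and T the divergence time, both in years.
    """
    generations = divergence_time_years / generation_time_years
    return divergence / (2 * generations)


# Values quoted from the paper: 1.13% divergence, 6 My, 25-year generations.
mu = mutation_rate_per_generation(0.0113, 6e6, 25)
print(f"{mu:.3g}")  # prints 2.35e-08
```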

> In addition, my SNP data set is huge, with more than 40M loci. Can I use just a subset of these loci for demographic inference, for example by pruning the SNPs to remove linkage disequilibrium, or by using only intergenic sites to construct a folded SFS? I would appreciate any suggestions on selecting sites from whole-genome resequencing data.

Yes, you can use a subset of your data. If you’re interested in demographic history, you want ones without much selection on them. Intergenic or synonymous sites are the most common to use. There’s no real need to prune to remove linkage disequilibrium.
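
To illustrate the site-selection advice, here is a minimal pure-Python sketch that keeps only sites labelled as intergenic or synonymous and builds a folded SFS from them. The input layout (pairs of alternate-allele count and annotation string), the function name `folded_sfs`, and the toy data are all hypothetical. A real analysis would more likely go through dadi's own data-handling tools, which are not shown here.

```python
def folded_sfs(sites, n_chromosomes, keep=("intergenic", "synonymous")):
    """Build a folded site frequency spectrum from annotated sites.

    sites: iterable of (alt_allele_count, annotation) pairs (hypothetical layout).
    Returns a list where entry i is the number of kept sites whose
    minor allele count is i, for i = 0 .. n_chromosomes // 2.
    """
    sfs = [0] * (n_chromosomes // 2 + 1)
    for alt_count, annotation in sites:
        if annotation not in keep:
            continue  # drop sites more likely to be under selection
        if not 0 <= alt_count <= n_chromosomes:
            raise ValueError(f"allele count {alt_count} out of range")
        # Folding: without knowing the ancestral state, count the rarer allele.
        minor = min(alt_count, n_chromosomes - alt_count)
        sfs[minor] += 1
    return sfs


# Toy data: 10 sampled chromosomes (5 diploid individuals).
toy_sites = [
    (1, "intergenic"),
    (9, "intergenic"),      # folds to minor allele count 1
    (3, "synonymous"),
    (5, "intergenic"),
    (2, "nonsynonymous"),   # filtered out
    (10, "intergenic"),     # fixed for the alternate allele, lands in entry 0
]
sfs = folded_sfs(toy_sites, n_chromosomes=10)
print(sfs)
```

Entry 0 collects sites that are not polymorphic in the sample; such entries are normally masked or ignored when fitting.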

Best,
Ryan