scheme for non-parametric bootstrapped datasets

Julie Jacquemin

Nov 11, 2015, 4:02:00 AM

to dadi-user

Dear Ryan and dadi users,

I started using dadi for my demographic analysis and everything worked great, but now I want to estimate the uncertainties of my parameters and I'm stuck, because, to be honest, I don't understand at all the principle of a non-parametric bootstrap of my dataset from independent units of my data. Googling "non-parametric bootstrap" didn't help, so I hope some of you can answer the following questions:

1) I know the basic principle of a bootstrap: in my SNP file, resample with replacement over the SNPs, each of which represents one unit. But I don't understand what you mean when you say 'instead of resampling from the SNPs, resample from regions you've sequenced'. Concretely, how does this work? My SNPs are distributed over 9 different scaffolds, so I guess those would be my independent units. But that doesn't mean I sample only 1 SNP per scaffold, right? Otherwise I would have only 9 SNPs in each of my bootstrapped datasets... If I sample several SNPs from each scaffold, then in my opinion the result is the same as sampling SNPs from the whole file without taking the scaffold information into account: I would again have SNPs that are potentially not independent because they are located on the same scaffold...

2) Must the bootstrapped datasets have the same length (same number of SNPs) as the original dataset, or are they supposed to be a subset of the initial SNPs? Should the different bootstrapped datasets have the same total length as each other? Should they have the same number of SNPs from each of my scaffolds?

Thanks

Julie

Gutenkunst, Ryan N - (rgutenk)

Nov 11, 2015, 5:06:04 PM

to dadi...@googlegroups.com

Hi Julie,

In the bootstrap, you divide your sequence data into regions that can (ideally) be considered independent. Then you resample from those regions and generate new frequency spectra, using all the SNPs from each sampled region. You want to sample the non-independent SNPs on each scaffold; that's how you capture the effect of that non-independence.

The bootstraps do not need to have the same number of SNPs or total length.
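
As a minimal Python sketch of the region-based (block) bootstrap described above, assuming the SNPs are available as hypothetical (snp_id, region_id) pairs: regions, not individual SNPs, are drawn with replacement, and every SNP of a drawn region is kept. The function name block_bootstrap is illustrative only and is not part of dadi's API.

```python
import random
from collections import defaultdict


def block_bootstrap(snps, seed=None):
    """Return one bootstrap replicate built by resampling whole regions.

    snps: list of (snp_id, region_id) tuples (hypothetical input format).
    """
    rng = random.Random(seed)
    # Group SNPs by region, keeping their original order within each region.
    by_region = defaultdict(list)
    for snp_id, region in snps:
        by_region[region].append((snp_id, region))
    regions = list(by_region)
    # Draw as many regions as the data contains, with replacement.
    drawn = rng.choices(regions, k=len(regions))
    replicate = []
    for region in drawn:
        # All SNPs of a drawn region come along together, so the
        # non-independence among them is carried into the replicate.
        replicate.extend(by_region[region])
    return replicate
```

Because regions carry different numbers of SNPs, replicates generally differ in size from each other and from the original data. Each replicate would then be turned into its own frequency spectrum.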

Note that using our newly developed Godambe methods will be more efficient than re-optimizing parameters for all your bootstraps: http://dx.doi.org/10.1093/molbev/msv255 . If you take this approach, please cite the paper!

Best,

Ryan

--

Ryan Gutenkunst

Assistant Professor
Molecular and Cellular Biology
University of Arizona
phone: (520) 626-0569, office LSS 325

http://gutengroup.mcb.arizona.edu

Julie Jacquemin

Nov 12, 2015, 4:41:25 AM

to dadi-user

Dear Ryan,

Thanks a lot for your quick answer.

I will then divide my scaffolds into smaller units that can be considered independent. So, to be sure I understood correctly, if I have a SNP file like this:

SNP1 region 1
SNP2 region 1
SNP3 region 1
SNP4 region 1
SNP5 region 2
SNP6 region 2
SNP7 region 3
SNP8 region 3
SNP9 region 3
SNP10 region 3
SNP11 region 3
SNP12 region 4

a first bootstrap replicate could look like this:

SNP5 region 2
SNP6 region 2
SNP12 region 4
SNP5 region 2
SNP6 region 2

And another one could look like this:

SNP1 region 1
SNP2 region 1
SNP3 region 1
SNP4 region 1
SNP12 region 4
SNP7 region 3
SNP8 region 3
SNP9 region 3
SNP10 region 3
SNP11 region 3

?

Thanks again for your help

Julie

Gutenkunst, Ryan N - (rgutenk)

Nov 12, 2015, 9:28:56 AM

to dadi...@googlegroups.com

Yes.
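
Julie's two example replicates can be reproduced with a small illustrative helper (hypothetical, not taken from dadi or from this thread) that concatenates all SNPs of each drawn region in draw order. Her first replicate corresponds to drawing regions 2, 4, 2, and her second to drawing regions 1, 4, 3.

```python
# Region assignment from Julie's example SNP file.
REGIONS = {
    1: ["SNP1", "SNP2", "SNP3", "SNP4"],
    2: ["SNP5", "SNP6"],
    3: ["SNP7", "SNP8", "SNP9", "SNP10", "SNP11"],
    4: ["SNP12"],
}


def build_replicate(drawn_regions, regions=REGIONS):
    """Concatenate every SNP of each drawn region; repeated draws repeat SNPs."""
    return [snp for r in drawn_regions for snp in regions[r]]


print(build_replicate([2, 4, 2]))
# ['SNP5', 'SNP6', 'SNP12', 'SNP5', 'SNP6']
print(build_replicate([1, 4, 3]))
# ['SNP1', 'SNP2', 'SNP3', 'SNP4', 'SNP12', 'SNP7', 'SNP8', 'SNP9', 'SNP10', 'SNP11']
```

Both examples use three draws. In a conventional non-parametric bootstrap one usually draws as many units as the original data contains (four regions here), so a full-size replicate would include one more draw.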

Mikhail Matz

Nov 13, 2015, 9:58:47 PM

to dadi-user

Hello Julie, I have a little script (attached) that bootstraps dadi data across genomic scaffolds (i.e., it resamples scaffolds with replacement). This might work for you if your genome consists of hundreds to thousands of scaffolds.

Mikhail

dadiBoot.pl
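
The attached Perl script is not reproduced in the thread. As a rough Python analogue of resampling scaffolds with replacement (not a description of how dadiBoot.pl works internally), one can compute a frequency spectrum per scaffold and sum the spectra of the drawn scaffolds: an SFS is a histogram of SNP counts, so it is additive over disjoint sets of SNPs. This sketch assumes a hypothetical NumPy array of per-scaffold 1D spectra with shape (n_scaffolds, n_bins); the name bootstrap_sfs is illustrative only.

```python
import numpy as np


def bootstrap_sfs(per_scaffold_sfs, n_boot, seed=None):
    """Return n_boot bootstrap spectra, shape (n_boot, n_bins).

    per_scaffold_sfs: array of shape (n_scaffolds, n_bins); row i is the
    1D SFS counted from scaffold i alone (hypothetical input).
    """
    rng = np.random.default_rng(seed)
    sfs = np.asarray(per_scaffold_sfs, dtype=float)
    n_scaf = sfs.shape[0]
    # Each row of idx is one replicate: n_scaf scaffold indices drawn
    # with replacement.
    idx = rng.integers(0, n_scaf, size=(n_boot, n_scaf))
    # Summing the spectra of the drawn scaffolds gives the replicate SFS.
    return sfs[idx].sum(axis=1)
```

With only a handful of scaffolds (Julie has 9) the replicates can take only a limited number of distinct combinations, which fits Mikhail's remark that the approach suits genomes split into hundreds to thousands of scaffolds.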

Julie Jacquemin

Nov 17, 2015, 4:06:04 AM

to dadi-user

Hello Mikhail,

Thanks a lot, the script worked perfectly! Really useful.

Julie
