SFS from RAD data – how to deal with chromosome and block structure?

daniel...@unibas.ch

unread,

Jun 11, 2014, 6:32:11 AM6/11/14

to fasts...@googlegroups.com

Dear Laurent

My empirical SFS data are based on ca. 15 million base positions in total, but these are located on > 166,000 small (89-base) RAD loci (restriction-site associated DNA) scattered randomly all across the species’ genome. This makes tracking the chromosome and block structure a nightmare. Hence I was wondering if you could suggest a simple way how to define the chromosome, block, and recombination settings. E.g., could all these data be treated as derived from a single, large chromosome? Or am I overlooking something and these setting are not important when working with observed SFS data anyway?

cheers,
daniel

jeffrey...@wright.edu

unread,

Jun 12, 2014, 1:01:19 PM6/12/14

to fasts...@googlegroups.com

I am having the same problem. My understanding (based on a previous post) is that you can only have 1 chromosome when using FREQ. Just for practice, I used my observed SFS (calculated from 33,000 nucleotides from 378 RAD loci; ~1,400 SNPs) to estimate demographic parameters under an IM model with 0 recombination. I actually obtained reasonable results from this; i.e., the results were similar to what I obtained analyzing sequences from 20 introns with Jody Hey's IM software. I'm now rerunning fastsimcoal with a low recombination rate defined (1e-8) to see how comparable the results are. I'll be happy to share what I find. In the meantime, any advice is very much welcomed!

Laurent Excoffier

unread,

Jun 13, 2014, 1:52:35 AM6/13/14

to fasts...@googlegroups.com

Hi Daniel,

what do you want to do with fastsimcoal2, simulating RAD tag data or estimating parameters with it?
In the latter case, you just need to define your data type as FREQ and specify you want simulate only one locus, and fastsimcoal2 will do the job.
If you want to simulate what you have, and if you can assume that you have 166000 independent RAD seqs, then simulate them as independent chromosomes, with one RAD seq of 89 bp each. Then within your 89 bp, you can define the recombination rate and mutation rate you want.

For FREQ data, recombination rate is irrelevant.

Hope it helps

laurent

Laurent Excoffier

unread,

Jun 13, 2014, 1:56:03 AM6/13/14

to fasts...@googlegroups.com

To be more specific, your par file could look like
//Number of independent loci [chromosome]
166000
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per generation recombination and mutation rates and optional parameters
DNA 89 recRate mutRate 0.33

with recRate and mutRate values of your choice

and with FREQ data it could look like

//Number of independent loci [chromosome]
1
//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per generation recombination and mutation rates and optional parameters
FREQ 1 0 mutRate OUTEXP

daniel...@unibas.ch

unread,

Jun 13, 2014, 3:08:28 AM6/13/14

to fasts...@googlegroups.com

Laurent and Jeff
My goal is parameter estimation based on observed empirical SFS, hence FREQ mode. Thanks for clarifying this!
daniel

jeffrey...@wright.edu

unread,

Jun 16, 2014, 1:09:12 PM6/16/14

to fasts...@googlegroups.com

Just a quick update:

I have now estimated demographic parameters from the joint SFS while defining "0 recombination" and "low recombination" (1e-8). The results from the two separate runs were comparable for all parameters except migration. When assuming a single chromosome with no recombination, fastsimcoal inferred strongly asymmetrical migration (<1 migrant per generation vs ~8 migrants per generations moving in each direction). However, when assuming low recombination, symmetrical migration was inferred (~2.5 migrants in each direction).

Are we losing information on the stochasticity of lineage sorting by simulating a single underlying gene tree?

jeffrey...@wright.edu

unread,

Jun 16, 2014, 1:11:26 PM6/16/14

to fasts...@googlegroups.com

On a related note, dadi goes to the opposite extreme and assumes that each SNP is independent. I'm trying to get it up and running to compare with fastsimcoal, but I'm finding the manual difficult to follow.

Laurent Excoffier

unread,

Jun 24, 2014, 3:41:25 AM6/24/14

to fasts...@googlegroups.com

Actually, with FREQ data, recombination is irrelevant, so the difference you obtain seem to be just random fluctuations in the final parameters.

You should make sure that you have reached a maximum likelihood, implying that you should repeat the estimation procedure quite a number of times (e.g. 100x) and select the solution with maximum likelihood in the end.

My impression is that 2.5 or 8 migrants are actually very close to each other and if you were computing confidence intervals by parametric bootstrap,these estimates would not differ significantly

best

laurent

Reply all

Reply to author

Forward