Simulation of SNPs in fsc: why not use a sequence length of 1?


Another Gecko

Sep 4, 2022, 7:51:37 PM
to fastsimcoal2
Hi,
I have a question about simulating SNPs. I understand from the manual that one should "use short segments of DNA and the -sX option to generate X SNPs instead".


I therefore followed examples I found on the web and set up something like 10000 chromosomes, each with a sequence length of 10, to simulate unlinked SNPs.
All the examples I found use sequence lengths between 10 and 100 to simulate SNPs.

My question is why I cannot simply simulate a sequence length of 1 for SNPs, since then I would not have to deal with multiple SNPs along a sequence. I have the "feeling" that I do not understand exactly how SNPs are simulated, and that this might lead me to do something wrong or stupid.

Thanks in advance,
Bernd

Austin Koontz

Nov 4, 2022, 5:40:16 PM
to fastsimcoal2
Hi Bernd, 

I'm just a fellow fastSimcoal user, but: when you simulate SNPs using the DNA marker type, you specify the length of the sequence with the "num loci" field of the per-block lines at the end of the parameter file. As noted in the manual, for DNA markers this value is the sequence length in base pairs. For instance, take the lines below:

```
//Number of independent loci [chromosomes]
10 0
//Per chromosome: Number of linkage blocks
6
//Per block: data type, num loci, rec. rate and mut rate + optional parameters
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
```
This would simulate 10 chromosomes, with 6 "blocks" per chromosome. Each block would be 30 bp long. I'd recommend reading the "Input File Syntax" section of the manual for more explanation on this.
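To make the arithmetic explicit, here is a minimal Python sketch that reads the genetic-structure lines above and adds up the sequence lengths. It is not part of fastsimcoal2: `summarize_structure` is a hypothetical helper, and it assumes the simple layout shown here (comment lines starting with `//`, all blocks of type DNA, and the second field of each block line being the length in bp).

```python
# Hypothetical helper, not a fastsimcoal2 tool: it only does arithmetic
# on the genetic-structure section quoted above.
GENETIC_STRUCTURE = """\
//Number of independent loci [chromosomes]
10 0
//Per chromosome: Number of linkage blocks
6
//Per block: data type, num loci, rec. rate and mut rate + optional parameters
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
DNA 30 0 0.000005 0.333333333333333
"""


def summarize_structure(text):
    """Return chromosome count, block count, and sequence lengths in bp."""
    # Keep only data lines: drop blank lines and '//' comment lines.
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and not ln.strip().startswith("//")]

    num_chromosomes = int(lines[0].split()[0])  # first field of "10 0"
    num_blocks = int(lines[1])                  # "6"
    block_lines = lines[2:2 + num_blocks]

    # For DNA blocks, the "num loci" field (second column) is the length in bp.
    block_lengths = [int(ln.split()[1]) for ln in block_lines
                     if ln.split()[0] == "DNA"]

    bp_per_chromosome = sum(block_lengths)
    return {
        "num_chromosomes": num_chromosomes,
        "num_blocks": num_blocks,
        "bp_per_chromosome": bp_per_chromosome,
        "total_bp": num_chromosomes * bp_per_chromosome,
    }


summary = summarize_structure(GENETIC_STRUCTURE)
print(summary)
```

For the example above this gives 6 blocks of 30 bp, so 180 bp per chromosome and 1800 bp over the 10 chromosomes. Setting "num loci" to 1 in a block line would make that block a single base pair.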

Specifying a sequence length of 1 would simulate a single base pair as the genetic information for each individual, which isn't typically how SNP data is presented. Usually, SNP data consists of a stretch of base pairs (50 bp, 100 bp, even 500 bp) with one or more variable sites (SNPs) within that section of DNA.

Hope this helps,
-Austin
