SNP simulation with mutation rate

1,052 views
Skip to first unread message

Rémi Tournebize

unread,
Jan 7, 2015, 6:45:16 AM1/7/15
to fasts...@googlegroups.com
Dear Laurent and list,

Again many thanks for this wonderful and fast tool to simulate genetic data. I have read the "A special note on the simulation of SNP data" paragraph of the fastsimcoal manual but the simulation of SNP datasets with specified mutation rate appears still a little obscure to me. I would like to simulate a dataset of 10000 SNP with a mutation rate of, say, 2e-8. How can I pass the mutation rate information to fastsimcoal? Is the only way to do so is to specify a DNA dataset in the .tpl file and then output SNP using the -S and -I parameters in the command line?

Many thanks for your lights,

Best regards and happy new year,

Rémi

Laurent Excoffier

unread,
Jan 21, 2015, 6:07:59 AM1/21/15
to fasts...@googlegroups.com
As I replied by email, you cannot use a mutation rate with the SNP data type.
I am also not recommending simulating SNPs with the SNP data type, but rather to use DNA and  the -s option to specify how many SNP you want to keep.
Using the SNP data type indeed leads to biases in the SFS, e.g. after a bottleneck.

So, if you simulate very short DNA fragments, e.g. 10 bp, you would be sure that each fragment would have either 0 or 1 snp.

Then for instance by specifying the option -s10000, you would keep the first 10,000 simulated SNPs, which which should be independent.

To see how many fragments would need to be simluated to get 10,000 SNPs of course depends of your mutation rate and on your demography, and some trials with different mutation and demographic parameters should be done first


jason....@gmail.com

unread,
Jun 17, 2015, 5:58:45 PM6/17/15
to fasts...@googlegroups.com
Hi Laurent,

I would also like to simulate, say 10,000 SNPs for each of 10 simulations. I tried the above instructions as follows:

./fsc25211 -t 1popDNArand.tpl -n 1 -e 1popDNArand.est -E 10 -s 10000

//per Block: data type, num loci, rec. rate and mut rate + optional parameters
DNA 10 0.00000 MUTRATE 0.33

But it only outputs 1 or a few SNPs per file (because each sequence is only 10bp long), not 10000.

Is there anyway to simulate 10,000 SNPs without invoking -I, and have them all output to a single file? If not, I am curious what you had in mind for the above response (perhaps to combined SNPs across thousands of output files?).

Thanks,

Jason

Laurent Excoffier

unread,
Jun 21, 2015, 2:59:53 AM6/21/15
to fasts...@googlegroups.com
Hi Jason,

-s 10000 defines a maximum no. of SNPs to output. If you do not reach this number with the current mutation rate, then it outputs what the total (lower) number of SNPs.

However, you can generate SNPs at more than one locus.

For example in the example below, I tel fsc to generate 100000 independent loci of 10 bp

//Number of independent loci [chromosome]
100000 0
//Per chromosome: Number of linkage blocks
1

//per Block: data type, num loci, rec. rate and mut rate + optional parameters
DNA 10 0.00000 MUTRATE 0.33

and with the -s 100000 option, it wil keep a maximum of 10000 SNPs generated across the 100000 segments.

As I told to Rémi, you can adjust the mutation rate to generate at most 1 SNP per 10 bp , such that you wil have your 10,000 independent SNPs generated this way.

Hope it helps

laurent

jason....@gmail.com

unread,
Jul 5, 2015, 7:19:49 PM7/5/15
to fasts...@googlegroups.com
Hi Laurent,

Thanks for your response. I don't know the mutation rate in my case. From the context of you response it suggests that it doesn't matter what the mutation rate is so long as it generates the desired number of SNPs. Is that correct?

Thanks,
Jason

Laurent Excoffier

unread,
Jul 10, 2015, 10:25:12 AM7/10/15
to fasts...@googlegroups.com
If there is no recombination, or if you are simulating independent loci, and at most one snp per locus, yes you are right, but if you are simulating a large genomic region where  there is recombination, then the mutation rate does matter.

In the example delineated above with many independent short DNA segments, the mutation rate would not matter as long as you have enough SNPs.
But if you specify a very low mutation rate, you may not get enough SNPs. You can maybe do some test runs to adjust this mutation rate, such as to have at most one mutation per simulated DNA segment

best
L

kritika...@gmail.com

unread,
Nov 27, 2015, 1:42:50 AM11/27/15
to fastsimcoal
Dear Laurent,
I am trying to simulate ~2000 SNPs using fastsimcoal within ABCtoolbox. As you suggested I have been playing around with the mutation rate to determine which is a best way to obtain SNPs. Below is the command I am using with -s2000 setting.

//Number of independent loci [chromosome]
100000

//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block:data type, number of loci, per generation recombination and mutation rates and optional parameters
DNA 5 0 0.000001 0.33

I usually get enough SNPs using the above command. Is it possible to output only a single SNP per 5bp DNA fragment. When Ne is high, I get more than 1 SNP per fragment. Should I reduce the DNA fragment length to 3 to avoid linked SNPs.

Thank you

Regards
Kritika

Laurent Excoffier

unread,
Dec 1, 2015, 5:48:32 AM12/1/15
to fastsimcoal
Hi,

it really depends what you want to do with these SNPs, and if you really have such a high mutation rate it is unlikely that your observed SNPs are also independent.
But you may try reducing the size ot 3

L
Reply all
Reply to author
Forward
Message has been deleted
0 new messages