How to get the observed SFS

liuq...@gmail.com

Aug 2, 2015, 2:37:12 AM
to fastsimcoal
Dear Laurent,
I would like to use fastsimcoal2 to infer the demographic history of three plant populations with 200, 180, and 210 individuals, respectively.

From the manual, I found that fastsimcoal2 can calculate the expected joint SFS from simulated data using the -d option. However, before I run the simulations, I need to prepare an observed joint SFS file. I initially planned to use dadi to calculate the observed joint SFS from my SNP data, but it was quite slow because of the three populations and their large sample sizes.

Do you have any suggestions for calculating the observed joint SFS from my SNP data, so that I can then use it with fastsimcoal2?

I am looking forward to your response!
Thank you!
Qingpo

Laurent Excoffier

Aug 2, 2015, 3:32:43 AM
to fastsimcoal
Yes, you can try to use Arlequin to do this:
http://cmpg.unibe.ch/software/arlequin35/
The SNP allele coded as zero (0) is assumed to be ancestral.

best

laurent
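
To illustrate the coding convention mentioned above (0 = ancestral), here is a minimal Python sketch. It is not Arlequin's input format or code; the haplotypes, the `ancestral_alleles` string, and the `recode_site` helper are all hypothetical, and it assumes the ancestral allele at each SNP is already known (for example from an outgroup).

```python
def recode_site(alleles, ancestral):
    """Recode one SNP column: ancestral allele -> 0, any other allele -> 1."""
    return [0 if a == ancestral else 1 for a in alleles]

# Hypothetical toy data: each string is one haplotype, each position one SNP.
haplotypes = ["ACG", "CCG", "ATG", "CTT"]
# Assumed to be known beforehand; one ancestral nucleotide per SNP.
ancestral_alleles = "ACG"

# Work column by column (one column per SNP), then rebuild the haplotypes.
columns = zip(*haplotypes)
recoded_columns = [
    recode_site(col, anc) for col, anc in zip(columns, ancestral_alleles)
]
recoded = ["".join(str(x) for x in row) for row in zip(*recoded_columns)]

for original, coded in zip(haplotypes, recoded):
    print(original, "->", coded)
```

With this coding, the number of 1s in a column is the derived allele count at that SNP, which is what the unfolded SFS tallies.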

liuq...@gmail.com

Aug 2, 2015, 4:07:44 PM
to fastsimcoal
Hi Laurent,
Thanks for your prompt reply.
I read the Arlequin manual, and have three more questions.

1) The manual says "Enables the analysis of DNA sequence data coded as SNP (i.e. 0,1,2,3 instead of C,A,T,G)". Does this mean that in the input file 0, 1, 2, and 3 represent C, A, T, and G, respectively? If so, how should I understand "SNP allele coded as zero (0) is assumed ancestral"? For example, at a SNP site with an A/C variation, we must code the corresponding site in each individual as 1 or 0. If A is the ancestral state, so ...

2) If I have more than 1 million SNPs, which exceeds the maximum DNA sequence length that Arlequin accepts as input, how can I process such a large file?

3) Some individuals have missing data at specific SNP sites. Can I put "?" at the sites with missing data?

Thank you!
Qingpo

Laurent Excoffier

Aug 2, 2015, 4:13:33 PM
to fastsimcoal
1) If you want to compute the unfolded SFS, you need to know which state is ancestral and which is derived for each SNP, and in this case I assume that zero is the ancestral state, whatever nucleotide it is.
Arlequin will also compute the folded SFS, and in this case it does not matter what is coded as 0, 1, 2, or 3.

2) In that case you need to split your sequence into smaller chunks, process each file separately with arlequin/arlecore, and then get the overall SFS as the sum of the separate SFSs.

3) I'd advise removing sites with missing data.

L
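
The three points above (0 as the ancestral state, summing per-chunk SFSs, dropping sites with missing data) can be sketched in plain Python as follows. This is only an illustration of the logic, not what Arlequin or arlecore do internally: the function names, the 0/1 string input, and the use of "?" for missing data are assumptions, and the handling of ties in `fold` is an arbitrary choice made for this sketch.

```python
from collections import Counter

MISSING = "?"  # assumed missing-data symbol for this sketch

def joint_sfs(haplotypes_by_pop):
    """Unfolded joint SFS of one chunk.

    haplotypes_by_pop holds one list of equal-length strings per population,
    with 0 = ancestral, 1 = derived, '?' = missing.
    Returns a Counter: (derived count in pop 1, pop 2, ...) -> number of sites.
    """
    n_sites = len(haplotypes_by_pop[0][0])
    sfs = Counter()
    for s in range(n_sites):
        columns = [[h[s] for h in pop] for pop in haplotypes_by_pop]
        # Point 3: skip any site with missing data in any haplotype.
        if any(MISSING in col for col in columns):
            continue
        sfs[tuple(col.count("1") for col in columns)] += 1
    return sfs

def fold(sfs, sample_sizes):
    """Fold by minor allele: relabel sites where allele '1' is the majority.

    Sites where both alleles are equally frequent are left unchanged here,
    which is an arbitrary choice for this sketch.
    """
    total = sum(sample_sizes)
    folded = Counter()
    for counts, n in sfs.items():
        if 2 * sum(counts) > total:
            counts = tuple(size - c for size, c in zip(sample_sizes, counts))
        folded[counts] += n
    return folded

# Hypothetical toy data: two chunks, two populations, two haplotypes each.
chunk1 = [["010", "01?"], ["110", "100"]]
chunk2 = [["01", "00"], ["11", "11"]]

# Point 2: the overall SFS is the entry-wise sum of the per-chunk SFSs.
total_sfs = sum((joint_sfs(c) for c in (chunk1, chunk2)), Counter())
folded_sfs = fold(total_sfs, sample_sizes=(2, 2))

print(dict(total_sfs))
print(dict(folded_sfs))
```

Because each site contributes to exactly one SFS entry, the chunks can be processed in any order, or in parallel, before being summed.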
