From the manual, I found that fastsimcoal2 can calculate the expected joint SFS from simulated data using the -d option. However, before I run the simulations, I first need to prepare an observed joint SFS file. Initially, I planned to use dadi to calculate the observed joint SFS from my SNP data, but I found it quite slow because I have three populations with large sample sizes.
So do you have any suggestions for calculating the observed joint SFS from my SNP data, so that I can use it with fastsimcoal2?
I am looking forward to your response!
Thank you!
Qingpo
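For reference, the counting step asked about above can be sketched directly with NumPy. This is only a minimal sketch under stated assumptions: the derived-allele count in each population has already been tallied for every SNP, there is no missing data, and all individuals were genotyped at every site. The function name `joint_sfs` and the input layout are hypothetical and are not part of dadi or fastsimcoal2, and the sketch does not write the observed SFS file that fastsimcoal2 expects.

```python
import numpy as np


def joint_sfs(derived_counts, sample_sizes):
    """Count SNPs per combination of derived-allele counts (hypothetical helper).

    derived_counts: array-like of shape (n_snps, n_pops); entry [i, j] is the
        number of derived alleles seen at SNP i in population j.
    sample_sizes: number of sampled chromosomes in each population.
    Returns an integer array of shape (n_1 + 1, n_2 + 1, ...).
    """
    counts = np.asarray(derived_counts, dtype=np.int64)
    sizes = np.asarray(sample_sizes, dtype=np.int64)
    shape = tuple(int(n) + 1 for n in sizes)

    if counts.ndim != 2 or counts.shape[1] != len(shape):
        raise ValueError("derived_counts must have one column per population")
    if (counts < 0).any() or (counts > sizes).any():
        raise ValueError("a derived count lies outside 0..sample size")

    # Map each SNP's tuple of counts to one flat index, then histogram them.
    flat = np.ravel_multi_index(tuple(counts.T), shape)
    sfs = np.bincount(flat, minlength=int(np.prod(shape)))
    return sfs.reshape(shape)


# Toy input (made up for illustration): 4 SNPs, 3 populations with
# 2, 3 and 2 sampled chromosomes respectively.
example_counts = [[1, 0, 2], [1, 0, 2], [0, 3, 1], [2, 2, 0]]
example_sfs = joint_sfs(example_counts, sample_sizes=(2, 3, 2))
print(example_sfs.shape)
```

Because the work is a single pass over the SNPs, the cost grows with the number of SNPs rather than with the number of cells in the three-population spectrum. Whether this is faster than dadi on a given data set is not something the sketch demonstrates.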
1) It says "Enables the analysis of DNA sequence data coded as SNP (i.e. 0,1,2,3 instead of C,A,T,G)". Does this mean that in the input file 0, 1, 2, and 3 represent C, A, T, and G, respectively? If so, how should I understand "SNP allele coded as zero (0) is assumed ancestral"? For example, at a SNP site with an A/C polymorphism, the site must be coded as 1 or 0 in each individual. If A is the ancestral state, so ...
2) If I have more than 1 million SNPs, which exceeds the maximum DNA input sequence length allowed by Arlequin, how can I process such a large file?
3) Some individuals have missing data at certain SNP sites. Can I put "?" at those positions?
Thank you!
Qingpo
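To make the coding in question 1 concrete, here is a small sketch of one possible reading, in which 0 marks the ancestral allele and 1 marks the derived allele at a biallelic site. That reading is an assumption taken from the quoted sentence "SNP allele coded as zero (0) is assumed ancestral", not a confirmed description of the file format. Writing "?" for missing calls is likewise only the convention proposed in question 3, and the function name `recode_site` is hypothetical.

```python
def recode_site(alleles, ancestral, missing_symbols=("?", "N", "-")):
    """Recode one biallelic SNP site (hypothetical helper).

    alleles: one base per sampled chromosome, e.g. ["A", "C", "?"].
    ancestral: the base assumed to be the ancestral state at this site.
    Returns "0" for ancestral, "1" for any other base (assumed derived),
    and "?" for calls listed in missing_symbols.
    """
    recoded = []
    for allele in alleles:
        if allele in missing_symbols:
            recoded.append("?")  # assumed missing-data symbol
        elif allele.upper() == ancestral.upper():
            recoded.append("0")  # ancestral state
        else:
            recoded.append("1")  # derived state (site assumed biallelic)
    return recoded


# The A/C site from question 1, with A taken as the ancestral state.
site = recode_site(["A", "C", "?", "A"], ancestral="A")
print("".join(site))
```

Under this reading the codes depend only on which allele is ancestral at each site, not on a fixed mapping from digits to C, A, T, and G.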