Hello Ryan,
many thanks for all your help throughout.
I have now started to simulate data with dadi using the IM demographic model which fitted much better the private polymorphisms.
The function to simulate data for each chromosome is:
mscore = Demographics2D.IM_mscore(params)
mscommand = dadi.Misc.ms_command(scaledtheta, datasize, mscore, int(args.rep), recomb=rec, rsites=chrL)
msfs=dadi.Spectrum.from_ms_file(os.popen(mscommand))
## fold simulated data because original data is also folded
msfold = msfs.fold()
In the end I would like to have about 100 replicates for each chromosome and then add them up to make the final dataset similar to the one used to estimate ML parameters. My problem now is whether I could do this by save the frequency spectrum msfold and then add all spectrums together, for example:
finalfs = msfold1 + msfold2 + ... + msfold16 (16 chromosomes)
or is it better to save the raw output from ms and then in some way concatenate all simulated chromosomes?
After making some tests, I noticed that the summed frequency spectrum of some data matches very closely to the freq spectrum of the concatenated data, although there were some differences but only in the far 8th decimal number, so I guess this approach should be ok.
Many thanks,
Pedro