Just started using msprime and it seems real fast!
My problem is that I am relatively new to Python and writing code isn't coming all that easily.
Someone mentioned in the past that, to get a dataset that fits the bill, you should simulate more SNPs than you need and then randomly subsample.
Can anyone advise me on the most efficient way to sample a random subset of the SNPs? I have successfully called msprime.simulate() and have the tree sequence object at hand.
Ultimately I would like to write the genotypes out using write_vcf(), but only write out the subsample. Not sure if this is possible.
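To make the question concrete, here is a rough sketch of the kind of thing I am after. It is only a guess on my part: it assumes a recent msprime where the tree sequence is a tskit object with a delete_sites() method, and the helper names sites_to_drop and write_subsampled_vcf are ones I made up for illustration.

```python
import random


def sites_to_drop(num_sites, num_keep, seed=None):
    """Return the sorted IDs of the sites to delete so that
    num_keep randomly chosen sites remain (hypothetical helper)."""
    if num_keep > num_sites:
        raise ValueError("cannot keep more sites than were simulated")
    rng = random.Random(seed)
    # Sample without replacement the site IDs we want to keep.
    keep = set(rng.sample(range(num_sites), num_keep))
    return [site_id for site_id in range(num_sites) if site_id not in keep]


def write_subsampled_vcf(ts, num_keep, path, seed=None):
    """Write a VCF holding num_keep random sites from ts (hypothetical helper).

    Assumes ts is a tskit-style tree sequence that provides
    num_sites, delete_sites() and write_vcf().
    """
    drop = sites_to_drop(ts.num_sites, num_keep, seed)
    # delete_sites returns a new tree sequence; ts itself is left unchanged.
    ts_sub = ts.delete_sites(drop)
    with open(path, "w") as vcf_file:
        ts_sub.write_vcf(vcf_file)
    return ts_sub


# Example use (all parameter values below are placeholders):
#   import msprime
#   ts = msprime.simulate(sample_size=10, Ne=1e4, length=1e6,
#                         recombination_rate=1e-8, mutation_rate=1e-8,
#                         random_seed=1)
#   write_subsampled_vcf(ts, num_keep=100, path="subsample.vcf", seed=42)
```

The idea is to delete the unwanted sites rather than pick the wanted ones, so that write_vcf() only ever sees the subsample. I don't know whether this is the most efficient route, or whether it works on older msprime versions.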
Any help would be greatly appreciated.
Richard
University of Tasmania, Australia