Exporting a subset of variants

Richard Kerr

Sep 1, 2020, 2:10:48 AM
to msprime-users
I've just started using msprime and it seems really fast!

My problem is that I am relatively new to Python, and writing code isn't coming all that easily.

Someone mentioned in the past that to get a dataset that fits the bill you should aim to get more SNPs than you need, and then randomly subsample.

Can anyone advise me on the most efficient way to sample a subset of the SNPs? I have successfully called msprime.simulate and have the tree sequence object at hand.

Ultimately I would like to write the genotypes out using write_vcf(), but only write out the subsample. Not sure if this is possible.

Any help would be greatly appreciated.

Richard

University of Tasmania, Australia

Jerome Kelleher

Sep 2, 2020, 7:12:00 AM
to msprim...@googlegroups.com
Hi Richard,

The best way is to subset the sites like you've been doing in your other
email, and then call write_vcf() on the subsetted tree sequence.

Cheers,
Jerome
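
For readers landing on this thread, here is a minimal sketch of the approach Jerome describes: pick a random subset of sites to keep, delete the rest from the tree sequence, and call write_vcf() on the result. The thread does not show the code from the other email, so this is one possible way to do it rather than the method referred to there. The helper names (sites_to_delete, write_subsampled_vcf) and their parameters are made up for illustration. The sketch assumes a tskit version in which the tree sequence has a delete_sites() method.

```python
import random


def sites_to_delete(num_sites, num_keep, seed=None):
    """Return the sorted IDs of the sites to drop so that `num_keep`
    randomly chosen sites remain (hypothetical helper)."""
    if not 0 <= num_keep <= num_sites:
        raise ValueError("num_keep must be between 0 and num_sites")
    rng = random.Random(seed)
    # Choose the sites to keep uniformly at random, without replacement.
    keep = set(rng.sample(range(num_sites), num_keep))
    return [site_id for site_id in range(num_sites) if site_id not in keep]


def write_subsampled_vcf(ts, num_keep, path, seed=None):
    """Write a VCF containing `num_keep` random sites of `ts`.

    `ts` is assumed to be a tree sequence such as the one returned by
    msprime.simulate(). Returns the subsetted tree sequence.
    """
    drop = sites_to_delete(ts.num_sites, num_keep, seed)
    # delete_sites() returns a new tree sequence; `ts` itself is unchanged.
    subset = ts.delete_sites(drop)
    with open(path, "w") as vcf_file:
        subset.write_vcf(vcf_file)
    return subset


# Example usage (assumes msprime is installed; parameter values are arbitrary):
#
#   import msprime
#   ts = msprime.simulate(sample_size=10, Ne=1e4, length=1e5,
#                         recombination_rate=1e-8, mutation_rate=1e-8,
#                         random_seed=1)
#   write_subsampled_vcf(ts, num_keep=50, path="subset.vcf", seed=42)
```

If the simulation yields fewer sites than num_keep, the helper raises a ValueError, which is a reminder to simulate more SNPs than needed before subsampling. Note also that msprime.simulate() places mutations at continuous positions, while VCF positions are integers, so it is worth checking the position_transform option in the write_vcf() documentation if sites are close together.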