converting vcf file to ms-style output

sb

May 24, 2021, 3:59:06 PM
to slim-discuss
Hi Ben,

I have simulated data in SLiM with tree-sequence recording and produced a VCF file at the end. However, I want to produce ms-style output from this VCF file. How do you propose I do that? I was able to produce ms-style output directly from SLiM, but that was without tree-sequence recording, and each simulation took a long time to finish.

cheers,
Noor

Ben Haller

May 24, 2021, 4:42:38 PM
to slim-discuss
Hi Noor.  Well, I'm not sure *exactly* what you're trying to do, but you could certainly output *both* a .trees file and an MS file at the end of the SLiM simulation.  But perhaps you're overlaying neutral mutations in Python, and you want those to be included in the MS output?  In that case,

- I think there has been talk of adding MS output to msprime/tskit; I'm not sure what the state of that is, perhaps Peter knows;

- msprime can certainly output VCF, and then I imagine there are open-source tools to convert VCF to MS, but I don't know about that offhand;

- you could write out a VCF from msprime and write your own VCF-to-MS converter if you can't find one; probably not super-hard as long as you're just trying to process the VCF files generated by your workflow (rather than all VCF files in general, which would be a complex undertaking since VCF is a rather complex file format);

- you could write Python code that writes out an MS file based upon the tree sequence itself (no VCF intermediate); I have no idea whether that would be easy or hard, I'm not particularly good at the Python side of things.  If you did that, you might submit it to the tskit folks for inclusion.
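For the "write your own converter" route, here is a minimal sketch in plain Python. It assumes the VCF contains only biallelic sites with phased (or haploid) genotypes and no missing data, which is roughly what a simulation workflow would produce. The names `vcf_to_ms` and `sequence_length` are hypothetical (they are not part of SLiM, msprime, or tskit), and the second header line is only a placeholder where ms would print its random seeds.

```python
import io


def vcf_to_ms(vcf_lines, sequence_length, out):
    """Write one ms-style replicate from simple VCF lines (hypothetical helper).

    Assumes biallelic sites, phased or haploid genotypes, and no missing data.
    """
    positions = []
    columns = []  # one list of "0"/"1" strings per site, one entry per haplotype
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip meta-information lines and the column header
        fields = line.split("\t")
        pos, fmt = int(fields[1]), fields[8]
        gt_index = fmt.split(":").index("GT")
        alleles = []
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            if "/" in gt:
                raise ValueError(f"unphased genotype {gt!r} at position {pos}")
            alleles.extend(gt.split("|"))
        if any(a not in ("0", "1") for a in alleles):
            raise ValueError(f"non-biallelic or missing genotype at position {pos}")
        # ms reports positions as fractions of the sequence length
        positions.append(pos / sequence_length)
        columns.append(alleles)

    n_haplotypes = len(columns[0]) if columns else 0
    out.write(f"ms {n_haplotypes} 1\n")
    out.write("0 0 0\n")  # placeholder for the seed line that ms prints
    out.write("\n//\n")
    out.write(f"segsites: {len(positions)}\n")
    if positions:
        out.write("positions: " + " ".join(f"{p:.6f}" for p in positions) + "\n")
        # each output row is one haplotype; each column is one site
        for h in range(n_haplotypes):
            out.write("".join(col[h] for col in columns) + "\n")


# Tiny demonstration with two diploid samples and two sites
demo_vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ti0\ti1\n",
    "1\t10\t.\tA\tT\t.\tPASS\t.\tGT\t0|1\t1|1\n",
    "1\t50\t.\tA\tT\t.\tPASS\t.\tGT\t1|0\t0|0\n",
]
demo_out = io.StringIO()
vcf_to_ms(demo_vcf, 100, demo_out)
print(demo_out.getvalue())
```

Haplotypes are written in sample order (both haplotypes of the first sample, then the second, and so on). Anything beyond this narrow subset of VCF, such as multiallelic sites or missing calls, is rejected with an error rather than guessed at.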

Anybody have a workflow for this already sorted out?

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Ben Jeffery

May 24, 2021, 5:24:24 PM
to slim-discuss
Hi Noor,

tskit has a `write_ms` function! It is not in the public API documentation, as it is new-ish and we are still refining it (your feedback would be useful in this respect), and it may eventually be moved to the "tsconvert" repo.
Basic usage:

import tskit

with open("output.ms", "w") as ms_file:
    tskit.write_ms(tree_sequence, ms_file)  # tree_sequence: an already loaded tskit.TreeSequence

Hope this helps,
Ben Jeffery