get vcf from tree-seq record

646 views
Skip to first unread message

Charlotte Her

unread,
Feb 28, 2019, 10:57:08 AM2/28/19
to slim-discuss
Hi,

I begin with SLiM and it's great to see so many documentation, but with all these informations I'm a little bit lost.

I'd like to simulate data with several demographic variation, and a selective sweep at some point in time.
Moreover I want to keep the ancestry informations but also several haplotypes of the extant population.

To do that, I used tree sequence records and recapitation and I overlaid neutral mutations with msprime

My problem is that when I transform my final .trees into vcf (with msprime), I get only one individual with the same genotype at all genomic positions. Is there something to do before to get several haplotypes from the extant population (a kind of outputVCFsample but for tree-seq) ?

Thanks !

--Charlotte

Ben Haller

unread,
Feb 28, 2019, 11:30:21 AM2/28/19
to slim-discuss
Hi Charlotte.  I don't know anything about VCF output from msprime.  At a wild guess, though, it sounds from the msprime docs (https://msprime.readthedocs.io/en/stable/api.html) like write_vcf() writes the "sample", an msprime concept that can be manipulated.  Perhaps you have the sample set to a single genome (presumably unintentionally), and you need to set it to be the set of genomes that you want to output?  Peter Ralph, who often reads this list, might have more to say.  If that doesn't help, then this sounds like a question for the msprime folks, I would say.  Good luck!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Charlotte Her

unread,
Feb 28, 2019, 11:43:05 AM2/28/19
to slim-discuss
Hi Ben,

Thanks for your help, I also post a question on msprime group and I'll take a closer look on this "sample" concept.

I also think to use slim by loading tree-seq file with readFromPopulationFile() and then using outputVCFsample(). Do you think it's possible ?

Ben Haller

unread,
Feb 28, 2019, 11:46:13 AM2/28/19
to slim-discuss
I don't see why not.  Of course you'll want to write out the .trees file from Python using pyslim, so that it is in a form that SLiM is able to read back in.  Given that, I would think that outputVCFSample() would work fine.  Of course there's nothing like trying it.  :->

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Charlotte Her

unread,
Feb 28, 2019, 12:06:24 PM2/28/19
to slim-discuss
One last question,

I'll try with this script :

initialize() {
   
   
    initializeSLiMModelType("WF");
    initializeTreeSeq();
    initializeMutationRate(2e-8);
    initializeMutationType("m1", 0.5, "f", 0.0);
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, 44047080);
   
        // read rec map
    lines = readFile("rec_map_slim_chr26.txt");
    rates = NULL;
    ends  = NULL;
   
    // get ends and rates in the right format (see man. p98)
    for (line in lines)
    {
        components = strsplit(line, "\t");
        ends       = c(ends, asInteger(components[0]));
        rates      = c(rates, asFloat(components[1]));
    }
   
    ends  = c(ends[1:(size(ends)-1)] - 2, 44047080);
    rates = rates * 1e-8;
    initializeRecombinationRate(rates, ends);
   
}

1 early() {
    sim.readFromPopulationFile("slim_test2_recap_overlaid.trees");
}

1 late() {
    p0.outputVCFSample(20, filePath = "./slim_test.vcf");
}


but I get an error : ERROR (SLiMSim::__TabulateMutationsFromTables): (internal error) mutation metadata length does not match derived state length.

What does this mean?

Ben Haller

unread,
Feb 28, 2019, 12:21:36 PM2/28/19
to slim-discuss
I think this almost certainly means that the version of SLiM you are using, and/or the version of pyslim you are using, are not in synch.  If you update both to the latest release version the error ought to go away.

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Peter Ralph

unread,
Feb 28, 2019, 12:27:26 PM2/28/19
to Ben Haller, slim-discuss
Hello again, Charlotte!

As I said on the msprime list
I'm happy to help debug your VCF issue, but I'll need to see the python code you're using to get the VCF.



--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To post to this group, send email to slim-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/35111b2e-389c-4220-ba01-c38809397b8e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Charlotte Her

unread,
Mar 1, 2019, 2:24:08 AM3/1/19
to slim-discuss
Thank you Peter, I just used the code in the example :

import msprime, pyslim

ts = pyslim.load("slim_test2_recap_overlaid.trees")

with open("slim_test2_recap_overlaid.vcf", "w") as vcf_file:
        ts.write_vcf(vcf_file, 1)
Reply all
Reply to author
Forward
0 new messages