union of segregating sites across subpopulations


Max Shpak

Jun 6, 2022, 5:36:06 PM
to slim-discuss
Suppose that I model 3 subpopulations resulting from population splits, e.g.

    initialize() {
        initializeMutationRate(1e-7);
        initializeMutationType("m1", 0.5, "f", 0.0);
        initializeGenomicElementType("g1", m1, 1.0);
        initializeGenomicElement(g1, 0, 99999);
        initializeRecombinationRate(1e-8);
    }
    1 { sim.addSubpop("p1", 500); }
    1000 { sim.addSubpopSplit("p2", 100, p1); }
    3000 { sim.addSubpopSplit("p3", 10, p2); }

If I now write one MS-format output file per subpopulation, as below, each file contains only the segregating sites of its own subpopulation. For instance, segregating site #5 in population 1 need not be the same site as segregating site #5 in population 2. The only way to work out which sites are homologous (so that, for example, frequencies can be compared across samples) is to build a dictionary from the mutation tables.

    10000 late() { p1.outputMSSample(10, replace=F, filePath="~/S1.txt"); }
    10000 late() { p2.outputMSSample(10, replace=F, filePath="~/S2.txt"); }
    10000 late() { p3.outputMSSample(10, replace=F, filePath="~/S3.txt"); }

Instead, I would like output similar to ms, where the set of segregating sites is the union of those in p1, p2, and p3. The first 10 rows would be the samples from p1, the next 10 the samples from p2, and the last 10 the samples from p3.

I want to do something along the lines of

    p_all = c(p1, p2, p3);
    p_all.outputMSSample(...);

but this naive approach doesn't work. Is there a way to have SLiM produce MS-format output in which:

- the columns are the union of segregating sites across subpopulations;
- the rows run from the first to the last sample of each population in turn, giving k1 + k2 + k3 rows, where k1, k2, k3 are the numbers of samples from each subpopulation?

Ben Haller

Jun 6, 2022, 6:21:20 PM
to Max Shpak, slim-discuss
Hi Max,

Yes, this sort of thing is exactly the purpose of the output methods on Genome, which are more general (but lower-level and a little more complex to use) than those on Subpopulation.  Essentially – typing into email, so you may need to tweak this a bit to get it to work – you would first draw your samples:

sample1 = p1.sampleIndividuals(10);
sample2 = p2.sampleIndividuals(10);
sample3 = p3.sampleIndividuals(10);

then make a single vector of the genomes of those samples:

sampleGenomes = c(sample1, sample2, sample3).genomes;

and then output an MS file from that vector:

sampleGenomes.outputMS(filePath="~/S123.txt");

Note that making the sample genomes vector can probably be done more briefly:

sampleGenomes = c(p1,p2,p3).sampleIndividuals(10).genomes;

That ought to work too, I think.

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University
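One detail is worth checking against Max's request. `sampleIndividuals(10)` draws 10 diploid individuals, so each subpopulation contributes 20 genomes (rows), not the 10 rows that `outputMSSample(10)` produced. The script below samples 10 genomes per subpopulation instead. It is a minimal sketch that assumes SLiM 3.x or 4.x, where the `Genome` class and the Subpopulation `genomes` property exist. It uses Eidos's `sample()` and `Genome`'s `outputMS()`, and the output path is only illustrative.

```eidos
// Sketch, assuming SLiM 3.x/4.x: the question's model plus one combined
// MS output event. The file path ~/S123.txt is illustrative only.
initialize() {
    initializeMutationRate(1e-7);
    initializeMutationType("m1", 0.5, "f", 0.0);
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, 99999);
    initializeRecombinationRate(1e-8);
}
1 early() { sim.addSubpop("p1", 500); }
1000 early() { sim.addSubpopSplit("p2", 100, p1); }
3000 early() { sim.addSubpopSplit("p3", 10, p2); }
10000 late() {
    // Draw 10 genomes (not 10 individuals, which would carry 20 genomes)
    // from each subpopulation; sample() draws without replacement by default.
    sampleGenomes = c(sample(p1.genomes, 10),
                      sample(p2.genomes, 10),
                      sample(p3.genomes, 10));
    
    // Concatenation order sets the row order: rows 1-10 come from p1,
    // rows 11-20 from p2, and rows 21-30 from p3. One column is written for
    // each mutation carried by at least one of the 30 sampled genomes.
    sampleGenomes.outputMS(filePath="~/S123.txt");
}
```

Mutations carried by every one of the 30 sampled genomes are still written as columns. Recent SLiM versions accept `filterMonomorphic=T` on `outputMS()` to drop them; it is worth confirming this in the manual for the version in use.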

