Suppose that I model 3 subpopulations resulting from population splits, e.g.
initialize() {
initializeMutationRate(1e-7);
initializeMutationType("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", m1, 1.0);
initializeGenomicElement(g1, 0, 99999);
initializeRecombinationRate(1e-8);
}
1 { sim.addSubpop("p1", 500); }
1000 { sim.addSubpopSplit("p2", 100, p1); }
3000 { sim.addSubpopSplit("p3", 10, p2); }
If I now write three output files in MS format for each subpopulation, I get segregating sites specific to each subpopulation, so that (for instance) segregating site #5 in population 1 need not correspond to a homologous site for segregating site #5 in population 2. The only way to parse which are homologous (so that, for example, frequencies can be compared across samples) is to create a dictionary from the mutation tables.
10000 late() { p1.outputMSSample(10, replace=F, filePath="~/S1.txt");}
10000 late() { p2.outputMSSample(10, replace=F, filePath="~/S2.txt");}
10000 late() { p3.outputMSSample(10, replace=F, filePath="~/S3.txt");}
Instead of this, I would like to have output similar to ms, where the set of segregating sites represents the union of those in p1,p2,p3, so that the first 10 rows correspond to sites in p1, the next 10 in p2, and the last 10 rows to p3.
I want to do something along the lines of p_all = c(p1,p2,p3) and then have
p_all.outputMSSample(... etc)
but this naive approach doesn't work. Is there some way to have SLiM return MS format output so that the set of columns represents the union of segregating sites and that the rows are ordered from the first to last sample from each population, i.e. k1 + k2 + k3 where k1,k2,k3 are the number of samples from each subpopulation?