My question deals with how to preserve/recover information about subpopulations specified in slim sim generated by treeSeqOutput. It seems that I can only recover attributes of the population as a whole rather than of the subpopulations, despite the fact that ts was created with an explicity demographic model with 3 supopulations of different size and split times.
In my specific case, I'm simulating a model with 3 subpopulations (one recent split, one more ancient split) and introducing selective sweeps in various subpopulations. In the simplest example, I only have a selective sweep in the third subpopulation (the code is set to restart if the sweep doesn't complete)
initialize() {
initializeTreeSeq();
initializeMutationRate(0);
initializeMutationType("m2", 1.0, "f", 0.025);
initializeGenomicElementType("g1", m2, 1.0);
initializeGenomicElement(g1, 0, 99999);
initializeRecombinationRate(7.04e-8);
initializeGeneConversion(0.838, 518, 1.0);
}
1 early() {defineConstant("simID", getSeed());}
1 early() { sim.addSubpop("p1", 2e+4); }
1 early() { sim.addSubpopSplit("p2", 4e+3, p1); }
1000 early() { sim.addSubpopSplit("p3", 4e+3, p2); }
1000 late() {
sim.outputFull("restart_slim_" + simID + ".txt");
x = rdunif(1,0,99999);
target = sample(p3.genomes, 8, replace=F);
target.addNewDrawnMutation(m2, x);
}
2000 late() {
m2mut = sim.mutationsOfType(m2);
freqs = sim.mutationFrequencies(p3, m2mut);
if(any(freqs == 1.0)){
cat(simID + ": FIXED\n");
sim.deregisterScriptBlock(self);
sim.treeSeqOutput("ThreePopFix.trees");
}
else
{
sim.readFromPopulationFile("restart_slim_" + simID + ".txt");
setSeed(rdunif(1, 0, asInteger(2^62) - 1));
// re-introduce a sweep mutation
x = rdunif(1,0,4999);
target = sample(p3.genomes, 8, replace=F);
target.addNewDrawnMutation(m2, x);
}
}
So far, so good. However, following recapitation and the addition of mutations, I am unable to recover any information for the subpopulations, e.g. in python:
rts = pyslim.recapitate(ts_new, recombination_rate= 1.14e-8, ancestral_Ne=20000, random_seed=recap_seed)
mts = msprime.sim_mutations(rts, rate = 1.22e-8, keep=True, random_seed = mutation_seed, model=msprime.BinaryMutationModel())
Because the underlying demography is a 3 subpopulation model, attributes such as segregating_sites and diversity should be 3 element arrays. Instead, they are only single element arrays, i.e. for the entire population, e.g.
>>> mts.diversity()
array(0.00022775)
>>>mts.segregating_sites()
array(0.00898)
For more complex models in particular, I will need these attributes on a per-subpopulation basis. Why isn't this information retained in the ts object (the treeSeqOutput), and what do I need to do in order for it to be preserved/recovered?