recapitation of slim ts gives very low nucleotide diversity

58 views
Skip to first unread message

Max Shpak

unread,
Jul 4, 2022, 12:55:04 PM7/4/22
to slim-discuss
I created a 3-population model in Slim with an initial split at 2e3 generations and a more recent split at 1e3, with a selective sweep occurring in the third (most recent split) population. The only mutations introduced by Slim are those undergoing the sweep in the third population.

The three populations have respective sizes 2e4, 4e3, and 4e3

When I recapitate this tree with an ancestral population size of 2e4 and apply mutations at a per-site substitution rate of 1.22e-8, I obtain nucleotide diversity that is nearly 100-fold too low. The population of size 20K undergoing neutral evolution should have pi ~ 1e-3, what I get instead is 2e-5. When I run the same mutation/recombination/conversion rates for this demogrpahic in msprime, I get the correct value of pi.

I suspect that the problem is in the recapitation step, but I don't see the possible source of error.

I've appended both my slim and pyslim code below:

Slim code for 3 population demography and sweep in third population, conditioning on fixation, generates ThreePopFix.trees

initialize() {
 initializeTreeSeq();
 initializeMutationRate(0);

 initializeMutationType("m2", 1.0, "f", 0.025);
 initializeGenomicElementType("g1", m2, 1.0);
 initializeGenomicElement(g1, 0, 99999);
 initializeRecombinationRate(7.04e-8);
 initializeGeneConversion(0.838, 518, 1.0);
}
1 early() {defineConstant("simID", getSeed());}
1 early() { sim.addSubpop("p1", 2e+4); }
1 early() { sim.addSubpopSplit("p2", 4e+3, p1); }
1000 early() { sim.addSubpopSplit("p3", 4e+3, p2); }

1000 late() {
sim.outputFull("restart_slim_" + simID + ".txt");
x = rdunif(1,0,99999);
target = sample(p3.genomes, 8, replace=F);
target.addNewDrawnMutation(m2, x);
}
2000 late() {
m2mut = sim.mutationsOfType(m2);
freqs = sim.mutationFrequencies(p3, m2mut);

//print(freqs);
if(any(freqs == 1.0)){
  cat(simID + ": FIXED\n");
  sim.deregisterScriptBlock(self);
  sim.treeSeqOutput("ThreePopFix.trees");
  sim.simulationFinished();
}
else
{
 //cat(simID + " : NOT FIXED - RESTARTING\n");

 sim.readFromPopulationFile("restart_slim_" + simID + ".txt");
 setSeed(rdunif(1, 0, asInteger(2^62) - 1));
 // re-introduce a sweep mutation
 x = rdunif(1,0,99999);
 target = sample(p3.genomes, 8, replace=F);
 target.addNewDrawnMutation(m2, x);
}
}

Recapitation (with ancestral population of 2e4, no structure) and applying mutations:

ts = tskit.load("ThreePopFix.trees")
ts_new = pyslim.SlimTreeSequence(ts)

rts = pyslim.recapitate(ts_new, recombination_rate = 1.14e-8, gene_conversion_rate = 5.9e-8, gene_conversion_tract_length = 518, ancestral_Ne=20000, random_seed = 10)

mts = msprime.sim_mutations(rts, rate = 1.22e-8, keep=True, random_seed = mutation_seed, model=msprime.BinaryMutationModel())

diversity = mts.diversity(sample_sets=[mts.samples(population=1), mts.samples(population=2), mts.samples(population=3),]).tolist()

Ben Haller

unread,
Jul 4, 2022, 1:17:47 PM7/4/22
to Max Shpak, slim-discuss
Hi Max,

Please work towards this being a "minimal reproducible example" – simplify the model to narrow down the cause.  You have gene conversion turned on in your SLiM model; if you turn it off, does the discrepancy still occur?  You have a selective sweep in one population; if you turn that off (since it should not affect the subpopulation where you are looking at diversity, as I understand it), does the discrepancy still occur?  You have multiple subpopulations; if you get rid of one, or two, does the discrepancy still occur?  And so forth.  This is essential to understanding what is going on.  In doing this, you might arrive at an understanding of the issue, and perhaps realize that it is not, in fact, a bug; or you might narrow the model down to a very specific reproducible case where you can say "if I take out any remaining aspect of the model, the discrepancy no longer occurs", which makes it much easier for one of us to figure out what the cause might be, and whether it is in fact a bug.  Producing a minimal reproducible example is necessary work before reporting something like this.  Thanks!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Max Shpak wrote on 7/4/22 9:55 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/c80185d5-3005-4dc3-b985-b97750f6a65en%40googlegroups.com.

Max Shpak

unread,
Jul 4, 2022, 6:11:15 PM7/4/22
to slim-discuss
I think that I've pinned down the problem - it isn't an issue with recapitation. Rather, it seems to be an issue with conditioning on fixation and restarting the simulation when the beneficial allele(s) are not fixed. Namely, if by chance the first execution goes to fixation by the last generation, I get pi ~ 1e-3, consistent with what the theory predicts. If the program has to re-run one or more times due to failure to fix, I invariably wind up with pi about 2 orders of magnitude too small, suggesting that lineages are lost in the restart.

It is as though the restart leads to the loss of population history rather than resetting it.

The method I'm using is to check if the beneficial allele is fixed at t = 2K, and if it isn't, to restart the simulation for t=1K, which coincides with the time of the second split just before the beneficial mutations are introduced. Presumably the error is with either the place where I save the partial tree or the way I call it on the restart (bold in code below) but I don't see what could be going wrong. Is there any reason why a setup like the one below would fail to capture the early genealogy on restart?

initialize() {
 initializeTreeSeq();
 initializeMutationRate(0);
 initializeMutationType("m2", 1.0, "f", 0.1);

 initializeGenomicElementType("g1", m2, 1.0);
 initializeGenomicElement(g1, 0, 99999);
 initializeRecombinationRate(1.2e-8);
}
1 early() {defineConstant("simID", getSeed());}
1 early() { sim.addSubpop("p1", 2e+4); }
1 early() { sim.addSubpopSplit("p2", 4e+3, p1); }

1000 early() {
sim.addSubpopSplit("p3", 4e+3, p2);
}

1000 late() {
sim.outputFull("restart_slim_" + simID + ".txt");
x = rdunif(1,0,99999);
target = sample(p3.genomes, 8, replace=F);
target.addNewDrawnMutation(m2, x);
}

2000 late() {
m2mut = sim.mutationsOfType(m2);
freqs = sim.mutationFrequencies(p3, m2mut);
if(any(freqs == 1.0)){
  sim.deregisterScriptBlock(self);
  sim.treeSeqOutput("ThreePopFix.trees");
}
else
{

 sim.readFromPopulationFile("restart_slim_" + simID + ".txt");
 setSeed(rdunif(1, 0, asInteger(2^62) - 1));
 x = rdunif(1,0,99999);
 target = sample(p3.genomes, 8, replace=F);
 target.addNewDrawnMutation(m2, x);
}
}

Ben Haller

unread,
Jul 4, 2022, 6:18:23 PM7/4/22
to Max Shpak, slim-discuss
Hi Max, thanks for following up.  I think I see the problem.  You're using outputFull(), which just saves out the current state of the simulation.  You need to be using outputTreeSeq() to output a .trees file, which will save the ancestry information.  An easy mistake to make; I think we saw the same problem a while back on the discussion list, in fact.  Maybe calling outputFull() when tree-sequence recording is enabled ought to issue a warning; but that seems perhaps excessive, since calling it could still be useful.  Hmm – maybe calling readFromPopulationFile() with a non-tree-sequence file when tree-seq recording is enabled is what ought to warn (or even error), especially if it is not just setting up the initial state of the simulation.  Peter, if you're reading this, what do you think?  Doing this works fine, but it's probably just about never what the user actually wants to do...


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Max Shpak wrote on 7/4/22 3:11 PM:

Max Shpak

unread,
Jul 5, 2022, 8:51:19 AM7/5/22
to slim-discuss
Just as a follow-up, that should be

treeSeqOutput()

rather than outputTreeSeq.

Thank you, Max

Max Shpak

unread,
Jul 5, 2022, 9:54:07 AM7/5/22
to slim-discuss
Using treeSeqOutput() did resolve my problem, I'm now getting values of pi very close to theta for the neutral populations.

One question I have is why outputFull() rather than treeSeqOutput() is used in the examples of conditioning selective sweeps on fixation in the manual (i.e. section 9.2, pages 163-4). Is there some reason why these examples save the current state of ts rather than the tree history up to the time point of interest, e.g.

1000 late() {
// save the state of the simulation
sim.outputFull(tempdir() + "slim_" + simID + ".txt");

Ben Haller

unread,
Jul 5, 2022, 11:26:02 AM7/5/22
to Max Shpak, slim-discuss
Hi Max,

Indeed, yes; thanks for the correction.  :->

And to clarify: the sweep recipes in chapter 9, which I gather you're working from, use outputFull() instead of treeSeqOutput() because those recipes do not use tree-sequence recording.  When using tree-sequence recording, the recipe in section 17.3 provides a better basis for work; it uses treeSeqOutput() to preserve the ancestry information on reload, as you want to do in your model.

Happy modeling!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Max Shpak wrote on 7/5/22 5:51 AM:
Reply all
Reply to author
Forward
0 new messages