# strange behaviour when modelling recurrent mutation

164 views

### Jason de Koning

Jun 13, 2017, 2:18:47 PM6/13/17
to slim-discuss
Hi Ben et al.,

First, I just wanted to say that we love SLiM! Thanks to you and Philip for all of your work on this excellent resource.

We are experiencing some unexpected behaviours when modelling some rather extreme parameter values. Maybe we're doing something wrong.

We are looking at recurrent fast mutation (\theta = 0.024), full dominance (h=1.0), and strong positive selection s=0.5 (2Nes = 1,000). To keep things simple we are using a single genomic element as in your example 10.5.1 for soft sweeps.

When heterozygous fitness is recessive (h=0), we see normal behaviour with serial fixations (after introducing a return process) happening at about the expected rate (~ every 4,801 generations). However, when we include dominance (h=1.0) or sometimes codominance (h=0.5), the population tends to get "stuck" at 50% frequency and just hangs out there. The expected rate of fixations for h=1 should be ~ one fixation every 381 generations (according to an exact calculation). Instead we see maybe one or two fixations and then this pathological behaviour that persists until the end of the simulation.

When the population is in this state, hovering around 50%, mutation counts suggest the entire population is homozygous for the mutant state (I think). However, allele frequencies are close to 0.5 and fitness is close to 1.5. I'm not sure why frequency isn't exactly 1, and why mean fitness is not exactly 1.5.

In this state we observe:

Output of countOfMutationsOfType():

2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

"Mean fitness: " "1.86225"

Frequencies:

0.5085 0.4915

We were hoping that you might have an idea about what's happening? Code to reproduce the problem is below.

Many thanks,
Jason de Koning

// Note, for our application we introduce a return process that resets the population to mutant frequency of 0 following fixation. The problem described
// occurs with or without this return process

initialize() {

// fast recurrent mutation, \theta = 4 * 1000 * 6e-6 = 0.024

initializeMutationRate(6e-6);

// m1 mutation type: strong positive selection (dominant), 2Nes = 1000

// Note: things work as expected if h is set to 0.0 or 0.5

initializeMutationType("m1", 1.0, "f", 0.5);

// don't convert to substitution

m1.convertToSubstitution=F;

// don't stack mutations

m1.mutationStackPolicy = "f";

// g1 genomic element type: uses m1 for all mutations

initializeGenomicElementType("g1", m1, 1.0);

// uniform chromosome of length 1 bp with "uniform recombination"

initializeGenomicElement(g1, 0, 0);

// set recombination to be ~ free

initializeRecombinationRate(0);

}

fitness(m1) {

// UGLY HACK: whenever we hit frequency=1, introduce a return to frequency=0

if (sim.mutationFrequencies(p1, mut) == 1.0) {

print("Fixation occurred. Murdering the population to return freq to 0.");

p1.genomes.removeMutations( sim.mutationsOfType(m1), T );

}

// standard fitness return

if (homozygous)

return 1.0 + mut.selectionCoeff;

else

return 1.0 + mut.mutationType.dominanceCoeff * mut.selectionCoeff;

}

// create a population of 500 individuals

1 {

}

1:40000 {

if (sim.generation % 1000 == 0) {

// print mutation counts

allIndividuals = sim.subpopulations.individuals;

print(allIndividuals.countOfMutationsOfType(m1));

// print individual fitnesses

print(c("Mean fitness: ", mean(p1.cachedFitness(NULL))));

// print mutant freqs

cat("Frequencies:\n");

print(sim.mutationFrequencies(p1, sim.mutationsOfType(m1)));

}

}

40000 late() { sim.outputFixedMutations(); }

### Ben Haller

Jun 13, 2017, 6:45:48 PM6/13/17
to slim-discuss
Hi Jason.  I'm glad you're finding SLiM useful!

I think this behavior is actually correct, interestingly.  I'm not a population geneticist, so I will defer to Philipp for a proper explanation, but we were also surprised, some months back, by similar behavior in a similar model, and looked into it quite thoroughly and concluded that it was in fact correct.  So hopefully Philipp will reply on that front; if not, then perhaps I'll take a stab at explaining it to the best of my understanding.

For me, I would like to just note that the code you mark as an "ugly hack" in your comments is perhaps a bit uglier than you might expect.  Modifying genomes inside a fitness() callback is probably not a good idea, since SLiM is actively engaged in looping through the mutations in all of the genomes at that very moment.  When you remove mutations from genomes inside the fitness() callback, you kind of pull the rug out from under SLiM, and I'm not sure the resulting behavior will be terribly well-defined; indeed, I wouldn't be entirely surprised if it crashed now and then.  You should instead move that code into an early() or late() event, I think.  (I did so myself just now and verified that this was unrelated to the surprising behavior you described, though.)  It would probably be good to try to set some clear policies about what actions scripts are allowed to take during what generation stages, to avoid this sort of thing!  :->  Here's what I might suggest doing instead:

1: late()

{

// UGLY HACK: whenever we hit frequency=1, introduce a return to frequency=0

muts = sim.mutations;

freqs = sim.mutationFrequencies(p1, muts);

if (any(freqs == 1.0)) {

print("Fixation occurred. Murdering the population to return freq to 0.");

p1.genomes.removeMutations( sim.mutationsOfType(m1), T );

}

}

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

### Philipp Messer

Jun 13, 2017, 9:55:04 PM6/13/17
to Ben Haller, slim-discuss
Hi Jason,

We indeed observed exactly the same effect about half a year ago and it took us a while to convince ourselves that it's a true effect, not a bug in the simulation. The general mechanism behind this must be some form of epistasis resulting from linkage to the two "dominant" haplotypes. If I remember correctly, the heterozygotes for the two haplotypes always had much higher fitness, so they were always chosen as the parents in the next generation (while recombinants were suppressed). I think we never found a really satisfying universal explanation (other than that linkage can generally lead to non-additive behavior that can in principle results in such balancing selection). But I do remember that we were actually able to show the stability of such a system in a scenario with just a few loci and strong selection coefficients, where we explicitly calculated the fitness values of the different genotypes. It always stuck me as a pretty artificial scenario though, because of the large number of strong selection coefficients involved, so we didn't follow up on this. I thought it was just an example how the Wright-Fisher model gets weird under such parameters. Let us know if you find a better explanation.

Best,
Philipp

--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss+unsubscribe@googlegroups.com.
To post to this group, send email to slim-d...@googlegroups.com.

--
Philipp W. Messer
Assistant Professor
Department of Biological Statistics and Computational Biology
Cornell University
102J Weill Hall, Ithaca, NY 14853
mes...@cornell.edu | 408-636-8701
http://messerlab.org

### A.P. Jason de Koning

Jun 15, 2017, 3:09:09 PM6/15/17
to slim-discuss

Hi all,

After a somewhat lengthy off-list discussion, this issue has been resolved. Many thanks to Ben and Philipp for clarifying a number of points.

In summary:

Problem: We wanted to simulate a single biallelic locus and were interested in measuring properties of trajectories that ended in fixation. We encountered two problems with this. First, our populations were often getting “stuck” at 50% allele frequency, which was mysterious. Second, when this occurred, SLiM went from reporting one mutant (when calling sim.mutationFrequencies() ) to reporting two mutants. We did not expect this behavior when mutationStacking was turned off.

Explanation: As explained by Ben below, the issue is that SLiM tracks mutant lineages not simply the frequency of the mutant state. So, what we were observing was the de novo origination of a second mutant lineage while a first was already segregating. Because we had only one mutation type, with strong positive selection, the two mutant forms would together come to dominate the population, and they would reach an equilibrium of 50:50 frequency, since they had identical selection coefficients.

Solution: To avoid this scenario, we introduced a return process that identifies when the mutant state takes over the population (i.e., when the sum of the mutant frequencies reaches 1, or equivalently, when the sum of the mutant allele counts hits 2N). When this occurs, we reset the population and allow it to start again from a monomorphic wildtype population. The code for this follows:

// If running the simulation for 40,000 generations

1:40000 {

// Return process: whenever we hit frequency=1, introduce a return to frequency=0

muts = sim.mutations;

counts = sim.mutationCounts(p1, muts);

if (sum(counts) == 2N) {

print("Fixation occurred. Murdering the population to return freq to 0.");

p1.genomes.removeMutations( sim.mutationsOfType(m1), T );

}

}

This gives results as expected. I include edited excerpts of my conversations with Ben, below, in case it will be useful to the list.

Cheers,

- Jason

--
A.P. Jason de Koning, Ph.D.

Assistant Professor
University of Calgary, Cumming School of Medicine
Dept. of Biochemistry and Molecular Biology
Dept. of Medical Genetics
Alberta Children's Hospital Research Institute for Child and Maternal Health
and Arnie Charbonneau Cancer Institute

3330 Hospital Drive N.W.
Health Sciences Centre HSC 1151
Office: 403-210-7638 | Fax: 403-270-8928
Email: jason.d...@ucalgary.ca
Web:   http://lab.jasondk.io

---

Setting mutationStackPolicy to “f” specifically makes it so that if SLiM is considering introducing a new mutation at a given position, but a mutation of the same type already exists at that location in that individual, then the new mutation is suppressed.  But only if one already exists in that individual!  The location is not somehow “reserved” by the fact that a mutation already exists at that position in other individuals.  There is no such concept in SLiM.  So in your model as written, it is possible for two different mutations to be in circulation at the locus, and then it is perhaps clear how/why they might tend to “stick” at around 50% frequency, for the reasons Philipp described.

Is the idea that the model is still biallelic in the sense that there is one circulating mutant state, but that if a secondary mutant arises from a non-mutant during the segregation of an initial mutant, that SLiM considers the lineage arising from the second mutant a “different mutation”? I could see how this could be useful for tracking lineages during soft sweep scenarios, but as far as a (biallelic) WF model with mutation is concerned, reaching 100% frequency for the mutant state is fixation (regardless of whether the population arrived there via a ‘hard’ or ‘soft’ process). Perhaps this is where our thinking diverges. So to simulate a biallelic model should we be considering the frequency of the mutant state as the sum of the frequencies returned by sim.mutationFrequencies()? Or are you instead saying that SLiM is effectively modelling some kind of finite-sites infinite-alleles situation?

Every mutation that arises in SLiM is tracked separately.  There is no concept of a generic “mutant state”; there are just actual mutations.  Each mutation that arises is distinct.  Two mutations that have the same mutation type and the same selection coefficient are nevertheless considered distinct mutations; they do not get “merged” by the SLiM engine.  You can certainly do such merging yourself, if that is what you want; you can look for mutations that satisfy whatever your merging criteria are, and when you find a mutation B that you wish to merge into mutation A, remove B from all genomes where it occurs and add A to those genomes instead.  In fact, I have written a recipe for the next version of the SLiM manual that does exactly that; I can send you that draft if you like.  Or, as you say, if you want all mutations of a given type to be considered the same mutation, then you can simply add up their frequencies (or even better, to avoid possible floating-point roundoff issues, add up their counts, which are also available from SLiM).  Recipe 10.5.1 in the existing cookbook has a presentation of a soft sweep model that handles this, so you might look at that recipe for some example code.

It is also true that in a sense, with the default stacking behavior, SLiM models a sort of finite-sites infinite-alleles model, since new mutations and stacked mutations can produce what are effectively new alleles at each base position.  But since your model sets the stacking policy to “f”, that would then no longer be the case.

In either case, hanging out at 50:50 for the two mutants may make sense if they have identical large selection coefficients (as they do in our case). This perhaps raises an issue about how frequencies are displayed in the GUI. Until a second mutant lineage arises, frequency corresponds to frequency of the single mutant state. But after the second lineage arises, it continues to refer only to the frequency of the first mutant state. Is that right?

Each mutation is displayed with its own bar in the chromosome view.  If two mutations are at the same base position, their bars will overlap.  If they have different selection coefficients, then you might see a short bar of one color in front of a taller bar of the other color.  If they have the same selection coefficient, and are thus displayed in the same color, then you will, in effect, see just a single bar with a height corresponding to the higher frequency among the two mutations.  Again, SLiM in no way merges the two mutations together, conceptually or actually.  They are two separate mutations that just happen to be at the same location, use the same mutation type, and have the same selection coefficient.  So they are displayed independently in the GUI as well.

If you want to control new mutations completely, as you apparently do, then simply set the mutation rate to zero and introduce new mutations yourself using addNewMutation() or addNewDrawnMutation().  You can detect when there is no mutation existing (by checking sim.mutations) and introduce one.  If you want the introductions to occur stochastically, with some mutation rate, you can use the random functions provided by Eidos, such as rpois(), to do so, but with a check to prevent one from being added when one already exists, as you want.

Thanks!

### andrea mrnjavac

Nov 10, 2021, 8:07:58 AM11/10/21
to slim-discuss
Hi Ben, Jason and Philipp,

I'd also like to thank you for the comprehensive manual (and the hidden jokes in it!) and the helpful discussion group!
I have a similar problem when simulating strong selection. I simulated a single autosomal locus with fixed selection coefficient and weak selection (Nes<1), like this:
"initialize() {
initializeMutationRate(1e-3);

initializeMutationType("m1", h, "f", 0.001);

initializeGenomicElementType("g1", m1, 1.0);

initializeGenomicElement(g1, 0, 0);
initializeRecombinationRate(0.5);

initializeSex("A");
}

1 {
}

105000 late() { sim.outputFixedMutations("~/auto_pos_"+h+"_INDEX_0.001s_1l_100i.out.fixed"); }
"

I also did the simulation with 1000 loci and free recombination, and the results, observed number of substitutions in the simulation, correspond to the diffusion approximation.
However, when I tried to simulate mutations with stronger selective effects (Nes>1) (and high dominance coeff. value), I observed the phenomenon which Jason described: mutations reach an equilibrium at 50% frequency and the fixation cannot occur. I was really happy to see that this issue has been discussed and I used Jason's "hack". Forcing fixation with this hack indeed gave the results that match diffusion approximation for strong selection, but for weak selection the hack messes up everything, which actually makes sense. I still do not fully understand what is going on here, simulations corresponds to diffusion approximation for weak selection, but for strong selection I need to force fixation for the simulation results to match diffusion approximation. (just to add, this is occurring even when simulating 1000 loci instead of 1) Do you know why is this happening? And is it biologically justified then to use one simulation set-up for weak selection and another one for strong selection?

Best,
Andrea

### Ben Haller

Nov 10, 2021, 9:22:46 AM11/10/21
to slim-discuss
Hi Andrea.  Your model uses a dominance coefficient "h", but I don't think you mentioned what value of h you are using for these simulations.

I can't comment on matching the diffusion approximation, etc., since that's outside my area of expertise.

I can say that two possible issues occur to me, and I'm not sure which one (maybe both) applies to you.  One possibility is the same issue that Jason encountered: that multiple mutational lineages arose at the same position, from independent de novo mutations, but he wanted them to be considered "the same" for purposes of calculation of frequencies, fixation, etc.  As the discussion with Jason notes, there are a variety of ways of dealing with this; see the end of the list for the best option.  (1) You can merge different mutational lineages together after the fact, by detecting that a new mutation has occurred at a position that already has a mutational lineage going, and removing the new mutation and adding instead the existing mutation at that position.  (2) You can just let the multiple lineages exist, but add their frequencies together when doing your analysis.  (3) You can do what Jason did, and kill off the population whenever the sum of the frequencies reaches 1.0 – but that only works for single-locus simulations so it's not recommended in general.  (4) You could detect "fixation" (i.e., the sum of the frequencies of the mutational lineages at a given position being 1.0) and remove all mutations at that position from the simulation, similar to Jason's solution but it ought to work for multilocus simulations too.  (5) Probably the best and most "modern" solution, since it depends upon the mutation() callback mechanism in SLiM 3.5, is to use a mutation() callback to tell SLiM to use the existing mutational lineage rather than creating a new mutation.  Section 18.14 of the manual has a recipe for this, for modeling identity-by-descent (IBD) in nucleotide-based models.  The nucleotide-based aspect of the recipe is not important to the basic mechanism; you can just remove that, if you wish, and the approach should still work.  The mutation() callback would then look like this:

mutation(m1) {
m = sim.subsetMutations(position=mut.position);
if (m.length())
return m;
return T;
}

This causes new mutations to use an existing mutational lineage at the same position, if one exists, rather than creating a new mutation, thus providing an IBD model in SLiM (which is, I gather, what you and Jason both want).  See section 18.14 for further discussion.

The other possibility is that you are encountering what I call a "logjam", in which two alternative genomic states are favored, with balancing selection between them, because that outcome maximizes fitness.  This phenomenon depends upon the dominance coefficient being used.  If a mutation is toward the dominant end of the spectrum, then individuals can get most of the benefit of that mutation by possessing just one copy of it, in one genome; having a second copy in the second genome confers relatively little additional benefit (or no additional benefit, for complete dominance).  This can lead to divergence of the genome into two alternative states, A and B, where individuals that are AB across the whole genome have the highest fitness and individuals that are AA or BB have lower fitness, because the AB individuals are getting most/all of the fitness benefits from both the A and B sets of mutations, whereas AA and BB individuals get the fitness benefit of only one set of mutations (lower, due to dominance).  Recombination can break up this logjam if it is strong enough, producing individuals that are homozygous for *both* the A and B sets of mutations (AABB individuals, we could call them) and thus are even higher fitness than the AB individuals that are heterozygous for both; but if recombination is too weak and dominance is sufficiently strong, this "logjam" mechanism can force permanent divergence of the genome into A and B variants that each stay at ~50% frequency.  As this divergence proceeds, the fitness difference becomes sufficiently large that AA and BB individuals are effectively dead, with mating success near 0, and the AB individuals are the ones that breed in each generation.  This outcome is probably biologically unrealistic, a consequence of parameter values (mutation rate, dominance coefficients, distribution of fitness effects, recombination rate, etc.) that are unrealistic; if you're hitting this issue, the likely solution is to make your parameter values and/or your model more biologically realistic.  I thought there was some prior discussion of this issue on slim-discuss, but I'm having trouble finding it at the moment; if somebody else is able to dig it up that would be helpful (Miguel, maybe you were involved in that discussion?).  I'm not sure whether there's a standard pop-gen term for this phenomenon, but I think it likely is not captured by typical analytical models, as it is a multiallele/multilocus phenomenon driven by linkage.  Anyhow, it's not clear to me whether this issue is a factor in what you're doing or not, but I thought I ought to mention it as a possibility.  If you run your model under SLiMgui it is usually pretty clear when a logjam occurs, as you get a genome-wide divergence into two alternative states that balance around 50% frequency.  If you switch the chromosome view in SLiMgui into haplotype display mode (right-click on the chromosome view to get the context menu where you choose that), you can see the alternative A and B states as distinct genome-wide haplotypes.

I hope this helps!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

### andrea mrnjavac

Nov 16, 2021, 2:07:36 PM11/16/21
to Ben Haller, slim-discuss
Hi Ben,

thank you for the explanation. I was experiencing "logjam" and managed to fix it by lowering the mutation rate in order to prevent multiple mutations from segregating at high frequencies at the same time.

Best,
Andrea

--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to a topic in the Google Groups "slim-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/slim-discuss/iHrVKX6qWco/unsubscribe.
To unsubscribe from this group and all its topics, send an email to slim-discuss...@googlegroups.com.