Realistic background selection

Julien Y. Dutheil

unread,

Nov 1, 2021, 4:58:20 AM11/1/21

to slim-discuss

Hi,

We are trying to perform some (relatively) realistic simulations under background selection. "Realistic" here means with a gene density and recombination map that look like that of a real chromosome. For that, we combined two examples from the SLiM manual, one that implements BGS, yet with arbitrary equality sized and distant genes along the chromosome, with the one that reads a recombination map from a file. We used data from chromosome 2L of Drosophila.

The issue is that when looking at the resulting simulation, the average TMRCA is obviously inversely correlated with the recombination rate so that low-recombination regions have a higher genetic diversity (see attached figure). This seems relatively counter-intuitive to me and suggests that I must have done something wrong. (Note that low recombination regions are not less dense, so this is not the reason).

The code we use to simulate the tree sequence is in the attached file Simulate.slim.

As you can see, it is really a mix between the two examples from the manual, only adding the reading of the exon coordinates.

Neutral mutations are then added using msprime, and the resulting TMRCA computed using another script, getTMRCA.py.

I have to admit being a bit stuck there, any help (in the code or the interpretation of the results) would be very much appreciated!

All the best,

Julien.

getTmrca.py

Simulate.slim

Sim1.pdf

Ben Haller

unread,

Nov 2, 2021, 3:03:06 PM11/2/21

to slim-discuss

Hi Julien. I'm not really a population geneticist per se, so my instincts may be wrong on this, but "the average TMRCA is obviously inversely correlated with the recombination rate so that low-recombination regions have a higher genetic diversity" does not sound obviously wrong to me...? With a higher recombination rate, deleterious mutations can get purged and beneficial mutations can get fixed more efficiently, right? Which makes for low diversity. Whereas with a low recombination rate you can get a big block of the genome that contains both deleterious and beneficial mutations together (perhaps ending up being more or less neutral overall), and if that big block doesn't get split up by recombination they can't get separated from each other and so the whole region will just drift and accumulate more stuff – i.e., high diversity. This is why Muller's Ratchet is a problem for clonal (i.e., non-recombining) organisms, right? It looks like your model contains only deleterious mutations, but the same basic logic applies, doesn't it? Like I said, this is not my area of expertise, so apologies if I'm just being foolish. Anyhow, I was curious so I put together a very simple version of your model (apologies for the atrocious formatting):

initialize()

{

defineConstant("N", 10000);

defineConstant("L", 1e7);

initializeMutationType("m1", 0.0, "g", -(5 / N), 1.0); //Mean and shape

initializeGenomicElementType("g1", m1, 1.0);

initializeMutationRate(1e-10);

initializeGenomicElement(g1, 0, 2*L - 1);

initializeRecombinationRate(c(1e-6, 1e-8), c(L - 1, 2*L - 1));

}

1 { sim.addSubpop("p1", N); }

1: late() {

if (sim.generation % 1000 != 0)

return;

genomes = sim.subpopulations.genomes;

high_r = calcHeterozygosity(genomes, start = 0, end = L - 1);

low_r = calcHeterozygosity(genomes, start = L, end = 2*L - 1);

catn(sim.generation + ": high r == " + high_r + ", low r == " + low_r);

}

100000 late() { sim.simulationFinished(); }

This just makes a chromosome with a high-r half and a low-r half, and models a variety of deleterious mutations just as your model did. I'm not sure whether the values for L and for recombination rate are comparable to what you're using, and I used a mutation rate of 1e10 for speed and to model less than complete coverage of the chromosome by genes. So all that is kind of arbitrary and you might want to play around with this model more on your end to make it more comparable to your scenario. I only had the time/patience to run it for 20,000 generations, but it did seem like it was maintaining/building higher diversity in the low-recombination region:

1000: high r == 1.69526e-07, low r == 1.82844e-07

2000: high r == 3.59783e-07, low r == 3.89605e-07

3000: high r == 5.20246e-07, low r == 5.41626e-07

4000: high r == 7.13875e-07, low r == 6.6935e-07

5000: high r == 8.01378e-07, low r == 7.91837e-07

6000: high r == 8.66481e-07, low r == 1.00161e-06

7000: high r == 9.26864e-07, low r == 1.24064e-06

8000: high r == 1.01754e-06, low r == 1.39038e-06

9000: high r == 1.10932e-06, low r == 1.45669e-06

10000: high r == 1.06537e-06, low r == 1.45886e-06

11000: high r == 1.26335e-06, low r == 1.53748e-06

12000: high r == 1.31336e-06, low r == 1.66525e-06

13000: high r == 1.44629e-06, low r == 1.78663e-06

14000: high r == 1.51388e-06, low r == 1.92276e-06

15000: high r == 1.43256e-06, low r == 1.94525e-06

16000: high r == 1.3436e-06, low r == 2.00036e-06

17000: high r == 1.44247e-06, low r == 2.16455e-06

18000: high r == 1.49168e-06, low r == 2.07088e-06

19000: high r == 1.54869e-06, low r == 2.05062e-06

20000: high r == 1.66637e-06, low r == 2.16987e-06

That's only one run, though, and for only 20,000 generations; I don't know whether this is replicable/significant, or what the equilibrium would look like. But anyhow, I didn't see any obvious bug in your model, and what you observe seems plausible to me. Maybe folks who know more pop-gen than I will chime in on this (I'd be interested to hear that), but I think maybe there is no problem here?

Cheers,

-B.

Benjamin C. Haller

Messer Lab

Cornell University

Julien Y. Dutheil

unread,

Nov 2, 2021, 5:37:11 PM11/2/21

to slim-discuss

[just realised I only replied to the author, so I copy my answer here]

Dear Ben,

Thank you very much for your answer. Indeed in our setup, only deleterious mutations are simulated. That recombination facilitates selection makes up for low diversity at the selected loci, but those should supposedly be a minority. At neutral, linked loci, however, recombination should prevent the reduction of diversity due to selection by suppressing LD. So the usual rationale is "high recombination" => "less reduced diversity because of linkage" (that is, higher diversity).

Meanwhile, I found out a bit more precisely where the issue/oddity might be. When looking at the TMRCA directly from the tree sequence simulated by SLIM, I do observe the expected pattern: the TMRCA is reduced in low-recombining regions. Only after the recapitation step do I observe the opposite pattern! So it seems that the recapitation step is the issue here... maybe I should ask the msprime folks?

Many thanks again for your insights,

Julien.

Ben Haller

unread,

Nov 2, 2021, 5:50:32 PM11/2/21

to slim-discuss

Hi Julien,

OK, I'm clearly out of my depth on this, so I'll bow out. :-> Probably Peter Ralph will weigh in regarding recapitation, etc. Good luck!

Cheers,

-B.

Jose Luis Mijangos

unread,

Nov 2, 2021, 6:32:17 PM11/2/21

to Ben Haller, slim-discuss

Hi Julien,

I believe what you are seeing in your simulations is called associative over dominance (AOD). AOD occurs when two or more deleterious alleles are segregating at linked locations in the genome. With an appropriate combination of selection, population size and low recombination (Zhao & Charlesworth, 2016; Becher et al., 2020; Gilbert et al., 2020) the selection to remove the deleterious alleles is impeded by the advantage to double heterozygotes with the deleterious alleles in repulsion (i.e., on different haplotypes), in which the deleterious effects of the mutations are obscured. Genetic diversity is then maintained at other sites in the genomic regions surrounding the deleterious loci, with alternative alleles being maintained in the haplotypes linked to each deleterious mutation.

I investigated AOD during my PhD and we will soon publish some articles about our results.

Meanwhile, I wrote some code to use slim in R using slimR written by Russell Dinnage (https://rdinnager.github.io/slimr/). These simulations show how AOD and background selection work. Code for this simulations can be accessed in https://github.com/mijangos81/slimr_workshop_CBA

References

Zhao, L., & Charlesworth, B. (2016). Resolving the conflict between associative overdominance and background selection. Genetics, 203(3), 1315-1334.
Becher, H., Jackson, B. C., & Charlesworth, B. (2020). Patterns of genetic variability in genomic regions with low rates of recombination. Current Biology, 30(1), 94-100.
Gilbert, K., Pouyet, F., Excoffier, L., & Peischl, S. (2020). Transition from background selection to associative overdominance promotes diversity in regions of low recombination. Current Biology, 30(1), 101-107.

Cheers,
Luis

--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/c9f5b359-cb24-4581-bf80-02c4dffcbceen%40googlegroups.com.

Julien Y. Dutheil

unread,

Nov 3, 2021, 3:36:34 PM11/3/21

to slim-discuss

Dear Luis,

Thank you very much for the insights! This sounds like a plausible mechanism, but I would expect to see this effect on the TMRCA of the tree sequence generated by SLiM, right? Here I only observe this after the recapitation, which, if I understand correctly, considers a purely neutral model. So the question is, why does recapitation adds a much older TMRCA in low recombining regions, or maybe, in regions where the TMRCA was lower?

Cheers,

Julien.

Julien Y. Dutheil

unread,

Dec 13, 2021, 4:36:09 AM12/13/21

to slim-discuss

Hi,

Just a small update on this: after several attempts and offline discussions, it seems that the issue with this is that the low-rec regions are more prone to have multiple roots than high rec regions. Therefore, recapitation adds extra TMRCA in those regions. I also realised that I was using a coefficient of dominance of 0 (fully recessive mutations). Setting h = 0.5 and letting the population evolve for more generations leads to the expected behavior, with a pattern of TMRCA / pi that looks much more like that of real data.