Not achieving expected diversity metrics in simple diploid and haploid simulations

25 views
Skip to first unread message

Peter Laurin

unread,
Jun 26, 2024, 2:44:11 PM (7 days ago) Jun 26
to slim-discuss
Hi Ben et al.,

I've been working with simulating bacterial populations in SLiM following Jean Cury et al 2022's approach and have been puzzled recently when simple SLiM simulations return less diversity than analytically-expected values of genetic diversity given a certain mutation rate. I've attached two SLiM scripts here as a quick reference, one diploid and one haploid. Both run a constant N=10,000 population for 10*Ne simulations and then  take a MS sample of 100 individuals. I parameterize the mutation rate by giving SLiM an expected θ (here, 0.03) and then dividing by 2 * Ne for the haploid population and 4 * Ne for the diploid. 

Upon calculating pi and watterson's theta in these samples and replicating these simulations hundreds of times, I've noticed that the average outputted diversity is less than expected ( θ = 0.03). This is surprising to me, and I've not been able to wrap my head around why this may be occurring. Please see attached SLiM scripts and post-processing script. I've also attached my script for measuring diversity, but am unsure whether this is to fault as both pi and watterson's theta do seem to be matching one another (perhaps hard to see below, but means match). 

Quick summary of attached files:
haploid_minimal.slim -- SLiM script of haploid population
diploid_minimal.slim -- SLiM script of diploid population
parse_ms.py -- takes MS files from SLiM scripts, outputs to 'haplotype' file
compute_S_dist.py -- takes haplotype file or directory of haplotype file and outputs a text file of segregating sites, filename, and pi, in ranked order of highest # segregating sites to lowest
S_and_pi_counts.tsv -- one output of compute_S_dist.py, used to generate the figure below



Thanks for your help!

Peter

theta_off.png
Here,  the x axis shows the type of simulation (diploid with no recombination, haploid with no recombination, and diploid with recombination rate matching mutation rate). pi and theta S refer to pairwise diversity and watterson's theta, estimated using 100 samples. 0.03 is the expected diversity across simulations.







compute_S_dist.py
diploid_minimal.slim
haploid_minimal.slim
S_and_pi_counts.tsv
parse_ms.py

ckyri...@gmail.com

unread,
Jun 27, 2024, 11:15:48 AM (6 days ago) Jun 27
to slim-discuss
Hi Peter,

I just had a quick look at your scripts and it seems maybe that your burn-ins arent quite long enough for neutral diversity to get to equilibrium. A typical rule of thumb is that it takes ~10*Ne generations to get to equilibrium, which would mean 100,000 generations here with Ne=10,000 rather than just 10,000. So maybe try running the model a bit longer?

Best,

Chris

Peter Laurin

unread,
Jun 27, 2024, 12:51:05 PM (6 days ago) Jun 27
to slim-discuss
Thanks Chris!

Correct me if I'm wrong, but I think in this case I reschedule the script block to take place at 10*Ne = 100000 gens 

community.rescheduleScriptBlock(s1, start=N_generations, end=N_generations);

Apologies if between all the files, things are a bit opaque -- I'll probably simplify the scripts and creating a reproducible example if there's not a clear bug somewhere...

Peter



ckyri...@gmail.com

unread,
Jun 27, 2024, 1:11:04 PM (6 days ago) Jun 27
to slim-discuss
Ah ok sorry - missed the "s1" before the 10000 at the start of that late block. When I output heterozygosity every 1000 generations during the burn-in using the calcHeterozygosity() function (with Ne=1000 and r=1e-6) it seems to get to 0.03 by the end of the burn in. So maybe an issue with the downstream calculations?

gen,meanHet

1000,0.0116093

2000,0.0184296

3000,0.0224237

4000,0.0257944

5000,0.0268498

6000,0.0283921

7000,0.030477

8000,0.0327227

9000,0.0304859

10000,0.0296154

Peter Ralph

unread,
Jun 27, 2024, 1:15:05 PM (6 days ago) Jun 27
to Peter Laurin, slim-discuss
Hi, folks - 

I had a quick look here (and just saw Chris's reply), and I'm also getting diversities that agree with what's been inputted? There's rather too many scripts for me to dig into the details, though. (For future reference, my recommendation is to reduce your script to something that is totally minimal, runs fast, and reproduces the problem - probably, you will identify the problem along the way, and if not, it'll be much easier for someone else to find it. For instance, when I ran the code I changed pop_size to 100.)

So: with a simplified version of diploid.slim with a much smaller population size so it runs in a reasonable time:
```
initialize()  {
   initializeTreeSeq();
   defineConstant("simID", getSeed());
   defineConstant("pop_size", 100);
   defineConstant("d", 0.003);
   defineConstant("genome_size", 500000);
   defineConstant("N_generations", 10 * pop_size);
   defineConstant("mu", d / 4 / pop_size);
   defineGlobal("r", mu);
   initializeMutationRate(mu);
   initializeMutationType("m1", 0.5, "f", 0.0);
   initializeGenomicElementType("g1", m1, 1.0);
   initializeGenomicElement(g1, 0, genome_size);
   initializeRecombinationRate(r); // crossover rate
   m1.mutationStackGroup = 1;
   m1.mutationStackPolicy = 'l';
}

1 early() {
   sim.addSubpop("p1", pop_size);
   community.rescheduleScriptBlock(s1, start=N_generations, end=N_generations);
}

s1 10000 late() {
   sim.treeSeqOutput("test" + simID + ".trees");
}
```
and the following python script:
```
import tskit
import glob

tt = [tskit.load(x) for x in glob.glob("*.trees")]

print(sum([t.diversity() for t in tt]) / len(tt))
```
I get diversity values that are increasingly close to 0.003 the more replicates I run. So, if you are getting a discrepancy, here's some things I'd check:
  1. that you've run enough replicates / have enough recombination
  2. that you're not hitting saturation (i.e., lots of sites with multiple mutations)
  3. that there's not an issue with downstream calculation

--peter


From: slim-d...@googlegroups.com <slim-d...@googlegroups.com> on behalf of Peter Laurin <peter...@g.ucla.edu>
Sent: Thursday, June 27, 2024 9:51 AM
To: slim-discuss <slim-d...@googlegroups.com>
Subject: Re: Not achieving expected diversity metrics in simple diploid and haploid simulations
 
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/b75f06f7-4337-4964-b697-f1f2faacf075n%40googlegroups.com.

Peter Laurin

unread,
Jun 27, 2024, 1:28:14 PM (6 days ago) Jun 27
to slim-discuss
Thanks I'll try this out!

Peter

Reply all
Reply to author
Forward
0 new messages