Odd behavior in nonWF Island-model

168 views
Skip to first unread message

Daniel Priego E.

unread,
May 15, 2023, 2:36:25 PM5/15/23
to slim-discuss
Hi everyone,

I've been working on a simple nonWF dynamics on top of a simple Wright island model, but somehow I don't get a match between the simulations and prediction from classical popgen. In the model, at cycle 1 I introduce a marker neutral mutation (m1) at position 1000 with allelic frequency of 50%, in a region of 100kb, and track its dynamics over 50000 generations. I  calculate the following quantities:
 FST, for either that locus of interest (m1_Fst.svg) or for the entire genomic region (genome_Fst.svg), the time until fixation for the m1 mutation (t1_fix.svg), and genomic heterozygosity (genome_Het.svg). For all these plots (except for time until fixation), I and take the average of cycles 10001 through 49996, and average over 100 replicates.
I can't get an intuition of why mutation and recombination are making an effect if all mutations are neutral (m1 and m2), but the apparent trend is that the lower the mutation and recombination rates the better the match between simulation and prediction.
Is there something obvious from how SLiM works that I could be missing?

The formulas I use for FST in the marker mutation m1 case is: (1-m)^2 / (2*N-(2*N-1)*(1-m)^2) , and in the whole genomic region is (1-m)^2*(1-mu)^2 / (2*N-(2*N-1)*(1-m)^2 *(1-mu)^2) . Time until fixation is -4*Ne*(p0*log(p0)+(1-p0)*log(1-p0)).
m is migration rate, mu is the SLiM mutation rate times genomic region size, N is deme size, p0 is the initial allelic frequency of mutation m1, Ne is the diploid effective population size given by N*nd / (1-FST), wherein FST doesn't consider mutation.

Thanks for any feedback on this!

Best, 

Daniel Priego
  
SimpleNonWFIsland.slim
t1_fix.svg
genome_Het.svg
genome_Fst.svg
m1_Fst.svg

Daniel Priego E.

unread,
May 15, 2023, 4:04:51 PM5/15/23
to slim-discuss
I forgot to mention other parameter values, deme size N =10, number of demes nd=100. I should emphasize that the mutation of interest (m1) is introduced only once at the first cycle, whereas the other neutral mutations (m2) can be introduced along the the rest of time steps according to the defined mutation rate. As both of the mutations that I'm modeling are neutral by definition, I'm expecting no background selection effects at all.

Elissa Suphapun Sorojsrisom

unread,
May 16, 2023, 11:41:18 AM5/16/23
to slim-discuss
Best to wait for someone who knows more to weight in, but my first thought is that it may be a burn-in problem. You are introducing your tracked mutation right at the beginning of the simulation on an "empty" chromosome that carries no background diversity. It wouldn't surprise me that changing mutation and recombination rates changes the dynamics of how neutral standing diversity is reached at the beginning of the simulation. 

Another thing is that classical popgen assumes a strict adherence to the population size - if I am interpreting your simulation correctly, you are drawing the litter size from a poison distribution with a mean K, which will naturally cause the actual number of individuals produced each generation to fluctuate. I didn't see an implementation for ensuring that no more than N individuals (carrying capacity) are actually created - so it may be possible that the true population size is fluctuating (this seemed to be the case when I ran the simulation). I believe this could really mess up your results compared to expectations - I'd be interested to see the difference if you do change this part of the simulation. 

All the best,
~Elissa

Ben Haller

unread,
May 16, 2023, 12:26:08 PM5/16/23
to Elissa Suphapun Sorojsrisom, slim-discuss
Hi Daniel et al.,

I like Elissa's ideas.  Note that section 16.15 of the manual shows how to build a nonWF model that keeps a constant population like (like a WF model); that might be useful for testing Elissa's second idea.

Another suggestion I'd make is that you might have a bug in your FST calculation code.  SLiM has a built-in function, calcFST(), that will calculate FST for you; you might try using that to see whether it solves your problem or not.  If it does, then perhaps your calculation code has a bug, or perhaps different ways of calculating FST are coming up with substantially different results for some reason, with your simulation.

It's hard to guess, beyond that, since you didn't provide any of your code.  In general, if you want help finding a bug in your model, the best/easiest course is to provide your code at the outset – simplified down to a minimal reproducible example, of course.

Good luck!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Elissa Suphapun Sorojsrisom wrote on 5/16/23 11:41 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/9904e9c8-cc34-4aa3-88da-699b86f7f55cn%40googlegroups.com.

Daniel Priego E.

unread,
May 16, 2023, 2:31:32 PM5/16/23
to slim-discuss
Dear Eliisa and Ben,

Thanks for your replies to my post.
I had chosen to attach the SLiM script instead of pasting a huge amount of text, and I naively thought that all the files that I had attached in my original message were visible to you. Actually, this is the first time I post something in Google Groups ever, sorry about that. Since I don't know how to paste my graphics inline, I'll try to figure out how to share my plots here with you so that we're on the same page. 
I understand the idea of having a burn-in period, but I'm not sure it should make a difference for the case I'm exploring. This is a simple tracking of the probability of fixation of a single allele (m1), whose fixation should be affected only by drift as 1) it is neutral, and 2) the neighboring mutations of the region are neutral. Taking as reference the predicted time until fixation of a single allele, in my simulations m1 allele is getting fixated earlier, which implies a smaller effective population size. In principle, the marker allele m1 of my code shouldn't be lost by mutation, as mutations in SLiM are stacked, nor be gained at any other time point except for cycle 1. As for the point of having a WF implemented as a nonWF, I just to wanted to clarify that that's exactly what I'm implementing. The calcFST() function doesn't calculate the F-statistic in the way I need, as that's a paired calculation between 2 sets of genomes, whereas I'm calculating FST for a metapopulation of more than 2 subpopulations, so I extended that function to apply it over a multiple subpopulations case. 
I run my slim code with slirm developed by Vincent Buffalo (https://github.com/vsbuffalo/slirm) to be able to run several replicates in a combination of parameters in a server running Slurm. Having said that, please find my code pasted below:


/*

  Given two haploid genomes, the number of segregating sites between the two, divided by the length of the genomic region, is pi, the average nucleotide heterozygosity, which is the standard measure for the genetic diversity in a population.  The expected value of pi is 4*Ne*mu for a pure neutral model.

 This is an extension of calcPairHeterozygosity function, which allows to select a specific set of mutations for such estimation of pi

*/
function (f$)calcPairHeterozygosityMut(object<Genome>$ genome1, object<Genome>$ genome2, [Ni$ start = NULL], [Ni$ end = NULL], [No<MutationType> mutationTypes=NULL],[logical$ infiniteSites = T])
{
if (community.allSpecies.length() > 1)
{
species = unique(c(genome1.individual.subpopulation.species, genome2.individual.subpopulation.species), preserveOrder=F);
if (species.length() != 1)
stop("ERROR (calcPairHeterozygosityMut()): genome1 and genome2 must belong to the same species.");
}
else
{
species = community.allSpecies;
}
if (!isNULL(mutationTypes)){
muts1 = c();
muts2 = c();
for (mtype in mutationTypes){
muts1 = c(muts1, genome1.mutations[genome1.mutations.mutationType == mtype]);
muts2 = c(muts2, genome2.mutations[genome2.mutations.mutationType == mtype]);
}
}
else{
muts1 = genome1.mutations;
muts2 = genome2.mutations;
}
length = species.chromosome.lastPosition + 1;

// handle windowing
if (!isNULL(start) & !isNULL(end))
{
if (start > end)
stop("ERROR (calcPairHeterozygosityMut()): start must be less than or equal to end.");
m1pos = muts1.position;
m2pos = muts2.position;
muts1 = muts1[(m1pos >= start) & (m1pos <= end)];
muts2 = muts2[(m2pos >= start) & (m2pos <= end)];
length = end - start + 1;
}
else if (!isNULL(start) | !isNULL(end))
{
stop("ERROR (calcPairHeterozygosityMut()): start and end must both be NULL or both be non-NULL.");
}

// do the calculation
unshared = setSymmetricDifference(muts1, muts2);
if (!infiniteSites)
unshared = unique(unshared.position, preserveOrder=F);

return size(unshared) / length;
}

/*

  metapopHeterozygosity estimates the heterozygosity (pi) in a metapopulation by calculating the pi over a random sample.  Note that this function normally uses a random sample of individuals, of a size supplied in the optional parameter sampleSize; but if sampleSize is equal to the number of individuals , it will instead compute pi exactly.  For large metapopulations, that will of course be much slower than using a sample.

*/
function (f$)metapopHeterozygosity(o<Subpopulation> subpops, [No<MutationType>$ mt=NULL], [Ni$ start = NULL], [Ni$ end = NULL], [i$ sampleSize=100])
{
if (sampleSize >= sum(subpops.individualCount))
sampledIndividuals = subpops.individuals;
else
sampledIndividuals = sample(subpops.individuals, sampleSize, replace=F);

pi_total = 0;
for (individual in sampledIndividuals)
{
genomes = individual.genomes;
individual_pi = calcPairHeterozygosityMut(genomes[0], genomes[1], start, end, mt);
pi_total = pi_total + individual_pi;
}
return pi_total / sampleSize;
}

/*

  calcMetaFST is an extension of the native calcFST function to estimate FST of a metapopulation
*/

function (f$)calcMetaFST(o<Subpopulation> subpops, [No<MutationType> mt = NULL], [Ni$ start = NULL], [Ni$ end = NULL])
{
if (subpops.length() == 0)
stop("ERROR (calcMetaFST()): subpops must be non-empty.");
if (sum(subpops.genomes.length()) == 0)
stop("ERROR (calcMetaFST()): genome vector must be non-empty.");
if (community.allSpecies.length() > 1)
{
species = unique(subpops.species, preserveOrder=F);
if (species.length() != 1)
stop("ERROR (calcMetaFST()): all genomes must belong to the same species.");
if (!isNULL(muts))
if (!all(species == muts.mutationType.species))
stop("ERROR (calcMetaFST()): all mutations must belong to the same species as the genomes.");
}
else
{
species = community.allSpecies;
}
// handle mutation type
if (isNULL(mt))
muts = species.mutations;
else {
muts = c();
for (mtype in mt){
muts = c(muts, species.mutationsOfType(mtype));
}
if (size(muts) == 0)
return 0.0;
}

// handle windowing
if (!isNULL(start) & !isNULL(end))
{
if (start > end)
stop("ERROR (calcMetaFST()): start must be less than or equal to end.");
mpos = muts.position;
muts = muts[(mpos >= start) & (mpos <= end)];
length = end - start + 1;
}
else if (!isNULL(start) | !isNULL(end))
{
stop("ERROR (calcMetaFST()): start and end must both be NULL or both be non-NULL.");
}

// do the calculation
mean_p = 0.0;
H_s = 0.0;
non_empty_demes = 0;
for (subpop in subpops)
{
if (subpop.genomes.length() == 0)
next;
non_empty_demes = non_empty_demes + 1;
ps_p = subpop.genomes.mutationFrequenciesInGenomes(muts);
mean_p = mean_p + ps_p;
H_s = H_s + 2.0 * ps_p * (1.0 - ps_p);
}
mean_p = mean_p / non_empty_demes;
H_s = H_s / non_empty_demes;
H_t = 2.0 * mean_p * (1.0 - mean_p);
meanH_s = mean(H_s);
meanH_t = mean(H_t);
if (size(meanH_s) == 0)
meanH_s = 0.0;
if (size(meanH_t) == 0)
meanH_t = 0.0;
if (meanH_s == meanH_t)
fst = 0.0;
else
fst = 1.0 - mean(H_s) / mean(H_t);
return fst;
}

// set up a simple neutral nonWF simulation
initialize() {
initializeSLiMModelType("nonWF");
defineConstant("N", dsize); // carrying capacity for each deme
defineConstant("nd", ndemes); // number of demes
defineConstant("mig_rate", m);  // migration rate
defineConstant("meanls", K); // mean litter size for each reproduction event
defineConstant("L",100000); // chromosome length
defineConstant("A",1000); // locus where the marker mutation (m1) will land
// m1 mutation type: neutral=
initializeMutationType("m1", 0.5, "f", 0.0);
m1.convertToSubstitution = T;
m1.color = "red";

// m2 mutation type: neutral
initializeMutationType("m2", 0.5, "f", 0.0);
m2.convertToSubstitution = T;

// g1 genomic element type
initializeGenomicElementType("g1", m2, 1.0);

// uniform chromosome of length 100 kb with uniform recombination
initializeGenomicElement(g1, 0, L-1);
initializeRecombinationRate(rho*1e-10);
initializeMutationRate(mut*1e-10);

defineGlobal("m1Fixed",0);
if (exists("subdir"))
defineGlobal("parentDir", paste(dir,subdir,sep="/"));
else
defineGlobal("parentDir",dir);
}

1 first(){
sim.setValue("a_m1(t)",NULL);
}
2: first(){
sim.setValue("a_m1(t)",sum(sim.subpopulations.individuals.countOfMutationsOfType(m1))/(sum(sim.subpopulations.individualCount) * 2) + m1Fixed);
}

reproduction() {
// group competition (i.e., pick winning demes)
subpops = sim.subpopulations;
parent_demes = seqAlong(subpops);
for (pd in seqAlong(parent_demes)) {
p = parent_demes[pd];
psubpop = subpops[p];
if (psubpop.individualCount > 0){
litterSize = asInteger(round(mean(rpois(N,meanls))))*psubpop.individualCount;
parents = sample(psubpop.individuals, 2*litterSize, replace = T);
for (i in seqLen(litterSize)) {
psubpop.addCrossed(parents[2*i], parents[2*i+1]);
}
}
}

// disable this callback so it's only run once per cycle, not once per individual
self.active = 0;
}

1 early() {
// create an initial set of nd subpopulations, each with N individuals

for (i in seqLen(nd)){
sim.addSubpop(i, N);
}
// introduce marker mutation at 50% frequency
nmuts = integerDiv(sim.subpopulations.genomes.size(),2);
target = sample(sim.subpopulations.genomes, nmuts);
target.addNewDrawnMutation(m1, A);
}

// provide density-dependent selection and migration
early() {
subpops = sim.subpopulations;
inds = subpops.individuals;
   
// make generations non-overlapping by killing off adults
   ages = inds.age;
   // safe instantaneous elimination of individuals, SLiM 4 style
   sim.killIndividuals(inds[ages > 0]);
inds = subpops.individuals;
   //measure m1 allele frequency in newborns, i.e. exactly after killing adults
   defineGlobal("m1Fixed",sum(sim.substitutions.mutationType == m1));
sim.setValue("n_m1(t)",sum(subpops.individuals.countOfMutationsOfType(m1))/(sum(subpops.individualCount) * 2) + m1Fixed);
   numMigrants = rbinom(1, inds.size(), mig_rate);
if (numMigrants) {
migrants = sample(inds, numMigrants);
currentSubpopID = migrants.subpopulation.id;
newSubpopID = rdunif(migrants.size(), 0, nd-2);
// ensure new deme is different from current one
newSubpopID = asInteger(newSubpopID < currentSubpopID) * newSubpopID
+ asInteger(newSubpopID >= currentSubpopID) * (newSubpopID + 1);
// move in bulk
for (subpop in subpops)
      subpop.takeMigrants(migrants[newSubpopID == subpop.id]);
}

// post-migration density-dependent fitness scaling for each subpop
for (subpop in subpops)
subpop.fitnessScaling = N / subpop.individualCount;
}

late() {
defineGlobal("m1Fixed",sum(sim.substitutions.mutationType == m1));
}

1 late(){
log = community.createLogFile(paste(parentDir,paste(paste(basename,"HetFST",sep="_"),"log",sep="."),sep="/"), logInterval=5);
log.addCycle();
log.addCustomColumn("N_T(t)", "sum(sim.subpopulations.individualCount);");
log.addCustomColumn("Neutral_muts","sim.countOfMutationsOfType(m2);");
log.addCustomColumn("genome_Het(t)","metapopHeterozygosity(sim.subpopulations, m2, sampleSize=sum(sim.subpopulations.individualCount));");
log.addCustomColumn("m1_Het(t)","metapopHeterozygosity(sim.subpopulations, m1, A, A, sampleSize=sum(sim.subpopulations.individualCount));");
log.addCustomColumn("a_m1(t)","sim.getValue('a_m1(t)');"); // frequency of altruistic allele in adults
log.addCustomColumn("n_m1(t)","sim.getValue('n_m1(t)');"); // frequency of altruistic allele in newborns
log.addCustomColumn("m1(t)","sum(sim.subpopulations.individuals.countOfMutationsOfType(m1))/(sum(sim.subpopulations.individualCount) * 2) + m1Fixed;"); // frequency of altruistic allele in surviving juveniles
log.addCustomColumn("genome_Fst(t)","calcMetaFST(sim.subpopulations);");
log.addCustomColumn("m1_Fst(t)","calcMetaFST(sim.subpopulations,m1,A,A);");

}

50000 late() {
sim.outputFull(paste(parentDir,paste(paste(basename,"simFull",sep="_"),"txt",sep="."),sep="/"));
sim.outputFixedMutations(paste(parentDir,paste(paste(basename,"simFixed",sep="_"),"txt",sep="."),sep="/"));
sim.simulationFinished();
}

Ben Haller

unread,
May 16, 2023, 3:35:12 PM5/16/23
to Daniel Priego E., slim-discuss
Hi Daniel,

First of all, my apologies.  It looks like your attachments did come through on your first email; I simply missed that they were there.  I'm sick at the moment, and rather exhausted and fuzzy-brained, which is why I hadn't replied until after Elissa posted; then I wanted to just add a couple of thoughts to what she had written.  So it goes.

Maybe I'm still being fuzzy-brained, but I see that you provided values for N (10) and nd (100), but I don't see that you provided values for m or K or rho or mut.  Did I miss them?  In general, rather than writing text to say what the values of various parameters ought to be, please supply a self-contained model file that can be run under SLiMgui without modifications, and make that model as minimal as it can be while still reproducing the bug.  :->

You write "As for the point of having a WF implemented as a nonWF, I just to wanted to clarify that that's exactly what I'm implementing."  Perhaps Elissa and I are misunderstanding somehow, but it does not look like that is what you have implemented.  The population size within each of your 100 subpops fluctuates randomly from generation to generation, rather than being a fixed number as it would be in a Wright-Fisher model.  That temporal variation in population size seems likely to be changing results compared to theoretical predictions.  In essence, your populations appear to be going through bottlenecks – rather severe bottlenecks, often, for the parameter values I used! – every couple of generations, due to these random fluctuations.  That will doubtless change your results.

I can't comment on the correctness of your "slirm" testbed, or on your three popgen statistic calculators; way too much complexity there for me to do a code review.  It seems quite possible to me that there in an error in there somewhere that is causing incorrect results.  Again, I'd suggest cross-checking your analysis using a different method.  How about outputting MS or VCF, and then using some other existing software package to calculate the values you want?  You could also try simplifying your model, as a means of validating your analysis methods.  For example, if you use them on an actual WF SLiM model, do they then give the correct results?  That would inspire confidence in them.

In the end, debugging is always the same.  Think of every possible weak link in the chain that could be causing a problem.  Test each weak link to confirm that it is doing exactly what you intend it to be doing.  Come up with hypotheses regarding what the problem could be, and then collect observations and data to confirm or reject that hypothesis.  Investigate until you find the problem.  Good luck!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Daniel Priego E. wrote on 5/16/23 2:31 PM:
Reply all
Reply to author
Forward
0 new messages