output the number of mutations in each of two patches to a LogFile

Richard Sibly

unread,

Mar 28, 2022, 5:42:14 AM3/28/22

to slim-discuss

Hi Ben

I am trying to output the number of mutations in each of two patches to a LogFile. I can get the total number of mutations – see below – but not the numbers in patches separately. Would be very grateful for any help you can give

Many thanks
Richard Sibly

initialize() {
initializeMutationRate(1e-2);
initializeMutationType("m1", 1.0, "f", 0.0);
initializeGenomicElementType("g1", m1, 1);
initializeGenomicElement(g1, 0, 0);
initializeRecombinationRate(0);
initializeSex("A");
}
1 {
sim.addSubpop("p1", 100);
sim.addSubpop("p2", 150);
p1.setMigrationRates(p2, 0.1); // migration p2 -> p1
p2.setMigrationRates(p1, 0.1); // migration p1 -> p2
log = sim.createLogFile("sim_log.txt", sep="\t", logInterval=10);
log.addGeneration();
log.addCustomColumn("FST", "calcFST(p1.genomes, p2.genomes);");
log.addCustomColumn("count",
"sim.countOfMutationsOfType(m1);"
);
}
fitness(m1, p1) { return 0.99;}
fitness(m1, p2) { return 1.1;}
500 { sim.simulationFinished();
print(sim.countOfMutationsOfType(m1));
}

Ben Haller

unread,

Mar 30, 2022, 10:55:42 AM3/30/22

to slim-discuss

Hi Richard! One possible approach looks like this (slotting this code into your model as you posted it, and apologies for the formatting):

function (integer$)countOfMutationsInSubpop(object<Subpopulation> subpop) {

muts = subpop.genomesNonNull.mutations;

muts = unique(muts, preserveOrder=F);

return size(muts);

}

1 early()

{

...

log.addCustomColumn("count_p1", "countOfMutationsInSubpop(p1);");

log.addCustomColumn("count_p2", "countOfMutationsInSubpop(p2);");

}

This adds a user-defined function that counts the number of segregating mutations in a given subpopulation by simply getting a vector containing every mutation in every genome in the subpopulation, uniquing that vector so each mutation occurs only once, and then returning the size of the uniqued vector. It uses the "preserveOrder=F" parameter to unique() to make that function perform much more efficiently (see the Eidos manual for details). It does not limit the count to mutations of type m1; you could add that in if you wish, either by hard-coding "mutationsOfType(m1)" into the code of the countOfMutationsInSubpop() function, or by having that function take an additional MutationType parameter, as you wish.

However, there is an alternative implementation that is probably far more efficient (I haven't tested it, but it should be); it defines the same user-defined function like this:

function (integer$)countOfMutationsInSubpop(object<Subpopulation> subpop) {

counts = sim.mutationCounts(subpop, NULL);

return sum(counts > 0);

}

This uses SLiM's own internal machinery for computing mutation counts/frequencies. It requests counts within the given subpopulation, and returns the number of mutations for which the count is greater than zero. If you wanted to adapt this to only count mutations of type m1, you would pass sim.mutationsOfType(m1) instead of NULL for the second parameter to mutationCounts().

So, there are two different approaches, one doing the count yourself in Eidos (probably inefficient), one having SLiM do it for you (probably efficient). The efficiency difference between the two will probably grow as the number of segregating mutations grows; for your model as given, with less than 100 mutations typically segregating, either approach should be perfectly fine, and the first approach might prove more flexible in some respects if you end up having additional requirements for the count that mutationCounts() doesn't support.

I hope this helps; happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Richard Sibly

unread,

Apr 1, 2022, 5:26:21 AM4/1/22

to slim-discuss

Hi Ben

Many thanks for your help, much appreciated. I have slightly modified your code, below, for my purpose, which is to model a simple one-locus system with two alleles with specified fitness effects. I want to compare SLiMgui’s output with output from an existing simple deterministic model. At present the outputs are broadly similar but there are important differences in where they eventually stabilise. Also changing the value of the dominance coefficient h does not affect the SLiMgui output – which it should. I think I may be mis-specifying the fitness or dominance coefficient in SLiMgui.

Here is the SLiMgui code as slightly modified by me:

function (integer$)countOfMutationsInSubpop(object<Subpopulation> subpop) {
muts = subpop.genomesNonNull.mutations;

return size(muts);
}
initialize() {
initializeMutationRate(1e-2);
initializeMutationType("m1", 1, "f", 0.0);
m1.mutationStackPolicy = "f";

initializeGenomicElementType("g1", m1, 1);
initializeGenomicElement(g1, 0, 0);
initializeRecombinationRate(0);
initializeSex("A");
}

1 early()
{
sim.addSubpop("p1", 1000);
sim.addSubpop("p2", 1000);
p1.setMigrationRates(p2, 0.05); // migration p2 -> p1
p2.setMigrationRates(p1, 0.05); // migration p1 -> p2
count = sim.countOfMutationsOfType(m1);

log = sim.createLogFile("sim_log.txt", sep="\t", logInterval=10);
log.addGeneration();
log.addCustomColumn("FST", "calcFST(p1.genomes, p2.genomes);");
log.addCustomColumn("count",
"sim.countOfMutationsOfType(m1);"
);

log.addCustomColumn("count_p1", "countOfMutationsInSubpop(p1);");
log.addCustomColumn("count_p2", "countOfMutationsInSubpop(p2);");
}

fitness(m1, p1) { return 0.9;}
fitness(m1, p2) { return 1.1;}
1001 { sim.simulationFinished();
}

As mentioned, I want to compare SLiMgui’s output with output from my existing simple deterministic infinite-population model, which has one locus with two alleles P and Q in an environment consisting of two niches with some migration between niches (see below) prior to mating. The locus determines ecological adaptation to one niche or the other. The fitnesses of the three genotypes in each niche are as follows:
Niche 1 Niche 2
Genotype PP PQ QQ PP PQ QQ
Fitness 1 1+hs1 1+s1 1 1+hs2 1+s2
Relative frequency p1^2 2p1q1 q1^2 p2^2 2p2q2 q2^2

p1 and q1 represent the relative frequencies of the P and Q alleles at the start of a generation in niche 1, their frequencies in niche 2 are p2 and q2. Parameter h indicates the level of dominance of the Q allele. Migration rates are specified as the proportion of individuals in one niche that migrate to the other each generation after viability selection and population regulation have occurred. When analysing the model I suppose that Q is disadvantageous in niche 1 (i.e., s1 is negative) but advantageous in niche 2 (i.e., s2 is positive), while PP homozygotes have fitness 1 in both niches.

Please check I’ve not introduced errors into the SLiMgui code. As mentioned the outputs of SLiMgui and my model are broadly similar but 1) there are important differences in where they eventually stabilise, and 2) changing the value of the dominance coefficient h does not affect the SLiMgui output. I think I may be mis-specifying the fitness or dominance coefficient in SLiMgui.

Any help you can give would be much appreciated.

Many thanks
Richard

Richard Sibly

unread,

Apr 1, 2022, 7:09:02 AM4/1/22

to slim-discuss

Hi again Ben

I have been rerunning the SLiMgui code with more sensible values of mutation rate (now 1e-6) and population sizes (now 10000) and am now getting much better agreement between the two models.

However I would still be very grateful if you could check my version of the SLiMgui code is the proper version to compare with my deterministic model as detailed in my last post.

Thanks again

Richard

Ben Haller

unread,

Apr 1, 2022, 5:28:17 PM4/1/22

to slim-discuss

Hi Richard. I can't really get into the business of checking everybody's code for correctness – for one thing, SLiM has hundreds of users, so that would be a full-time job in and of itself, and for another thing, "correctness" is hard to evaluate; what is "correct" depends very much upon what you're trying to do, and only you really know that in complete detail. When you write a model, you need to do the work to validate it; you can't rely on my opinion or anybody else's as to whether your model is correct. Print out intermediate results and check whether they look like what you expect; observe the model in SLiMgui and look at values in the Eidos console; compare to analytical results (as you are doing, I gather), but also perhaps to individual-based simulations run in some other way; and so on and so forth. Model validation is hard work. It's good that you are seeing agreement between your two approaches now with some parameter values; I would say the next step would be to try to better understand why they diverge for other parameter values, and determine whether that is due to a bug in your code, or to the analytical model being inaccurate due to assumptions that get violated in that region of parameter space. I personally do not know the answer to that, although a mutation rate of 1e-2 and recombination rate of 0 are certainly a pretty extreme region of parameter space where linkage effects are probably going to be very strong and analytical models will probably be inaccurate unless they somehow factor that in. :->

Quickly skimming your code, I do notice a couple of things that raise my eyebrows. One is that you modified the code for countOfMutationsInSubpop() by removing the call to unique(). That call was there for a reason, as I think I explained in my post, and I don't see a mention from you as to why you removed it, so that seems like a potential bug. Another is that you're using stacking policy "f"; that is almost never what one wants, and there was a recent discussion on slim-discuss about why, and how one might approach things in a more realistic manner. You might want to read through that discussion and alter your approach accordingly; or perhaps your reasons for using it are valid, I don't know. Finally, I can see why the dominance coefficient doesn't matter; you are completely overriding SLiM's fitness calculations with your own fitness() callbacks, and those fitness() callbacks don't use the dominance coefficient or the zygosity of the mutation in the focal individual. The manual discusses this in several spots, notably in the reference documentation for fitness() callbacks, section 25.2. You'll therefore want to rework the design of your fitness() callbacks to factor in zygosity and dominance. Oh, and it looks like you have a stray line of code, "count = sim.countOfMutationsOfType(m1);", the result of which is not used.

There might be other bugs/issues, and the things I noted above might not be bugs/issues if they are intentional for some reason. But those are what I see, in a quick look. Good luck and happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Richard Sibly

unread,

Apr 12, 2022, 9:49:06 AM4/12/22

to slim-discuss

Hi Ben

Thanks for bearing with me.

My aim is to output allele frequencies (i.e., count of number of m1 IBS alleles divided by twice population size) in each subpopulation in a biallelic model.

As I understand it muts = unique(muts, preserveOrder=F) has the effect that countOfMutationsInSubpop() returns the number of distinct mutational lineages – which is not what I want. I found that by removing it I seem to be getting the needed number of IBS alleles. Have I made a mistake here?

I introduced m1.mutationStackPolicy = "f" in an attempt to make the model biallelic. Slim-discuss is very helpful and I now see that “l”, not “f”, is what I need.

My SLiMgui model agrees with my infinite-population deterministic model when h=0.5 but not when h=1. Attempting to remedy this I followed your suggestion to modify the fitness() callbacks using information from section 25.2. My intention in the code below is that the fitness of IBS m1 is the same whether m1 is homozygous or heterozygous, while the fitness of homozygous wild type is 1.0. Unfortunately the outputs of my slim and my infinite-population deterministic model do not converge on the same values. My code is as follows:

function (integer$)countOfMutationsInSubpop(object<Subpopulation> subpop) {
muts = subpop.genomesNonNull.mutations;

return size(muts);//used to log allele frequencies in each subpopulation
}
initialize() {
initializeMutationRate(1e-6);

initializeMutationType("m1", 1.0, "f", 0.0);

m1.mutationStackPolicy = "l";

initializeGenomicElementType("g1", m1, 1);
initializeGenomicElement(g1, 0, 0);
initializeRecombinationRate(0);

initializeSex("A");//individuals either male or female
}
1 early()
{
sim.addSubpop("p1", 10000);
sim.addSubpop("p2", 10000);

p1.setMigrationRates(p2, 0.05); // migration p2 -> p1
p2.setMigrationRates(p1, 0.05); // migration p1 -> p2

log = sim.createLogFile("sim_log.txt", sep="\t", logInterval=10);
log.addGeneration();
log.addCustomColumn("FST", "calcFST(p1.genomes, p2.genomes);");

log.addCustomColumn("count_p1", "countOfMutationsInSubpop(p1);");
log.addCustomColumn("count_p2", "countOfMutationsInSubpop(p2);");
}
fitness(m1, p1){

if (homozygous)
return 1.0 - 0.05;
else
return 1.0 - mut.mutationType.dominanceCoeff *0.05;
} //fitness in subpopulation 1 is 0.95 in individiduals homozygous or heterozygous in m1 if h=1
fitness(m1, p2){

if (homozygous)
return 1.0 + 0.05;
else
return 1.0 + mut.mutationType.dominanceCoeff *0.05;
} //fitness in subpopulation 2 is 1.05 in individiduals homozygous or heterozygous in m1 if h=1
10001 { sim.simulationFinished();
}

One last point that could perhaps explain the differences between the model predictions: as I understand pp 16-17 of the manual, in SLiMgui N parents are drawn at random after selection and mate within their subpopulation before migration – is this correct?

Will be grateful for any help you can give

Best,

Richard Sibly

Ben Haller

unread,

Apr 12, 2022, 3:07:43 PM4/12/22

to slim-discuss

Hi Richard,

Sounds like removing unique() is getting you what you wanted; I guess I misunderstood before what you were trying to achieve. :->

If you want a biallelic model, stacking policy "l" is a step in the right direction but you may also want to filter out sites with more than one segregating mutation. Stacking policy "l" is not sufficient, in itself, to make a model biallelic, because it doesn't prevent two different new mutations from occurring at the same position in different genomes. There was a recent slim-discuss thread about doing such filtering, in which it was eventually agreed that that procedure matches what empirical studies often do when analyzing genomic data, so it might even make your simulation more realistic trying trying to force it to be truly biallelic at every site.

Regarding your fitness() callbacks, you write "My intention in the code below is that the fitness of IBS m1 is the same whether m1 is homozygous or heterozygous, while the fitness of homozygous wild type is 1.0", but your code has this:

fitness(m1, p1){
if (homozygous)
return 1.0 - 0.05;
else
return 1.0 - mut.mutationType.dominanceCoeff *0.05;
}

That code appears to return (1-s) when homozygous and (1-hs) when heterozygous for a given m1 mutation in p1, so it does not seem to match what you stated as your goal. Anyhow, I'm not sure why your SLiM model has a mismatch with your analytical model; all I can really suggest is check your code, print out values to confirm it is working right, check your assumptions, test things in the Eidos console, etc. Of course sometimes individual-based models do not, in fact, match analytical models, too.

Given that your model appears to be a WF model, I think your statement of how things work is roughly correct, but note that in WF models there is no selection. Parents are chosen randomly with probability proportional to fitness, in the WF model. Only in nonWF models does fitness translate into selection in the sense of a chance of mortality. If you want a more detailed explanation of the mechanics of the WF and nonWF model types, see chapters 22 and 23.

Perhaps further discussion ought to go off-list, as I'm not sure this discussion is of general interest any more. Good luck, and happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Reply all

Reply to author

Forward