Optimizing polygenic epistasis in SLiM

4 views
Skip to first unread message

Emmanuel D'Agostino

unread,
Oct 24, 2025, 4:05:00 PM (5 days ago) Oct 24
to slim-d...@googlegroups.com
Hi all,

I'm simulating polygenic epistasis between a number of pairs of loci using a WF model. I have a table I'm reading into SLiM with three values: the individual fitness effect of the first mutation in a pair alone, the individual fitness effect of the second mutation alone, and the fitness effect of both mutations together in one individual.

I'm implementing these fitness calculations using a modification of the structure Lou et al. 2020 wrote. Those authors' script involves running a for loop that calculates, for each epistatically interacting pair in each individual, the fitness contribution of that pair. These fitness contributions are added to a running tally stored in the .tag property of that individual, and at the end, the .tag property is appended to the individual's fitness.

This is supremely helpful, and it was easy to configure this to use the list of fitness values I described. However, I find that this gets quite slow quite quickly; manually calculating fitness with a for loop that iterates over the genome in each individual in each generation is far slower than SLiM's default non-epistatic fitness calculations. The SLiM manual's discussion of epistasis discusses how to introduce a fitness effect when a mutation of type m2 and one of type m3 are present, but to implement this, I would have to make each mutation in my simulation a different type (and I suspect avoiding this is what spurred Lou et al.'s alternate method in the first place). I also found this discussion of the subject, which is an improvement because it only runs one for loop per generation (iterating over all individuals instead of all pairs in all individuals), but I believe it assumes an equal fitness contribution of each epistatic pair (or at least one that can be directly inferred from the number of pairs, instead of from the pairs themselves).

My question is: does anyone have any suggestions on faster or more computationally efficient ways to encode this kind of polygenic epistasis? Any advice would be very appreciated, thank you! I'm really appreciative of this great program and its supportive community.

All the best,
Emmanuel

P.S. I will note that, instead of counting the number of each mutation in each pair within my for loop, as Lou et al. do, I'm accounting for a known violation of the infinite-sites model by instead counting the number of mutations at each site, since they may have separate mutation IDs. I don't think this is the source of the slowness, but it is a difference I figured may be worth mentioning. Please see below:

Lou et al. code:
EpistasisMutation1 = individual.genomes.mutationsOfType(m2)[which(individual.genomes.mutationsOfType(m2).tag == 2*EpistasisPair-1)];
EpistasisMutation2 = individual.genomes.mutationsOfType(m2)[which(individual.genomes.mutationsOfType(m2).tag == 2*EpistasisPair)];

My code:

EpistasisMutation1 = individual.genomes.mutationsOfType(m1)[which(individual.genomes.mutationsOfType(m1).position == QTL_positions_list1[EpistasisPair-1])];

EpistasisMutation2 = individual.genomes.mutationsOfType(m1)[which(individual.genomes.mutationsOfType(m1).position == QTL_positions_list2[EpistasisPair-1])];



--
Emmanuel R. R. D'Agostino
Ph.D. candidate, Julien Ayroles Lab
Princeton University Dept. of Ecology and Evolutionary Biology

Ben Haller

unread,
Oct 24, 2025, 5:48:49 PM (5 days ago) Oct 24
to Emmanuel D'Agostino, slim-d...@googlegroups.com
Hi Emmanuel!

What I would suggest is something like this:

1. Give every mutation in the simulation a unique tag value, starting at 0.  If you start with a fixed set of mutations in your model and don't make new ones, or if you don't make very many new ones total, then just counting up from 0 might suffice.  If you are generating lots of new mutations every tick and want to track them all, then you might need to come with a way to "compact" the tag values used so as to not have too many gaps due to lost/fixed mutations, for reasons you'll see.  You can assign these unique tag values in a mutation() callback, and keep the running "next tag value" counter with defineGlobal().  (Keep in mind that to give a new value to a defined global, you must use defineGlobal() again; trying to simply assign the new value into the global with = instead makes a local variable with the same name, and so your counter will fail to count upward.  :->)

2. Maintain a matrix of epistatic interactions.  This would be an NxN matrix, where N is the number of mutations you've seen in your model.  The matrix coordinates range in [0, N-1], so N-1 is the last tag value you have used, and the next mutation to occur would get tag N and the matrix would grow to be (N+1)x(N+1).  So initially we might have something like this, for 10 mutations:

> x = matrix(runif(10*10), nrow=10)
> x
          [,0]      [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
[0,]   0.66119  0.321093  0.767308  0.726366  0.554002  0.664263  0.123418  0.461412   0.63982   0.61913
[1,]  0.700395  0.402744    0.6399   0.31539  0.370517  0.850313   0.51305  0.738172   0.78411  0.595106
[2,]  0.929289  0.486141  0.779732  0.145578  0.472488  0.537801  0.318302  0.957875  0.599058  0.321007
[3,]  0.213945  0.582444  0.411279  0.979232  0.315887  0.654901  0.545222  0.250662  0.482865   0.51284
[4,]  0.430156  0.604121 0.0189802  0.618016  0.697054 0.0573283  0.661003 0.0621877  0.858239  0.282392
[5,]  0.934721  0.815482 0.0925936  0.309131  0.693908  0.960838  0.875929  0.979244  0.094562  0.426305
[6,]  0.948668   0.70215  0.389113  0.315936  0.597131  0.130967  0.289587  0.219216  0.462998  0.510097
[7,]  0.387071   0.64798  0.748255  0.643054  0.167779  0.704228  0.375858   0.26068 0.0851665  0.848589
[8,]  0.503802  0.248329 0.0213197  0.578648  0.971433   0.62719  0.013039  0.396477  0.326935  0.911202
[9,]  0.205002  0.821452  0.797663  0.943604   0.46504  0.598869 0.0968474  0.679894  0.541755  0.910947

This is not quite right, since the effects would (presumably) be symmetric: the epistatic effect of A on B would be the same as for B on A, and the effect of A on A would be 1.0 (no effect beyond the intrinsic effect of A, which is handled automatically by SLiM).  So you might make the diagonal 1.0 and upper triangle 1.0, so that you only count the effect of A+B once.  Any pairs that don't interact would also be 1.0.  These are multiplicative effects, you get the picture.  Store this matrix in a global with defineGlobal() so it sticks around long term.  Whenever a new mutation arises, which you can detect and handle in a mutation() callback, generate new effects between the new mutation and everybody else, and upsize the matrix using rbind() and cbind().  This matrix will get very large if you're constantly making new mutations, but most of the rows/columns will probably be for tag values that are no longer used (since the corresponding mutation was lost/fixed), so you could periodically compact the matrix down at the same time that you compact the tags down.  Still, it might get to be a pretty big matrix!

3. For a given individual, get the tags for the mutations present: individual.haplosomes.mutations.tag.

4. Then, given that set of tags for a given individual, generate every possible pair of those tags.  Basically you want to turn the single vector of tag values for the mutations in the individual into a matrix of all pairs of tag values, with two columns (for the paired values) and as many rows as you need.  If you don't care whether a given mutation is present once or twice, you might use unique(tags, preserveOrder=F) before you do this; that will be much easier, but if you want to figure out whether the individual is homozygous or heterozygous for each mutation it possesses, and have different epistatic effects depending on that, tabulate(tags) would probably be the place to start down that long and difficult road.  You could also ensure that you omit positions that you know will be 1.0 anyway, if it's easy to do so, perhaps by sorting the tags vector first and then only generating pairs (a, b) where a < b.  I'm not sure what the best way would be to construct this matrix of pairs, but perhaps Google will give you some ideas; a solution that works in R might work in Eidos too, since they are so similar.

5. Given such a matrix of coordinates into the matrix x of epistatic effects, you can use the new subset mode that I added in SLiM 5.1:

> coords = matrix(rdunif(10, min=0, max=9), ncol=2)
> coords
     [,0] [,1]
[0,]    1    6
[1,]    0    3
[2,]    8    2
[3,]    1    1
[4,]    2    5
> x[coords]
0.51305 0.726366 0.0213197 0.402744 0.537801

This returns a vector of all the epistatic effects.  Use the product() function to multiply them all together, and incorporate the result into fitnessScaling for the individual; you're already doing something like that, so you get the idea.

This is by no means the only approach, but it should start to give you ideas about possible approaches.  Another possibility, if the epistatic interaction matrix is very sparse, would be to instead keep two vectors: one vector is a set of pair-identifiers generated by (mut1.tag * LARGE_NUMBER + mut2.tag), such that every possible pair of mutations has a unique pair identifier, and the other vector is a set of epistatic interaction values for the corresponding pair-identifiers.  You'd only add an entry to these two vectors if the epistatic effect of the pair was non-neutral; so this compacts out all the sparseness in the epistatic effect matrix, keeping only the non-zero values.  You could then use match() to look up the index of all of the pair-identifiers for the mutations in an individual, and then for those that are not -1 (see the doc for match(); -1 would indicate "no match", meaning no epistatic interaction for a given pair), you'd look up the corresponding effects in the second vector, and then use product(), yada yada.

There was a student in the Vienna workshop this year who pursued these ideas quite a long way and came up with some very clever ways to optimize all of this to be super fast.  I have bcc:ed her on this email, and perhaps she will be willing to share some methods; but I don't know if her work is published yet, so I'll leave that up to her.  The above is pretty much the initial guidance I gave her, and she ran with it quite a distance!

Anyhow, if you come up with a complete solution that you'd like to share, that would be great; you could even submit it as a pull request to SLiM-Extras.  The main thing is to try to make sure that all of the steps are vectorized as much as possible; you don't want to be doing nested for loops iterating over every tag_a and tag_b in each individual.  Allowing that kind of vectorization is the point of the two approaches I've described above.  I forget how that Vienna student managed to vectorize step 4, but I think she found a way!  I think if you think through exactly what that coordinate matrix ought to look like, for a given set of mutation tags, it might become clear.  Good luck and happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


'Emmanuel D'Agostino' via slim-discuss wrote on 10/24/25 4:04 PM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
Before posting, please read http://benhaller.com/slim/bugs_and_questions.html
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/CACTeHKyaN6rQ4z%3DZ77TEob0XinPAHHMZwJY1_qc_A249KjOG3Q%40mail.gmail.com.

Reply all
Reply to author
Forward
0 new messages