Hi Emmanuel!
What I would suggest is something like this:
1. Give every mutation in the simulation a unique tag value, starting at
0. If you start with a fixed set of mutations in your model and don't
make new ones, or if you don't make very many new ones total, then just
counting up from 0 might suffice. If you are generating lots of new
mutations every tick and want to track them all, then you might need to
come with a way to "compact" the tag values used so as to not have too
many gaps due to lost/fixed mutations, for reasons you'll see. You can
assign these unique tag values in a mutation() callback, and keep the
running "next tag value" counter with defineGlobal(). (Keep in mind
that to give a new value to a defined global, you must use
defineGlobal() again; trying to simply assign the new value into the
global with = instead makes a local variable with the same name, and so
your counter will fail to count upward. :->)
2. Maintain a matrix of epistatic interactions. This would be an NxN
matrix, where N is the number of mutations you've seen in your model.
The matrix coordinates range in [0, N-1], so N-1 is the last tag value
you have used, and the next mutation to occur would get tag N and the
matrix would grow to be (N+1)x(N+1). So initially we might have
something like this, for 10 mutations:
> x = matrix(runif(10*10), nrow=10)
> x
[,0] [,1] [,2] [,3] [,4] [,5]
[,6] [,7] [,8] [,9]
[0,] 0.66119 0.321093 0.767308 0.726366 0.554002 0.664263
0.123418 0.461412 0.63982 0.61913
[1,] 0.700395 0.402744 0.6399 0.31539 0.370517 0.850313
0.51305 0.738172 0.78411 0.595106
[2,] 0.929289 0.486141 0.779732 0.145578 0.472488 0.537801
0.318302 0.957875 0.599058 0.321007
[3,] 0.213945 0.582444 0.411279 0.979232 0.315887 0.654901
0.545222 0.250662 0.482865 0.51284
[4,] 0.430156 0.604121 0.0189802 0.618016 0.697054 0.0573283
0.661003 0.0621877 0.858239 0.282392
[5,] 0.934721 0.815482 0.0925936 0.309131 0.693908 0.960838
0.875929 0.979244 0.094562 0.426305
[6,] 0.948668 0.70215 0.389113 0.315936 0.597131 0.130967
0.289587 0.219216 0.462998 0.510097
[7,] 0.387071 0.64798 0.748255 0.643054 0.167779 0.704228
0.375858 0.26068 0.0851665 0.848589
[8,] 0.503802 0.248329 0.0213197 0.578648 0.971433 0.62719
0.013039 0.396477 0.326935 0.911202
[9,] 0.205002 0.821452 0.797663 0.943604 0.46504 0.598869
0.0968474 0.679894 0.541755 0.910947
This is not quite right, since the effects would (presumably) be
symmetric: the epistatic effect of A on B would be the same as for B on
A, and the effect of A on A would be 1.0 (no effect beyond the intrinsic
effect of A, which is handled automatically by SLiM). So you might
make the diagonal 1.0 and upper triangle 1.0, so that you only count the
effect of A+B once. Any pairs that don't interact would also be 1.0.
These are multiplicative effects, you get the picture. Store this
matrix in a global with defineGlobal() so it sticks around long term.
Whenever a new mutation arises, which you can detect and handle in a
mutation() callback, generate new effects between the new mutation and
everybody else, and upsize the matrix using rbind() and cbind(). This
matrix will get very large if you're constantly making new mutations,
but most of the rows/columns will probably be for tag values that are no
longer used (since the corresponding mutation was lost/fixed), so you
could periodically compact the matrix down at the same time that you
compact the tags down. Still, it might get to be a pretty big matrix!
3. For a given individual, get the tags for the mutations present:
individual.haplosomes.mutations.tag.
4. Then, given that set of tags for a given individual, generate every
possible pair of those tags. Basically you want to turn the single
vector of tag values for the mutations in the individual into a matrix
of all pairs of tag values, with two columns (for the paired values) and
as many rows as you need. If you don't care whether a given mutation
is present once or twice, you might use unique(tags, preserveOrder=F)
before you do this; that will be much easier, but if you want to figure
out whether the individual is homozygous or heterozygous for each
mutation it possesses, and have different epistatic effects depending on
that, tabulate(tags) would probably be the place to start down that
long and difficult road. You could also ensure that you omit positions
that you know will be 1.0 anyway, if it's easy to do so, perhaps by
sorting the tags vector first and then only generating pairs (a, b)
where a < b. I'm not sure what the best way would be to construct
this matrix of pairs, but perhaps Google will give you some ideas; a
solution that works in R might work in Eidos too, since they are so
similar.
5. Given such a matrix of coordinates into the matrix x of epistatic
effects, you can use the new subset mode that I added in SLiM 5.1:
> coords = matrix(rdunif(10, min=0, max=9), ncol=2)
> coords
[,0] [,1]
[0,] 1 6
[1,] 0 3
[2,] 8 2
[3,] 1 1
[4,] 2 5
> x[coords]
0.51305 0.726366 0.0213197 0.402744 0.537801
This returns a vector of all the epistatic effects. Use the product()
function to multiply them all together, and incorporate the result into
fitnessScaling for the individual; you're already doing something like
that, so you get the idea.
This is by no means the only approach, but it should start to give you
ideas about possible approaches. Another possibility, if the epistatic
interaction matrix is very sparse, would be to instead keep two vectors:
one vector is a set of pair-identifiers generated by (mut1.tag *
LARGE_NUMBER + mut2.tag), such that every possible pair of mutations has
a unique pair identifier, and the other vector is a set of epistatic
interaction values for the corresponding pair-identifiers. You'd only
add an entry to these two vectors if the epistatic effect of the pair
was non-neutral; so this compacts out all the sparseness in the
epistatic effect matrix, keeping only the non-zero values. You could
then use match() to look up the index of all of the pair-identifiers for
the mutations in an individual, and then for those that are not -1 (see
the doc for match(); -1 would indicate "no match", meaning no epistatic
interaction for a given pair), you'd look up the corresponding effects
in the second vector, and then use product(), yada yada.
There was a student in the Vienna workshop this year who pursued these
ideas quite a long way and came up with some very clever ways to
optimize all of this to be super fast. I have bcc:ed her on this email,
and perhaps she will be willing to share some methods; but I don't know
if her work is published yet, so I'll leave that up to her. The above
is pretty much the initial guidance I gave her, and she ran with it
quite a distance!
Anyhow, if you come up with a complete solution that you'd like to
share, that would be great; you could even submit it as a pull request
to SLiM-Extras. The main thing is to try to make sure that all of the
steps are vectorized as much as possible; you don't want to be doing
nested for loops iterating over every tag_a and tag_b in each
individual. Allowing that kind of vectorization is the point of the two
approaches I've described above. I forget how that Vienna student
managed to vectorize step 4, but I think she found a way! I think if
you think through exactly what that coordinate matrix ought to look
like, for a given set of mutation tags, it might become clear. Good
luck and happy modeling!
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University
'Emmanuel D'Agostino' via slim-discuss wrote on 10/24/25 4:04 PM: