Hi Luis!
I wasn't able to run your model, since the .csv files required weren't
provided, so I just looked at its code a bit.
I don't really understand the modifyChild() code – it's rather complex!
But overall, I'd have a couple of comments:
- If you want to make a model run faster, step one is to produce a
profile of the model, which you can do either in SLiMgui (easier but
less accurate) or at the command line (more complicated but more
accurate); see sections 21.5 and 21.7, and 21.6 regarding profiling
memory usage as well. Start with a profile so you don't waste time
optimizing code that isn't even a performance bottleneck.
- If the modifyChild() callback does turn out to be a bottleneck, you'll
be able to see which lines are the hotspots. Then look for shared
calculations that are done many times (once per microsat, per call to
the callback, for example) and could be done just once. If there are
things you could precalculate, that would be helpful.
- Overall, this design with the modifyChild() callback that loops over
each microsat in each genome, one by one, doesn't benefit from any
vectorization at all, so it probably spends a lot of time in script (but
again, profiling should be used first, to confirm that guess, before
doing any optimization work). You could switch to a more vectorized
design and perhaps get a substantial speedup. For example, you could
handle all of the microsat mutations in a late() event, in bulk.
Perhaps you would find all microsats with a repeat count of N, and
process all of them in the same way (since I guess their transition
probabilities would all be the same). This would allow calls like
"sample(probabilities, 1, weights=roots)" to become vectorized, making
many draws from the same weighted distribution instead of doing those
draws one at a time, which would provide a potentially large speed
advantage. Obviously this is a major redesign and the resulting code
will be more complex than the simple loop-based design you presently
have; whether it's worth the effort is up to you. :->
- Similarly, you do the work of uniquing mutations and making new
mutations one at a time, with a ton of repeated work. For every
microsat in every genome in every tick, you are calling
"sim.mutationsOfType(m2)" to get the microsat mutations again, for just
one example. One useful technique is that nowadays SLiM allows you to
keep a permanent reference to a mutation object in a global constant
(see section 9.9, for example). You might consider a model design in
which you create a permanent microsat mutation with each possible count,
for each position, at the start of the model run. (You'd do this in a
"dummy" subpopulation that you would then kill, before the real
simulation begins.) You could then have, for example, a global constant
that is a matrix (row, column) of all of the possible microsat
mutations, indexed by position (row) and repeat count (column), or some
such. Many possibilities for this sort of thing, which would probably
have the potential to make the code more efficient – again, at the price
of additional work and code complexity, as is usually the case with
optimization work.
There's no easy solution here. Profile, find hotspots, refactor your
code to allow greater vectorization, eliminate duplicated computations
of the same things, lather, rinse, repeat. Optimization work can take
up as much time as you're willing to devote to it, pretty much. :->
As for rescaling, section 5.5 of the SLiM manual has a pretty extensive
discussion. It is far from a panacea, and can introduce serious
artifacts into the behavior of a model, so it should be used with
extreme caution. If profiling indicates that your model has good
potential for optimization (i.e., is spending most of its time in your
script, not in offspring generation and fitness calculation), that is
probably the better path. See also Dabi & Schrider (2024),
presently in bioRxiv I think, for some very useful further discussion of
rescaling artifacts. But if the profile shows the modifyChild() work
to be minor (which seems possible, given that it is mostly only done for
microsats that actually mutate, and the mutation rate is only 0.0001),
then probably most of the runtime is inside the SLiM core doing
reproduction and fitness calculation, in which case rescaling might be
your only option.
Good luck, and happy modeling!
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University
Luis Fernández wrote on 4/25/24 7:01 PM: