Hi Darren! Sorry for the slow reply.
OK, so, this is kind of a complex question.
One point would be that the user-defined function here is not doing your
performance any good. Internally, to call a user-defined function,
Eidos has to create (and then tear down) a new Eidos interpreter, set up
a new symbol table with local variables for all the parameters, etc.
It's not super slow, but if you're running a tight loop and the
user-defined function does something small/trivial, it will help
performance a lot to just inline the code into the loop rather than
calling out to a function.
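To make that concrete, here is a small sketch in Eidos. The helper name lookupFitness, its signature, and the toy data are all hypothetical, made up for illustration rather than taken from your script. The point is just that when a function's body is one cheap expression, the per-call setup is most of the cost, and inlining removes it.

```eidos
// Hypothetical helper: its whole body is one cheap expression, so the work
// of creating an interpreter and a symbol table (holding d, q, and i) on
// every call outweighs the lookup itself.
function (numeric$)lookupFitness(object<Dictionary>$ d, integer$ q, integer$ i)
{
    return d.getValue(q)[i];
}

// Toy data (hypothetical values), keyed by an integer quantile
bsi = Dictionary();
bsi.setValue(0, c(0.9, 0.8, 0.7));

// Slower inside a tight loop: pays the call overhead once per lookup
a = lookupFitness(bsi, 0, 2);

// Faster: the same expression inlined, with no function call at all
b = bsi.getValue(0)[2];
```

Both give the same value; only the inlined form avoids the interpreter setup on each iteration.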
Another point would be that the asString() call there is again not doing
you any favors. That has to turn the integer value into a string
value, and allocate a new Eidos value to hold the string, and then do a
string-based lookup in the dictionary, which is relatively slow. Much
faster would be to simply use integer keys in your dictionary in the
first place, rather than strings. Dictionary has supported integer keys
for a while now, so unless you're running a fairly old version of SLiM
that feature ought to be available.
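If your Dictionary currently uses string keys, you can recast it to integer keys once, up front, outside the loop. Here is a minimal sketch in Eidos. It assumes every existing key is a string holding an integer, such as "0" or "1". The names d_str and d_int and the values are hypothetical stand-ins for your data.

```eidos
// Toy stand-in for a string-keyed Dictionary (hypothetical values)
d_str = Dictionary();
d_str.setValue("0", c(0.9, 0.8, 0.7));
d_str.setValue("1", c(0.6, 0.5, 0.4));

// Recast once: copy each value under the integer form of its key
d_int = Dictionary();
for (key in d_str.allKeys)
    d_int.setValue(asInteger(key), d_str.getValue(key));

// Lookups can now use the integer directly, with no asString() call
x = d_int.getValue(1)[2];
```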
A third point is that there is possible vectorization here that you're
not taking advantage of. For each index in ordered_indices_pop, you're
looking up indices_of_quantiles[index] and indices_of_interest[index]
separately, one at a time. That's very slow. You want to do those
subset operations with the whole vector of indices in one go. Always
vectorize performance-sensitive code if you can.
So a rewrite of your code might look like:
phenoFitness_mhw = NULL;
quantiles = indices_of_quantiles[ordered_indices_pop];
indices = indices_of_interest[ordered_indices_pop];
for (quantile in quantiles, index in indices)
    phenoFitness_mhw = c(phenoFitness_mhw,
        bsi_given_dhw50_df.getValue(quantile)[index]);
Given that we want to loop through both quantiles and indices in
synchrony, I think the for loop is going to be faster than using
sapply(). Note that I didn't put the statement inside the for loop
inside curly braces {}; that would make the code slower, since it would
have to interpret the curly braces every time through the for loop. (In
interpreted languages like Eidos, pretty much everything you do makes
your code slower, even putting a statement inside curly braces. If
Eidos had a smarter optimizer, that could get optimized out, but it
doesn't. :->) Also note that I assumed here that the Dictionary has
been recast to use integer keys instead of strings.
Taking a step back, the fact that this code is using a Dictionary might
not be ideal in the first place. Dictionary is not terribly fast. If
this code remains unacceptably slow after the above changes, you might
think about storing your data as a matrix instead. That might or might
not be faster, depending on what exactly you're doing, but it might be
worth a try.
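As a sketch of the matrix approach, suppose (hypothetically) you stored the data as a matrix with one column per quantile, where column q holds the vector that getValue(q) returned, and one row per index. Eidos matrices are stored column-major, like R's, so the pairwise lookups can be done in a single vectorized subset rather than a loop. The names and toy values here are illustrative only, not your real data.

```eidos
// Toy data (hypothetical): rows are indices, columns are quantiles.
// matrix() fills column by column by default.
bsi_matrix = matrix(c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4), nrow=3, ncol=2);

// Toy stand-ins for indices_of_quantiles[ordered_indices_pop] and
// indices_of_interest[ordered_indices_pop]
quantiles = c(1, 0, 1);
indices = c(2, 0, 1);

// Element [row, col] lives at position row + col * nrow in the underlying
// column-major vector, so one subset with those positions does every
// (index, quantile) lookup at once.
phenoFitness_mhw = bsi_matrix[indices + quantiles * nrow(bsi_matrix)];
```

This also avoids growing phenoFitness_mhw with c() one element at a time.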
But perhaps the best idea, if it works for your purposes, is to use the
DataFrame class instead of the Dictionary class. This would be nice
because DataFrame has a method, subset(), that is designed to do exactly
what you are trying to do, if I have understood correctly. With
DataFrame, the code above becomes one line:
phenoFitness_mhw =
    bsi_given_dhw50_df.subset(indices_of_interest[ordered_indices_pop],
        indices_of_quantiles[ordered_indices_pop]);
I'd imagine that is both the fastest and the cleanest way to do what
you're trying to do, as long as your data can be recast as a DataFrame
object. It's a subclass of Dictionary, so probably it will be suitable
for you as long as your data is "rectangular": the same number of rows
for each column.
I hope this helps; post again if not. Happy modeling!
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University
Darren Li wrote on 8/28/25 10:49 PM: