Hi Noam,
First of all, sorry for the difficulties posting. (A general message to
slim-discuss users: Google Groups seems to randomly decide to veto
messages as spam. If that happens to you, please send me an email
off-list immediately, telling me that your message is in quarantine. I
can approve it – but I don't know it's there, because Google Groups
takes up to a week to bother sending me an email saying "I've randomly
rejected a message to your group, you might want to have a look". This
is out of my control; it's a bug they seem disinclined to fix, because
they apparently can't afford to hire an intern to fix their bugs.
<deep breaths>)
Looking at your example code, I guess I'm puzzled by the whole structure
of things. I don't see the purpose of the dictionary, or the match()
call. The vector `subpopulations` is simply 0, 1, ..., N-1, so its value
at any index i *is* the index i; `subpopulations[10]` is 10, for
example. So when you do `sample(subpopulations, n_offspring,
replace=T)`, yes, you get back a vector of subpopulation IDs; but
those IDs are also valid zero-based indices into `densities`. So
instead of your options 1 or 2 (dictionary or match()), you can simply do:
// Option 3: simple lookup
destination_densities_3 = densities[destinations];
That produces identical results to your options 1 and 2, and is of
course faster.
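To make the equivalence concrete, here is a minimal Eidos sketch. The sizes and the random `densities` values are made up for illustration; only the names `subpopulations`, `densities`, `destinations`, and `n_offspring` follow your example, and it assumes, as there, that the IDs are exactly 0, 1, ..., N-1.

```eidos
// Hypothetical sizes and values, for illustration only
n_subpops = 100;
n_offspring = 1000;

subpopulations = seqLen(n_subpops);    // 0, 1, ..., N-1
densities = runif(n_subpops);          // one made-up density per subpopulation
destinations = sample(subpopulations, n_offspring, replace=T);

// Option 2: match() finds the position of each sampled ID in subpopulations
via_match = densities[match(destinations, subpopulations)];

// Option 3: the sampled IDs already are those positions
via_index = densities[destinations];

// Check that the two vectors are the same
catn(identical(via_match, via_index));
```

If your real subpopulation IDs are not a contiguous run starting at 0, this shortcut no longer applies as written, and match() (or an offset) would still be needed.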
So if your real code is structured in the same way as your example code,
then I think you can get rid of all that complication. You might also
be interested in a few other things that might speed things up,
depending on what you're doing in the surrounding code:
- the Community method subpopulationsWithIDs()
- the fact that subpopulations (and most objects in SLiM) are subclasses
of Dictionary, which means that you could attach density values
directly to subpopulations, like subpop.setValue("density", x), and then
look them up like subpop.getValue("density") – but keeping the
densities as a separate vector of values will probably be even faster
for you, since your code uses subpopulation *indices*, not subpopulation
*objects*
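As a rough sketch of those two suggestions together, assuming SLiM 4 or later: the script below is a throwaway model whose genetics, subpopulation IDs (1 to 3), sizes, and density values are all invented for illustration. It attaches a value to each subpopulation object and then reads one back after looking the subpopulation up by ID.

```eidos
initialize() {
	// Minimal placeholder genetics; none of this matters for the example
	initializeMutationRate(0.0);
	initializeMutationType("m1", 0.5, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 99);
	initializeRecombinationRate(0.0);
}
1 early() {
	// Three hypothetical subpopulations with IDs 1, 2, 3
	for (id in 1:3)
		sim.addSubpop(id, 10);

	// Attach a made-up density value directly to each subpopulation object
	for (subpop in sim.subpopulations)
		subpop.setValue("density", runif(1));

	// Look a subpopulation up by its ID, then read its density back
	target = community.subpopulationsWithIDs(2);
	catn(target.getValue("density"));
}
```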
In general, dictionaries are flexible but slow: you can put anything
into them, and keep any number of values, but they are much slower
than simpler (but less flexible) approaches. The
problem in your "option 1" is not sapply() – that is quite fast. It is
that the sapply() call is doing a dictionary lookup for each value, and
that's going to be very slow. Your option 2, on the other hand, uses
match(), which is basically a vectorized loop like sapply() (so on that
ground they are roughly equal), but whereas option 1 does a dictionary
lookup per element, option 2 does what's called a "hash table lookup"
per element, which uses an extremely fast C++ data structure that needs
to only be built once and then can be used again and again to provide
lookups in a very efficient way – much faster than a dictionary lookup.
So yes, match() is going to be fast, because it avoids the dictionary.
But the fastest approach, if it works for you, is my option 3, simply
using the indices that you have sampled as indices, without using
match() at all, since (at least in your example code) all that it is
telling you is that the value 10 is at index 10, the value 17 is at
index 17, etc., which you knew already. :->
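If you want to measure the difference yourself, here is a rough Eidos timing sketch of the three options. The sizes are arbitrary, the densities are random placeholders, and the dictionary uses string keys built with asString(), which is an assumption on my part and may not match how your dictionary is keyed. Timings will vary by machine, so no numbers are shown.

```eidos
// Hypothetical sizes and values, for illustration only
n_subpops = 1000;
n_offspring = 100000;
subpopulations = seqLen(n_subpops);
densities = runif(n_subpops);
destinations = sample(subpopulations, n_offspring, replace=T);

// Option 1: one Dictionary lookup per element (string keys assumed)
d = Dictionary();
for (i in subpopulations)
	d.setValue(asString(i), densities[i]);
t = clock();
r1 = sapply(destinations, "d.getValue(asString(applyValue));");
catn("dictionary: " + (clock() - t));

// Option 2: vectorized match() against the ID vector
t = clock();
r2 = densities[match(destinations, subpopulations)];
catn("match():    " + (clock() - t));

// Option 3: use the sampled IDs directly as indices
t = clock();
r3 = densities[destinations];
catn("indexing:   " + (clock() - t));

// Check that all three approaches produce the same values
catn(identical(r1, r2) & identical(r2, r3));
```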
I hope this helps; happy modeling!
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University
Noam Vogt-Vincent wrote on 3/14/25 9:06 PM: