Hi Hanbin!

I'm not aware of anybody having tested it, but my guess is that using
r=0.5 with a single simulation will be a clear win over doing separate
1-site simulations and concatenating them. SLiM is quite different from
msprime in that objects representing individuals and genomes and so
forth get created and torn down with each generation of forward
simulation. With N 1-locus simulations you'd be paying that price N
times, and I doubt any other speedup you might get would be worth it.
You'd also be paying the price of simplifying, writing out the tree
sequence, etc., N times. Using r=0.5 certainly does slow SLiM down
considerably, compared to a much lower recombination rate; but I don't
see any reason to expect that N separate 1-site simulations would be
faster. If you do a speed comparison, though, I'd be interested to hear
how it goes!
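A speed comparison along these lines could be set up with a small timing harness. This is only a rough sketch: the script names `one_run.slim` and `one_site.slim` and the constant `L` are hypothetical placeholders, and it assumes a `slim` executable on the PATH, using the `-s` (seed) and `-d` (define constant) command-line options.

```python
import subprocess
import sys
import time


def slim_command(script, seed, constants=None):
    """Build a SLiM command line: -s sets the seed, -d defines a constant."""
    cmd = ["slim", "-s", str(seed)]
    for name, value in (constants or {}).items():
        cmd += ["-d", f"{name}={value}"]
    cmd.append(script)
    return cmd


def time_commands(commands):
    """Run the commands one after another; return total wall-clock seconds."""
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    n_sites = int(sys.argv[1]) if len(sys.argv) > 1 else 100
    # Hypothetical scripts: one run with n_sites sites at r=0.5,
    # versus n_sites independent single-site runs.
    joint = [slim_command("one_run.slim", 1, {"L": n_sites})]
    separate = [slim_command("one_site.slim", seed) for seed in range(1, n_sites + 1)]
    print("single run:   ", time_commands(joint))
    print("separate runs:", time_commands(separate))
```

The separate-run timing here covers only the simulations themselves; the cost of loading and concatenating the resulting tree sequences would need to be added for a fair comparison.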
But that's all running single-threaded. As you say, this might be
affected by parallelization, for sure. If you want to simulate 1000
sites, and you have 1000 cores at your disposal, then you can do 1000
separate 1-site SLiM simulations simultaneously. That would doubtless
be faster than using one core to do one 1000-site SLiM simulation –
probably much faster. Of course then you get the filesystem clutter,
and the cost of concatenating the 1000 tree sequences back together
again, and so forth. So doing this for just 2 sites would probably not
be a win, in terms of either overall runtime or pipeline complexity;
doing it for 1000 is very probably a win. I don't know where the
crossover is where it becomes worthwhile, and to some extent that's
subjective (how much pain you experience from the filesystem clutter
etc. :->). Again, I'm not aware of anybody having tested such
possibilities; please do report back if you do! (And no, there is no
way to do a single SLiM run utilizing multiple cores at the present
time, so that's not an option here. Maybe some day!)
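For the many-cores case, the separate runs could be launched from Python with something like the sketch below. The script name `one_site.slim` is a hypothetical placeholder, and the code assumes `slim` is on the PATH. Threads are enough here because each one just waits for its own SLiM process to finish.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, max_workers=None):
    """Run each command as its own process, at most max_workers at a time.

    Returns the exit codes in the same order as the commands.
    """
    def run_one(cmd):
        return subprocess.run(cmd, stdout=subprocess.DEVNULL).returncode

    with ThreadPoolExecutor(max_workers=max_workers or os.cpu_count()) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    n_sites = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    # One independent single-site simulation per seed (hypothetical script).
    commands = [["slim", "-s", str(seed), "one_site.slim"]
                for seed in range(1, n_sites + 1)]
    codes = run_all(commands)
    print(sum(code != 0 for code in codes), "runs failed")
```

On a cluster, a job array from the scheduler would do the same job; the point is only that the runs are independent and need no coordination until the concatenation step.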
For N 1-site simulations, I don't see any obvious way to avoid the large
number of temporary tree sequences, no. And no, there is no way to
pipe the generated tree sequences directly into Python. That would be
nice, but I have no idea how it would work. (If you or someone else has
a proposal for how this could be implemented usefully, please of course
open a GitHub issue! It sounds like a cool and useful idea, I just
don't know enough about Python and tskit-in-Python to have the slightest
idea how it could be done.)
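The temporary files can't be avoided this way, but the clutter can at least be contained by putting them in a throwaway directory that is deleted once the tree sequences have been read back in. A minimal sketch using only the Python standard library follows; `produce` and `consume` are hypothetical callbacks standing in for "run SLiM so that it writes to this path" and "load and combine the files", respectively.

```python
import tempfile
from pathlib import Path


def collect_outputs(n_sites, produce, consume):
    """Write n_sites outputs into a temporary directory and hand them to consume.

    produce(path, i) must create the file at path for site i.
    consume(paths) receives all the paths and its result is returned.
    The directory and everything in it are deleted afterwards.
    """
    with tempfile.TemporaryDirectory(prefix="slim_sites_") as tmp:
        paths = [Path(tmp) / f"site_{i:04d}.trees" for i in range(n_sites)]
        for i, path in enumerate(paths):
            produce(path, i)
        return consume(paths)
```

Whatever `consume` returns has to be fully loaded into memory (or written elsewhere) before it returns, since the files are gone as soon as the `with` block exits.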
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University

Hanbin Lee wrote on 11/8/25 1:15 PM: