Hi Daiki! Yes, 10 million individuals is
pretty big for forward simulation, and if you combine that with also
simulating an entire chromosome, things probably get quite memory-hungry
and slow. :->

I do wonder: are you forward-simulating only the loci that affect your
trait of interest (the trait I guess you will GWAS on), or are you
simulating all of the neutral mutations as well? And are you
forward-simulating a whole burn-in period for the model, or just a
period of time when the trait of interest is under selection? You don't
say whether you're using tree-sequence recording or not, but for a very
large simulation like this, it is very likely to be helpful. You could
then leave out all of the neutral mutations, and skip any neutral
burn-in that you're doing, and only forward-simulate the non-neutral
mutations post-burn-in. You'd then recapitate and overlay neutral
mutations after forward simulation, in Python. If you're not doing that
already, you should definitely consider it; it might make your
simulations MUCH more tractable.
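
In case it helps, the Python side of that workflow might look roughly like the sketch below. It assumes your SLiM script called initializeTreeSeq() and saved a .trees file with treeSeqOutput(). The function name, its parameters, and the choice of mutation type 0 for the overlaid neutral mutations are mine, purely for illustration; the rates and ancestral Ne you pass in would need to match your own model.

```python
def recapitate_and_overlay(ts_path, ancestral_ne, recomb_rate, mut_rate, seed=1):
    """Recapitate a SLiM tree sequence, then overlay neutral mutations.

    All argument values are placeholders supplied by the caller; nothing
    here is tuned for any particular model.
    """
    # Third-party packages (tskit, pyslim, msprime) are imported inside the
    # function so that it can be defined even where they are not installed.
    import msprime
    import pyslim
    import tskit

    ts = tskit.load(ts_path)

    # Coalescent "burn-in": simulate the ancestry above the first-generation
    # individuals so that every tree has a single root.
    ts = pyslim.recapitate(
        ts,
        ancestral_Ne=ancestral_ne,
        recombination_rate=recomb_rate,
        random_seed=seed,
    )

    # Add neutral mutations on top of the non-neutral ones SLiM recorded.
    # type=0 is an assumed label for the neutral mutation type; pick an ID
    # that your SLiM model does not already use. keep=True preserves the
    # existing (non-neutral) mutations.
    next_id = pyslim.next_slim_mutation_id(ts)
    ts = msprime.sim_mutations(
        ts,
        rate=mut_rate,
        model=msprime.SLiMMutationModel(type=0, next_id=next_id),
        keep=True,
        random_seed=seed,
    )
    return ts
```

You would call it once per output file, with something like recapitate_and_overlay("chr1.trees", ancestral_ne=10000, recomb_rate=1e-8, mut_rate=1e-8), where the file name and the numbers are made up.
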
That said, yes, you can model each chromosome with an independent run,
but that approach has a lot of limitations. You'd only be able to use
it if the evolutionary dynamics of each chromosome were independent. To
take a sample from such a scheme, yes, you'd just sample individuals
independently for each chromosome, and arbitrarily associate those
samples with each other to form multi-chromosome individuals for
analysis – with independent simulations for each chromosome, there's not
really a better way to associate individuals with each other. If the
assumption I just mentioned is true – that the evolutionary dynamics of
each chromosome are independent – then maybe that's not a completely
unreasonable thing to do, although I think it'd still distort reality
somewhat. If your model has strong selection, or other things going on
where the fact that chromosomes are packaged together into individuals
matters, then this approach would presumably not work at all.
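
To make the "arbitrarily associate" step concrete, here is a minimal sketch in plain Python. The dictionary of per-chromosome individual labels is a stand-in for whatever you actually extract from each run, and the function name is hypothetical.

```python
import random


def sample_independent_chromosomes(individuals_by_chrom, n, seed=None):
    """Draw n individuals independently from each per-chromosome run and
    glue the i-th draw from every chromosome into composite individual i."""
    rng = random.Random(seed)
    # One independent sample (without replacement) per chromosome.
    draws = {
        chrom: rng.sample(inds, n)
        for chrom, inds in individuals_by_chrom.items()
    }
    # The pairing by draw order is arbitrary: composite i has no shared
    # ancestry across its chromosomes, which is exactly the limitation
    # discussed above.
    return [{chrom: draws[chrom][i] for chrom in draws} for i in range(n)]


# Toy stand-ins for the individuals of two independent runs.
runs = {
    "chr1": [f"chr1_ind{i}" for i in range(100)],
    "chr2": [f"chr2_ind{i}" for i in range(100)],
}
composites = sample_independent_chromosomes(runs, n=5, seed=42)
```

Each element of composites is one multi-chromosome "individual" whose chromosomes come from unrelated simulated individuals.
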
You could go a step further and actually pre-simulate the exact pedigree
for the simulation and record it into a file (the SLiM manual has an
example of this), including all non-neutral mutations, and then for each
chromosome do a simulation of that exact pedigree (the SLiM manual
again has an example). In that case, you'd have individuals in each
simulation that correspond with each other, have the same parents and
siblings, etc. In that kind of scheme, taking a sample would mean
sampling from the predetermined pedigree's individuals, and then taking
those same individuals from each of the per-chromosome simulations.
There's no reason you couldn't do that, although I'm not aware of anybody
having actually done it. With 10 million individuals your pedigree
files will be quite large! Whether it would be a speed win or not is
hard to say – it depends on how fast you can simulate the pedigree up front,
and how much overhead is associated with then reading in and following
that pedigree, I suppose. My guess is that it would be a substantial
win, and would then allow you to sample corresponding individuals across
your simulations instead of having to assume independence. You could
also turn on tree-sequence recording in each of the per-chromosome
simulations, so that you still wouldn't need to forward-simulate the
neutral mutations. This seems like quite an interesting approach.
Perhaps Peter has more thoughts on it.
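
For the pedigree-matched scheme, the sampling step could be sketched as below, again in plain Python. It assumes each per-chromosome run can give you a mapping from pedigree ID to that individual's data for the chromosome; the integer IDs, the placeholder strings, and the function name are all hypothetical.

```python
import random


def sample_matched_individuals(genotypes_by_chrom, n, seed=None):
    """Sample n pedigree IDs once, then pull those same individuals out of
    every per-chromosome run."""
    rng = random.Random(seed)
    # Only IDs present in every run can be matched across chromosomes.
    shared = set.intersection(*(set(g) for g in genotypes_by_chrom.values()))
    chosen = rng.sample(sorted(shared), n)
    return {
        pid: {chrom: g[pid] for chrom, g in genotypes_by_chrom.items()}
        for pid in chosen
    }


# Toy stand-ins: the same 100 pedigree IDs appear in both runs, because
# both runs followed the same predetermined pedigree.
runs = {
    "chr1": {pid: f"chr1-genome-of-{pid}" for pid in range(1000, 1100)},
    "chr2": {pid: f"chr2-genome-of-{pid}" for pid in range(1000, 1100)},
}
sample = sample_matched_individuals(runs, n=5, seed=42)
```

The contrast with independent sampling is that here every sampled individual has the same parents and siblings in every chromosome's simulation.
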
I'd suggest that you look into other possible techniques you might use
to bring the scope of your simulations within reach, too. See not only
the SLiM 5 paper at
https://doi.org/10.1101/2025.08.07.669155 but also
the "SimHumanity" paper at
https://www.biorxiv.org/content/10.1101/2025.09.01.673541v1 for
discussion – the latter paper talks about these issues as well.

Good luck! You're definitely pushing the envelope here, but hey, that's
what makes research interesting! I think with the various techniques
mentioned above, plus possibly some model rescaling and other
techniques, you might be able to get where you're going. :->

Cheers,
-B.
Daiki Tagami wrote on 9/8/25 11:31 AM: