Multiple chromosome simulation with large sample size


Daiki Tagami

Sep 8, 2025, 11:31:13 AM
to slim-discuss
Hi all,

I'm now interested in running whole-genome simulations with selection and demographic effects in order to conduct GWAS. However, the sample size of my simulation is huge (10 million individuals in the final output), so a single-chromosome simulation already takes around 150-300 GB of RAM, and I imagine that a multi-chromosome simulation is not feasible in this case.

I've read the SLiM 5 paper in detail, and it says "An alternative to a multi-chromosome (or whole-genome) model is to separately model each chromosome as an independent run of SLiM".
My simulation model involves 2 subpopulations, and I'm taking a subset of those individuals as the final output, so I'm not sure how I can combine multiple chromosomes for each individual.
In independent chromosome simulations, do we simply assign individual IDs randomly across the different chromosomes?
Or are there any techniques that I can use when merging chromosome data?

Thank you for your help.

Sincerely,
Daiki Tagami

Ben Haller

Sep 8, 2025, 12:10:58 PM
to Daiki Tagami, slim-discuss
Hi Daiki!  Yes, 10 million individuals is pretty big for forward simulation, and if you combine that with also simulating an entire chromosome, things probably get quite memory-hungry and slow.  :->

I do wonder: are you forward-simulating only the loci that affect your trait of interest (the trait I guess you will GWAS on), or are you simulating all of the neutral mutations as well?  And are you forward-simulating a whole burn-in period for the model, or just a period of time when the trait of interest is under selection?  You don't say whether you're using tree-sequence recording or not, but for a very large simulation like this, it is very likely to be helpful.  You could then leave out all of the neutral mutations, and skip any neutral burn-in that you're doing, and only forward-simulate the non-neutral mutations post-burn-in.  You'd then recapitate and overlay neutral mutations after forward simulation, in Python.  If you're not doing that already, you should definitely consider it; it might make your simulations MUCH more tractable.
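
A minimal sketch of the post-simulation step described above, assuming the SLiM run used tree-sequence recording, forward-simulated only the non-neutral mutations, and saved a .trees file. It assumes tskit, msprime and pyslim (1.0 or later) are installed; the function name and parameter names are illustrative, and the rates you pass in are your own model's values, not defaults from the thread.

```python
def recapitate_and_overlay(trees_path, ancestral_ne, recomb_rate, mut_rate, seed=1):
    """Recapitate a SLiM tree sequence, then overlay neutral mutations.

    Hypothetical helper: assumes the .trees file came from a SLiM run with
    tree-sequence recording that simulated only non-neutral mutations.
    """
    # Imported inside the function so this module loads even where the
    # tree-sequence packages are not installed.
    import msprime
    import pyslim
    import tskit

    ts = tskit.load(trees_path)

    # Recapitation: finish coalescing the remaining lineages backward in
    # time with msprime, standing in for the neutral burn-in skipped in SLiM.
    ts = pyslim.recapitate(
        ts,
        ancestral_Ne=ancestral_ne,
        recombination_rate=recomb_rate,
        random_seed=seed,
    )

    # Overlay neutral mutations on the full genealogy. keep=True retains the
    # selected mutations recorded by SLiM; SLiMMutationModel gives the new
    # mutations SLiM-style metadata with IDs that do not clash with SLiM's.
    next_id = pyslim.next_slim_mutation_id(ts)
    ts = msprime.sim_mutations(
        ts,
        rate=mut_rate,
        model=msprime.SLiMMutationModel(type=0, next_id=next_id),
        keep=True,
        random_seed=seed,
    )
    return ts
```

Because neutral mutations are added only after the forward simulation, SLiM never has to track them, which is where the memory and speed savings come from.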

That said, yes, you can model each chromosome with an independent run, but that approach has a lot of limitations.  You'd only be able to use it if the evolutionary dynamics of each chromosome were independent.  To take a sample from such a scheme, yes, you'd just sample individuals independently for each chromosome, and arbitrarily associate those samples with each other to form multi-chromosome individuals for analysis – with independent simulations for each chromosome, there's not really a better way to associate individuals with each other.  If the assumption I just mentioned is true – that the evolutionary dynamics of each chromosome are independent – then maybe that's not a completely unreasonable thing to do, although I think it'd still distort reality somewhat.  If your model has strong selection, or other things going on where the fact that chromosomes are packaged together into individuals matters, then this approach would presumably not work at all.
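
The sample-and-associate step for independent per-chromosome runs could look something like the following minimal Python sketch. It assumes, hypothetically, that each run's output has already been reduced to a list of individual IDs for one subpopulation; the function name and input format are illustrative and not part of SLiM. Calling it once per subpopulation would keep each composite individual's chromosomes drawn from the same subpopulation.

```python
import random


def assemble_composite_individuals(per_chrom_samples, n, seed=None):
    """Build n composite individuals from independent per-chromosome runs.

    per_chrom_samples maps a chromosome name to the list of individual IDs
    available in that chromosome's own SLiM run (hypothetical format).
    Since the runs are independent, IDs in different runs do not correspond,
    so each run is sampled separately and the samples are matched by position.
    """
    rng = random.Random(seed)
    picks = {}
    for chrom, ids in per_chrom_samples.items():
        if len(ids) < n:
            raise ValueError(f"{chrom}: only {len(ids)} individuals, need {n}")
        # Sample without replacement, independently for each chromosome.
        picks[chrom] = rng.sample(ids, n)
    # Composite individual i takes the i-th sampled individual from every run.
    return [{chrom: picks[chrom][i] for chrom in picks} for i in range(n)]


runs = {"chr1": list(range(100)), "chr2": list(range(1000, 1100))}
composites = assemble_composite_individuals(runs, n=5, seed=1)
```

The pairing is arbitrary by construction, so any linkage between chromosomes that real individuals would carry (for example under strong selection) is lost.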

You could go a step further and actually pre-simulate the exact pedigree for the simulation and record it to a file (the SLiM manual has an example of this), including all non-neutral mutations, and then for each chromosome do a simulation of that exact pedigree (the SLiM manual again has an example).  In that case, you'd have individuals in each simulation that correspond with each other, have the same parents and siblings, etc.  In that kind of scheme, taking a sample would mean sampling from the predetermined pedigree's individuals, and then taking those same individuals from each of the per-chromosome simulations.  There's no reason you couldn't do that, although I'm not aware of anybody having actually done it.  With 10 million individuals your pedigree files will be quite large!  Whether it would be a speed win or not is hard to say – depends how fast you can simulate the pedigree up front, and how much overhead is associated with then reading in and following that pedigree, I suppose.  My guess is that it would be a substantial win, and would then allow you to sample corresponding individuals across your simulations instead of having to assume independence.  You could also turn on tree-sequence recording in each of the per-chromosome simulations, so that you still wouldn't need to forward-simulate the neutral mutations.  This seems like quite an interesting approach.  Perhaps Peter has more thoughts on it.
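
With a shared pedigree, the sampling step changes: you choose individuals once from the pedigree and then pull those same individuals out of every per-chromosome run. A minimal Python sketch, assuming (hypothetically) that each run's output has been loaded into a dict keyed by pedigree ID; the function name and data layout are illustrative, not a SLiM API.

```python
import random


def sample_shared_individuals(pedigree_ids, per_chrom_records, n, seed=None):
    """Sample n pedigree individuals and collect their data from every run.

    pedigree_ids: IDs of final-generation individuals in the predetermined
    pedigree. per_chrom_records: chromosome name -> {pedigree ID: record}
    for that chromosome's run (hypothetical format). Every run followed the
    same pedigree, so a pedigree ID names the same individual in all runs.
    """
    rng = random.Random(seed)
    # Sort first so the draw is reproducible whatever container is passed in.
    chosen = rng.sample(sorted(pedigree_ids), n)
    merged = {}
    for pid in chosen:
        merged[pid] = {}
        for chrom, records in per_chrom_records.items():
            if pid not in records:
                raise KeyError(f"pedigree ID {pid} missing from {chrom} run")
            merged[pid][chrom] = records[pid]
    return merged


chroms = {
    "chr1": {i: f"c1_{i}" for i in range(10)},
    "chr2": {i: f"c2_{i}" for i in range(10)},
}
sample = sample_shared_individuals(range(10), chroms, n=3, seed=0)
```

Unlike arbitrary pairing, every chromosome for a sampled individual comes from the same pedigree position, so relatedness and shared ancestry are consistent across chromosomes.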

I'd suggest that you look into other possible techniques you might use to bring the scope of your simulations within reach, too.  See not only the SLiM 5 paper at https://doi.org/10.1101/2025.08.07.669155 but also the "SimHumanity" paper at https://www.biorxiv.org/content/10.1101/2025.09.01.673541v1 for discussion – the latter paper talks about these issues as well.

Good luck!  You're definitely pushing the envelope here, but hey, that's what makes research interesting!  I think with the various techniques mentioned above, plus possibly some model rescaling and other techniques, you might be able to get where you're going.  :->

Cheers,
-B.


Daiki Tagami

Sep 11, 2025, 10:55:39 AM
to slim-discuss
Hi Ben,

Thank you very much for your prompt reply; it was very helpful for understanding how we can conduct large-scale simulations.
After thinking hard about it for a couple of days, I've decided to start with independent per-chromosome simulations, and then move to the pedigree-based simulation if I run into any issues with the independent-chromosome approach.

Also, thank you very much for pointing me to the "SimHumanity" paper. It made me think hard about how we can conduct realistic human genome simulations, and I drew a lot of inspiration from it.
I hope you have a great day, and thank you again for your kind support.

Sincerely,
Daiki Tagami
