Memory usage from tree sequence sim.readFromPopulationFile


Daiki Tagami

Jun 16, 2026, 5:33:24 AM
to slim-discuss
Hi all,

I'm now doing a large-scale simulation which is divided up into 2 steps:

1. Simulate a small tree sequence to run the simulation deep into time
In this step, I'm using:

initialize() {
  initializeTreeSeq(timeUnit="generations");
  ...
}
s1 late() {
  sim.treeSeqOutput("XXX.trees");
}

2. Simulate an explosive growth based on that tree sequence
In this step, I'm using:

initialize() {
  initializeTreeSeq(simplificationInterval=5, timeUnit="generations");
  // I need this simplification so that the required RAM won't explode
  ...
}
1 late() {
  sim.readFromPopulationFile("XXX.trees");
}
s1 late() {
  sim.treeSeqOutput("YYY.trees");
}

To my great surprise, step 2 requires much more RAM (around 2x) than my previous simulation, in which I ran step 1 and step 2 together in a single script without calling sim.readFromPopulationFile().

Would it be possible for someone to let me know what is causing this issue?
Would it be better to preprocess the resulting tree sequence (for example, by simplifying it) before loading it into SLiM again?

Thank you for your help.

Sincerely,
Daiki Tagami

Ben Haller

Jun 16, 2026, 6:00:11 AM
to slim-d...@googlegroups.com
Hi Daiki!

Well, it's a little hard to guess without more details about the before and after model setup, and probably some testing in a debugger and memory profiler.

One possibility is that readFromPopulationFile() is pushing the memory usage high-water mark higher than it would otherwise go, due to bookkeeping that it has to do.  Reading in a tree sequence and building the corresponding SLiM simulation state is a complicated process that involves the use of various temporary data structures.  Those data structures are thrown out after reading is done, but the high-water mark might be pushed unusually high while they still exist.

Another possibility is, of course, a memory leak in SLiM involving readFromPopulationFile() (although I'd be a bit surprised, since leak-check tests are run using unit tests that include reading in a .trees file, so a leak in this code path ought to be caught).

A third possibility is user error on your end, where the "before" and "after" versions of the model are not, in fact, exactly the same apart from the write/read in the middle.

A fourth possibility is issue #552 at https://github.com/MesserLab/SLiM/issues/552, which is about unnecessarily high memory usage when loading very large tree sequences; but I'm guessing this is not the problem biting you, since you say that the .trees file generated in step 1 is "small".  (But depending on exactly what you mean by "explosive" growth in step 2, it is possible that the issue discussed in #552 would actually affect the memory usage in step 2, also.)

Hard to know without a fair bit of investigation.  If the problem is #552, that issue is difficult to fix, and unlikely to be addressed soon; so you might want to simply keep your simulation as one part, without the write/read division into two steps.  (Why do you want to split it, anyway?  You didn't mention that...)
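
One rough way to tell a temporary high-water mark apart from steady growth is to log memory usage from inside the model. The following is only a minimal sketch on a toy model: the population size, chromosome length, tick numbers, and the file name usage_demo.trees are all illustrative assumptions, not values from this thread. It relies on the Eidos function usage(), which reports the current memory usage of the process in MB, and on community.tick, which assumes SLiM 4 or later.

```eidos
// Toy WF model; every size and tick number here is an illustrative assumption.
initialize() {
	initializeTreeSeq(timeUnit="generations");
	initializeMutationRate(0);
	initializeMutationType("m1", 0.5, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 999999);
	initializeRecombinationRate(1e-8);
}
1 early() {
	sim.addSubpop("p1", 1000);
}
// Runs in every tick; reports current memory usage (in MB) every 50 ticks.
late() {
	if (community.tick % 50 == 0)
		catn("tick " + community.tick + ": " + usage() + " MB");
}
// Midway through: write the tree sequence, read it straight back in,
// and report memory usage before and after the round trip.
100 late() {
	before = usage();
	sim.treeSeqOutput("usage_demo.trees");
	sim.readFromPopulationFile("usage_demo.trees");
	catn("write/read round trip: " + before + " MB -> " + usage() + " MB");
}
200 late() {
	sim.simulationFinished();
}
```

If usage jumps at the round trip and then stays flat, that points toward the load itself; if it keeps climbing in later ticks, the growth is happening during the simulation that follows.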

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/5fd547b9-bba1-435e-b3a2-315f8c29fd7an%40googlegroups.com.

Daiki Tagami

Jun 16, 2026, 8:27:07 AM
to slim-discuss
Hi Ben,

Thank you very much for your prompt reply!
The initial readFromPopulationFile() step does not take up much memory; it is the simulation afterwards that consumes the memory.

The current simulation model has 4 different populations, and due to memory limitations, it is very difficult to simulate 4 explosive growth phases at the same time in a single SLiM simulation.
So instead, I was thinking about simulating individuals until a certain time point before the initial population diverges into 4 different populations, and then have 4 different simulations afterwards to model the explosive growth.

The initial simulation had only 1 population, using the same demographic model, recombination map, etc. (I removed the individuals from the 3 other populations at a certain time point), and its memory usage was around 200GB. I was expecting to run 4 separate jobs at around 200GB each, but currently the explosive growth simulation takes 300GB+ per population, which makes it impossible for me to run.

Do you have any suggestions regarding this?
Thank you for your help.

Sincerely,
Daiki Tagami

Ben Haller

Jun 16, 2026, 9:48:03 AM
to slim-d...@googlegroups.com
Hi Daiki!

This sounds likely to be due to issue #552, and as I wrote, that issue is unlikely to be fixed soon.  :-O

I'd suggest that you:

(1) Run four separate replicates to produce your four sub-simulations;

(2) Set the RNG seed for the four replicates to the same initial value, to produce exactly the same results from step 1 across all four;

(3) At the point where you transition to step 2, split off the 1/4 of the population that you want for that replicate and simply discard the other 3/4;

(4) Also at that point, probably set the RNG seed to a new value that is different for the four replicates, to ensure that they proceed independently rather than all using the same random number sequence from that point forward.

Since step 1 is quite quick, I guess, this should not add much runtime (you'll do step 1 four times, once in each replicate, but I guess that is not a big deal).  It should keep the memory usage down by avoiding the increased memory footprint due to issue #552.
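
The four steps above can be sketched roughly as follows. This is a toy WF model under stated assumptions, not the actual model from this thread: the constant REP, the seed values, the population sizes, the tick numbers, the growth rate, and the output file names are all hypothetical. REP would be supplied on the command line (for example, slim -d REP=2 model.slim), and the script would be run once for each value from 1 to 4.

```eidos
// Sketch only: REP, seeds, sizes, ticks, and file names are illustrative assumptions.
initialize() {
	// REP (1..4) selects which branch this replicate follows.
	if (!exists("REP")) defineConstant("REP", 1);
	// Same seed in every replicate, so step 1 is identical across all four runs.
	setSeed(12345);
	initializeTreeSeq(simplificationInterval=5, timeUnit="generations");
	initializeMutationRate(0);
	initializeMutationType("m1", 0.5, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 999999);
	initializeRecombinationRate(1e-8);
}
1 early() {
	// Step 1: the shared ancestral population.
	sim.addSubpop("p1", 1000);
}
1000 early() {
	// Transition to step 2. Reseed first so that each replicate draws its own
	// founders and then proceeds independently of the other three.
	setSeed(12345 + REP);
	// Keep only the branch this replicate follows; removing p1 discards the rest.
	sim.addSubpopSplit("p2", 250, p1);
	p1.setSubpopulationSize(0);
}
1001:1100 early() {
	// Placeholder for the real growth model: 3% growth per tick.
	p2.setSubpopulationSize(asInteger(round(p2.individualCount * 1.03)));
}
1100 late() {
	sim.treeSeqOutput("branch_" + REP + ".trees");
}
```

Reseeding before the split, rather than after it, is a design choice in this sketch: it means the four replicates do not all sample the same founders from the ancestral population.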


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Daiki Tagami

Jun 16, 2026, 10:43:43 AM
to slim-d...@googlegroups.com
Hi Ben,

Thank you very much for your reply; it was very helpful.
After having a conversation with my supervisor, we decided to first
try simulating a large sample of Europeans (one of the populations in
the model) and think about other populations later.

Also, out of curiosity, is it possible to change
"simplificationInterval" in initializeTreeSeq(simplificationInterval=5,
timeUnit="generations"); partway through a run, so that it differs
between generations?
The RAM required up to the most recent ~100 generations is not very
large, so we only need frequent simplification for the final 100
generations.
Simplifying every 5 generations throughout the entire simulation takes
some time, so I'm wondering how I can speed it up.

The steps of the current simulation are:
1. Simulate on the order of 10,000 individuals for 70,000 generations
- This step only uses ~30GB, but it takes a long time to finish. Ideally
I would not use simplificationInterval=5 here, to speed up the simulation.
2. Final explosive growth over around 200 generations
- The population only becomes extremely large in the final 100
generations, where the output is 7-8 million individuals

My current approach is using
initializeTreeSeq(simplificationInterval=5, timeUnit="generations");
and then using demes-slim to load the entire demographic history to
conduct the simulations.
This "simplificationInterval=5" was necessary to make sure that we can
run this simulation without an out-of-memory error.

Thank you very much for your help, and it was extremely helpful for me.

Sincerely,
Daiki Tagami

Ben Haller

Jun 16, 2026, 10:50:38 AM
to slim-d...@googlegroups.com
Hi Daiki!

Sounds like a fine plan.  :->

You can't modify the simplification interval; but you can just turn
auto-simplification off altogether, and then call treeSeqSimplify()
whenever you want to trigger a simplification.  That gives you complete
control over the timing.
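
A minimal sketch of that approach on a toy WF model follows. It assumes that auto-simplification is switched off by passing simplificationRatio=INF to initializeTreeSeq(), which is how the SLiM manual describes it; that is worth confirming for your SLiM version. The population sizes, tick numbers, growth rate, simplification schedule, and output file name are illustrative assumptions, and community.tick assumes SLiM 4 or later.

```eidos
// Sketch only: sizes, ticks, intervals, and file names are illustrative assumptions.
initialize() {
	// simplificationRatio=INF requests that SLiM never simplify automatically.
	initializeTreeSeq(simplificationRatio=INF, timeUnit="generations");
	initializeMutationRate(0);
	initializeMutationType("m1", 0.5, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, 999999);
	initializeRecombinationRate(1e-8);
}
1 early() {
	sim.addSubpop("p1", 1000);
}
// Long phase with a small population: simplify only occasionally.
1:1999 late() {
	if (community.tick % 500 == 0)
		sim.treeSeqSimplify();
}
// Placeholder for the real growth model: 3% growth per tick.
2000:2100 early() {
	p1.setSubpopulationSize(asInteger(round(p1.individualCount * 1.03)));
}
// Growth phase with a large population: simplify every 5 ticks.
2000:2100 late() {
	if (community.tick % 5 == 0)
		sim.treeSeqSimplify();
}
2100 late() {
	sim.treeSeqOutput("manual_simplify.trees");
}
```

With no simplification at all the recording tables keep growing, so even the long early phase probably wants an occasional treeSeqSimplify() call; the 500-tick spacing here is arbitrary and would need tuning against both runtime and memory.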

I'm not familiar with demes-slim, so I can't speak to that aspect.

Good luck and happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Daiki Tagami

Jun 16, 2026, 12:36:17 PM
to slim-discuss
Hi Ben,

Thank you very much for your reply, and it was very helpful for me.
I will try calling treeSeqSimplify() in the final part every 5 ticks, and I will let you know if I get stuck with anything.

I really appreciate your help, and I hope you have a great day!

Sincerely,
Daiki Tagami
