non-neutral burn-in for a large simulation


DAVID VENDRAMI

Jun 13, 2022, 2:41:20 AM
to slim-discuss

Dear SLiM community,

I am currently trying to run a large simulation that simply takes too long to complete. I would like to hear whether anybody has suggestions on how to speed this up.

The aim of my simulation is the following: simulate the known demographic history of my study system (which includes a severe bottleneck) and (a) calculate the genetic load of the entire population at given time points, and (b) perform a viability analysis (i.e., check how often the population would go extinct during or after the bottleneck).

The main features of the simulation are:

- An initial population size of 300,000 individuals;

- I only simulate deleterious mutations (neutral mutations do not contribute to genetic load), which appear on my ~2 Gb genome at a rate of U = 1.2 deleterious mutations per diploid genome per generation (close to empirical estimates for some model species);

- It mimics the reproductive biology of the species, which causes Ne to be approximately 0.01 of the census size due to highly skewed reproductive success.

Now, a few considerations:

- Obviously the simulation takes very long because I am simulating a large population. The size of 300,000 was chosen because I have an Ne estimate of 3,000 (3,000 * 100 = 300,000), and I therefore believe I have to stick to it.

- One way around this may be to not simulate the reproductive biology of this species (which causes N/Ne to be 100) and simply input Ne sizes (rather than census sizes) into an ideal WF model. However, this is not desirable because the specific reproductive biology of this species may introduce important stochasticity (not captured by a WF model) which could lead to extinction during the bottleneck. So I'm not really keen to remove this aspect;

- Scaling down the genome does not appear to improve the situation, because U needs to remain 1.2 even within a smaller genome (otherwise the genetic load would be underestimated). Thus, the number of mutations SLiM has to keep track of stays the same. Moreover, if I scale down the genome I will have to increase the recombination rate so that the “genetic” (map) length of the genome stays the same (otherwise, linked selection (or, on the other hand, “linked drift”) during purging, which is likely to happen during the bottleneck, will be unrealistically influential).

The main problem is that the simulation requires a really long burn-in to reach equilibrium, but the burn-in is clearly non-neutral. As a consequence, I think I cannot use coalescent simulations or recapitation to generate the burn-in.

I guess my question then is: can anyone suggest a way to generate a non-neutral burn-in? Importantly, the suggested method should still allow me to obtain information on the selection coefficient, dominance and allele frequency (relative to the entire population rather than a sample, ideally) of ALL deleterious mutations present in the population at given time points.

Thank you all very much in advance for your insights and help,

Best,

David

Peter Ralph

Jun 13, 2022, 4:02:39 PM
to DAVID VENDRAMI, slim-discuss
Hi, David - good questions. Here's some thoughts:

- If burn-in is very expensive, you can re-use the same burn-in many times. It doesn't sound to me like anything you're doing should depend very much at all on the randomness of the burn-in (just on the number, frequency and selection coefficient of the segregating mutations; since you will have a lot of these, this should be pretty similar across burn-in simulations), and so possibly just doing *three* burn-ins, and checking that you get the same answer no matter which burn-in you use, should be OK. (A minimal sketch of saving and re-using a burn-in is below, after these points.)

- Perhaps the fact that 99% of new offspring die is important during the bottleneck, but maybe it's not during the burn-in phase. You could check by doing one simulation of each type (i.e., one with N=3e3 and N/Ne=1, and the other with N=3e5 and N/Ne=100) and seeing if you get qualitatively different downstream answers. (And then do the downstream stuff with the real reproductive biology, as you say.)
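
For the re-use idea: here's a minimal, untested sketch of how saving and restoring a burn-in can work in SLiM, using outputFull() and readFromPopulationFile() (the file path and generation numbers are placeholders):

// at the end of the burn-in run, write out the complete population state
30000 late() {
    sim.outputFull("burnin_state.txt");   // placeholder path
}

// in each downstream replicate, start from that saved state
1 late() {
    sim.readFromPopulationFile("burnin_state.txt");
    // note: this resets sim.generation to the saved generation (30000 here),
    // so schedule downstream events accordingly
    setSeed(rdunif(1, 0, asInteger(2^62) - 1));   // re-randomize so replicates differ
}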


- peter


DAVID VENDRAMI

Jun 14, 2022, 4:11:39 AM
to Peter Ralph, slim-discuss
Hi Peter,

Thank you very much for the useful thoughts.

- Re-using the burn-in is definitely a good idea. However, I am still struggling to get even a single burn-in to complete. With a population of 300,000 individuals it takes a lot of generations to reach equilibrium; I have a simulation that has now been running for two weeks and has not finished yet.

- Your second point makes sense to me! So basically, you are suggesting to assume N/Ne = 1 (and therefore N = 3,000) during the burn-in and run it as a WF model until equilibrium is reached; after that I can implement the specific reproductive strategy of my species and introduce the bottleneck. Did I get that right? My only question here would be: what would be the appropriate way of increasing the population size from 3,000 (the population size during burn-in) to 300,000 (the population size when N/Ne = 100, i.e. when the reproductive strategy of my species is implemented right after the burn-in)? I guess I could simply let it grow from 3,000 to 300,000 by increasing the carrying capacity... Or is this too simplistic?

Many thanks for your insights!
Best,
David

Peter Ralph

Jun 14, 2022, 9:57:10 AM
to DAVID VENDRAMI, Peter Ralph, slim-discuss
> - Re-using the burn-in is definitely a good idea. However, I am still struggling to get even a single burn-in to complete. With a population of 300,000 individuals it takes a lot of generations to reach equilibrium; I have a simulation that has now been running for two weeks and has not finished yet.

Ah yes, well, darn. Hm - I'm surprised, though - I would think you'd be able to run a population of this size for 10Ne = 30,000 generations quicker than that. It may be that there's some part of your script that could be more efficient - it's probably worth running the profiler in SLiMgui to check.

> - Your second point makes sense to me! So basically, you are suggesting to assume N/Ne = 1 (and therefore N = 3,000) during the burn-in and run it as a WF model until equilibrium is reached; after that I can implement the specific reproductive strategy of my species and introduce the bottleneck. Did I get that right? My only question here would be: what would be the appropriate way of increasing the population size from 3,000 (the population size during burn-in) to 300,000 (the population size when N/Ne = 100, i.e. when the reproductive strategy of my species is implemented right after the burn-in)? I guess I could simply let it grow from 3,000 to 300,000 by increasing the carrying capacity... Or is this too simplistic?

Right, that's the suggestion. And maybe you could make the switch 1,000 generations ago or so. A good thing to remember here is that you're working with approximations anyhow - your species hasn't been at a fixed, constant size for the last 10Ne generations, so you're aiming for something reasonable, not exact. But, hm - you could probably do this gradually; on the other hand, if you changed the reproductive strategy and the carrying capacity all at once, at the right point in the life cycle, then the model "shouldn't notice", since only 1% of the previous generation survives to adulthood under the new reproductive strategy? (But be careful about what stage you do the switch-over at!)
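
To make that concrete, here's a rough, untested sketch of the all-at-once switch (the switch-over generation of 30,000 is a placeholder, and PropM is assumed to be defined elsewhere, as in your model):

reproduction() {
    inds = p1.individuals;
    females = inds[inds.sex == "F" & inds.age >= 1];
    males = inds[inds.sex == "M" & inds.age >= 1];
    if (sim.generation >= 30000)   // post-burn-in: harem reproduction
        mates = sample(males, asInteger(round(PropM * length(males))), replace=F);
    else                           // burn-in: all adult males are candidates
        mates = males;
    for (f in females)
        p1.addCrossed(f, sample(mates, 1));
    self.active = 0;   // all reproduction is done in this single call
}

early() {
    // density regulation: the carrying capacity switches with the strategy
    K = (sim.generation >= 30000) ? 300000 else 3000;
    p1.fitnessScaling = K / p1.individualCount;
}

(One thing to keep in mind: in nonWF models, reproduction() callbacks run before early() events within each generation, which is part of the "what stage" caution above.)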

- peter

Ben Haller

Jun 14, 2022, 8:46:38 PM
to slim-discuss
I would just add that two weeks is not necessarily such a big deal, if you can re-use that burn-in as Peter suggests.  A field season, for empirical biologists, often takes several months; an experiment at the LHC can take years to gather enough data; sometimes science takes a bit of time.  :->  Of course it would be nice if the burn-in were faster, and I share Peter's curiosity about why it is taking so long, and maybe there are ways to speed it up – but big forward simulations are time-consuming, and to some extent that's just the way it is.  I've had projects where the compute time was measured in months; you just find something else to work on while the job is running.  I don't mean to be flip; obviously I take the performance of SLiM rather seriously.  I'm just saying sometimes it is what it is, and you live with it, and two weeks is really not so bad in the grand scheme of things.  :->

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

DAVID VENDRAMI

Jun 15, 2022, 11:21:14 AM
to Ben Haller, slim-discuss
Hi Peter, hi Benjamin,

Thank you very much for the further insights.

Regarding the performance of the burn-in: after trying to run it under different conditions, it appears clear to me that what is slowing it down is the size of the simulated population (300,000), given a per-base mutation rate of 2.7e-10, a recombination rate of 1e-8, and a genome ~2 Gb in length.
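
For context, a minimal initialize() sketch with these parameters would look something like the following (the mutation-type/DFE values shown are placeholders, not my actual ones):

initialize() {
    initializeSLiMModelType("nonWF");
    initializeSex("A");
    initializeMutationRate(2.7e-10);
    // deleterious mutations only; this gamma DFE is a placeholder
    initializeMutationType("m1", 0.2, "g", -0.03, 0.2);
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, 1999999999);   // ~2 Gb genome
    initializeRecombinationRate(1e-8);
}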

Then, it seems that the reproduction step is also quite time-consuming (though I will now remove it from the burn-in). This is harem-style reproduction in which only a certain proportion of males ("PropM") reproduces. Here's how I implemented it:

// Harem-style reproduction
reproduction() {
    // Reproducing males: all males in the population of age 1 or older
    males = p1.individuals[p1.individuals.sex == "M" & p1.individuals.age >= 1];
    repr = round(PropM * length(males));   // select a random proportion PropM of males as reproducers
    repr_males = sample(males, asInteger(repr), replace=F);

    // Females of age 1 or older
    repr_females = p1.individuals[p1.individuals.sex == "F" & p1.individuals.age >= 1];

    // Reproduction: each female crosses with one of the reproducing males
    for (i in repr_females) {
        parent1 = i;
        parent2 = sample(repr_males, 1);
        p1.addCrossed(parent1, parent2);
    }
}

Many thanks for your help,
All the best,
David



Ben Haller

Jun 15, 2022, 11:30:41 AM
to DAVID VENDRAMI, slim-discuss
Hi David.  It looks like that reproduction() callback does the reproduction for the whole population, not just for the focal individual – what I call "big bang" reproduction.  That is fine, but if you use the reproduction() callback in that way, you need to end with "self.active = 0;" to disable the callback for the rest of the generation.  Otherwise, it gets called again for every other focal individual, and reproduces the whole population over and over, producing vastly more offspring than appropriate.  This is probably why your model is running slowly – you are generating a huge number of offspring (and then probably killing the vast majority of them with density-dependent fitness immediately after).  Apologies if I have misunderstood; but my guess is that if you add "self.active = 0;" at the end of this callback, your model will run orders of magnitude faster, and produce more correct results, too.  :->  This is discussed in, for example, section 16.3 of the manual, where "big bang" reproduction is introduced.
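
For reference, here is your callback with just that one line added (behavior otherwise unchanged):

// Harem-style reproduction, "big bang" style: runs once per generation
reproduction() {
    // Reproducing males: all males of age 1 or older, thinned to a proportion PropM
    males = p1.individuals[p1.individuals.sex == "M" & p1.individuals.age >= 1];
    repr_males = sample(males, asInteger(round(PropM * length(males))), replace=F);

    // Each female of age 1 or older crosses with one of the reproducing males
    repr_females = p1.individuals[p1.individuals.sex == "F" & p1.individuals.age >= 1];
    for (i in repr_females)
        p1.addCrossed(i, sample(repr_males, 1));

    self.active = 0;   // deactivate this callback for the rest of the generation
}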


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University



DAVID VENDRAMI

Jun 15, 2022, 11:43:45 AM
to Ben Haller, slim-discuss
Hi Benjamin,

Dang, that's true! What a silly oversight. Thank you very much for spotting this!

Best,
David