Coalescent burn-in advice

Nick O'Brien

unread,

Nov 10, 2021, 12:02:10 AM11/10/21

to slim-discuss

Hi all,

I've been working with SLiM to explore how genetic architectures influence polygenic adaptation in a per-locus WF model. Since I'm doing a parameter sweep of sorts to explore different aspects of architectures (allelic effects, number of loci, linkage etc.), I need to run a lot of models. In the past I have been using a full-forward approach to do this: I measure a burnt-in population as one that has an average genome-wide heterozygosity that is equilibriated around the theoretical expectation under mutation-drift balance ( E[He] = θ /(1 + θ) ). This has worked reasonably well, but the models tend to get slow at higher recombination rates and larger population sizes, which makes running these burn-ins a limitation to some of my parameter ranges (particularly population size).

I've been tinkering with the new version of msprime to explore using a coalescent process to burn-in my populations. This seems really promising so far - I'm able to generate neutral variation for my populations considerably faster than from scratch in SLiM, and set mutation types so that QTLs can respond to selection when the pressure is applied. Loading it all into SLiM is straightforward as well, which is awesome! However, there are a few uncertainties I'd like to clear up.

First, I don't intend to use tree-sequence data for my final output. As such, it'd be good to be able to read in a tree sequence for my burn-in without needing to store and update a tree through the rest of my simulation - this is especially pertinent with larger population sizes where simplification slows the simulation down quite considerably, and uses a fair amount of extra memory (which means fewer concurrent simulations on the HPC if I start hitting RAM limits). Is there any way that I can read in the .trees to initialise my population's standing variation without storing/updating trees in SLiM later on?

Second, I'm pretty sure I'm violating some assumptions of the coalescent, but I'm not sure how much it matters. Since I need variation generated for each individual in the population, my sample size is equal to the population size in my tree sequence simulation. I'm using msprime purely to generate some standing variation to reduce the number of generations I need to reach mutation-drift equilibrium, so I'm unsure if this has any ramifications for my conclusions. I'll be running some comparisons to check this regardless, but any advice would be fantastic.

Below are the relevant parts of my msprime script:

```

pop = msprime.Demography()
pop.add_population(name = "A", initial_size = N)

ts = msprime.sim_ancestry(samples = N, demography = pop, sequence_length = 100, discrete_genome = True,
recombination_rate = rwide, random_seed = seed, model = "dtwf")

mutated = pyslim.annotate_defaults(ts, model_type="WF", slim_generation=1)
mutated = msprime.sim_mutations(mutated, rate = 8.045e-6, random_seed = seed,
model = msprime.SLiMMutationModel(type = 1))
mutated.simplify()
mutated.dump(filepath)

```

Thanks in advance!

Cheers,

Nick

Miguel de Navascués

unread,

Nov 10, 2021, 3:51:23 AM11/10/21

to slim-d...@googlegroups.com

Hi Nick,

For the first point, a quick fix could be to load the tree sequence in
SLiM, output the population data outputFull() then open a new instance
of SLiM read the text file with readFromPopulationFile() then run your
model. Maybe there is a better way, this is the first thing it came to
my mind.

For the second point, when simulating tree sequences with msprime you
are using `sim_ancestry(..., model = "dtwf")`, which is not the standard
coalescent but a model that is more adequate when sample size=population
size. So I think you are OK. I think this is the reference for it:
https://doi.org/10.1371/journal.pgen.1008619

Best

Miguel

On 10/11/2021 06:02, Nick O'Brien wrote:
> Hi all,
>
> I've been working with SLiM to explore how genetic architectures
> influence polygenic adaptation in a per-locus WF model. Since I'm doing
> a parameter sweep of sorts to explore different aspects of architectures
> (allelic effects, number of loci, linkage etc.), I need to run a lot of
> models. In the past I have been using a full-forward approach to do
> this: I measure a burnt-in population as one that has an average
> genome-wide heterozygosity that is equilibriated around the theoretical

> expectation under mutation-drift balance ( E[He] = θ**/(1 + θ) ). This

> --
> SLiM forward genetic simulation: http://messerlab.org/slim/
> <http://messerlab.org/slim/>
> ---
> You received this message because you are subscribed to the Google
> Groups "slim-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to slim-discuss...@googlegroups.com
> <mailto:slim-discuss...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/slim-discuss/0cb73d17-15a8-41ec-88af-647bfe21212bn%40googlegroups.com
> <https://groups.google.com/d/msgid/slim-discuss/0cb73d17-15a8-41ec-88af-647bfe21212bn%40googlegroups.com?utm_medium=email&utm_source=footer>.

--
Miguel de Navascués

UMR CBGP, INRAE
Centre de Biologie pour la Gestion des Populations
755 avenue du campus Agropolis
CS30016
34988 Montferrier-sur-Lez cedex (France)

phone: +33499623370
fax: +33499623345
e-mail: miguel.navascues AT inrae.fr

Ben Haller

unread,

Nov 10, 2021, 8:30:58 AM11/10/21

to slim-discuss

Hi Nick,

Regarding your first question (loading in a .trees file for a simulation that does not otherwise use tree-sequence recording), this is possible in the current GitHub head version of SLiM. It was reported as an issue (https://github.com/MesserLab/SLiM/issues/216) and has been fixed. So if you build the GitHub head version of SLiM (instructions for doing so are in chapter 2 of the manual), or wait for SLiM 3.7 (which should be released by the end of the year, I hope), you should be able to do this. Let me know if you experience any problems; this change has not received a great deal of testing thus far (but since it shares most of its code path with the normal course of operations, and then just throws away all the tree-seq data structures post-load, problems are not expected – it's a very straightforward fix).

Regarding your second question, Miguel's answer seems likely to be good (thanks Miguel).

Cheers,

-B.

Benjamin C. Haller

Messer Lab

Cornell University

Nick O'Brien

unread,

Nov 10, 2021, 11:32:50 PM11/10/21

to slim-discuss

Hi Ben and Miguel,

Thank you for your comments! I've tried out the GitHub head and it seems to be working fantastically.

I'll let you know if I encounter any problems in my test runs.

Miguel, I read that paper and it quelled my concerns about using this approach. Very cool stuff.

Thanks again!

Cheers,

Nick

Reply all

Reply to author

Forward