Hi all,
I've been working with SLiM to
explore how genetic architectures influence polygenic adaptation in a per-locus WF model. Since
I'm doing a parameter sweep of sorts to explore different aspects of
architectures (allelic effects, number of loci, linkage etc.), I need to
run a lot of models. In the past I have been using a full-forward
approach to do this: I measure a burnt-in population as one that has an
average genome-wide heterozygosity that is equilibriated around the
theoretical expectation under mutation-drift balance ( E[He] =
θ
/(1 +
θ) ). This has worked reasonably well, but the models tend to get slow
at higher recombination rates and larger population sizes, which makes
running these burn-ins a limitation to some of my parameter ranges
(particularly population size).
I've been
tinkering with the new version of msprime to explore using a coalescent process to
burn-in my populations. This seems really promising so far - I'm able to generate neutral variation for my populations considerably faster than from scratch in SLiM,
and set mutation types so that QTLs can respond to selection when the
pressure is applied. Loading it all into SLiM is straightforward as well, which is awesome! However, there are a few uncertainties I'd like to
clear up.
First, I don't intend to use
tree-sequence data for my final output.
As such, it'd be good to be able to read in a tree sequence for my
burn-in without needing to store and update a tree through the rest of my
simulation - this is especially pertinent with larger population sizes
where simplification slows the simulation down quite considerably, and
uses a fair amount of extra memory (which means fewer concurrent
simulations on the HPC if I start hitting RAM limits). Is there any way
that I can read in the .trees to initialise my population's standing
variation without storing/updating trees in SLiM later on?
Second,
I'm pretty sure I'm violating some assumptions of the coalescent, but
I'm not sure how much it matters. Since I need variation generated for
each individual in the population, my sample size is equal to the
population size in my tree sequence simulation. I'm using msprime purely
to generate some standing variation to reduce the number of generations
I need to reach mutation-drift equilibrium, so I'm unsure if this has
any ramifications for my conclusions. I'll be running some comparisons to check this regardless, but any advice would be
fantastic.
Below are the relevant parts of my msprime script:
```
pop = msprime.Demography()
pop.add_population(name = "A", initial_size = N)
ts = msprime.sim_ancestry(samples = N, demography = pop, sequence_length = 100, discrete_genome = True,
recombination_rate = rwide, random_seed = seed, model = "dtwf")
mutated = pyslim.annotate_defaults(ts, model_type="WF", slim_generation=1)
mutated = msprime.sim_mutations(mutated, rate = 8.045e-6, random_seed = seed,
model = msprime.SLiMMutationModel(type = 1))
mutated.simplify()
mutated.dump(filepath)
```
Thanks in advance!
Cheers,
Nick