starting from a coalescent history

Gertjan Bisschop

unread,

Mar 15, 2019, 7:52:57 AM3/15/19

to slim-discuss

Hi,

Thanks for the great piece of software.

At the moment I am trying out the pyslim interface to start a simulation from a coalescent history or to recapitate a simulation.

Because recapitating in my specific scenario seems to take ages (probably due to sequence length of 8.5Mb and the chosen recombination rate, as well as migration of 0.01 between the demes) I decided to start from a neutral coalescent history (which is faster). This is the code I ran in msprime.

#msprime code

---------------------

length = 8.5e6
recombination_rate = 1.15e-8
mutation_rate = 0.0
Ne = 1e6
m = 0.01
M = np.array([
                [0,m],
                [m,0]])
pop_configs = [
        msprime.PopulationConfiguration(sample_size=1000),
        msprime.PopulationConfiguration(sample_size=1000)]
ts = msprime.simulate(population_configurations=pop_configs, migration_matrix=M, length=length)

new_ts = pyslim.annotate_defaults(ts, model_type="WF", slim_generation=1)
new_ts.dump(directory+'/neutr.trees')

---------------------

I then use the resulting treeSequence to start the following slim simulation (using QTLs as described in the SLiM recipe).

However after 20 time steps I always get the following error:

table_collection_sort: time[parent] must be greater than time[child]

#(SLiM code)

---------------------

initialize() {
    defineConstant("optimum1", 2);
    defineConstant("optimum2", -2);
    //standard deviation normal distribution
    defineConstant("sigma", 2.5);
    defineConstant("sigma_m", 0.05);
    defineConstant("simID", getSeed());
    defineConstant("generations", 1000);
    defineConstant("n_loci", 1000);
    defineConstant("migration", 0.01);
    defineConstant("mutation_rate", 3.46e-6);

    //sample n_loci random positions
    positions = sort(sample(4245000:4255000, n_loci));
    defineConstant("POS", positions);
    //adapt so enough neutral sequence at ends
    ends = c(positions[n_loci-1], 8500000-1);
    rates = c(mutation_rate, 0);

    //use spatial dimension to store sum of effect sizes
    //initializeSLiMOptions(dimensionality="x");
    //mutation rate defined for all n_loci sites, add neutral section after         last mutation site
    initializeMutationRate(rates, ends);

    // m1 mutation
    initializeMutationType("m1", 0.5, "n", 0.0, sigma_m); //QTL
    //make sure slim includes fixed mutations to calculate fitness
    m1.convertToSubstitution = F;
    //multiple mutations on one locus, stack effect sizes s, only remember last
    m1.mutationStackPolicy = "l";

    // g1 genomic element type: uses m1 for all mutations
    initializeGenomicElementType("g1", m1, 1.0);
    for (element in positions)
        initializeGenomicElement(g1, element, element);
    initializeRecombinationRate(1.15e-5);
    initializeTreeSeq();
}

1 {
    sim.readFromPopulationFile('neutr.trees');
    //size in each deme 1000
    //sim.addSubpop("p1", 1000);
    //sim.addSubpop("p2", 1000);
    p0.setMigrationRates(p1, migration);
    p1.setMigrationRates(p0, migration);
    }

1: late(){
    inds = sim.subpopulations.individuals;
    inds.tagF = inds.sumOfMutationsOfType(m1);
        }

//make QTLs intrinsically neutral
fitness(m1){
    return 1.0;
        }
//calculate fitness based on sum of effect size of all loci
//selection to different optima (2.0 and -2.0)

1: fitness(NULL, p0){
    return 1 + exp(-((individual.tagF - optimum1)^2)/(2*sigma^2));
        }
1: fitness(NULL, p1){
    return 1 + exp(-((individual.tagF - optimum2)^2)/(2*sigma^2));
        }
1000 late() {
    sim.treeSeqOutput('loc_adap.trees');
    sim.simulationFinished();}
---------------------

I think I annotated the msprime treeSequence properly, so I can't figure out what I did wrong.

Any help would be greatly appreciated.

Kind regards,

gertjan

Ben Haller

unread,

Mar 15, 2019, 10:53:31 AM3/15/19

to slim-discuss

Hi Gertjan. When you run your SLiM model, you should be getting a warning that says:

#WARNING (SLiMSim::ExecuteMethod_readFromPopulationFile): readFromPopulationFile() should probably not be called from an early() event in a WF model; fitness values will not be recalculated prior to offspring generation unless recalculateFitness() is called.

When loading population state from a .trees file, this warning should perhaps be a fatal error instead, because you will inevitably get the error that you are seeing the first time you try to simplify the tree sequence. The reason is that you load the population in a 1 early() event, which means that the loaded population is recorded in the tree-sequence recording tables for generation 1, and then offspring generation occurs in generation 1, creating a batch of offspring individuals that are also recorded in generation 1. The tree-sequence recording code – specifically, tree-sequence simplification – has a problem with offspring that are marked with the same generation timestamp as their parents, and so you get an error.

I thought Peter and I had found a fix for this issue (slightly tweaking the recorded time for the loaded population to be 0.9999 or something instead of 1), since it's a bit annoying, but apparently our fix is not working, at least in some cases. I've made a note to look into it. In any case, your 1 early() event should probably be a 1 late() event anyway to be technically correct, and making that change seems to fix the problem, for me. That way the loaded population is marked as generation 1, and then the first generation of offspring occurs in generation 2, so the recorded tree-sequence is happy. WF models should generally load population state in a late() event for consistency.

Incidentally, your model spends a lot of time simplifying; if that bothers you, and if you can afford a higher peak memory usage, you may want to simplify less often. You can do that by adding "simplificationRatio=INF" to your initializeTreeSeq() call (telling SLiM not to auto-simplify the tree sequence at all), and then adding this to your 1: late() event:

if (sim.generation % 200 == 0)

sim.treeSeqSimplify();

The interval of 200 can be tweaked; the less often you simplify, the higher your peak memory usage will be. Even if you don't care about memory at all, a bigger number is not necessarily better; there is usually some intermediate optimum simplification interval at which the model runs fastest, and intervals beyond that optimum will actually run more slowly because each simplification will take so much longer that it will more than compensate for the decreased frequency of simplification. So the best number is generally found by trial and error. Anyway, maybe your runtimes are short enough that it's not a problem, but I thought I'd mention it since I saw the model spending a lot of time simplifying. :->

Cheers,

-B.

Benjamin C. Haller

Messer Lab

Cornell University

Peter Ralph

unread,

Mar 15, 2019, 1:00:38 PM3/15/19

to Ben Haller, slim-discuss

Thanks for the report! The procedure for starting a SLiM simulation from msprime has a lot of wrinkles to be ironed out, like this one. If you would like to start an issue on github (in the pyslim package), that would be useful.

But, since you do not actually do anything in SLiM with the prior history (right?), there's no need to do it this way - you can simulate in SLiM without prior history, then add it on afterwards with .recapitate(). There are many fewer gotchas with that method. That's what I'd recommend doing:

https://github.com/tskit-dev/pyslim#quickstart-finishing-an-uncoalesced-slim-simulation

--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To post to this group, send email to slim-d...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/b84607be-b125-4127-82e6-4c143a1c26c2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter Ralph

unread,

Mar 15, 2019, 1:23:49 PM3/15/19

to Ben Haller, slim-discuss

Ben reminded me that you said that recapitation was taking a long time. My guess is that this is because you are trying to recapitate but with no migration between the subpopulations. You should be passing pretty much all the same arguments to .recapitate( ) as to .simulate( ). I could help debug if you post the recapitation code.

Reply all

Reply to author

Forward