Re: strange results from allele_frequencies that depend on call to addNewMutation

33 views
Skip to first unread message
Message has been deleted

Ben Haller

unread,
Jun 14, 2022, 9:28:05 PM6/14/22
to slim-discuss
Hi Jeff.  I think the issue must be with your Python script, in some way; everything seems to be working fine on the SLiM side.  I'm not very pythonic, so I'll let Peter field this one.  :->

Just as an aside – and maybe these are just side effects of simplifying the model for posting here – you define a constant "seed" that is not actually used to set the RNG seed, and you add the new mutations is an early() event which will not work well if their fitness is not neutral, so those seem like possible bugs worth mentioning.  :->  But the frequencies ought to be 0.5 as you say, or close anyhow (I think there's a generation of drift before you write, maybe, and migration in SLiM WF models is stochastic so p1 will not alway be exactly 50% immigrants from p2).

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

On Monday, June 13, 2022 at 6:44:37 PM UTC-7 Jeffrey Groh wrote:
Hi slim team,

I'm attempting a vectorized call to addNewMutation, and confused why the code this appears to be impacting allele frequencies at a fixed number of loci. Similar to recipe 14.7 I introduce marker mutations to genomes of one population to track ancestry in a recipient population. The difference in my code from what I can tell is just that these loci don't cover every position of the genome, but there are 100 of them evenly spaced along the chromosome.

Here is my slim script:

/////////////
initialize() {
defineConstant("outPath", "/Users/jeff/workspace/selection-against-introgression/theory_and_simulations/results/hg1_neutral/");
initializeTreeSeq();
defineConstant("seed", 0);
defineConstant("N1", 100);
defineConstant("N2", 100);
defineConstant("alpha", 0.5);

defineConstant("L", 1e8);
initializeMutationRate(0);
initializeMutationType("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", m1, 1.0);
initializeGenomicElement(g1, 0, L-1);
initializeRecombinationRate(1e-8);
}

1 {
    sim.addSubpop("p1", N1);
    sim.addSubpop("p2", N2);
// the donor population carries derived mutations at 100 loci
    p2.individuals.genomes.addNewMutation(m1, 0, asInteger(seq(0, L-1, by = L/100)));
    p1.setMigrationRates(p2, 0.5);
}

1 late() {p2.setSubpopulationSize(0);}
3 late() {
   sim.treeSeqOutput(outPath + "/N" + N1 + "_replicate" + seed + "_F2.trees");
}
////////////////

Below is the code I'm using to get allele frequencies at the 100 loci where I introduced mutations:


####

import msprime, pyslim
import tskit
import numpy as np
import pandas as pd
import sys
import os

def allele_frequencies(ts, sample_sets=None):
    if sample_sets is None:
       sample_sets = [ts.samples()]
    n = np.array([len(x) for x in sample_sets])
    def f(x):
       return x / n
    return ts.sample_count_stat(sample_sets, f, len(sample_sets), windows='sites', polarised=True, mode='site', strict=False)

# Load the .trees file
ts = tskit.load("/Users/jeff/workspace/selection-against-introgression/theory_and_simulations/results/hg1_neutral/N100_replicate0_F2.trees")

sset = ts.samples(1, time = 0)
allset= [sset]
af = allele_frequencies(ts, allset)
print(af)

#####

I expect the frequency at these loci to be ~0.5. But what I'm noticing is that the the values are off by 8 orders of magnitude. I have figured out this error is directly related to the parameter 'L' i.e. number of total loci. For instance, if I make L to be 100, I get the allele frequencies I expect. I suspect there is some error in my call to addNewMutations in how it depends on L, but am not sure why it's wrong. I'm also confused because if I run the model in the GUI, regardless of the value of L I can directly see that the allele frequencies are ~0.5 as expected. Any insights appreciated!

Thanks,
Jeff


Reply all
Reply to author
Forward
0 new messages