Hi slim team,I'm attempting a vectorized call to addNewMutation, and confused why the code this appears to be impacting allele frequencies at a fixed number of loci. Similar to recipe 14.7 I introduce marker mutations to genomes of one population to track ancestry in a recipient population. The difference in my code from what I can tell is just that these loci don't cover every position of the genome, but there are 100 of them evenly spaced along the chromosome.Here is my slim script:
/////////////
initialize() {
defineConstant("outPath", "/Users/jeff/workspace/selection-against-introgression/theory_and_simulations/results/hg1_neutral/");
initializeTreeSeq();
defineConstant("seed", 0);
defineConstant("N1", 100);
defineConstant("N2", 100);
defineConstant("alpha", 0.5);
defineConstant("L", 1e8);
initializeMutationRate(0);
initializeMutationType("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", m1, 1.0);
initializeGenomicElement(g1, 0, L-1);
initializeRecombinationRate(1e-8);
}
1 {
sim.addSubpop("p1", N1);
sim.addSubpop("p2", N2);
// the donor population carries derived mutations at 100 loci
p2.individuals.genomes.addNewMutation(m1, 0, asInteger(seq(0, L-1, by = L/100)));
p1.setMigrationRates(p2, 0.5);
}1 late() {p2.setSubpopulationSize(0);}3 late() {
sim.treeSeqOutput(outPath + "/N" + N1 + "_replicate" + seed + "_F2.trees");}
////////////////Below is the code I'm using to get allele frequencies at the 100 loci where I introduced mutations:
####
import msprime, pyslim
import tskit
import numpy as np
import pandas as pd
import sys
import os
def allele_frequencies(ts, sample_sets=None):
if sample_sets is None:
sample_sets = [ts.samples()]
n = np.array([len(x) for x in sample_sets])
def f(x):
return x / n
return ts.sample_count_stat(sample_sets, f, len(sample_sets), windows='sites', polarised=True, mode='site', strict=False)
# Load the .trees file
ts = tskit.load("/Users/jeff/workspace/selection-against-introgression/theory_and_simulations/results/hg1_neutral/N100_replicate0_F2.trees")
sset = ts.samples(1, time = 0)
allset= [sset]
af = allele_frequencies(ts, allset)
print(af)#####
I expect the frequency at these loci to be ~0.5. But what I'm noticing is that the the values are off by 8 orders of magnitude. I have figured out this error is directly related to the parameter 'L' i.e. number of total loci. For instance, if I make L to be 100, I get the allele frequencies I expect. I suspect there is some error in my call to addNewMutations in how it depends on L, but am not sure why it's wrong. I'm also confused because if I run the model in the GUI, regardless of the value of L I can directly see that the allele frequencies are ~0.5 as expected. Any insights appreciated!Thanks,Jeff