I'm currently researching the fluctuations of allele frequencies during polygenic adaptation, with a focus on varying selection coefficients across different loci. My objective is for all loci to be biallelic, akin to SNPs, at the population level. Thanks to prior guidance from this mailing list, I've successfully made all sites biallelic. However, I'm now facing challenges when attempting to initiate simulations with allele frequencies similar to those that would be expected under neutrality.
To sidestep the burn-in phase, I tried using recapitation with PySLIM. But, as far as I have been able to find, PySLIM recapitation results in multiple alleles for each locus — a scenario I'd like to avoid. When I tried to retain only one mutation per site (recapitating first and removing generated mutations from the tree afterwards), I ran into a slew of issues concerning metadata and tree sequence outputs. After numerous days of unsuccessful attempts, I've decided to forgo using PySLIM for this purpose.
It's worth noting that I'm not aiming for the initial population to be in perfect mutation-drift equilibrium. Given the biallelic nature of the sites in my model, a rough approximation of the allele frequency distribution under neutrality would work well enough. This, in theory, roughly aligns with a negative exponential distribution. Here's a bit from my code wherein I tried to introduce mutations in SLiM, drawing frequencies from this distribution:
for (locus in seqLen(ChromosomeLength)) {
    derived_allele_freq = rexp(1);
    num_derived = asInteger(round(derived_allele_freq * NePop));
    num_derived = max(min(num_derived, NePop), 0);
    if (num_derived > 0) {
        derived_allele_individuals = sample(p1.individuals, num_derived);
        derived_allele_individuals.genomes.addNewMutation(mutation_type, locus);
    }
}
Unfortunately, the code doesn't produce the desired result, whether I use addNewMutation() or addNewDrawnMutation(). I'm seeking guidance on how to introduce these initial allele frequencies in SLiM without recapitation. It's vital for my model that every locus remains biallelic within the population, regardless of whether they are IBS or IBD. If there's a feasible way to recapitate and still ensure biallelic status for all sites, I'd welcome that solution as well.
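As a quick check outside SLiM, here is a minimal Python sketch that mimics the arithmetic of the loop above. It assumes that rexp(1) draws from an exponential with the default mean of 1; N_IND, N_LOCI, draw_num_derived and summarize are hypothetical names used only for this illustration, and nothing here calls SLiM itself.

```python
import math
import random

N_IND = 1000    # hypothetical stand-in for NePop (number of individuals)
N_LOCI = 20000  # hypothetical number of loci to mimic


def draw_num_derived(rng, n_ind):
    """Mimic the loop body: exponential draw (mean 1), scale, round, clamp."""
    freq = rng.expovariate(1.0)       # assumption: same mean as rexp(1)
    num = int(round(freq * n_ind))    # scale the "frequency" to a count
    return max(min(num, n_ind), 0)    # clamp to [0, n_ind]


def summarize(n_loci=N_LOCI, n_ind=N_IND, seed=1):
    """Return the fraction of loci clamped to n_ind and rounded to zero."""
    rng = random.Random(seed)
    counts = [draw_num_derived(rng, n_ind) for _ in range(n_loci)]
    fixed = sum(c == n_ind for c in counts) / n_loci
    lost = sum(c == 0 for c in counts) / n_loci
    return fixed, lost


fixed, lost = summarize()
print(f"clamped to the maximum: {fixed:.3f} (exp(-1) = {math.exp(-1):.3f})")
print(f"rounded to zero:        {lost:.4f}")
```

Under that assumption, a bit more than a third of the draws exceed 1 and are clamped to the maximum count, so a large share of loci would start with every sampled individual carrying the mutation. Two further points may be worth checking against the SLiM manual: sampling from p1.individuals and then taking .genomes would place the mutation on both genomes of each carrier, and addNewMutation() appears to expect a selection coefficient between the mutation type and the position.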
Thank you for your unwavering support and the invaluable insights this mailing list consistently offers. It is honestly incredible.
Best regards,
Francesc
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/CAH8u8PrCxw49bMmB6AFfi%3DnaZajGD5GkiSR2_WzhHTS8KSpTVg%40mail.gmail.com.
I apologize for any confusion in my earlier post. It can indeed be challenging to explain complex ideas concisely in text. In a nutshell, I work on the simulation of polygenic selection on standing variation in humans. Specifically, I am interested in understanding the dynamics of allele frequencies based on their initial state and selection coefficients.
I think I have finally managed to make it work, but I would highly appreciate your opinion to see if there is something wrong with my approach that may distort my analysis down the line…
The model I aim for consists of:
· Population Initialization: A single population is created with L independent, biallelic sites. The recombination rate is set at 0.5. Each site will essentially have one mutation at the population level — no mutation in an individual's genome represents the "ancestral" allele, while a mutation represents the "derived" allele.
· Randomized Selection Coefficients: Mutations are assigned random selection coefficients at each independent site. The first half of the chromosome carries 'm1' mutations, which are predominantly neutral QTLs (with minor variations in s values around 0). On the other hand, 'm2' mutations have significantly larger s values.
· Derived Allele Initialization: Derived alleles or mutations start at random frequencies. This initial distribution mirrors what we'd expect at mutation-drift equilibrium under neutrality. Given the independence of loci, there's no need for burn-in or recapitation — which, as previously discussed, posed challenges when enforcing loci to be biallelic using PySLIM. Instead, I've opted for assigning initial allele frequencies based on the anticipated distribution.
· Frequency Allocation: Each genome site is assigned a random allele frequency from a negative exponential distribution. This isn't the actual expected distribution but serves as a placeholder for testing. For instance, if we draw a 0.4 frequency for site A, we would randomly allocate the mutation to 40% of the population's genomes at that site, considering its corresponding selection coefficient. This allocation is replicated across all L sites.
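For the frequency allocation step, one candidate replacement for the placeholder distribution is the standard neutral expectation for a constant-size population, where the number of sites with i derived copies among n genomes is proportional to 1/i. The Python sketch below draws derived counts from that distribution and picks carrier genomes at random. It is only an illustration under that assumption; N_GENOMES, N_LOCI and the function names are hypothetical, and the mapping of genome indices onto SLiM genomes is not shown.

```python
import random


def neutral_sfs_weights(n_genomes):
    """Weights proportional to 1/i for derived counts i = 1 .. n_genomes - 1."""
    return [1.0 / i for i in range(1, n_genomes)]


def draw_derived_counts(n_loci, n_genomes, rng):
    """Draw one derived-allele count per locus from the 1/i spectrum."""
    counts = range(1, n_genomes)  # segregating counts only: never 0 or n
    return rng.choices(counts, weights=neutral_sfs_weights(n_genomes), k=n_loci)


def allocate_genomes(count, n_genomes, rng):
    """Pick, without replacement, which genome indices carry the derived allele."""
    return rng.sample(range(n_genomes), count)


rng = random.Random(42)
N_GENOMES = 200  # hypothetical: 2N genomes for N = 100 diploids
N_LOCI = 1000    # hypothetical number of independent sites

counts = draw_derived_counts(N_LOCI, N_GENOMES, rng)
carriers = [allocate_genomes(c, N_GENOMES, rng) for c in counts]
```

Because each locus receives exactly one mutation placed on a chosen set of genomes, every site is biallelic and segregating at the start by construction. Sampling genomes rather than individuals also lets heterozygotes arise naturally.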
A simplified version of my attempted code looks like:
----------------------------
initialize() {
---------------------------
The interesting bit is in the 1 {} block.
I hope it is now much clearer what I am trying to do. Because I need to run a large number of simulations, and the independence of the loci allows me to simplify things, this approach is probably more useful than “classic” recapitation. Still, please let me know if you think it could be made more robust or improved to prevent mistakes down the line.
Thank you so much, and apologies again if it is still unclear,
Francesc