Hi Chase. Interesting puzzles. :->
Here's a model that I think solves these issues for you:
initialize() {
setSeed(6);
defineConstant("L", 1e5);
defineConstant("ACGT", c("A", "C", "G", "T"));
initializeSLiMOptions(nucleotideBased=T);
initializeAncestralNucleotides(randomNucleotides(L));
initializeMutationTypeNuc("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", m1, 1.0, mmJukesCantor(1e-7));
initializeGenomicElement(g1, 0, L-1);
initializeRecombinationRate(1e-8);
}
1 {
sim.addSubpop("p1", 250);
sim.addSubpop("p2", 250);
p1.setMigrationRates(p2, 0.01);
p2.setMigrationRates(p1, 0.01);
}
mutation() {
mut.setValue("O", originalNuc);
return T;
}
5000 late() {
// sort mutations by position
muts = sim.mutations;
ord = order(muts.position);
muts = muts[ord];
pos = muts.position;
catn("segregating mutations:");
for (mut in muts)
catn(" " + mut.position + ": " + ACGT[mut.getValue("O")]
+ " -> " + mut.nucleotide);
catn();
catn("fixed mutations:");
for (sub in sim.substitutions)
catn(" " + sub.position + ": " + ACGT[sub.getValue("O")]
+ " -> " + sub.nucleotide);
catn("positions with two segregating mutations:");
pos = sort(muts.position);
pos_len = length(pos);
pos_dup = (pos[0:(pos_len - 2)] == pos[1:(pos_len - 1)]);
dups = pos[0:(pos_len - 2)][pos_dup];
for (dup in dups)
{
dup_muts = muts[muts.position == dup];
catn(" position " + dup + " (" + size(dup_muts) + "
mutations):");
for (subpop in sim.subpopulations)
{
p1g = subpop.individuals.genomes;
p_nucs = p1g.nucleotides(dup, dup, "integer");
p_counts = nucleotideCounts(p_nucs);
catn(" p" +
subpop.id + ": " + paste(ACGT + "=" +
p_counts));
}
}
}
The mutation() callback simply saves originalNuc in a key named "O" on
the mutation. If you're not using the tag property, saving it in tag
would be faster, but setValue() works even if you're using tag for other
purposes. Since originalNuc is an integer 0/1/2/3, I define a vector
named ACGT that can be used to look up the corresponding nucleotide
letter for output.
The output event demonstrates several things. First it sorts all the
segregating mutations by position, which is nice both for outputting
them, and is used later to find duplicate positions. Then it outputs
those sorted mutations, using the saved value in "O" to show what the
mutation was a mutation from/to. Then it prints out the fixed
mutations, which is the same (didn't bother sorting them), but shows
that the "O" key-value pair gets carried over from the Mutation to the
Substitution when substitution occurs. Finally, it finds positions that
have more than one mutation segregating (sorting and then comparing the
vector to itself shifted over 1 is a good fast way to find duplicates),
and loops over those positions, and prints the nucleotide counts for
each such position in each subpopulation. The code there does
vectorize, so it's reasonably fast. Probably the way it looks up all
the genomes for each subpopulation, with "p1g =
subpop.individuals.genomes", for each duplicate, over and over, is a
significant time sink, but I didn't bother profiling the model in
SLiMgui; if that does turn out to be slow, it could be moved outside the
loop with a little more work, particularly if you know ahead of time
what subpopulations exist (rather than this code that uses
sim.subpopulations to be general). The ACGT vector is useful for this
output formatting, too.
The setSeed(6) value is chosen because it results in two duplicate
positions, one of which has two segregating mutations to different
nucleotides (separate mutational lineages, and genetically different),
the other of which has two segregating mutations to *different*
nucleotides (separate mutational lineages, but genetically identical):
positions with two segregating mutations:
position 62376 (2 mutations):
p1: A=480 C=7 G=0 T=13
p2: A=491 C=0 G=0 T=9
position 70918 (2 mutations):
p1: A=0 C=35 G=0 T=465
p2: A=0 C=19 G=0 T=481
Earlier in the output you can easily find the mutations in question,
since they are output in sorted order:
62376: A -> C
62376: A -> T
...
70918: T -> C
70918: T -> C
Of course you could also output the generation the mutations occurred,
etc. I hope this helps! I believe this code is correct, but of course
always check and validate code that you get off of the internet.
;-> Happy modeling!
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University
Chase W. Nelson wrote on 11/22/21 2:15 PM: