// Read in matrix of Amino Acid fitnesses and then format as matrix in SLiM
// The text file must be formatted with a header of Amino Acids where the left to right order matches SLiM amino acid integer IDs from 0-20
// In single-letter AA IDs these are "X A R N D C Q E G H I L K M F P S T W Y V" where X is stop codon
// Rows then should correspond to integer amino acid positions in the given sequence, of course matching intended sequence length
// Cells are filled with selection coefficients for a given amino acid at a given position in sequence
defineConstant('AAFile', readFile(paste(FIT, '.fl', sep = '')));
defineGlobal('AAMatrix', sapply(AAFile, "strsplit(applyValue, sep = ',');", simplify = "matrix"));
// make all stop codons have fitness of 0
AAMatrix[2, 1:L] = 0;
// define fitness matrix
defineConstant('AAFitnesses', asFloat(AAMatrix[2:22, 1:L]));
// Take the mean fitness of all cells in the matrix except stop codons, which have 0
meanFitness = mean(AAFitnesses);
// take the natural log of this value and multiply by the length of sequence to generate a normalizing factor for fitness
defineGlobal('scaleFitness', log(meanFitness) * L);
// define log of all fitnesses and the range of all sites along the genome
defineConstant('logFitnesses', log(AAFitnesses));
defineConstant('genRange', 0:(L - 1));
Then I have a fitness effect like this
fitnessEffect() {
// Loop through all genomes and define nucleotide, codon and amino acid sequences for comparison to previously defined constants
// Define amino acid sequence for both haplotypes of the given individual
AAs1 = codonsToAminoAcids(individual.haploidGenome1.nucleotides(format = 'codon'), long = 0);
AAs2 = codonsToAminoAcids(individual.haploidGenome2.nucleotides(format = 'codon'), long = 0);
return exp(sum(logFitnesses[cbind(AAs1, genRange)] + logFitnesses[cbind(AAs2, genRange)]) - scaleFitness);
}
This is much cleaner overall than the mess I made in my previous implementation, which I showed in one of my comments in the above thread. However, it does run a good bit slower. Doing a simulation of a kind of near-neutral amino acid landscape, 100 individuals up to 1000 generations with rec rate 1e-8, the current approach has it so the fitness calculation can consistently take up about 84% of the sim time (using the runtime test option in SLiMGui) whereas my former messy approach was about 75%. If recombination rate is raised (as we do in the study I'm using this for) that becomes the limiting factor really, so this isn't the worst thing but the fitness could still make a difference of a few hours in practice given some other params I'm interested in. I figure that a key part is doing as little within the fitnessEffect as possible, and I've tried to get everything in initialization when possible, so I figure there must be a replacement for the two cbinds at the end but I'm not sure. Or is there a better way to get the amino acids out of individual haploid genomes? The log scaling stuff is necessary because individuals can have very low fitnesses with the multiplicative scheme I've implemented, which seems pretty well-grounded in theoretical pop gen stuff. I'll probably stick with this approach over my last one anyways for better user-readability even if I can't cut down on time, but figured it's worth asking now before I go down a rabbit-hole like that one!
Cheers,
Nick
--
SLiM forward genetic simulation: http://messerlab.org/slim/
Before posting, please read http://benhaller.com/slim/bugs_and_questions.html
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/02a9f684-4192-4d33-80de-978241725787n%40googlegroups.com.
initialize() {
setSeed(6479552501706741767);
mu = 1e-8;
defineConstant('L', 100);
defineConstant('N', 100);
r = 1e-8;
FIT = "AA_Fitness_Landscape";
initializeSLiMOptions(nucleotideBased = T);
defineConstant('mutMatrix', mmKimura(mu/2, mu/4));
defineConstant('end', L * 3 - 1);
initializeMutationTypeNuc("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", m1, 1, mutMatrix);
initializeGenomicElement(g1, 0, end);
initializeRecombinationRate(r);
defineConstant('CodonFile', readFile(paste(FIT, '.sd', sep = '')));
defineGlobal('CodonMatrix', sapply(CodonFile, "strsplit(applyValue, sep = ',');", simplify = "matrix"));
for (pos in 1:L) {
codonVector = c(CodonMatrix[2:65,pos]);
negCodonVector = c(CodonMatrix[2:65,pos] < 0);
codonVector[negCodonVector] = 0;
CodonMatrix[2:65,pos] = codonVector;
}
// redefine stop codons to all be 0
CodonMatrix[50, 1:L] = 0;
CodonMatrix[52, 1:L] = 0;
CodonMatrix[58, 1:L] = 0;
defineGlobal('AncestralSequence', NULL);
for (pos in 1:L) {
defineGlobal('AncestralSequence', paste(AncestralSequence, sample(CodonMatrix[2:65, 0], 1, weights = asFloat(CodonMatrix[2:65,pos])), sep = ''));
}
initializeAncestralNucleotides(AncestralSequence);
// Read in matrix of Amino Acid fitnesses and then format as matrix in SLiM
// The text file must be formatted with a header of Amino Acids where the left to right order matches SLiM amino acid integer IDs from 0-20
// In single-letter AA IDs these are "X A R N D C Q E G H I L K M F P S T W Y V" where X is stop codon
// Rows then should correspond to integer amino acid positions in the given sequence, of course matching intended sequence length
// Cells are filled with selection coefficients for a given amino acid at a given position in sequence
defineConstant('AAFile', readFile(paste(FIT, '.fl', sep = '')));
defineGlobal('AAMatrix', sapply(AAFile, "strsplit(applyValue, sep = ',');", simplify = "matrix"));
// make all stop codons have fitness of 0
AAMatrix[2, 1:L] = 0;
// define fitness matrix
defineConstant('AAFitnesses', asFloat(AAMatrix[2:22, 1:L]));
defineConstant('logFitnesses', log(AAFitnesses));
defineConstant('genomeRange', 0:(L - 1));
// Take the mean fitness of all cells in the matrix except stop codons, which have 0
meanFitness = mean(AAFitnesses[1:20, ]);
// take the natural log of this value and multiply by the length of sequence to generate a normalizing factor for fitness
defineGlobal('scaleFitness', log(meanFitness) * L);
}
fitnessEffect() {
// Loop through all genomes and define nucleotide, codon and amino acid sequences for comparison to previously defined constants
// Define amino acid sequence for both haplotypes of the given individual
AAs1 = codonsToAminoAcids(individual.haploidGenome1.nucleotides(format = 'codon'), long = 0);
AAs2 = codonsToAminoAcids(individual.haploidGenome2.nucleotides(format = 'codon'), long = 0);
// subset logFitnesses by cbound matrices of AAs and range of genome and exponentiate after applying scale factor
print(exp(sum(logFitnesses[cbind(AAs1, genomeRange)] + logFitnesses[cbind(AAs2, genomeRange)]) - scaleFitness));
return exp(sum(logFitnesses[cbind(AAs1, genomeRange)] + logFitnesses[cbind(AAs2, genomeRange)]) - scaleFitness);
}
// create a population of 100 individuals
1 early() {
sim.addSubpop("p1", N);
}
// output samples of 10 haplosomes periodically, all fixed mutations at end
1000 late() { print('done'); }
initialize() {
setSeed(6479552501706741767);
mu = 1e-8;
defineConstant('L', 100);
defineConstant('N', 100);
r = 1e-8;
FIT = "AA_Fitness_Landscape";
initializeSLiMOptions(nucleotideBased = T);
defineConstant('mutMatrix', mmKimura(mu/2, mu/4));
defineConstant('end', L * 3 - 1);
initializeMutationTypeNuc("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", m1, 1, mutMatrix);
initializeGenomicElement(g1, 0, end);
initializeRecombinationRate(r);
defineConstant('CodonFile', readFile(paste(FIT, '.sd', sep = '')));
defineGlobal('CodonMatrix', sapply(CodonFile, "strsplit(applyValue, sep = ',');", simplify = "matrix"));
for (pos in 1:L) {
codonVector = c(CodonMatrix[2:65,pos]);
negCodonVector = c(CodonMatrix[2:65,pos] < 0);
codonVector[negCodonVector] = 0;
CodonMatrix[2:65,pos] = codonVector;
}
// redefine stop codons to all be 0
CodonMatrix[50, 1:L] = 0;
CodonMatrix[52, 1:L] = 0;
CodonMatrix[58, 1:L] = 0;
defineGlobal('AncestralSequence', NULL);
for (pos in 1:L) {
defineGlobal('AncestralSequence', paste(AncestralSequence, sample(CodonMatrix[2:65, 0], 1, weights = asFloat(CodonMatrix[2:65,pos])), sep = ''));
}
initializeAncestralNucleotides(AncestralSequence);
// Read in matrix of Amino Acid fitnesses and then format as matrix in SLiM
// The text file must be formatted with a header of Amino Acids where the left to right order matches SLiM amino acid integer IDs from 0-20
// In single-letter AA IDs these are "X A R N D C Q E G H I L K M F P S T W Y V" where X is stop codon
// Rows then should correspond to integer amino acid positions in the given sequence, of course matching intended sequence length
// Cells are filled with selection coefficients for a given amino acid at a given position in sequence
defineConstant('AAFile', readFile(paste(FIT, '.fl', sep = '')));
defineGlobal('AAMatrix', sapply(AAFile, "strsplit(applyValue, sep = ',');", simplify = "matrix"));
// make all stop codons have fitness of 0
AAMatrix[2, 1:L] = 0;
// define fitness matrix
defineConstant('AAFitnesses', asFloat(AAMatrix[2:22, 1:L]));
defineConstant('logFitnesses', log(AAFitnesses));
defineConstant('genomeRange', rep(0:(L - 1),2));
// Take the mean fitness of all cells in the matrix except stop codons, which have 0
meanFitness = mean(AAFitnesses[1:20, ]);
// take the natural log of this value and multiply by the length of sequence to generate a normalizing factor for fitness
defineGlobal('scaleFitness', log(meanFitness) * L);
}
// create a population of 100 individuals
1 early() {
sim.addSubpop("p1", N);
}
// output samples of 10 haplosomes periodically, all fixed mutations at end
1:1000 late() {
x = sapply(p1.individuals, "sum(logFitnesses[cbind(codonsToAminoAcids(applyValue.haplosomes.nucleotides(format='codon'), long = 0), genomeRange)]);");
print(exp(x - scaleFitness));
p1.individuals.fitnessScaling = exp(x - scaleFitness);
}
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/0088b7dd-12f4-4337-ab73-801bbfaa528bn%40googlegroups.com.
defineGlobal('AAFitnesses', NULL);
for (pos in 1:L) {
defineGlobal('AAFitnesses', c(AAFitnesses, asFloat(AAMatrix[2:22,pos])));
}
defineConstant('logFitnesses', log(AAFitnesses));
// Take the mean fitness of all cells in the matrix except stop codons, which have 0
meanFitness = mean(asFloat(AAMatrix[3:22, 1:L]));
// take the natural log of this value and multiply by the length of sequence to generate a normalizing factor for fitness
defineGlobal('scaleFitness', log(meanFitness) * L);
// Given that length L is a 1-indexed position, the vector 1:L must be subtracted by 1
// Then this value is multipled by 21 (the number of amino acids SLiM defines), and added to the AA value in the fitnessScaling below returns the correct selection coefficient position in the logFitnesses vector
// this must be replicated twice for both haplotypes in an individual
defineConstant('AAIndex', rep((1:L - 1) * 21, 2));
}
// create a population of 100 individuals
1 early() {
sim.addSubpop("p1", N);
}
1:1000 late() {
// Given that logFitnesses for an individuals is 0-indexed vector of all the amino acids (in SLiM integer IDs) in the individual, they will correspond to the defined vector order
// For example, the selection coefficient of Arginine (Long ID, Arg; Short ID, R; Integer ID, 2) at position 5 can be obtained like so:
// AA = 2, Pos = 5, so 2 + (5 - 1) * 21 = 86. Therefore 86 is the correct index in the vector for this selection coefficient
// All values obtained this way will form a vector that is converted to a float, all values in vector are added to 1, which obtains fitness values for all amino acids separately, then the product works as a multiplicative function to produce fitness for a given individual
AAs = sapply(p1.individuals, "sum(logFitnesses[codonsToAminoAcids(applyValue.haplosomes.nucleotides(format='codon'), long = 0) + AAIndex]);");
p1.individuals.fitnessScaling = exp(AAs - scaleFitness);
}
I'm also attaching two images of something like the first minimal script I posted here to see relatively how slow codonsToAminoAcids is, with the same parameters as above (e.g. pop size = 100 and sequence length = 300 nucleotides etc.). The first image implements the fitness effect separate from the codonsToAminoAcids work and the second image just returns fitness of 1 to see how that function does in relative isolation. Definitely seems like it can make a difference in isolation but otherwise I don't know how best to test that kind of thing.
Cheers,
Nick
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/9910cd53-f302-4174-b52e-8949639c0c1cn%40googlegroups.com.
defineGlobal('AAFitnesses', NULL);
for (pos in 1:L) {
defineGlobal('AAFitnesses', c(AAFitnesses, asFloat(AAMatrix[2:22,pos])));
}
 	defineConstant('logFitnesses', log(AAFitnesses));
can now (with SLiM 5.1) just be
AAFitnesses = asVector(asFloat(AAMatrix[2:22,1:L]));