Hi all,
First of all, thank you so much for the fantastic work done with SLiM and the documentation!
Now, I just started to use SLiM and maybe this is a naive question since I am still a beginner, but I want to simulate a population of ~ 50k (or more) individuals with the same variants, similar LD patterns and MAF as in the 1KGP EUR population.
For that, what I did was to use a nucleotide-based model and used the HapMap recombination map and a filtered VCF from the 1KGP that include European samples and variants filtered by MAF>0.01. I simulated a neutral evolutionary scenario over several generations and then I produce an VCF as output. My idea was to obtain a proxy population with similar allele frequencies and LD patterns (not exactly the same, of course, but comparable).
The main issue is that this output VCF has a lot of related people, since I guess the small pool of 500 individuals make a limited source of haplotypes, and therefore the final samples are related, whereas what I need is a large set of unrelated individuals.
What I am asking may be impossible to obtain, but I don't know if there is a way to do that, like increasing the recombination rate at the expense of a lower LD, or including some mutation rate for the 1KGP variants.
Secondly, I may not understand the function outputVCFSample, but if my population has a sample size of 3k individuals, how is it possible for me to extract 10k individuals in the last generation? I though that setSubpopulationSize fixes the number of individuals in the offspring and therefore the final dataset should have at most 3k individuals, but I can extract a higher number of individuals.
Thank you in advance!
Ion Lerga
Script:initialize() {
initializeSLiMOptions(nucleotideBased=T);
length = initializeAncestralNucleotides("PATH/fasta_hg19/hs37d5_chr4.fa");
defineConstant("L", length);
initializeMutationTypeNuc("m1", 0.5, "f", 0.0);
initializeMutationTypeNuc("m2", 0.5, "f", 0.0);
m2.convertToSubstitution = F;
initializeGenomicElementType("g1", m1, 1.0, mmJukesCantor(0.0)); // mmJukesCantor(0.0))
initializeGenomicElement(g1, 0, L-1);
// HapMap recombination map
lines = readFile("PATH/haps_1kgp/genetic_map_chr4_combined_b37.txt");
lines = lines[1:(size(lines)-1)];
rates = NULL;
ends = NULL;
for (line in lines) {
components = strsplit(line, " ");
ends = c(ends, asInteger(components[0]));
rates = c(rates, asFloat(components[1]));
}
ends = c(ends[1:(size(ends)-1)] - 2, length-1);
rates = rates * 1e-8;
initializeRecombinationRate(rates, ends);
}
1 late() {
sim.addSubpop("p1", 503);
p1.genomes.readFromVCF("PATH/chr4.1kgp.eur.vcf", m1);
p1.setSubpopulationSize(1000);
}
2:50 late() {
catn(sim.generation);
}
// 2:50 {
// newSize = asInteger(p1.individualCount + 500);
// p1.setSubpopulationSize(newSize);
// }
50 late() { p1.outputVCFSample(500, filePath="PATH/Results/SLiM.simpop.4.vcf"); }