Hi Scott. Several issues here:
- It sounds like maybe you don't need a VCF file specifically; you write that "I'm using outputVCFsample since it's very important to sample individuals and not genotypes". You can use the lower-level Genome methods output() or outputMS() to output any vector of genomes you wish; you can easily arrange that the vector of genomes consists of pairs of genomes belonging to individuals, for example. If you draw a sample of individuals from a subpopulation with, say, "i_sample = sample(p1.individuals, 10, replace=F);", you can then get the genomes for those individuals with i_sample.genomes (arranged as the first and second genomes of the first individual, then the first and second genomes of the second individual, etc.; this is guaranteed by how Eidos handles vectorized property access, see the Eidos manual), and output them with i_sample.genomes.output(), for example. This sort of flexibility in the sample to be output is the reason for the existence of the low-level methods on Genome.
- It sounds like you want to output a sample from each of several subpopulations and you're concerned about matching up the information in those samples. It might be easier, perhaps, for you to output everything in one go. Again, the lower-level methods on Genome will allow you to do that, since you can call them on any vector of genomes. You could draw samples of individuals from each subpop with sample(), glom them together into one vector with c(), and then call sample_vector.genomes.output() – or outputMS(), or outputVCF() – on that one vector. If you are sampling a fixed number of individuals from each subpop, then you already know which subpop each genome comes from, in that glommed-together output; if not, you might want to output the subpopulation indices for the sampled genomes separately.
- To answer your original question directly: whether you are outputting in SLiM format, MS format, or VCF format, all segregating mutations will be included in the output even if they are fixed within the subpopulation or within the sample being output. SLiM only considers mutations "fixed" if they are fixed population-wide; it does not generally check for, or care about, fixation within smaller demographic units. (For MS output specifically, a "filterMonomorphic" flag is supported that allows mutations that are present in every sampled genome to be omitted from the output (see the doc); that option is not provided for the other output methods, only for MS output, and it is false by default.) So there should be no need to do the matching up with outputMutations() that you describe, I think. If mutations have fixed population-wide and have been replaced by Substitution objects, on the other hand, they will not be included in the output of any of these methods; outputFixedMutations() must be used to get them.
I hope this helps; let me know if you have further questions. Also, if you haven't noticed it already, section 13.8 in the current SLiM manual shows a recipe for using SLiM inside an ABC estimation procedure. I don't know very much about such matters, however; if you have suggested improvements for that recipe, they would be most welcome. Good luck!
Cheers,
-B.
Benjamin C. Haller
Messer Lab
Cornell University