subpopulation/individual info on exported VCF?

font...@gmail.com

Apr 29, 2025, 11:04:24 AM
to slim-discuss
Hi Ben and all,

First, thanks for SLiM5!

Ben, I have a model with more than one subpopulation. I want to export a VCF file from SLiM and then convert it to a genlight object in R for downstream analyses.

In my previous code, I exported one VCF per subpopulation and combined them afterwards:
(something along these lines:)
if (sim.cycle > burnin)
{
    if (logclock == sampfr)
    {
        for (l in community.logFiles)
        {
            l.logRow();
        }
        for (b in sim.subpopulations)
        {
            b.genomes.outputVCF(paste0(directory, "/VCF_p" + b.id + "_" + idtag + "_g" + sim.cycle + ".vcf"), outputMultiallelics = F);
        }
        defineGlobal("logclock", 0);
    }
}

That worked, but it required some extra work afterwards to combine the files, make sure they had the same size, etc.

Now, as I am updating my code to SLiM5, I wonder if there is any way I can export a single VCF for individuals from all subpopulations and still be able to know who is who when I process the VCF later on.

I have tried the following:

sim.subpopulations.haplosomes.outputHaplosomesToVCF(filePath="/PATH/slim5/test.vcf", outputMultiallelics=F, simplifyNucleotides=T, groupAsIndividuals=T);


Which does produce a single vcf with all the individuals at the time of export (e.g. p1.individualCount = 81, p2.individualCount = 102, VCF has 183 individuals).
But I can't seem to find any information about who comes from where in the vcf/genlight later on.

In R, when I transform this test VCF into genlight and I ask for individual names, I get i0, i1, i2,...,i182 (total 183 individuals), with no subpop info.

Is it safe to assume that the first 81 individuals (i0 to i80) come from p1 and the following 102 individuals (i81 to i182) come from p2? Is that how the individuals in the VCF file produced by SLiM are ordered? If so, it is straightforward to figure out later who comes from where.

Is there also a way to append the subpop info to the individual name?

Thanks Ben!

Cheers

JP


Ben Haller

Apr 30, 2025, 2:59:04 AM
to font...@gmail.com, slim-discuss
Hi JP!

Actually, I think the new outputIndividualsToVCF() method should be exactly what you're looking for.  See section 28.3.2 for the output format, and section 8.3.7 for an example of its usage.

Regarding the question you ask, yes, the assumption you state would be safe to make.  For outputHaplosomesToVCF(), the haplosomes are output in the order in which they are provided.  However, using outputIndividualsToVCF() should be much cleaner than outputHaplosomesToVCF(), for your purposes.  Please ask if you hit any snags; you're perhaps the first person to be using these new APIs, so I'd like to hear if you experience any problems.  :->
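For anyone who stays with outputHaplosomesToVCF(), the ordering guarantee above can be turned into subpopulation labels with a little bookkeeping after export. Below is a minimal Python sketch, not part of SLiM. It assumes the per-subpopulation individual counts were recorded at the moment of export, that the haplosomes were supplied in subpopulation order (p1 first, then p2), and that the samples are named i0, i1, ... as reported in the question. The helper name label_samples is hypothetical; the counts 81 and 102 are the ones from the question.

```python
def label_samples(subpop_counts):
    """Map positional VCF sample names (i0, i1, ...) to subpopulation labels.

    subpop_counts is a list of (label, count) pairs in the same order in
    which the subpopulations were written to the VCF (an assumption that
    must match how the export call was made).
    """
    labels = {}
    index = 0
    for subpop, count in subpop_counts:
        for _ in range(count):
            labels[f"i{index}"] = subpop
            index += 1
    return labels


# Counts taken from the question: p1 had 81 individuals, p2 had 102.
labels = label_samples([("p1", 81), ("p2", 102)])
print(labels["i80"], labels["i81"])
```

The resulting dictionary can be written to a small table and joined to the sample names in whatever tool reads the VCF. This only works if the counts come from the same tick as the export, since subpopulation sizes can change between cycles.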

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


--
SLiM forward genetic simulation: http://messerlab.org/slim/

font...@gmail.com

May 1, 2025, 4:49:55 PM
to slim-discuss
Hi Ben,

Haha, glad to try new features! This one in particular (outputIndividualsToVCF) is kinda like you read my mind. It works well: I can see the subpopulation info in the VCF as well as in the individual names.

The output vcf however behaved differently for me in two instances outside of SLiM:

When reading it in R using vcfR, the ploidy information comes out empty. So when I imported the VCF with vcfR and then converted it to a genlight, I had to manually add the ploidy (2) to the object.

However, when importing with dartR, the ploidy info is there. Neither approach automatically sorted the subpopulation info into populations; I had to do that manually afterwards too.

I will keep you in the loop if I face any issues.

Thanks Ben!

Cheers

Ben Haller

May 2, 2025, 3:37:38 AM
to font...@gmail.com, slim-discuss
Hi JP!  Glad to hear the new outputIndividualsToVCF() method is a good fit for you!  :->  It's really the way output ought to have worked all along, but class Individual didn't even exist in SLiM back when the original output methods were designed; things have evolved quite a bit over the years!

I don't know what vcfR or dartR might be looking for that they're not finding.  The VCF standard itself is big and poorly described, and then there's no end of third-party additions and extensions to it (such as the VCF tags that SLiM itself adds!); it's rather a mess, frankly.  If you see a problem with SLiM's compliance with the official VCF spec itself, that would be a bug that I'd certainly want to fix; please file an issue on such problems.  If the problem is that SLiM is not writing out some optional VCF tag that a particular third-party package defines/uses, but that is not part of the official VCF spec itself, that wouldn't be a bug so much as a feature request, but I'd be happy to consider such extensions if they would be useful; again, please file an issue.  In either case, your issue should state exactly what the problem is and what the fix would look like, showing a specific example of the change to the VCF output format that you want.

Thanks, and happy modeling!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

