Discrepancies in variants between input and output VCFs


Tam Ta

Feb 24, 2026, 6:12:38 PM
to slim-discuss
Hello Dr. Haller,

I am using SLiM to simulate offspring from specified parent pairs and calculate genetic diversity metrics from the output VCF. I'm running into an issue where the input VCFs have 1,816,054 variants, but the output offspring VCF has only 727,752 variants. The output VCF of one of the parents also has fewer variants than its input. I thought this might be due to fixed alleles, so I included sim.outputFixedMutations(), but nothing is printed after the simulation. What else could explain the loss of variants? Please see the attached file for my code. Thank you in advance for your time!

Sincerely,
Tam Ta
offspring_sim

Peter Ralph

Feb 24, 2026, 7:27:52 PM
to slim-d...@googlegroups.com
Hello Tam - SLiM's VCF output includes only variants that are segregating in the sample being output, so variants that have been *lost* won't appear. Could this explain it? Also, different samples can have different sets of segregating sites, so a VCF output from a different set of individuals (e.g., one from the original generation) will in general report a different number of variant sites.

-peter


Ben Haller

Feb 24, 2026, 8:51:13 PM
to slim-d...@googlegroups.com
Hi Tam!

A variation on Peter's theme:

It looks like you're reading in a VCF for an individual in p1, and another VCF for an individual in p2, and then crossing those individuals to produce an offspring in p3 that you then output to VCF.  You should certainly not expect that offspring individual to contain all of the mutations from the p1 and p2 individuals; each parent will pass down only half of its genetic information, and by chance that might in fact be fewer than half of the segregating variants that the parent contains.
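A minimal Python sketch of this point, as a toy model rather than SLiM's actual machinery: it assumes each site is inherited independently (no linkage or recombination map), and the names `transmit` and `offspring_sites` are hypothetical helpers made up for illustration. Each parent hands down one of its two alleles per site, so heterozygous variants are passed on only about half of the time, while homozygous variants are always passed on.

```python
import random

def transmit(parent_genotypes, rng):
    """Pick one of the parent's two alleles at each site, independently.

    Assumption: sites are unlinked; real gametes are recombinant haplotypes.
    """
    return [rng.choice(gt) for gt in parent_genotypes]

def offspring_sites(parent1, parent2, rng):
    """Return indices of sites where the offspring carries at least one alt allele."""
    hap1 = transmit(parent1, rng)
    hap2 = transmit(parent2, rng)
    return [i for i, (a, b) in enumerate(zip(hap1, hap2)) if a == 1 or b == 1]

rng = random.Random(42)
n_sites = 10_000

# Toy parents: parent1 is heterozygous (0, 1) at every site, parent2 carries nothing.
het_parent = [(0, 1)] * n_sites
ref_parent = [(0, 0)] * n_sites
kept_het = offspring_sites(het_parent, ref_parent, rng)

# Homozygous-alt sites are always transmitted, whichever allele is picked.
hom_parent = [(1, 1)] * n_sites
kept_hom = offspring_sites(hom_parent, ref_parent, rng)

print(len(kept_het) / n_sites)  # about one half in expectation; varies with the seed
print(len(kept_hom) / n_sites)  # always 1.0
```

In this toy setup the offspring retains roughly half of the heterozygous parent's variants, which is the kind of drop in variant count described above.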

I would also note that your reproduction() callback will be called twice (since there is no subpop specifier on it, it will be called for both the p1 individual and the p2 individual), so you'll probably end up with two offspring in p3, which is probably not what you want.  :->

If further discrepancies exist that don't make sense to you, I'd suggest closely examining the input and output VCF files to try to see exactly which mutations are responsible for the discrepancy; that often clarifies things.  Also, sometimes VCF processing tools are inaccurate, especially if they're confused by the VCF format that SLiM outputs; so verifying by hand that your VCF tools are doing what you expect them to be doing is an important check.
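One way to do that examination is sketched below in Python. It assumes plain tab-delimited VCF text and compares sites by (CHROM, POS) only; the helper names `vcf_sites` and `missing_sites` are hypothetical, and the example records are made up for illustration.

```python
def vcf_sites(lines):
    """Collect (chromosome, position) keys from the data lines of a VCF."""
    sites = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        sites.add((fields[0], int(fields[1])))
    return sites

def missing_sites(input_lines, output_lines):
    """Sites present in the input VCF but absent from the output VCF."""
    return sorted(vcf_sites(input_lines) - vcf_sites(output_lines))

# Made-up example records; a real run would pass open file handles instead.
input_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tparent1",
    "1\t100\t.\tA\tT\t.\t.\t.\tGT\t0|1",
    "1\t200\t.\tG\tC\t.\t.\t.\tGT\t0|0",
    "1\t300\t.\tC\tA\t.\t.\t.\tGT\t1|1",
]
output_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ti0",
    "1\t100\t.\tA\tT\t.\t.\t.\tGT\t0|1",
    "1\t300\t.\tC\tA\t.\t.\t.\tGT\t1|1",
]

lost = missing_sites(input_vcf, output_vcf)
print(lost)
```

Because both functions accept any iterable of lines, the same call works on files, for example `missing_sites(open("input.vcf"), open("output.vcf"))` with whatever file names apply. Looking up the genotypes at the reported sites in the input file then shows what the lost variants have in common.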

Finally, I generally ask new users this: have you taken the SLiM workshop?  It's a good way to get familiar with the basics, and it's available for free online if you can't make it to an in-person session.

In any case, happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Tam Ta

Feb 28, 2026, 10:46:58 AM
to slim-discuss
Hi Peter and Ben,

Thank you for your thoughts! I have gone through the online workshop briefly but I'm a novice at coding so I admit much of it went over my head. I have altered the code to fix some of the issues you mentioned (see attached). The new output VCFs retained more of the variants than before. I've investigated the missing variants and they all stem from loci where both parents are 0|0. However, loci where both parents are 1|1 are retained as 1|1 in the offspring. Why do the 1|1 not come up in the simulation output with sim.outputFixedMutations()? And why are the 0|0 lost? Also, how does SLiM generate a new genotype in the offspring for loci where both parents are NA (aka missing genotype)? Apologies for the barrage of questions. I really appreciate your help.

Sincerely,
Tam Ta
offspring_sim

Ben Haller

Feb 28, 2026, 12:12:24 PM
to slim-d...@googlegroups.com
Hi Tam!


> Thank you for your thoughts! I have gone through the online workshop briefly but I'm a novice at coding so I admit much of it went over my head.

I would recommend that you spend the time to get up to speed.  Studying the R language might be helpful (for SLiM, and for grad school in general actually), and there are tons of R tutorials and such online.  I guarantee that it will pay off for you.  I'd also recommend that you go back to the Eidos language section of the workshop and do it again, very slowly, and really work to solidify those concepts.  I see that you're signed up for a SLiM workshop this summer.  People who are novices at coding often struggle in the workshop; so that you don't waste your time/money, I would urge you to get more familiar with coding in Eidos before the workshop.


> I have altered the code to fix some of the issues you mentioned (see attached).

Given the comment "// offspring", I'm guessing that you intend to place the offspring in p2, but in fact you place them in p1 and never use p2.  Your chromosome creation code would also be *much* shorter if you used a for loop; see recipe 8.3.5 for an example.


> The new output VCFs retained more of the variants than before. I've investigated the missing variants and they all stem from loci where both parents are 0|0. However, loci where both parents are 1|1 are retained as 1|1 in the offspring. Why do the 1|1 not come up in the simulation output with sim.outputFixedMutations()?

Your model is a nonWF model, and in nonWF models mutations do not become Substitution objects at fixation unless the convertToSubstitution property of their mutation type is set to T.  This fact, and the reasons for it, are covered pretty extensively in the workshop.  The outputFixedMutations() method in fact outputs Substitution objects (and could thus be better-named, but that ship sailed a very long time ago); there are no Substitution objects in your simulation.


> And why are the 0|0 lost?

Well, if both parents are 0|0, and the offspring is thus 0|0, does the mutation exist at all, in fact?  Probably the mutation was never even created.  Mutations listed as 0|0 for all individuals in the VCF will not be created, I think.  But even if the mutation was created, if it is 0|0 in all of the individuals being output it will not be included in the output, I think; outputIndividualsToVCF() only emits mutations that are segregating in the sample, if I recall correctly.
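A rough way to check how many input sites fall into this category, sketched in Python. The function names are hypothetical, the genotype tables are made-up examples keyed by (CHROM, POS), and a site absent from one parent's table is assumed to be 0|0 there.

```python
def alt_allele_carried(gt):
    """True if a GT string such as '0|1' or '1/1' contains a non-reference allele."""
    alleles = gt.replace("/", "|").split("|")
    return any(a not in ("0", ".") for a in alleles)

def classify_sites(parent1, parent2):
    """Split sites into those carried by at least one parent and those carried by none.

    parent1 and parent2 map (chrom, pos) -> GT string.
    Assumption: a site missing from one table is treated as 0|0 for that parent.
    """
    carried, absent = [], []
    for site in sorted(set(parent1) | set(parent2)):
        gts = (parent1.get(site, "0|0"), parent2.get(site, "0|0"))
        if any(alt_allele_carried(gt) for gt in gts):
            carried.append(site)
        else:
            absent.append(site)
    return carried, absent

# Made-up genotypes for two parents at three sites.
p1 = {("1", 100): "0|0", ("1", 200): "1|1", ("1", 300): "0|1"}
p2 = {("1", 100): "0|0", ("1", 200): "1|1", ("1", 300): "0|0"}

carried, absent = classify_sites(p1, p2)
print(carried)  # sites where some parent has the alt allele
print(absent)   # sites that are 0|0 in both parents
```

If the sites in `absent` account for the variants missing from the output, then nothing was lost in the simulation; those records never described a mutation carried by either parent.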

> Also, how does SLiM generate a new genotype in the offspring for loci where both parents are NA (aka missing genotype)?

NA is not allowed in SLiM.  Either a haplotype contains a given mutation or it does not; there is no indeterminate state, no missing data, and no imputation done by SLiM.  The VCF file you load needs to specify every genotype.
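Since every genotype has to be specified, it can help to scan the input VCF for missing calls before loading it. A minimal Python sketch follows, assuming standard VCF conventions (tab-delimited records, a FORMAT column listing GT, and `.` marking a missing allele); the function name and the example records are hypothetical.

```python
def sites_with_missing_genotypes(lines):
    """Return (chrom, pos) for records where any sample has a missing GT allele."""
    flagged = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns on this record
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        for sample in fields[9:]:
            gt = sample.split(":")[gt_index]
            if "." in gt.replace("/", "|").split("|"):
                flagged.append((fields[0], int(fields[1])))
                break  # one missing call is enough to flag the site
    return flagged

# Made-up example records.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tparent1",
    "1\t100\t.\tA\tT\t.\t.\t.\tGT\t0|1",
    "1\t200\t.\tG\tC\t.\t.\t.\tGT\t./.",
    "1\t300\t.\tC\tA\t.\t.\t.\tGT:DP\t.|1:12",
]

flagged = sites_with_missing_genotypes(example)
print(flagged)
```

How to handle the flagged sites (dropping them, or filling them in with an external imputation tool) is a decision to make before the file reaches SLiM.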


> Apologies for the barrage of questions. I really appreciate your help.

I'm happy to help, and I hope that my answers above are helpful – but I don't usually have the time to answer questions that are covered in the workshop and the manual.  I realize that you're signed up for the workshop this summer, and so it wouldn't really make sense for you to do the whole SLiM Workshop with the online materials right now.  Nevertheless, please understand that I can't help you with novice questions nonstop in the intervening months.  :->  You might want to wait until you take the workshop to try to get into SLiM; or you might want to spend some more time with the online workshop materials and the manual, and try to answer your questions yourself more before asking here.  And as I wrote, you certainly want to become more proficient at coding before the workshop, so that you get what you want to get out of the workshop.  :->  That said, I don't mind a question here now and then; of course that is what this list is for.

Good luck, and happy modeling!

Cheers,
-B.
