VcfEval and overlapping vars


Brendan O'Fallon

Jun 9, 2023, 4:44:04 PM
to RTG Users
Hi there, I'm seeing some unexpected behavior with VcfEval and overlapping variants. My calls VCF contains two overlapping variants, a 2-bp MNV and an SNV at the MNV's second base (see below), while the baseline VCF (from GIAB) contains the same variation written as a het SNV (A->C) at the first position and a hom SNV (C->A) at the second position.


Both the calls and baseline VCFs represent equivalent variation (unless I'm missing something), but VcfEval doesn't recognize the matches and reports several FPs and FNs in its output.


Calls VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
3 2856015 . AC CA 10 PASS . GT:DP:PS 0|1:43:2856014
3 2856016 . C A 10 PASS . GT:DP:PS 1|0:43:2856014

Baseline VCF (NA12878 GIAB benchmark VCF, INFO field omitted):
3 2856015 rs9846630 A C 50 PASS . GT:PS:DP:ADALL:AD:GQ 0/1:.:840:132,183:164,204:1090
3 2856016 rs60556936 C A 50 PASS . GT:PS:DP:ADALL:AD:GQ 1/1:.:560:0,144:73,291:345

VcfEval's combined output VCF looks like:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BASELINE CALLS
3 2856015 . A C . . SYNC=2856015;BASE=FN_CA GT 0/1 .
3 2856015 . AC CA . . SYNC=2856015;CALL_WEIGHT=2.00;CALL=FP_CA GT . 0|1
3 2856016 . C A . . SYNC=2856015;BASE=FN_CA;CALL=FP GT 1/1 1|0

and the summary.txt indicates 2 false positive and 2 false negative alleles, which seems confusing because I think both VCFs represent the same haplotypes.

Reformatting the calls VCF to look like the baseline VCF resolves the issue, but that's not trivial to do in general. Are there options or strategies to get VcfEval to recognize these as equivalent?
Thanks in advance!
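For the simple case in this thread, reformatting the calls to look like the baseline can be sketched as two steps: split the equal-length MNV into per-base SNVs, then merge SNV records that describe the same change on different haplotypes. This is a minimal sketch, not an RTG Tools feature: `decompose_mnv` and `merge_phased_snvs` are hypothetical helpers, and they assume biallelic records, phased diploid GTs in a single phase set, and allele indices of only 0 and 1. They drop the other FORMAT fields and do not handle indels, missing alleles, or unphased calls, which is where the general case gets hard.

```python
def decompose_mnv(chrom, pos, ref, alt, gt):
    """Split an equal-length REF/ALT substitution into per-base SNV records.

    Positions where REF and ALT agree are dropped; every SNV keeps the original GT.
    """
    if len(ref) != len(alt):
        raise ValueError("only equal-length substitutions are handled")
    return [(chrom, pos + i, r, a, gt) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


def merge_phased_snvs(records):
    """Merge records sharing (chrom, pos, ref, alt) by OR-ing the ALT allele per haplotype.

    Assumes phased diploid GTs such as '0|1' whose alleles are only 0 or 1,
    all in the same phase set, so '0|1' + '1|0' -> '1|1'.
    """
    merged = {}
    for chrom, pos, ref, alt, gt in records:
        key = (chrom, pos, ref, alt)
        alleles = [int(a) for a in gt.split("|")]
        if key in merged:
            alleles = [max(old, new) for old, new in zip(merged[key], alleles)]
        merged[key] = alleles
    return [(*key, "|".join(str(a) for a in alleles)) for key, alleles in sorted(merged.items())]


# The two calls records from the thread (phase set 2856014)
records = decompose_mnv("3", 2856015, "AC", "CA", "0|1") + [("3", 2856016, "C", "A", "1|0")]
for rec in merge_phased_snvs(records):
    print(*rec)
# prints:
# 3 2856015 A C 0|1
# 3 2856016 C A 1|1
```

Applied to the two calls records above, this yields an A->C SNV with GT 0|1 at 2856015 and a C->A SNV with GT 1|1 at 2856016, the same representation the GIAB baseline uses.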

Len Trigg

Jun 9, 2023, 4:53:14 PM
to Brendan O'Fallon, RTG Users
Hi Brendan,

So, by default vcfeval interprets a 0 in a GT as the caller asserting that the bases of the corresponding haplotype are actually the same as the reference (just as a 1 asserts that the bases are the same as the ALT). For the two variants not to conflict, and so for the haplotypes to match, a 0 instead needs to be interpreted as "I'm not making any statement about the bases on this haplotype". If you use the --ref-overlap option it should do what you want.

Cheers,
Len.
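To make the two interpretations of a 0 concrete, here is a toy sketch that rebuilds each haplotype of the 2-bp window at 3:2856015-2856016 from the records in this thread. It is an illustration under stated assumptions, not how vcfeval is implemented: `build_haplotype`, `build_haplotypes` and `zero_means_ref` are hypothetical names, the reference bases AC are taken from the REF columns above, and GTs are read as phased in the order written (for the unphased baseline, the hom-alt SNV makes both phasings give the same pair of haplotypes). With `zero_means_ref=True` (the default interpretation described above) the MNV and the SNV disagree about base 2856016 on both haplotypes; with `zero_means_ref=False` (roughly what --ref-overlap enables, per Len's reply) the calls produce the same haplotypes as the baseline.

```python
# Toy model (not vcfeval's algorithm): rebuild haplotypes over a small window and
# check whether overlapping records agree about each base.
WINDOW_START = 2856015
WINDOW_REF = "AC"  # reference bases at 3:2856015-2856016, from the REF columns above

# (pos, ref, alt, genotype as a tuple of allele indices, read as phased in order)
CALLS = [(2856015, "AC", "CA", (0, 1)), (2856016, "C", "A", (1, 0))]
BASELINE = [(2856015, "A", "C", (0, 1)), (2856016, "C", "A", (1, 1))]


def build_haplotype(records, hap, zero_means_ref):
    """Return the haplotype sequence, or None if two records assert different bases."""
    bases = [None] * len(WINDOW_REF)  # None: no record has said anything about this base
    for pos, ref, alt, gt in records:
        allele = gt[hap]
        if allele == 0 and not zero_means_ref:
            continue  # a 0 makes no statement about this haplotype's bases
        seq = ref if allele == 0 else alt
        for i, base in enumerate(seq):
            j = pos - WINDOW_START + i
            if bases[j] is not None and bases[j] != base:
                return None  # conflicting assertions from overlapping records
            bases[j] = base
    # Bases no record asserted fall back to the reference
    return "".join(b if b is not None else WINDOW_REF[k] for k, b in enumerate(bases))


def build_haplotypes(records, zero_means_ref):
    return tuple(build_haplotype(records, hap, zero_means_ref) for hap in (0, 1))


print(build_haplotypes(CALLS, zero_means_ref=True))     # (None, None): conflict
print(build_haplotypes(CALLS, zero_means_ref=False))    # ('AA', 'CA')
print(build_haplotypes(BASELINE, zero_means_ref=True))  # ('AA', 'CA')
```

The baseline records never overlap each other, so they yield the same haplotypes under either interpretation; only the overlapping calls are sensitive to what a 0 means.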


Brendan O'Fallon

Jun 9, 2023, 5:22:29 PM
to RTG Users, Len Trigg, Brendan O'Fallon
Indeed that did the trick! Thanks for the help. Replacing the 0 allele in the GT field with a '.' also worked.
Thank you!
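The alternative workaround, rewriting the 0 allele as '.', can be sketched as a small per-line transform. This is a hypothetical helper (`zero_alleles_to_missing` is not part of RTG Tools), assuming a tab-delimited VCF data line with GT present in the FORMAT column and a single sample of interest. Which records to rewrite (here, only the two overlapping calls) is left to the caller, since turning every 0 into '.' changes what the calls assert under vcfeval's default interpretation; the --ref-overlap option avoids editing the calls at all.

```python
def zero_alleles_to_missing(vcf_line, sample_index=0):
    """Return a VCF data line with each '0' allele in one sample's GT replaced by '.'.

    The phasing separator ('|' or '/') and all other FORMAT fields are preserved.
    sample_index counts sample columns from 0 (the 10th VCF column is sample 0).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_idx = fields[8].split(":").index("GT")  # FORMAT column, e.g. GT:DP:PS
    sample = fields[9 + sample_index].split(":")
    sep = "|" if "|" in sample[gt_idx] else "/"
    sample[gt_idx] = sep.join("." if a == "0" else a for a in sample[gt_idx].split(sep))
    fields[9 + sample_index] = ":".join(sample)
    return "\t".join(fields)


# The two overlapping calls records from the thread, as tab-delimited lines
calls = [
    "3\t2856015\t.\tAC\tCA\t10\tPASS\t.\tGT:DP:PS\t0|1:43:2856014",
    "3\t2856016\t.\tC\tA\t10\tPASS\t.\tGT:DP:PS\t1|0:43:2856014",
]
for line in calls:
    print(zero_alleles_to_missing(line))  # GTs become .|1 and 1|. respectively
```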