Hi there, I'm seeing some unexpected behavior with VcfEval and overlapping variants. My calls VCF contains two phased variants: a 2-bp MNV (AC->CA) starting at 2856015 and a SNV (C->A) at 2856016, which overlaps the second base of the MNV on the other haplotype (see below). The baseline VCF (from GIAB) contains the same variation, but written as a het SNV (A->C) at the first position and a hom SNV (C->A) at the second position.
Both the calls and baseline VCF represent equivalent variation (unless I'm missing something), but VcfEval isn't able to detect the matches properly and emits several FPs and FNs in the output.
Calls VCF:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
3 2856015 . AC CA 10 PASS . GT:DP:PS 0|1:43:2856014
3 2856016 . C A 10 PASS . GT:DP:PS 1|0:43:2856014
Baseline VCF (NA12878 GIAB benchmark VCF, INFO field omitted):
3 2856015 rs9846630 A C 50 PASS . GT:PS:DP:ADALL:AD:GQ 0/1:.:840:132,183:164,204:1090
3 2856016 rs60556936 C A 50 PASS . GT:PS:DP:ADALL:AD:GQ 1/1:.:560:0,144:73,291:345
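To show why I believe the two representations are equivalent, here is a small sketch that replays each record set onto the two reference bases and compares the resulting haplotypes. This is only my own illustration, not how VcfEval works internally. The names (`REF_SEQ`, `haplotypes`, `calls`, `baseline`) are made up for this post, the reference bases are taken from the REF columns above, the baseline records are the het A->C and hom C->A SNVs as I described them, and the unphased baseline genotypes are given an arbitrary phase (the comparison ignores haplotype order).

```python
# Reference bases at 3:2856015-2856016, taken from the REF columns above.
REF_START = 2856015
REF_SEQ = "AC"

def haplotypes(variants):
    """Return the sorted pair of haplotype sequences implied by
    (pos, ref, alt, gt) records. Handles SNVs/MNVs only (len(ref) == len(alt))."""
    haps = []
    for h in (0, 1):
        seq = list(REF_SEQ)
        for pos, ref, alt, gt in variants:
            if gt[h] == 0:
                continue  # this haplotype keeps the reference allele
            off = pos - REF_START
            # Substitute the ALT bases in place of the REF bases.
            seq[off:off + len(ref)] = list(alt)
        haps.append("".join(seq))
    return sorted(haps)  # sort so the comparison ignores haplotype order

# Calls: MNV 0|1 and SNV 1|0 (phased).
calls = [
    (2856015, "AC", "CA", (0, 1)),
    (2856016, "C", "A", (1, 0)),
]

# Baseline: het A->C and hom C->A; the phase chosen for 0/1 is arbitrary.
baseline = [
    (2856015, "A", "C", (0, 1)),
    (2856016, "C", "A", (1, 1)),
]

print(haplotypes(calls))
print(haplotypes(baseline))
```

Both record sets give the same two haplotypes over 2856015-2856016, which is why I expected VcfEval to report matches here.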
VcfEval's combined output VCF looks like:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT BASELINE CALLS
3 2856015 . A C . . SYNC=2856015;BASE=FN_CA GT 0/1 .
3 2856015 . AC CA . . SYNC=2856015;CALL_WEIGHT=2.00;CALL=FP_CA GT . 0|1
3 2856016 . C A . . SYNC=2856015;BASE=FN_CA;CALL=FP GT 1/1 1|0
and summary.txt reports 2 false-positive and 2 false-negative alleles, which is confusing because, as far as I can tell, both VCFs describe the same pair of haplotypes.
Reformatting the calls VCF to look like the baseline VCF resolves the issue, but that's not trivial to do in general. Are there options or strategies to get VcfEval to recognize these as equivalent?
Thanks in advance!