Hello.
I have found that vcfeval sometimes reports a matching variant as both a false positive and a false negative when variants overlap. I am using rtg-tools-3.6.1, and I see the same behaviour both with and without the --ref-overlap option set.
As a simple example, I have a "test.vcf" containing just 2 variants:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CPCT11111111R CPCT11111111T
19 31097189 . AAAATAAAAT A,* 583.54 PASS AC=1,2;AF=0.250,0.500;AN=4;ANN=A|intron_variant|MODIFIER|ZNF536|ENSG00000198597|transcript|ENST00000592773|protein_coding|1/1|c.169+56769_169+56777delAAATAAAAT||||||WARNING_TRANSCRIPT_INCOMPLETE;BaseQRankSum=-8.700e-02;ClippingRankSum=0.577;DP=59;FS=0.000;MLEAC=1,2;MLEAF=0.250,0.500;MQ=58.67;MQRankSum=2.01;QD=15.36;ReadPosRankSum=-8.570e-01;SOR=0.560;GoNLv5_AC=.,.;GoNLv5_AF=.,. GT:AD:DP:GQ:PL 2/2:0,0,4:4:12:135,135,135,12,12,0 0/1:20,14,0:34:99:478,0,774,539,816,1355
19 31097198 rs201946519 T TA,* 314.54 PASS AC=1,2;AF=0.250,0.500;AN=4;ANN=TA|intron_variant|MODIFIER|ZNF536|ENSG00000198597|transcript|ENST00000592773|protein_coding|1/1|c.169+56777_169+56778insA||||||WARNING_TRANSCRIPT_INCOMPLETE;BaseQRankSum=-6.920e-01;ClippingRankSum=0.083;DB;DP=59;FS=1.478;MLEAC=1,2;MLEAF=0.250,0.500;MQ=58.67;MQRankSum=-3.901e+00;QD=10.15;ReadPosRankSum=-2.023e+00;SOR=0.443;GoNLv5_AC=.,.;GoNLv5_AF=.,. GT:AD:DP:GQ:PGT:PID:PL 2/2:0,0,4:4:12:.:.:135,135,135,12,12,0 0/1:20,7,0:27:99:0|1:31097098_AAAT_A:209,0,783,270,804,1075
Note that there is a 9-base difference between the 2 positions, and the
If I run vcfeval on this file compared to itself, i.e.:
rtg vcfeval -b test.vcf.gz -c test.vcf.gz -o TESTcompare -t ../SDF --ref-overlap --sample=CPCT11111111R
I expect to get 2 true-positives, but instead I get:
Threshold True-pos False-pos False-neg Precision Sensitivity F-measure
----------------------------------------------------------------------------
12.000 1 1 1 0.5000 0.5000 0.5000
None 1 1 1 0.5000 0.5000 0.5000
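For what it's worth, the overlap between the two records can be checked by hand. The sketch below is only an illustration, not anything vcfeval does internally: it assumes the reference span of a record runs from POS to POS + len(REF) - 1 (inclusive, 1-based), and the helper names ref_span and spans_overlap are made up for this example.

```python
def ref_span(pos, ref):
    """Inclusive 1-based reference interval covered by a record's REF allele.

    Assumption: the span is simply POS .. POS + len(REF) - 1.
    """
    return pos, pos + len(ref) - 1


def spans_overlap(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


# (CHROM, POS, REF) taken from the two records in test.vcf
records = [
    ("19", 31097189, "AAAATAAAAT"),
    ("19", 31097198, "T"),
]

first = ref_span(records[0][1], records[0][2])
second = ref_span(records[1][1], records[1][2])
print(first, second, spans_overlap(first, second))
```

Under that assumption the REF of the first record covers 31097189 to 31097198, so it reaches the position of the second record.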
--
You received this message because you are subscribed to the Google Groups "RTG Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rtg-users+...@realtimegenomics.com.
Visit this group at https://groups.google.com/a/realtimegenomics.com/group/rtg-users/.
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CPCT11111111R CPCT11111111T
1 22757877 . GTATA G,GTA 2779.21 PASS AC=1,3;AF=0.250,0.750;AN=4;BaseQRankSum=1.27;ClippingRankSum=1.58;DP=88;FS=0.000;MLEAC=1,3;MLEAF=0.250,0.750;MQ=60.00;MQRankSum=0.992;QD=25.90;ReadPosRankSum=1.50;SOR=1.342;GoNLv5_AC=.,.;GoNLv5_AF=.,. GT:AD:DP:GQ:PL 2/2:1,1,12:14:9:523,303,341,20,9,0 1/2:1,7,35:45:99:2282,1163,1051,355,0,199
1 22757881 rs61769183 A G 1328.21 PASS AC=3;AF=0.750;AN=4;BaseQRankSum=0.037;ClippingRankSum=0.647;DB;DP=87;FS=4.743;MLEAC=3;MLEAF=0.750;MQ=60.00;MQRankSum=-3.540e-01;QD=21.77;ReadPosRankSum=2.35;SOR=0.225 GT:AD:DP:GQ:PL 1/1:1,13:14:7:342,7,0 0/1:12,35:47:99:1015,0,253
[peter@hmf_datastore GIAB1GIAB2compare]$ rtg bgzip test3.vcf
The variant at pos = 22757877 shows as both a false positive and a false negative when I compare this file to itself, again using --ref-overlap. Any ideas on this one?
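To list which records end up in both sets, one option is to compare the fp.vcf.gz and fn.vcf.gz files that vcfeval writes to its output directory. The following is a rough sketch that matches records on CHROM, POS, REF and ALT only. It is a plain set intersection, not a vcfeval feature, and the function names variant_keys and in_both are hypothetical. The inline lines are abbreviated copies of the record above, used here just to make the example self-contained.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) for every non-header VCF line."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.split()[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def in_both(fp_lines, fn_lines):
    """Records that appear in both the false-positive and false-negative sets."""
    return sorted(variant_keys(fp_lines) & variant_keys(fn_lines))


# Abbreviated stand-ins for the contents of fp.vcf.gz and fn.vcf.gz
fp_lines = ["#CHROM POS ID REF ALT", "1 22757877 . GTATA G,GTA"]
fn_lines = ["#CHROM POS ID REF ALT", "1 22757877 . GTATA G,GTA"]

shared = in_both(fp_lines, fn_lines)
print(shared)
```

With real output, the same functions could be fed from gzip.open(path, "rt") on each of the two files instead of the inline lists.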
I see the same problem, in a case where there should be no conflict, just heterozygous non-ref alleles, e.g.
chr21 24801607 . G T 523.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.789;ClippingRankSum=-0.477;DP=24;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=0.477;QD=21.82;ReadPosRankSum=-0.894;SOR=0.730 GT:AD:DP:GQ:PL 0/1:9,15:24:99:552,0,295
chr21 48016066 . ACG A 626.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.583;ClippingRankSum=0.617;DP=36;ExcessHet=3.0103;FS=7.768;MLEAC=1;MLEAF=0.500;MQ=60.00;MQRankSum=-0.217;QD=17.91;ReadPosRankSum=-0.683;SOR=1.865 GT:AD:DP:GQ:PL 0/1:20,15:35:99:664,0,818
If I rerun vcfeval on fp vs. fn, then it filters it down to just the conflicting cases (e.g. where it calls 3 overlapping variants).
Ravi