Understanding TPC>TPB


Luka Topalovic

Oct 14, 2023, 11:45:06 PM
to RTG Users
Hi, I tried comparing DeepVariant (long-read sequencing) against GATK HC (short-read sequencing) on the HG002 sample, where HC is the baseline and DeepVariant is the call VCF:

./rtg-tools-3.12.1/RTG.jar vcfeval -b HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz -c HG002.m84011_220902_175841_s1.GRCh38.deepvariant.phased.filtered.pass.vcf.gz -t GRCh38.sdf -o WGS_vs_Revio

This is what I got: 

Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
   29.000            3902625        3903326     294655     145612     0.9298       0.9640     0.9466
     None            4035537        4037948    1202303      12700     0.7706       0.9969     0.8692

I understand that TPB represents the number of calls in HC after filtering is applied (long indels, GT=0/0, etc.), but how can the number of TPC be larger than TPB? I am attaching the log file in case that helps.

Thanks for all the help, 
Luka
vcfeval.log

Sean Irvine

Oct 15, 2023, 5:28:56 PM
to Luka Topalovic, RTG Users
Hi Luka,

The simple answer is that callers can represent variants differently, so there is not necessarily a one-to-one relationship between baseline and call records. In your case it means that DeepVariant tends to decompose calls slightly more than the baseline does: several call records can together match a single baseline record, so each side gets its own true-positive count. It is partly because of this "problem" that a tool like vcfeval is needed for accurate comparisons. As a trivial example, one caller might represent an MNP as a single event while another might split it into individual SNPs. Things get much more complicated when you consider indels.
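
To make the MNP example concrete, here is a minimal Python sketch. It is a toy model and not vcfeval's actual algorithm: the reference string, the positions, and the helper `apply_variants` are all made up for illustration. It replays one MNP record and two SNP records onto the same reference and shows that they give the same sequence even though the record counts differ.

```python
from typing import List, Tuple

# Hypothetical toy representation: (0-based position, REF allele, ALT allele)
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Replay non-overlapping variants onto a reference and return the haplotype."""
    pieces = []
    cursor = 0
    for pos, ref, alt in sorted(variants):
        if pos < cursor:
            raise ValueError(f"variant at {pos} overlaps a previous variant")
        if reference[pos:pos + len(ref)] != ref:
            raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # substituted allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # unchanged tail
    return "".join(pieces)


reference = "ACGTACGT"                         # made-up reference sequence
baseline_calls = [(2, "GT", "CA")]             # one MNP record
query_calls = [(2, "G", "C"), (3, "T", "A")]   # the same change as two SNP records

baseline_hap = apply_variants(reference, baseline_calls)
query_hap = apply_variants(reference, query_calls)
same_haplotype = baseline_hap == query_hap

print(f"baseline: {len(baseline_calls)} record(s) -> {baseline_hap}")
print(f"query:    {len(query_calls)} record(s) -> {query_hap}")
print(f"same haplotype: {same_haplotype}")
```

In this toy case the match would count as one true positive on the baseline side and two on the call side, which is the kind of imbalance that makes TPC larger than TPB.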

There is more detail in the user manual.

Sean.

