Understanding TPC>TPB

Luka Topalovic

Oct 14, 2023, 11:45:06 PM

to RTG Users

Hi, I tried comparing DeepVariant (long-read sequencing) against GATK HC (short-read sequencing) on the HG002 sample, with HC as the baseline and DeepVariant as the call VCF:

./rtg-tools-3.12.1/RTG.jar vcfeval -b HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz -c HG002.m84011_220902_175841_s1.GRCh38.deepvariant.phased.filtered.pass.vcf.gz -t GRCh38.sdf -o WGS_vs_Revio

This is what I got:

Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
   29.000            3902625        3903326     294655     145612     0.9298       0.9640     0.9466
     None            4035537        4037948    1202303      12700     0.7706       0.9969     0.8692
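
For anyone reading along, the ratio columns in this summary are consistent with precision being computed from the call-side counts (True-pos-call and False-pos) and sensitivity from the baseline-side counts (True-pos-baseline and False-neg). The sketch below only re-derives the printed numbers under that assumption; `summarize` is a hypothetical helper name, not part of RTG Tools.

```python
def summarize(tp_baseline, tp_call, false_pos, false_neg):
    """Recompute the ratio columns from the raw counts in a summary row.

    Assumption: precision uses call-side counts, sensitivity uses
    baseline-side counts, and F-measure is their harmonic mean.
    """
    precision = tp_call / (tp_call + false_pos)
    sensitivity = tp_baseline / (tp_baseline + false_neg)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return round(precision, 4), round(sensitivity, 4), round(f_measure, 4)


# Counts copied from the two rows of the summary above.
row_threshold_29 = summarize(3902625, 3903326, 294655, 145612)
row_none = summarize(4035537, 4037948, 1202303, 12700)

print(row_threshold_29)
print(row_none)
```

Both rows reproduce the Precision, Sensitivity and F-measure values shown in the table, which suggests the two true-positive counts are tallied separately, one per VCF.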

I understand that TPB represents the number of calls in HC after filtering is applied (long indels, GT=0/0, etc.), but how can the number of TPC be larger than TPB? I am attaching the log file in case that helps.

Thanks for all the help,
Luka

vcfeval.log

Sean Irvine

Oct 15, 2023, 5:28:56 PM

to Luka Topalovic, RTG Users

Hi Luka,

The simple answer is that callers can represent the same variants differently, so there is not necessarily a one-to-one relationship between the records in the two VCFs. In your case it means that DeepVariant has a tendency to decompose calls slightly more than the baseline does. It is partly because of this "problem" that a tool like vcfeval is necessary to do accurate comparisons. As a trivial example, one caller might represent an MNP as a single event while another might split it into individual SNPs. Things get much more complicated when you consider indels.
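
The MNP case can be sketched with a toy example. This is only an illustration of why the two true-positive counts can differ; it is not how vcfeval actually performs its matching, and the reference string, coordinates and the `apply_variants` helper are made up for the purpose.

```python
# Toy illustration only: made-up reference and records, not vcfeval's algorithm.
REF = "ACGTACGT"


def apply_variants(ref, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) records to ref.

    Positions are 0-based. Returns the resulting haplotype sequence.
    """
    pieces = []
    cursor = 0
    for pos, ref_allele, alt_allele in sorted(variants):
        assert ref[pos:pos + len(ref_allele)] == ref_allele, "REF allele mismatch"
        pieces.append(ref[cursor:pos])
        pieces.append(alt_allele)
        cursor = pos + len(ref_allele)
    pieces.append(ref[cursor:])
    return "".join(pieces)


baseline = [(2, "GT", "CA")]            # one MNP record
calls = [(2, "G", "C"), (3, "T", "A")]  # the same change as two SNP records

baseline_hap = apply_variants(REF, baseline)
calls_hap = apply_variants(REF, calls)

# If both sets of records produce the same haplotype, every record on each
# side counts as a true positive, but the two sides have different sizes.
if baseline_hap == calls_hap:
    tp_baseline, tp_call = len(baseline), len(calls)
else:
    tp_baseline, tp_call = 0, 0

print(baseline_hap, calls_hap, tp_baseline, tp_call)
```

Here the single baseline record and the two call records describe the same sequence, so the baseline side contributes 1 true positive and the call side contributes 2.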

There is more detail in the user manual.

Sean.
