How to calculate true negatives (TN) using the rtg vcfeval tool?

KIRTI BHADHADHARA

Feb 12, 2019, 5:07:46 PM

to l...@realtimegenomics.com, RTG Users, Sean Irvine

Respected,

I used the rtg vcfeval tool. It reports FP, FN, and TP, but not TN.

How can I calculate TN using rtg vcfeval?

I am working on human data (a VCF file) with hg19 as the reference.

Thank You.

KIRTI

Sean Irvine

Feb 12, 2019, 5:33:25 PM

to KIRTI BHADHADHARA, Len Trigg, RTG Users

Computing true negatives is not really possible with vcfeval (or other tools for that matter) for a couple of reasons.

First, the truth sets typically do not include any indication of the set of negatives (the closest we may get is in the cases of truth sets which contain a set of confidence regions, where it can be assumed that no other variants may be present inside the specified regions).

Second, the set of true negatives is actually infinite. For example, at any point on the genome you can imagine an arbitrary insertion of any number of bases. The fact that a given caller did not produce this insertion would then be a true negative. More generally, any particular call could occupy multiple reference bases.

Sean.

KIRTI BHADHADHARA

Feb 12, 2019, 11:06:20 PM

to Sean Irvine, Len Trigg, RTG Users

Thank you.

caw...@gmail.com

Apr 29, 2019, 2:50:20 PM

to RTG Users, kirti...@gmail.com, l...@realtimegenomics.com

I came here with a similar question, partly because I've always had trouble seeing how TN could be calculated in this context. Thanks for the explanation! The RTG manual introduction mentions that both specificity and sensitivity can be calculated for the evaluation against a truth set. Specificity is not explicitly part of the vcfeval output, and without the TN value, how can specificity be calculated?

Heng Li

Apr 29, 2019, 5:02:31 PM

to caw...@gmail.com, RTG Users, kirti...@gmail.com, l...@realtimegenomics.com

The number of true negatives can't be calculated exactly, but it can be estimated to great accuracy: it's the length of the confident regions (in bases) minus the number of true positives. For humans, the variant rate is ~0.1%, so the estimate above is at most ~0.1% off the true value. In practice, we almost never care about this ~0.1% difference.
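As a rough illustration of the estimate described above, here is a minimal Python sketch. It assumes the confident regions are available as BED-style lines (tab-separated, 0-based start, end-exclusive, non-overlapping) and that TP and FP are read off the vcfeval summary by hand. The function names and the tiny example intervals are hypothetical and are not part of RTG Tools.

```python
def confident_bases(bed_lines):
    """Sum interval lengths from BED-style lines (0-based start, exclusive end).

    Assumes the intervals do not overlap; overlapping intervals would be
    double counted.
    """
    total = 0
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        total += int(fields[2]) - int(fields[1])
    return total


def estimate_tn(region_bases, tp):
    """Estimate from the thread: confident-region length minus true positives."""
    return region_bases - tp


def specificity(tn, fp):
    """Specificity = TN / (TN + FP)."""
    return tn / (tn + fp)


def false_positive_rate(tn, fp):
    """FPR = FP / (TN + FP) = 1 - specificity."""
    return fp / (tn + fp)


# Hypothetical toy input, not real benchmark data.
bed = ["chr1\t0\t1000", "chr1\t2000\t2500"]
bases = confident_bases(bed)
tn = estimate_tn(bases, tp=10)
print(bases, tn, specificity(tn, fp=5), false_positive_rate(tn, fp=5))
```

With real data, `bed_lines` could simply be an open file handle for the truth set's confident-regions BED file, since the function only iterates over lines.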

Precision is a misleading measurement. It is greatly affected by the number of true positives and is thus meaningless across datasets. For example, if someone tells me the precision is 99% in GIAB, I don't know what the precision is in an African genome (should be higher) or, worse, in a cancer genome (should be much lower). In contrast, specificity is largely independent of the number of true positives. If FPR is 1e-5 in a European genome, it is about 1e-5 in an African genome and roughly 1e-5 in a cancer dataset or even in another species. Specificity/FPR more directly tells us the capability of a variant calling pipeline. Furthermore, in practice, what we really care about is specificity, not precision. With FPR=1e-5, we expect 10 errors per megabase. With FDR=1%, we can infer nothing.
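The contrast between precision and FPR can be sketched numerically. The Python below holds the FPR and sensitivity fixed and varies only the number of true variants in a region, so the only thing that changes precision is variant density. All numbers (the 1 Mb region, the 0.99 sensitivity, the variant counts) are made-up illustrations, not measurements from any dataset, and the function names are hypothetical.

```python
def errors_per_megabase(fpr):
    """Expected false positives per 1,000,000 negative positions."""
    return fpr * 1_000_000


def precision_at(fpr, sensitivity, region_bases, true_variants):
    """Expected precision for a fixed FPR and sensitivity.

    Negatives are approximated as region length minus true variants,
    following the estimate given earlier in the thread.
    """
    negatives = region_bases - true_variants
    fp = fpr * negatives              # expected false positives
    tp = sensitivity * true_variants  # expected true positives
    return tp / (tp + fp)


# Hypothetical settings: same pipeline quality, different variant density.
region = 1_000_000
fpr, sens = 1e-5, 0.99
sparse = precision_at(fpr, sens, region, true_variants=500)
dense = precision_at(fpr, sens, region, true_variants=2000)

print("errors per Mb:", errors_per_megabase(fpr))
print("precision with 500 variants: ", sparse)
print("precision with 2000 variants:", dense)
```

Under these assumptions the expected number of false positives barely moves between the two cases, while precision rises with the number of true variants, which is the dataset dependence described above.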

Heng

caw...@gmail.com

May 13, 2019, 11:15:54 AM

to RTG Users, caw...@gmail.com, kirti...@gmail.com, l...@realtimegenomics.com, h...@jimmy.harvard.edu

Heng, thank you for the alternative calculation. Assuming that I interpreted you correctly, my TN value is in the millions (~15 Mb of high-confidence regions minus 12.5K true variants), making my specificity value > 99.99%. Is this in line with your expectations? Admittedly, it looks great on paper, but I have to believe people will start questioning how our benchmarking evaluations are consistently so good!
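The arithmetic in this message can be checked with a short Python sketch. It reads "~15Mb" as 15,000,000 bases and "12.5K" as 12,500 true positives, both of which are assumptions about rounded figures. The FP count is not given in the thread, so the sketch only works out how many false positives would still leave specificity at or above 99.99%. The function names are hypothetical.

```python
import math


def specificity(tn, fp):
    """Specificity = TN / (TN + FP)."""
    return tn / (tn + fp)


def max_fp_for_specificity(tn, target):
    """Largest whole FP count with TN / (TN + FP) >= target.

    Rearranged from TN / (TN + FP) >= target to FP <= TN * (1 - target) / target.
    """
    return math.floor(tn * (1 - target) / target)


region_bases = 15_000_000  # assumed reading of "~15Mb"
tp = 12_500                # assumed reading of "12.5K true variants"

tn = region_bases - tp     # estimate from the thread
limit = max_fp_for_specificity(tn, 0.9999)

print("estimated TN:", tn)
print("max FP for specificity >= 99.99%:", limit)
print("specificity at that FP count:", specificity(tn, limit))
```

Under these assumptions, TN is just under 15 million, and specificity stays above 99.99% as long as there are fewer than roughly 1,500 false positives across the confident regions. That is why specificity values this high are easy to reach when nearly every position is a negative.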
