Heng Li
Apr 29, 2019, 5:02:31 PM
to caw...@gmail.com, RTG Users, kirti...@gmail.com, l...@realtimegenomics.com
The number of true negatives can't be calculated exactly, but it can be estimated to great accuracy: it is the length of the confident regions minus the number of true positives. In human, the variant rate is ~0.1%, so the estimate above is at most 0.1% off the true value. In practice, we almost never care about this ~0.1% difference.
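The estimate above can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the confident-region length, TP count and FP count are hypothetical round numbers, not figures from any real benchmark.

```python
def estimate_true_negatives(confident_bp: int, true_positives: int) -> int:
    """Approximate TN as confident-region length minus the number of TPs."""
    return confident_bp - true_positives


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN); specificity is 1 - FPR."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical inputs, chosen so that the variant rate is 0.1%.
confident_bp = 2_500_000_000   # total length of confident regions, in bp
tp = 2_500_000                 # true-positive calls
fp = 25_000                    # false-positive calls

tn = estimate_true_negatives(confident_bp, tp)
fpr = false_positive_rate(fp, tn)

# Relative difference between the region length and the TN estimate;
# it is bounded by the variant rate.
relative_gap = (confident_bp - tn) / confident_bp

print(f"TN estimate: {tn}")
print(f"FPR: {fpr:.3e}, specificity: {1 - fpr:.6f}")
print(f"relative gap: {relative_gap:.4%}")
```

With these made-up inputs the relative gap equals the variant rate, which is why the denominator barely moves whichever way it is counted.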
Precision is a misleading measurement. It is greatly affected by the number of true positives and is thus not comparable across datasets. For example, if someone tells me the precision is 99% on GIAB, I don't know what the precision would be on an African genome (it should be higher) or, worse, on a cancer genome (it should be much lower). In contrast, specificity is largely independent of the number of true positives. If the FPR is 1e-5 on a European genome, it is about 1e-5 on an African genome and roughly 1e-5 on a cancer dataset or even in another species. Specificity/FPR tells us more directly what a variant calling pipeline is capable of. Furthermore, in practice, what we really care about is specificity, not precision. With FPR=1e-5, we expect 10 errors per megabase. With FDR=1%, we can infer nothing about the number of errors unless we also know how many variants were called.
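A small sketch of the argument, assuming a fixed FPR and a fixed sensitivity and varying only the density of true variants. The densities for the higher-diversity and cancer-like cases are hypothetical values picked for illustration, as are the function and variable names.

```python
def precision_at(variant_rate: float, fpr: float,
                 sensitivity: float = 1.0,
                 region_bp: float = 1e6) -> float:
    """Precision implied by a given FPR, sensitivity and true-variant density."""
    true_sites = variant_rate * region_bp
    tp = sensitivity * true_sites
    fp = fpr * (region_bp - true_sites)   # errors arise at non-variant sites
    return tp / (tp + fp)


FPR = 1e-5  # same pipeline, same per-site error rate in every case

# Hypothetical true-variant densities (fraction of sites that are variant).
densities = {
    "reference-like (~0.1%)": 1e-3,
    "higher diversity": 1.3e-3,
    "somatic, cancer-like": 1e-6,
}

precisions = {name: precision_at(rate, FPR) for name, rate in densities.items()}
for name, p in precisions.items():
    print(f"{name:>24}: precision = {p:.4f}")

# Expected false positives per megabase depend on the FPR alone.
fp_per_mb = FPR * 1e6
print(f"expected false positives per Mb: {fp_per_mb:.1f}")
```

The per-megabase error count is the same in all three cases, while precision swings from about 99% down to under 10% as true variants become rare.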
Heng