Hi Felix,
You can find the version using "rtg version".
In order to categorize variants as SNPs or indels, either the baseline representation or the calls representation could be used. These two representations can be quite different. For example, a SNP in one representation could correspond to an insertion followed by a deletion in the other. Real examples can be considerably more complicated, with multiple call events jointly matching multiple baseline events in such a way that no subset of those events can be matched on its own.
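As a small illustration of the representation problem, here is a toy sketch in Python. It is not how vcfeval works internally; the helper apply_variants and the (position, ref, alt) tuples with empty alleles are hypothetical simplifications (not strict VCF), chosen only to show that a SNP and an insertion-plus-deletion pair can produce the same sequence.

```python
def apply_variants(reference, variants):
    """Apply ordered, non-overlapping (pos, ref, alt) edits (0-based pos) to a string."""
    pieces = []
    cursor = 0
    for pos, ref, alt in variants:
        # Each edit must start at or after the end of the previous one.
        assert pos >= cursor, "variants must be ordered and non-overlapping"
        assert reference[pos:pos + len(ref)] == ref, "ref allele does not match reference"
        pieces.append(reference[cursor:pos])  # untouched bases before this edit
        pieces.append(alt)                    # replacement allele
        cursor = pos + len(ref)
    pieces.append(reference[cursor:])         # remainder of the reference
    return "".join(pieces)


reference = "GACT"

# Representation 1: a single SNP, C -> T at position 2.
snp = [(2, "C", "T")]

# Representation 2: insert a T at position 2, then delete the C at position 2.
insertion_then_deletion = [(2, "", "T"), (2, "C", "")]

haplotype_from_snp = apply_variants(reference, snp)
haplotype_from_indels = apply_variants(reference, insertion_then_deletion)
print(haplotype_from_snp, haplotype_from_indels)
```

Both representations yield the same sequence, yet one would be counted as one SNP and the other as two indels.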
In the "snp_roc.tsv.gz" and "non_snp_roc.tsv.gz" outputs of vcfeval we opted to categorize variants with respect to the calls (because the calls have the scores needed for producing the curves). This means we do have well-defined sets of true-positive and false-positive calls (i.e. the call either matched something in the baseline or it did not), and hence a well-defined precision = tp / (tp + fp). But there is no correspondingly well-defined count of false-negatives (because the representation problem means we cannot decide how many false-negatives are SNPs versus indels -- at least not without introducing a representation bias to the result), hence we cannot calculate a recall using recall = tp / (tp + fn). For this reason we don't include precision, recall, and F-measure in the split-out files. Since the "weighted_roc.tsv.gz" file covers all variants, the categorization problem does not arise there and precision, recall, and F-measure can be calculated.
(It would be possible to flip this around and categorize events with respect
to the baseline. Then you again have well-defined true-positives, and
now the false-negatives are well-defined, but the false-positives are no
longer well-defined and the baseline variants do not have the score.)
To produce approximate curves from "snp_roc.tsv.gz" and "non_snp_roc.tsv.gz", rocplot uses total-baseline-events - true-positives as a proxy for the false-negatives.
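The arithmetic behind that proxy can be sketched as follows. This is only an illustration of the formulas in this message, not rocplot's actual code; the function name approximate_metrics and the example counts are made up.

```python
def approximate_metrics(tp, fp, total_baseline):
    """Precision, approximate recall and F-measure for one score threshold.

    tp, fp          -- true-positive and false-positive calls (well-defined)
    total_baseline  -- total number of baseline events (assumed to be known)
    """
    # Proxy for the false-negatives, which are not well-defined per category.
    fn_proxy = total_baseline - tp

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn_proxy) if (tp + fn_proxy) > 0 else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical counts, purely for illustration.
precision, recall, f_measure = approximate_metrics(tp=90, fp=10, total_baseline=120)
print(precision, recall, f_measure)
```

Note that tp + fn_proxy is just total_baseline, so the approximate recall is simply tp / total_baseline; the precision is unaffected by the approximation.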
Sean.