Dear everyone,
I have a question about using exome samples with the rtg vcfeval tool, and I was hoping you could help me decide how to set up the evaluation. I have a sample whose exome has been sequenced (using a Twist KAPA bait set). I have run vcfeval on this sample in three ways: 1) against the golden truth set of a NIST sample (whole-genome coordinates), 2) restricted to the Twist bait capture BED file (which should cover all regions present in the exome FASTQ files), and 3) restricted to all NIST golden truth set BED regions that fall within the Twist bait set BED file (bedtools intersect, keeping all NIST base pairs that overlap the exome BED file).
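To make the third setup concrete, here is a minimal Python sketch of the kind of base-pair intersection that bedtools intersect performs. It is only an illustration, not the bedtools implementation: the function name `intersect_bed` and the example coordinates are hypothetical, and it assumes each input holds non-overlapping intervals in BED-style 0-based, half-open coordinates.

```python
from collections import defaultdict


def intersect_bed(a, b):
    """Return the base-pair overlaps between two interval lists.

    Each interval is a (chrom, start, end) tuple in BED-style 0-based,
    half-open coordinates. Assumes the intervals within each input do
    not overlap one another (i.e. each file is already merged).
    """
    by_chrom_a = defaultdict(list)
    by_chrom_b = defaultdict(list)
    for chrom, start, end in a:
        by_chrom_a[chrom].append((start, end))
    for chrom, start, end in b:
        by_chrom_b[chrom].append((start, end))

    out = []
    for chrom in sorted(by_chrom_a):
        xs = sorted(by_chrom_a[chrom])
        ys = sorted(by_chrom_b.get(chrom, []))
        i = j = 0
        while i < len(xs) and j < len(ys):
            start = max(xs[i][0], ys[j][0])
            end = min(xs[i][1], ys[j][1])
            if start < end:
                # Keep only the overlapping bases, not the whole interval.
                out.append((chrom, start, end))
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return out


# Hypothetical coordinates, for illustration only.
bait_regions = [("chr1", 100, 200), ("chr1", 300, 400)]
truth_regions = [("chr1", 150, 350), ("chr2", 0, 50)]
overlap = intersect_bed(bait_regions, truth_regions)
print(overlap)
```

The result contains only bases that are both targeted by the bait set and inside the truth set's confident regions, which is why run 3 is the most restricted of the three.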
The output of run 1 is not very good, but this is expected, as many truth set variants are not captured by the exome approach. For the second run, using the Twist BED file, precision and F-measure are around 0.9, and there are more false positives than expected (false positives are about 15% of the total number of baseline true positives). For the third run, with the most restricted BED file, I get very high numbers, showing that the NIST regions within the exome bait set have very high precision.
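As a rough check on how those numbers relate, the sketch below computes precision, sensitivity and F-measure from raw counts. The counts are hypothetical, chosen only so that false positives are 15% of the true positives as in run 2, and the formulas are my understanding of how vcfeval defines these metrics (precision from call-side true positives, sensitivity from baseline-side true positives), so please treat them as an assumption.

```python
def summarize(tp_baseline, tp_call, fp, fn):
    """Compute precision, sensitivity and F-measure from variant counts.

    Assumed definitions (not taken from the original message):
      precision   = tp_call / (tp_call + fp)
      sensitivity = tp_baseline / (tp_baseline + fn)
      f_measure   = harmonic mean of precision and sensitivity
    """
    precision = tp_call / (tp_call + fp)
    sensitivity = tp_baseline / (tp_baseline + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_measure


# Hypothetical counts: false positives are 15% of the true positives.
precision, sensitivity, f_measure = summarize(
    tp_baseline=1000, tp_call=1000, fp=150, fn=50
)
print(f"precision={precision:.3f} "
      f"sensitivity={sensitivity:.3f} "
      f"f_measure={f_measure:.3f}")
```

With false positives at 15% of the true positives, precision is 1 / 1.15, or about 0.87, which is consistent with the "around 0.9" seen in run 2.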
In my mind I should be using the Twist bait set BED file, as it represents the sequences that are actually present in the data; however, I see a high number of false positives when I do this. When I intersect the two files I see far fewer false positives, but I don't know whether I am misrepresenting the data that way. Could I ask for your thoughts on this?
Thank you for your time.
Best,
Peter