true positives classified as false negatives by bndeval

Asher Bryant

Jun 29, 2023, 1:37:36 AM
to RTG Users
Hello again, and thank you for your help so far!

I believe bndeval may be misclassifying true positives as false negatives.

I am running svdecompose on a caller VCF and then comparing the svdecompose output against the truth set with bndeval, using a 1000 bp tolerance.
I also compared the svdecompose and bndeval results with those from my own script, which decomposes the SVs and then merges the breakpoints with the truth set. 23 breakpoints that are present in the svdecompose output, and well within 1000 bp of the corresponding truth set breakpoint, are nonetheless categorized by bndeval as false negatives.
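For illustration, here is a minimal sketch of the kind of tolerance-based breakpoint matching described above. It assumes each breakpoint is reduced to a (chromosome, position) pair. The function `match_breakpoint` is hypothetical: it is neither the actual script nor bndeval's own matching logic. The example coordinates are those of truthset_4_1 and SEVERUS_BND73 from the reply below.

```python
from typing import Iterable, Optional, Tuple

Breakpoint = Tuple[str, int]  # (chromosome, 1-based position)


def match_breakpoint(
    truth: Breakpoint, calls: Iterable[Breakpoint], tolerance: int = 1000
) -> Optional[Breakpoint]:
    """Return the closest call on the same chromosome within `tolerance` bp of `truth`, or None."""
    chrom, pos = truth
    best: Optional[Breakpoint] = None
    for call in calls:
        if call[0] != chrom:
            continue  # breakpoints on other chromosomes never match
        dist = abs(call[1] - pos)
        if dist <= tolerance and (best is None or dist < abs(best[1] - pos)):
            best = call
    return best


# Breakpoints of truthset_4_1 (truth) and SEVERUS_BND73 (call, POS and END)
truth_bps = [("1", 224458901), ("1", 224612418)]
call_bps = [("1", 224458900), ("1", 224612418)]
for bp in truth_bps:
    print(bp, "->", match_breakpoint(bp, call_bps))
```

Under this purely positional comparison, both truth breakpoints find a call within 1 bp, which is why the event looks like an obvious match.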

I've attached the 
- truthset vcf, truthset_somaticSVs_COLO829_hg38lifted.vcf;
- caller vcf, SEVERUS_somatic_colo829_tumor_xy.haplotagged.vcf;
- svdecompose vcf, SEVERUS_somatic_colo829_tumor_xy.haplotagged_nochr_srt_dc.vcf; 
- false negative vcf, fn.vcf; 
- csv of the truthset (columns with "_truthset") merged with my script results (columns with "_Severus") & svdecompose (columns with "_svdecompose"), truth_pandas_svdecompose.csv.  

Please let me know if you need any other information/files.  

Best wishes,
Asher

Sean Irvine

Jul 4, 2023, 10:32:56 PM
to Asher Bryant, RTG Users
Hi Asher,

I hope I have interpreted your question correctly, but I believe the problem is that the calls file has SVTYPE=BND breakend events that use a symbolic allele <BND> rather than the adjacency format defined in the VCF 4.2 specification. Such records are not processed by bndeval. You can see this by running bndeval with "--output-mode annotate" and then looking at the resulting annotation, which is IGN (for ignored) on such records.
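To spot such records in a calls file before running bndeval, here is a rough Python sketch that classifies each ALT allele. The regular expression is an assumption based on the four bracket forms in the VCF 4.2 breakend section (t[p[, t]p], ]p]t, [p[t). It is not bndeval's own parser, and `classify_alt` is a hypothetical helper.

```python
import re

# Breakend adjacency forms from the VCF 4.2 specification: t[p[, t]p], ]p]t, [p[t,
# where t is the reference base(s) and p is "chrom:pos". Both brackets must match.
ADJACENCY = re.compile(
    r"^(?:[ACGTN]+([\[\]])[^\[\]]+:\d+\1"  # t[p[ or t]p]
    r"|([\[\]])[^\[\]]+:\d+\2[ACGTN]+)$",  # [p[t or ]p]t
    re.IGNORECASE,
)


def classify_alt(alt: str) -> str:
    """Classify a VCF ALT allele as 'adjacency', 'symbolic' (e.g. <BND>), or 'other'."""
    if ADJACENCY.match(alt):
        return "adjacency"
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"
    return "other"  # e.g. plain sequence or single breakends such as "A."


print(classify_alt("]1:224612418]A"))  # adjacency (ALT of truthset_4_1)
print(classify_alt("<BND>"))  # symbolic (ALT of SEVERUS_BND73)
```

Records with SVTYPE=BND whose ALT classifies as "symbolic" would be expected to receive the IGN annotation described above.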

To look at a specific example, I took the following record from the truth:

1       224458901       truthset_4_1    A       ]1:224612418]A  .       PASS    MATEID=truthset_4_2;SVLEN=153517;SVTYPE=DUP;SUPP_SEQ=ILL,ONT,PB;SUPP_VAL=PCR,CAPTURE,BIONANO;GENE=.;CLUSTER=truthset_5 GT    0/1

In the calls, the corresponding event appears to be:

1       224458900       SEVERUS_BND73   N       <BND>   60      PASS    CHR2=1;CLUSTERID=severus_30;END=224612418;HVAF=0.00|0.26|0.00;MAPQ=60;SUPPREAD=54;SVLEN=153518;SVTYPE=BND      GT:GQ:VAF:DR:DV 1|0:667:0.26:153:54

This looks like it should be a straightforward match, but according to the VCF specification, when SVTYPE=BND is given the ALT field should be an adjacency specification (like in the truth) and not the symbolic allele <BND>. When bndeval sees this call it simply ignores it; consequently the truth event remains unmatched and is output as a false negative. The annotated call record looks like this:

1       224458900       SEVERUS_BND73   N       <BND>   60      PASS    CHR2=1;CLUSTERID=severus_30;END=224612418;HVAF=0.00|0.26|0.00;MAPQ=60;SUPPREAD=54;SVLEN=153518;SVTYPE=BND;CALL=IGN      GT:GQ:VAF:DR:DV 1|0:667:0.26:153:54

(Note the CALL=IGN annotation added by bndeval above, which indicates that this record was ignored.)

Because this <BND> symbolic allele is strictly speaking invalid VCF, it is also not understood by svdecompose. It is something we could perhaps look at supporting in the future, but at the moment your best option is probably to script something yourself that converts all the symbolic <BND> alleles into valid adjacency specifications.
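Following that suggestion, here is a minimal Python sketch of such a conversion. None of the fields shown in the <BND> record above encode the join orientation, so the sketch assumes it is known from somewhere else and takes it as an explicit input. It also uses N as a placeholder where the real reference base would have to be looked up from the reference. The function names are hypothetical, and the sketch only builds ALT strings; writing out complete records (including any MATEID links) is not covered.

```python
from typing import Dict, Tuple

# Each VCF 4.2 breakend form and the form its mate record uses for the same junction.
MATE_FORM: Dict[str, str] = {"t[p[": "]p]t", "]p]t": "t[p[", "t]p]": "t]p]", "[p[t": "[p[t"}


def format_adjacency(form: str, t: str, chrom: str, pos: int) -> str:
    """Fill a breakend form with reference base(s) t and mate position p = chrom:pos."""
    p = f"{chrom}:{pos}"
    templates = {
        "t[p[": f"{t}[{p}[",
        "t]p]": f"{t}]{p}]",
        "]p]t": f"]{p}]{t}",
        "[p[t": f"[{p}[{t}",
    }
    return templates[form]  # raises KeyError for an unknown form


def symbolic_bnd_to_pair(
    chrom: str, pos: int, ref: str, chr2: str, end: int, mate_ref: str, form: str
) -> Tuple[str, str]:
    """Return (ALT at chrom:pos, ALT for the mate record at chr2:end).

    `form` (the join orientation) is an assumed input: it is NOT derived from
    the <BND> record, which only supplies CHR2 and END in the fields shown.
    """
    alt = format_adjacency(form, ref, chr2, end)
    mate_alt = format_adjacency(MATE_FORM[form], mate_ref, chrom, pos)
    return alt, mate_alt


# SEVERUS_BND73: POS=224458900, CHR2=1, END=224612418. "]p]t" is assumed so the
# result mirrors truthset_4_1; "N" is a placeholder for the real reference bases.
print(symbolic_bnd_to_pair("1", 224458900, "N", "1", 224612418, "N", "]p]t"))
# -> (']1:224612418]N', 'N[1:224458900[')
```

The orientation choice matters. For a tandem duplication like truthset_4_1, the end of the duplicated segment is joined before its start, which is the ]p]t form at the lower position and t[p[ at its mate.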

Regards,
Sean.
