Hello Tobias, hello everyone,
I am using delly (mostly in lr mode) and came across compatibility problems with the benchmarking tool witty.er (https://github.com/Illumina/witty.er) due to the CN tag. The problems are the same as those described in the witty.er GitHub repo, beginning with this comment:
I am using the GIAB v0.6 SV HG002 benchmark data set with HG002 ultralong ONT reads.
In the following, my understanding of how this problem came to be:
- Delly has been around for some time (published in 2012), and it used the CN tag before the tag was included in the VCF specification. In delly, CN is the read-depth-based estimate of copy number. INS calls always have CN=2 (at least for diploid genomes), whereas CN varies for DEL calls. In long-read mode, I have so far seen only PRECISE deletions. In my understanding, CN was meant to guide the interpretation of IMPRECISE calls from short-read data.
- The VCF specification v4.2 is ambiguous about the purpose and usage of the CN tag. I don't see where exactly its usage is reserved for CNVs, as witty.er assumes in its source code:
  /// Gets the CN tag. Only CNV has CN
Now the following questions:
- Removing the CN tag from the delly output eliminates the witty.er issues, resulting in competitive performance compared to other callers. Removing the CN tag does not affect benchmarking of delly with truvari. Does CN serve a purpose in lr mode, besides consistency checks and quality control?
- Was renaming the CN tag considered, in order to avoid confusion with the CNV-specific CN? Perhaps only when a custom switch is supplied to delly, to keep backwards compatibility?
- A minor point: when CN cannot be estimated, it is currently set to "-1"; I would expect "." (the VCF missing-value marker).
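
For anyone who wants to reproduce the workaround from the first question, here is a minimal sketch of how the CN field could be stripped from a VCF before running witty.er. It assumes that CN is written as a per-sample FORMAT field (not as an INFO key) and that the input is an uncompressed, tab-separated VCF; the function names `drop_format_key` and `strip_vcf` are made up for this example and are not part of delly or witty.er.

```python
def drop_format_key(line, key="CN"):
    """Remove one FORMAT key and its per-sample values from a VCF line.

    Assumption: `key` is a per-sample FORMAT field. Returns None for the
    matching ##FORMAT header line so the caller can skip it.
    """
    if line.startswith("#"):
        # Drop the header definition of the key, keep all other header lines.
        if line.startswith(f"##FORMAT=<ID={key},"):
            return None
        return line

    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        # No FORMAT/sample columns present, nothing to strip.
        return line

    keys = fields[8].split(":")
    if key not in keys:
        return line

    idx = keys.index(key)
    del keys[idx]
    fields[8] = ":".join(keys)

    # Remove the value at the same position from every sample column.
    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if idx < len(values):  # trailing sample fields may be omitted
            del values[idx]
        fields[i] = ":".join(values)

    return "\t".join(fields) + newline


def strip_vcf(lines, key="CN"):
    """Yield VCF lines with the given FORMAT key removed."""
    for line in lines:
        out = drop_format_key(line, key)
        if out is not None:
            yield out
```

It can be applied to an open file handle, for example `sys.stdout.writelines(strip_vcf(open("calls.vcf")))`, where `calls.vcf` is a placeholder file name. If CN turns out to be an INFO key in a given output, the same idea applies to column 8 with `;` as the separator.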
Thanks in advance
Dmitriy