delly CN tag, vcf specs and witty.er compatibility


Dmitriy Drichel

Oct 20, 2020, 6:22:03 AM
to delly-users
Hello Tobias, hello everyone,

I am using delly (mostly in lr mode) and came across compatibility problems with the benchmarking tool witty.er (https://github.com/Illumina/witty.er) due to the CN tag. The problems are the same as those described in the witty.er GitHub repo, beginning with this comment:
I am using the GIAB v0.6 SV HG002 benchmark data set with HG002 ultralong ONT reads.

In the following, my understanding of how this problem came to be.

- Delly has been around for some time (published in 2012), and it used the CN tag before the tag was included in the VCF specification. In delly, CN is the read-depth-based estimate of copy number. INS calls always have CN=2 (at least for diploid genomes), whereas CN varies for DEL calls. In long-read mode, I have so far seen only PRECISE deletions. As I understand it, CN was meant to guide the interpretation of IMPRECISE calls from short-read data.

- In 2014, a proposal was made to include SVTYPE CNV, with CN being CNV-equivalent to GT: https://sourceforge.net/p/vcftools/mailman/message/32129088/ 

- The VCF specification v4.2 is ambiguous about the purpose and usage of the CN tag. I don't see where exactly its usage is reserved for CNVs.

- witty.er treats the CN tag as intended in the original proposal, see comment https://github.com/Illumina/witty.er/blob/:
/// Gets the CN tag. Only CNV has CN

Now the following questions:

- Removing the CN tag from delly output eliminates the witty.er issues, and delly then performs competitively compared to other callers. Removing the CN tag does not affect benchmarking of delly with truvari. Does CN serve a purpose in lr mode, besides consistency checks and quality control?

- Was renaming the CN tag considered, in order to avoid confusion with the CNV-specific CN? Perhaps only when a custom switch is supplied to delly, to keep backward compatibility?

- A minor point: when CN cannot be estimated, it is currently set to "-1"; I would expect "." (the VCF missing-value marker).
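
For anyone who wants to reproduce the workaround of removing the CN tag before running witty.er, here is a minimal Python sketch. It works on plain-text (uncompressed) VCF lines and is not part of delly or witty.er; the function name `drop_format_key` and the shortened FORMAT column in the usage example are illustrative assumptions, not delly's exact output.

```python
def drop_format_key(line: str, key: str = "CN") -> str:
    """Remove one FORMAT key and its per-sample values from a VCF line.

    Hypothetical helper for illustration. Returns "" for the matching
    ##FORMAT header declaration so that the caller can skip it.
    """
    if line.startswith("##"):
        # Drop only the declaration of the removed key; keep other meta lines.
        return "" if line.startswith(f"##FORMAT=<ID={key},") else line
    if line.startswith("#"):
        return line  # the #CHROM column header line is left untouched

    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # record has no FORMAT/sample columns

    keys = fields[8].split(":")
    if key not in keys:
        return line
    idx = keys.index(key)
    keys.pop(idx)
    fields[8] = ":".join(keys)

    for i in range(9, len(fields)):
        values = fields[i].split(":")
        if idx < len(values):  # trailing sample fields may be omitted in VCF
            values.pop(idx)
        fields[i] = ":".join(values)
    return "\t".join(fields) + newline
```

Usage, on a simplified record with only three FORMAT fields:

```python
record = "chr1\t100\tDEL0001\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT:GQ:CN\t0/1:40:1\n"
print(drop_format_key(record), end="")
```

The sketch assumes CN is not the only FORMAT field in the record; in that case the FORMAT column would be left empty and would need separate handling.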


Thanks in advance

Dmitriy

Tobias Rausch

Oct 20, 2020, 6:31:34 AM
to Dmitriy Drichel, delly-users
Good point, FORMAT:CN will sooner or later create problems. I renamed this field to RDCN.


Best, Tobias
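
For VCF files written before the rename, the old FORMAT key can be mapped to the new name in a post-processing step. The following is a minimal sketch under stated assumptions: it handles plain-text VCF lines, the helper name `rename_format_key` is hypothetical, and it changes only the key ID while leaving the header description text as it was.

```python
import re


def rename_format_key(line: str, old: str = "CN", new: str = "RDCN") -> str:
    """Rename a FORMAT key in a VCF header or record line (illustrative helper)."""
    if line.startswith("##"):
        # Rewrite only the ID of the matching FORMAT declaration.
        pattern = rf"^##FORMAT=<ID={re.escape(old)},"
        return re.sub(pattern, f"##FORMAT=<ID={new},", line)
    if line.startswith("#"):
        return line  # the #CHROM column header line is left untouched

    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return line  # record has no FORMAT column

    # Sample columns are positional, so only the FORMAT column changes.
    keys = [new if k == old else k for k in fields[8].split(":")]
    fields[8] = ":".join(keys)
    return "\t".join(fields) + newline
```

Because only the key name changes and the per-sample values stay in place, the read-depth estimate remains available under the new name for downstream tools.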

