VCF fields


Lauren Chong

Jul 23, 2014, 3:55:47 PM
to delly...@googlegroups.com

Hi Tobias,


I am interpreting some results from Delly output and I have questions about some fields.


The "SRQ" attribute in the INFO column is described as "Split-read consensus alignment quality". Can you give me some more information on this? Is the quality measured as a score from 0-1?


I'm also a bit confused about some of the FORMAT/DATA fields, specifically the DR/DV/RR/RV counts. How is "high-quality" defined? And in particular, what exactly do DV and RV represent? Are they related to the PE and SR fields in the INFO column?


Apologies if this is explained elsewhere--I wasn't sure where to look for more explanation!


Thank you,

Lauren Chong

Tobias Rausch

Aug 5, 2014, 4:42:25 AM
to Lauren Chong, delly...@googlegroups.com
Hi Lauren,

The consensus alignment quality is indeed a score between 0 and 1, where 1 indicates 100% identity to the reference. Nearby SNPs, InDels and micro-insertions at the breakpoint can lower this score, but it should only be very poor for mis-assemblies. Delly currently drops consensus alignments with a score < 0.8 and then falls back to the paired-end prediction.
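For filtering on this field downstream, a minimal Python sketch follows. It assumes a standard semicolon-separated VCF INFO string in which precise calls carry an `SRQ` value. The helper names (`parse_info`, `passes_srq`) and the default cutoff of 0.9 are illustrative assumptions, not part of Delly.

```python
def parse_info(info_field):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def passes_srq(info, min_srq=0.9):
    """Keep a call if it has no split-read consensus (paired-end only)
    or if its consensus alignment quality reaches min_srq.
    min_srq is a hypothetical, user-chosen cutoff."""
    if "SRQ" not in info:
        return True
    return float(info["SRQ"]) >= min_srq


# Invented INFO strings, for illustration only.
precise = parse_info("PRECISE;SVTYPE=DEL;CT=3to5;PE=12;SR=5;SRQ=0.97")
borderline = parse_info("PRECISE;SVTYPE=DEL;CT=3to5;PE=12;SR=5;SRQ=0.85")
imprecise = parse_info("IMPRECISE;SVTYPE=DEL;CT=3to5;PE=8")
```

Whether paired-end-only calls should pass such a filter is a design choice. Here they are kept, since a missing `SRQ` only means there was no split-read consensus to score.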

The genotyping takes into account all paired-ends with a mapping quality greater than 20 by default. In later Delly versions this can be changed on the command line using '-u'. DR & DV represent counts of reference-supporting and SV-supporting paired-ends, which are then fed into the genotyping model to derive genotype likelihoods, a genotype quality and the final genotype call. For precise events (INFO:PRECISE), Delly uses RR & RV, which are the reference and SV allele supporting reads (not pairs), because the exact breakpoint sequence is known. The sum of all DV counts (across all input samples) should be close to INFO:PE because for the SV discovery Delly pools all paired-ends from all samples. The numbers are not identical because the SV discovery stops searching for additional support once enough confident abnormal paired-ends have been found (to save runtime), and because the genotyping is a bit stricter on mapping quality (20 by default) and insert size.
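To make the relationship between these counts concrete, here is a small Python sketch that reads one VCF record, picks RR/RV for precise calls and DR/DV otherwise, and sums DV across samples for comparison with INFO:PE. The function name `support_summary` is hypothetical, the example records are invented, and the FORMAT column is simplified to `GT:DR:DV:RR:RV`; real Delly output carries more fields.

```python
def support_summary(vcf_line):
    """Summarise per-sample support counts for one tab-separated VCF record."""
    cols = vcf_line.rstrip("\n").split("\t")
    # INFO column -> dict; flags such as PRECISE get an empty-string value.
    info = dict(e.partition("=")[::2] for e in cols[7].split(";") if e)
    format_keys = cols[8].split(":")
    # Precise calls are genotyped on reads (RR/RV), others on pairs (DR/DV).
    ref_key, alt_key = ("RR", "RV") if "PRECISE" in info else ("DR", "DV")
    per_sample = []
    dv_total = 0
    for sample_field in cols[9:]:
        values = dict(zip(format_keys, sample_field.split(":")))
        per_sample.append((int(values[ref_key]), int(values[alt_key])))
        dv_total += int(values["DV"])
    return {
        "counts_used": (ref_key, alt_key),
        "per_sample": per_sample,
        "dv_total": dv_total,
        "info_pe": int(info.get("PE", 0)),
    }


# Invented records with a simplified FORMAT column, for illustration only.
imprecise_line = "\t".join([
    "chr1", "1000", "DEL0001", "N", "<DEL>", ".", "PASS",
    "IMPRECISE;SVTYPE=DEL;END=5000;PE=9;CT=3to5",
    "GT:DR:DV:RR:RV", "0/1:20:5:0:0", "0/0:25:0:0:0", "0/1:18:3:0:0",
])
precise_line = "\t".join([
    "chr1", "1000", "DEL0002", "N", "<DEL>", ".", "PASS",
    "PRECISE;SVTYPE=DEL;END=5000;PE=9;SR=4;SRQ=0.98;CT=3to5",
    "GT:DR:DV:RR:RV", "0/1:20:5:30:6",
])
imprecise_summary = support_summary(imprecise_line)
precise_summary = support_summary(precise_line)
```

In the first invented record the DV counts sum to 8 while INFO:PE is 9, which is the kind of small discrepancy described above.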

I hope these explanations help a bit with filtering the Delly SV calls. The genotyping for translocations improved quite a bit in Delly v0.5.6.

-Tobias




Lauren Chong

Aug 5, 2014, 1:29:05 PM
to delly...@googlegroups.com, lcch...@gmail.com
Thank you Tobias, that's really helpful and will be very useful in my filtering!

Lauren Chong

Sep 23, 2014, 1:12:44 PM
to delly...@googlegroups.com, lcch...@gmail.com
Hi Tobias,

Can you explain what the CT field in the VCF INFO column represents? In particular, what do 5to3, 3to5, 3to3 and 5to5 mean and how are they generated?

Thanks,
Lauren

Tobias Rausch

Sep 23, 2014, 3:32:55 PM
to Lauren Chong, delly...@googlegroups.com
Hi Lauren,

Especially for complex structural variants (SVs), it's helpful to think of the Delly output as describing distal connections (CT) in the genome. Once there are double-strand breaks, the dangling ends are joined together. For intra-chromosomal SVs, you can have a 3' to 5' (deletion-type) connection/join, a 5' to 3' (tandem-duplication type) connection, a 3' to 3' (tail-to-tail inversion or left-spanning inversion) connection and a 5' to 5' (head-to-head inversion or right-spanning inversion) connection. For translocations (inter-chromosomal SVs), you can again have all 4 possible connection types: 3' to 5', 5' to 3', 3' to 3' and 5' to 5'.

Best regards, Tobias

Lauren Chong

Sep 29, 2014, 7:35:39 PM
to delly...@googlegroups.com, lcch...@gmail.com
Thanks Tobias!

Is there a way to relate the CT information to strand information? For instance, if I want to integrate my Delly output with output from other callers that provide strand orientation ("+" or "-" for each call), is it possible to extract this from the Delly VCF?

Thanks again,
Lauren

Tobias Rausch

Sep 30, 2014, 7:58:12 AM
to Lauren Chong, delly...@googlegroups.com
I am not sure what other tools report as strand information, but if "+" just means that the leftmost read is aligned to the forward strand and the rightmost read to the reverse strand (standard Illumina paired-end layout), then this would be incomplete.

Best, Tobias



Daniel Jeffares

Jul 28, 2015, 5:41:30 AM
to delly-users, lcch...@gmail.com, rausc...@gmail.com, Clemency Jolly, Fritz Sedlazeck
Hi Tobias,

I'm a little confused about the CT INFO codes, and how the start and end coordinates are defined. I wonder if we could have some help? I've made a powerpoint file showing the junctions, and how I suspect they are defined (attached as powerpoint and pdf). Some specific questions are:

1. Why are all DELs marked as CT=3to5, while DUPs are CT=5to3? These junctions look the same to me (at least in my diagram).
2. For the 3to3 junction of inversions, it seems that the start (POS) and end (END=) coordinates defined in the VCF correspond to positions B and D in my diagram. But what about the 5to5 junctions? What do start and end define?
3. I don't understand how we can get CT=3to5 TRA junctions.

Clemency, Fritz and I would be very grateful for your help,

regards,
Daniel Jeffares
delly-junction-types.pptx
delly-junction-types.pptx.pdf

Tobias Rausch

Jul 28, 2015, 3:02:37 PM
to Daniel Jeffares, delly-users, Lauren Chong, Clemency Jolly, Fritz Sedlazeck
Hi Daniel,

These connection types (CT) are probably most relevant for complex re-arrangements in cancer. Another way to think about these is in terms of the paired-end orientation (Medvedev et al. 2010: PMID: 2080529, Figure 3). Different names exist for these connection types in the literature, e.g. Stephens et al. (PMID: 21215367) used head-to-head inversions, tail-to-tail inversions, and so on.

Intra-chromosomal rearrangements:
(1) deletion-type = tail-to-head = 3' to 5' = [+-]
(2) tandem-duplication type = head-to-tail = 5' to 3' = [-+]
(3) left-spanning inversion type = tail-to-tail = 3' to 3' = [++]
(4) right-spanning inversion type = head-to-head = 5' to 5' = [--]

For inter-chromosomal rearrangements the pairs can also map [+-], [-+], [++] and [--], just on 2 different chromosomes, and hence you have all 4 possible connection types for translocations in Delly. The easiest example is a single chromosome A where a virus B integrated. If B was integrated in forward orientation, you will see a 3' to 5' translocation and a 5' to 3' translocation at the integration site. If B was integrated in inverted orientation, you will see a 3' to 3' translocation and a 5' to 5' translocation at the integration site.
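For anyone converting CT codes to the orientation notation used by other callers, the list above can be written as a lookup table. This is only a sketch of the mapping given in this post: the names `Connection`, `CONNECTION_TYPES`, `describe_ct` and `integration_junctions` are made up for illustration and are not part of Delly.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    sv_type: str      # name of the rearrangement type
    join: str         # head/tail naming used elsewhere in the literature
    orientation: str  # paired-end orientation of the supporting pairs


# CT code -> description, taken from the list of connection types above.
CONNECTION_TYPES = {
    "3to5": Connection("deletion-type", "tail-to-head", "+-"),
    "5to3": Connection("tandem-duplication type", "head-to-tail", "-+"),
    "3to3": Connection("left-spanning inversion type", "tail-to-tail", "++"),
    "5to5": Connection("right-spanning inversion type", "head-to-head", "--"),
}


def describe_ct(ct):
    """Return a one-line description for a CT code such as '3to5'."""
    c = CONNECTION_TYPES[ct]
    return f"{ct}: {c.sv_type}, {c.join}, pairs map [{c.orientation}]"


def integration_junctions(inverted):
    """CT codes expected at the two junctions of an integrated sequence,
    following the virus-integration example above."""
    return ("3to3", "5to5") if inverted else ("3to5", "5to3")
```

For example, `describe_ct("3to5")` yields the deletion-type entry, and `integration_junctions(inverted=True)` gives the pair of codes expected for an insertion in inverted orientation.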

Best, Tobias
 

Daniel Jeffares

Aug 27, 2015, 12:08:42 PM
to delly-users, d...@katipo.org, lcch...@gmail.com, clemency...@ucl.ac.uk, fritz.s...@gmail.com
Thanks Tobias,

So is this correct for translocations?

name  chr1:chr2
3to5    --->:--->  (like a deletion)
3to3    --->:<---  (like the 5' end of an inversion)
5to5    <---:--->  (like the 3' end of an inversion)
5to3    <---:<---  (also like a deletion but both reads from the -ve strand)

cheers,

Dan

Tobias Rausch

Sep 9, 2015, 10:32:44 AM
to Daniel Jeffares, delly-users

Sorry, I forgot to post this to the delly-users list:

For a translocation, you have 2 double-strand breaks, one on chrA and one on chrB. This creates 4 "dangling" ends: chrA_left, chrA_right, chrB_left and chrB_right. For a translocation you can join chrA_left with chrB_left (3to3), chrA_left with chrB_right (3to5), chrA_right with chrB_left (5to3) and chrA_right with chrB_right (5to5). In fact, for a typical reciprocal translocation in prostate cancer (where two chromosomes exchange their ends), Delly calls 2 translocations at the breakpoint, one 3to5 and one 5to3. But obviously not all translocations are reciprocal.
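The four joins described above can be sketched as a small Python table. The names `JOIN_TO_CT`, `ct_for_join` and `looks_reciprocal` are hypothetical, and the reciprocal check is only a heuristic based on the 3to5 plus 5to3 pattern mentioned in this post; it does not cover other configurations.

```python
# (chrA end, chrB end) -> CT code, as listed in the post above.
JOIN_TO_CT = {
    ("left", "left"): "3to3",
    ("left", "right"): "3to5",
    ("right", "left"): "5to3",
    ("right", "right"): "5to5",
}


def ct_for_join(chr_a_end, chr_b_end):
    """CT code for joining one dangling end of chrA to one of chrB."""
    return JOIN_TO_CT[(chr_a_end, chr_b_end)]


def looks_reciprocal(ct_codes):
    """Heuristic: True if the translocation calls at one breakpoint pair
    include both a 3to5 and a 5to3 connection."""
    return {"3to5", "5to3"} <= set(ct_codes)
```

In practice the calls passed to `looks_reciprocal` would first need to be grouped by breakpoint proximity on both chromosomes, which is left out of this sketch.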

-Tobias

