Adjustment of GQ(FORMAT:GQ)

Jingu Lee

Nov 17, 2014, 9:04:13 AM
to delly...@googlegroups.com
Hi Tobias,

My name is Jingu Lee, and I'm a student majoring in bioinformatics.
I'm trying to get the most out of DELLY, but I have a question about the genotyping option '-u'.

Using my BAM file, I first extracted the results that have both PASS in FILTER and PRECISE in INFO.
After that, I wanted to extract the significant calls from the PASS records marked IMPRECISE, but there were a large number of them.
So I used the genotyping option '-u' to reduce the amount of data. As you know, the default value of '-u' is 20, and I raised this limit from 20 to 90.

However, after using this option (-u 90), the results look very strange.

GT:GL:GQ:FT:RC:DR:DV:RR:RV     ./.:.,.,.:0:LowQual:0:0:0:0:0 (All results show this value.)

Did I use the genotyping option incorrectly?

Additionally, I have another question. Is there any way to reduce the number of IMPRECISE calls? Or do you know how to turn IMPRECISE calls into PRECISE ones?
(Actually, I am more curious about this question than the first one.)

Thanks.
Best regards.

Jingu Lee.

Brad Chapman

Nov 17, 2014, 9:39:00 AM
to Jingu Lee, delly...@googlegroups.com

Jingu;
Tobias will probably have more insight into your general questions but
from my perspective as a user I can help with the specific problem you saw:

> So I used the genotyping option '-u' to reduce the amount of data. As you know,
> the default value of '-u' is 20, and I raised this limit from 20 to 90.
>
> However, after using this option (-u 90), the results look very strange.
>
> GT:GL:GQ:FT:RC:DR:DV:RR:RV ./.:.,.,.:0:LowQual:0:0:0:0:0 (All results
> show this value.)

`-u 90` removes any reads which do not have a mapping quality of 90 or
more. For most aligners you won't have any reads mapping at this quality
so this effectively removes everything, resulting in the no calls with
no read support that you see.
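
A minimal Python sketch of that effect, assuming a hypothetical list of per-read mapping qualities. The values are made up for illustration, and `reads_passing` is not part of Delly. Many aligners cap MAPQ well below 90 (BWA-MEM, for instance, reports at most 60), so a threshold of 90 leaves nothing to genotype:

```python
def reads_passing(mapqs, min_mapq):
    """Keep only the mapping qualities that meet the threshold."""
    return [q for q in mapqs if q >= min_mapq]

# Hypothetical mapping qualities for reads around one SV call.
mapqs = [0, 12, 29, 37, 60, 60, 70]

print(len(reads_passing(mapqs, 20)))  # default threshold: 5 reads remain
print(len(reads_passing(mapqs, 90)))  # threshold of 90: 0 reads remain
```

With no reads left, every count in the genotype field is 0 and the genotype is a no-call (`./.`), which matches the output shown in the question.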

> *Additionally, I have another question. Is there any way to reduce the
> number of IMPRECISE calls? Or do you know how to turn IMPRECISE calls
> into PRECISE ones?*
> *(Actually, I am more curious about this question than the first one.)*

IMPRECISE indicates that Delly cannot precisely define the breakpoints
of the event and is more of an indicator that you need additional
evidence. I don't know of any way to tune Delly to be more precise via
the parameters. You likely would need more data to more precisely define
the breakpoints.

Hope this helps some,
Brad

Huy Vuong

Nov 17, 2014, 10:25:36 PM
to delly...@googlegroups.com, lee.ji...@gmail.com
Hi Tobias, 
I am also curious to understand how DELLY determines PRECISE and IMPRECISE. There is a -m (--min-flank: minimum flanking sequence size) parameter, which I believe you can tune. When you increase it above the default value (13), you can convert PRECISE calls to IMPRECISE. However, decreasing it, even to 0, can't convert all IMPRECISE calls to PRECISE in my data. Would you please explain this -m parameter? Thanks.

I have simulated a 500 bp homozygous deletion and run DELLY. It reported only an IMPRECISE breakpoint no matter how I changed the parameters (-s, -m). IGV showed many soft-clipped reads in the deletion region that support the breakpoints. Why didn't DELLY report it as a PRECISE deletion? Thanks.

From delly_del.vcf:
chr10   123297686       DEL00000007     N       <DEL>   .       PASS    IMPRECISE;CIEND=-15,15;CIPOS=-15,15;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.5.9;CHR2=chr10;END=123298138;SVLEN=452;CT=3to5;PE=102;MAPQ=70      GT:GL:GQ:FT:RC:DR:DV:RR:RV  1/1:-2689.79,-117.999,0:1180:PASS:1:0:392:0:0
 
Best,
Huy 
[Attachment: askDelly.PNG]

Tobias Rausch

Nov 18, 2014, 4:01:55 AM
to Huy Vuong, delly...@googlegroups.com, 이진구
Hi Huy, 

Delly clusters abnormal paired-ends, and every single cluster gives rise to an IMPRECISE SV call. For every IMPRECISE SV call, Delly then tries to identify split-reads, and one parameter that influences this split-read search is -m. In short, Delly computes a consensus sequence out of all split-read candidates and then aligns this consensus sequence to the reference, requiring at least -m aligned bases to the left and to the right.

When I wrote Delly's split-read module the average read length was 36-50 bp, so I fully agree it's a bit outdated by now. The plan is indeed to replace this soon by building a consensus from soft-clipped reads only, which seem to be fairly abundant when you have >=100 bp reads. Nevertheless, Delly will always report only a subset of all paired-end calls at breakpoint resolution (PRECISE) because SVs tend to occur in repetitive regions.
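
The flank requirement can be illustrated with a toy Python sketch. This is not Delly's implementation: it assumes exact string matching instead of a real alignment, a deletion whose coordinates are already known, and made-up sequences. The names `flank_sizes` and `is_precise` are hypothetical.

```python
def flank_sizes(consensus, reference, del_start, del_end):
    """Return (left, right) exactly matching flank lengths, or None.

    Tries every split point k of the consensus: the first k bases must end
    at del_start and the remaining bases must start at del_end.
    """
    n = len(consensus)
    for k in range(n + 1):
        left = reference[max(0, del_start - k):del_start]
        right = reference[del_end:del_end + (n - k)]
        if len(left) == k and len(right) == n - k and left + right == consensus:
            return k, n - k
    return None

def is_precise(flanks, min_flank=13):
    """Both flanks must reach min_flank, mirroring the role of -m."""
    return flanks is not None and min(flanks) >= min_flank

# Made-up reference: left flank, deleted segment, right flank.
left = "AAAACCCCGGGGTTTT"
deleted = "ACGTACGT"
right = "TGCATGCATGCATGCA"
reference = left + deleted + right

# Consensus spanning the deletion: 14 bases on the left, 15 on the right.
consensus = left[-14:] + right[:15]
flanks = flank_sizes(consensus, reference, len(left), len(left) + len(deleted))

print(flanks)                  # (14, 15)
print(is_precise(flanks, 13))  # True
print(is_precise(flanks, 15))  # False
```

In this toy setting, raising the threshold above the shorter flank turns a call that passed into one that fails, which is consistent with the observation that increasing -m converts PRECISE calls to IMPRECISE.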

Best, Tobias



Huy Vuong

Nov 18, 2014, 6:16:47 PM
to delly...@googlegroups.com, huy....@gmail.com, lee.ji...@gmail.com
Thanks for your explanation, Tobias. May I ask a follow-up question: for longer reads (e.g., 100 bp), which value of the -m parameter should I use to get the most PRECISE calls?

Huy

Tobias Rausch

Nov 20, 2014, 3:55:31 PM
to Huy Vuong, delly...@googlegroups.com, 이진구
Frankly, I have never evaluated this; it also depends on how repetitive the breakpoints are. Just to re-emphasize, we never actually filter structural variants on PRECISE/IMPRECISE, because many IMPRECISE calls can be true. Filtering on VCF FILTER equal to PASS and on genotype quality is more useful, I think, unless you are primarily interested in the exact breakpoint sequence.
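
A minimal Python sketch of that kind of filter, assuming a single-sample VCF line and a hypothetical GQ cutoff of 20. The cutoff and the helper names `sample_field` and `keep_record` are illustrative only and are not Delly defaults.

```python
def sample_field(format_col, sample_col, key):
    """Look up one FORMAT key (e.g. GQ) in a sample column."""
    return dict(zip(format_col.split(":"), sample_col.split(":"))).get(key)

def keep_record(line, min_gq=20):
    """Keep a record if FILTER is PASS and the sample's GQ meets min_gq."""
    cols = line.rstrip("\n").split("\t")
    if cols[6] != "PASS":  # column 7 of a VCF record is FILTER
        return False
    gq = sample_field(cols[8], cols[9], "GQ")
    return gq not in (None, ".") and float(gq) >= min_gq

# The IMPRECISE deletion posted earlier in this thread.
fixed = ["chr10", "123297686", "DEL00000007", "N", "<DEL>", ".", "PASS",
         "IMPRECISE;CIEND=-15,15;CIPOS=-15,15;SVTYPE=DEL;END=123298138",
         "GT:GL:GQ:FT:RC:DR:DV:RR:RV"]
called = "\t".join(fixed + ["1/1:-2689.79,-117.999,0:1180:PASS:1:0:392:0:0"])
no_call = "\t".join(fixed + ["./.:.,.,.:0:LowQual:0:0:0:0:0"])

print(keep_record(called))   # True: PASS and GQ=1180
print(keep_record(no_call))  # False: GQ=0
```

Note that the IMPRECISE record is kept: the filter looks only at FILTER and GQ, not at the PRECISE/IMPRECISE flag.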

Best, Tobias

Huy Vuong

Nov 21, 2014, 4:44:28 PM
to delly...@googlegroups.com, huy....@gmail.com, lee.ji...@gmail.com
Thanks, Tobias. In my simulation, 9 of 10 deletions (each 500 bp) were reported by DELLY as PRECISE; only 1 of 10 was IMPRECISE. I am curious to know why. The simulated breakpoint is not in a repetitive region, so this could be due to difficulty building a consensus sequence from the split reads, as you explained. Related to IMPRECISE breakpoints, would you please explain the CIEND (PE confidence interval around END) and CIPOS (PE confidence interval around POS) information in the VCF file?
In my previous example:

chr10   123297686       DEL00000007     N       <DEL>   .       PASS    IMPRECISE;CIEND=-15,15;CIPOS=-15,15;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv0.5.9;CHR2=chr10;END=123298138;SVLEN=452;CT=3to5;PE=102;MAPQ=70      GT:GL:GQ:FT:RC:DR:DV:RR:RV  1/1:-2689.79,-117.999,0:1180:PASS:1:0:392:0:0

Does CIPOS=-15,15 mean that the breakpoint lies within the range (POS - 15, POS + 15) at a 95% (standard?) confidence level? Thanks.

Best regards,
Huy

Shweta Chavan

Apr 6, 2015, 2:46:07 PM
to delly...@googlegroups.com, huy....@gmail.com, lee.ji...@gmail.com
Hello Tobias,

>Related to IMPRECISE breakpoints, would you please explain the CIEND (PE confidence interval around END) and CIPOS (PE confidence interval around POS) information in the VCF file? 

I have the same question. Could you please give us some insight into interpreting CIEND and CIPOS?

Thanks,
Shweta

Tobias Rausch

Apr 8, 2015, 3:34:48 PM
to Shweta Chavan, delly...@googlegroups.com, Huy Vuong, 이진구
Using paired-end mapping one can narrow down the approximate location of the structural variant breakpoint. Delly currently takes the maximum possible offset from either SV boundary as the confidence interval so it should be rather conservative.
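
As a rough sketch in Python, the CIPOS and CIEND offsets can be turned into coordinate ranges like this. The helper names `info_dict` and `breakpoint_interval` are hypothetical, and the ranges are read as conservative bounds from the paired-end evidence, not as a statistical 95% interval.

```python
def info_dict(info):
    """Parse a VCF INFO string into a dict; flags such as IMPRECISE map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out

def breakpoint_interval(position, ci):
    """Apply a 'lower,upper' offset pair (CIPOS or CIEND) to a coordinate."""
    lower, upper = (int(x) for x in ci.split(","))
    return position + lower, position + upper

# POS and INFO values taken from the record quoted earlier in this thread.
pos = 123297686
info = info_dict("IMPRECISE;CIEND=-15,15;CIPOS=-15,15;SVTYPE=DEL;END=123298138")

print(breakpoint_interval(pos, info["CIPOS"]))               # (123297671, 123297701)
print(breakpoint_interval(int(info["END"]), info["CIEND"]))  # (123298123, 123298153)
```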

-Tobias

Shweta Chavan

Apr 8, 2015, 4:29:13 PM
to Tobias Rausch, delly...@googlegroups.com, Huy Vuong, 이진구
Thank you Tobias, this helps for sure.

Shweta
