VCF format question


Airan

Oct 15, 2014, 11:41:18 AM
to delly...@googlegroups.com
Hi Tobias,

I'm trying to understand the format of the Delly VCF output and I have some questions. Please correct me if I've misunderstood:

- FILTER = LowQual if PE<3 and/or MAPQ<20
- PRECISE/IMPRECISE: PRECISE means the SV has split-read support (SR>0).

But the FORMAT column also has an FT field for each sample, which can be PASS or LowQual (the same values as the FILTER column). Which parameters determine whether FT is PASS or LowQual?

I hope that's clear.

One more thing, just a suggestion: it would be nice if you could add the allele frequency (AF in VCF) for each genotype in addition to GT (genotype).

Thanks in advance

이승철

Oct 16, 2014, 2:52:25 AM
to delly...@googlegroups.com
Hi Airan.

I have the same question.

In my case, there are many variants where FILTER is PASS but the FT field is LowQual.

Which of the two matters more for a variant?

On Thursday, October 16, 2014 at 12:41:18 AM UTC+9, Airan wrote:

Tobias Rausch

Oct 16, 2014, 4:42:09 AM
to 이승철, delly...@googlegroups.com
Hi Airan,

Yes, FILTER = LowQual if PE<3 OR MAPQ<20 (for translocations: PE<5 OR MAPQ<20). Precise variants have split-read support (SR>0).
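
For anyone who wants to reproduce the site-level rule when post-processing a VCF, here is a minimal Python sketch of the thresholds stated above. The function names `site_filter` and `is_precise` are hypothetical helpers for illustration, not part of Delly, and the sketch assumes you have already extracted PE, MAPQ and SR from the INFO column.

```python
def site_filter(pe, mapq, is_translocation=False):
    """Return the site-level FILTER value from the thresholds quoted in this thread.

    pe:   number of supporting paired-end reads (INFO PE)
    mapq: mapping quality reported for the site (INFO MAPQ)
    Hypothetical helper; Delly applies this rule internally.
    """
    # Translocations need more paired-end support: PE<5 instead of PE<3.
    min_pe = 5 if is_translocation else 3
    if pe < min_pe or mapq < 20:
        return "LowQual"
    return "PASS"


def is_precise(sr):
    """A variant is PRECISE when it has split-read support (SR>0)."""
    return sr > 0


if __name__ == "__main__":
    print(site_filter(pe=2, mapq=60))                         # too few read pairs
    print(site_filter(pe=4, mapq=60, is_translocation=True))  # below the translocation cutoff
    print(site_filter(pe=10, mapq=37), is_precise(sr=4))
```

Note that this only describes the FILTER column; the per-sample FT value is decided separately from the genotype quality.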

All VCF INFO fields describe the structural variant site, whereas all genotype tags (FORMAT fields) describe the genotype of one particular sample. For example, take a deletion site occurring in 100 out of 1000 samples, so at 10% frequency. The deletion site will be very confident because Delly accumulates evidence from 100 deletion carriers. However, a single sample having this deletion might have only 1 read supporting the deletion and 6 reads supporting the reference, so the genotype could be het. or hom. reference depending on the quality of the read pairs. These reference and alternative read counts and their qualities are used to compute genotype likelihoods (GLs) for hom. ref., het. and hom. alt. The final genotype (GT) is simply derived from the best GL, and GQ is a phred-scaled genotype quality reflecting the confidence in this genotype. If GQ<15 the genotype is flagged as LowQual, which is what the FT field reports. This is why a site can be PASS in the FILTER column while an individual sample's FT is LowQual.
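
To make the per-sample logic concrete, here is a rough Python sketch of going from three genotype likelihoods to GT, GQ and FT. It assumes the GLs are log10-scaled and ordered hom. ref., het., hom. alt., and it approximates GQ as the phred-scaled gap between the best and second-best likelihood. That GQ formula is a common convention and an assumption here; it is not necessarily the exact computation Delly performs. The function name `call_genotype` and the GL values in the example are illustrative only.

```python
def call_genotype(gls, min_gq=15):
    """Derive (GT, GQ, FT) from log10 genotype likelihoods.

    gls: sequence of three log10 likelihoods ordered hom. ref., het., hom. alt.
    min_gq: genotypes with GQ below this value are flagged LowQual (15 per this thread).
    Assumption: GQ is the phred-scaled difference between the best and
    second-best likelihood, which may differ from Delly's own formula.
    """
    labels = ("0/0", "0/1", "1/1")
    # GT is simply the genotype with the highest likelihood.
    best = max(range(3), key=lambda i: gls[i])
    # Phred-scale every likelihood relative to the best one (best becomes 0).
    pls = [round(-10 * (gl - gls[best])) for gl in gls]
    # GQ: how far the runner-up genotype is behind the best one.
    gq = min(pl for i, pl in enumerate(pls) if i != best)
    ft = "PASS" if gq >= min_gq else "LowQual"
    return labels[best], gq, ft


if __name__ == "__main__":
    # Weak evidence: hom. ref. is only slightly more likely than het.
    print(call_genotype([-0.5, -1.0, -12.0]))
    # Strong evidence for a heterozygous carrier.
    print(call_genotype([-10.0, 0.0, -8.0]))
```

In the first call the two best genotypes are close, so the sample would be flagged LowQual even if the site itself is PASS; in the second the het. call is well separated from the alternatives.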

I have mostly been working on the genotyping lately, so if you plan to filter based on the genotypes, please update to the latest Delly version, v0.5.9.

-Tobias

