Confused about VCF representation for multiple samples


xiaok...@gmail.com

Oct 29, 2014, 3:59:11 PM
to delly...@googlegroups.com
Hi Tobias,

I am trying to use Delly to perform SV analysis on multiple samples together. The command looks like this:

delly_v0.5.5_parallel_linux_x86_64bit -t DUP -o sv_INV.vcf -g /human/hg19.fa -q 30  tumor.bam normal1.bam normal2.bam ....

However, I am confused about how the results for multiple samples are represented in the VCF (a sample excerpt is below). I have several questions (forgive me if they seem basic):
1. Why is REF always 'N' and QUAL always '.' in all my VCF results?
2. How should I interpret the sample columns? For example, how can I tell in which sample the inversion event INV00038236 occurs?
3. How do I use the FORMAT values "GT:GL:GQ:FT:RC:DR:DV:RR:RV" to interpret the SV event for each sample?
4. Is there a detailed manual for Delly?
Thank you very much.

-W

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT tumor normal1 normal2 normal3 normal4 normal5 normal6 normal7 normal8
chr1 3326456 INV00038236 N <INV> . PASS CIEND=-369,369;CIPOS=-369,369;CHR2=chr1;END=230467789;PE=4;MAPQ=44;CT=5to5;IMPRECISE;SVLEN=227141333;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1:-11.1982,-0.901323,0.0:9:LowQual:92999353:0:3:0:0 1/1:-4.39998,-0.301013,0.0:3:LowQual:94481145:0:1:0:0 ./.:.,.,.:0:LowQual:94137321:0:0:0:0 1/1:-5.59862,-0.600682,0.0:6:LowQual:96112771:0:2:0:0 0/0:0.0,-1.09295,-24.8844:11:LowQual:47411589:10:1:0:0 0/0:0.0,-1.79645,-16.1903:18:PASS:46844335:6:0:0:0 0/0:0.0,-2.68925,-22.68:27:PASS:47292959:9:0:0:0 0/0:0.0,-3.28493,-26.6736:33:PASS:47292243:11:0:0:0
chr1 6099470 INV00073918 N <INV> . LowQual CIEND=-430,430;CIPOS=-430,430;CHR2=chr1;END=30028154;PE=2;MAPQ=36;CT=5to5;IMPRECISE;SVLEN=23928684;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1:-5.19758,-0.599639,0.0:6:LowQual:10186700:0:2:0:0 1/1:-11.9978,-1.20194,0.0:12:LowQual:10903074:0:4:0:0 1/1:-6.3992,-0.601262,0.0:6:LowQual:10314769:0:2:0:0 1/1:-7.99689,-0.89998,0.0:9:LowQual:10694871:0:3:0:0 0/0:0.0,-8.72898,-123.099:87:PASS:5512227:29:0:0:0 0/0:0.0,-8.42803,-119.799:84:PASS:5209758:28:0:0:0 0/0:0.0,-9.3305,-126.999:93:PASS:5431453:31:0:0:0 0/0:0.0,-5.71888,-79.6993:57:PASS:5409010:19:0:0:0
chr1 8446627 INV00026047 N <INV> . LowQual CIEND=-378,378;CIPOS=-378,378;CHR2=chr1;END=78543590;PE=2;MAPQ=44;SR=83;SRQ=1.0;CONSENSUS=TCTTTTTCATGAAGGGCCTGTGTAAGTCTTTTGCTTATTTTTCAATGGATTGTCCTTTTCTTTTTCATTTGTAAGAGTTCGGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAGGAG;CT=5to5;PRECISE;SVLEN=70096963;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 0/1:-23.283,0.0,-36.1828:233:PASS:32244281:0:0:10:7 0/1:-56.3806,0.0,-10.781:108:PASS:33437139:0:0:4:15 0/1:-32.0848,0.0,-20.285:203:PASS:32688986:0:0:6:9 0/1:-57.7745,0.0,-30.375:304:PASS:33528008:0:0:9:16 0/0:0.0,-12.6425,-184.199:126:PASS:16661735:36:0:42:0 0/1:-20.1551,0.0,-152.052:202:PASS:16221321:37:0:39:8 0/0:0.0,-9.33139,-136.399:93:PASS:16563798:31:0:31:0 0/1:-32.6605,0.0,-109.56:327:PASS:16530166:22:1:28:11
chr1 16022604 INV00041912 N <INV> . LowQual CIEND=-392,392;CIPOS=-392,392;CHR2=chr1;END=178948185;PE=2;MAPQ=34;SR=9;SRQ=0.927885;CONSENSUS=CCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCAAGCGATTTTCTTTGGTAGTATATTTTAATTCCTTGTTTTTTCAGTGTATCTATTACGGGTTTTT;CT=3to3;PRECISE;SVLEN=162925581;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 0/1:-15.5922,0.0,-73.4763:156:PASS:63258726:20:0:24:8 0/1:-24.7764,0.0,-72.9699:248:PASS:64770122:20:0:22:9 0/0:0.0,-4.50439,-54.2889:45:PASS:64770898:28:0:15:0 0/1:-62.3687,0.0,-48.572:486:PASS:65883151:0:0:16:20 0/1:-7.15811,0.0,-179.644:72:PASS:32466838:49:0:49:6 0/1:-9.46092,0.0,-126.763:95:PASS:32086360:34:0:33:6 0/1:-2.36261,0.0,-138.761:24:PASS:32395787:23:0:37:4 0/0:0.0,-12.0363,-164.595:120:PASS:32396291:46:0:40:0
chr1 16801817 INV00052896 N <INV> . PASS CIEND=-301,301;CIPOS=-301,301;CHR2=chr1;END=17273996;PE=5;MAPQ=31;SR=4;SRQ=1.0;CONSENSUS=ATTATTAATCTGTATCATTATTATTATTATTATTATTATTATTATTATTATTATTATTATTGAGACGGAGTCTTGCTCTGTTGCCCAGGCTGGAGTGCAGT;CT=3to3;PRECISE;SVLEN=472179;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 0/1:-4.99624,0.0,-6.29657:50:PASS:97546:4:0:2:2 0/1:-2.49796,0.0,-3.79829:25:PASS:116989:1:0:1:1 0/0:0.0,-0.601698,-7.49964:6:LowQual:161498:0:0:2:0 0/1:-2.69796,0.0,-3.79816:27:PASS:111912:0:0:1:1 0/1:-21.4761,0.0,-67.0685:215:PASS:78300:14:0:24:9 0/1:-19.0791,0.0,-66.768:191:PASS:57664:15:0:25:9 0/1:-11.5858,0.0,-41.2866:116:PASS:54648:7:0:15:6 0/1:-25.9767,0.0,-55.6726:260:PASS:54681:11:0:21:11
chr1 16832462 INV00053890 N <INV> . PASS CIEND=-394,394;CIPOS=-394,394;CHR2=chr1;END=147995130;PE=5;MAPQ=38;CT=5to5;IMPRECISE;SVLEN=131162668;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1:-6.49823,-0.600293,0.0:6:LowQual:49370721:0:2:0:0 1/1:-6.59924,-0.601302,0.0:6:LowQual:50368168:0:2:0:0 1/1:-16.1907,-1.79685,0.0:18:PASS:50413370:0:6:0:0 1/1:-8.09469,-0.897785,0.0:9:LowQual:51311536:0:3:0:0 0/0:0.0,-0.298281,-2.19725:3:LowQual:25271250:1:0:0:0 0/0:0.0,-0.298281,-2.19725:3:LowQual:24980080:1:0:0:0 0/1:-1.60241,0.0,-3.49829:16:PASS:25209113:2:1:0:0 ./.:.,.,.:0:LowQual:25213953:0:0:0:0
chr1 22020908 INV00043280 N <INV> . LowQual CIEND=-378,378;CIPOS=-378,378;CHR2=chr1;END=78735810;PE=2;MAPQ=44;SR=3;SRQ=0.925;CONSENSUS=GATTACAGGCACATGCCACCACACCCAGCTAATTTTTGTATTTTTTGTAGAGACAGTGGTCTTGGTATGTTGTCCATGCTGGTCTCAAACTCCTGGCCAC;CT=5to5;PRECISE;SVLEN=56714902;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 0/1:-11.9959,0.0,-3.19593:32:PASS:26632576:17:0:1:3 0/1:-7.89693,0.0,-3.49694:35:PASS:27374588:2:0:1:2 ./.:.,.,.:0:LowQual:26945514:5:0:0:0 0/0:0.0,-0.602025,-8.79997:6:LowQual:27595359:5:0:2:0 0/1:-14.8939,0.0,-6.49401:65:PASS:13637830:30:0:2:4 0/1:-10.395,0.0,-6.09502:61:PASS:13373035:41:0:2:3 0/1:-5.3941,0.0,-14.194:54:PASS:13593949:23:0:4:2 0/1:-12.4939,0.0,-6.39672:64:PASS:13575650:25:0:2:4
chr1 27232146 INV00030022 N <INV> . PASS CIEND=-253,253;CIPOS=-253,253;CHR2=chr1;END=33516414;PE=35;MAPQ=39;SR=12;SRQ=1.0;CONSENSUS=AAGCCATCAAGATTCCTCTGTGAGGGATACCCTCCCTATACCAGGGCAAAAAATTACCCCTGATTACTTTCATTCTCATTGCACTGACCGGAACTTAGTTACAATGCCACGACTAGCAGCCGGGGT;CT=5to5;PRECISE;SVLEN=6284268;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1:-47.1997,-3.31103,0.0:33:PASS:2716531:23:15:0:11 0/1:-41.0838,0.0,-14.8839:149:PASS:2891135:23:10:5:11 0/0:0.0,-0.902329,-11.0992:9:LowQual:2756135:31:0:3:0 0/1:-14.7949,0.0,-2.89504:29:PASS:2841712:29:10:1:4 0/1:-9.49291,0.0,-14.693:95:PASS:1489538:51:0:4:3 0/1:-9.59395,0.0,-10.394:96:PASS:1414630:50:0:3:3 0/1:-9.99302,0.0,-13.3929:100:PASS:1472556:45:0:4:3 0/1:-1.99402,0.0,-18.6939:20:PASS:1468666:38:0:5:1
chr1 28727947 INV00049051 N <INV> . LowQual CIEND=-364,364;CIPOS=-364,364;CHR2=chr1;END=81351537;PE=2;MAPQ=44;SR=45;SRQ=0.884615;CONSENSUS=AAGCTCCGCCTCCCGGGTTCACGCCATTCTCCTGCCTCAGCCTCCCAAGTAGCAATTTTCAAGTGAACTTTTAGACTCTCTTAAAGTTAGGATGGGATTATAGCTTTTTAAATGTATAAAGTAACTATTT;CT=3to3;PRECISE;SVLEN=52623590;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 0/1:-24.2712,0.0,-81.8705:243:PASS:24972063:28:0:21:8 0/1:-19.6703,0.0,-89.3694:197:PASS:25470425:20:0:23:7 0/1:-28.9695,0.0,-83.3694:290:PASS:25234704:31:0:21:9 0/0:0.0,-6.62228,-96.7996:66:PASS:25814782:42:0:22:0 0/0:0.0,-11.7395,-171.599:117:PASS:12647128:47:0:39:0 0/1:-97.4409,0.0,-111.441:974:PASS:12482049:47:1:32:29 0/0:0.0,-10.2344,-149.599:102:PASS:12625372:41:0:34:0 0/1:-71.1509,0.0,-102.35:712:PASS:12611389:31:0:29:21
chr1 35007146 INV00036939 N <INV> . PASS CIEND=-333,333;CIPOS=-333,333;CHR2=chr1;END=218230521;PE=7;MAPQ=40;CT=5to5;IMPRECISE;SVLEN=183223375;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 1/1:-8.39994,-0.601999,0.0:6:LowQual:73400404:0:2:0:0 ./.:.,.,.:0:LowQual:73798054:0:0:0:0 ./.:.,.,.:0:LowQual:74429421:0:0:0:0 1/1:-3.99996,-0.300987,0.0:3:LowQual:75678394:0:1:0:0 0/0:0.0,-16.2534,-220.598:163:PASS:37140271:54:0:0:0 0/0:0.0,-13.243,-188.498:132:PASS:37005060:44:0:0:0 0/0:0.0,-0.728807,-103.899:7:LowQual:37126866:27:2:0:0 0/0:0.0,-11.7386,-159.298:117:PASS:37160600:39:0:0:0
chr1 46149176 INV00079395 N <INV> . LowQual CIEND=-392,392;CIPOS=-392,392;CHR2=chr1;END=189247039;PE=2;MAPQ=44;SR=41;SRQ=0.983974;CONSENSUS=ATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGAGATTACAGGCGTGAGCCACCGCGCCCGGCCTGAATATTTTTTATAACCTCTCTAAATCTTTTAAAAAATTAATTTCATTCAGTAACACATTAGTCCAGTAACAGAAGGA;CT=3to3;PRECISE;SVLEN=143097863;SVTYPE=INV;SVMETHOD=EMBL.DELLYv0.5.5;SOMATIC GT:GL:GQ:FT:RC:DR:DV:RR:RV 0/1:-6.99389,0.0,-15.7939:70:PASS:54623914:16:0:4:2 0/0:0.0,-0.602025,-8.79997:6:LowQual:55050544:11:0:2:0 0/1:-7.0949,0.0,-11.6949:71:PASS:55743916:11:0:3:2 0/1:-0.788952,0.0,-39.5887:8:LowQual:56601000:10:0:10:1 0/1:-32.7821,0.0,-43.1786:328:PASS:27661349:31:0:12:9 0/1:-15.1919,0.0,-14.7918:148:PASS:27624697:24:0:4:4 0/1:-28.8884,0.0,-11.8886:119:PASS:27675359:18:1:4:8 0/1:-69.7754,0.0,-14.4763:145:PASS:27705263:22:0:5:19

xiaok...@gmail.com

Oct 29, 2014, 4:38:58 PM
to delly...@googlegroups.com
A related question: how do you filter somatic SV events using the per-sample information in these columns?
Thank you.
-W

Tobias Rausch

Oct 30, 2014, 11:12:31 AM
to xiaok...@gmail.com, delly...@googlegroups.com
There is now a Python script to fix the REF allele:

python addRefAllele.py -v in.vcf -r <ref.fa> -o out.vcf
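What such a fix does can be sketched roughly as follows. This is not the code of addRefAllele.py; it is a minimal illustration assuming an uncompressed FASTA, a tab-delimited VCF, and that REF should hold the reference base at the 1-based POS (the VCF convention for symbolic alleles such as <INV>). `load_fasta` and `fill_ref` are hypothetical helper names.

```python
from typing import Dict, List


def load_fasta(path: str) -> Dict[str, str]:
    """Hypothetical helper: read an uncompressed FASTA into {chrom: sequence}."""
    chunks: Dict[str, List[str]] = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # the sequence name is the first word after '>'
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line.strip())
    return {chrom: "".join(parts) for chrom, parts in chunks.items()}


def fill_ref(vcf_line: str, genome: Dict[str, str]) -> str:
    """Hypothetical helper: replace a placeholder REF 'N' with the base at POS."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through unchanged
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    if ref == "N" and chrom in genome:
        # VCF POS is 1-based, Python strings are 0-based
        fields[3] = genome[chrom][pos - 1].upper()
    return "\t".join(fields)
```

Loading all of hg19 this way keeps roughly 3 GB of sequence in memory; for real data an indexed FASTA reader such as pysam's `FastaFile` is the more practical choice.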

QUAL is optional and Delly doesn't set this value, but this might change in future releases.

The sample columns contain the genotypes as described in the VCF-Spec (http://samtools.github.io/hts-specs/VCFv4.1.pdf).
Some further information on this is in the VCF header (and previous posts in this delly group).
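Per the VCF spec, sample names are the #CHROM header fields after FORMAT, and each sample column is a colon-separated list whose keys are given, in order, by the FORMAT column. Per the header Delly writes, DR/DV are high-quality reference/variant read pairs and RR/RV are high-quality reference/variant junction reads, so a sample "carries" an event when its GT contains the ALT allele (0/1 or 1/1), usually backed by DV or RV greater than zero. The sketch below shows that mapping; the helper names are hypothetical and not part of Delly.

```python
from typing import Dict, List


def sample_names(chrom_header: str) -> List[str]:
    """Sample names are the #CHROM header fields after FORMAT (column 10 onward)."""
    return chrom_header.rstrip("\n").split("\t")[9:]


def parse_samples(vcf_line: str, names: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each sample name to a {FORMAT key: value} dict for one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # e.g. GT:GL:GQ:FT:RC:DR:DV:RR:RV
    return {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(names, fields[9:])
    }


def carriers(per_sample: Dict[str, Dict[str, str]]) -> List[str]:
    """Samples whose genotype includes the ALT allele (0/1 or 1/1); './.' is no call."""
    return [
        name for name, f in per_sample.items()
        if "1" in f["GT"].replace("|", "/").split("/")
    ]
```

For example, a record whose sample columns read `1/1:...`, `0/0:...` and `./.:...` yields only the first sample as a carrier.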

There is no detailed Delly manual, but there is a preliminary Python script to filter somatic variants (python/somaticFilter.py).
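Every record in the excerpt above carries the SOMATIC flag, even where most sample columns are genotyped as carriers, so per-sample checks are still needed. The sketch below shows one plausible heuristic; the criteria (site FILTER is PASS, at least one tumor has a PASS non-reference call, every normal is 0/0 with no variant pairs or junction reads) are assumptions for illustration, not the logic of somaticFilter.py. It expects samples as a hypothetical `{sample: {FORMAT key: value}}` dict.

```python
from typing import Dict, List


def is_somatic_candidate(site_filter: str,
                         per_sample: Dict[str, Dict[str, str]],
                         tumors: List[str],
                         normals: List[str]) -> bool:
    """Hypothetical somatic heuristic, not Delly's somaticFilter.py."""
    if site_filter != "PASS":
        return False

    def alt_support(f: Dict[str, str]) -> int:
        # variant read pairs (DV) plus variant junction reads (RV)
        return int(f["DV"]) + int(f["RV"])

    # at least one tumor must be a confident non-reference call
    tumor_hit = any(
        per_sample[t]["GT"] in ("0/1", "1/1") and per_sample[t]["FT"] == "PASS"
        for t in tumors
    )
    # every normal must be homozygous reference with no variant-supporting reads
    normals_clean = all(
        per_sample[n]["GT"] == "0/0" and alt_support(per_sample[n]) == 0
        for n in normals
    )
    return tumor_hit and normals_clean
```

Requiring zero variant support in the normals is deliberately strict; a threshold on DV + RV is a common relaxation when normals have occasional mapping artifacts.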

-Tobias

xiaok...@gmail.com

Oct 30, 2014, 12:11:13 PM
to delly...@googlegroups.com, xiaok...@gmail.com
Tobias,

Thank you for your prompt answer.
But I am still confused about how to tell where an event such as INV00038236 occurs, i.e. in which sample. Does it occur in at least one of the samples tested? If so, which ones? I cannot find this information. I tried to work it out from the VCF header but could not; maybe I missed something. The VCF header is below. Could you clarify this a bit? I am pretty new to this version, so I may have overlooked some points. Thank you very much.

##fileformat=VCFv4.1
##fileDate=20141009
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="PE confidence interval around END">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="PE confidence interval around POS">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=PE,Number=1,Type=Integer,Description="Paired-end support of the structural variant">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
##INFO=<ID=SR,Number=1,Type=Integer,Description="Split-read support">
##INFO=<ID=SRQ,Number=1,Type=Float,Description="Split-read consensus alignment quality">
##INFO=<ID=CONSENSUS,Number=1,Type=String,Description="Split-read consensus sequence">
##INFO=<ID=CT,Number=1,Type=String,Description="Paired-end signature induced connection type">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of the SV">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic structural variant.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Log10-scaled genotype likelihoods for RR,RA,AA genotypes">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=FT,Number=1,Type=String,Description="Per-sample genotype filter">
##FORMAT=<ID=RC,Number=1,Type=Integer,Description="Normalized high-quality read count for the SV">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# high-quality reference pairs">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality variant pairs">
##FORMAT=<ID=RR,Number=1,Type=Integer,Description="# high-quality reference junction reads">
##FORMAT=<ID=RV,Number=1,Type=Integer,Description="# high-quality variant junction reads">
##FILTER=<ID=LowQual,Description="PE support below 3 or mapping quality below 20.">
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=TRA,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
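The header defines GL as log10-scaled likelihoods for the RR, RA and AA genotypes, so the called GT is the genotype with the highest GL. In the excerpt from the first post, GQ also matches 10 times the gap between the best and second-best GL, rounded (e.g. `0/1:-23.283,0.0,-36.1828` has GQ 233). That relationship is inferred from the excerpt, not taken from Delly's documentation; `call_from_gl` is a hypothetical helper.

```python
from typing import List, Tuple

# order of GL entries: RR, RA, AA (per the ##FORMAT=<ID=GL,...> header line)
GENOTYPES = ["0/0", "0/1", "1/1"]


def call_from_gl(gl_field: str) -> Tuple[str, int]:
    """Return (genotype, GQ) from a GL string such as '-23.283,0.0,-36.1828'.

    GQ = round(10 * (best - second-best log10 likelihood)); this matches the
    excerpt but is an inference, not Delly's documented formula.
    """
    if gl_field.startswith("."):
        return "./.", 0  # no data for this sample
    gls: List[float] = [float(x) for x in gl_field.split(",")]
    # genotype indices sorted from most to least likely
    order = sorted(range(3), key=lambda i: gls[i], reverse=True)
    best, second = order[0], order[1]
    gq = round(10 * (gls[best] - gls[second]))
    return GENOTYPES[best], gq
```

Read this way, a low GQ (such as the single-digit values on many 1/1 calls above, each backed by only one to three variant pairs) means the call is barely preferred over the next genotype, which is also when Delly sets the per-sample FT to LowQual.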