Dear Tobias,
I'm calling somatic SVs in WGS data from murine hematopoietic stem cell clones against their matched germline controls. In some of my samples, the final call set contains variants (mostly deletions) that lie very close together or overlap, for example:
2 121200399 DEL00002896 T <DEL> 1140 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121204332;PE=19;MAPQ=60;CT=3to5;CIPOS=-369,369;CIEND=-369,369;RDRATIO=0.626451;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-101.938,0,-221.938:10000:PASS:591:643:499:1:40:20:0:0
2 121209355 DEL00002906 G <DEL> 2820 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121215666;PE=47;MAPQ=60;CT=3to5;CIPOS=-222,222;CIEND=-222,222;RDRATIO=0.636963;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-253.402,0,-259.002:10000:PASS:912:1038:696:1:48:47:0:0
2 121216175 DEL00002910 A <DEL> 2460 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121227644;PE=41;MAPQ=60;CT=3to5;CIPOS=-78,78;CIEND=-78,78;RDRATIO=0.83588;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-219.51,0,-247.809:10000:PASS:1204:1784:924:2:47:41:0:0
2 121221254 DEL00002912 A <DEL> 2460 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121227644;PE=41;MAPQ=60;CT=3to5;CIPOS=-381,381;CIEND=-381,381;RDRATIO=0.914554;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-219.811,0,-242.11:10000:PASS:540:988:546:2:46:41:0:0
2 121244099 DEL00002921 G <DEL> 2880 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121248162;PE=48;MAPQ=60;CT=3to5;CIPOS=-347,347;CIEND=-347,347;RDRATIO=0.777474;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-259.101,0,-259.101:10000:PASS:411:621:386:2:48:48:0:0
2 121248293 DEL00002928 A <DEL> 2640 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121253600;PE=44;MAPQ=60;CT=3to5;CIPOS=-419,419;CIEND=-419,419;RDRATIO=0.892257;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-234.8,0,-288.8:10000:PASS:480:880:505:2:53:44:0:0
2 121252038 DEL00002932 G <DEL> 2820 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121256522;PE=47;MAPQ=60;CT=3to5;CIPOS=-353,353;CIEND=-353,353;RDRATIO=0.847392;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-252.499,0,-276.499:10000:PASS:443:742:429:2:51:47:0:0
2 121253733 DEL00002934 T <DEL> 2100 PASS IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121259143;PE=35;MAPQ=60;CT=3to5;CIPOS=-421,421;CIEND=-421,421;RDRATIO=0.8788;SOMATIC;AC=1;AN=2 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-183.208,0,-296.808:10000:PASS:563:900:464:2:54:35:0:0
Since they are all called imprecisely, I suspect they might be one large deletion, or at least not eight separate ones. Is there a good way to decide whether that is the case?
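One way to make "close together" concrete is to widen each call by its confidence interval (POS plus the lower CIPOS bound, END plus the upper CIEND bound) and merge calls whose widened intervals overlap, a simple single-linkage interval merge. The sketch below applies this to the eight records above, with INFO trimmed to END, CIPOS and CIEND. The `Deletion` class, the `parse` and `cluster` helpers and the `max_gap` parameter are hypothetical, not part of Delly. Grouping only shows which calls are adjacent; whether a group is one larger event still has to be checked against read depth and split-read support.

```python
from dataclasses import dataclass

# The eight calls from above; INFO is trimmed to the fields this sketch reads.
RECORDS = """\
2 121200399 DEL00002896 END=121204332;CIPOS=-369,369;CIEND=-369,369
2 121209355 DEL00002906 END=121215666;CIPOS=-222,222;CIEND=-222,222
2 121216175 DEL00002910 END=121227644;CIPOS=-78,78;CIEND=-78,78
2 121221254 DEL00002912 END=121227644;CIPOS=-381,381;CIEND=-381,381
2 121244099 DEL00002921 END=121248162;CIPOS=-347,347;CIEND=-347,347
2 121248293 DEL00002928 END=121253600;CIPOS=-419,419;CIEND=-419,419
2 121252038 DEL00002932 END=121256522;CIPOS=-353,353;CIEND=-353,353
2 121253733 DEL00002934 END=121259143;CIPOS=-421,421;CIEND=-421,421
"""


@dataclass
class Deletion:
    chrom: str
    sv_id: str
    start: int  # POS widened by the lower CIPOS bound
    stop: int   # END widened by the upper CIEND bound


def parse(line):
    """Turn one trimmed record into a Deletion with CI-widened coordinates."""
    chrom, pos, sv_id, info = line.split()
    fields = dict(item.split("=", 1) for item in info.split(";"))
    cipos_lo = int(fields["CIPOS"].split(",")[0])
    ciend_hi = int(fields["CIEND"].split(",")[1])
    return Deletion(chrom, sv_id, int(pos) + cipos_lo, int(fields["END"]) + ciend_hi)


def cluster(dels, max_gap=0):
    """Single-linkage merge: a call joins the current group if its widened
    interval starts within max_gap bp of the group's rightmost stop."""
    clusters, cur_chrom, cur_stop = [], None, 0
    for d in sorted(dels, key=lambda d: (d.chrom, d.start)):
        if clusters and d.chrom == cur_chrom and d.start <= cur_stop + max_gap:
            clusters[-1].append(d)
            cur_stop = max(cur_stop, d.stop)
        else:
            clusters.append([d])
            cur_chrom, cur_stop = d.chrom, d.stop
    return clusters


calls = [parse(line) for line in RECORDS.splitlines()]
for group in cluster(calls):
    print(group[0].chrom, group[0].start, max(d.stop for d in group),
          [d.sv_id for d in group])
```

With `max_gap=0` this yields four groups, and DEL00002910 and DEL00002912 land in the same one, which fits the fact that both report END=121227644 and PE=41 and may describe the same junction. Raising `max_gap` to 500 would also pull DEL00002906 into that group, so the cutoff is a judgment call.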
I also looked at the regions in IGV and see no evidence for some of the deletions. The regions called as deletions have coverage matching our mean coverage (~40x) but are flanked by peaks of abnormally high coverage (~200x, see screenshot), so I suspect these are false-positive calls.
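A per-call read-depth check can make the IGV impression more systematic. Each record carries RC (reads in the SV region) and RCL/RCR (reads in the left and right control windows). Assuming the two control windows together span about the same length as the SV region (an assumption to verify against the Delly source; it is at least consistent with the fact that rounding 2*RC/(RCL+RCR) reproduces the reported RDCN for all eight records), this ratio acts as a diploid copy-number estimate: near 1 for a heterozygous deletion, near 2 for no loss. The 1.5 cutoff and the helper names are hypothetical. The sketch uses only the one sample column shown, ignores paired-end support, and can be biased by coverage spikes inside the control windows.

```python
# Sample column of each call shown above (FORMAT is identical for all eight).
FORMAT = "GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV"
CALLS = {
    "DEL00002896": "0/1:-101.938,0,-221.938:10000:PASS:591:643:499:1:40:20:0:0",
    "DEL00002906": "0/1:-253.402,0,-259.002:10000:PASS:912:1038:696:1:48:47:0:0",
    "DEL00002910": "0/1:-219.51,0,-247.809:10000:PASS:1204:1784:924:2:47:41:0:0",
    "DEL00002912": "0/1:-219.811,0,-242.11:10000:PASS:540:988:546:2:46:41:0:0",
    "DEL00002921": "0/1:-259.101,0,-259.101:10000:PASS:411:621:386:2:48:48:0:0",
    "DEL00002928": "0/1:-234.8,0,-288.8:10000:PASS:480:880:505:2:53:44:0:0",
    "DEL00002932": "0/1:-252.499,0,-276.499:10000:PASS:443:742:429:2:51:47:0:0",
    "DEL00002934": "0/1:-183.208,0,-296.808:10000:PASS:563:900:464:2:54:35:0:0",
}


def depth_copy_number(sample, fmt=FORMAT):
    """Return (2*RC/(RCL+RCR), RDCN). Assumes the left and right control
    windows together are about as long as the SV region (hypothetical)."""
    values = dict(zip(fmt.split(":"), sample.split(":")))
    rcl, rc, rcr = (int(values[k]) for k in ("RCL", "RC", "RCR"))
    return 2 * rc / (rcl + rcr), int(values["RDCN"])


def without_depth_loss(calls, max_cn=1.5):
    """IDs whose depth estimate stays above max_cn, i.e. no drop towards one copy."""
    return [sv_id for sv_id, sample in calls.items()
            if depth_copy_number(sample)[0] > max_cn]


for sv_id, sample in CALLS.items():
    cn, rdcn = depth_copy_number(sample)
    print(f"{sv_id}\tdepth CN {cn:.2f}\tRDCN {rdcn}")
print("no depth loss:", without_depth_loss(CALLS))
```

Under these assumptions only DEL00002896 and DEL00002906 show a drop in depth; the other six keep roughly diploid depth, which would be consistent with the IGV impression that some of these calls lack depth support.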
Is this expected behavior around such read pileups, and should I inspect all calls manually? Or could there be a different problem?
Thanks a lot for your help!
Viktoria
Workflow:
delly call -x <EXCLUDE_REGIONS> -o <COLONY1.bcf> -g <REF.fa> <COLONY1.bam> <CONTROL1.bam>
delly filter -f somatic -o <COLONY1.pre.bcf> -a 0.0 -s <SAMPLES.tsv> -v 0 <COLONY1.bcf>
delly call -o <COLONY1.gt.bcf> -x <EXCLUDE_REGIONS> -g <REF.fa> -v <COLONY1.pre.bcf> <COLONY1.bam> <CONTROL1.bam> ... <CONTROL17.bam>
delly filter -f somatic -o <COLONY1.bcf> -a 0.3 -r 0.75 --pass -s <SAMPLES.tsv> -v 10 -c 0 <COLONY1.gt.bcf>
followed by filtering for genotype quality (GQ) >= 15 and coverage between 10x and 100x.
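The final filtering step does not say which FORMAT fields it reads. A minimal sketch, assuming GQ is the genotype quality, taking DR+DV (high-quality reference plus variant read pairs) as a stand-in for coverage, and assuming the colony is the first sample column. The helper names, the sample index and the coverage proxy are all hypothetical and should be replaced by whatever the real filter uses.

```python
# Two of the records above (tab-separated as in a VCF), INFO trimmed to ".".
LINES = [
    "2\t121200399\tDEL00002896\tT\t<DEL>\t1140\tPASS\t.\t"
    "GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV\t"
    "0/1:-101.938,0,-221.938:10000:PASS:591:643:499:1:40:20:0:0",
    "2\t121209355\tDEL00002906\tG\t<DEL>\t2820\tPASS\t.\t"
    "GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV\t"
    "0/1:-253.402,0,-259.002:10000:PASS:912:1038:696:1:48:47:0:0",
]


def sample_fields(line, sample_index=0):
    """Map FORMAT keys to values for one sample column (0 = first sample)."""
    cols = line.rstrip("\n").split("\t")
    return dict(zip(cols[8].split(":"), cols[9 + sample_index].split(":")))


def passes_post_filter(line, sample_index=0, min_gq=15, min_cov=10, max_cov=100):
    """GQ >= min_gq and min_cov <= DR+DV <= max_cov (DR+DV is an assumed proxy)."""
    fields = sample_fields(line, sample_index)
    gq = float(fields["GQ"])
    coverage = int(fields["DR"]) + int(fields["DV"])
    return gq >= min_gq and min_cov <= coverage <= max_cov


kept = [line.split("\t")[2] for line in LINES if passes_post_filter(line)]
print(kept)
```

All eight calls above have GQ=10000 and DR+DV between 60 and 98, so under this proxy the filter keeps every one of them; it cannot by itself remove the clustered calls.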