Consecutive deletions, missing evidence in IGV


Viktoria Flore

Mar 8, 2023, 5:11:17 AM
to delly-users
Dear Tobias,
I'm calling somatic SVs in WGS data of murine hematopoietic stem cell clones with their respective germline controls. In some of my samples, I get final variants that are in close proximity or overlapping (mostly deletions), for example:

2       121200399       DEL00002896     T       <DEL>   1140    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121204332;PE=19;MAPQ=60;CT=3to5;CIPOS=-369,369;CIEND=-369,369;RDRATIO=0.626451;SOMATIC;AC=1;AN=2     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-101.938,0,-221.938:10000:PASS:591:643:499:1:40:20:0:0
2       121209355       DEL00002906     G       <DEL>   2820    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121215666;PE=47;MAPQ=60;CT=3to5;CIPOS=-222,222;CIEND=-222,222;RDRATIO=0.636963;SOMATIC;AC=1;AN=2     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-253.402,0,-259.002:10000:PASS:912:1038:696:1:48:47:0:0  
2       121216175       DEL00002910     A       <DEL>   2460    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121227644;PE=41;MAPQ=60;CT=3to5;CIPOS=-78,78;CIEND=-78,78;RDRATIO=0.83588;SOMATIC;AC=1;AN=2  GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-219.51,0,-247.809:10000:PASS:1204:1784:924:2:47:41:0:0
2       121221254       DEL00002912     A       <DEL>   2460    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121227644;PE=41;MAPQ=60;CT=3to5;CIPOS=-381,381;CIEND=-381,381;RDRATIO=0.914554;SOMATIC;AC=1;AN=2     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-219.811,0,-242.11:10000:PASS:540:988:546:2:46:41:0:0
2       121244099       DEL00002921     G       <DEL>   2880    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121248162;PE=48;MAPQ=60;CT=3to5;CIPOS=-347,347;CIEND=-347,347;RDRATIO=0.777474;SOMATIC;AC=1;AN=2     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-259.101,0,-259.101:10000:PASS:411:621:386:2:48:48:0:0
2       121248293       DEL00002928     A       <DEL>   2640    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121253600;PE=44;MAPQ=60;CT=3to5;CIPOS=-419,419;CIEND=-419,419;RDRATIO=0.892257;SOMATIC;AC=1;AN=2     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-234.8,0,-288.8:10000:PASS:480:880:505:2:53:44:0:0
2       121252038       DEL00002932     G       <DEL>   2820    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121256522;PE=47;MAPQ=60;CT=3to5;CIPOS=-353,353;CIEND=-353,353;RDRATIO=0.847392;SOMATIC;AC=1;AN=2     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-252.499,0,-276.499:10000:PASS:443:742:429:2:51:47:0:0
2       121253733       DEL00002934     T       <DEL>   2100    PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=121259143;PE=35;MAPQ=60;CT=3to5;CIPOS=-421,421;CIEND=-421,421;RDRATIO=0.8788;SOMATIC;AC=1;AN=2       GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-183.208,0,-296.808:10000:PASS:563:900:464:2:54:35:0:0

Since they are all called imprecisely, I suspected that this might be one large deletion or at least not 8 separate ones. Is there a good way to decide whether that is the case?

I also looked at the regions in IGV, and I don't see any evidence for some of the deletions. The regions called as deletions have coverage matching our mean coverage (~40x), but they are flanked by peaks with abnormally high coverage (~200x, see screenshot), so I suspect that these are just faulty calls.
Is this expected behavior around these piles of reads and should I inspect all calls manually? Or might there be a different problem?

Thanks a lot for your help!
Viktoria



Screenshot 2023-03-08 at 10.29.35.png


Workflow:
delly call -x <EXCLUDE_REGIONS> -o <COLONY1.bcf> -g <REF.fa> <COLONY1.bam> <CONTROL1.bam>
delly filter -f somatic -o <COLONY1.pre.bcf> -a 0.0 -s <SAMPLES.tsv> -v 0 <COLONY1.bcf>
delly call -o <COLONY1.gt.bcf> -x <EXCLUDE_REGIONS> -g <REF.fa> -v <COLONY1.pre.bcf> <COLONY1.bam> <CONTROL1.bam> ... <CONTROL17.bam>
delly filter -f somatic -o <COLONY1.bcf> -a 0.3 -r 0.75 --pass -s <SAMPLES.tsv> -v 10 -c 0 <COLONY1.gt.bcf>

This is followed by filtering for genotype quality >= 15 and coverage between 10x and 100x.

Viktoria Flore

Mar 8, 2023, 5:15:22 AM
to delly-users
PS: We are using a transgenic mouse model with an inducible knockout that I clearly see in IGV. However, Delly does not call these deletions in the respective samples. What might be the reason for this?

Tobias Rausch

Mar 8, 2023, 2:40:54 PM
to Viktoria Flore, delly-users
Dear Viktoria,

Looks to me like a processed pseudogene that is not in the reference, i.e. a so-called retroposed processed gene transcript. Here is a nice figure for that:

[figure not included]

If such a processed pseudogene is not in the reference, its reads map to the exons of the source gene, so you get higher coverage for those exons. The exons are connected via deletion-type SVs, and thus you get a lot of adjacent deletion SV calls spanning the introns of the source gene.

Best, Tobias
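
The pattern described in this reply, several adjacent deletion calls that tile the introns of one gene, can be looked for mechanically. Below is a minimal Python sketch, assuming the calls have already been reduced to (ID, POS, END) tuples on one chromosome. The helper `cluster_adjacent_calls` and the `max_gap` value are hypothetical choices for illustration and are not part of Delly.

```python
from typing import List, Tuple

Call = Tuple[str, int, int]  # (ID, POS, END) on a single chromosome


def cluster_adjacent_calls(calls: List[Call], max_gap: int = 1000) -> List[List[Call]]:
    """Group calls that overlap or lie within max_gap bp of each other.

    max_gap is an illustrative assumption (roughly "one exon apart").
    """
    clusters: List[List[Call]] = []
    cluster_end = None
    for call in sorted(calls, key=lambda c: c[1]):
        _, pos, end = call
        if cluster_end is not None and pos - cluster_end <= max_gap:
            # Close enough to the running cluster: extend it.
            clusters[-1].append(call)
            cluster_end = max(cluster_end, end)
        else:
            # Too far from the previous calls: start a new cluster.
            clusters.append([call])
            cluster_end = end
    return clusters


# POS/END values copied from the chromosome 2 records in this thread.
calls = [
    ("DEL00002896", 121200399, 121204332),
    ("DEL00002906", 121209355, 121215666),
    ("DEL00002910", 121216175, 121227644),
    ("DEL00002912", 121221254, 121227644),
    ("DEL00002921", 121244099, 121248162),
    ("DEL00002928", 121248293, 121253600),
    ("DEL00002932", 121252038, 121256522),
    ("DEL00002934", 121253733, 121259143),
]

for cluster in cluster_adjacent_calls(calls):
    ids = [c[0] for c in cluster]
    span_end = max(c[2] for c in cluster)
    print(f"{cluster[0][1]}-{span_end}: {len(cluster)} call(s) {ids}")
```

Clusters with several members are candidates to compare against a gene annotation: if the gaps between the calls coincide with exons of a single gene, the pseudogene explanation becomes more plausible than a set of independent deletions.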




Tobias Rausch

Mar 8, 2023, 2:47:25 PM
to Viktoria Flore, delly-users
Dear Viktoria,

Regarding the inducible knockout: the reference sequence usually lacks the genomic constructs used for the knockout, so paired-ends and single reads do not "reach" the deletion breakpoint in reference coordinates. Another reason could be more complex induced SVs that Delly splits into multiple primitive SV types. For large deletions, a copy-number caller like 'delly cnv' probably picks up such deletions because it looks at read depth instead of breakpoint-spanning reads.

Best, Tobias
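
To illustrate the read-depth view mentioned in this reply, the per-sample counts that Delly already writes into each record (RCL, RC, RCR) can be turned into a rough depth-based estimate. This is a minimal Python sketch. The formula 2 * RC / (RCL + RCR) is an assumption that happens to agree with the RDCN values in the records posted in this thread; it is not taken from the Delly documentation, and `parse_sample` and `depth_copy_estimate` are hypothetical helpers.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair the colon-separated FORMAT keys with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def depth_copy_estimate(sample: dict) -> float:
    """Rough copy-number estimate: reads in the SV vs. reads in both control regions.

    Assumption: RCL and RCR are the read counts of the left and right
    control regions, and a diploid region gives a value near 2.
    """
    flanks = int(sample["RCL"]) + int(sample["RCR"])
    if flanks == 0:
        raise ValueError("no reads in the control regions")
    return 2.0 * int(sample["RC"]) / flanks


FORMAT = "GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV"
# Colony genotype columns copied from two records in this thread.
imprecise = parse_sample(FORMAT, "0/1:-101.938,0,-221.938:10000:PASS:591:643:499:1:40:20:0:0")
precise = parse_sample(FORMAT, "1/1:-142.295,-11.434,0:114:PASS:5413:255:4448:0:0:0:0:38")

for name, sample in [("DEL00002896", imprecise), ("DEL00356954", precise)]:
    estimate = depth_copy_estimate(sample)
    print(name, sample["GT"], f"estimate={estimate:.2f}", f"RDCN={sample['RDCN']}")
```

Under this assumption a heterozygous deletion would be expected near 1 and a homozygous one near 0. Several of the chromosome 2 calls above report RDCN=2, which would mean the depth does not support them.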

Viktoria Flore

Mar 10, 2023, 8:56:08 AM
to delly-users
Dear Tobias,

Thanks so much for your quick answer. I did indeed find several variant calls in regions that seem to be pseudogenes, thanks for the hint!

For some of my samples with "suspicious" calls, I found another problem while digging further into them: I have several variants, sometimes in adjacent regions, that are called as heterozygous somatic deletions (imprecise calls), but they overlap precise homozygous calls that are found in both the colony and its germline control (generally spanning shorter sequences). The precise calls match exactly what I see in IGV (see example).
I saw some old threads on here with similar questions in which you recommended a Python filtering script to deal with redundant calls, but I can't seem to find the script anymore. What would be your recommendation for handling or merging these calls? Thanks again for your help!

Example:
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample_colony-t11_HSC-F2LL-BBS33-pipc11 sample_germline1_HSC-F2LL-BBS11-tail
7       35326203        DEL00356953     T       <DEL>   180     PASS    IMPRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=35326509;PE=3;MAPQ=60;CT=3to5;CIPOS=-50,50;CIEND=-50,50      GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/1:-30.8825,0,-60.8825:10000:PASS:7875:5014:6218:1:11:6:0:0    0/0:0,-5.1175,-102:51:PASS:6362:4001:5088:1:17:0:0:0
7       35326305        DEL00356954     T       <DEL>   1200    PASS    PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.5;END=35326524;PE=0;MAPQ=0;CT=3to5;CIPOS=-15,15;CIEND=-15,15;SRMAPQ=60;INSLEN=0;HOMLEN=15;SR=20;SRQ=0.978947;CONSENSUS=GCTTAGTTTACCTAACAAAATGTTCCCTGGTTCCATCCATACTGTTGAAAATGTCAAGATTCCTCATATAGCCCAGGCTGTCCTTGAACTCGTGACATTGGTGAGGATCACCTTGAACTCCTTATCAACCTGCCTCTACCTTCTAAGTGGTGCAACAAGATTTAAGGATTTCAATCTTTTTTTTTTTTCC;CE=1.94468     GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-142.295,-11.434,0:114:PASS:5413:255:4448:0:0:0:0:38        1/1:-130.894,-10.5301,0:105:PASS:4287:278:3672:0:0:0:0:35
Screenshot 2023-03-10 at 14.49.31.png
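
One way to flag such redundant calls is sketched here in Python. It is an assumption for illustration and not the filtering script mentioned above, which is not reproduced in this thread. The idea is to mark an imprecise somatic call as suspect when it reciprocally overlaps a precise call that is non-reference in the germline control. The 0.5 threshold and all names (`SvCall`, `reciprocal_overlap`, `shadowed_by_germline`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SvCall:
    sv_id: str
    pos: int
    end: int
    precise: bool
    control_gt: str  # genotype of the germline control sample


def reciprocal_overlap(a: SvCall, b: SvCall) -> float:
    """Smaller of the two overlap fractions (0.0 if the intervals are disjoint)."""
    shared = min(a.end, b.end) - max(a.pos, b.pos)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.pos), shared / (b.end - b.pos))


def shadowed_by_germline(call: SvCall, others: List[SvCall], min_overlap: float = 0.5) -> List[str]:
    """IDs of precise, germline-present calls that overlap an imprecise call."""
    if call.precise:
        return []
    return [
        o.sv_id
        for o in others
        if o.precise
        and o.control_gt not in ("0/0", "./.")
        and reciprocal_overlap(call, o) >= min_overlap
    ]


# Values copied from the chromosome 7 example in this thread.
somatic = SvCall("DEL00356953", 35326203, 35326509, precise=False, control_gt="0/0")
germline = SvCall("DEL00356954", 35326305, 35326524, precise=True, control_gt="1/1")

print(round(reciprocal_overlap(somatic, germline), 3))
print(shadowed_by_germline(somatic, [germline]))
```

For the example pair the shared interval is 204 bp, which is about two thirds of the imprecise call, so the imprecise "somatic" call would be flagged as shadowed by the germline deletion. Whether to drop or merge flagged calls is a separate decision that this sketch does not make.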