Genotype with Delly2


Ana Pinharanda

Jan 26, 2016, 6:54:12 AM
to delly-users
Dear Tobias,

I want to use Delly to genotype an SV discovery file created by merging the Delly, Pindel and CNVnator outputs from several different samples (a pipeline similar to that of Zichner et al. 2013).

The SV discovery vcf looks like this:

##fileformat=VCFv4.1
##fileDate=20160125
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=TRA,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
##FILTER=<ID=LowQual,Description="PE/SR support below 3 or mapping quality below 20.">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
##INFO=<ID=INSLEN,Number=1,Type=Integer,Description="Predicted length of the insertion">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  H.m.cydno
Hmel201002      1027921 cydno_DUP1      N       <DUP>   .       PASS    IMPRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=1031652 GT      0/1

I then ran the following:

./delly \
-t DUP \
-g /path/to/genome/fasta \
-v [sv dup vcf] \
-o all_samples_w_list_mp_scf_90_all_pipeline.vcf \
[list_of_bams_to_genotype]

However, none of my samples is genotyped with enough confidence:

##bcftools_viewVersion=1.0-11-g243d691+htslib-1.0-1-g1f1e3f6
##bcftools_viewCommand=view all_samples_w_list_cp_scf_90_all_pipeline.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  c511.SRX398362.10113.Hmel2.rmdup.realign        c512.SRX398363.10113.Hmel2.rmdup.realign        c513_REP.SRX398364.10113.Hmel2.rm
Hmel201002      311415  DUP00000011     N       <DUP>   0       LowQual IMPRECISE;SVTYPE=DUP;SVMETHOD=EMBL.DELLYv0.7.2;CHR2=Hmel201002;END=311416;INSLEN=0;PE=0;MAPQ=0;CT=NtoN;CIPOS=0,0;CIEND=0,0

This happens even when I run the genotyper on the original Delly VCF output with BAMs from the samples in which Delly called the SVs.

Do you have any idea how I might solve this problem? Am I missing something in the input?
I originally called the SVs with the older version of Delly (0.6.1).


Thank you so much in advance,

Kind regards,

Ana 





Ana Pinharanda

Jan 26, 2016, 11:23:39 AM
to delly-users
Dear Tobias,

Just a quick update,

When I run the actual Delly output file through the genotyper, the error message is slightly different from what I described before.

Previously, I was using a file from which the variants supported by fewer than 3 reads had already been filtered out.
Running the actual unfiltered output instead, this is what I get:


alp66@butterfly--bio:/disk2/alp66/delly_v2_genome/marcus_melp_v2$ /whale-data/alp66/bin/svprops/src/svprops delly.dup.melp.marcus.sv.vcf.gz > sv_dup_del.tab

Segmentation fault (core dumped)

Ana Pinharanda

Jan 26, 2016, 11:40:53 AM
to delly-users
Dear Tobias,

Sorry for clogging the group with messages.
I have realised that my last post was about svprops, not genotyping.

Apologies for that. However, I also cannot seem to run svprops. I am not sure if you have any pointers on how I could solve the issue.

Thank you in advance for your time,

Best,

Ana



On Tuesday, 26 January 2016 at 11:54:12 UTC, Ana Pinharanda wrote:

Ana Pinharanda

Jan 26, 2016, 12:11:40 PM
to delly-users
PS.

 sampleprops works fine.


Thanks again,

Ana



--
You received this message because you are subscribed to the Google Groups "delly-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to delly-users...@googlegroups.com.
To post to this group, send email to delly...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Tobias Rausch

Jan 27, 2016, 9:23:04 AM
to Ana Pinharanda, delly-users
Dear Ana,

The INFO fields are only populated during SV discovery; re-genotyping is meant to populate the genotype fields for every sample that you included in your re-genotyping run. It looks like you created a consensus SV site list from Delly, Pindel and CNVnator, and indeed, if you keep the Delly format, Delly should be able to re-genotype this SV site list. Please post not columns 1-8 but the later columns, because that is where Delly's re-genotyping puts the support for every sample.
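As a minimal sketch (not part of Delly), the per-sample columns can be inspected by pairing the FORMAT keys in column 9 with each sample's colon-separated values. The helper name `per_sample_fields` is hypothetical, and the two-sample record below is made up for illustration; it does not come from this thread.

```python
def per_sample_fields(header_line, record_line):
    """Return {sample name: {FORMAT key: value}} for one tab-separated VCF data line."""
    sample_names = header_line.rstrip("\n").split("\t")[9:]
    cols = record_line.rstrip("\n").split("\t")
    if len(cols) < 10:
        # Sites-only record: there are no FORMAT/sample columns to read.
        return {}
    format_keys = cols[8].split(":")
    return {
        name: dict(zip(format_keys, value.split(":")))
        for name, value in zip(sample_names, cols[9:])
    }


# Made-up example with two samples (values are illustrative only).
header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "sampleA", "sampleB"])
record = "\t".join(["chr1", "1000", "DUP1", "N", "<DUP>", ".", "PASS",
                    "IMPRECISE;SVTYPE=DUP;END=2000", "GT:DR:DV",
                    "0/0:12:0", "0/1:7:5"])
print(per_sample_fields(header, record))
# {'sampleA': {'GT': '0/0', 'DR': '12', 'DV': '0'}, 'sampleB': {'GT': '0/1', 'DR': '7', 'DV': '5'}}
```

A sites-only record, like the merged list posted above, returns an empty dict, because there are no sample columns to inspect.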

Regarding svprops, this tool is still in development, but if you can share the re-genotyped VCF I can check why it is crashing. It is probably because you seem to have provided some default values in your merged site list that never occur in Delly, such as CT=NtoN for a duplication. But I am happy to check; I am glad to see people starting to use this tool to check their SVs.
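A minimal sketch of a sanity check for this kind of mismatch, assuming the CT (connection type) values that Delly 0.7.x writes per SV type: 3to5 for DEL, 5to3 for DUP, 3to3 or 5to5 for INV, NtoN for INS, and any of the four orientations for TRA. This mapping is an assumption based on Delly's source, not something stated in this thread, and `ct_mismatch` is a hypothetical helper.

```python
# Assumed CT values that Delly 0.7.x writes for each SVTYPE.
DELLY_CT = {
    "DEL": {"3to5"},
    "DUP": {"5to3"},
    "INV": {"3to3", "5to5"},
    "INS": {"NtoN"},
    "TRA": {"3to3", "5to5", "3to5", "5to3"},
}


def parse_info(info):
    """Split a VCF INFO string into a dict; flag entries map to True."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = True
    return fields


def ct_mismatch(info):
    """True if CT is present but is not a value Delly would write for this SVTYPE."""
    fields = parse_info(info)
    svtype, ct = fields.get("SVTYPE"), fields.get("CT")
    if ct is None or svtype not in DELLY_CT:
        return False  # nothing to compare against
    return ct not in DELLY_CT[svtype]


print(ct_mismatch("IMPRECISE;SVTYPE=DUP;CT=NtoN"))  # True
print(ct_mismatch("IMPRECISE;SVTYPE=DUP;CT=5to3"))  # False
```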

Best, Tobias



Ana Pinharanda

Jan 29, 2016, 11:33:43 AM
to Tobias Rausch, delly-users
Dear Tobias,

I have been doing some simple experiments based on your reply.

I came to the conclusion that if I use exactly Delly’s output to genotype and then the genotype vcf to run svprops, it doesn’t crash and everything looks fine.

This means the problem lies in the VCF I am using as input to the Delly genotyper, as you suggested.
Could you please tell me which fields the Delly genotyping module requires in order to work?

Because I merged predictions from different programs and then ran a local assembler to increase confidence in the SV breakpoints, I am missing a lot of information that is present in the “standard” Delly output…

This is what the vcf with the putative calls and breakpoints looks like:

##fileformat=VCFv4.1
##fileDate=20160125
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=TRA,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
##FILTER=<ID=LowQual,Description="PE support below 3 or mapping quality below 20.">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for END coordinate in case of a translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
Hmel201001      7416    melp_DUP1       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201001;END=7950
Hmel201002      121880  melp_DUP2       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=127904
Hmel201002      1514588 melp_DUP3       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=1516308
Hmel201002      1812501 melp_DUP4       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=1831898
Hmel201002      1816740 melp_DUP5       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=1829262
Hmel201002      1953623 melp_DUP6       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=1954612
Hmel201002      1959754 melp_DUP7       N       <DUP>   .       PASS    PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201002;END=1966198




Thank you so much in advance for your help,

All the best,

Ana

Ana Pinharanda

Feb 3, 2016, 12:45:27 PM
to delly-users, rausc...@gmail.com
Dear Tobias,

I was just wondering if you had time to look into my question.
Maybe I didn't explain it well, so here it is again:

What are the minimum fields required in a VCF used as input to Delly2 (re)genotyping?


Thank you so much in advance for your time,

All the best,

Ana



Tobias Rausch

Feb 4, 2016, 6:40:45 AM
to Ana Pinharanda, delly-users
The required INFO fields are:

For breakpoint precise SVs:
PRECISE, SVTYPE, END, PE, MAPQ, SR, SRQ, CONSENSUS, CIPOS, CIEND, CT, CHR2

For imprecise SVs:
IMPRECISE, SVTYPE, END, PE, MAPQ, CIPOS, CIEND, CT, CHR2
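The two lists above can be turned into a quick pre-flight check on a merged site list before re-genotyping. This is a minimal sketch: `missing_info_fields` is a hypothetical helper, it treats a record as precise only if it carries the PRECISE flag (otherwise it checks the imprecise list), and it only checks that the keys are present, not that their values are ones Delly would produce.

```python
REQUIRED_PRECISE = ["PRECISE", "SVTYPE", "END", "PE", "MAPQ", "SR", "SRQ",
                    "CONSENSUS", "CIPOS", "CIEND", "CT", "CHR2"]
REQUIRED_IMPRECISE = ["IMPRECISE", "SVTYPE", "END", "PE", "MAPQ",
                      "CIPOS", "CIEND", "CT", "CHR2"]


def missing_info_fields(info):
    """List the required INFO keys absent from one record's INFO string."""
    # Keys are the part before '=' (or the whole entry for flags like PRECISE).
    keys = {item.split("=", 1)[0] for item in info.split(";") if item}
    required = REQUIRED_PRECISE if "PRECISE" in keys else REQUIRED_IMPRECISE
    return [key for key in required if key not in keys]


# INFO column of melp_DUP1 from the site list posted earlier in this thread.
record_info = "PRECISE;SVTYPE=DUP;SVMETHOD=WholePipeline;CHR2=Hmel201001;END=7950"
print(missing_info_fields(record_info))
# ['PE', 'MAPQ', 'SR', 'SRQ', 'CONSENSUS', 'CIPOS', 'CIEND', 'CT']
```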

Best, Tobias





Ana Pinharanda

Feb 6, 2016, 1:17:28 PM
to delly-users
Thank you so much for the help Tobias - It is all working great now.

Best,

Ana

On Tuesday, 26 January 2016 at 11:54:12 UTC, Ana Pinharanda wrote:

Ana Pinharanda

Feb 8, 2016, 9:32:49 AM
to delly-users, Tobias Rausch
Dear Tobias,

Would it be possible for you to describe in a bit more detail the fields in the svprops statistics?
I am fairly sure about most of them, but it would be extremely helpful to know in greater detail what these are and how they are calculated:

$11 refratio
$12 altratio
$17 rdratio


My knowledge of C is virtually null so I couldn’t come to a conclusion by looking at the code.

Thank you again,

Best,

Ana
 

Tobias Rausch

Feb 15, 2016, 4:05:16 AM
to Ana Pinharanda, delly-users
Dear Ana,

refratio is the median support of the variant, using DV/(DR+DV) for imprecise SVs and RV/(RR+RV) for precise SVs, among all samples with a homozygous reference genotype. This should ideally be 0 (no support for the alternative allele) in hom. reference samples.

altratio is the same, but for all heterozygous carriers (GT=0/1). For a typical het. germline variant, this value should be ~0.5.
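A minimal sketch of how refratio and altratio could be computed from per-sample FORMAT values following the definitions above. Assumptions not confirmed for svprops: each sample is a dict of FORMAT key to string value, genotypes are compared as the unphased strings "0/0" and "0/1", and samples with no reference or supporting reads are skipped. `support_ratio` and `median_support` are hypothetical helpers, and the sample values are made up.

```python
from statistics import median


def support_ratio(sample, precise):
    """ALT support for one sample: RV/(RR+RV) if precise, else DV/(DR+DV)."""
    ref_key, alt_key = ("RR", "RV") if precise else ("DR", "DV")
    ref, alt = int(sample[ref_key]), int(sample[alt_key])
    total = ref + alt
    return alt / total if total > 0 else None  # assumption: skip zero-coverage samples


def median_support(samples, precise, genotype):
    """Median support ratio over samples with the given (unphased) genotype."""
    ratios = [support_ratio(s, precise) for s in samples if s["GT"] == genotype]
    ratios = [r for r in ratios if r is not None]
    return median(ratios) if ratios else None


# Made-up per-sample values for one imprecise SV.
samples = [
    {"GT": "0/0", "DR": "10", "DV": "0"},
    {"GT": "0/0", "DR": "9", "DV": "1"},
    {"GT": "0/0", "DR": "8", "DV": "0"},
    {"GT": "0/1", "DR": "5", "DV": "5"},
    {"GT": "0/1", "DR": "6", "DV": "4"},
]
refratio = median_support(samples, precise=False, genotype="0/0")  # 0.0
altratio = median_support(samples, precise=False, genotype="0/1")  # about 0.45
```

Using the median rather than the mean keeps a single noisy sample from shifting either ratio much.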

rdratio is the median read-depth ratio between heterozygous carriers and homozygous reference samples. This value should be ~0.5 for het. deletions and ~1.5 for het. duplications (which is also used in Delly's cnv filtering script).
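A minimal sketch of rdratio, assuming it is the median read count (taken here from a per-sample RC FORMAT value) among 0/1 carriers divided by the median among 0/0 samples. Whether svprops first normalises read counts for each sample's sequencing depth is not stated in this thread, so treat this as illustrative only; `rd_ratio` is a hypothetical helper and the read counts are made up.

```python
from statistics import median


def rd_ratio(samples):
    """Median RC of 0/1 carriers divided by median RC of 0/0 samples (assumed definition)."""
    het = [int(s["RC"]) for s in samples if s["GT"] == "0/1"]
    hom_ref = [int(s["RC"]) for s in samples if s["GT"] == "0/0"]
    if not het or not hom_ref or median(hom_ref) == 0:
        return None  # undefined without both groups and a non-zero reference depth
    return median(het) / median(hom_ref)


# Made-up read counts for one duplication.
samples = [
    {"GT": "0/0", "RC": "100"},
    {"GT": "0/0", "RC": "110"},
    {"GT": "0/0", "RC": "90"},
    {"GT": "0/1", "RC": "150"},
    {"GT": "0/1", "RC": "160"},
]
print(rd_ratio(samples))  # 1.55, close to the ~1.5 expected for a het. duplication
```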

Best, Tobias

