Re: [bedtools-discuss] intersectbed with vcf structural variants


Aaron Quinlan

May 1, 2013, 9:34:25 AM
to bedtools...@googlegroups.com
Hi Jonathan,

(1) Can you confirm that intersectbed will work for an arbitrary length (potentially very large deletions) using the "short deletion" format?  

That is correct, this is not yet supported.

(2) Also, does intersectbed simply calculate the length of the deletion from the alleles in the REF and ALT column and ignore the SVLEN, END and other info tags?

Currently, yes, though it shouldn't be hard to improve.  We have been holding off on this because we are building a new abstract system for supporting all of the different genomics file formats.  That said, this is simple enough that it might be worth including before the system I just mentioned is available.

(3) Is there any plan to support the alternate form of deletions in the future?
Yep, definitely on the list.
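
Until the END tag is read natively, one possible workaround is to turn each symbolic <DEL> record into a plain BED interval and intersect against that instead. The following is only a minimal sketch and not part of bedtools: the function name symbolic_del_to_bed is hypothetical, and it assumes POS is the base immediately before the deleted sequence, so the deleted bases run from POS+1 to END (1-based) and map to the 0-based, half-open BED interval [POS, END).

```python
def symbolic_del_to_bed(vcf_line):
    """Return (chrom, start, end) for a symbolic <DEL> record, else None.

    Assumption: POS is the base just before the deletion, so the deleted
    bases POS+1..END (1-based) become the BED interval [POS, END).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, alt, info = fields[0], int(fields[1]), fields[4], fields[7]
    if alt != "<DEL>":
        return None  # not a symbolic deletion; leave it to the normal path

    # Parse the INFO column into a dict; flags such as IMPRECISE get "".
    tags = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        tags[key] = value

    if "END" not in tags:
        return None  # no END tag, so the interval cannot be derived here
    return chrom, pos, int(tags["END"])


# The record from the example in this thread, rebuilt with tab separators.
record = "\t".join([
    "2", "321682", ".", "T", "<DEL>", "6", "PASS",
    "IMPRECISE;SVTYPE=DEL;END=321887;SVLEN=-205;CIPOS=-56,20;CIEND=-10,62",
    "GT:GQ", "0/1:12",
])
chrom, start, end = symbolic_del_to_bed(record)
print(chrom, start, end, sep="\t")
```

For the example record the resulting interval spans 205 bases, which agrees with SVLEN=-205. Writing these intervals to a BED file would allow them to be passed to intersectBed in place of the VCF.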


On Apr 30, 2013, at 10:12 PM, jonathan...@gmail.com wrote:

Hi Aaron,

I have a VCF that uses the structural variant format to describe a simple, albeit large, deletion.  The example below is taken directly from the 1000 Genomes VCFv4.1 specification:

cat example.vcf
##fileformat=VCFv4.1
##fileDate=20100501
##reference=1000GenomesPilot-NCBI36
##assembly=ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/sv/breakpoint_assemblies.fasta
##INFO=<ID=BKPTID,Number=.,Type=String,Description="ID of the assembled alternate allele in the assembly file">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END for imprecise variants">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS for imprecise variants">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=DEL,Description="Deletion">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype quality">
#CHROM  POS ID REF ALT           QUAL  FILTER  INFO                                                                                          FORMAT        NA00001
2 321682    .  T   <DEL>         6     PASS    IMPRECISE;SVTYPE=DEL;END=321887;SVLEN=-205;CIPOS=-56,20;CIEND=-10,62                          GT:GQ         0/1:12

cat example.bed
2    321685    321690

intersectBed -a example.bed -b example.vcf -f 1.0 -wa -wb

[ returns nothing ]

It appears that bedtools does not recognize this alternative means of describing a deletion (i.e., with the ##ALT=<ID=DEL,Description="Deletion"> header and <DEL> in the alternate allele column).  I tried simply replacing the REF column with the 206 bp sequence of bases and the ALT column with a single base, per the typical short deletion format, and all seemed to work well.

Three questions:
(1) Can you confirm that intersectbed will work for an arbitrary length (potentially very large deletions) using the "short deletion" format? 

(2) Also, does intersectbed simply calculate the length of the deletion from the alleles in the REF and ALT column and ignore the SVLEN, END and other info tags?

(3) Is there any plan to support the alternate form of deletions in the future?

Thanks in advance.

Jonathan



Aaron Quinlan

May 1, 2013, 8:22:17 PM
to bedtools...@googlegroups.com, jonathan...@gmail.com
Yep, it should indeed.







On May 1, 2013, at 10:16 AM, jonathan...@gmail.com wrote:

Just need a clarification on question (1):

My plan is to convert all deletions, including megabase-sized ones, to the standard "small deletion" format.  What I want to do is identify all features (in a .bed file) that are within the deleted regions.  This should work with the current intersectbed, right?

Jonathan
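
As an illustration of what "features within the deleted regions" means in terms of coordinates, here is a minimal Python sketch of the containment test. It is not how bedtools is implemented; the function name features_within_deletions is hypothetical, and both inputs are assumed to be lists of (chrom, start, end) tuples in 0-based, half-open BED coordinates. A feature is kept only if it lies entirely inside at least one deletion, which is the intent behind running intersectBed with -f 1.0.

```python
def features_within_deletions(features, deletions):
    """Return the features that lie entirely inside at least one deletion.

    Both arguments are lists of (chrom, start, end) tuples in 0-based,
    half-open coordinates (an assumption of this sketch).
    """
    # Group deletion intervals by chromosome for quick lookup.
    by_chrom = {}
    for chrom, start, end in deletions:
        by_chrom.setdefault(chrom, []).append((start, end))

    contained = []
    for chrom, start, end in features:
        for del_start, del_end in by_chrom.get(chrom, []):
            # Full containment: the feature may not extend past either edge.
            if del_start <= start and end <= del_end:
                contained.append((chrom, start, end))
                break
    return contained


# The BED feature from this thread, plus one that only partly overlaps.
features = [("2", 321685, 321690), ("2", 321880, 321900)]
deletions = [("2", 321682, 321887)]
print(features_within_deletions(features, deletions))
```

The nested loop is fine for small inputs; for many megabase-sized deletions and large feature sets, sorted inputs or an interval index would scale better.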

Albert Vilella

Jul 31, 2013, 6:58:16 AM
to bedtools...@googlegroups.com, jonathan...@gmail.com, Aaron Quinlan
Hi,

Is it still not possible to compare VCF files for structural variants?

A.