output overlap for each region in a file

francy....@gmail.com

Sep 20, 2015, 11:40:45 PM
to bedops-discuss
Dear Experts,

I have a large sorted BED file (main.bed) and several regions defined in a second file (regions.bed); the regions may overlap each other. I am looking for a quick way to find the elements of main.bed that overlap each region in regions.bed, and to write a separate BED output file for each region.

"bedops --everything" looks like it might work, but is there a way to write the intersections between main.bed and regions.bed to a different file for each line of regions.bed?

Thank you very much for any suggestions,
Fra

Alex Reynolds

Sep 21, 2015, 1:50:33 AM
to francy....@gmail.com, bedops-discuss

You could pipe the results of BEDOPS bedmap to split:

$ bedmap --echo --echo-map regions.bed main.bed | split -l 1 -a 5 - bedmap_result_

For each line in regions.bed, you would get files called bedmap_result_aaaaa, bedmap_result_aaaab, and so on.

See man split for more information on the -l and -a options used in this example.
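
To make the naming scheme concrete, here is a minimal sketch of what those split options do, independent of BEDOPS. The three input lines and the demo_result_ prefix are placeholders chosen for illustration; they stand in for bedmap output and are not part of the command above.

```shell
# Three placeholder lines stand in for the bedmap output.
# -l 1 : start a new output file after every input line
# -a 5 : use five-letter suffixes (aaaaa, aaaab, ...)
# -    : read from standard input
printf 'line1\nline2\nline3\n' | split -l 1 -a 5 - demo_result_

# Creates demo_result_aaaaa, demo_result_aaaab and demo_result_aaaac,
# each holding exactly one of the input lines.
ls demo_result_*
```

With real bedmap output, each of these files would hold one line: a region followed by all of its overlapping elements.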

francesca casalino

Sep 21, 2015, 9:04:02 AM
to Alex Reynolds, bedops-discuss

This is not quite what I was looking for, though. It gives me many output files, each with one line describing the overlap between the two BED files. What I was hoping to get, in each output file, is the subset of main.bed elements that fall within one region defined in regions.bed. So I am looking for one output file per region, each containing many entries, since the main.bed elements are 1 bp long while the regions span many base pairs. It also means that some entries will appear in more than one output file, since the regions can overlap. Is this possible to do?

One example is:

cat main.bed

    chr11    13302    13303    1
    chr11    13327    13328    1
    chr11    13980    13981    1
    chr11    30923    30924    1
    chr11    51476    51477    1
    chr11    51479    51480    1

cat regions.bed

    chr11    13202    14981    2
    chr11    13980    41480    2
    chr11    30923    51480    2

And the output should be:

cat res.region1

    chr11    13302    13303    1
    chr11    13327    13328    1
    chr11    13980    13981    1

cat res.region2

    chr11    13980    13981    1
    chr11    30923    30924    1

cat res.region3

    chr11    30923    30924    1
    chr11    51476    51477    1
    chr11    51479    51480    1
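
The per-region output described in this example can be sketched with a plain shell loop and awk. This is an illustration under stated assumptions, not a method proposed in the thread: it assumes tab-delimited input, treats "overlap" as sharing at least one base (half-open BED coordinates), and names the outputs res.region1, res.region2, ... by line number in regions.bed. It rescans main.bed once per region, so it may be slow for very large inputs.

```shell
# Recreate the example inputs (tab-delimited); printf reuses the
# format string for every group of four arguments.
printf '%s\t%s\t%s\t%s\n' \
    chr11 13302 13303 1 \
    chr11 13327 13328 1 \
    chr11 13980 13981 1 \
    chr11 30923 30924 1 \
    chr11 51476 51477 1 \
    chr11 51479 51480 1 > main.bed
printf '%s\t%s\t%s\t%s\n' \
    chr11 13202 14981 2 \
    chr11 13980 41480 2 \
    chr11 30923 51480 2 > regions.bed

# For line n of regions.bed, write every main.bed element that shares
# at least one base with that region to res.region<n>.
n=0
while IFS="$(printf '\t')" read -r chrom start end rest; do
    n=$((n + 1))
    awk -v c="$chrom" -v s="$start" -v e="$end" '
        BEGIN { FS = OFS = "\t" }
        # Same chromosome, and the half-open intervals intersect.
        $1 == c && $2 + 0 < e + 0 && $3 + 0 > s + 0
    ' main.bed > "res.region$n"
done < regions.bed
```

On the example data this yields three files whose contents match res.region1 to res.region3 above. With BEDOPS installed, the awk filter could presumably be replaced by a bedops --element-of 1 call against a single-region input, which would take advantage of the sorted order; that variant is untested here.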
