Re: Digest for bedops-discuss@googlegroups.com - 4 updates in 1 topic

Shane Neph

Sep 21, 2015, 9:05:00 PM
to bedops-...@googlegroups.com
How about:
bedmap --echo --echo-map regions.bed main.bed

For each row of regions.bed, you will get a semicolon-separated list of the elements from main.bed that overlap it.  Add --skip-unmapped if you don't want any output for an element of regions.bed that has no overlapping element in main.bed.  Here, I included --echo, which outputs the element from regions.bed too.
Example output:

chr11 13202 14981 2|chr11 13302 13303 1;chr11 13327 13328 1;chr11 13980 13981 1
chr11 13980 41480 2|chr11 13980 13981 1;chr11 30923 30924 1
chr11 30923 51480 2|chr11 30923 30924 1;chr11 51476 51477 1;chr11 51479 51480 1

You can use --delim to change the '|' or --multidelim to change the ';'.  You would likely follow this with cut or awk statements.  I'd argue that you should try to keep everything in one file rather than many: it requires a bit more coding but scales much better if regions.bed has a lot of entries.
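
As a rough illustration of such a follow-on awk step, the sketch below parses the default '|' and ';' delimiters and reports how many main.bed elements overlap each region. It is only a sketch: the sample lines are copied from the example output above instead of being produced by bedmap, and the variable names (bedmap_output, counts, hits) are made up for this example.

```shell
# Sample lines copied from the example output above (space-separated here;
# real BED files are normally tab-delimited).
bedmap_output='chr11 13202 14981 2|chr11 13302 13303 1;chr11 13327 13328 1;chr11 13980 13981 1
chr11 13980 41480 2|chr11 13980 13981 1;chr11 30923 30924 1
chr11 30923 51480 2|chr11 30923 30924 1;chr11 51476 51477 1;chr11 51479 51480 1'

# Split each line on '|' into the region ($1) and the mapped list ($2),
# then split the mapped list on ';' and print the region with its count.
counts=$(printf '%s\n' "$bedmap_output" \
  | awk -F'|' '{ n = split($2, hits, ";"); print $1 "\t" n }')

printf '%s\n' "$counts"
```

With real data, the same awk command would read directly from the bedmap pipe in place of the hard-coded sample.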

Shane

On Mon, Sep 21, 2015 at 4:13 PM, <bedops-...@googlegroups.com> wrote:
francy....@gmail.com: Sep 20 08:40PM -0700

Dear Experts,
 
I have a sorted large BED file (main.bed) and several regions defined in a
file (regions.bed) which can be overlapping with each other. I am trying to
find a quick way to output the overlaps from the main BED file with the
regions defined in regions.bed, and create a separate BED output file for
each of these regions.
 
"bedops --everything" seems like it might work, but is there a way to output
the intersects between main.bed and regions.bed to a different file
for each line of regions.bed?
 
Thank you very much for any suggestions,
Fra
Alex Reynolds <alexpr...@gmail.com>: Sep 20 10:50PM -0700

You could pipe the results of BEDOPS *bedmap* to *split*:
 
$ bedmap --echo --echo-map regions.bed main.bed | split -l 1 -a 5 - bedmap_result_
 
For each line in *regions.bed*, you would get files called
*bedmap_result_aaaaa*, *bedmap_result_aaaab*, and so on.
 
See *man split* for more information on the *-l* and *-a* options used in
this example.
 
francesca casalino <francy....@gmail.com>: Sep 21 09:04AM -0400

Thank you Alex!
 
But this is not quite what I was looking for: it gives me many
output files, each with 1 line indicating the overlap between the two BED files.
What I was hoping to get, in each output file, is the subset of
elements in main.bed that fall within each region defined in
regions.bed. So I am looking to obtain one output file per region,
which should contain many entries, since main.bed elements are 1 base pair wide
while regions span many base pairs. This also means that several entries
will be duplicated across output files, since the
regions can overlap... Is this possible to do?
 
One example is:
 
cat main.bed
chr11 13302 13303 1
chr11 13327 13328 1
chr11 13980 13981 1
chr11 30923 30924 1
chr11 51476 51477 1
chr11 51479 51480 1
 
cat regions.bed:
chr11 13202 14981 2
chr11 13980 41480 2
chr11 30923 51480 2
 
And the output should be:
 
cat res.region1
chr11 13302 13303 1
chr11 13327 13328 1
chr11 13980 13981 1
 
cat res.region2
chr11 13980 13981 1
chr11 30923 30924 1
 
cat res.region3
chr11 30923 30924 1
chr11 51476 51477 1
chr11 51479 51480 1
 

Shane Neph

Sep 22, 2015, 6:53:34 AM
to bedops-...@googlegroups.com
Consider:
bedmap --echo-ref-name --echo-map regions.bed main.bed \
  | awk -F"|" 'BEGIN { OFS="\t" } { n = split($2, arr, ";"); for (i = 1; i <= n; ++i) print arr[i], $1 }' \
  | sort-bed -

The output is all in one file, and each entry from main.bed (repeated once per overlapping region) carries an extra column showing which element of regions.bed it belongs to.

Example output:
chr11   13302   13303   1   chr11:13202-14981
chr11   13327   13328   1   chr11:13202-14981
chr11   13980   13981   1   chr11:13202-14981
chr11   13980   13981   1   chr11:13980-41480
chr11   30923   30924   1   chr11:13980-41480
chr11   30923   30924   1   chr11:30923-51480
chr11   51476   51477   1   chr11:30923-51480
chr11   51479   51480   1   chr11:30923-51480
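
If separate per-region files are still wanted afterwards, one possible follow-up is to split this single annotated file on its last column. The sketch below is only an illustration under a few assumptions: the input is tab-delimited with the region label in column 5, the example rows are hard-coded instead of coming from bedmap, and the scratch directory, the annotated.bed name, and the res.<label>.bed naming scheme are made up for this example.

```shell
# Work in a scratch directory so the sketch does not clutter the current one.
workdir=$(mktemp -d)

# The example output from above, written tab-delimited; printf reuses the
# format string for every group of five arguments.
printf '%s\t%s\t%s\t%s\t%s\n' \
  chr11 13302 13303 1 chr11:13202-14981 \
  chr11 13327 13328 1 chr11:13202-14981 \
  chr11 13980 13981 1 chr11:13202-14981 \
  chr11 13980 13981 1 chr11:13980-41480 \
  chr11 30923 30924 1 chr11:13980-41480 \
  chr11 30923 30924 1 chr11:30923-51480 \
  chr11 51476 51477 1 chr11:30923-51480 \
  chr11 51479 51480 1 chr11:30923-51480 \
  > "$workdir/annotated.bed"

# Append each row (main.bed columns only) to a file named after its region.
awk -F'\t' -v OFS='\t' -v dir="$workdir" '{
    name = $5
    sub(/:/, "_", name)             # chr11:13202-14981 -> chr11_13202-14981
    out = dir "/res." name ".bed"   # hypothetical naming scheme
    print $1, $2, $3, $4 >> out
    close(out)                      # stay under the open-file limit
}' "$workdir/annotated.bed"

ls "$workdir"
```

Closing the file after every row is slow but keeps awk from running out of file descriptors when regions.bed has many entries; because it appends, it should be run in an empty directory.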

Shane
