BEDTools midpoint command


ara...@bu.edu

Aug 17, 2015, 3:41:12 PM
to bedtools-discuss
Hi,

I was wondering if there is an easy, straightforward way to calculate the midpoint of each record in a BED file. Currently, I'm using a combination of shell and R scripts to do the following:

For each BED record:
Calculate the peak width (end - start)
(For peak widths that are odd numbers, I do peak_width + 1)
Calculate the midpoint: peak_midpoint <- Peak_Data$"start" + (Peak_Data$Peak_Width/2)
I then add a certain number of base pairs to the left and right of the midpoint location to get midpoint centered windows.

Obtaining these midpoint-centered windows is useful for various analyses, including motif enrichment/distribution, aggregate plots, etc.

Is there a more sophisticated way of doing these operations using BEDTools commands? Would it be useful to have a BEDTools command that does this (with various options)?

Thanks,
Andy

--
Andy Rampersaud
Graduate Student, Bioinformatics
Waxman Lab, Boston University

pierre khoueiry

Aug 18, 2015, 11:29:14 AM
to bedtools...@googlegroups.com
Does this satisfy your criteria? An awk one-liner:

$ cat a.bed
chr1    10      20      bed1
chr1    10      21      bed2

$ awk -v OFS="\t" -v EXT=4 '{width=$3-$2; if (width % 2 != 0) {width+=1}; mid=$2+width/2; print $1,mid-EXT,mid+EXT,$4}' a.bed
chr1    11      19      bed1
chr1    12      20      bed2

You can create a shell script and pass the BED file and extension size as arguments, et voilà.
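
A minimal sketch of such a wrapper script, following the same arithmetic as the one-liner above. The function name `midpoint_windows`, the default extension of 4, and the assumption of a 4-column tab-delimited BED file are illustrative choices, not something fixed in this thread:

```shell
#!/bin/sh
# midpoint_windows BED_FILE [EXT]
# Hypothetical helper: for each record of a 4-column BED file, print a
# window of EXT bases on either side of the record's midpoint.
midpoint_windows() {
    bed_file=$1
    ext=${2:-4}    # assumed default, taken from the example above
    awk -v OFS="\t" -v EXT="$ext" '{
        width = $3 - $2
        if (width % 2 != 0) width += 1   # round odd widths up to even
        mid = $2 + width / 2
        print $1, mid - EXT, mid + EXT, $4
    }' "$bed_file"
}

# Run the helper only when the script is called with arguments.
if [ "$#" -ge 1 ]; then
    midpoint_windows "$@"
fi
```

Saved as, say, `midpoint_windows.sh`, it could be run as `sh midpoint_windows.sh a.bed 4`. Note that it does not guard against negative start coordinates when a midpoint lies closer than EXT to the start of a chromosome.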
P.

--
======================
pierre khoueiry, Ph.D.
bioinformatics
Heidelberg, Germany
======================

ara...@bu.edu

Aug 20, 2015, 4:39:22 PM
to bedtools-discuss
Hi Pierre,

Thanks for the awk one-liner! The command does indeed produce the midpoint-centered regions for a BED file. It looks like I added 1 to the peak width to deal with a 1-based start coordinate, but other than that my scripts give the same result as the one-liner. For now I will keep using my scripts (they produce additional summaries that are useful for checking the width before and after), but I will definitely keep this kind of awk command in mind for doing operations concisely.
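
A width check of this kind can also be sketched in awk. The scripts mentioned above are not shown in this thread, so the function name `width_summary` and the choice of statistics (record count, minimum, maximum, and mean width) are assumptions made for illustration:

```shell
# width_summary BED_FILE
# Hypothetical helper: print the record count, minimum, maximum, and mean
# width (end - start) of a BED file as one tab-separated line.
width_summary() {
    awk -v OFS="\t" '{
        w = $3 - $2
        if (NR == 1 || w < min) min = w
        if (NR == 1 || w > max) max = w
        sum += w
    }
    END {
        # Print nothing for an empty file to avoid dividing by zero.
        if (NR > 0) print NR, min, max, sum / NR
    }' "$1"
}
```

Running it on the BED file before and after windowing gives a quick check that every output record has the expected width of twice the extension.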

Thanks for your help,
Andy