sortBed


Gaurav Singhal

Apr 4, 2011, 12:06:32 AM
to bedtools-discuss
Hi,

It seems like sortBed sorts the first col (chromosome#)
lexicographically. This means entries of chr1 are followed by chr10
and chr11 and so on. chr2 comes after chr19. Is there a special reason
to sort the first col lexicographically or is it a bug ?

thanks

Gaurav

cjav

Apr 13, 2011, 5:53:41 PM
to bedtools-discuss
I came here looking for an answer to this. Did you find a way to
correctly sort BED files with chromosome names like chr1, chr2,
chr10, chr11, etc.?

On Apr 4, 12:06 am, Gaurav Singhal <gaurav.singhal...@gmail.com>
wrote:
> Hi,
>
> It seems like sortBed sorts the first col (chromosome#)

liguo wang

Apr 13, 2011, 6:36:14 PM
to bedtools...@googlegroups.com
In most cases, you don't need to sort BED files that way. But if you really want to, below is a quick (and dirty) solution (on Linux or Mac):

1) perl -p -i.bak -e 's/^chr([1-9])\b/chr0$1/g' test.bed    # change chr1, chr2 into chr01, chr02, etc.
2) sort -k1,1 -k2,2n test.bed > test.sorted.bed    # sort
3) perl -p -i.bak -e 's/^chr0([1-9])\b/chr$1/g' test.sorted.bed    # change chr01, chr02 back into chr1, chr2, etc.

Files like "test.bed.bak" and "test.sorted.bed.bak" are backups of your original files. Just delete them if everything is OK.
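
The same padding trick can also be written as a single pipeline that leaves the input file untouched. This is only a sketch: the function name sort_bed_padded and the file names are placeholders, and it assumes a tab-delimited BED file and a sed that supports -E (GNU and BSD sed both do). LC_ALL=C is an added assumption to force plain byte-order comparison.

```shell
# Zero-pad single-digit chromosome names, sort, then strip the padding.
# Nothing is edited in place, so no .bak files are created.
sort_bed_padded() {
    sed -E 's/^chr([1-9])([^0-9]|$)/chr0\1\2/' "$@" \
      | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n \
      | sed -E 's/^chr0([1-9])/chr\1/'
}

# Example usage (file names are placeholders):
# sort_bed_padded test.bed > test.sorted.bed
```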


-Liguo Wang

Baylor College of Medicine

Aaron Quinlan

Apr 13, 2011, 6:47:10 PM
to bedtools...@googlegroups.com
Hi,

A few approaches are discussed in the thread below. Assaf Gordon proposes the simplest one, which is to use the new "sort -V" option in GNU sort.

http://groups.google.com/group/bedtools-discuss/browse_thread/thread/f4b4c03319b0de52?hl=en_US
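
For reference, a minimal sketch of the "sort -V" approach, assuming GNU sort from coreutils 7.0 or later and a tab-delimited BED file. The function name sort_bed_natural and the file names are placeholders, not part of BEDTools, and LC_ALL=C is an added assumption to keep the comparison locale-independent.

```shell
# Version-sort column 1 so that chr2 comes before chr10,
# then sort numerically by start position (column 2).
# Requires GNU sort; older BSD/macOS sort may not accept the V modifier.
sort_bed_natural() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1V -k2,2n "$@"
}

# Example usage (file names are placeholders):
# sort_bed_natural my.bed > my.sorted.bed
```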

Best,
Aaron

Carlos Javier Borroto

Apr 13, 2011, 9:10:07 PM
to bedtools...@googlegroups.com
Oh, thank you. I actually finally came up with a similar solution:
cat my.bed | sed -e 's/^chr//' | sort -k 1,1 -k2,2 -n | sed -e 's/^/chr/'

BTW I need this to build a bigBed file to load to UCSC Genome Browser
and they ask for the bigBed file to be sorted this way.

thanks,
--
Carlos Borroto
Baltimore, MD

Carlos Javier Borroto

Apr 13, 2011, 9:11:32 PM
to bedtools...@googlegroups.com
I'm on a Mac, and I guess I don't have the right version of sort, because
-k 1,1V doesn't work here.

Thanks,


--
Carlos Borroto
Baltimore, MD

Aaron Quinlan

Apr 13, 2011, 9:13:37 PM
to bedtools...@googlegroups.com
That's right, the "-V" option came in version 7.0 of GNU coreutils. However, one can download and install much newer versions of the GNU utilities here:

http://www.gnu.org/software/software.html

Best,
Aaron

Assaf Gordon

Apr 13, 2011, 9:22:09 PM
to bedtools...@googlegroups.com
If you're using Jim Kent's command line programs to generate a bigBed,
then you probably have "bedSort" too (note that it's Kent's "bedSort", not Aaron's "sortBed").
It should (I think) sort a BED file in a way that's appropriate for the bedToBigBed program.

-gordon

liguo wang

May 1, 2011, 1:58:50 PM
to bedtools...@googlegroups.com
Hi,
Someone may have already asked this question, but I wonder what the difference is between sortBed (BEDTools), bedSort (Kent's utilities), and the Linux built-in sort command. Which one is faster for big files?

Thanks,

-Liguo



Aaron Quinlan

May 1, 2011, 3:45:50 PM
to bedtools...@googlegroups.com
Hi Liguo,

sortBed and bedSort are conceptually the same --- they are both meant as convenience tools.  The best way to sort a BED file is with plain old GNU sort.  sortBed from BEDTools will probably use the most memory of the 3 and I would expect GNU sort to be the fastest and scale the best for larger files, especially if you provide ample memory via the "-S" option.  sortBed is likely the slowest of the 3.
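
As a rough illustration of the GNU sort route with "-S", here is a sketch that assumes a tab-delimited BED file. The function name sort_bed_large, the buffer size in the usage line, and the file names are placeholders; LC_ALL=C is an added assumption that forces byte-order comparison, so chr10 still sorts before chr2 here, just as with sortBed.

```shell
# Sort a BED file by chromosome (byte order), then numerically by start.
# The first argument is the in-memory buffer size handed to sort -S;
# the remaining arguments are the input files.
sort_bed_large() {
    buf=$1
    shift
    LC_ALL=C sort -S "$buf" -t "$(printf '\t')" -k1,1 -k2,2n "$@"
}

# Example usage (buffer size and file names are placeholders):
# sort_bed_large 2G big.bed > big.sorted.bed
```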

Best,
Aaron


liguo wang

May 1, 2011, 4:46:54 PM
to bedtools...@googlegroups.com
Very helpful. Thanks,

-Liguo


