bedtools multiinter strange behavior (on sorted files)


shmuel...@gmail.com

Jun 16, 2020, 12:35:07 PM
to bedtools-discuss
Hi,

The following (sorted) input

> cat ttt1.bed
chr11   1       1000
X       1       1000

> cat ttt2.bed
chr12   2000    3000
X       500     600

produces no overlap.


> bedtools multiinter -i ttt1.bed ttt2.bed

chr11   1       1000    1       1       1       0
X       1       1000    1       1       1       0
chr12   2000    3000    1       2       0       1
X       500     600     1       2       0       1


Changing "X" to "chrX" gives the correct result. 

chr11   1       1000    1       1       1       0
chr12   2000    3000    1       2       0       1
chrX    1       500     1       1       1       0
chrX    500     600     2       1,2     1       1
chrX    600     1000    1       1       1       0

Aaron Quinlan

Jun 16, 2020, 8:15:46 PM
to bedtools...@googlegroups.com
Bedtools assumes that the chromosome column is sorted lexicographically (e.g., with UNIX `sort -k1,1`). Under this assumption, your input files are not sorted by chromosome: in a byte-wise lexicographic ordering, uppercase "X" sorts before lowercase "chr11" and "chr12".

You should be fine if you always do: `sort -k1,1 -k2,2n x.bed > x.sorted.bed` before using this tool.

Best,
Aaron

Sam

Jul 2, 2020, 10:09:07 AM
to bedtools-discuss
Thanks.

Actually, the files were sorted in the manner you suggested. However, they were sorted while the locale was en_US.UTF-8, which produced the order I posted.

Setting LC_ALL=C fixed the sorting.
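
For anyone hitting the same issue, here is a minimal sketch of the fix, assuming GNU or POSIX `sort` and tab-separated BED files. The `*.sorted.bed` output names are only illustrative; the inputs are the two files from the first post.

```shell
# Recreate the two inputs from this thread (tab-separated BED).
printf 'chr11\t1\t1000\nX\t1\t1000\n' > ttt1.bed
printf 'chr12\t2000\t3000\nX\t500\t600\n' > ttt2.bed

# Force byte-wise collation for the sort, whatever the session locale is.
# LC_ALL=C applies only to this one command, not to the rest of the shell.
for f in ttt1 ttt2; do
    LC_ALL=C sort -k1,1 -k2,2n "$f.bed" > "$f.sorted.bed"
done

# sort -c only checks the order; it exits non-zero if the file is not
# already sorted by the given keys.
LC_ALL=C sort -c -k1,1 -k2,2n ttt1.sorted.bed && echo "ttt1.sorted.bed is byte-wise sorted"

# The sorted files can then be passed to bedtools:
# bedtools multiinter -i ttt1.sorted.bed ttt2.sorted.bed
```

With byte-wise collation, "X" comes before "chr11" and "chr12" in both files, so the chromosome order is consistent with what bedtools expects.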

Best, 
Sam