bug with coordinates larger than one billion


olivi...@hotmail.fr

Dec 15, 2014, 4:26:38 AM
to bedtools...@googlegroups.com
Hello,

I used mergeBed with this file :
chr1 10 20
chr1 15 19

The output was OK :
chr1 10 20

But with values higher than 1 billion, mergeBed has problems:

>more b.bed
chr1 10 20
chr1 1000000000 2000000000
chr1 1500000000 1500000000

The mergeBed -i file.bed process never finishes, whereas clusterBed works fine:

>clusterBed -i b.bed
chr1 10 20 1
chr1 1000000000 2000000000 2
chr1 1500000000 1500000000 2

With intersectBed, I got an error too :
>cp b.bed b2.bed
>intersectBed -a b.bed -b b2.bed 
ERROR: Received illegal bin number 91552 from getBin call.
ERROR: Unable to add record to tree.

I used bedtools v2.20.1 and v2.22.0.
Do the tools store integers in different ways? Can you fix the problem?
Thanks in advance

Aaron Quinlan

Dec 18, 2014, 10:06:23 AM
to bedtools...@googlegroups.com, Kindlon, Neil Edward (nek3d)
Hi Olivier,

Most of the tools in bedtools use 32-bit unsigned integers to represent chromosome coordinates, since most genomes do not have gigabase-sized chromosomes. What species are you working with?

John Marshall

Dec 18, 2014, 12:55:56 PM
to bedtools...@googlegroups.com
On 15 Dec 2014, at 09:26, olivi...@hotmail.fr wrote:
> But with values higher than 1 billion, mergeBed had got problems:
>
> >more b.bed
> chr1 10 20
> chr1 1000000000 2000000000
> chr1 1500000000 1500000000
[...]
> >intersectBed -a b.bed -b b2.bed
> ERROR: Received illegal bin number 91552 from getBin call.
> ERROR: Unable to add record to tree.

getBin() computes the .bai BAM index bin number, which is what limits chromosomes in .bai indices to 2^29 (~0.5 billion) bases. So the error here is because your region's coordinates are larger than that.
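
For reference, the binning scheme described here can be sketched as follows. This is a Python transcription of the reg2bin function given in the SAM specification, not bedtools' own getBin() code, whose bin numbering may differ. The names BAI_LIMIT, MAX_BAI_BIN and in_bai_range are chosen here for illustration only.

```python
# Sketch of the .bai binning scheme (after reg2bin in the SAM specification).
# Not bedtools' actual getBin(); constant and helper names are illustrative.

BAI_LIMIT = 1 << 29   # 536,870,912: largest end coordinate the scheme covers
MAX_BAI_BIN = 37448   # bins are numbered 0..37448 (37,449 bins in total)


def reg2bin(beg: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return 4681 + (beg >> 14)  # 16 kb bins
    if beg >> 17 == end >> 17:
        return 585 + (beg >> 17)   # 128 kb bins
    if beg >> 20 == end >> 20:
        return 73 + (beg >> 20)    # 1 Mb bins
    if beg >> 23 == end >> 23:
        return 9 + (beg >> 23)     # 8 Mb bins
    if beg >> 26 == end >> 26:
        return 1 + (beg >> 26)     # 64 Mb bins
    return 0                       # single 512 Mb bin at the top


def in_bai_range(end: int) -> bool:
    """True if an interval ending at `end` lies inside the range the scheme covers."""
    return end <= BAI_LIMIT


print(reg2bin(10, 20))                        # small interval, finest level
print(reg2bin(1_500_000_000, 1_500_000_001))  # beyond 2^29: exceeds MAX_BAI_BIN
```

With this numbering, reg2bin(10, 20) gives 4681, while a 1 bp interval at 1,500,000,000 gives 96233, which is past the last valid bin. Note also that 1500000000 >> 14 is 91552, the value in the error message above. That suggests the reported number is the 16 kb window index without a level offset, but this is an inference from the arithmetic, not something confirmed in this thread.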

In samtools we're moving to a generalisation of the same binning algorithm in large part to support chromosomes bigger than half a billion [1].

I guess bedtools is using the binning to speed up the intersect here. If just the < 37450 test were relaxed, I think this would work -- except that the binning would provide no speedup for regions beyond 2^29 as all intervals beyond there would lie in the smallest bottom-row bins.

To fix this, the binning algorithm used could be changed to a CSI-style one with parameters that allow it to be useful for the whole range of unsigned 32-bit chromosome lengths and still work well for shorter human-sized (2^29) chromosomes. Or the parameters could be chosen adaptively depending on each chromosome's length. This is a somewhat non-trivial engineering problem that we're still grappling with in htslib and samtools too...
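
A CSI-style generalisation of the same calculation might look like the sketch below, transcribed into Python from the reg2bin(beg, end, min_shift, depth) function in the SAM specification. The setting min_shift=14, depth=6 is only an illustration of one choice that spans the unsigned 32-bit range; it is not a value proposed in this thread, and the helper names csi_max_coord and csi_bin_count are made up here.

```python
def csi_reg2bin(beg: int, end: int, min_shift: int = 14, depth: int = 5) -> int:
    """Bin for the 0-based, half-open [beg, end) with a configurable hierarchy.

    min_shift is log2 of the smallest bin size; depth is the number of
    levels below the single top-level bin.
    """
    end -= 1
    shift = min_shift
    level = depth
    offset = ((1 << (depth * 3)) - 1) // 7  # id of the first bin on the finest level
    while level > 0:
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
        level -= 1
        shift += 3                   # each coarser level has 8x larger bins
        offset -= 1 << (level * 3)   # step back to the first bin of that level
    return 0


def csi_max_coord(min_shift: int = 14, depth: int = 5) -> int:
    """Largest end coordinate covered by the hierarchy."""
    return 1 << (min_shift + depth * 3)


def csi_bin_count(depth: int = 5) -> int:
    """Total number of bins across all levels."""
    return ((1 << ((depth + 1) * 3)) - 1) // 7


print(csi_max_coord(14, 5), csi_bin_count(5))  # the classic .bai parameters
print(csi_max_coord(14, 6), csi_bin_count(6))  # one extra level
```

With the defaults this reproduces the .bai numbering (2^29 coordinates, 37,449 bins). One extra level extends the range to 2^32 at the cost of roughly eight times as many bins (299,593), which is the trade-off behind choosing the parameters per chromosome.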

Cheers,

John

[1] See the references to CSI in SAMv1.pdf §5.


Aaron Quinlan

Dec 18, 2014, 12:58:24 PM
to bedtools...@googlegroups.com
Hi John,

> getBin() computes the .bai BAM index bin number, which is what limits chromosomes in .bai indices to 2^29 (~0.5 billion) bases. So the error here is because your region's coordinates are larger than that.

Yes that is partly it, but that is only for the tools that use the binning algorithm.  Other tools strictly limit the positions to 9 digits.
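
The three limits mentioned so far (the 2^29 binning range, 9-digit positions, and unsigned 32-bit storage) can be checked against a BED file before running anything. The sketch below is a standalone Python helper, not part of bedtools. The function names and labels are made up for illustration, it assumes plain BED lines with no header, and which limit applies to which tool is not spelled out in this thread.

```python
# Hypothetical pre-flight check; limits are taken from the discussion above.
LIMITS = {
    "binning range (2^29)": 1 << 29,
    "9-digit positions": 999_999_999,
    "unsigned 32-bit": (1 << 32) - 1,
}


def exceeded_limits(coord):
    """Return the labels of all limits that `coord` is larger than."""
    return [name for name, limit in LIMITS.items() if coord > limit]


def check_bed_lines(lines):
    """Return (chrom, start, end, exceeded) for every record past some limit."""
    report = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        hit = exceeded_limits(max(start, end))
        if hit:
            report.append((chrom, start, end, hit))
    return report


b_bed = [
    "chr1 10 20",
    "chr1 1000000000 2000000000",
    "chr1 1500000000 1500000000",
]
for record in check_bed_lines(b_bed):
    print(record)
```

For the b.bed example from the first message, the two large records exceed both the binning range and the 9-digit limit, but still fit in an unsigned 32-bit integer.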

> I guess bedtools is using the binning to speed up the intersect here. If just the < 37450 test were relaxed, I think this would work -- except that the binning would provide no speedup for regions beyond 2^29 as all intervals beyond there would lie in the smallest bottom-row bins.
>
> To fix this, the binning algorithm used could be changed to a CSI-style one with parameters that allow it to be useful for the whole range of unsigned 32-bit chromosome lengths and still work well for shorter human-sized (2^29) chromosomes. Or the parameters could be chosen adaptively depending on each chromosome's length. This is a somewhat non-trivial engineering problem that we're still grappling with in htslib and samtools too…

I agree and this is something we have simply just “punted” on for the time being!


- Aaron

senas...@gmail.com

Aug 5, 2015, 11:18:12 AM
to bedtools-discuss
Hi, 

I would also like to report the same errors here. I am trying to intersect opossum CTCF sites against repeats and get the same errors.

../bedtools2/bin/bedtools intersect  -a CTCF_opossum.bed  -b monDom5.rmsk  -wo 

ERROR: Received illegal bin number 37450 from getBin call.
ERROR: Unable to add record to tree.


I tried both :

bedtools v2.24.0

bedtools v2.21.0


The opossum genome has large chromosomes (>0.5 Billion), with chr1 at 0.7 billion. 

mysql  --user=genome --host=genome-mysql.cse.ucsc.edu -A -D monDom5 -e 'select chrom,size from chromInfo' > monDom5.genome


chrom   size
chr1    748055161
chr2    541556283
chr3    527952102
chr4    435153693
chr8    312544902
chr5    304825324
chr6    292091736
chr7    260857928
chrUn   103241611
chrX    79335909
chrM    17079
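
A quick way to see which chromosomes in a table like this are past the 2^29 binning range discussed earlier in the thread is sketched below. It is a standalone Python helper, not a bedtools feature. The function name is hypothetical, and it assumes the two-column chrom/size layout (with a header line) that the mysql query above produces.

```python
BINNING_LIMIT = 1 << 29  # 536,870,912 bp


def long_chromosomes(genome_lines, limit=BINNING_LIMIT):
    """Return (chrom, size) for every chromosome longer than `limit`."""
    result = []
    for line in genome_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # skip blank lines and the "chrom size" header
        size = int(fields[1])
        if size > limit:
            result.append((fields[0], size))
    return result


mon_dom5 = """chrom   size
chr1    748055161
chr2    541556283
chr3    527952102
chr4    435153693
chr8    312544902
chr5    304825324
chr6    292091736
chr7    260857928
chrUn   103241611
chrX    79335909
chrM    17079""".splitlines()

for chrom, size in long_chromosomes(mon_dom5):
    print(chrom, size)
```

For monDom5 only chr1 and chr2 are longer than 2^29; chr3 (527,952,102 bp) falls just under it.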


This is just to report a similar error (hopefully it only affects opossum for now).


Thanks, 


Ben

jjenn...@gmail.com

Feb 3, 2017, 7:41:08 AM
to bedtools-discuss, senas...@gmail.com
Hi Ben,

I am running into this same issue and am curious how you got around it?

Thanks!

Jenny

rol...@ebi.ac.uk

Sep 28, 2017, 11:27:09 AM
to bedtools-discuss
Hi all,

I just ran into the same problem with, again, opossum. As more genomes are sequenced, more species are likely to be affected, especially marsupials.

For now I'm getting around it by manually splitting the chromosomes before the bedtools analysis and then stitching them back together afterwards. Not ideal, but it does the job.
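
One way such a split-and-stitch step could be scripted is sketched below in Python. This is not the exact procedure used here: the chunk size of 2^29, the "__part" naming scheme, and both function names are assumptions made for illustration.

```python
CHUNK = 1 << 29  # 536,870,912 bp; assumed chunk size, matching the binning limit


def split_record(chrom, start, end, chunk=CHUNK):
    """Split one BED interval into pieces that each fit inside a single chunk.

    Returns a list of (chunk_name, local_start, local_end) tuples, where the
    local coordinates are relative to the start of the chunk.
    """
    pieces = []
    pos = start
    while True:
        index = pos // chunk
        offset = index * chunk
        piece_end = min(end, offset + chunk)
        pieces.append((f"{chrom}__part{index}", pos - offset, piece_end - offset))
        if piece_end >= end:
            return pieces
        pos = piece_end


def stitch_record(chunk_name, local_start, local_end, chunk=CHUNK):
    """Map a chunk-local interval back to the original chromosome coordinates."""
    chrom, index = chunk_name.rsplit("__part", 1)
    offset = int(index) * chunk
    return chrom, local_start + offset, local_end + offset


pieces = split_record("chr1", 613_580_800, 613_580_900)
print(pieces)
print([stitch_record(*piece) for piece in pieces])
```

Both input files would need to be split with the same chunk size before running bedtools, and the output converted back with stitch_record. An interval that crosses a chunk boundary becomes two records, so results near a boundary may need to be merged again after stitching.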

I would love to see bedtools start to handle long chromosomes.

Best,

Maša

col...@gmail.com

Feb 21, 2018, 11:21:22 AM
to bedtools-discuss
Hi everyone

Same issue with bedtools intersect on the wheat genome (several chromosomes above 700 Mb and a couple above 800 Mb).

Cheers

iqraish...@gmail.com

May 21, 2019, 10:51:03 AM
to bedtools-discuss
Hi everyone, I am also facing the same error with bedtools intersect (wheat genome):
ERROR: Received illegal bin number 37458 from getBin call.
ERROR: Unable to add record to tree.