Problematic input to interval tree yields incorrect overlaps

Spencer Kimball

unread,

Oct 3, 2014, 4:07:43 PM10/3/14

to biogo...@googlegroups.com

In the attached unittest, a sequence of intervals is added to an interval tree. The test fails with an obvious error: we try to verify there are two overlapping intervals but there is only one. Notice that there are many values inserted into the interval tree in advance of the failure condition occurring. Removing some of the intervening insertions sometimes "fixes" the error condition and sometimes doesn't affect it. This makes me believe it has something to do with the LLRB rebalancing. The error condition appears sensitive to the structure of the tree.

The failing example uses []byte as the interval start and end markers. I did a simple conversion of the byte slices in the example to integers which sort exactly the same and the test doesn't fail. That file is attached as well. The fact that I wasn't able to reproduce this with the integers even though they create the exact same underlying tree, makes me suspicious that I've done something wrong with the construction of the custom interval tree. However, I've checked it and double checked it.

Any help would be much appreciated.

interval_test.go

custom_int_interval_test.go

Dan Kortschak

unread,

Oct 3, 2014, 4:53:10 PM10/3/14

to Spencer Kimball, biogo...@googlegroups.com

Can you make a failing case with fewer insertions? I would suggest that you either add this into the interval tests and make use of the graphic output, or grab that code from the tests and use it here; looking at the trees can be very helpful, but ~1000 nodes is too big to fit on a screen nicely.

If I have time, I'll look into this, but that is more likely to happen if the failing case is smaller.

Dan

Spencer Kimball

unread,

Oct 3, 2014, 5:44:33 PM10/3/14

to Dan Kortschak, biogo...@googlegroups.com

Dan, thanks for the prompt response. I wrote a quick set of mods to subtract random insertions and keep the deletion if the test still failed. After a bit of runtime, the test is much reduced with only 51 insertions now.

interval_test.go

Dan Kortschak

unread,

Oct 3, 2014, 10:31:00 PM10/3/14

to Spencer Kimball, biogo...@googlegroups.com

Thanks, that is likely to be very helpful. Here is the tree (I will find
some time to look at the problem later) and it suggests some obvious
things to look at - I suspect it in the short-circuiting code, though I
have no idea why this only cropped up here.

BTW You are doing way too much work in your type definitions. You can
rewrite the types as (copied directly out of my test version of your
code):

type Key []byte

func (k Key) Compare(b Comparable) int {
return bytes.Compare(k, b.(Key))
}

type bugInterval struct {
start, end Key
id uintptr
}

func (i *bugInterval) Overlap(b Range) bool {
// Half-open interval indexing.
return i.end.Compare(b.Start()) > 0 && i.start.Compare(b.End()) < 0
}
func (i *bugInterval) ID() uintptr { return i.id }
func (i *bugInterval) Start() Comparable { return i.start }
func (i *bugInterval) End() Comparable { return i.end }
func (i *bugInterval) NewMutable() Mutable { return &bugInterval{i.start, i.end, i.id} }
func (i *bugInterval) String() string { return fmt.Sprintf("[%q,%q)#%d", i.start, i.end, i.id) }
func (i *bugInterval) SetStart(c Comparable) { i.start = c.(Key) }
func (i *bugInterval) SetEnd(c Comparable) { i.end = c.(Key) }

bug.svg

BugIntervalTest.dot

Dan Kortschak

unread,

Oct 3, 2014, 10:48:05 PM10/3/14

to Spencer Kimball, biogo...@googlegroups.com

On Sat, 2014-10-04 at 12:00 +0930, Dan Kortschak wrote:
> I suspect it in the short-circuiting code, though I have no idea why
> this only cropped up here.

No, it's due to range adjustment in the slow update case. If you add
itree.AdjustRange() to your code (what I usually use for batch insert, just because)
the problem goes away. A fix should relatively trivial when I get some time.

Would you please file an issue in the bug tracker and are you happy for
me to use your test data in the bug test for the fix?

Dan

Spencer Kimball

unread,

Oct 3, 2014, 10:58:45 PM10/3/14

to Dan Kortschak, biogo...@googlegroups.com

Wow, fast turnaround. This is great. I'll file an issue as you suggest when I'm at a proper computer tomorrow. Where would I add the adjusttree call btw?

In any case, I chose biogo.store for this task mainly because you're the only game in town for a go-based interval tree. But also because I've already used your llrb implementation and judged it superior to others based on unittests. But also because this implementation is well tested. What about this bug allowed it to slip through the cracks? I confess that I haven't taken the time to really understand the algorithm. Maybe I should. I'd like to avoid really putting my brain to work on the interval tree data structure of possible. Do you think it's solid, or does this bug make you nervous about anything fundamental?

Btw, using this for https://github.com/cockroachdb/cockroach

Dan Kortschak

unread,

Oct 3, 2014, 11:07:21 PM10/3/14

to Spencer Kimball, biogo...@googlegroups.com

On Fri, 2014-10-03 at 22:58 -0400, Spencer Kimball wrote:
> Where would I add the adjusttree call btw?

You only need to call AdjustRanges() before calls to Get or DoMatching|
DoMatchingReverse if you have inserted (theoretically when fast=true -
but now until I've fixed this). For my use cases (large sets of genomic
data), I stuff a whole heap of intervals in a tree, AdjustRanges() and
then use it to handle queries - this is probably why it has never come
up before (though I probably would not have noticed, so thank you for
raising it). For batch loads, this is much faster since we only do the
range adjustments once, at the end of the tree construction.

> In any case, I chose biogo.store for this task mainly because you're
> the only game in town for a go-based interval tree. But also because
> I've already used your llrb implementation and judged it superior to
> others based on unittests. But also because this implementation is
> well tested. What about this bug allowed it to slip through the
> cracks? I confess that I haven't taken the time to really understand
> the algorithm. Maybe I should.

Always a good idea. My implementation of the LLRB algorithm was actually
for teaching/learning, so it is written to closely match the description
given by Sedgewick.

> I'd like to avoid really putting my brain to work on the interval tree
> data structure of possible. Do you think it's solid, or does this bug
> make you nervous about anything fundamental?

There is nothing fundamentally wrong with it AFAIK, but the order of
balancing operations is probably what got me here. With your test case I
can follow the rotations and see how they interact with the range
adjustments. That is probably what is going wrong here.

Dan

Spencer Kimball

unread,

Oct 3, 2014, 11:12:19 PM10/3/14

to Dan Kortschak, biogo...@googlegroups.com

What I'm a bit stumped by is why the []byte version fails but the equivalent int version succeeds. That's a major problem for me. Seems like they should follow the same tree structure. Or is that incorrect?

On Friday, October 3, 2014, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:

Dan Kortschak

unread,

Oct 3, 2014, 11:15:09 PM10/3/14

to Spencer Kimball, biogo...@googlegroups.com

On Fri, 2014-10-03 at 23:12 -0400, Spencer Kimball wrote:
> What I'm a bit stumped by is why the []byte version fails but the
> equivalent int version succeeds. That's a major problem for me. Seems
> like they should follow the same tree structure. Or is that incorrect?

I should check that, but I'm guessing not. Did you make the ranges in
the int version following exactly the same sort order for both ends?

Spencer Kimball

unread,

Oct 3, 2014, 11:17:01 PM10/3/14

to Dan Kortschak, biogo...@googlegroups.com

I thought so. What I did was multiply the leading int by 10 and add 2x the letter where "A" is 0 and "AA" is 1, "B" is 2 and "BB" is 3, etc. should have been an exact ordering but I didn't go crazy trying to ensure what I did was accurate.

On Friday, October 3, 2014, Dan Kortschak <dan.ko...@adelaide.edu.au> wrote:

Spencer Kimball

unread,

Oct 4, 2014, 12:06:41 AM10/4/14

to Dan Kortschak, biogo...@googlegroups.com

Oh now that I think about it the formulation I constructed is incorrect. The sort order changes of course. Damn. So it's still up in the air whether this sequence with equivalent integer values produces the same error result. I'll do it right tomorrow and get back to you. This is actually good news as it clears a respectable path to an explanation that isn't nuts.

Dan Kortschak

unread,

Oct 4, 2014, 6:36:15 AM10/4/14

to Spencer Kimball, biogo...@googlegroups.com

I have a minimal reproducer now - this returns no matches for the query
you give.

{start: Key("15.A"), end: Key("15.AA"), id: 9},
{start: Key("16.A"), end: Key("16.AA"), id: 10},
{start: Key("16.B"), end: Key("16.BB"), id: 11},
{start: Key("106.A"), end: Key("106.C"), id: 48},
{start: Key("106.A"), end: Key("106.AA"), id: 49},
{start: Key("106.A"), end: Key("106.AA"), id: 50},
{start: Key("106.A"), end: Key("106.AA"), id: 51},

The final rotations break the range invariant. I can't sort that out
with more than about this many nodes to think about, but this should be
fine. This is a bit more hairy than I thought it was going to be.

Spencer Kimball

unread,

Oct 6, 2014, 10:45:43 AM10/6/14

to Dan Kortschak, biogo...@googlegroups.com

Any luck with this so far?

Dan Kortschak

unread,

Oct 6, 2014, 3:59:56 PM10/6/14

to Spencer Kimball, biogo...@googlegroups.com

I have an integer keyed reproducer based on the case I posted and have instrumented the insertion code to see exectly what is going on. We've had a long weekend, so I'm going to sit down today with my favourite debugging tools (paper and pencil) and see what is going on.

Spencer Kimball

unread,

Oct 6, 2014, 4:03:06 PM10/6/14

to Dan Kortschak, biogo...@googlegroups.com

Excellent! Keep me posted and good luck.

Dan Kortschak

unread,

Oct 7, 2014, 7:45:57 PM10/7/14

to Spencer Kimball, biogo...@googlegroups.com

On Mon, 2014-10-06 at 16:01 -0400, Spencer Kimball wrote:
> Excellent! Keep me posted and good luck.
>

Paper and pencil (well, black and red pens) are the single greatest
cognitive aid in the history of humanity.

Please try the fix.

Dan

Spencer Kimball

unread,

Oct 7, 2014, 9:04:45 PM10/7/14

to Dan Kortschak, biogo...@googlegroups.com

You are the man. Will give it a shot tomorrow when I'm back in the saddle. This is a big fix. In your recognition of the error, what was your intuitive sense of possible other edge cases?

Spencer Kimball

unread,

Oct 7, 2014, 9:10:55 PM10/7/14

to Dan Kortschak, biogo...@googlegroups.com

The use case that triggered this particular sequence of events was a set of concurrent distributed txns contesting common keys and being forced to back off and retry. I could potentially expand the tests to generate a very large number of random-ish txn histories and verify (in this particular case) serializability. Could just let it run and hopefully find issues.

But not worth it if you think this fix is likely an isolated, non-correlated sort of edge case as opposed to one of a class.

I appreciate all of your fast work and attention to this!

Spencer

Dan Kortschak

unread,

Oct 7, 2014, 9:17:48 PM10/7/14

to Spencer Kimball, biogo...@googlegroups.com

On Tue, 2014-10-07 at 21:04 -0400, Spencer Kimball wrote:
> This is a big fix. In your recognition of the error, what was your
> intuitive sense of possible other edge cases?

Basically, the code as it was written was difficult to reason about and
was just incorrect - I'm quite surprised that it had not showed up
earlier.

The code now clearly does the operations that should happen during
rotations. I do not expect that there are other edge cases.

Dan

Dan Kortschak

unread,

Oct 7, 2014, 9:21:06 PM10/7/14

to Spencer Kimball, biogo...@googlegroups.com

On Tue, 2014-10-07 at 21:10 -0400, Spencer Kimball wrote:
> The use case that triggered this particular sequence of events was a
> set of concurrent distributed txns contesting common keys and being
> forced to back off and retry. I could potentially expand the tests to
> generate a very large number of random-ish txn histories and verify
> (in this particular case) serializability. Could just let it run and
> hopefully find issues.
>

If you can describe the shape of the data (distribution of left and
right ends and order) I can add that to the tests, otherwise I'd be
happy for you to hit the code hard just to make sure.

Add calls to t.isRanged() to check for the invariant being contradicted
- this is more reliable than performing a Get operation.
Dan

Reply all

Reply to author

Forward