Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Caml-list] Fwd: interval trees

19 views
Skip to first unread message

Francois Berenger

unread,
Feb 9, 2012, 9:07:18 PM2/9/12
to caml...@inria.fr
-------- Original Message --------
Subject: interval trees
Date: Thu, 09 Feb 2012 17:30:21 +0900
From: Francois Berenger
To: batterie...@lists.forge.ocamlcore.org
CC: bio...@googlegroups.com

Hello,

I need to use an interval tree.

Biocaml has one, batteries have imap/iset, nice!

However, I have intervals of reals, not integers. :(

I want to build the tree (once), then query it with a real number
(many times) like in: which intervals contain the query real number?

Should I convert my floats to ints (by sorting them then ranking) before
inserting them into some existing interval tree for integers?
I am not so concerned about the pre-processing time.

Should I write from scratch?

Thanks for any suggestion,
F.

--
Caml-list mailing list. Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Richard W.M. Jones

unread,
Feb 10, 2012, 1:29:39 PM2/10/12
to Francois Berenger, caml...@inria.fr

On Fri, Feb 10, 2012 at 10:07:05AM +0900, Francois Berenger wrote:
> -------- Original Message --------
> Subject: interval trees
> Date: Thu, 09 Feb 2012 17:30:21 +0900
> From: Francois Berenger
> To: batterie...@lists.forge.ocamlcore.org
> CC: bio...@googlegroups.com
>
> Hello,
>
> I need to use an interval tree.
>
> Biocaml has one, batteries have imap/iset, nice!
>
> However, I have intervals of reals, not integers. :(
>
> I want to build the tree (once), then query it with a real number
> (many times) like in: which intervals contain the query real number?
>
> Should I convert my floats to ints (by sorting them then ranking) before
> inserting them into some existing interval tree for integers?
> I am not so concerned about the pre-processing time.
>
> Should I write from scratch?

I wrote a segment tree (integers, not floats), which is similar. It
wasn't very hard. The code is here if it helps:

http://git.annexia.org/?p=virt-mem.git;a=blob;f=lib/virt_mem_mmap.ml;hb=HEAD

Rich.

--
Richard Jones
Red Hat

Philippe Veber

unread,
Feb 11, 2012, 6:06:39 AM2/11/12
to Francois Berenger, caml...@inria.fr
Hi François,

The Biocaml_intervalTree module is merely a specialization of Set in the
standard library. It should be fairly easy to functorize it over a type
with a total order relation. I think you might even sed -e 's/int/float/g'
the current implementation (with a couple of additional and minor
modifications).

ph.

2012/2/10 Francois Berenger <bere...@riken.jp>

> -------- Original Message --------
> Subject: interval trees
> Date: Thu, 09 Feb 2012 17:30:21 +0900
> From: Francois Berenger
> To: batterie...@lists.forge.**ocamlcore.org<batterie...@lists.forge.ocamlcore.org>
> CC: bio...@googlegroups.com
>
> Hello,
>
> I need to use an interval tree.
>
> Biocaml has one, batteries have imap/iset, nice!
>
> However, I have intervals of reals, not integers. :(
>
> I want to build the tree (once), then query it with a real number
> (many times) like in: which intervals contain the query real number?
>
> Should I convert my floats to ints (by sorting them then ranking) before
> inserting them into some existing interval tree for integers?
> I am not so concerned about the pre-processing time.
>
> Should I write from scratch?
>
> Thanks for any suggestion,
> F.
>
> --
> Caml-list mailing list. Subscription management and archives:
> https://sympa-roc.inria.fr/**wws/info/caml-list<https://sympa-roc.inria.fr/wws/info/caml-list>
> Beginner's list: http://groups.yahoo.com/group/**ocaml_beginners<http://groups.yahoo.com/group/ocaml_beginners>
> Bug reports: http://caml.inria.fr/bin/caml-**bugs<http://caml.inria.fr/bin/caml-bugs>

Goswin von Brederlow

unread,
Feb 11, 2012, 12:34:18 PM2/11/12
to Philippe Veber, Francois Berenger, caml...@inria.fr
Philippe Veber <philipp...@gmail.com> writes:

> Hi François,
>
> The Biocaml_intervalTree module is merely a specialization of Set in the
> standard library. It should be fairly easy to functorize it over a type with a
> total order relation. I think you might even sed -e 's/int/float/g' the current
> implementation (with a couple of additional and minor modifications).
>
> ph.

Even better would be to polymorphize or functorize the code and send the
patch upstream.

MfG
Goswin

Goswin von Brederlow

unread,
Feb 11, 2012, 12:39:22 PM2/11/12
to Richard W.M. Jones, Francois Berenger, caml...@inria.fr
"Richard W.M. Jones" <ri...@annexia.org> writes:

> On Fri, Feb 10, 2012 at 10:07:05AM +0900, Francois Berenger wrote:
>> -------- Original Message --------
>> Subject: interval trees
>> Date: Thu, 09 Feb 2012 17:30:21 +0900
>> From: Francois Berenger
>> To: batterie...@lists.forge.ocamlcore.org
>> CC: bio...@googlegroups.com
>>
>> Hello,
>>
>> I need to use an interval tree.
>>
>> Biocaml has one, batteries have imap/iset, nice!
>>
>> However, I have intervals of reals, not integers. :(
>>
>> I want to build the tree (once), then query it with a real number
>> (many times) like in: which intervals contain the query real number?
>>
>> Should I convert my floats to ints (by sorting them then ranking) before
>> inserting them into some existing interval tree for integers?
>> I am not so concerned about the pre-processing time.
>>
>> Should I write from scratch?
>
> I wrote a segment tree (integers, not floats), which is similar. It
> wasn't very hard. The code is here if it helps:
>
> http://git.annexia.org/?p=virt-mem.git;a=blob;f=lib/virt_mem_mmap.ml;hb=HEAD
>
> Rich.

Anyone have something like this but for non-overlapping intervals and
allowing interval insertion and removal with merging and spliting of the
internaly used intervals?

MfG
Goswin

Eliot Handelman

unread,
Feb 11, 2012, 12:50:27 PM2/11/12
to caml...@inria.fr
On 11/02/2012 12:38 PM, Goswin von Brederlow wrote:
>
> Anyone have something like this but for non-overlapping intervals and
> allowing interval insertion and removal with merging and spliting of the
> internaly used intervals?

Cis from Sébastien Ferré?

http://www.irisa.fr/LIS/ferre/software.en.html

-- eliot

Edgar Friendly

unread,
Feb 11, 2012, 3:50:09 PM2/11/12
to caml...@inria.fr
On 02/11/2012 12:38 PM, Goswin von Brederlow wrote:
>> On Fri, Feb 10, 2012 at 10:07:05AM +0900, Francois Berenger wrote:
>>> I need to use an interval tree.
>>>
>>> Biocaml has one, batteries have imap/iset, nice!
>>>
> Anyone have something like this but for non-overlapping intervals and
> allowing interval insertion and removal with merging and spliting of the
> internaly used intervals?

Yes, IMap / ISet (borrowed from camomile and improved) do this. I
assume biocaml's is the same.

E.

Philippe Veber

unread,
Feb 12, 2012, 4:19:27 AM2/12/12
to Edgar Friendly, caml...@inria.fr
2012/2/11 Edgar Friendly <thele...@gmail.com>

> On 02/11/2012 12:38 PM, Goswin von Brederlow wrote:
>
>> On Fri, Feb 10, 2012 at 10:07:05AM +0900, Francois Berenger wrote:
>>>
>>>> I need to use an interval tree.
>>>>
>>>> Biocaml has one, batteries have imap/iset, nice!
>>>>
>>>> Anyone have something like this but for non-overlapping intervals and
>> allowing interval insertion and removal with merging and spliting of the
>> internaly used intervals?
>>
>
> Yes, IMap / ISet (borrowed from camomile and improved) do this. I assume
> biocaml's is the same.

Actually no, biocaml_intervalTree keeps the inserted intervals untouched,
it is in fact pretty similar to an interval multimap, with some specialized
operations. In cases when we want to describe a set of integers (vs a set
of intervals), we use ISet from Batteries. With these two structures we can
describe an interesting range of genome annotations.



>
>
> E.
>
>
> --
> Caml-list mailing list. Subscription management and archives:
> https://sympa-roc.inria.fr/**wws/info/caml-list<https://sympa-roc.inria.fr/wws/info/caml-list>
> Beginner's list: http://groups.yahoo.com/group/**ocaml_beginners<http://groups.yahoo.com/group/ocaml_beginners>
> Bug reports: http://caml.inria.fr/bin/caml-**bugs<http://caml.inria.fr/bin/caml-bugs>

Sebastien Ferre

unread,
Feb 13, 2012, 4:16:31 AM2/13/12
to caml...@inria.fr


On 02/11/2012 06:49 PM, Eliot Handelman wrote:
> On 11/02/2012 12:38 PM, Goswin von Brederlow wrote:
>>
>> Anyone have something like this but for non-overlapping intervals and
>> allowing interval insertion and removal with merging and spliting of the
>> internaly used intervals?
>
> Cis from Sébastien Ferré?
>
> http://www.irisa.fr/LIS/ferre/software.en.html

The Cis library (Cis for Compact Integer Sets) is
designed for representing sets of integers, but it
could easily be adapted to the insertion and
removal of intervals since it already handles
the merging and spliting og intervals.

Sébastien

Francois Berenger

unread,
Feb 14, 2012, 8:29:10 PM2/14/12
to caml...@inria.fr, bio...@googlegroups.com
Hello,

I did a naive implementation of interval trees for float intervals.

It is available here:
https://github.com/HappyCrow/interval-tree

I wonder if it is possible to construct the trees in a tail recursive
fashion. Maybe I knew how to do this when I was still at university.

Regards,
Francois.

Goswin von Brederlow

unread,
Feb 15, 2012, 10:22:19 AM2/15/12
to Francois Berenger, caml...@inria.fr, bio...@googlegroups.com
Francois Berenger <bere...@riken.jp> writes:

> Hello,
>
> I did a naive implementation of interval trees for float intervals.
>
> It is available here:
> https://github.com/HappyCrow/interval-tree
>
> I wonder if it is possible to construct the trees in a tail recursive
> fashion. Maybe I knew how to do this when I was still at university.
>
> Regards,
> Francois.

| Node of
(* x_mid left_list right_list left_tree right_tree *)
float * interval list * interval list * interval_tree * interval_tree

Why interval list? You only need a single interval in leafes and none in
other nodes (although it can help to store min and max in each node).

You are also missing insert and remove operations, which is the actually
hard part in this. Inserting an interval might require merging the
rightmost interval left of the root and the leftmost interval right of
the root. So you would have 2 removals and one insertion of a combined
interval, which complicates balancing the whole thing efficiently.

That is the part I'm struggling with.

MfG
Goswin

Edgar Friendly

unread,
Feb 15, 2012, 12:22:47 PM2/15/12
to Goswin von Brederlow, Francois Berenger, caml...@inria.fr, bio...@googlegroups.com
I struggled with this too, but if you read the wikipedia page
http://en.wikipedia.org/wiki/Interval_tree, he's implemented a centered
interval tree. Yes, there's a lot of complications needed to insert and
remove, but what he's done works for static interval trees.

His lookup function is `int -> interval list`, not `int -> bool`, so he
must keep all the intervals that overlap the center point so he can return
them. It's useful to have them sorted by left endpoint as well as right
endpoint. I might have used arrays for them instead of lists so that
binary search is doable, but if they're small, it doesn't matter much.

E.

Francois Berenger

unread,
Feb 15, 2012, 9:32:44 PM2/15/12
to caml...@inria.fr
On 02/16/2012 12:21 AM, Goswin von Brederlow wrote:
> Francois Berenger<bere...@riken.jp> writes:
>
>> Hello,
>>
>> I did a naive implementation of interval trees for float intervals.
>>
>> It is available here:
>> https://github.com/HappyCrow/interval-tree
>>
>> I wonder if it is possible to construct the trees in a tail recursive
>> fashion. Maybe I knew how to do this when I was still at university.
>>
>> Regards,
>> Francois.
>
> | Node of
> (* x_mid left_list right_list left_tree right_tree *)
> float * interval list * interval list * interval_tree * interval_tree
>
> Why interval list? You only need a single interval in leafes and none in
> other nodes (although it can help to store min and max in each node).

Not in my case, there is a payload associated with each interval in my
application so I need to keep track of the individual intervals.

> You are also missing insert and remove operations,

I don't miss anything: I don't need these operations in my application.

> which is the actually
> hard part in this. Inserting an interval might require merging the
> rightmost interval left of the root and the leftmost interval right of
> the root. So you would have 2 removals and one insertion of a combined
> interval, which complicates balancing the whole thing efficiently.

I hope you have a good book on this.
By the way, the one I refer to in my code is quite nice.
At first I thought it was too theoretical for me, but in fact
they give algorithms in recursive form so the book can become pretty handy.

> That is the part I'm struggling with.

You might be intersted into having a look at
the Computational Geometry Algorithms Library: http://www.cgal.org/
It's open source and it's also an INRIA product,
which makes two good points. ;)

You might want to bind your code to this library (they introduced
some framework recently, SWIG if I remember well, so that it should be
easy to do wrappers for any language).
It's heavily templated C++ code, good luck if you read their code.

They have a lot of crazily useful data structures (interval skip lists,
k-d trees, segment trees, range trees, AABB trees), and their
code is impossible to crash when using some specific math kernels.

Regards,
F.

Francois Berenger

unread,
Feb 15, 2012, 9:42:44 PM2/15/12
to caml...@inria.fr
Hello,

Anyone can translate this into being tail recursive
if it's possible:

let rec interval_tree intervals =
match intervals with
[] -> Empty
| _ ->
let x_mid = median intervals in
let left, mid, right = partition intervals x_mid in
let left_list = L.sort leftmost_bound_first mid in
let right_list = L.sort rightmost_bound_first mid in
Node (x_mid,
left_list, right_list,
interval_tree left, interval_tree right)

I'm afraid my program could crash on a very long list of intervals.

Thanks a lot,

Francois Berenger

unread,
Feb 15, 2012, 9:48:34 PM2/15/12
to caml...@inria.fr
On 02/16/2012 02:22 AM, Edgar Friendly wrote:
> I struggled with this too, but if you read the wikipedia page
> http://en.wikipedia.org/wiki/Interval_tree, he's implemented a centered
> interval tree. Yes, there's a lot of complications needed to insert and
> remove, but what he's done works for static interval trees.
>
> His lookup function is `int -> interval list`

Precisely, it is:
type: interval_tree -> float -> interval list

> , not `int -> bool`, so he
> must keep all the intervals that overlap the center point so he can
> return them. It's useful to have them sorted by left endpoint as well
> as right endpoint. I might have used arrays for them instead of lists
> so that binary search is doable,

That would be a nice optimization for queries indeed.

At the moment, I'm more concerned about the program blowing up
during tree construction.

Regards,
F.

> but if they're small, it doesn't matter
> much.
>
> E.
>
> On Wed, Feb 15, 2012 at 10:21 AM, Goswin von Brederlow
> <goswi...@web.de <mailto:goswi...@web.de>> wrote:

Goswin von Brederlow

unread,
Feb 16, 2012, 4:00:04 AM2/16/12
to Francois Berenger, caml...@inria.fr
Francois Berenger <bere...@riken.jp> writes:

> Hello,
>
> Anyone can translate this into being tail recursive
> if it's possible:
>
> let rec interval_tree intervals =
> match intervals with
> [] -> Empty
> | _ ->
> let x_mid = median intervals in
> let left, mid, right = partition intervals x_mid in
> let left_list = L.sort leftmost_bound_first mid in
> let right_list = L.sort rightmost_bound_first mid in
> Node (x_mid,
> left_list, right_list,
> interval_tree left, interval_tree right)
>
> I'm afraid my program could crash on a very long list of intervals.
>
> Thanks a lot,
> F.

Each recursion halves the lists, right? That means you need a very very
very long list to exceed the recursion depth.

MfG
Goswin

Gabriel Scherer

unread,
Feb 16, 2012, 5:22:44 AM2/16/12
to Francois Berenger, caml...@inria.fr
I can't resist giving the usual tail-recursive CPS-transformed version
(untested):

let interval_tree intervals =
let rec interval_tree intervals k =
match intervals with
| [] -> k Empty
| _ ->
let x_mid = median intervals in
let left, mid, right = partition intervals x_mid in
let left_list = L.sort leftmost_bound_first mid in
let right_list = L.sort rightmost_bound_first mid in
interval_tree left (fun left_tree ->
interval_tree right (fun right_tree ->
k (Node(x_mid, left_list, right_list, left_tree, right_tree))))
in interval_tree intervals (fun t -> t)

But see Goswin's remark: if non-tailrec makes your stack grow in
log(n) only, there is no point in jumping through hoops to get a
tail-recursive version.

John Carr

unread,
Feb 16, 2012, 7:21:32 AM2/16/12
to Goswin von Brederlow, Francois Berenger, caml...@inria.fr

Goswin von Brederlow <goswi...@web.de> wrote:

> Francois Berenger <bere...@riken.jp> writes:
> >
> > let rec interval_tree intervals =
> > match intervals with
> > [] -> Empty
> > | _ ->
> > let x_mid = median intervals in
> > let left, mid, right = partition intervals x_mid in
> > let left_list = L.sort leftmost_bound_first mid in
> > let right_list = L.sort rightmost_bound_first mid in
> > Node (x_mid,
> > left_list, right_list,
> > interval_tree left, interval_tree right)
> >
> > I'm afraid my program could crash on a very long list of intervals.
>
> Each recursion halves the lists, right? That means you need a very very
> very long list to exceed the recursion depth.

How good is median selection? If it works like quicksort, approximating
the median, typical stack depth is logarithmic and worst case stack
depth is linear. If median selection is precise, creating a balanced
tree, I wouldn't worry about stack depth.

Francois Berenger

unread,
Feb 16, 2012, 8:00:35 PM2/16/12
to caml...@inria.fr
On 02/16/2012 07:21 PM, Gabriel Scherer wrote:
> I can't resist giving the usual tail-recursive CPS-transformed version
> (untested):

Thanks! That's the technique I was looking for
(Continuation Passing Style), as I may have to use
this on some other algorithms in the future.

Christophe Raffalli

unread,
Feb 17, 2012, 3:26:57 AM2/17/12
to caml...@inria.fr
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Hello,

>
> Thanks! That's the technique I was looking for
> (Continuation Passing Style), as I may have to use
> this on some other algorithms in the future.
>
>> let interval_tree intervals =
>> let rec interval_tree intervals k =
>> match intervals with
>> | [] -> k Empty
>> | _ ->
>> let x_mid = median intervals in
>> let left, mid, right = partition intervals x_mid in
>> let left_list = L.sort leftmost_bound_first mid in
>> let right_list = L.sort rightmost_bound_first mid in
>> interval_tree left (fun left_tree ->
>> interval_tree right (fun right_tree ->
>> k (Node(x_mid, left_list, right_list, left_tree,
>> right_tree))))
>> in interval_tree intervals (fun t -> t)

There is still one non tail-rec call, and this will basically rebuild
the stack in the heap, with more GC work. So unless you know your trees
are non balanced and left branch are deeper, this code is probably worst.

In the old days (I think now OCaml is clever), It could save some stack
space (and this could retain some pointer), to do the last call by
another function :

let rec interval_tree intervals =
match intervals with
[] -> Empty
| _ ->
let x_mid = median intervals in
let left, mid, right = partition intervals x_mid in
let left_list = L.sort leftmost_bound_first mid in
let right_list = L.sort rightmost_bound_first mid in
last_call x_mid left_list right_list left right

and last_call x_mid left_list right_list left right =
Node (x_mid,
left_list, right_list,
interval_tree left, interval_tree right)

Here it makes sure mid and intervals are not stored in the stack frame
.. But it may be done anyway now, I am not sure.

Someone knowns the current rules to know what's retain on the stack in
today OCaml ?

Cheers,
Christophe

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8+C8MACgkQi9jr/RgYAS4nHwCfYX9mY8nHAkz65QQXDerPwfHd
tvgAoIjvVxlDONYRjwpSZS0/5LhhCo44
=OnkD
-----END PGP SIGNATURE-----
0 new messages