The two main candidates would be something like to boost::split_interval_set or
a centred interval tree
(along the lines of http://en.wikipedia.org/wiki/Interval_tree).
It seems to me that the characteristics of an interval tree differ sufficiently
from the containers defined in ICL that they might make a useful addition.
For the case where there is a potentially large number of overlapping intervals
a split_interval_set will be less efficient
than an interval tree. In the worst case intervals would be split down to unit
size.
Searching the archives I see there was some discussion of interval trees in the
context of ICL but the additional of an interval tree
container did not, as far as I can tell, come up (perhaps because it should
first appear in a boost release before there is any consideration
of adding to it?). I have a few questions:
1. Is there a specific name for the data structures provided by the ICL?
I have searched the literature and cannot find any mention of sets that split or
combine intervals this way.
It is either a genuinely novel concept, which I find difficult to believe, or I
haven't used the correct search terms (interval, extent or segment).
Either way, I would like to update wikipedia appropriately to help other lost
souls.
2. Are there any compelling reasons why interval trees are less useful than the
containers provided by the ICL?
3. Perhaps based on the answer to 2. Would interval trees be a sensible addition
to the ICL?
4. Are there any other classes of interval container worthy of mention?
Regards,
Bruce.
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
2010/12/14 Bruce Adams <torto...@yahoo.co.uk>:
>
> Hi,
> I have been working on a problem for which a suitable solution is an
> efficient container of intervals.
>
> The two main candidates would be something like to boost::split_interval_set or
> a centred interval tree
> (along the lines of http://en.wikipedia.org/wiki/Interval_tree).
> It seems to me that the characteristics of an interval tree differ sufficiently
> from the containers defined in ICL that they might make a useful addition.
No doubt, interval_trees would be a useful addition to the library.
For certain use cases they perform better than the interval container
currently available in the ICL.
> For the case where there is a potentially large number of overlapping intervals
> a split_interval_set will be less efficient
> than an interval tree. In the worst case intervals would be split down to unit
> size.
That depends what you intend to do with your interval container. One
of the fundamental ideas for containers of the ICL was that of a
compact and minimal representation of sets or maps using intervals. An
icl::interval_set for example instantly joins inserted intervals if
they overlap and thus always stays in a minimal form. Contrary to an
interval tree, it forgets the intervals that have been inserted into
it and it can not answer stabbing queries anymore.
> Searching the archives I see there was some discussion of interval trees in the
> context of ICL but the additional of an interval tree
> container did not, as far as I can tell, come up (perhaps because it should
> first appear in a boost release before there is any consideration
> of adding to it?).
There was an agreement among the reviewers that the library in the
current from is very useful, and efficient in many use cases, so it
should be accepted into boost. I have agreed to integrate an interval
tree implementation in the further evolution of the library. But it
might also be contributed by others.
> I have a few questions:
>
> 1. Is there a specific name for the data structures provided by the ICL?
As already mentioned, a basic idea for the library was to implement
just sets and maps in a compact form exploiting the the fact, that
elements occur in contiguous chunks: intervals.
So you could say the ICL implements sets and maps. They are called
interval_sets and interval_maps just as with bitsets the prefix bit...
points to the way or technique the sets are implemented. I am not
aware of a more specific name for this class of data structures other
than interval_set and interval_map which I find quite natural.
> I have searched the literature and cannot find any mention of sets that split or
> combine intervals this way.
> It is either a genuinely novel concept, which I find difficult to believe, or I
> haven't used the correct search terms (interval, extent or segment).
> Either way, I would like to update wikipedia appropriately to help other lost
> souls.
I admit, I never did an in-depth literature search to find out the
"right" name for what I am doing. If you find out what it is, let me
know ;-)
> 2. Are there any compelling reasons why interval trees are less useful than the
> containers provided by the ICL?
I don't say interval trees are less useful than icl containers. They
are different data structures with different strenghts and weaknesses.
> 3. Perhaps based on the answer to 2. Would interval trees be a sensible addition
> to the ICL?
Yes.
> 4. Are there any other classes of interval container worthy of mention?
Roughly... (1) A flat_interval_set/map that is implemented using a
vector can be more efficient, if there are mainly lookups and less
inserts/deletes on the object, after is has been created. (2) Another
interesting implementation of interval_sets/maps is a tree that only
stores the beginning points of intervals. The interval_set/map is
always infinite starting with (-inf, +inf) and gets only split by the
insertion of elements.
Cheers
Joachim
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de/
In the cases where I've needed them, only a container that preserved
the inserted intervals (and information about overlaps) would do (I
was storing musical notes, FWIW)
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
interesting
> (and information about overlaps)
isn't this the hairy part?
> would do (I was storing musical notes, FWIW)
Cheers,
Joachim
Not too hairy; an interval tree I made by modifying my STL's rbtree
implementation worked just fine.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
_______________________________________________
so why not add your interval tree implementation to Interval
Containers, if it is readily available?
Joachim
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de/
That code was written before Boost even existed, I think, and is owned
by my employer-at-the-time.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
_______________________________________________
:(
So, what would be a good way of doing this?
Instead of implementing a specific augmentation modifying an existing
rbtree, I think it'd be more fun, to write an augmented_rbtree on the
basis of a good rbtree implementation, and then write the
interval_tree as an instance of augmented_rbtree.
The augmented_rbtree would have a template parameter typename A, for
the sythesized attribute, the augmentation and a functor F to compute
the augmented node attribute a = F()(left, x, right). This way it
might be possible to write a template for a class of augmented trees,
that contains the interval_tree as a special case but can be used for
other problems as well.
Joachim
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de
Go for it, brother. Fun is what it's all about here at Boost!
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de
Hi everyone,
Bruce's post got me thinking about this problem again. Of course an
interval tree, such as one based on a red-black tree as Dave suggests,
would be a fantastic thing to have in Boost [1]. However something I
think would be even better would be some sort of "interval adaptor"
that would allow some existing non-interval container to be efficiently
adapted to store intervals. This could be used on top of a std::map,
but the reason why I'd like it is that it could also be used with
something like Beman's proposed b-tree[2].
But how to do it? I've used a few of my insomniac brain-cycles
thinking about this, and I may have finally come up with a solution.
My idea is to consider an interval (low,high) as a point in a 2D
plain. Then, a query to find all of the intervals overlapping (a,b) is
the same as the query to find all the points in the quarter-plane where
low<b and high>a.
Any 2D data structure could be used to store these (e.g. quadtrees
etc), but the aim here is to adapt a 1D data structure. I have used
the Z-curve method [3] to do this efficiently for some years now and it
works well. The idea is that you bitwise-interleave the two values and
use this as the key. You can then iterate over all points in a
rectangle by iterating over all points in the 1D range between the
interleaved bottom left and top right corner values. A restriction is
that the types must be suitable for this bitwise interleaving, i.e.
floating point types don't work.
It's unlikely that I am the first to think of this, but I've not found
yet any references. Perhaps I'm searching for the wrong things.
Any thoughts anyone?
Regards, Phil.
[1] For the record, there is a C++ red-black interval tree here:
http://cs.montana.edu/~bohannan
Boost people probably wouldn't find this useful in its present form,
but it could be useful for study.
[2] http://mysite.verizon.net/beman/btree/index.html
[3] http://en.wikipedia.org/wiki/Z-order_(curve)
Hi Phil,
I find your idea quite nifty and fascinating. The transformation from
intervals to 2D plane to z-curve ordering is quite powerful and opens
up new frames of thinking on the problem. I guess I need a little more
thinking time and intend to post another answer tomorrow.
Best regards,
Joachim
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de
That is an interesting idea. However, I see the method (or at least some
implementations
of it) are covered by a patent. Presumably that could affect its adoption into
boost?
I'm broadly against software patents but things get hairy if you give the
lawyers a hook
to fish with. Actually digging a bit further it could be that the patent is to
prevent evil lawyers
doing this as the claim is by an academic rather than say Oracle (unless he
works for Oracle!)
Looks like enough of it is prior art that something could be used.
I note that the current interval containers could also be considered / are
implemented as
adapters fitting over sets or maps so this would be a good fit. It might thus
be quicker to integrate.
Regards,
Bruce.
After I have spent some more time studying your proposal, I still
think it is a very smart method and a good alternative to the
"classical" interval tree implementation suggested by Dave.
The restriction imposed by bit-interleaving on the applicable element
types is justified if the data structure gains efficiency. Using meta
programming we can choose the best implementation dependent on the
element type. More unpleasant is the fact that an important part of
the method, the efficient search of the z-ordered values within the
quarter-plane defined by the query interval seems to be protected by a
software patent as Bruce pointed out. But may be you have a different
implementation that circumvents this obstacle.
> That is an interesting idea. However, I see the method (or at least some
> implementations
> of it) are covered by a patent. Presumably that could affect its adoption into
> boost?
>
> I'm broadly against software patents
+1
Best regards,
Joachim
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de
I think you mean the key type?
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
In terms of intervals as a set of elements: element_type. In terms of
a 2D plane: coordinate_type. In terms of a sorted container: key_type.
Since Phil started with intervals I chose "element_type" (of the
interval). The key values of Phils interval_adaper would be z-values
which differ from the element_type of the intervals as they have
double length due to bit-interleaving.
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de
Bruce Adams wrote:
> That is an interesting idea. However, I see the method (or at least some
> implementations of it) are covered by a patent.
I don't look at patents. Everything I've described is either my own
(re-)invention or taken from the 1981 paper "Multidimensional Range
Search in Dynamically Balanced Trees" by Tropf and Herzog that is
linked from the Z-curve Wikipedia page.
> Presumably that could affect its adoption into boost?
I doubt it.
Regards, Phil.
Can something that is published by Tropf 1981 be patented by Tropf 2004?
--
Interval Container Library [Boost.Icl]
http://www.joachim-faulhaber.de
IANAL
How about this scenario (purely hypothetical):
Tropf applied for a patent in 1980.
The patent application took longer than expected, and Tropf published something more or less closely related in 1981.
The patent was finally granted in 2004.
So it looks to me as if under these circumstances, the patent would be valid. The publication from 1981 would not qualify as prior art, not because it is written by the same author, but because it was published later. But I have to admit that it occurs unlikely to me that it took 20 years until a patent was granted.
Regards,
Thomas
If we're talking about patent number 7321890 (which was referenced on
the aforementioned Wikipedia page), Google says:
Filing date: Feb 18, 2004
Issue date: Jan 22, 2008
Application number: 10/781,488
The patent has only 6 claims, you'd probably have to read them and the
earlier paper to figure out what the differences are between the earlier
paper and the later patent. Or, you could write Tropf and ask him ;)
-Hal