[boost] accurate sum accumulator (kahan)

165 views
Skip to first unread message

Gaetano Mendola

unread,
Jul 25, 2010, 11:23:54 AM7/25/10
to bo...@lists.boost.org
Hi all,
I had to implement in terms of boost accumulators the Kahan
summation algorithm. I will paste at the end of this message
the accumulator, the feature and the extractor source code.

This accumulator reduces the numerical error obtained with
standard sequential sum, as example the sum of 1e6f elements
equal to 1e-6f should give 1.0f, this is the results obtained
with sum and sum_kahan:

tag::sum: 1.009038925
tag::sum_kahan: 1.000000000

I'm not sure if that sum_kahan is worth to be inserted in the
accumulators provided with boost, I'm not an expert of
boost:accumulators (I just followed the guide to make this new
accumulator) so not sure if you prefer to have a fast_sum
and an accurate_sum instead of sum and sum_kahan.

Regards
Gaetano Mendola

// File kahan_sum.hpp
#ifndef KAHAN_SUM
#define KAHAN_SUM

#include <boost/accumulators/framework/accumulator_base.hpp>
#include <boost/accumulators/framework/parameters/sample.hpp>
#include <boost/numeric/conversion/cast.hpp>

namespace boost {
namespace accumulators {
namespace impl {

template<typename Sample>
struct sum_kahan_accumulator : accumulator_base
{
typedef Sample result_type;

template<typename Args>
sum_kahan_accumulator(Args const & args)
: sum(args[sample | Sample()]),
compensation(boost::numeric_cast<Sample>(0.0))
{
}

template<typename Args>
void operator ()(Args const & args)
{
const Sample myTemp1 = args[sample] - this->compensation;
const Sample myTemp2 = this->sum + myTemp1;
this->compensation = (myTemp2 - this->sum) - myTemp1;
this->sum = myTemp2;
}

result_type result(dont_care) const
{
return this->sum;
}
private:
Sample sum;
Sample compensation;
};

}}}

namespace boost { namespace accumulators { namespace tag {

struct sum_kahan : depends_on< >
{
typedef impl::sum_kahan_accumulator< mpl::_1 > impl;
};

}}}

namespace boost {
namespace accumulators {
namespace extract {

extractor< tag::sum_kahan > const sum_kahan = {};

}
using extract::sum_kahan;
}}

#endif


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Paul A. Bristow

unread,
Jul 26, 2010, 9:59:29 AM7/26/10
to bo...@lists.boost.org

> -----Original Message-----
> From: boost-...@lists.boost.org [mailto:boost-...@lists.boost.org] On Behalf Of Gaetano Mendola
> Sent: Sunday, July 25, 2010 4:24 PM
> To: bo...@lists.boost.org
> Subject: [boost] accurate sum accumulator (kahan)
>
> Hi all,
> I had to implement in terms of boost accumulators the Kahan
> summation algorithm. I will paste at the end of this message
> the accumulator, the feature and the extractor source code.
>
> This accumulator reduces the numerical error obtained with
> standard sequential sum, as example the sum of 1e6f elements
> equal to 1e-6f should give 1.0f, this is the results obtained
> with sum and sum_kahan:
>
> tag::sum: 1.009038925
> tag::sum_kahan: 1.000000000
>
> I'm not sure if that sum_kahan is worth to be inserted in the
> accumulators provided with boost, I'm not an expert of
> boost:accumulators (I just followed the guide to make this new
> accumulator) so not sure if you prefer to have a fast_sum
> and an accurate_sum instead of sum and sum_kahan.

<snip>

I think it would be an excellent idea to include both.

I fear that tests on the appropriate NIST datasets would show the existing 'fast' algorithm to perform badly.

http://www.itl.nist.gov/div898/strd/

Whereas I would expect your Kahan algorithm version to do much better.

It would be in keeping with Boost tradition to be (or able to be) accurate.

Some people would also like to know the accurate_sum compute time cost over existing (fast) sum

(I presume there is some?).

It would need some documentation on when and why one might need to prefer the accurate version.

Paul

PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

---
Paul A. Bristow
Prizet Farmhouse
Kendal, UK LA8 8AB
+44 1539 561830, mobile +44 7714330204
pbri...@hetp.u-net.com

David Abrahams

unread,
Jul 26, 2010, 10:09:48 AM7/26/10
to bo...@lists.boost.org
At Mon, 26 Jul 2010 14:59:29 +0100,

Paul A. Bristow wrote:
>
>
>
> PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).

It's possible I'm completely off the mark here, but I think maybe we
want sum and quick_sum.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Paul A. Bristow

unread,
Jul 26, 2010, 11:20:16 AM7/26/10
to bo...@lists.boost.org
> -----Original Message-----
> From: boost-...@lists.boost.org [mailto:boost-...@lists.boost.org] On Behalf Of David Abrahams
> Sent: Monday, July 26, 2010 3:10 PM
> To: bo...@lists.boost.org
> Subject: Re: [boost] accurate sum accumulator (kahan)
>
> At Mon, 26 Jul 2010 14:59:29 +0100,
> Paul A. Bristow wrote:
> > PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).
>
> It's possible I'm completely off the mark here, but I think maybe we
> want sum and quick_sum.

This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Paul

David Abrahams

unread,
Jul 26, 2010, 11:51:28 AM7/26/10
to bo...@lists.boost.org
At Mon, 26 Jul 2010 16:20:16 +0100,

Paul A. Bristow wrote:
>
> > -----Original Message-----
> > From: boost-...@lists.boost.org [mailto:boost-...@lists.boost.org] On Behalf Of David Abrahams
> > Sent: Monday, July 26, 2010 3:10 PM
> > To: bo...@lists.boost.org
> > Subject: Re: [boost] accurate sum accumulator (kahan)
> >
> > At Mon, 26 Jul 2010 14:59:29 +0100,
> > Paul A. Bristow wrote:
> > > PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).
> >
> > It's possible I'm completely off the mark here, but I think maybe we
> > want sum and quick_sum.
>
> This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?

Technically, of course it would, but aren't they getting the wrong answer currently?

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

_______________________________________________

Matthias Troyer

unread,
Jul 26, 2010, 11:53:42 AM7/26/10
to bo...@lists.boost.org

On 26 Jul 2010, at 09:51, David Abrahams wrote:

> At Mon, 26 Jul 2010 16:20:16 +0100,
> Paul A. Bristow wrote:
>>
>>> -----Original Message-----
>>> From: boost-...@lists.boost.org [mailto:boost-...@lists.boost.org] On Behalf Of David Abrahams
>>> Sent: Monday, July 26, 2010 3:10 PM
>>> To: bo...@lists.boost.org
>>> Subject: Re: [boost] accurate sum accumulator (kahan)
>>>
>>> At Mon, 26 Jul 2010 14:59:29 +0100,
>>> Paul A. Bristow wrote:
>>>> PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).
>>>
>>> It's possible I'm completely off the mark here, but I think maybe we
>>> want sum and quick_sum.
>>
>> This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?
>
> Technically, of course it would, but aren't they getting the wrong answer currently?

I would still call it sum. We have accurate versus quick versions of variance and other accumulators that can be specified like "variance(accurate)" in the feature list (namespaces missing ... it has been a while since I used it last). I would just propose to add this sum as "sum(accurate)"

Matthias

Paul A. Bristow

unread,
Jul 26, 2010, 11:58:30 AM7/26/10
to bo...@lists.boost.org
> -----Original Message-----
> From: boost-...@lists.boost.org [mailto:boost-...@lists.boost.org] On Behalf Of David Abrahams
> Sent: Monday, July 26, 2010 4:51 PM
> To: bo...@lists.boost.org
> Subject: Re: [boost] accurate sum accumulator (kahan)
>
> At Mon, 26 Jul 2010 16:20:16 +0100,
> Paul A. Bristow wrote:
> > > At Mon, 26 Jul 2010 14:59:29 +0100,
> > > Paul A. Bristow wrote:
> > > > PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).
> > >
> > > It's possible I'm completely off the mark here, but I think maybe we
> > > want sum and quick_sum.
> >
> > This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?
>
> Technically, of course it would, but aren't they getting the wrong answer currently?

Yes (a bit, sometimes, probably) - but quicker ;-)

(Seriously - a change in accuracy might mean that tests just fail and this might cause more trouble than it was worth.
This being floating point, even the Kahan method doesn't guarantee a *exact* answer, just a more accurate one).

Paul

Eric Niebler

unread,
Jul 26, 2010, 12:23:44 PM7/26/10
to bo...@lists.boost.org
On 7/26/2010 11:51 AM, David Abrahams wrote:
> At Mon, 26 Jul 2010 16:20:16 +0100,
> Paul A. Bristow wrote:
>>
>>> -----Original Message-----
>>> From: boost-...@lists.boost.org [mailto:boost-...@lists.boost.org] On Behalf Of David Abrahams
>>> Sent: Monday, July 26, 2010 3:10 PM
>>> To: bo...@lists.boost.org
>>> Subject: Re: [boost] accurate sum accumulator (kahan)
>>>
>>> At Mon, 26 Jul 2010 14:59:29 +0100,
>>> Paul A. Bristow wrote:
>>>> PS As for naming, I'd vote for sum and accurate_sum (to keep backward compatibility).
>>>
>>> It's possible I'm completely off the mark here, but I think maybe we
>>> want sum and quick_sum.
>>
>> This would be ideal - but doesn't accumulator already use 'sum' so it would change behaviour of existing programs?
>
> Technically, of course it would, but aren't they getting the wrong answer currently?

I tend to agree with Dave, so long as the kahan sum accumulator is a
drop-in replacement for what we already have. Does it Just Work for
integral types, too?

--
Eric Niebler

Simonson, Lucanus J

unread,
Jul 26, 2010, 12:46:48 PM7/26/10
to bo...@lists.boost.org
David Abrahams wrote:
> At Mon, 26 Jul 2010 16:20:16 +0100,
> Paul A. Bristow wrote:
>>> At Mon, 26 Jul 2010 14:59:29 +0100,
>>> Paul A. Bristow wrote:
>>>> PS As for naming, I'd vote for sum and accurate_sum (to keep
>>>> backward compatibility).
>>>
>>> It's possible I'm completely off the mark here, but I think maybe we
>>> want sum and quick_sum.
>>
>> This would be ideal - but doesn't accumulator already use 'sum' so
>> it would change behaviour of existing programs?
>
> Technically, of course it would, but aren't they getting the wrong
> answer currently?

In the past I've had people tell me that they couldn't accept my code that gives the right answer to replace code that gives the wrong answer because it might cause their regression tests to fail. I don't think I need to tell you my opinion on that, but if you change the sum behavior you might get people who will complain. Anyone who writes a regression test that compares equality of a floating point summation result to a reference value is pretty much asking for what they would be getting. I say we do it. My position is that people use libraries expecting superior results to what they could achieve themselves with little effort (better algorithmic complexity/accuracy of outputs) and that such behavior should be the default.

Regards,
Luke

Jeffrey Lee Hellrung, Jr.

unread,
Jul 26, 2010, 12:55:54 PM7/26/10
to bo...@lists.boost.org

Without being too familiar with the accumulators framework, wouldn't
replacing sum with the kahan summation algorithm penalize
non-floating-point types? Or worse, it will cause compiler errors,
since kahan requires subtraction?

Or is the proposition to replace the (naive) summation algorithm with
kahan summation algorithm for only floating point types?

- Jeff

David Abrahams

unread,
Jul 26, 2010, 1:09:35 PM7/26/10
to bo...@lists.boost.org, bo...@lists.boost.org
The latter

[sent from tiny mobile device]

On Jul 26, 2010, at 12:55 PM, "Jeffrey Lee Hellrung, Jr." <jhel...@ucla.edu
> wrote:

> r is the proposition to replace the (naive) summation algorithm with
> kahan summation algorithm for only floating point types?

Gaetano Mendola

unread,
Jul 26, 2010, 1:59:36 PM7/26/10
to bo...@lists.boost.org
On 07/26/2010 06:55 PM, Jeffrey Lee Hellrung, Jr. wrote:
> Or is the proposition to replace the (naive) summation algorithm with
> kahan summation algorithm for only floating point types?

A possible scenario can be:

1) Leave the naive summation algorith as it is now
2) Create (with a typedef?) fast_sum that is the same of sum (naive)
2) Rename the sum_kahan as accurate_sum implemented as sum (naive)
3) Specialize the accurate_sum for floating points types implemented
as the real kahan's algorithm

however this can lead to have someone using accurate_sum for integral
type and find it out is not accurate at all !

IMHO "accurate" or "fast" are missleading. What if someone comes out
with an algorithm that gives better results and even faster than
actual implementation?

Think about how many sorting algorithms there are out there.

I think the best is to leave sum as it is now (no complains about
someone's regression tests failing), implement the sum_kahan only for
floating types and have it listed in documentation specifying that is
implemented only for floating types. The string Kahan in the name will
also give to users the hint on what it does and what the performance
loss is going to be.

BTW, what's the procedure to submit something officially to boost?

Regards
Gaetano Mendola

Giovanni Piero Deretta

unread,
Jul 26, 2010, 2:48:21 PM7/26/10
to bo...@lists.boost.org
On Mon, Jul 26, 2010 at 6:59 PM, Gaetano Mendola <men...@gmail.com> wrote:
> On 07/26/2010 06:55 PM, Jeffrey Lee Hellrung, Jr. wrote:
>>
>> Or is the proposition to replace the (naive) summation algorithm with
>> kahan summation algorithm for only floating point types?
>
> A possible scenario can be:
>
> 1) Leave the naive summation algorith as it is now
> 2) Create (with a typedef?) fast_sum that is the same of sum (naive)
> 2) Rename the sum_kahan as accurate_sum implemented as sum (naive)
> 3) Specialize the accurate_sum for floating points types implemented
>   as the real kahan's algorithm
>
> however this can lead to have someone using accurate_sum for integral
> type and find it out is not accurate at all !
>

wouldn't the native sum be accurate (actually exact) for integral
types as long as the sum doesn't overflow?

--
gpd

Eric Niebler

unread,
Jul 26, 2010, 2:52:38 PM7/26/10
to bo...@lists.boost.org
On 7/26/2010 1:59 PM, Gaetano Mendola wrote:
> On 07/26/2010 06:55 PM, Jeffrey Lee Hellrung, Jr. wrote:
>> Or is the proposition to replace the (naive) summation algorithm with
>> kahan summation algorithm for only floating point types?
>
> A possible scenario can be:
>
> 1) Leave the naive summation algorith as it is now
> 2) Create (with a typedef?) fast_sum that is the same of sum (naive)
> 2) Rename the sum_kahan as accurate_sum implemented as sum (naive)
> 3) Specialize the accurate_sum for floating points types implemented
> as the real kahan's algorithm
>
> however this can lead to have someone using accurate_sum for integral
> type and find it out is not accurate at all !
>
> IMHO "accurate" or "fast" are missleading. What if someone comes out
> with an algorithm that gives better results and even faster than
> actual implementation?
>
> Think about how many sorting algorithms there are out there.
>
> I think the best is to leave sum as it is now (no complains about
> someone's regression tests failing), implement the sum_kahan only for
> floating types and have it listed in documentation specifying that is
> implemented only for floating types. The string Kahan in the name will
> also give to users the hint on what it does and what the performance
> loss is going to be.

That's perfectly reasonable, but care must be taken that the sum_kahan
satisfies any dependency on the "sum" feature. That is, if someone asks
for "mean" (which depends on a "sum" feature) and "sum_kahan", they
don't get "mean", "sum", and "sum_kahan". The accumulators library makes
this pretty straightforward.

> BTW, what's the procedure to submit something officially to boost?

In this case, you would open a feature request on svn.boost.org ("New
Ticket"), assign to me, and attach your patch. If it looks good, I'll
merge it. Your patch should have docs and tests. Don't worry about
formatting the docs. I can do that. I wouldn't ask you to set up the doc
build chain; it's a headache.

Thanks!

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Tomas Puverle

unread,
Jul 26, 2010, 3:09:41 PM7/26/10
to bo...@lists.boost.org
> PS As for naming, I'd vote for sum and accurate_sum (to keep backward
compatibility).

Please don't call them "accurate_sum" and "quick_sum" (as proposed by someone
else).

There are several different types of "accurate sum" algorithms with different
runtime trade-offs.

'sum' in itself is as accurate as it gets if all the numbers are normalized
in some way. As this is frequently required for the overall computation, it
would be wasteful to compute the secondary accumulator. There are other
scenarios where a plain 'sum' is good enough.

Thanks,

Tom

Topher Cooper

unread,
Jul 26, 2010, 3:14:04 PM7/26/10
to bo...@lists.boost.org
First off, note that in the test case part of the inaccuracy is due to the fact that 1F-6 cannot be represented exactly in floating point. Obviously, however you do it, if you add 1000000 numbers that are not precisely equal to 1/1000000 together then the correct answer will not be 1.0.

As for naming -- I'm always uncomfortable using a keyword like "accurate" -- since there are many ways to improve accuracy, with Kahan's algorithm being only one.

For example: for some situations presorting the numbers in ascending order can improve accuracy.

A two pass algorithm, when possible, will probably give you the same result as Kahan, with, ignoring paging, similar costs. First sum the numbers. Divide the result by the number of values. Sum the differences between each value and that mean value. Subtract the result from the original sum. Kahan requires 4 add/sub ops per value, the two-pass 3.

Another technique: recursively sum the lower half and the upper half and add the results. Using an explicit stack and doing this iteratively would probably improve on Kahan in both accuracy and time. This should not be combined with an ascending sort (or if you do, use interleaved splits rather than lower/upper). Ignoring the bookkeeping, this requires a single addition per value.

I'm not suggesting that any of these be added, but if they were -- what would they be called?

Topher

Gaetano Mendola

unread,
Jul 26, 2010, 4:03:11 PM7/26/10
to bo...@lists.boost.org
On 07/26/2010 09:14 PM, Topher Cooper wrote:
> I'm not suggesting that any of these be added, but if they were -- what would they be called?

This is exactly my point.

Regards
Gaetano Mendola

Matthias Troyer

unread,
Jul 26, 2010, 4:07:18 PM7/26/10
to bo...@lists.boost.org

On 26 Jul 2010, at 13:14, Topher Cooper wrote:

> First off, note that in the test case part of the inaccuracy is due to the fact that 1F-6 cannot be represented exactly in floating point. Obviously, however you do it, if you add 1000000 numbers that are not precisely equal to 1/1000000 together then the correct answer will not be 1.0.
>
> As for naming -- I'm always uncomfortable using a keyword like "accurate" -- since there are many ways to improve accuracy, with Kahan's algorithm being only one.
>
> For example: for some situations presorting the numbers in ascending order can improve accuracy.
>
> A two pass algorithm, when possible, will probably give you the same result as Kahan, with, ignoring paging, similar costs. First sum the numbers. Divide the result by the number of values. Sum the differences between each value and that mean value. Subtract the result from the original sum. Kahan requires 4 add/sub ops per value, the two-pass 3.
>
> Another technique: recursively sum the lower half and the upper half and add the results. Using an explicit stack and doing this iteratively would probably improve on Kahan in both accuracy and time. This should not be combined with an ascending sort (or if you do, use interleaved splits rather than lower/upper). Ignoring the bookkeeping, this requires a single addition per value.
>
> I'm not suggesting that any of these be added, but if they were -- what would they be called?

Why don't you just call it sum(kahan)

Matthias

Adam Wulkiewicz

unread,
Jul 26, 2010, 4:35:48 PM7/26/10
to bo...@lists.boost.org
Hi,

It's my first message so I'd like to introduce myself. My name's Adam
Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.

I'd like to take part in evolution of Boost library. I've implemented
some space partitioning data structures and k-dimensional
generalizations of STL containers and thought that since these kind of
containers are useful to many people it would be nice to have them in Boost.

For people who don't know what are space partitioning data structures:
These data structures divides arbitrary k-dimensional space containing
some k-dimensional 'objects' to accelerate searching for an 'object' or
subset of 'objects' placed in a part of divided space. Just like fe. a
binary tree divides a set of numbers to accelerate searching.

These data structures are used fe. in:
- computer graphics to accelerate rendering (searching of 3D objects,
fe. triangles in raytracing or photons in photon mapping),
- games to accelerate testing for intersection of rays with a scene and
movable objects.

In these examples 'the space' is 'common' 3D euclidean space but you may
divide every space you like to accelerate searching: spherical space,
space containing features of an object used in image recognition process
and so on.

I've implemented:
- k-dimensional generalizations of STL containers (used by some space
partitioning data structures)
- k-dimensional space partitioning data structures
- regular grid
- kd-tree
- something I've called regular tree,
it's k-dimensional generalization of quadtree/octree,
so for k=2 it's quadtree, for k=3 it's octree,
for k=4 it's 16tree and so on.
- k-dimensional euclidean algebra/geometry data structures
- points, vectors etc.,
- unit tests.

There are boost-like names used in code allready. If you'd like to know
more just ask.

My question is, what do you think about it?
And, If you like my idea, how to start?
I'd like to show my code to you in order to receive your opinions and
suggestions. May I use boost repository or must I place my code elsewere?

Regards,
Adam Wulkiewicz

OvermindDL1

unread,
Jul 26, 2010, 5:00:47 PM7/26/10
to bo...@lists.boost.org

I am quite interested, I would have use for such a library.

Jeffrey Lee Hellrung, Jr.

unread,
Jul 26, 2010, 5:01:22 PM7/26/10
to bo...@lists.boost.org
On 7/26/2010 1:35 PM, Adam Wulkiewicz wrote:
> Hi,
>
> It's my first message so I'd like to introduce myself. My name's Adam
> Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.
>
> I'd like to take part in evolution of Boost library. I've implemented
> some space partitioning data structures and k-dimensional
> generalizations of STL containers and thought that since these kind of
> containers are useful to many people it would be nice to have them in
> Boost.
[...]

I'm interested.

How does this compare to the facilities that CGAL offers? (I have taken
only a cursory glance at CGAL's space partitioning data structures...)

I believe you can upload to the Boost Vault
(http://www.boostpro.com/vault/), or just upload to your public_html
space that your university provides (if it does).

- Jeff

Adam Wulkiewicz

unread,
Jul 26, 2010, 6:59:33 PM7/26/10
to bo...@lists.boost.org
Jeffrey Lee Hellrung, Jr. wrote:
> On 7/26/2010 1:35 PM, Adam Wulkiewicz wrote:
>> Hi,
>>
>> It's my first message so I'd like to introduce myself. My name's Adam
>> Wulkiewicz, I'm a doctoral student of Technical University of Lodz,
>> Poland.
>>
>> I'd like to take part in evolution of Boost library. I've implemented
>> some space partitioning data structures and k-dimensional
>> generalizations of STL containers and thought that since these kind of
>> containers are useful to many people it would be nice to have them in
>> Boost.
> [...]
>
> I'm interested.
>
> How does this compare to the facilities that CGAL offers? (I have taken
> only a cursory glance at CGAL's space partitioning data structures...)
>
> I believe you can upload to the Boost Vault
> (http://www.boostpro.com/vault/), or just upload to your public_html
> space that your university provides (if it does).

I don't know about all features of CGAL either. I see that there is a
kd-tree but I think it's a point version. The rest of these algorithms/
data structures are desined just for 2D and 3D data (Which is wierd
because their AABBTree should be implemented as n-dimensional data
structure). This kind of library should be fully n-d, with
interchangable containers. What is more, there is no regular grid which
is good for dynamically changing data.

By the way, how the process of incorporation of new library looks like?
When the library is good for become a candidate? Where it should be
placed etc.?

Gaetano Mendola

unread,
Jul 26, 2010, 7:08:15 PM7/26/10
to bo...@lists.boost.org, Eric Niebler
On 07/26/2010 08:52 PM, Eric Niebler wrote:
> In this case, you would open a feature request on svn.boost.org ("New
> Ticket"), assign to me, and attach your patch. If it looks good, I'll
> merge it. Your patch should have docs and tests. Don't worry about
> formatting the docs. I can do that. I wouldn't ask you to set up the doc
> build chain; it's a headache.

Done.

Regards
Gaetano Mendola

Christian Holmquist

unread,
Jul 26, 2010, 10:20:32 PM7/26/10
to bo...@lists.boost.org
>
>
>
> I've implemented:
> - k-dimensional generalizations of STL containers (used by some space
> partitioning data structures)
> - k-dimensional space partitioning data structures
> - regular grid
> - kd-tree
> - something I've called regular tree,
> it's k-dimensional generalization of quadtree/octree,
> so for k=2 it's quadtree, for k=3 it's octree,
> for k=4 it's 16tree and so on.
> - k-dimensional euclidean algebra/geometry data structures
> - points, vectors etc.,
> - unit tests.
>
>
This would be immensely useful. Is your library made available online
somewhere already?
Cheers,
Christian

Eric Niebler

unread,
Jul 27, 2010, 9:04:54 AM7/27/10
to bo...@lists.boost.org
On 7/26/2010 7:08 PM, Gaetano Mendola wrote:
> On 07/26/2010 08:52 PM, Eric Niebler wrote:
>> In this case, you would open a feature request on svn.boost.org ("New
>> Ticket"), assign to me, and attach your patch. If it looks good, I'll
>> merge it. Your patch should have docs and tests. Don't worry about
>> formatting the docs. I can do that. I wouldn't ask you to set up the doc
>> build chain; it's a headache.
>
> Done.

(For those playing along at home, the ticket is here:
https://svn.boost.org/trac/boost/ticket/4471)

Thanks. This needs documentation, though. Like I said, I don't care
about formatting, just tell what what text you'd like to see inserted
where in the docs and I'll do it.

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

John Maddock

unread,
Jul 27, 2010, 12:17:18 PM7/27/10
to bo...@lists.boost.org
>>> This would be ideal - but doesn't accumulator already use 'sum' so it
>>> would change behaviour of existing programs?
>>
>> Technically, of course it would, but aren't they getting the wrong answer
>> currently?
>
> I tend to agree with Dave, so long as the kahan sum accumulator is a
> drop-in replacement for what we already have. Does it Just Work for
> integral types, too?

It's not required for integral types as these don't suffer from cancelation
errors.

John.

Gaetano Mendola

unread,
Jul 27, 2010, 1:14:47 PM7/27/10
to bo...@lists.boost.org, Eric Niebler
On 07/27/2010 03:04 PM, Eric Niebler wrote:
> just tell what what text you'd like to see inserted
> where in the docs and I'll do it.

At this page:

http://www.boost.org/doc/libs/1_43_0/doc/html/accumulators/user_s_guide.html#accumulators.user_s_guide.the_statistical_accumulators_library

I would add a link sum_kahan pointing to this:

===================================================
sum_kahan

Implements the summation kahan's algorithm, reducing the accumulation
error in case of floating point types.

Result Type

sample-type

Depends On

none
Variants

none
Initialization Parameters

none
Accumulator Parameters

none
Extractor Parameters

none
Accumulator Complexity

O(1), however compared to naive sum this accumulator
performs 4 sum operations for each element accumulated

Extractor Complexity

O(1)

Header

#include <boost/accumulators/statistics/sum_kahan.hpp>

Example

accumulator_set<float, stats<tag::sum_kahan> > acc;

BOOST_CHECK_EQUAL(0.0f, sum_kahan(acc));

for (size_t i = 0; i < 1e6; ++i) {
acc(1e-6f);
}

BOOST_CHECK_EQUAL(1.0f, sum_kahan(acc));
=========================================================


I'm not sure if this was what you meant by documentation.


Regards
Gaetano Mendola

David Abrahams

unread,
Jul 27, 2010, 2:29:19 PM7/27/10
to bo...@lists.boost.org
At Tue, 27 Jul 2010 17:17:18 +0100,

John Maddock wrote:
>
> >>> This would be ideal - but doesn't accumulator already use 'sum' so it
> >>> would change behaviour of existing programs?
> >>
> >> Technically, of course it would, but aren't they getting the wrong answer
> >> currently?
> >
> > I tend to agree with Dave, so long as the kahan sum accumulator is a
> > drop-in replacement for what we already have. Does it Just Work for
> > integral types, too?
>
> It's not required for integral types as these don't suffer from cancelation
> errors.

And I'm loathe to disagree with John and Matthias on matters of
numerics.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

_______________________________________________

Simonson, Lucanus J

unread,
Jul 27, 2010, 2:48:08 PM7/27/10
to bo...@lists.boost.org
Adam Wulkiewicz wrote:
> By the way, how the process of incorporation of new library looks
> like?
> When the library is good for become a candidate? Where it should be
> placed etc.?

You have two options. First, you can go through the entire review process described here: http://www.boost.org/development/submissions.html
The process is long, expect it to take at least one year. I am keenly interested in seeing quality libraries like this developed under free license. My organization would use them and I get asked twice a year on average if my library has it and if not if it exists under free license elsewhere. I volunteer to be your review manager if the quality of the library meets my expectations so that is one stumbling block that won't trip you up. (Some worthy libraries have difficulty finding a review manager.) However, such a review would be quite controversial because some people will have a strong opinion that you should choose the other option.

The other option is to engage with the newly accepted Boost.Geometry library. They have already some plans to provide a simple 2D space partitioning tree that exists already in non-boost ready form. I don't know the status of their development on it because I don't monitor their list and they don't discuss it on the boost dev list. They are quite busy finishing implmenting their library and getting it release ready in terms of documentation and meeting the post conditions of their review. My understanding is that there is at least four or five developers active in the project, though most of the work is done by the primary developer. Engaging with them wouldn't necessarily shorten the time it takes to get your code released as part of boost, but it would allow you to avoid a formal review and instead just have to satisfy them that your code is ready to become a part of their library.

While I also have a geometric boost library about to be released as part of 1.44:
https://svn.boost.org/svn/boost/sandbox/gtl/doc/index.htm
I don't think you can add your containers as new features to my library because:
1. the community pretty much decided that Boost.Geometry is where such new development should go
2. it doesn't fit very well in my library, which is named Boost.Polygon. I provide a coordinate, point and rectangle concepts that could serve your needs, but you need spherical coordinate space concept as well, which Boost.Geometry already has.

That said I'm not opposed to people extending my library in general, quite the opposite. I can report the the Google Summer of Code project I'm currently mentoring has succeeded in producing a robust and optimal voronoi diagram of points sweepline implementation as is working toward voronoi of line segments and, by extension, polygons.

My advice to you is to talk to the Boost.Geometry people and find out what their schedule is for release of their library. If you are satisfied with adding a schedule dependency on their work then you write your interfaces to depend on their corrdinate, point, rectangle, sphere and coordinate system concepts and go through whatever process they have for accepting extensions. I think your code should go in the core and not an extension because it is generally useful for all geometry and even things that aren't commonly though of as geometry, like database indexing, but that will be for you to work out with them. They have put some thought into making their concepts extensible to n-dimensions, so hopefully you can benefit from that design work. In a way it would be a shame that they went to the effort and then not have that work used for your containers.

Regards,
Luke

Adam Wulkiewicz

unread,
Jul 27, 2010, 4:59:42 PM7/27/10
to bo...@lists.boost.org
Christian Holmquist wrote:
> This would be immensely useful. Is your library made available online
> somewhere already?

Not yet, but I think, I'll put it in the Vault for a while.

Regards,
Adam

Adam Wulkiewicz

unread,
Jul 27, 2010, 5:22:36 PM7/27/10
to bo...@lists.boost.org

Thank you for detailed answer and a proposition to be my review manager.

I think that the library I'd like to develop don't fit in
Boost.Geometry. The purposes of both are different. Space partitioning
(in mathematical meaning), is a separate problem. Functionality like
this may be used in library like Boost.Geometry but might be used with a
large number of different libraries as well. As I see it, it should be a
set of general, n-dimensional solutions. Therefore I think, that it
should be a separate library.

I'd rather choose option number one (thank you again) but I'll wait a
while for the feedback from the comunity.

Regards,
Adam

Demian Nave

unread,
Jul 27, 2010, 11:08:18 PM7/27/10
to bo...@lists.boost.org
On 7/26/2010 4:35 PM, Adam Wulkiewicz wrote:
> Hi,
>
> It's my first message so I'd like to introduce myself. My name's Adam
> Wulkiewicz, I'm a doctoral student of Technical University of Lodz, Poland.
>
> I'd like to take part in evolution of Boost library. I've implemented
> some space partitioning data structures and k-dimensional
> generalizations of STL containers and thought that since these kind of
> containers are useful to many people it would be nice to have them in
> Boost.
[...]

Adam,

I'd also be interested in this sort of library, especially if the data
structures can be used with arbitrary geometry types given the appropriate
operators. I'm particularly interested in generic data structures that can
be extended to implement space partitioning of NURBS curves and surfaces.

Cheers,
Demian

Thorsten Ottosen

unread,
Jul 28, 2010, 4:27:09 AM7/28/10
to bo...@lists.boost.org
Adam Wulkiewicz skrev:
> Hi,

> I've implemented:
> - k-dimensional generalizations of STL containers (used by some space
> partitioning data structures)
> - k-dimensional space partitioning data structures
> - regular grid
> - kd-tree
> - something I've called regular tree,
> it's k-dimensional generalization of quadtree/octree,
> so for k=2 it's quadtree, for k=3 it's octree,
> for k=4 it's 16tree and so on.
> - k-dimensional euclidean algebra/geometry data structures
> - points, vectors etc.,
> - unit tests.

I'm very interested as such data-structures also play a role in AI and
clustering/data mining.

You may also consider BSP trees now that you are at it.

-Thorsten

Barend Gehrels

unread,
Jul 28, 2010, 5:14:42 AM7/28/10
to bo...@lists.boost.org
Hi Adam,

First I'm very interested in your library.

Is it already uploaded into Vault? Could you publish the URL? If you
didn't upload it already, you might consider to use the Boost SVN
sandbox instead.

>
> I think that the library I'd like to develop don't fit in
> Boost.Geometry. The purposes of both are different. Space partitioning
> (in mathematical meaning), is a separate problem. Functionality like
> this may be used in library like Boost.Geometry but might be used with
> a large number of different libraries as well. As I see it, it should
> be a set of general, n-dimensional solutions. Therefore I think, that
> it should be a separate library.

I probably agree with this. However, I would like to investigate if a
coupling of these two libraries is possible, to provide the users of
Boost.Geometry with a good spatial index which works for them by default
and which can thus also be used for other purposes.
So Boost.Geometry might follow your library, or you might use some
concepts of Boost.Geometry, but I would like to see the sources to say
more about this.


Regards, Barend

Christoph Heindl

unread,
Jul 28, 2010, 5:20:21 AM7/28/10
to bo...@lists.boost.org
Adam,

> I've implemented:
> - k-dimensional generalizations of STL containers (used by some space
> partitioning data structures)
> - k-dimensional space partitioning data structures
>        - regular grid
>        - kd-tree
>        - something I've called regular tree,
>        it's k-dimensional generalization of quadtree/octree,
>        so for k=2 it's quadtree, for k=3 it's octree,
>        for k=4 it's 16tree and so on.
> - k-dimensional euclidean algebra/geometry data structures
>        - points, vectors etc.,
> - unit tests.
>
> There are boost-like names used in code allready. If you'd like to know more
> just ask.


I'm interested too.

I might be able to contribute as I'm using space partitioning
structures on an every day basis.

Here are a couple of questions
- what kind of split policies are implemented for the k-d tree?
- how do you cope with unbalanced trees caused by dynamic
insertion/deletion of points?
- what queries are supported (i.e nearest neighbors, arbitrary
volumes, parametric volumes) ?
- do you offer incremental queries that can be stopped before
evaluating the entire result set ?
- did you consider to implement parallel queries ?
- where's the code?

Best regards,
Christoph

Simonson, Lucanus J

unread,
Jul 28, 2010, 12:03:15 PM7/28/10
to bo...@lists.boost.org
Adam Wulkiewicz wrote:
> I'd rather choose option number one (thank you again) but I'll wait a
> while for the feedback from the comunity.

You should search the archives of the dev list because we have discussed spatial indexing several times in the context of a failed GSOC project from 2006 and the design of my library as well as Boost.Geometry. Both libraries were criticized during review for lacking spatial query data structures. Extention to n-dimensions was discussed at length several times.

There is no doubt that there is interest in a library such as you propose. At this stage we would probably like to see your code and also have you describe the interfaces you have in mind along with example code for how they would be used. The majority of interest will be in the interfaces and how the data structures are used and not the internal details of how they are implemented.

Regards,
Luke

Adam Wulkiewicz

unread,
Jul 28, 2010, 9:11:55 PM7/28/10
to bo...@lists.boost.org
Barend Gehrels wrote:
> Hi Adam,
>
> First I'm very interested in your library.
>
> Is it already uploaded into Vault? Could you publish the URL? If you
> didn't upload it already, you might consider to use the Boost SVN
> sandbox instead.

I've just uploaded preliminary version into the Vault. There is a
README.txt file with description and main.cpp running tests.

Regards,
Adam

Adam Wulkiewicz

unread,
Jul 28, 2010, 9:14:20 PM7/28/10
to bo...@lists.boost.org
Sorry, I didn't write what is it called. It's space-0.1.zip.

Simonson, Lucanus J

unread,
Jul 29, 2010, 3:07:13 PM7/29/10
to bo...@lists.boost.org
Adam Wulkiewicz wrote:
> Sorry, I didn't write what is it called. It's space-0.1.zip.

I wish you had just posted the readme at the outset:

--------------------README.txt from space-0.1.zip--------------------------
It's a preliminary version of library I've called Space. As you can see it's _not_ fully operational.
The main reason is that I wanted to work out some basic concepts and interfaces before I'll go further.

Contents
1. Parts of the library
2. The interfaces and uses
3. Additional notes

1. Parts of the library:

space::ag - namespace containing compile-time linear algebra, it's not required by the other parts of the library but it's good supplement.
- angle, vector, point, matrix, square_matrix - nD
- orientation, rotation_matrix - 2/3D
- rotation_quaternion - 3D

space::kd - nD containers, types and containers may be used in space::spar containers(in regular grid and regular tree)
types.hpp contains nD versions of values
deque, vector which are recursive_container(s) - nD containers which are internally fe. std::vector of std::vectors of std::vectors...
dynamic_array - nD container which is internally a single std::vector
* the name should be changed
static_array - recursive const size container
* it might be a derivation of recursive_container with boost::array as a basic type
* it has additional operators to make possible dereferencing

All containers has nD resize, insert, erase etc.
The iterator is replaced by enumerator and it iterates in nD area of elements.
There is a line_enumerator also which iterates by the elements on a nD line between two elements.

space::spar - nD space partitioning containers
concepts
point_type concept
defined type value_type
static member dimension
operator[]() returning nth coordinate
point_object concept
defined point_type
position() method returning object of point_type
volumetric_object concept
defined point_type
aabb() method returning object of space::spar::aabb<point_type>
If there will be structures using on bounding spheres or other bounding objects there should be more volumetric objects concepts.
aabb - nD AABB
it may be tested for intersection with a ray (for now ray is defined by origin point and direction vector, both are an implementation of point_type concept (it should be fe. coordinates concept in this case))
point_node_kd_tree - kd-tree for point objects
internally has heap-like structure(all nodes in std::vector)
points are the nodes, but only pointers are stored
points passed by iterators in the constructor or build method
splitting plane - position of the middle point - it is left balanced
no addition of new points/rebalancing
queries:
closest point
precise number of closest points
all points from a region (nD hypersphere)
point_kd_tree - kd-tree for point objects
internally has structure of a linked nodes binary tree
points are stored in containers in nodes
containers are in leaf nodes and in internal nodes (which is bad)
points passed by iterators in the constructor or build method
splitting plane - half of the longest edge of current node aabb - not balanced
this is just one of the algorithms because I want to implement different ones
fe. an algorithm of building a tree with clipping of empty space (some articles mentions that this gives good results)
no addition of new points/rebalancing
queries:
closest point
precise number of closest points
all points from a region (nD hypersphere)
kd_tree - kd-tree for volumetric objects
internally has structure of a linked nodes binary tree
objects are stored in containers in nodes
containers are in leaf nodes and in internal nodes (the same nodes are used in kd_tree and point_kd_tree)
splitting plane - half of the longest edge of current node aabb - not balanced
this is just one of the algorithms because I want to implement different ones
fe. V. Havran proposed an algorithm of kd-tree building with cutting of empty space and automatical splitting stop detection which has O(NlogN) complexity,
no addition of new points/rebalancing
no queries
regular_tree - nD generalization of 2Dquadtree/3Doctree
internally has structure of a linked nodes, each node has 2^n children
objects are stored in containers in nodes
containers are in leaf nodes and in internal nodes (again)
splitting plane - halves of the edges of current node aabb
this is just one of the algorithms
no rebuilding/rebalancing
no queries
regular_grid - nD not fully functional regular grid which should be called regular_grid_base
it is a container which has a dimension and i placed in space (has aabb) and contains not infinitesimal cells,
it is able to do some operations on subspaces(ranges) of this grid

2. The interfaces and uses

It's an example where point_node_kd_tree is used but you may change it to space::spar::point_kd_tree as well.

#include <vector>
#include <boost/foreach.hpp>
#include <space/spar.hpp>

struct point
{
typedef float value_type;
static const size_t dimension = 3;

point() {}
point(float a, float b, float c) { arr[0] = a; arr[1] = b; arr[2] = c; }
const float & operator[](size_t i) const { return arr[i]; }
float & operator[](size_t i) { return arr[i]; }

float arr[3];
};

struct point_object
{
typedef point point_type;

const point_type & position() const { return p; }

point_type p;
};

int main()
{
std::vector<point_object> points(100);
BOOST_FOREACH(point_object &p, points)
{
p.p = point(rand() % 100, rand() % 100, rand() % 100);
}

space::spar::point_node_kd_tree<point_object> kdt(points.begin(), points.end());

point_object *p = kdt.find(point(50.0f, 50.0f, 50.0f), 100.0f);

found_points<point_object> fp(3);
kdt.find(fp, point(50.0f, 50.0f, 50.0f), 400.0f);

std::vector<point_object*> fpts;
fpts.reserve(fp.size());
fp.get(std::back_inserter(fpts));

return 0;
}

3. Additional notes

There is a lot of additional things to do in this matter:
- different trees building algorithms,
- different trees structures,
- parallel building,
- parallel queries,
- incremental queries,
- ray queries,
- volume queries,
- use of pool allocators,
- etc.

But first, I need to know, what do you think about this?
Is it a good starting point?
If not, what are your suggestions?
--------------------------------------END of README.txt-------------------------------------


I'm a little dissapointed that there is no insertion or rebalancing on any of the trees. This is more of an intention to write a library than a library. Since Boost.Geometry already provides an nD point type and point concept and planned to eventually provide nD kd-trees you might want to reconsider working with them and adopting their interface design, which was arrived at after literally years of (sometimes heated) debate on this list about what the best design for the interfaces should be. The argument that what you are doing is math and not geometry doesn't really hold water in my opinion. By the way, the iterators over nD data structures problem is an age old debate in C++ and there are people on this list who can provide you with valuable insight on what to do and what not to do based on fifteen years of experience with trying and sometimes failing to define a good iterator based interface for nD containers.

Your interface seems a little off to me, not what I was expecting. Why does find return a pointer to point_object? If you don't find a point within 100.0f does it return a null pointer? Why not use a boost::optional or pass point_object by reference as first argument and return bool? Either is better by a long shot. Are you dynamically allocating the point_object you return? Why does point_object exist, why not just use point_type directly? Why does found_points exist, why not pass by reference a back inserter that accepts point_type as the first argument of find and return the number of points found? There is some complexity to your interface design that needs to be justified with your rationale for making these design decisions.

Don't take this feedback as criticism, I'm still excited that you plan to contribute this library and supportive of your efforts.

Regards,
Luke

Adam Wulkiewicz

unread,
Jul 29, 2010, 9:15:47 PM7/29/10
to bo...@lists.boost.org
Simonson, Lucanus J wrote:
> This is more of an intention to write a library than a library.

Yes, you're right. Maby I was too hasty.

> Since Boost.Geometry already provides an nD point type and point concept and planned to eventually provide nD kd-trees you might want to reconsider working with them and adopting their interface design

I wanted to show you what I've got at this stage.

You're right I should adopt their nD point interface and box probably.

But, if boost::geometry::point you have in mind, coordinates are chosen
in compile-time only(or maby I'm wrong?) and this can't be used by space
partitioning data structures.

> By the way, the iterators over nD data structures problem is an age old debate in C++ and there are people on this list who can provide you with valuable insight on what to do and what not to do based on fifteen years of experience with trying and sometimes failing to define a good iterator based interface for nD containers.

[...]

I will appreciate any help.

> Your interface seems a little off to me, not what I was expecting.

Yes, it's not fully designed and rather low-level. It is the implmented
interface, not the target.

> If you don't find a point within 100.0f does it return a null pointer?

Yes. Btw, 100.0f is a square distance (it should be a distance).

> Why not use a boost::optional or pass point_object by reference as first argument and return bool? Either is better by a long shot.

The thing is, there aren't copies of point_objects in kd-tree. There are
pointers. If there were copies, the 2nd option will be the best.

Btw, it's another issue I'm not sure about. Should copies or
references/pointers be kept in kd-tree?

> Are you dynamically allocating the point_object you return?

No, this is a pointer to one of the point_object passed in the constructor.

> Why does point_object exist, why not just use point_type directly?

Because there may be some complex point object which shouldn't been
point's interface implementation because it isn't a point. Fe. in Photon
Mapping algorithm photons are used. They are point objects but aren't
points, they have some position but may have some other nD attributes
direction, color, etc.

But, I'm open to suggestion.

> Why does found_points exist, why not pass by reference a back inserter that accepts point_type as the first argument of find and return the number of points found?

Yes, this should be corrected. found_points should be encapsulated in
the kd-tree. It is a container designed to take part in search of N
point_objects in some max distance. If more than N objects are found the
distance may be closer than max distance. This class contains
information about the search. Without it in the interface, the number of
points found and the max distance should be returned with the point objects.

> Don't take this feedback as criticism, I'm still excited that you plan to contribute this library and supportive of your efforts.

I expect and appreciate criticism. I'm still learning.

So, I think the searching interface might look more or less like this:
(I assume that copies are used)

[code]

// find one closest point, max distance = 10
point_object p;
bool found = kdt.find(p, point_type(...), 10);

// find max 5 closest points, initial max distance = 10
// output: number of pointsfound and distance to the furthest point
std::vector<point_object> fv;
fv.reserve(5);
std::pair<size_t, value_type> found_data =
kdt.find(std::back_inserter(fv), point_type(...), 10, 5);
// or
value_type max_distance;
size_t found_points = kdt.find(std::back_inserter(fv), max_distance,
point_type(...), 10, 5);

// find all points in area, initial max distance = 10
std::vector<point_object> fv2;
fv2.reserve(10);
size_t found_num = kdt.find(std::back_inserter(fv2), point_type(...), 10);

[code]

And building interface like this:

[code]

std::vector<point_object> v(100);
std::vector<point_object> v2(10);

// build kd-tree here
space::spar::kd_tree<point_object> kdt(v.begin(), v.end());

kdt.add(v2.begin(), v2.end());
kdt.add(point_object(...));
kdt.rebuild();

kdt.clear();
kdt.build(v.begin(), v.end());

[code]

Eventually:

[code]

std::vector<point_object> v(100);
std::vector<point_object> v2(10);
std::vector<point_object> v3(20);

// just add points, tree not built
space::spar::kd_tree<point_object> kdt(v.begin(), v.end());
kdt.add(v2.begin(), v2.end());
kdt.add(point_object(...));

// build kd-tree here
kdt.rebuild();

kdt.add(v3.begin(), v3.end());
kdt.add(point_object(...));
kdt.rebuild();

[code]

Eventually, rebuilding might be 'inside' add method.
Rebuild might be implemented in build.
There might be methods for removing points from given area of kd-tree.

I used rebuild(), not (rebalance()) because there are some splitting
rules producing nice, not balanced kd-trees (see:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380).

Regards,
Adam

Simonson, Lucanus J

unread,
Jul 29, 2010, 10:31:04 PM7/29/10
to bo...@lists.boost.org
Adam Wulkiewicz wrote:
> You're right I should adopt their nD point interface and box probably.
>
> But, if boost::geometry::point you have in mind, coordinates are
> chosen
> in compile-time only(or maby I'm wrong?) and this can't be used by
> space partitioning data structures.

If they don't provide an operator[] or get(axis) interface and only get<>() then that is a problem and you probably should influence them to fix that so that before they release their library. It isn't too late.

>> By the way, the iterators over nD data structures problem is an age
>> old debate in C++ and there are people on this list who can provide
>> you with valuable insight on what to do and what not to do based on
>> fifteen years of experience with trying and sometimes failing to
>> define a good iterator based interface for nD containers. [...]
>
> I will appreciate any help.

You might want to see if you can dovetail into ongoing work on linear algebra and matrix library development in boost. We have ublas, multiarray and several libraries under construction in this area.

>> Your interface seems a little off to me, not what I was expecting.
>
> Yes, it's not fully designed and rather low-level. It is the
> implmented interface, not the target.

Ok, that makes more sense.

>> If you don't find a point within 100.0f does it return a null
>> pointer?
>
> Yes. Btw, 100.0f is a square distance (it should be a distance).

That actually makes sense for an implementation interface since you want to avoid the sqrt.

>> Why not use a boost::optional or pass point_object by reference as
>> first argument and return bool? Either is better by a long shot.
>
> The thing is, there aren't copies of point_objects in kd-tree. There
> are pointers. If there were copies, the 2nd option will be the best.
>
> Btw, it's another issue I'm not sure about. Should copies or
> references/pointers be kept in kd-tree?

It depends to some degree on how many dimensions are stored in a point. If the point is not absurdly large then copies should always be better because you avoid pointer chasing and get better cache behavior. Just try sorting a vector of points and compare to sorting a vector of pointers to points and you'll quickly see that local copies wins.

>> Are you dynamically allocating the point_object you return?
>
> No, this is a pointer to one of the point_object passed in the
> constructor.

One reason to do this sort of thing is if the point type disallowed copy construction and assignment, but I doubt you had that in mind. We lean pretty heavily on NRVO and inlining to make return by value fast.

>> Why does point_object exist, why not just use point_type directly?
>
> Because there may be some complex point object which shouldn't been
> point's interface implementation because it isn't a point. Fe. in
> Photon Mapping algorithm photons are used. They are point objects but
> aren't
> points, they have some position but may have some other nD attributes
> direction, color, etc.
>
> But, I'm open to suggestion.

You can use adaptor traits similar to what boost geometry does to avoid the wrapper class.

> So, I think the searching interface might look more or less like this:
> (I assume that copies are used)
>
> [code]
>
> // find one closest point, max distance = 10
> point_object p;
> bool found = kdt.find(p, point_type(...), 10);
>
> // find max 5 closest points, initial max distance = 10
> // output: number of pointsfound and distance to the furthest point
> std::vector<point_object> fv;
> fv.reserve(5);
> std::pair<size_t, value_type> found_data =
> kdt.find(std::back_inserter(fv), point_type(...), 10, 5);
> // or
> value_type max_distance;
> size_t found_points = kdt.find(std::back_inserter(fv), max_distance,
> point_type(...), 10, 5);

I wouldn't bother returning the max distance found. Let the user derive it from the points returned if they want it. It complicates the interface and I'm not sure how often it is useful.

> // find all points in area, initial max distance = 10
> std::vector<point_object> fv2;
> fv2.reserve(10);
> size_t found_num = kdt.find(std::back_inserter(fv2), point_type(...),
> 10);

This one looks nice and clean.

> Eventually, rebuilding might be 'inside' add method.
> Rebuild might be implemented in build.
> There might be methods for removing points from given area of kd-tree.
>
> I used rebuild(), not (rebalance()) because there are some splitting
> rules producing nice, not balanced kd-trees (see:
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380).

What happens if the user queries before after they have added but before they have built? Throw an exception? Return nothing? Automatically rebuild? I wouldn't provide build, rebuild or rebalance interfaces. It should be like the red-black tree, we never tell a std::set to rebalance itself. If the user wants to rebuild the tree they can construct a new one with all the points.

What about if you provided lower_bound and upper_bound:

kdtree::iterator itr = kdt.lower_bound(point, 0.0f);
kdtree::iterator itr2 = kdt.upper_bound(point, 0.0f);
std::vector<point> pts(itr, itr2);
kdt.erase(itr, itr2);

You may also sometimes want all the points in the tree and would just use begin() and end(). This style of interface has the advantage that it is maximally intuitive to people who are used to the stl. It may have the drawback of being challenging to implement and perhaps execute slower than back inserter style. There are iterator tricks used and facilitated by boost to make it easier and more efficient to provide such an interface as a wrapper around the back inserter interface, for example.

Now, let's imagine that instead of euclidean the user wants L distance metric. You can provide a version of lower_bound and upper_bound that take a distance metric functor as a third argument.

Lots to think about.

Regards,
Luke

Adam Wulkiewicz

unread,
Jul 30, 2010, 8:02:33 AM7/30/10
to bo...@lists.boost.org
Simonson, Lucanus J wrote:
> Adam Wulkiewicz wrote:
>> You're right I should adopt their nD point interface and box probably.
>>
>> But, if boost::geometry::point you have in mind, coordinates are
>> chosen
>> in compile-time only(or maby I'm wrong?) and this can't be used by
>> space partitioning data structures.
>
> If they don't provide an operator[] or get(axis) interface and only get<>() then that is a problem and you probably should influence them to fix that so that before they release their library. It isn't too late.
>

To clarify this. It is possible to create some data structures (kd-tree
with sequential choosing of splitting plane, regular tree, etc.) but in
more complex kd-tree splitting algorithm there must be this kind of
interface used. I'd like to develop data structures with 'switchable'
internal structure/building algorithm. If the tree structure is better,
the searching is faster.

>>> Why not use a boost::optional or pass point_object by reference as
>>> first argument and return bool? Either is better by a long shot.
>>
>> The thing is, there aren't copies of point_objects in kd-tree. There
>> are pointers. If there were copies, the 2nd option will be the best.
>>
>> Btw, it's another issue I'm not sure about. Should copies or
>> references/pointers be kept in kd-tree?
>
> It depends to some degree on how many dimensions are stored in a point. If the point is not absurdly large then copies should always be better because you avoid pointer chasing and get better cache behavior. Just try sorting a vector of points and compare to sorting a vector of pointers to points and you'll quickly see that local copies wins.
>
>>> Are you dynamically allocating the point_object you return?
>>
>> No, this is a pointer to one of the point_object passed in the
>> constructor.
>
> One reason to do this sort of thing is if the point type disallowed copy construction and assignment, but I doubt you had that in mind. We lean pretty heavily on NRVO and inlining to make return by value fast.
>
>>> Why does point_object exist, why not just use point_type directly?
>>
>> Because there may be some complex point object which shouldn't been
>> point's interface implementation because it isn't a point. Fe. in
>> Photon Mapping algorithm photons are used. They are point objects but
>> aren't
>> points, they have some position but may have some other nD attributes
>> direction, color, etc.
>>
>> But, I'm open to suggestion.
>
> You can use adaptor traits similar to what boost geometry does to avoid the wrapper class.
>

With points it's ok, since there will be only one instance of a point in
the tree. User may still want to store them in different container and
use kdt just for searching purposes but he may just use some wrapper.
With big points (small photons will have around 28B) he may use a
wrapper too.

The problem is when you take volume objects into consideration. Volume
object may be 'present' in some number of kd-tree nodes. So you must
have pointers/references in there.
And the 'global' question is, what to treat kd-tree like? As a container
which purpose is to store points/volumes and provide fast searching or
maby as an additional searching accelerator.
If it should be the the 'main' container to store objects, there should
be some container internally in the volume kd-tree and the tree
structure should contain pointers/references/indexes. So I thought that
this container should be managed by the user and the tree should be just
a search accelerator.
The main danger I see is that if container is changed, user must
remember to create new kd-tree.
Another thing is that volume objects may be a lot bigger than points,
fe. it may be 3D meshes.

>> // find one closest point, max distance = 10
>> point_object p;
>> bool found = kdt.find(p, point_type(...), 10);
>>
>> // find max 5 closest points, initial max distance = 10
>> // output: number of pointsfound and distance to the furthest point
>> std::vector<point_object> fv;
>> fv.reserve(5);
>> std::pair<size_t, value_type> found_data =
>> kdt.find(std::back_inserter(fv), point_type(...), 10, 5);
>> // or
>> value_type max_distance;
>> size_t found_points = kdt.find(std::back_inserter(fv), max_distance,
>> point_type(...), 10, 5);
>
> I wouldn't bother returning the max distance found. Let the user derive it from the points returned if they want it. It complicates the interface and I'm not sure how often it is useful.

Yes it complicates but it would be nice to return it somehow since it's
calculated 'inside' allready and may be used in algorithms where the
area of found points is used. Maby just implement 2 versions of this method?

Btw, there might be fe. a search area functor. This will make possible
to specify the size and the shape of the search area.

>> I used rebuild(), not (rebalance()) because there are some splitting
>> rules producing nice, not balanced kd-trees (see:
>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8380).
>
> What happens if the user queries before after they have added but before they have built? Throw an exception? Return nothing? Automatically rebuild? I wouldn't provide build, rebuild or rebalance interfaces. It should be like the red-black tree, we never tell a std::set to rebalance itself. If the user wants to rebuild the tree they can construct a new one with all the points.

It makes it even simpler.

> What about if you provided lower_bound and upper_bound:
>
> kdtree::iterator itr = kdt.lower_bound(point, 0.0f);
> kdtree::iterator itr2 = kdt.upper_bound(point, 0.0f);
> std::vector<point> pts(itr, itr2);
> kdt.erase(itr, itr2);
>
> You may also sometimes want all the points in the tree and would just use begin() and end(). This style of interface has the advantage that it is maximally intuitive to people who are used to the stl. It may have the drawback of being challenging to implement and perhaps execute slower than back inserter style. There are iterator tricks used and facilitated by boost to make it easier and more efficient to provide such an interface as a wrapper around the back inserter interface, for example.
>
> Now, let's imagine that instead of euclidean the user wants L distance metric. You can provide a version of lower_bound and upper_bound that take a distance metric functor as a third argument.

Yes, I like it.

Regards,
Adam

Adam Wulkiewicz

unread,
Jul 30, 2010, 6:29:22 PM7/30/10
to bo...@lists.boost.org
Barend Gehrels wrote:
> Hi Adam,
>
> First I'm very interested in your library.
>
> Is it already uploaded into Vault? Could you publish the URL? If you
> didn't upload it already, you might consider to use the Boost SVN
> sandbox instead.
>
>>
>> I think that the library I'd like to develop don't fit in
>> Boost.Geometry. The purposes of both are different. Space partitioning
>> (in mathematical meaning), is a separate problem. Functionality like
>> this may be used in library like Boost.Geometry but might be used with
>> a large number of different libraries as well. As I see it, it should
>> be a set of general, n-dimensional solutions. Therefore I think, that
>> it should be a separate library.
>
> I probably agree with this. However, I would like to investigate if a
> coupling of these two libraries is possible, to provide the users of
> Boost.Geometry with a good spatial index which works for them by default
> and which can thus also be used for other purposes.
> So Boost.Geometry might follow your library, or you might use some
> concepts of Boost.Geometry, but I would like to see the sources to say
> more about this.

First of all, I must explain myself. I walk in here and write that I'd
like to desing a library. I don't want you to change something. I'd like
to design a library which you might use if you haven't implement some
container by yourself. Moreover, I think it should be separate from
Boost.Geometry. What do you think about it? Have you any objections?
Maby you have some plans and want to do it by yourself?

If no, I think, it's the best to use concepts from Boost.Geometry
because it's an existing library. Or to use some adaptor.

But the most important thing is that if you want to make building of
arbitrary kd-tree possible boost::geometry::point should provide a
method returning given coordinate not in compile-time.

Still, it is possible to create kd-tree using coordinates indexed in
compile-time and I don't know if it will be a lot worse than the other
possible trees. The problem may occur for volume objects.

I might implement simple version of kd-tree for points using your
interface and show it to you if you like.
Later I'd like to use adaptors and to implement some tree building
traits. Different building algorithms might use different interfaces
(compile-time or not).

Regards,
Adam

Simonson, Lucanus J

unread,
Jul 30, 2010, 6:57:29 PM7/30/10
to bo...@lists.boost.org
Adam Wulkiewicz wrote:
> But the most important thing is that if you want to make building of
> arbitrary kd-tree possible boost::geometry::point should provide a
> method returning given coordinate not in compile-time.

This did come up in design discussions and I recall someone stating the need citing specifically nD spatial query as a place where they need it. The desire to make the interface minimal means that as a practical matter the designer of the interface needs to choose either runtime or compile time. Boost.Geometry chose compile time, I chose run time. The argument against runtime is that it may cause access to be slow due to unfortunate things like:

struct pt { double x, y, z; };

double get_coord(const pt& p, unsigned int axis) {
if(axis == 0)
return p.x;
if(axis == 1)
return p.y;
return p.z;
}

double coord = get_coord(pt, 0); //does the if statement compile away?

I argue that the accessor should inline and constant propigation will cause the if statement to dissolve away resulting in zero runtime overhead.

Sadly, this if tree type thing is what you have to write to use the compile time accessor with a runtime parameter, which the compiler can't compile away, in general if the runtime parameter is not a constant.

double get_coord(const pt& p, unsigned int axis) {
if(axis == 0)
return get<0>(p);
if(axis == 1)
return get<1>(p);
return get<2>(p);
}

Even though point may be defined as:

typedef boost_array<double, 3> pt;

and provide an operator[] you end up paying for branch or jump table access to its coordinates in addition to the array lookup for no good reason. Thinking through what the compiler would have to do to optimize away the logic tree or switch statement I find it pretty unlikely.

I don't think it is too late for Boost.Geometry to change their interface, but I do recognize that it entails a lot of work. I encapsulate the use of the adaptor in a free function so that I can change the interface and don't need to change every place where I use it, but I don't know if Boost.Geometry uses the adaptor directly everywhere or not.

Regards,
Luke

Barend Gehrels

unread,
Aug 2, 2010, 5:13:56 AM8/2/10
to bo...@lists.boost.org

> First of all, I must explain myself. I walk in here and write that I'd
> like to desing a library. I don't want you to change something. I'd
> like to design a library which you might use if you haven't implement
> some container by yourself. Moreover, I think it should be separate
> from Boost.Geometry. What do you think about it? Have you any
> objections? Maby you have some plans and want to do it by yourself?
Yes, I think that it is good to be separate, and I don't have objections.
What you said, it can be used for non-geometry as well.

And yes, we did have plans, actually we do have a spatial index based on
R-Tree which lives as an extension within Boost.Geometry.

I don't think it inconvenient to have different (spatial) indexes, I
think it is good. However, it would be very convenient to make the
interfaces similar (or equal).

The interface of the current R-Tree still has to be redesigned so it is
not too late, for us.

>
> If no, I think, it's the best to use concepts from Boost.Geometry
> because it's an existing library. Or to use some adaptor.

Yes, I would advocate using our concepts. The point concept is not
complicated and points are easy to implement and adapt to our concepts.
The point you have in the library I saw last week can be adapted as well.

>
> But the most important thing is that if you want to make building of
> arbitrary kd-tree possible boost::geometry::point should provide a
> method returning given coordinate not in compile-time.

See also Luke's response on this. We have compile time access to select
by coordinate index but, of course, you could add a wrapper in between
translating from runtime to compiletime.

>
> Still, it is possible to create kd-tree using coordinates indexed in
> compile-time and I don't know if it will be a lot worse than the other
> possible trees. The problem may occur for volume objects.

I think it would be good.


>
> I might implement simple version of kd-tree for points using your
> interface and show it to you if you like.

Yes, I would like that :-)


> Later I'd like to use adaptors and to implement some tree building
> traits. Different building algorithms might use different interfaces
> (compile-time or not).

We have for several free functions also different internal algorithms,
which can be specified as an extra argument. We call that a strategy,
there are concepts for different strategies (distance etc.). You might
look if such a system would be convenient for you as well.


Regards, Barend

Adam Wulkiewicz

unread,
Aug 7, 2010, 9:03:55 AM8/7/10
to bo...@lists.boost.org
Hi,

I couldn't decide what point concept is the best so I've implemented
kd-tree building for 3 kind of points to perform some tests and to work
out the basic structure of the library, data types adaptors etc.

These are 'internal' point concepts:
1. compile-time point
point_category = compiletime_point_tag
coordinate_type
coordinate_count
coordinate_type const & get<K>()
void set<K>(v)
2. runtime-time point
point_category = runtime_point_tag
coordinate_type
coordinate_count
coordinate_type const & get(k)
void set(k, v)
3. dynamic point
point_category = dynamic_point_tag
coordinate_type
coordinate_type const & get(k)
void set(k, v)
coordinate_count()

I don't know which one is the best for space partitioning (ct or rt).
There are some 'run-time' data structures (different BSP-tree, including
kd-tree) and there are 'compile-time' data structures
(quadtree/octree/..., regular grid, hierarchical grid, BVH tree).
'Run-time' and 'compile-time' means specific access to the coordinates.
Building of all data structures is performed in run-time because
coordinates values are set in runtime.

Kd-tree (kdtree.hpp) has 2 template parameters Point and PointAdaptor.
The default PointAdaptor is point_traits<Point>::point_adaptor.

One may write adaptor modeling any of concepts above. There are some
adaptors written allready (types.hpp). And if he doesn't want to pass
PointAdaptor each time he creates kd-tree, he may specialize point_traits.

So, kd-tree should work for:
user-defined types
boost::geometry - concept 1
boost::array - concept 2
std::vector, std::deque, uBlas vectors - concept 3

Example:

Lets assume that point is defined as follows:

template <typename V, size_t D>
class runtime_point
{
public:
typedef V coordinate_type;
static const size_t coordinate_count = D;
inline coordinate_type const& operator[](size_t i);
inline coordinate_type & operator[](size_t i);
private:
coordinate_type c[coordinate_count];
};

The adaptor looks like this:

template <typename Point>
struct runtime_point_indexop : private Point
{
typedef runtime_point_tag point_category;
typedef typename Point::coordinate_type coordinate_type;
static const size_t coordinate_count = Point::coordinate_count;

inline runtime_point_indexop() {}
inline runtime_point_indexop(Point const& pt) : Point(pt) {}

inline coordinate_type const& get(size_t i) const {
return Point::operator[](i); }
inline void set(size_t i, coordinate_type const& v) {
Point::operator[](i) = v; }
};

Btw, it's the default adaptor. One might write:

typedef runtime_point<float, 2> rtp;
std::vector<rtp> rtv;
// ... //
space::kdtree<rtp> rtkdt(rtv.begin(), rtv.end());

Or explicitly:

typedef space::runtime_point_indexop<rtp> rtpadapt;
space::kdtree<rtp, rtpadapt> rtkdt(rtv.begin(), rtv.end());

Or specialize point_traits in the first place:

namespace space {
template <typename T>
struct point_traits<rtp>
{
typedef runtime_point_indexop<rtp> point_adaptor;
};
}

And then:

space::kdtree<rtp> rtkdt(rtv.begin(), rtv.end());

In main.cpp there are:
- 2 user-defined point types defined:
* compiletime_point (boost::geometry::point interface)
* runtime_point (point with operator[] instead of get/set)
- point_traits specializations for:
* boost::array
* std::vector
- kd-tree build time tests

On my computer building takes (Core 2 Duo 2GHz, Release build, 50000
points, 2 coordinates per point, floats, full optimization):
MSVC10 GCC3.4.5
runtime_point 0.018s 0.023s
compiletime_point 0.017s 0.017s
boost::array 0.019s 0.022s
std::vector 9.5s 0.278s

runtime_point
adapted from 0.022s 0.023s
compiletime_point

compiletime_point
adapted from 0.017s 0.017s
runtime_point

Btw, the results for std::vector are wierd.

I think that it is nice to have space partitioning working for many
point types (even for std::vectors, uBlas etc). But there must be 3
different kind of algorithms for some cases.
It's not bad to convert from one representation to another. It's obvious
that it's the best to have algorithms working on compile-time points and
convert run-time points when necessary.

What do you think about it?

And what do you think about my data types, adaptors etc?

Regards,
Adam

space.zip

Mathias Gaunard

unread,
Aug 7, 2010, 10:29:41 AM8/7/10
to bo...@lists.boost.org
Le 29/07/2010 20:07, Simonson, Lucanus J wrote:
> Your interface seems a little off to me, not what I was expecting.

I have to agree, and add more to that.

What a spatial index does is fairly simple (to the user). Its interface
should be the same as that of std::multiset, but instead of using
operator<, it should use intersection (which, like std::multiset, should
have a default but be changeable to an arbitrary function object).
You're going to want to change std::multiset<T>::find slightly to allow
arbitrary types, not just T (often considered an arbitrary limitation
and defect by people), and to return a range instead of an iterator; and
you might want to lazily evaluate that range as well, hence providing
the incremental search you put on your to-do list.

The default intersection function would obviously be a generic one from
Geometry.
I'm not sure Geometry has such a generic intersection function; but if
it doesn't, that where it needs to be.

Volume queries, ray queries etc. don't need anything special. They
should work with the generic intersection function just like the rest,
no need to special case anything.

There is no need to introduce any new point/object concept, nor new nD
containers. The spatial index should assume the very least information
it needs in order to be truly generic.

All the implementation details you might need should be in Geometry, and
the default intersection function should be from Geometry as well (if
they're not, that's where they should be anyway), so I don't see why you
would want to make it a separate library, except out of craziness.

Dynamic typing should not be the responsability of the index, so it
should work with objects of type T, not pointers, and let the user deal
with polymorphism, if he needs it, by instantiating the index with smart
pointers or type erasure types.
Memory allocation is purely dealt with by the allocator, so seeing "use
of pool allocators" in the to-do list worries me.

I would also like to see an intrusive version of the index, that doesn't
do any memory allocation. The non-intrusive version can be built on top
of the intrusive one, but the opposite isn't possible.

So my opinion is that the whole design is done utterly wrong at the moment.
I personally won't consider using it or promoting unless it is exactly
as I described (intrusive support is optional).

Mathias Gaunard

unread,
Aug 7, 2010, 11:13:15 AM8/7/10
to bo...@lists.boost.org
Just adding a few things to what I said:

07/08/2010 15:29, Mathias Gaunard wrote:

> What a spatial index does is fairly simple (to the user). Its interface
> should be the same as that of std::multiset, but instead of using
> operator<, it should use intersection (which, like std::multiset, should
> have a default but be changeable to an arbitrary function object).

While intersection test should be enough for searching, you're going to
need other primitives for maintaining the data structure.
They probably should be template parameters as well.

For r-trees, for example, you're going to need a function that gives the
axis-aligned bounding box for a T, and Geometry does provide that with
the 'envelope' function.
But in a typical scenario, you're going to want to precompute all the
bounding boxes for all objects before you index them, so having a
customized function that just returns the pre-calculated envelope seems
needed.


> You're going to want to change std::multiset<T>::find slightly to allow
> arbitrary types, not just T (often considered an arbitrary limitation
> and defect by people), and to return a range instead of an iterator; and
> you might want to lazily evaluate that range as well, hence providing
> the incremental search you put on your to-do list.

You may want to add another find function that sorts the results by
distance as well, for nearest neighbour queries.

Adam Wulkiewicz

unread,
Aug 8, 2010, 4:32:54 PM8/8/10
to bo...@lists.boost.org
Mathias Gaunard wrote:
> Le 29/07/2010 20:07, Simonson, Lucanus J wrote:
>> Your interface seems a little off to me, not what I was expecting.
>
> I have to agree, and add more to that.

Yes, I wrote earlier that the interface should be redesigned and that
I'd like to write the library, not that I have one fully functional.
I've decided to show you the code as it is because I knew that you'll
have many objections, different people will have different views and
because I don't know about your plans and what is needed. I'm gathering
information right now.

> There is no need to introduce any new point/object concept, nor new nD
> containers. The spatial index should assume the very least information
> it needs in order to be truly generic.
>
> All the implementation details you might need should be in Geometry, and
> the default intersection function should be from Geometry as well (if
> they're not, that's where they should be anyway), so I don't see why you
> would want to make it a separate library, except out of craziness.

It depends, how you see these things. You may want spatial index or
various spacial indexes in Boost.Geometry. But you may also want
multi-purpose space partitioning containers enclosed in separate
library. It's a matter of view. Where user should search this
functionality for? And for what data should these containers work for?

> Memory allocation is purely dealt with by the allocator, so seeing "use
> of pool allocators" in the to-do list worries me.

There are large number of different data structures and there may be fe.
a massive tree with large number of nodes created in the building
process and it's better to have pools of nodes than to create each one
separately.

Mathias Gaunard

unread,
Aug 8, 2010, 5:59:48 PM8/8/10
to bo...@lists.boost.org
Le 08/08/2010 21:32, Adam Wulkiewicz wrote:

> It depends, how you see these things. You may want spatial index or
> various spacial indexes in Boost.Geometry. But you may also want
> multi-purpose space partitioning containers enclosed in separate
> library. It's a matter of view. Where user should search this
> functionality for? And for what data should these containers work for?

Boost.Geometry works with un-intrusive concepts, and allows to use
arbitrary types.


> There are large number of different data structures and there may be fe.
> a massive tree with large number of nodes created in the building
> process and it's better to have pools of nodes than to create each one
> separately.

I don't understand your point.
Your container wants to allocate memory, it uses its allocator
parameter. The user wants to change the way memory is allocated, he
changes that parameter.

You can indeed allocate all the memory in one go at construction, I
don't see why that requires a pool allocator.

Adam Wulkiewicz

unread,
Aug 9, 2010, 12:35:20 PM8/9/10
to bo...@lists.boost.org
Mathias Gaunard wrote:
> Le 08/08/2010 21:32, Adam Wulkiewicz wrote:
>
>> It depends, how you see these things. You may want spatial index or
>> various spacial indexes in Boost.Geometry. But you may also want
>> multi-purpose space partitioning containers enclosed in separate
>> library. It's a matter of view. Where user should search this
>> functionality for? And for what data should these containers work for?
>
> Boost.Geometry works with un-intrusive concepts, and allows to use
> arbitrary types.

I'm talking about something different. There are many fields where
spacial indexes are used. Does user who wants to implement an algorithm
extracting features from the picture or searching data in the database
should look for needed data structures somwhere deep in the geometry
library? Or maby, the geometry library should use functionality provided
by the space partitioning library (both using the same concepts)?

>> There are large number of different data structures and there may be fe.
>> a massive tree with large number of nodes created in the building
>> process and it's better to have pools of nodes than to create each one
>> separately.
>
> I don't understand your point.
> Your container wants to allocate memory, it uses its allocator
> parameter. The user wants to change the way memory is allocated, he
> changes that parameter.
>
> You can indeed allocate all the memory in one go at construction, I
> don't see why that requires a pool allocator.

Yes, you're right. Although, there may be more than one allocators used
in the tree building algorithm. There may be allocators for tree nodes,
temporary objects at each recursive step and for objects in leafs. I
thought about implementing this kind of container later and wanted to
select proper allocators internally automatically (because it
complicates the interface and the user must know about the internal
details) but this is just a proposition.

Barend Gehrels

unread,
Aug 10, 2010, 3:35:58 AM8/10/10
to bo...@lists.boost.org

Adam Wulkiewicz wrote:
> Mathias Gaunard wrote:
>> Le 08/08/2010 21:32, Adam Wulkiewicz wrote:
>>
>>> It depends, how you see these things. You may want spatial index or
>>> various spacial indexes in Boost.Geometry. But you may also want
>>> multi-purpose space partitioning containers enclosed in separate
>>> library. It's a matter of view. Where user should search this
>>> functionality for? And for what data should these containers work for?
>>
>> Boost.Geometry works with un-intrusive concepts, and allows to use
>> arbitrary types.
>
> I'm talking about something different. There are many fields where
> spacial indexes are used. Does user who wants to implement an
> algorithm extracting features from the picture or searching data in
> the database should look for needed data structures somwhere deep in
> the geometry library? Or maby, the geometry library should use
> functionality provided by the space partitioning library (both using
> the same concepts)?
I don't think these are structures deep in the geometry library. It can
be the structures of the users, something in a picture or database.
Which is "adapted" to the Boost.Geometry Point Concept (this is normally
done by one Macro call).

The spatial index could then use the Point Concept of Boost.Geometry,
and both libraries will work together happily.

Regards, Barend

Simonson, Lucanus J

unread,
Aug 11, 2010, 1:08:13 PM8/11/10
to bo...@lists.boost.org

Why would a person doing image processing or database development think that their need for spacial indexes are somehow not a need for computational geometry? I'm quite sure the database developers feel the spatial indexes they already have are superior to anything we are using in VLSI or GIS, and they are undoubtedly right to the extent that their data structures are better suited to the needs of their application. Their spacial indexes would likely not be suitable for VLSI compared to our own data structures. The performance of a spacial index is heavily dependent on the characteristics of the data it stores. I hope we all understand that up front because it will cut down on silly arguments about performance later on. If Adam said he was writing a database building blocks library that was going to feature spacial indexese tailored to meet the needs of database query then Boost.Geometry would be a bad place to put it. However, Adam is saying he plans to implement a gen
eric spacial indexes library with several different types of data structures and wants it to be used in multiple application domains. Adam may not realize it, but computational geometry is not an application domain. There are many application domains where computational geometry is used. Does a user who wants to implement an algorithm for extracting features from an image or searching for data in a Geospacial database want to look in the geometry library to find something they both need? Maybe not, but it is better to put those things that are needed in several application domains in a common library which is meaninfully named and easy to find instead of having a separate application domain specific library for each type of user. Geometry is the right name for a library that contains spacial indexes because spacial indexes are computational geometry unless they are otherwise encumbered with domain specific jargon in the interfaces or tailored to domain specific needs.

So now lets talk about software design. If spacial indexes will depend on the Boost.Geometry library for low level features like point concept and traits that we generally agree will be a good idea for interoperability and Boost.Geometry will depend on spacial indexes to implement high level algorithms like closest neighbor query (which they provide now, but would like to optimize by using a good spacial index) then spacial indexes should go in Boost.Geometry to avoid cyclic dependency between libraries. The alternative is to partition Geometry into algorithms that do depend on spacial indexes, which would go in the spacial indexes library, and algorithms that do not, which would stay in Geometry. Such a partition is abitrary and nonsensical.

I don't think it would be inconvenient for Adam to work with the Geometry library authors and I think he could benefit from their experience and direct help with the implementation of his vision for a generic spacial indexes capability in boost. Given recent experience I don't think the community would accept spacial indexes as its own library now that Geometry has been accepted.

Regards,
Luke

Adam Wulkiewicz

unread,
Aug 12, 2010, 10:05:40 AM8/12/10
to bo...@lists.boost.org
Simonson, Lucanus J wrote:
> I don't think it would be inconvenient for Adam to work with the Geometry library authors and I think he could benefit from their experience and direct help with the implementation of his vision for a generic spacial indexes capability in boost. Given recent experience I don't think the community would accept spacial indexes as its own library now that Geometry has been accepted.

Yes, you're right. Thank you all for your advices. I'd be glad to work
on Boost.Geometry. I've allready started to implement kd-tree using
Geometry concepts. When it's finished I'll show it to Barend and see
what he thinks about it.

Regards,
Adam

Barend Gehrels

unread,
Aug 13, 2010, 6:18:47 AM8/13/10
to bo...@lists.boost.org
Hi Adam,

> Yes, you're right. Thank you all for your advices. I'd be glad to work
> on Boost.Geometry. I've allready started to implement kd-tree using
> Geometry concepts. When it's finished I'll show it to Barend and see
> what he thinks about it.
That's good news, welcome!

Regards, Barend

Simonson, Lucanus J

unread,
Aug 13, 2010, 2:12:34 PM8/13/10
to bo...@lists.boost.org
Barend Gehrels wrote:
> Hi Adam,
>> Yes, you're right. Thank you all for your advices. I'd be glad to
>> work on Boost.Geometry. I've allready started to implement kd-tree
>> using Geometry concepts. When it's finished I'll show it to Barend
>> and see what he thinks about it.
> That's good news, welcome!

Adam will probably want to subscribe to the Geometry dev list. I would like to do so as well and join the project. What is the link to the Geometry project that we can take the next steps?

Thanks,
Luke

Barend Gehrels

unread,
Aug 14, 2010, 4:43:39 AM8/14/10
to bo...@lists.boost.org
Hi Luke,

Thanks for your comments and suggestions.

Simonson, Lucanus J wrote:
> Adam will probably want to subscribe to the Geometry dev list. I would like to do so as well and join the project. What is the link to the Geometry project that we can take the next steps?
>

Most information can be found here:
http://trac.osgeo.org/ggl

with also links to the mailing list, documentation and SVN repository.

Regards, Barend

Arash Partow

unread,
Aug 14, 2010, 5:53:56 AM8/14/10
to bo...@lists.boost.org
On 13/08/2010 12:05 AM, Adam Wulkiewicz wrote:
> I've allready started to implement kd-tree using Geometry concepts. When it's finished I'll show it to Barend and see what he thinks about it.

Could you please clarify: Does this mean if I just want to use the kd-tree (not necessarily for anything comp-geo related), I'll now have a dependency of boost.geometry?

Mathias Gaunard

unread,
Aug 14, 2010, 6:41:49 AM8/14/10
to bo...@lists.boost.org
09/08/2010 17:35, Adam Wulkiewicz wrote:

> Yes, you're right. Although, there may be more than one allocators used
> in the tree building algorithm. There may be allocators for tree nodes,
> temporary objects at each recursive step and for objects in leafs.

I don't see why you'd need dynamic memory allocation other than for the
nodes, assuming you're storing values and not not trying to store
polymorphic objects through pointers.

Adam Wulkiewicz

unread,
Aug 15, 2010, 5:46:13 AM8/15/10
to bo...@lists.boost.org
Arash Partow wrote:
> On 13/08/2010 12:05 AM, Adam Wulkiewicz wrote:
>> I've allready started to implement kd-tree using Geometry concepts.
>> When it's finished I'll show it to Barend and see what he thinks about
>> it.
>
> Could you please clarify: Does this mean if I just want to use the
> kd-tree (not necessarily for anything comp-geo related), I'll now have a
> dependency of boost.geometry?

Yes. Kd-tree will be probably somewhere in boost::geometry namespace and
to use it you'll have to use point which has boost::geometry::point
interface. If you didn't want to use geometry::point, you could just
adapt it.

Adam Wulkiewicz

unread,
Aug 15, 2010, 6:07:15 AM8/15/10
to bo...@lists.boost.org
Mathias Gaunard wrote:
> 09/08/2010 17:35, Adam Wulkiewicz wrote:
>
>> Yes, you're right. Although, there may be more than one allocators used
>> in the tree building algorithm. There may be allocators for tree nodes,
>> temporary objects at each recursive step and for objects in leafs.
>
> I don't see why you'd need dynamic memory allocation other than for the
> nodes, assuming you're storing values and not not trying to store
> polymorphic objects through pointers.

It strongly depends on used algorithm if you need it or not. If you use
some kind of internal, temporary objects you need it.
See e.g.
http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or
http://kesen.realtimerendering.com/Intel-EG07.pdf

Mathias Gaunard

unread,
Aug 15, 2010, 12:25:58 PM8/15/10
to bo...@lists.boost.org
Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :

> It strongly depends on used algorithm if you need it or not. If you use
> some kind of internal, temporary objects you need it.
> See e.g.
> http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or

> http://kesen.realtimerendering.com/Intel-EG07.pdf=

I may be missing something here, but I don't see why the temporary
objects in question cannot be stored on the execution stack.

Adam Wulkiewicz

unread,
Aug 15, 2010, 2:49:22 PM8/15/10
to bo...@lists.boost.org
Mathias Gaunard wrote:
> Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :
>
>> It strongly depends on used algorithm if you need it or not. If you use
>> some kind of internal, temporary objects you need it.
>> See e.g.
>> http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or
>> http://kesen.realtimerendering.com/Intel-EG07.pdf=
>
> I may be missing something here, but I don't see why the temporary
> objects in question cannot be stored on the execution stack.

Implementing dynamic memory allocation on the stack could be quite tricky.

Barend Gehrels

unread,
Aug 16, 2010, 4:04:21 AM8/16/10
to bo...@lists.boost.org
hi Adam,

>> Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :
>>
>>> It strongly depends on used algorithm if you need it or not. If you use
>>> some kind of internal, temporary objects you need it.
>>> See e.g.
>>> http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or
>>> http://kesen.realtimerendering.com/Intel-EG07.pdf=
>>
>> I may be missing something here, but I don't see why the temporary
>> objects in question cannot be stored on the execution stack.
>
> Implementing dynamic memory allocation on the stack could be quite
> tricky.

Thanks for the links.

In Boost.Geometry we do not allocate dynamic memory ourselves; all
polygons, rings, lines, multi-polygons etc are stored as an std::
container, which can be selected at compile-time by the library user (as
a template-template-parameter). Besides that it is not part of the
concept so library users might select another solution.

Don't know if this is possible in this case, but if it is possible, I
would prefer that approach.

Regards, Barend

Adam Wulkiewicz

unread,
Aug 16, 2010, 6:27:09 AM8/16/10
to bo...@lists.boost.org
W dniu 2010-08-16 10:04, Barend Gehrels napisal(a):

> hi Adam,
>
>
>
>>> Le 15/08/2010 11:07, Adam Wulkiewicz a écrit :
>>>
>>>> It strongly depends on used algorithm if you need it or not. If you use
>>>> some kind of internal, temporary objects you need it.
>>>> See e.g.
>>>> http://www.cgg.cvut.cz/members/havran/ARTICLES/ingo06rtKdtree.pdf or
>>>> http://kesen.realtimerendering.com/Intel-EG07.pdf=
>>>
>>> I may be missing something here, but I don't see why the temporary
>>> objects in question cannot be stored on the execution stack.
>>
>> Implementing dynamic memory allocation on the stack could be quite
>> tricky.
>
> Thanks for the links.
>
> In Boost.Geometry we do not allocate dynamic memory ourselves; all
> polygons, rings, lines, multi-polygons etc are stored as an std::
> container, which can be selected at compile-time by the library user (as
> a template-template-parameter). Besides that it is not part of the
> concept so library users might select another solution.
>
> Don't know if this is possible in this case, but if it is possible, I
> would prefer that approach.
>
> Regards, Barend

I move this discusion to the ggl list.

Regards,
Adam

Gaetano Mendola

unread,
Apr 10, 2011, 11:48:47 AM4/10/11
to bo...@lists.boost.org, Eric Niebler
On 27/07/2010 15:04, Eric Niebler wrote:
> On 7/26/2010 7:08 PM, Gaetano Mendola wrote:
>> On 07/26/2010 08:52 PM, Eric Niebler wrote:
>>> In this case, you would open a feature request on svn.boost.org ("New
>>> Ticket"), assign to me, and attach your patch. If it looks good, I'll
>>> merge it. Your patch should have docs and tests. Don't worry about
>>> formatting the docs. I can do that. I wouldn't ask you to set up the doc
>>> build chain; it's a headache.
>>
>> Done.
>
> (For those playing along at home, the ticket is here:
> https://svn.boost.org/trac/boost/ticket/4471)
>
> Thanks. This needs documentation, though. Like I said, I don't care
> about formatting, just tell what what text you'd like to see inserted
> where in the docs and I'll do it.

Hi all,
I see the ticket for the kahan accumulator:

https://svn.boost.org/trac/boost/ticket/4471

is still "open", do you need something else to be done?


Regards
Gaetano Mendola

Eric Niebler

unread,
Apr 12, 2011, 1:36:31 AM4/12/11
to Gaetano Mendola, Boost mailing list, Matthias Troyer
I confess I haven't given this library the attention it needs. It's due
to the fact that I have many competing demands for my time, and also
because I'm not actually a statistician -- Matthias Troyer did that
part, but I know he's busy, too -- so I don't really feel qualified to
make certain changes. This is an uncomfortable position to be in, so I
figured I'd ask for help. So, is there anybody with a mathematical bent
and an interest in Boost.Accumulators who would like to help out with
the backlog of bugs and patches to this library? I would stay on as
assistant maintainer to help with the tricky meta-programming.

Gaetano, see below.

On 4/10/2011 10:48 PM, Gaetano Mendola wrote:
> On 27/07/2010 15:04, Eric Niebler wrote:
>> On 7/26/2010 7:08 PM, Gaetano Mendola wrote:
>>> On 07/26/2010 08:52 PM, Eric Niebler wrote:
>>>> In this case, you would open a feature request on svn.boost.org ("New
>>>> Ticket"), assign to me, and attach your patch. If it looks good, I'll
>>>> merge it. Your patch should have docs and tests. Don't worry about
>>>> formatting the docs. I can do that. I wouldn't ask you to set up the
>>>> doc
>>>> build chain; it's a headache.
>>>
>>> Done.
>>
>> (For those playing along at home, the ticket is here:
>> https://svn.boost.org/trac/boost/ticket/4471)
>>
>> Thanks. This needs documentation, though. Like I said, I don't care
>> about formatting, just tell what what text you'd like to see inserted
>> where in the docs and I'll do it.
>
> Hi all,
> I see the ticket for the kahan accumulator:
>
> https://svn.boost.org/trac/boost/ticket/4471
>
> is still "open", do you need something else to be done?

I don't think you ever responded to my request to add the needed docs to
the patch, did you?

--
Eric Niebler
BoostPro Computing
http://www.boostpro.com

Gaetano Mendola

unread,
Apr 12, 2011, 7:09:31 AM4/12/11
to Eric Niebler, Boost mailing list, Matthias Troyer
My reply at the end.

Yes I did in that thread.

Gaetano

--
cpp-today.blogspot.com

Francesco Basile

unread,
Apr 27, 2011, 6:17:05 AM4/27/11
to bo...@lists.boost.org
Erich, I both have an interest Boost.Accumulator and I have a
mathematical-physics background. I'm not an expert in meta-programming,
but, thanks to your help offer, it should not be a problem.
Francesco

Reply all
Reply to author
Forward
0 new messages