[boost] [flat_map] Any interest in a flat_map variant that is much faster on insertion/erase at the price of a slower lookup?

Giacomo Drago

unread,
Mar 13, 2015, 3:52:14 AM3/13/15
to bo...@lists.boost.org
Hi

My library is not ready for a formal review process, and I'd like to ask
whether there is some potential interest first, thus saving everybody a
lot of time.

I have been working on a flat_map
(http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/flat_map.html)
variant that keeps some of the distinctive features of the original one
(contiguous storage/cache friendliness, zero memory
overhead/shrink_to_fit support, vector-based representation, etc...) but
makes insertion and erasure of the elements much faster, at the price of
a slower lookup and in-order traversal.

Explaining how it works requires time and it's not really necessary if
there is no interest in the first place (e.g. "thanks, but we have no
use for this"). Let's get straight to the benchmarks and to some
conclusions.

#elems: 50000
std::map:
Random insertion: 18.605 ms
Random find: 24.382 ms
Iteration: 5.658 ms
Random erase: 37.166 ms
experimental::flat_map:
Random insertion: 400.141 ms
Random find: 12.352 ms
Iteration: 7.099 ms
Random erase: 467.051 ms
boost::container::flat_map:
Random insertion: 611.678 ms
Random find: 6.293 ms
Iteration: 0.038 ms
Random erase: 615.311 ms

#elems: 200000
std::map:
Random insertion: 146.394 ms
Random find: 163.506 ms
Iteration: 21.973 ms
Random erase: 225.314 ms
experimental::flat_map:
Random insertion: 5068.93 ms
Random find: 65.618 ms
Iteration: 34.244 ms
Random erase: 6851.09 ms
boost::container::flat_map:
Random insertion: 10696.9 ms
Random find: 43.896 ms
Iteration: 0.254 ms
Random erase: 10604 ms


Of course one doesn't expect random insertion performance comparable to
that of a tree-based map like std::map. However, my
experimental::flat_map provides a better trade-off than boost's flat_map
for me, as random insertion is massively faster, and in the big picture
of the application I am developing this is making a hell of a
difference.

Am I making any sense? Shall I polish the implementation, publish it,
and submit a proposal? I'd really appreciate your feedback.

Kind Regards
Giacomo



Ion Gaztañaga

unread,
Mar 13, 2015, 4:23:26 AM3/13/15
to bo...@lists.boost.org
On 13/03/2015 at 8:51, Giacomo Drago wrote:
> Hi
>
> My library is not ready for a formal review process, and I'd like to ask
> whether there is some potential interest first, thus saving everybody a
> lot of time.
>
> I have been working on a flat_map
> (http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/flat_map.html)
> variant that keeps some of the distinctive features of the original one
> (contiguous storage/cache friendliness, zero memory
> overhead/shrink_to_fit support, vector-based representation, etc...) but
> makes insertion and erasure of the elements much faster, at the price of
> a slower lookup and in-order traversal.

Just a question. Are your insertions one by one or by range?

Best,

Ion

Dominique Devienne

unread,
Mar 13, 2015, 4:28:02 AM3/13/15
to bo...@lists.boost.org
On Fri, Mar 13, 2015 at 9:22 AM, Ion Gaztañaga <igazt...@gmail.com> wrote:

> On 13/03/2015 at 8:51, Giacomo Drago wrote:
>
>> My library is not ready for a formal review process, and I'd like to ask
>> whether there is some potential interest first, thus saving everybody a
>> lot of time.
>>
>> I have been working on a flat_map
>> (http://www.boost.org/doc/libs/1_57_0/doc/html/boost/
>> container/flat_map.html)
>> variant that keeps some of the distinctive features of the original one
>> (contiguous storage/cache friendliness, zero memory
>> overhead/shrink_to_fit support, vector-based representation, etc...) but
>> makes insertion and erasure of the elements much faster, at the price of
>> a slower lookup and in-order traversal.
>>
>
> Just a question. Are your insertions one by one or by range?
>

Might also be interesting to know if your perf numbers are based on a
primitive key or a UDT, and whether the relative performance changes
depending on key/value sizes. --DD

Giacomo Drago

unread,
Mar 13, 2015, 5:11:26 AM3/13/15
to bo...@lists.boost.org
On 2015-03-13 08:27, Dominique Devienne wrote:
> On Fri, Mar 13, 2015 at 9:22 AM, Ion Gaztañaga <igazt...@gmail.com> wrote:
>>
>> Just a question. Are your insertions one by one or by range?
>>
>
> Might also be interesting to know if your perf numbers are based on a
> primitive key or a UDT, and whether the relative performance changes
> depending on key/value sizes. --DD


Insertions are one by one, and they are random with respect to the order
of the keys. The same applies to lookups and erasures. No cheating.
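
To be concrete, the harness is along these lines (a simplified sketch,
not the exact code that produced the numbers above; it assumes shuffled
integer keys and std::chrono timing):

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Illustrative harness: one-by-one insertion, keys in random order.
template <typename Map>
double bench_random_insert(std::size_t n)
{
    std::vector<int> keys(n);
    std::iota(keys.begin(), keys.end(), 0);
    std::shuffle(keys.begin(), keys.end(), std::mt19937{42});

    Map m;
    auto t0 = std::chrono::steady_clock::now();
    for (int k : keys)
        m.emplace(k, k);   // one element at a time, in random key order
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

The find, iteration and erase timings are taken the same way.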

I will try with different keys that are not PODs and have non-trivial
comparators, but I expect the benchmarks to reflect my previous
findings. The bottom line is that it makes fewer comparisons and swaps
when inserting and removing, and more comparisons when looking up.
Elements are not *quite* sorted in the storage vector, of course.

The main use case I see for this data structure - and I am already using
it, though not in a production environment - is when you need a very
space-efficient, sorted associative container that is not so painfully
slow when it comes to inserting/erasing elements.

I'll send more benchmarks. Any interest in the general design and the
source code (not "boostified" yet)?

Best
Giacomo

Phil Endecott

unread,
Mar 13, 2015, 9:38:17 AM3/13/15
to bo...@lists.boost.org
Hi Giacomo,

Giacomo Drago wrote:
> I have been working on a flat_map
> (http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/flat_map.html)
> variant that keeps some of the distinctive features of the original one
> (contiguous storage/cache friendliness, zero memory
> overhead/shrink_to_fit support, vector-based representation, etc...) but
> makes insertion and erasure of the elements much faster, at the price of
> a slower lookup and in-order traversal.
>
> Explaining how it works requires time and it's not really necessary

No, "how it works" is the first thing I want to know! It's unlikely that
you've invented something completely new...

> Let's get straight to the benchmarks

Rather than quantified benchmarks, to start with I'd like to know the
big-O complexity of the operations.

(Personally, I have a flat_map that does batched insertions i.e. the new
items are appended individually and then merged at the end of the batch.
This will have significant benefits for some workloads.)


Regards, Phil.

Ion Gaztañaga

unread,
Mar 13, 2015, 9:38:33 AM3/13/15
to bo...@lists.boost.org
On 13/03/2015 at 10:11, Giacomo Drago wrote:
> Insertions are one by one, and they are random with respect to the order
> of the keys. Same applies for lookups and erasures. No cheating.

I didn't want to suggest any cheating, just trying to understand the
scenario. If insertions are by range, then flat_map can be optimized. If
insertions are one by one, it's more difficult, unless there are two
phases: first insertion (without lookups between insertions) and, after
that, only searches. That scenario could also be optimized.

Best,

Ion

Ion Gaztañaga

unread,
Mar 13, 2015, 9:39:29 AM3/13/15
to bo...@lists.boost.org
On 13/03/2015 at 14:38, Phil Endecott wrote:

> (Personally, I have a flat_map that does batched insertions i.e. the new
> items are appended individually and then merged at the end of the batch.
> This will have significant benefits for some workloads.)

Yes, this should be added to flat_map. The interface won't be very
pretty, but I would like to hear suggestions in this direction.

Best,

Ion

Giacomo Drago

unread,
Mar 13, 2015, 9:51:13 AM3/13/15
to bo...@lists.boost.org
On 2015-03-13 13:37, Ion Gaztañaga wrote:
> On 13/03/2015 at 10:11, Giacomo Drago wrote:
>> Insertions are one by one, and they are random with respect to the order
>> of the keys. Same applies for lookups and erasures. No cheating.
>
> I didn't want to suggest any cheating, just trying to understand the
> scenario. If insertions are by range, then flat_map can be optimized. If
> insertions are one by one, it's more difficult, unless there are two
> phases, first insertion (without lookups between insertions) and after
> that, only searches. That scenario could be also optimized.

They are one by one, and can interleave in any possible way without
affecting performance. I didn't mean to imply you were suggesting any
cheating, but I know that benchmarks can be crafted to make things look
better than they are - and one can do this without even realising it.

Best
Giacomo

Giacomo Drago

unread,
Mar 13, 2015, 9:58:33 AM3/13/15
to bo...@lists.boost.org
On 2015-03-13 13:38, Phil Endecott wrote:
> Hi Giacomo,
>
> No, "how it works" is the first thing I want to know! It's unlikely that
> you've invented something completely new...
Granted... I don't think I have invented anything new, but if I had
started describing the data structure (and I am not very good at it) I
would have got no answers. Raw milliseconds usually do the trick,
especially with C++ enthusiasts.
I am writing a PDF that explains how this thing works, but basically
it's a heap with a few tweaks.


>
>> Let's get straight to the benchmarks
>
> Rather than quantified benchmarks, to start with I'd like to know the
> big-O complexity of the operations.
The Big-O complexity of the operations is the same or worse. Lookup is
O((log(n))^2), insertion/erasure is O(n), and full in-order traversal is
O(n (log(n))^2) as unfortunately you can't conveniently move to the next
element in the storage vector.


>
> (Personally, I have a flat_map that does batched insertions i.e. the new
> items are appended individually and then merged at the end of the batch.
> This will have significant benefits for some workloads.)

Indeed, that's the first thing I would do but I wanted to get better
insertions even when adding one element at a time, and interleaving
insertions with lookups, because that's what I do in my actual usage
scenario.

Thank you for your time!
Giacomo

Marcelo Zimbres

unread,
Mar 13, 2015, 10:24:49 AM3/13/15
to bo...@lists.boost.org
"Hi

My library is not ready for a formal review process, and I'd like to
ask whether there is some potential interest first, thus saving
everybody a lot of time.

I have been working on a flat_map
(http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/flat_map.html)
variant that keeps some of the distinctive features of the original
one (contiguous storage/cache friendliness, zero memory
overhead/shrink_to_fit support, vector-based representation, etc...)
but makes insertion and erasure of the elements much faster, at the
price of a slower lookup and in-order traversal."

Would it be easy for you to provide benchmarks of std::map with
boost::container::node_allocator available in latest boost release?
According to my own benchmarks it can improve performance a lot,
mainly when memory is likely to be fragmented.

Regards,
Marcelo

Peter Dimov

unread,
Mar 13, 2015, 11:04:17 AM3/13/15
to bo...@lists.boost.org
Ion Gaztañaga wrote:
> On 13/03/2015 at 14:38, Phil Endecott wrote:
>
> > (Personally, I have a flat_map that does batched insertions i.e. the new
> > items are appended individually and then merged at the end of the batch.
> > This will have significant benefits for some workloads.)
>
> Yes, this should be added to flat_map. The interface won't be very pretty,
> but I would like to hear suggestions in this direction.

It doesn't have to have an interface, at least in principle. flat_map can
just append the new elements to the end, keeping them sorted, until it hits
a threshold, then merge. Lookups will first look at the new element range
(because those are likely to be in cache) and, if not found, at the old
element range.

You could add a "please merge now" hint, but it won't be required.

Actually, the new element range doesn't even need to be kept sorted, if the
merge threshold is low, although on the other hand this runs into op< vs
op== issues, so maybe not.

Of course, when the underlying vector runs out of capacity, a non-inplace
merge will be performed after reallocation.
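
In code, the idea might look roughly like this (a sketch only; the class
name, the merge threshold and the lack of duplicate handling are all
invented, and this is not how boost::container::flat_map currently
works):

#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T>
class deferred_merge_flat_set
{
    std::vector<T> v_;          // [0, merged_) is the old sorted range,
    std::size_t merged_ = 0;    // [merged_, size()) is the sorted "new" tail
    static const std::size_t threshold = 64;  // arbitrary merge threshold
public:
    void insert(const T& x)     // duplicate detection omitted for brevity
    {
        // keep the tail sorted while appending
        v_.insert(std::upper_bound(v_.begin() + merged_, v_.end(), x), x);
        if (v_.size() - merged_ >= threshold) {
            std::inplace_merge(v_.begin(), v_.begin() + merged_, v_.end());
            merged_ = v_.size();
        }
    }
    bool contains(const T& x) const
    {
        // probe the recently inserted (cache-warm) tail first, then the old range
        return std::binary_search(v_.begin() + merged_, v_.end(), x)
            || std::binary_search(v_.begin(), v_.begin() + merged_, x);
    }
};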

Upon a closer look at the subject line, did I just reinvent the wheel? :-)

Chris Glover

unread,
Mar 13, 2015, 11:43:07 AM3/13/15
to bo...@lists.boost.org
>
> Ion Gaztañaga wrote:
> > On 13/03/2015 at 14:38, Phil Endecott wrote:
> >
> > > (Personally, I have a flat_map that does batched insertions i.e. the new
> > > items are appended individually and then merged at the end of the batch.
>
> On Fri, 13 Mar 2015 at 11:04 Peter Dimov <li...@pdimov.com> wrote:
> It doesn't have to have an interface, at least in principle. flat_map can
> just append the new elements to the end, keeping them sorted, until it hits
> a threshold, then merge. Lookups will first look at the new element range
> (because those are likely to be in cache), if not found, at the old element
> range.
>
>

It's possible to accomplish this today, but you need to do the buffering
yourself outside of the flat_map using the ordered_unique_range_t insert
method.

http://www.boost.org/doc/libs/1_57_0/doc/html/boost/container/flat_map.html#idp36621984-bb

I use this method a lot where I figure out what I am going to be inserting
before I do the insert, then I insert in one step. Of course, this assumes
you don't need to know what's in the map while you're also figuring out
what to put in the map.
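
For example, a sketch of that pattern (the batch contents here are made
up):

#include <algorithm>
#include <utility>
#include <vector>
#include <boost/container/flat_map.hpp>

void bulk_insert_example(boost::container::flat_map<int, int>& m)
{
    // Figure out what to insert outside the map...
    std::vector<std::pair<int, int> > batch = { {42, 1}, {7, 2}, {19, 3} };
    // ...sort it by key (the range must also be free of duplicate keys)...
    std::sort(batch.begin(), batch.end());
    // ...then insert everything in one step.
    m.insert(boost::container::ordered_unique_range, batch.begin(), batch.end());
}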

-- chris

Giacomo Drago

unread,
Mar 13, 2015, 11:55:25 AM3/13/15
to bo...@lists.boost.org
On 2015-03-13 13:00, Marcelo Zimbres wrote:
> "Hi
> Would it be easy for you to provide benchmarks of std::map with
> boost::container::node_allocator available in latest boost release?
> According to my own benchmarks it can improve performance a lot,
> mainly when memory is likely to be fragmented.

Of course, I will do it ASAP.

Ion Gaztañaga

unread,
Mar 13, 2015, 12:40:33 PM3/13/15
to bo...@lists.boost.org
On 13/03/2015 at 16:04, Peter Dimov wrote:
>
> It doesn't have to have an interface, at least in principle. flat_map
> can just append the new elements to the end, keeping them sorted, until
> it hits a threshold, then merge. Lookups will first look at the new
> element range (because those are likely to be in cache), if not found,
> at the old element range.

That would complicate the container, and break some invariants, like
"equal_range". flat_map is designed to cover an scenario for more
searches than insertions, maybe another container could be designed for
a different scenario.

> Actually, the new element range doesn't even need to be kept sorted, if
> the merge threshold is low, although on the other hand this runs into
> op< vs op== issues, so maybe not.

You've touched on an interesting point. Sort+merging could improve
flat_map performance in some other scenarios. The option to do a merge
could be added to range insertion; the current implementation has rather
poor performance due to excessive data movement. In this scenario we
could back-insert, sort and merge. The problem is:

-> It requires additional memory (stable_sort is needed for
flat_multimap) to achieve O(N·log(N)). inplace_merge requires additional
memory to obtain O(N). A user might not want to pay for this (you might
need 2x temporary memory). One option is to use the remaining capacity
as additional memory by default and optionally add an insertion option
that allocates temporary storage to achieve maximum performance.

-> we need to reimplement std::sort/std::stable_sort/std::inplace_merge
in a Boost.Move-compatible way (using boost::adl_move_swap). It should
allow customizing the temporary storage source. Maybe sorting experts
would like to implement it in Boost.Move.

> Of course, when the underlying vector runs out of capacity, a
> non-inplace merge will be performed after reallocation.
>
> Upon closer look on the subject line, did I just reinvent a wheel? :-)

Maybe, but everyone should reinvent the wheel at least once in life.
It's a great pleasure!

Best,

Ion

Phil Endecott

unread,
Mar 13, 2015, 12:42:00 PM3/13/15
to bo...@lists.boost.org
Peter Dimov wrote:
> Ion Gaztañaga wrote:
>> On 13/03/2015 at 14:38, Phil Endecott wrote:
>>
>> > (Personally, I have a flat_map that does batched insertions i.e. the new
>> > items are appended individually and then merged at the end of the batch.
>> > This will have significant benefits for some workloads.)
>>
>> Yes, this should be added to flat_map. The interface won't be very pretty,
>> but I would like to hear suggestions in this direction.

I think we discussed this before. My preferred approach is

flat_set S;
{
  flat_set::batch b(S);
  b.insert(xxx);
  b.insert(yyy);
}  // b's dtor does the merge

Or, if you don't like doing things in destructors, give batch a "commit"
or "merge" method.

One question is: should lookups while a batch is in progress (a) look
only at pre-batch items, or (b) crash, or (c) work correctly (with a
linear scan of the new items)? I prefer (a).


> It doesn't have to have an interface, at least in principle. flat_map can
> just append the new elements to the end, keeping them sorted, until it hits
> a threshold, then merge. Lookups will first look at the new element range
> (because those are likely to be in cache), if not found, at the old element
> range.

I once tried to work out the complexity of operations on this structure,
and I don't think the results were very impressive.

Say you have N elements in the "main" container, and M elements in the
"recently added" portion. Insertion will be O(M) due to moving the older
"new" elements, plus some amortised amount for the eventual merge into the
main container. Lookup will be O(log N + log M). Traversal is O(N + M)
but with a significant constant factor as you have to compare main and new
elements as you go.

I think a useful maximum M is sqrt(N): each merge costs O(N + M) and is
amortised over the M insertions of a batch, i.e. roughly O(N/M) per
insertion, which balances the O(M) per-insertion shifting cost when M is
about sqrt(N). If M is smaller than sqrt(N), then you are doing the
merge often enough that the amortised merge complexity exceeds the
insertion complexity.

It is also possible to make this a recursive data structure, i.e. the
"recently added" portion is the same sort of container with its own
"very recently added" portion, ad infinitum. I don't think I managed
to work out the complexity of that structure. I suspect that there is a
provable lower bound on what can be achieved with "flat" data structures,
though.


Regards, Phil.

Peter Dimov

unread,
Mar 13, 2015, 3:28:39 PM3/13/15
to bo...@lists.boost.org
Phil Endecott wrote:

> I think we discussed this before. My preferred approach is
>
> flat_set S;
> {
> flat_set::batch b(S);
> b.insert(xxx);
> b.insert(yyy);
> } // b's dtor does the merge

What do you do when b.insert(yyy) throws?

Giacomo Drago

unread,
Mar 14, 2015, 5:30:30 AM3/14/15
to bo...@lists.boost.org
On 2015-03-13 08:27, Dominique Devienne wrote:
> On Fri, Mar 13, 2015 at 9:22 AM, Ion Gaztañaga <igazt...@gmail.com> wrote:
>
> Might also be interesting to know if your perf numbers are based on a
> primitive key or a UDT, and whether the relative performance changes
> depending on key/value sizes. --DD
>

On 2015-03-13 09:11, Giacomo Drago wrote:
> On 2015-03-13 08:27, Dominique Devienne wrote:
>> On Fri, Mar 13, 2015 at 9:22 AM, Ion Gaztañaga <igazt...@gmail.com>
>> wrote:
>>>
>>> Just a question. Are your insertions one by one or by range?
>>>
>>
>> Might also be interesting to know if your perf numbers are based on a
>> primitive key or a UDT, and whether the relative performance changes
>> depending on key/value sizes. --DD

This is what happens with a key made of four integers rather than just
an integer.

Size: 50000
std::map:
Random insertion: 19.881 ms
Random find: 19.87 ms
Iteration: 6.241 ms
Random erase: 42.872 ms
experimental::flat_map:
Random insertion: 509.001 ms
Random find: 14.922 ms
Iteration: 9.036 ms
Random erase: 727.366 ms
boost::container::flat_map:
Random insertion: 1049.87 ms
Random find: 7.778 ms
Iteration: 0.082 ms
Random erase: 885.753 ms

Size: 200000
std::map:
Random insertion: 124.174 ms
Random find: 133.111 ms
Iteration: 21.915 ms
Random erase: 253.892 ms
experimental::flat_map:
Random insertion: 8181.72 ms
Random find: 96.38 ms
Iteration: 45.847 ms
Random erase: 10951.3 ms
boost::container::flat_map:
Random insertion: 20271.5 ms
Random find: 53.225 ms
Iteration: 0.455 ms
Random erase: 17338.5 ms

Best
Giacomo

Phil Endecott

unread,
Mar 14, 2015, 9:10:26 AM3/14/15
to bo...@lists.boost.org
Peter Dimov wrote:
> Phil Endecott wrote:
>
>> I think we discussed this before. My preferred approach is
>>
>> flat_set S;
>> {
>> flat_set::batch b(S);
>> b.insert(xxx);
>> b.insert(yyy);
>> } // b's dtor does the merge
>
> What do you do when b.insert(yyy) throws?

You have at least a couple of choices, e.g. the dtor merges
the successfully-inserted values or it discards everything.

Of course the next question is whether either merging or
discarding can also throw. I would hope that discarding
(i.e. truncating the vector back to its original size)
could reasonably be expected never to throw, right?
Merging would require that the comparisons cannot throw.


Regards, Phil.

Giacomo Drago

unread,
Mar 14, 2015, 8:34:10 PM3/14/15
to bo...@lists.boost.org
On 2015-03-13 15:55, Giacomo Drago wrote:
> On 2015-03-13 13:00, Marcelo Zimbres wrote:
>> "Hi
>> Would it be easy for you to provide benchmarks of std::map with
>> boost::container::node_allocator available in latest boost release?
>> According to my own benchmarks it can improve performance a lot,
>> mainly when memory is likely to be fragmented.
>
> Of course, I will do it ASAP.
>

Hi Marcelo,

I tried to do some benchmarks with boost::container::node_allocator. I
found few or no examples of how to use it. I tried to provide it as the
fourth template parameter of std::map and got weird compiler errors, so
I fell back to boost::container::map.
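
For reference, the instantiation benchmarked below looks roughly like
this (the alias name is just for illustration):

#include <functional>
#include <utility>
#include <boost/container/map.hpp>
#include <boost/container/node_allocator.hpp>

// boost::container::map with Boost's pooling node_allocator; the
// allocator is parameterised on the map's value_type.
using node_alloc_map = boost::container::map<
    int, int, std::less<int>,
    boost::container::node_allocator<std::pair<const int, int> > >;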

This is what I get with integer keys:

Size: 50000
std::map:
Random insertion: 16.916 ms
Random find: 24.702 ms
Iteration: 7.528 ms
Random erase: 40.328 ms
boost::container::map<..., boost::container::node_allocator<...>>:
Random insertion: 13.085 ms
Random find: 11.189 ms
Iteration: 5.301 ms
Random erase: 34.896 ms
experimental::flat_map:
Random insertion: 347.988 ms
Random find: 11.603 ms
Iteration: 7.723 ms
Random erase: 494.447 ms
boost::container::flat_map:
Random insertion: 644.319 ms
Random find: 5.481 ms
Iteration: 0.106 ms
Random erase: 624.03 ms

Size: 200000
std::map:
Random insertion: 120.064 ms
Random find: 142.521 ms
Iteration: 22.736 ms
Random erase: 212.725 ms
boost::container::map<..., boost::container::node_allocator<...>>:
Random insertion: 106.19 ms
Random find: 86.353 ms
Iteration: 19.375 ms
Random erase: 169.475 ms
experimental::flat_map:
Random insertion: 5207.11 ms
Random find: 72.71 ms
Iteration: 36.488 ms
Random erase: 6861.92 ms
boost::container::flat_map:
Random insertion: 10995 ms
Random find: 50.254 ms
Iteration: 0.317 ms
Random erase: 10924.4 ms


This what I get with a 4-integer struct as the key:

Size: 50000
std::map:
Random insertion: 15.269 ms
Random find: 19.85 ms
Iteration: 7.1 ms
Random erase: 45.85 ms
boost::container::map<..., boost::container::node_allocator<...>>:
Random insertion: 18.278 ms
Random find: 31.747 ms
Iteration: 4.911 ms
Random erase: 33.039 ms
experimental::flat_map:
Random insertion: 551.777 ms
Random find: 12.955 ms
Iteration: 7.921 ms
Random erase: 688.911 ms
boost::container::flat_map:
Random insertion: 1040.7 ms
Random find: 7.71 ms
Iteration: 0.161 ms
Random erase: 870.031 ms

Size: 200000
std::map:
Random insertion: 128.527 ms
Random find: 137.222 ms
Iteration: 21.572 ms
Random erase: 206.251 ms
boost::container::map<..., boost::container::node_allocator<...>>:
Random insertion: 124.658 ms
Random find: 152.009 ms
Iteration: 23.368 ms
Random erase: 193.538 ms
experimental::flat_map:
Random insertion: 7969.37 ms
Random find: 92.236 ms
Iteration: 41.396 ms
Random erase: 10724.9 ms
boost::container::flat_map:
Random insertion: 19883.7 ms
Random find: 57.496 ms
Iteration: 0.459 ms
Random erase: 17759.1 ms


The size of the keys seems to make quite a difference. And of course,
one has to consider the overhead given by the pointers in a linked
structure, however allocations are performed.

What are your thoughts?

Giacomo Drago

unread,
Mar 14, 2015, 8:40:16 PM3/14/15
to bo...@lists.boost.org
OK, it is time for me to decide whether to submit this or not. Can this
flat_map reasonably make it into Boost? If not, I'll push it on GitHub
after polishing the source code.

I thank you all for your kind feedback!

Giacomo

Phil Endecott

unread,
Mar 15, 2015, 12:15:30 PM3/15/15
to bo...@lists.boost.org
Giacomo Drago wrote:
> OK, it is time for me to decide whether to submit this or not.

Tell us how it works!

(If you don't want to tell us how it works, then no, this is
not suitable for Boost.)


Regards, Phil.

Giacomo Drago

unread,
Mar 15, 2015, 2:14:50 PM3/15/15
to bo...@lists.boost.org
On 2015-03-15 16:15, Phil Endecott wrote:
> Giacomo Drago wrote:
>> OK, it is time for me to decide whether to submit this or not.
>
> Tell us how it works!
>
> (If you don't want to tell us how it works, then no, this is
> not suitable for Boost.)
>

Of course I want to; it is not a secret. But usually people can express
a (lack of) interest in something without knowing "how it works," as in:
"No, we don't need a time machine that brings you to the same time you
are already at when you use it." By saying so, you save some time for
the self-deluded inventor, and for yourselves.

Anyway, I finally managed to find some time to publish the source code
with a brief explanation: https://github.com/giacomodrago/flat_map

Feel free to challenge whatever part of it. ;) Please don't remark (at
least not now) on the absence of proper iterators or exception safety,
or on any other implementation detail, because at the moment I still
have to figure out whether this data structure is a reasonable idea in
the first place. There's time for polishing it. Of course, if a bug is
affecting the benchmarks, please tell me: it is not intentional.

Best
Giacomo

Marcelo Zimbres

unread,
Mar 16, 2015, 5:17:31 AM3/16/15
to bo...@lists.boost.org
"The size of the keys seems to make quite a difference. And of course, one has
to consider the overhead given by the pointers in a linked structure, however
allocations are performed.

What are your thoughts?"

Hi Giacomo,

thanks for providing these benchmarks with boost::container::node_allocator.
My point is that the performance of std::map depends so much on the allocation
strategy used that it is almost unfair to compare its benchmark when using the
standard allocator. One can draw conclusions about its bad performance when
the real problem may be the allocator used. I see, for example, that for size
50000 the boost::container::map with the node allocator performs better than
your experimental map in all cases (especially on random insertion and erase).

You may want to have a look on this plot [1] of my own benchmarks of std::set.
It will give you an idea of how much the allocator influences the performance.

Regards,
Marcelo

[1] https://github.com/mzimbres/rtcpp/blob/master/fig/std_set_insertion.png

Giacomo Drago

unread,
Mar 16, 2015, 6:52:45 AM3/16/15
to bo...@lists.boost.org
On 2015-03-15 22:33, Marcelo Zimbres wrote:
> "The size of the keys seems to make quite a difference. And of course, one has
> to consider the overhead given by the pointers in a linked structure, however
> allocations are performed.
>
> What are your thoughts?"
>
> Hi Giacomo,
>
> thanks for providing these benchmarks with booost::container::node_allocator.
> My point is that the performance of std::map depends so much on the allocation
> strategy used that it is almost unfair to compare its benchmark when using the
> standard allocator. One can draw conclusions about its bad performance when
> the real problem may be on the allocator used. I see for example that for size
> 50000 the boost::container::map with the node allocator performs better than
> your experimental map in all cases (specially on random insertion and erase).
>
> You may want to have a look on this plot [1] of my own benchmarks of std::set.
> It will give you an idea of how much the allocator influences the performance.
>

Thank you for your feedback.

While I see your point, I think one does not choose a flat_map over a
linked structure only for having faster lookups. We are talking about an
associative container with no memory overhead, and, possibly, faster
lookups. When I am in need of an associative, sorted container, in 95%
of the cases I would opt for the vanilla std::map, with or without a
better allocator. But we are addressing specific needs here.

The comparison I am making is between boost's flat_map and my
"experimental" flat_map, with a linked-structure map providing just the
baseline for understanding the trade-offs involved. Furthermore,
comparing insertion/erasure performance of std::map (with or without
boost::container::node_allocator) with insertion/erasure in whatever
flat map is... uhm... pointless. It's... well... another Big-O. Everybody
knows a linked structure has blazing fast insertions and erasures
compared to a vector where you have to move elements around.

I also point you to the size 50000 benchmark where the key is a
4-integer POD.

Said that, thank you for talking about the importance of allocators:
I'll keep that in mind for future benchmarks.

Mathias Gaunard

unread,
Mar 18, 2015, 5:37:46 PM3/18/15
to bo...@lists.boost.org
On 13/03/15 17:41, Phil Endecott wrote:
> Peter Dimov wrote:
>> Ion Gaztañaga wrote:
>>> On 13/03/2015 at 14:38, Phil Endecott wrote:
>>>
>>> > (Personally, I have a flat_map that does batched insertions i.e. the new
>>> > items are appended individually and then merged at the end of the batch.
>>> > This will have significant benefits for some workloads.)
>>>
>>> Yes, this should be added to flat_map. The interface won't be very
>>> pretty, but I would like to hear suggestions in this direction.
>
> I think we discussed this before. My preferred approach is
>
> flat_set S;
> {
> flat_set::batch b(S);
> b.insert(xxx);
> b.insert(yyy);
> } // b's dtor does the merge
>

How is that any different from the following?

flat_set S;
flat_set b;
b.insert(xxx);
b.insert(yyy);
S.insert_sorted(b.begin(), b.end());

Ion Gaztañaga

unread,
Mar 19, 2015, 4:29:54 AM3/19/15
to bo...@lists.boost.org
On 18/03/2015 at 22:37, Mathias Gaunard wrote:

>
> How is that any different from the following?
>
> flat_set S;
> flat_set b;
> b.insert(xxx);
> b.insert(yyy);
> S.insert_sorted(b.begin(), b.end());

Currently you can do this:

S.insert(ordered_unique_range, b.begin(), b.end());

It's more efficient than a loop inserting one by one, but it can still
be improved. If:

- distance(b.begin(), b.end()) == M and
- S.size() == N

then currently (Boost 1.58) it does MlogN comparisons and at most N + M
moves/copies for bidirectional iterators. It allocates extra memory for
that (memory for insertion positions/indexes that allows linear
moves/copies when inserting the M objects in those positions).

Currently I'm working on vector::merge/merge_unique functions that need
N + M comparisons and moves, while trying to minimize extra memory (if
vector memory needs to be reallocated because S.capacity() < N + M, it
uses a std::merge variant to achieve O(N)). I plan this improvement for
Boost 1.59. If there is enough room in the current vector memory to
store N elements plus indexes, it also achieves O(N) without allocating
a new buffer. An optional refinement would be to fall back to MlogN
(using binary search) instead of N+M comparisons when N >> M.

I have plans to improve non-ordered range insertion to achieve optimal
complexity, but that requires storing and sorting (stable sorting for
non-unique containers) the range [b.begin(), b.end()) in auxiliary
memory, and then merging the newly ordered M elements with the already
stored N elements. Using std::stable_sort/sort +
std::merge/inplace_merge is not optimal, as they require additional data
movements and memory allocations (e.g. memory from
get_temporary_storage() is not reused). Since this optimal solution
requires reimplementing stable_sort/sort/inplace_merge to add support
for external temporary storage and Boost.Move, I don't think it's
feasible in the short term.

Best,

Ion

Phil Endecott

unread,
Mar 19, 2015, 11:56:23 AM3/19/15
to bo...@lists.boost.org
Mathias Gaunard <mathias...@ens-lyon.org> wrote:
> On 13/03/15 17:41, Phil Endecott wrote:
>> flat_set S;
>> {
>> flat_set::batch b(S);
>> b.insert(xxx);
>> b.insert(yyy);
>> } // b's dtor does the merge
>>
>
> How is that any different from the following?
>
> flat_set S;
> flat_set b;
> b.insert(xxx);
> b.insert(yyy);
> S.insert_sorted(b.begin(), b.end());

Your code uses more memory (temporarily) and does more copying.

To be clear, my "batch" is a small object that stores only a
reference to the flat_set and some iterators/indexes/pointers.
Here's a sketch of one way of doing it:

class batch
{
  flat_set& f;
  size_t orig_size;
public:
  batch(flat_set& f_): f(f_), orig_size(f.size()) {}
  void insert(const T& val) { f.push_back(val); }
  void rollback() { f.truncate(orig_size); }
  void commit() {
    sort(f.begin()+orig_size, f.end());
    inplace_merge(f.begin(), f.begin()+orig_size, f.end());
    orig_size = f.size();
  }
  ~batch() { commit(); }
};


Regards, Phil.

Phil Endecott

unread,
Mar 19, 2015, 1:05:59 PM3/19/15
to bo...@lists.boost.org
Giacomo Drago <gia...@giacomodrago.com> wrote:
> Anyway, I finally managed to find some time to publish the source code
> with a brief explanation: https://github.com/giacomodrago/flat_map

I think your "brief explanation" is this sentence:

[The elements] are organised in a heap where each level is kept sorted.

OK. If there is more that I've missed, please let me know.

That almost sounds like an implicit binary tree, except that I guess
you're storing the min item in the parent, not the mid. I.e. to store
the values 0 to 6, an implicit binary tree would store:

3 1 5 0 2 4 6

The children of item i are at locations 2i+1 and 2i+2. To iterate the
items in order, you do a mid-order traversal.

But a (min-) heap with sorted levels would be more like this:

0 1 4 2 3 5 6

In this case you can do a pre-order traversal to get the items in-order.

Have I understood correctly?
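
For what it's worth, here is a small sketch of the two traversals I am
describing, assuming the usual 2i+1/2i+2 child indexing (illustrative
only):

#include <cstddef>
#include <vector>

// Mid-order (in-order) traversal: visits {3, 1, 5, 0, 2, 4, 6} as 0..6.
void visit_inorder(const std::vector<int>& a, std::size_t i = 0)
{
    if (i >= a.size()) return;
    visit_inorder(a, 2 * i + 1);
    /* process a[i] here */
    visit_inorder(a, 2 * i + 2);
}

// Pre-order traversal: visits {0, 1, 4, 2, 3, 5, 6} as 0..6, when the
// heap happens to be laid out that way (see the discussion below).
void visit_preorder(const std::vector<int>& a, std::size_t i = 0)
{
    if (i >= a.size()) return;
    /* process a[i] here */
    visit_preorder(a, 2 * i + 1);
    visit_preorder(a, 2 * i + 2);
}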

An implicit tree can clearly do lookup in O(log N). Insertion at a leaf
is also O(log N), but I'm not sure about rebalancing; I suspect that each
time you move a node you need to move all of its children, which obviously
gets expensive.

You say that you think your lookup is O(log^2 N), so maybe I've not
correctly understood your design.

The last time I looked at implicit binary trees, it was with the aim of
improving locality of reference: a binary search in a flat_map, although
theoretically O(log N), has terribly unfriendly VM and cache behaviour
for large N. This is the same argument as for B-trees.


Regards, Phil.

P.S. Have you looked at Boost.Heap?

Giacomo Drago

unread,
Mar 19, 2015, 3:22:17 PM3/19/15
to bo...@lists.boost.org
Thanks for your kind reply, Phil. I will answer your questions the best
I can.

On 2015-03-19 17:05, Phil Endecott wrote:
> That almost sounds like an implicit binary tree, except that I guess
> you're storing the min item in the parent, not the mid. I.e. to store
> the values 0 to 6, an implicit binary tree would store:
>
> 3 1 5 0 2 4 6
>
> The children of item i are at locations 2i+1 and 2i+2. To iterate the
> items in order, you do a mid-order traversal.
>
> But a (min-) heap with sorted levels would be more like this:
>
> 0 1 4 2 3 5 6
>
> In this case you can do a pre-order traversal to get the items in-order.

A simple pre-order traversal? Does it work? In your example it clearly
does, but you can also have

[ 0 ] [ 1 4 ] [ 2 5 6 7 ]

In that case, how do you 'jump' from 2 up to 4 and then down again to 5?
How do you know "who's next" (as it can be someone ten levels above)? Or
am I just confused and talking nonsense?

> You say that you think your lookup is O(log^2 N), so maybe I've not
> correctly understood your design.

I basically do a per-level binary search, with some minor optimizations
that are easily understood by reading the code (but I can elaborate, if
I am making some sense so far and you wish me to continue). How can you
search in O(log N) in a heap where the elements of each level are sorted
(not antagonizing, genuinely just asking out of curiosity)?
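
To make the level-wise search concrete, a rough sketch (not the actual
implementation; it assumes a plain std::vector laid out as a binary heap
where level l occupies indices [2^l - 1, 2^(l+1) - 1)):

#include <algorithm>
#include <cstddef>
#include <vector>

// Each level is individually sorted, so every level can be
// binary-searched: O(log n) levels * O(log n) per level = O((log n)^2).
template <typename T>
bool level_wise_contains(const std::vector<T>& v, const T& key)
{
    std::size_t first = 0;  // index of the first slot of the current level
    std::size_t width = 1;  // number of slots in the current level
    while (first < v.size()) {
        std::size_t last = std::min(first + width, v.size());
        if (std::binary_search(v.begin() + first, v.begin() + last, key))
            return true;
        first += width;
        width *= 2;
    }
    return false;
}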

If I convinced you so far, then we can get into how insertion and
erasure work.

> P.S. Have you looked at Boost.Heap?
I used Boost.Heap with great profit in the past. I didn't use any Boost
library for this project because the implementation is a throwaway
prototype, and hacking up a modified heap by means of a std::vector
was, for me, the fastest way to get the job done - for now.
In the unlikely case this strange and hardly useful data structure made
it into Boost, it would be rewritten and properly Boostified, without
any reinvention of the wheel of course, and stress tested to death.

Best
Giacomo

Phil Endecott

unread,
Mar 21, 2015, 1:00:34 PM3/21/15
to bo...@lists.boost.org
Giacomo Drago wrote:
> On 2015-03-19 17:05, Phil Endecott wrote:
>> That almost sounds like an implicit binary tree, except that I guess
>> you're storing the min item in the parent, not the mid. I.e. to store
>> the values 0 to 6, an implicit binary tree would store:
>>
>> 3 1 5 0 2 4 6
>>
>> The children of item i are at locations 2i+1 and 2i+2. To iterate the
>> items in order, you do a mid-order traversal.
>>
>> But a (min-) heap with sorted levels would be more like this:
>>
>> 0 1 4 2 3 5 6
>>
>> In this case you can do a pre-order traversal to get the items in-order.
>
> A simple pre-order traversal? Does it work? In your example it clearly
> does, but you can also have
>
> [ 0 ] [ 1 4 ] [ 2 5 6 7 ]

Well that doesn't look like "each level is kept sorted" to me, but
thanks for the further explanation of what you're actually doing.

It is possible that this is a novel structure. I'm not convinced
that it is useful. If you want a reasonably compact representation,
i.e. without the overhead of per-item pointers, but need fast
insertion and lookup (both in terms of computational complexity and
absolute speed), then an in-memory B tree is hard to beat.


Regards, Phil.

Giacomo Drago

unread,
Mar 21, 2015, 1:23:11 PM3/21/15
to bo...@lists.boost.org
On 2015-03-21 17:00, Phil Endecott wrote:
> Giacomo Drago wrote:
> Well that doesn't look like "each level is kept sorted" to me, but
> thanks for the further explanation of what you're actually doing.
Well, it is *exactly* "each level is kept sorted", literally. :)

> It is possible that this is a novel structure. I'm not convinced
> that it is useful. If you want a reasonably compact representation,
> i.e. without the overhead of per-item pointers, but need fast
> insertion and lookup (both in terms of computational complexity and
> absolute speed), then an in-memory B tree is hard to beat.

I see. Can you provide me with an implementation of such a data
structure having the same memory footprint as a flat_map (no per-item
pointers) and better performance characteristics? I look forward to
seeing it.

Best
Giacomo

Phil Endecott

unread,
Mar 22, 2015, 10:41:25 AM3/22/15
to bo...@lists.boost.org
Giacomo Drago wrote:
> On 2015-03-21 17:00, Phil Endecott wrote:
>> Giacomo Drago wrote:
>> Well that doesn't look like "each level is kept sorted" to me, but
>> thanks for the further explanation of what you're actually doing.
> Well, it is *exactly* "each level is kept sorted", literally. :)

To be clear: in your structure, each pair of siblings is ordered.
But those elements are not ordered with respect to the other items
at the same level (the "cousins").

>> It is possible that this is a novel structure. I'm not convinced
>> that it is useful. If you want a reasonably compact representation,
>> i.e. without the overhead of per-item pointers, but need fast
>> insertion and lookup (both in terms of computational complexity and
>> absolute speed), then an in-memory B tree is hard to beat.
>
> I see. Can you provide me with an implementation of such data structure
> having the same memory footprint of a flat_map (no per-item pointers)
> and better performance characteristics? I look forward to seeing it.

There is an in-memory B-tree at google code that looks decent, though
I've not personally used it:

http://code.google.com/p/cpp-btree/

If you click "Wiki pages - UsageInstructions" at the bottom left of
that page you'll find tables giving the memory overhead. A B-tree has
no per-item pointers, but it does have per-page pointers; you can
asymptotically approach zero overhead by increasing the page size.
With an infinite page size you effectively have a flat_map, with
a page size of one you have a binary tree.

In practice, a more important influence on memory footprint than the
per-page pointers is the empty space in the pages that results from
splitting after insertion, and after deletion but before merging.
But a flat_map implemented on a std::vector has a similar issue, i.e.
that std::vector doubles its size when it re-allocates. So a
std::vector will be wasting up to half of the memory that it has
allocated.


Regards, Phil.

Giacomo Drago

unread,
Mar 22, 2015, 10:58:44 AM3/22/15
to bo...@lists.boost.org
Hi Phil, thanks for following up.

On 2015-03-22 14:41, Phil Endecott wrote:
> To be clear: in your structure, each pair of siblings is ordered.
> But those elements are not ordered with respect to the other items
> at the same level (the "cousins").

Well... To be clear: they are. For each level of the binary heap (if we
agree on the definition of level) the elements are sorted. Indeed,
that's basically the whole point of it, otherwise I couldn't make a
level-wise binary search when looking up, and I would need some sort of
sorcery to get those benchmarks. Are we... on the same page? Am I
missing something?


> There is an in-memory B-tree at google code that looks decent, though
> I've not personally used it:
>
> http://code.google.com/p/cpp-btree/
>
> If you click "Wiki pages - UsageInstructions" at the bottom left of
> that page you'll find tables giving the memory overhead. A B-tree has
> no per-item pointers, but it does have per-page pointers; you can
> asymptotically approach zero overhead by increasing the page size.
> With an infinite page size you effectively have a flat_map, with
> a page size of one you have a binary tree.
>
> In practice, a more important influence on memory footprint than the
> per-page pointers is the empty space in the pages that results from
> splitting after insertion, and after deletion but before merging.
> But a flat_map implemented on a std::vector has a similar issue, i.e.
> that std::vector doubles its size when it re-allocates. So a
> std::vector will be wasting up to half of the memory that it has
> allocated.

On a vector you can easily shrink_to_fit, thus bringing the overhead to
zero (if you neglect the vector size and a pointer to its buffer). I'll
study the B-tree and its trade-offs - I admit my ignorance in this
particular area - but I still think a flat_map (not necessarily this
flat_map in particular) has a reason to exist.

Best
Giacomo

Phil Endecott

unread,
Mar 22, 2015, 12:40:40 PM3/22/15
to bo...@lists.boost.org
Giacomo Drago wrote:
> On 2015-03-22 14:41, Phil Endecott wrote:
>> There is an in-memory B-tree at google code that looks decent, though
>> I've not personally used it:
>>
>> http://code.google.com/p/cpp-btree/
>>
>> If you click "Wiki pages - UsageInstructions" at the bottom left of
>> that page you'll find tables giving the memory overhead. A B-tree has
>> no per-item pointers, but it does have per-page pointers; you can
>> asymptotically approach zero overhead by increasing the page size.
>> With an infinite page size you effectively have a flat_map, with
>> a page size of one you have a binary tree.
>>
>> In practice, a more important influence on memory footprint than the
>> per-page pointers is the empty space in the pages that results from
>> splitting after insertion, and after deletion but before merging.
>> But a flat_map implemented on a std::vector has a similar issue, i.e.
>> that std::vector doubles its size when it re-allocates. So a
>> std::vector will be wasting up to half of the memory that it has
>> allocated.
>
> On a vector you can easily shrink_to_fit thus bringing the overhead to
> zero (if you neglect the vector size and a pointer to its buffer).

And you can compact ("vacuum") a B-tree, if you want.

> I still think a flat_map (that does not mean this
> flat_map in particular) has some reason to be.

For me, the main uses of flat data structures have been:

(a) When it's important to have no pointers at all, so that I can store
the data in a memory-mapped file or perhaps in interprocess shared
memory. (Though most recently I decided to use offset pointers from
Boost.Interprocess along with a Boost.Intrusive map, and that worked
well.)

(b) When the data has a "life cycle" with different phases, e.g. an
initial phase of bulk creation, followed by a phase of random lookup.
In this case you only sort once so the computational complexity is
optimal.

I would not want to use them when either you have fine-grained mixing
of insertion and lookup, or when the amount of data is large enough
that the poor locality of reference starts to dominate.


Regards, Phil.

Giacomo Drago

unread,
Mar 22, 2015, 1:13:36 PM3/22/15
to bo...@lists.boost.org
On 2015-03-22 16:40, Phil Endecott wrote:
> And you can compact ("vacuum") a B-tree, if you want.

I see.


> (a) When it's important to have no pointers at all, so that I can store
> the data in a memory-mapped file or perhaps in interprocess shared
> memory. (Though most recently I decided to use offset pointers from
> Boost.Interprocess along with a Boost.Intrusive map, and that worked
> well.)
>
> (b) When the data has a "life cycle" with different phases, e.g. an
> initial phase of bulk creation, followed by a phase of random lookup.
> In this case you only sort once so the computational complexity is
> optimal.
>
> I would not want to use them when either you have fine-grained mixing
> of insertion and lookup, or when the amount of data is large enough
> that the poor locality of reference starts to dominate.

You make a good point here!


From your silence on the "sorted vs non-sorted" issue, I am positive we
are on the same page about that, regardless of its relevance.


Best
Giacomo

Phil Endecott

unread,
Mar 24, 2015, 3:32:12 PM3/24/15
to bo...@lists.boost.org
Giacomo Drago wrote:
> From your silence on the "sorted vs non sorted" issue, I am positive we
> are on the same page about that, regardless its relevance.

Yes, I have finally worked out what you're doing there.

It seems to me that for an insertion, you'll potentially have to
move all of the bottom level; that will be up to half of the
container. If my maths is right, you'll do on average one third
of the number of moves that a flat_map will do.

Another way to get the same saving would be to have N normal
flat_maps (where N is e.g. 2 or 3). Insert into whichever is
currently smallest, and lookup in both. That has the same
typical and worst-case complexity as a flat_map, but you can
trade off (by constant factors) the cost of insertion vs. the
cost of lookup and iteration by adjusting N. Looking at some
of your benchmark numbers, you seem to be getting about X2 for
insertion and erasing, vs. X0.66 for find. So I would expect
a "flat map pair" to get comparable performance, and without
losing the worst-case complexity of flat_map.
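
A bare-bones sketch of the idea with N = 2 (the names are invented, and
erase/iteration are glossed over):

#include <boost/container/flat_map.hpp>

// Two flat_maps behind one interface: insert into the smaller one,
// search both.
template <typename K, typename V>
class flat_map_pair
{
    boost::container::flat_map<K, V> a_, b_;
public:
    void insert(const K& k, const V& v)
    {
        if (a_.count(k) || b_.count(k)) return;  // keep keys unique across both
        (a_.size() <= b_.size() ? a_ : b_).emplace(k, v);
    }
    const V* find(const K& k) const
    {
        auto it = a_.find(k);
        if (it != a_.end()) return &it->second;
        it = b_.find(k);
        return it != b_.end() ? &it->second : nullptr;
    }
};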


Regards, Phil.

Giacomo Drago

unread,
Apr 7, 2015, 9:32:15 AM4/7/15
to bo...@lists.boost.org
On 2015-03-24 19:32, Phil Endecott wrote:
> It seems to me that for an insertion, you'll potentially have to
> move all of the bottom level; that will be up to half of the
> container. If my maths is right, you'll do on average one third
> of the number of moves that a flat_map will do.

I concur with you.


> Another way to get the same saving would be to have N normal
> flat_maps (where N is e.g. 2 or 3). Insert into whichever is
> currently smallest, and lookup in both. That has the same
> typical and worst-case complexity as a flat_map, but you can
> trade off (by constant factors) the cost of insertion vs. the
> cost of lookup and iteration by adjusting N.

This is simple and brilliant, and can help me with the actual scenario I
am dealing with. Do you mind if I use this idea and credit you
appropriately? At this point I think there is no reason to submit this
library to Boost.


Best
Giacomo

Phil Endecott

unread,
Apr 8, 2015, 5:40:14 AM4/8/15
to bo...@lists.boost.org
Giacomo Drago wrote:
> On 2015-03-24 19:32, Phil Endecott wrote:
>> Another way to get the same saving would be to have N normal
>> flat_maps (where N is e.g. 2 or 3). Insert into whichever is
>> currently smallest, and lookup in both. That has the same
>> typical and worst-case complexity as a flat_map, but you can
>> trade off (by constant factors) the cost of insertion vs. the
>> cost of lookup and iteration by adjusting N.
>
> This is simple and brilliant, and can help me with the actual scenario I
> am dealing with. Do you mind if I use this idea, and I appropriately
> mention you?

You're welcome, and there is no need for any attribution.


Regards, Phil.