
what is the fastest way to store a list of unique pointers ?


Lynn McGuire

Aug 11, 2017, 6:12:11 PM
What is the fastest way to store a list of unique pointers ?

We were storing a list of unique pointers in a std::vector. But
searching it before deciding whether to add a new pointer is slow. So
one of my programmers changed the std::vector to a std::map. That way we
always "add" the pointer to the container and never have to look it up
first; the mapped value is always just set to 1. The resulting std::map
uses about 10% of the time that the std::vector did.

std::map<DataGroup *, int> aList;

void addNewPointer (DataGroup * newPtr)
{
   aList[newPtr] = 1;
}

Thanks,
Lynn

Alf P. Steinbach

Aug 11, 2017, 6:30:54 PM
Have you tried a `std::unordered_set`?
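E.g., a minimal sketch of the change (untested, built on the code you posted):

#include <unordered_set>

std::unordered_set<DataGroup *> aList;

void addNewPointer (DataGroup * newPtr)
{
   // insert() is a no-op when the pointer is already present,
   // and it is O(1) on average instead of O(log n) for std::map.
   aList.insert (newPtr);
}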


Cheers!,

- Alf

Öö Tiib

Aug 11, 2017, 6:40:11 PM
On Saturday, 12 August 2017 01:12:11 UTC+3, Lynn McGuire wrote:
> What is the fastest way to store a list of unique pointers ?

What does the program do with such storage? It creates, reads, updates
and deletes (CRUD) its elements. So how "fast" a way of storing them is
depends on the patterns and proportions of those elementary CRUD
operations in the program.

>
> We were storing a list of unique pointers in a std::vector list. But
> searching it is slow before deciding to add a new pointer.

If searching is important, why didn't you keep the vector sorted?
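For example something along these lines (just a sketch, not compiled):

#include <algorithm>
#include <functional>
#include <vector>

std::vector<DataGroup *> aList;   // kept sorted at all times

void addNewPointer (DataGroup * newPtr)
{
   // std::less gives a total order even for unrelated pointers.
   auto pos = std::lower_bound (aList.begin (), aList.end (), newPtr,
                                std::less<DataGroup *> ());
   if (pos == aList.end () || *pos != newPtr)
      aList.insert (pos, newPtr);   // O(log n) search, O(n) shift on insert
}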

> So one of my programmers changed the std::vector list to a std::map.
> That way we always "add" the pointer to the list but never have to
> look it up. But, the resulting value is always just set to 1.

If the value is bogus, then why didn't you use a set or unordered_set?

> The resulting std::map uses about 10% of the time that the std::vector
> did.
>
> std::map <DataGroup * ptr, int> aList;
>
> void addNewPointer (DataGroup * newPtr)
> {
> aList [newPtr] = 1;
> }

It may well be that the speed of adding new pointers is the most important
property of the container, but it may also be that it is not.

Manfred

Aug 11, 2017, 7:21:53 PM
I think that std::set is the closest to the current behaviour (std::map
using only keys). And I think it is faster than unordered_set too,
although I have to say I did not check.

>
>
> Cheers!,
>
> - Alf

Alf P. Steinbach

Aug 11, 2017, 7:27:28 PM
On 12.08.2017 01:21, Manfred wrote:
> On 08/12/2017 12:30 AM, Alf P. Steinbach wrote:
>> On 12.08.2017 00:11, Lynn McGuire wrote:
>>> What is the fastest way to store a list of unique pointers ?
>>>
>>> We were storing a list of unique pointers in a std::vector list. But
>>> searching it is slow before deciding to add a new pointer. So one of
>>> my programmers changed the std::vector list to a std::map. That way
>>> we always "add" the pointer to the list but never have to look it up.
>>> But, the resulting value is always just set to 1. The resulting
>>> std::map uses about 10% of the time that the std::vector did.
>>>
>>> std::map <DataGroup * ptr, int> aList;
>>>
>>> void addNewPointer (DataGroup * newPtr)
>>> {
>>> aList [newPtr] = 1;
>>> }
>>
>> Have you tried a `std::unordered_set`?
>
> I think that std::set is the closest to the current behaviour (std::map
> using only keys).

Can't say that I understand what you're thinking of here.


> And I think it is faster than unordered_set too,
> although I have to say I did not check.

Not this either. Since when did hashing become slower than a tree search?



Cheers!,

- Alf

Manfred

Aug 11, 2017, 7:39:07 PM
On 08/12/2017 01:27 AM, Alf P. Steinbach wrote:
> On 12.08.2017 01:21, Manfred wrote:
>> On 08/12/2017 12:30 AM, Alf P. Steinbach wrote:
>>> On 12.08.2017 00:11, Lynn McGuire wrote:
>>>> What is the fastest way to store a list of unique pointers ?
>>>>
>>>> We were storing a list of unique pointers in a std::vector list.
>>>> But searching it is slow before deciding to add a new pointer. So
>>>> one of my programmers changed the std::vector list to a std::map.
>>>> That way we always "add" the pointer to the list but never have to
>>>> look it up. But, the resulting value is always just set to 1. The
>>>> resulting std::map uses about 10% of the time that the std::vector did.
>>>>
>>>> std::map <DataGroup * ptr, int> aList;
>>>>
>>>> void addNewPointer (DataGroup * newPtr)
>>>> {
>>>> aList [newPtr] = 1;
>>>> }
>>>
>>> Have you tried a `std::unordered_set`?
>>
>> I think that std::set is the closest to the current behaviour
>> (std::map using only keys).
>
> Can't say that I understand what you're thinking of here.
Keys in a std::map are ordered, aren't they?
Since the OP is not using the map values, but only the keys, I think
that such usage of the map is practically equivalent to using a
std::set.

>
>
>> And I think it is faster than unordered_set too, although I have to
>> say I did not check.
>
> Not this either. Since when did hashing become slower than a tree search?
You may be right here, as I said, I did not check.

>
>
>
> Cheers!,
>
> - Alf

Alf P. Steinbach

Aug 11, 2017, 9:37:27 PM
On 12.08.2017 01:38, Manfred wrote:
> On 08/12/2017 01:27 AM, Alf P. Steinbach wrote:
>> On 12.08.2017 01:21, Manfred wrote:
>>> On 08/12/2017 12:30 AM, Alf P. Steinbach wrote:
>>>> On 12.08.2017 00:11, Lynn McGuire wrote:
>>>>> What is the fastest way to store a list of unique pointers ?
>>>>>
>>>>> We were storing a list of unique pointers in a std::vector list.
>>>>> But searching it is slow before deciding to add a new pointer. So
>>>>> one of my programmers changed the std::vector list to a std::map.
>>>>> That way we always "add" the pointer to the list but never have to
>>>>> look it up. But, the resulting value is always just set to 1. The
>>>>> resulting std::map uses about 10% of the time that the std::vector
>>>>> did.
>>>>>
>>>>> std::map <DataGroup * ptr, int> aList;
>>>>>
>>>>> void addNewPointer (DataGroup * newPtr)
>>>>> {
>>>>> aList [newPtr] = 1;
>>>>> }
>>>>
>>>> Have you tried a `std::unordered_set`?
>>>
>>> I think that std::set is the closest to the current behaviour
>>> (std::map using only keys).
>>
>> Can't say that I understand what you're thinking of here.
> keys in a std::map are ordered, aren't they?

Oh. Well consider these points:

• The order of raw pointers is pretty arbitrary.

• `std::map` was not chosen for its good fit to the problem: they're not
using the mapping. I.e. there is AFAICS no reason to believe the order
is significant even if the keys were of a kind with a natural order.


> Since the OP is not using the map values, but only the keys, then I
> think that such usage of the map is practically equivalent to using a
> std::set.

Yes.


>>> And I think it is faster than unordered_set too, although I have to
>>> say I did not check.
>>
>> Not this either. Since when did hashing become slower than a tree search?
> You may be right here, as I said, I did not check.

An implementation of `unordered_set` can be slower than `set` for a
given use case, but it's unlikely.

The reason is that a search in a tree structure, while logarithmic, is
on average much slower than a hash lookup that is constant time on average.


Cheers & hth.,

- Alf

Marcel Mueller

Aug 12, 2017, 2:44:32 AM
On 12.08.17 00.11, Lynn McGuire wrote:
> What is the fastest way to store a list of unique pointers ?

For scalable sets, hash sets are in general the fastest. In C++ they are
implemented by unordered_set<>.

set<> itself uses a red-black tree or something similar. Although the
complexity is O(log n), in practice this is amazingly inefficient because
of the large memory overhead (for small elements) and the bad
locality (which degrades cache efficiency). I almost never use these
containers (including map<>).

A reasonable compromise relative to the ordered standard containers is the
B-tree. Almost every database uses them. Unfortunately there is still no
B-tree implementation in the standard library.

> We were storing a list of unique pointers in a std::vector list. But
> searching it is slow before deciding to add a new pointer.

This is the worst choice. You ended up with complexity O(n²).

> So one of my
> programmers changed the std::vector list to a std::map. That way we
> always "add" the pointer to the list but never have to look it up. But,
> the resulting value is always just set to 1. The resulting std::map
> uses about 10% of the time that the std::vector did.

Use unordered_set<>, since there is no need to keep the pointers in any
particular order, and no need for an associative container either.


Marcel

Melzzzzz

Aug 12, 2017, 2:57:37 AM
On 2017-08-12, Marcel Mueller <news.5...@spamgourmet.org> wrote:
> On 12.08.17 00.11, Lynn McGuire wrote:
>> What is the fastest way to store a list of unique pointers ?
>
> Fastest for scalable sets are in general hash sets. In C++ they are
> implemented by unordered_set<>.
>
> set<> itself uses Red-Black-Tree or something similar. Although the
> complexity is O(log n) this is in practice amazingly inefficient because
> of the large memory overhead (in case of small content) and the bad
> locality (degrading cache efficiency). I almost never use this
> containers (including map<>).
>
> A reasonable compromise to the ordered standard containers are B-Trees.
> Almost every database uses them. Unfortunately there is still no B-Tree
> implementation in the standard.

There is nothing that prevents implementing std::map/set as a B-tree.

--
press any key to continue or any other to quit...

Mr Flibble

Aug 12, 2017, 5:29:40 AM
On 12/08/2017 07:44, Marcel Mueller wrote:
> set<> itself uses Red-Black-Tree or something similar. Although the
> complexity is O(log n) this is in practice amazingly inefficient because
> of the large memory overhead (in case of small content) and the bad
> locality (degrading cache efficiency). I almost never use this
> containers (including map<>).

One can use a pool allocator with set/map to improve allocation speed
and cache locality; Boost has a fast pool allocator for this purpose.

/Flibble

Öö Tiib

Aug 12, 2017, 6:23:34 AM
Yes, but that only tells us that when we have, for example, 42 000 'Growl*',
some of which share the same value, a set with a pool allocator like
'std::set<Growl*, std::less<Growl*>, boost::fast_pool_allocator<Growl*>>'
is faster to fill with them than a plain 'std::set<Growl*>'.
The real question (that the OP seemingly had) is whether it is also faster
than 'std::unordered_set<Growl*>'.
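Something like this would answer it for the actual workload (just a sketch,
not compiled; 'Growl' here is only a stand-in for the real type):

#include <boost/pool/pool_alloc.hpp>
#include <chrono>
#include <iostream>
#include <set>
#include <unordered_set>
#include <vector>

struct Growl { };

template <class Container>
long long fill_us (const std::vector<Growl*> & input)
{
   auto start = std::chrono::steady_clock::now ();
   Container c;
   for (Growl* p : input)
      c.insert (p);                 // duplicates are silently ignored
   auto stop = std::chrono::steady_clock::now ();
   return std::chrono::duration_cast<std::chrono::microseconds> (stop - start).count ();
}

int main ()
{
   std::vector<Growl*> input;
   for (int i = 0; i < 42000; ++i)
      input.push_back (new Growl);  // the real 42 000 pointers would go here

   using pooled_set = std::set<Growl*, std::less<Growl*>,
                               boost::fast_pool_allocator<Growl*>>;

   std::cout << "std::set            " << fill_us<std::set<Growl*>> (input) << " us\n"
             << "pooled std::set     " << fill_us<pooled_set> (input) << " us\n"
             << "std::unordered_set  " << fill_us<std::unordered_set<Growl*>> (input) << " us\n";
}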

Manfred

Aug 12, 2017, 7:52:25 AM
Whatever the ordering of raw pointers means, std::map keys are still ordered
- their intent since the beginning of time has been fast search.
I agree with you that the OP is misusing it, but, with respect to fast
lookup, the choice makes sense.

>
>
>> Since the OP is not using the map values, but only the keys, then I
>> think that such usage of the map is practically equivalent to using a
>> std::set.
>
> Yes.
>
>
>>>> And I think it is faster than unordered_set too, although I have to
>>>> say I did not check.
>>>
>>> Not this either. Since when did hashing become slower than a tree
>>> search?
>> You may be right here, as I said, I did not check.
>
> An implementation of `unordered_set` can be slower than `set` for a
> given use case, but it's unlikely.
>
> The reason is that a search in a tree structure, while logarithmic, is
> on average much slower than an on average constant time lookup.

That is typically true for large sets. Thanks for the clarification.

Manfred

Aug 12, 2017, 10:04:59 AM
On 08/12/2017 03:40 PM, Stefan Ram wrote:
> Both Herb and Chandler (and to some extend also Bjarne)
> suggest, as far as I have understood them, to always use a
> vector, because the implementations of the sets are so slow.
> But my measurements seem to confirm the naïve assumption:
> That unordered_set is fastest.

This obviously depends on how the container is used: for the general
case of random or sequential access, vector wins.
When it comes to more complex operations like searching, it is
understandable that other kinds of container work better.

What Bjarne often shows is that even in cases that list would naively
seem better equipped for, like insertion, vector actually wins -
so the recommendation is to use vector as a default, but this does not
mean always.

Marcel Mueller

Aug 12, 2017, 10:27:14 AM
In this case only the memory overhead of roughly one order of magnitude
(for small objects) remains.
And if I always have to use a pool allocator to keep the performance
reasonable, the question arises whether it is the right container in the
first place.
Since I usually do not place large objects into a container, std::set and
std::map are of no particular use to me.


Marcel

Marcel Mueller

Aug 12, 2017, 10:27:14 AM
On 12.08.17 08.57, Melzzzzz wrote:
>> A reasonable compromise to the ordered standard containers are B-Trees.
>> Almost every database uses them. Unfortunately there is still no B-Tree
>> implementation in the standard.
>
> There is nothing that prevents implementing std::map/set as btree.

NACK!

The guarantee that iterators to untouched elements remain valid on
insert/delete makes a reasonable B-Tree implementation impossible.
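For example, this is guaranteed to work with std::set, and a B-tree that
moves elements between nodes could not honour it (sketch):

#include <cassert>
#include <set>

int main ()
{
   std::set<int> s { 10, 20, 30 };
   auto it = s.find (20);           // keep an iterator to one element

   for (int i = 0; i < 1000; ++i)   // heavy insertion elsewhere in the tree
      s.insert (i * 100 + 7);
   s.erase (10);                    // and deletion of a different element

   assert (*it == 20);              // 'it' must still be valid and still point to 20
}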


Marcel

Melzzzzz

Aug 12, 2017, 11:22:01 AM
On 2017-08-12, Marcel Mueller <news.5...@spamgourmet.org> wrote:
Well, that certainly places restrictions on implementations.
I tested the B-tree implementation in Rust against my AVL tree.
Iteration through the B-tree is an order of magnitude faster than through
the binary tree. Insertion and deletion are about twice as fast in the B-tree.
That is simply because the B-tree is cache friendly.

>
>
> Marcel

Mr Flibble

Aug 12, 2017, 1:33:56 PM
Anyone who claims that std::set and std::map are of no use to them is
obviously Doing It Wrong (TM). They are essential containers and should
be used when it is appropriate to use them. If you don't know when it
is appropriate to use them, or you think that it is never appropriate to
use them, then you are a newb.

/Flibble

Paavo Helde

Aug 12, 2017, 5:10:58 PM
On 12.08.2017 19:56, Stefan Ram wrote:
> Manfred <non...@invalid.add> writes:
>> This obviously depends on the usage of the container: for the general
>> case of random or sequential access, then vector wins.
>> When it comes to more complex operations like searching, then it is
>> understandable that other types of container work better.
>
> You are right. In my case, I used »random()« but I was not
> aware of RAND_MAX still being 32767 or so and therefore,
> the final section of the sorted_vector rarely got shifted.
>
> I now use a random-number generator that generates random
> numbers up to ::std::numerics_limits<unsigned long>::max()
> or so and get these results:
>
> filling ::sorted_vector 79'893,747.427
> retrieving ::sorted_vector 238,574.119
> filling ::std::unordered_set 360,994.757
> retrieving ::std::unordered_set 248,835.371
> filling ::std::set 741,265.933
> retrieving ::std::set 722,563.192
>
> . Here are the results with »::randomize( 0 );« added
> to find the same numbers again:
>
> filling ::sorted_vector 83'803,941.903
> retrieving ::sorted_vector 266,760.153
> filling ::std::unordered_set 362,649.427
> retrieving ::std::unordered_set 161,849.880
> filling ::std::set 754,459.535
> retrieving ::std::set 713,790.944
>
> . This means that when we actually find some numbers
> again, now ::std::unordered_set is fastest.

It appears that your definition of "sorted_vector" is not very useful (it
amounts to a poor implementation of std::set). A proper "sorted_vector" is
one you fill by e.g. push_back() first, then sort once and keep (at least
mostly) constant afterwards. This should get rid of the apparent 100x
overhead in filling it and make the numbers more comparable.

This is not to say it would beat std::unordered_set; the latter may still
win out because it does not have to keep the data in any particular order.
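I.e. something along these lines (just a sketch, not benchmarked; DataGroup
as in the original post):

#include <algorithm>
#include <functional>
#include <vector>

std::vector<DataGroup *> v;

// Phase 1: bulk load, no ordering maintained.
void load (DataGroup * p) { v.push_back (p); }

// Phase 2: sort once and drop duplicates.
void finalize ()
{
   std::sort (v.begin (), v.end (), std::less<DataGroup *> ());
   v.erase (std::unique (v.begin (), v.end ()), v.end ());
}

// Phase 3: cheap lookups against the (mostly) constant data.
bool contains (DataGroup * p)
{
   return std::binary_search (v.begin (), v.end (), p, std::less<DataGroup *> ());
}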

Cheers
Paavo

Marcel Mueller

Aug 13, 2017, 4:55:26 AM
On 12.08.17 19.33, Mr Flibble wrote:
> Anyone who claims that std::set and std::map is no use to them is
> obviously Doing It Wrong (TM). They are essential containers and should
> be used when it is appropriate to use them.

OK, you are right. I just have not had any use case so far (in approx. 20
years) where they are superior to a B-tree.

When I used these containers it was only because a B-tree was not available
and it was not worth the trouble to pull in a B-tree implementation.

If I had the choice of whether an RB-tree or a B-tree goes into the standard,
I would clearly vote for the B-tree. Unfortunately many container
libraries chose the other way (e.g. .NET as well).

> If you don't know when it
> is appropriate to use them or you think that it is never appropriate to
> use them then you are newb.

The only thing I can think of is when iterator invalidation on container
modification is critical. In that case RB-trees are a good choice. But
that requirement is infrequent.


Marcel