Space efficiency when using vectors.

Paul

unread,

Dec 5, 2015, 3:15:04 PM12/5/15

to

The quote below is from a book about algorithms in C++, and is discussing representations of graphs. I don't understand what he means by initializing a vector with small capacity. The nearest thing that I'm aware of is the reserve() method. However, that only works to request more capacity, not to limit capacity. I'd be grateful for any explanations. The ways that I'm aware of, for initializing a vector, don't involve the capacity.

Thanks, Paul

BEGIN QUOTE
There are several alternatives for maintaining the adjacency lists. First, observe that the lists themselves can be maintained in either vectors or lists. However, for sparse graphs, when using vectors, the programmer may need to initialize each vector with a smaller capacity than the default; otherwise, there could be significant wasted space.
END QUOTE

Nobody

unread,

Dec 5, 2015, 3:36:55 PM12/5/15

to

On Sat, 05 Dec 2015 12:14:38 -0800, Paul wrote:

Nothing in the standard allows you to initialise a vector with a specific
capacity. The closest is to initialise it with a given number of
elements, call shrink_to_fit() on it, then clear(). shrink_to_fit()
requires C++11, and is a non-binding request to reduce the capacity to
match the size.

It's possible that on some implementations, creating a vector with an
initial set of elements results in the capacity matching the size, even if
the default capacity is larger. But that's just an implementation detail.

Melzzzzz

unread,

Dec 5, 2015, 3:44:33 PM12/5/15

to

Well, on systems that overcommit memory, like Linux, this is of no
concern, since what is reserved and not touched is never actually
allocated. But then again, if memory is of concern one will use more
compact data structure.

Paul

unread,

Dec 5, 2015, 3:50:42 PM12/5/15

to

Would a list instead of a vector be an example of a "more compact data structure"?

Paul

Öö Tiib

unread,

Dec 5, 2015, 4:28:48 PM12/5/15

to

On Saturday, 5 December 2015 22:15:04 UTC+2, Paul wrote:
> The quote below is from a book about algorithms in C++, and is discussing representations of graphs. I don't understand what he means by initializing a vector with small capacity. The nearest thing that I'm aware of is the reserve() method. However, that only works to request more capacity, not to limit capacity. I'd be grateful for any explanations. The ways that I'm aware of, for initializing a vector, don't involve the capacity.

C++ standard does not specify default capacity for vectors. Most
implementations seem to use 0 as default capacity. That makes sense
(as cheapest). Not sure what your author means by it.

> BEGIN QUOTE
> There are several alternatives for maintaining the adjacency lists. First, observe that the lists themselves can be maintained in either vectors or lists. However, for sparse graphs, when using vectors, the programmer may need to initialize each vector with a smaller capacity than the default; otherwise, there could be significant wasted space.
> END QUOTE

Google indicates that the author of it "Mark Allen Weiss" has published
at least two quite identical books: "Data Structures and Algorithm
Analysis in Java" and "Data Structures and Algorithm Analysis in C++".
That makes it likely that both books are crap.

Ian Collins

unread,

Dec 5, 2015, 4:52:26 PM12/5/15

to

Please try and wrap your lines!

The answer to your question is the usual one when discussing data
structures: it depends!

If the data is sparse, there will be a point where a list is more
appropriate. What determines that point depends on the data and the
machine the code is running on. A sparse vector may work well with the
processor cache where an equivalent list may not. You really need to
measure with real data on the intended target to be sure.

--
Ian Collins

Norman Peelman

unread,

Dec 5, 2015, 6:01:09 PM12/5/15

to

Although, Thunderbird has wrapped appropriately above... your lines
came across just as long as his did.

Alf P. Steinbach

unread,

Dec 5, 2015, 6:44:17 PM12/5/15

to

First, sorry for TB's quoting. I used to fix it manually. Mozilla sucks.

***

This could be meaningful if "the default" refers to something other than
std::vector's own default.

Otherwise it would have to refer to some implementation-specific
functionality.

Cheers & hth.,

- Alf

Ian Collins

unread,

Dec 5, 2015, 6:51:01 PM12/5/15

to

Alf P. Steinbach wrote:
>
> First, sorry for TB's quoting. I used to fix it manually. Mozilla sucks.

Edit->Rewrap

--
Ian Collins

Alf P. Steinbach

unread,

Dec 5, 2015, 8:28:30 PM12/5/15

to

On 12/6/2015 12:50 AM, Ian Collins wrote:
> Alf P. Steinbach wrote:
>>
>> First, sorry for TB's quoting. I used to fix it manually. Mozilla sucks.
>
> Edit->Rewrap
>
>

Oh thanks! I hadn't noticed that. Must have been added in recent years.

Cheers,

- Alf

David Brown

unread,

Dec 6, 2015, 6:33:04 AM12/6/15

to

It has a shortcut ctrl-R, and has existed as long as I can remember
(I've used Thunderbird since version 1). Just be careful if there is
code listings in the post - they don't look good after a re-wrap!

Christian Gollwitzer

unread,

Dec 6, 2015, 8:30:03 AM12/6/15

to

Am 06.12.15 um 12:32 schrieb David Brown:

Yep. But you can also select a portion of the message and hit ^R, it'll
reformat just the selection.

Christian

David Brown

unread,

Dec 6, 2015, 4:27:30 PM12/6/15

to

I never thought of that! Thanks. (It is highly unusual for moans about
line wrapping to lead to any changes - but this time at least two people
have learned a new and useful trick.)

mark

unread,

Dec 6, 2015, 6:04:13 PM12/6/15

to

On 2015-12-05 21:14, Paul wrote:
> The quote below is from a book about algorithms in C++, and is
> discussing representations of graphs. I don't understand what he
> means by initializing a vector with small capacity. The nearest
> thing that I'm aware of is the reserve() method. However, that only
> works to request more capacity, not to limit capacity.

If you know the exact capacity beforehand, this does limit excess
capacity. Chances are pretty good that you will get the exact capacity
you want, if you call reserve() in the beginning.

If the internal vector resizing is triggered by push_back(), you usually
get a 50% or 100% capacity increase. So chances are good you end up with
a fair amount of wasted space.

StuartRedmann

unread,

Dec 7, 2015, 3:01:58 AM12/7/15

to

> On 06/12/15 14:29, Christian Gollwitzer wrote:
>> Yep. But you can also select a portion of the message and hit ^R, it'll
>> reformat just the selection.

On 12/6/15, David Brown wrote:
> I never thought of that! Thanks. (It is highly unusual for moans about
> line wrapping to lead to any changes - but this time at least two people
> have learned a new and useful trick.)

Make that three.

Regards,
Stuart

PS: Many people consider it probably anal fixation to fix word wrapping
in use net postings, but I can't help it. Thanks to your hint, I'll have
much less to do for future postings. Another thing that I usually do is
to untangle the quotation authors (like I did in this post). Usually it
is not hard to figure out who wrote what, but if a posting contains more
quotes from older replies it sometimes gets hard to tell which part has
been written by whom. In such cases I tend to repeat the author's names
in front of their parts of the posting. SCNR

Juha Nieminen

unread,

Dec 7, 2015, 4:37:32 AM12/7/15

to

Paul <peps...@gmail.com> wrote:
> The quote below is from a book
> about algorithms in C++, and is discussing representations of graphs.
> I don't understand what he means by initializing a vector with small
> capacity. The nearest thing that I'm aware of is the reserve()
> method. However, that only works to request more capacity, not to
> limit capacity. I'd be grateful for any explanations. The ways that
> I'm aware of, for initializing a vector, don't involve the capacity.

If you are looking for extreme space efficiency in such situations, you
may well have to create your own vector substitute.

If you know for certain that all the vectors will have a certain maximum
size, and this size is relatively small, just use std::array.

If the vast majority of the vectors are small, but you need to support
larger vectors as well, and you need extreme efficiency when dealing
with those small vectors, create your own vector implementation that
uses "short vector optimization" (similar to "short string optimization").
In other words, in your vector class theres some member array of
the size you deem appropriate, and whenever there's at most that many
elements in the vector, use it. If the amount of elements exceeds that
amount, use the array for the data needed to handle the elements as
a dynamic array (this can be directly done with a union.)

If the large vectors are very often copied/assigned, but seldom modified,
and restricting your vector to move semantics doesn't suffice, you could
in addition use copy-on-write semantics.

This becomes a bit complicated implementation-wise, but when well
done, it will make your program more efficient.

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

mark

unread,

Dec 7, 2015, 5:02:05 AM12/7/15

to

On 2015-12-07 10:37, Juha Nieminen wrote:
> If the vast majority of the vectors are small, but you need to support
> larger vectors as well, and you need extreme efficiency when dealing
> with those small vectors, create your own vector implementation that
> uses "short vector optimization" (similar to "short string optimization").
> In other words, in your vector class theres some member array of
> the size you deem appropriate, and whenever there's at most that many
> elements in the vector, use it. If the amount of elements exceeds that
> amount, use the array for the data needed to handle the elements as
> a dynamic array (this can be directly done with a union.)

Recent Boost versions have that:
<http://www.boost.org/doc/libs/master/doc/html/container/non_standard_containers.html#container.non_standard_containers.small_vector>