Sequence container capacity after calling clear()

Leigh Johnston

unread,

Mar 23, 2013, 6:27:46 PM3/23/13

to

Hi,

Can we please change the ISO C++ Standard so that explicitly states what
happens to a sequence container's capacity() after calling clear()?

Currently the behaviour is unspecified and I know of at least one
implementation that deallocates on vector<T>::clear().

If the behaviour remains unspecified then it is effectively impossible
to write portable code that uses clear() and you have to hope things
such as v.erase(v.begin(), v.end()) behave more consistently across
different implementations.

/Leigh

--
[ comp.std.c++ is moderated. To submit articles, try posting with your ]
[ newsreader. If that fails, use mailto:std-cpp...@vandevoorde.com ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]

Mathias Gaunard

unread,

Mar 25, 2013, 12:49:44 PM3/25/13

to

On Mar 23, 10:30 pm, Leigh Johnston <le...@i42.co.uk> wrote:
> Hi,
>
> Can we please change the ISO C++ Standard so that explicitly states what
> happens to a sequence container's capacity() after calling clear()?
>
> Currently the behaviour is unspecified and I know of at least one
> implementation that deallocates on vector<T>::clear().
>
> If the behaviour remains unspecified then it is effectively impossible
> to write portable code that uses clear() and you have to hope things
> such as v.erase(v.begin(), v.end()) behave more consistently across
> different implementations.

Use shrink_to_fit if you want to reduce capacity to 0 after a clear.

goran...@googlemail.com

unread,

Mar 25, 2013, 12:40:59 PM3/25/13

to

{ Reformatted; please limit your lines to 70 characters, and do not
insert blank lines in quoted sections -mod }

On Saturday, March 23, 2013 10:30:03 PM UTC+1, Leigh Johnston wrote:
> Hi,
>
> Can we please change the ISO C++ Standard so that explicitly states
> what happens to a sequence container's capacity() after calling
> clear()?
>
> Currently the behaviour is unspecified and I know of at least one
> implementation that deallocates on vector<T>::clear().
>
> If the behaviour remains unspecified then it is effectively
> impossible to write portable code that uses clear() and you have to
> hope things such as v.erase(v.begin(), v.end()) behave more
> consistently across different implementations.

Erm... I don't understand why is portability affected by what clear()
does behind the scenes.

I understand consistency (e.g. performance considerations),
though. But then, v = vector<TYPE>() does "the other" ;-) clear.

(Out of curiosity, which implementation deallocates?)

G.

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Leigh Johnston

unread,

Mar 25, 2013, 3:30:48 PM3/25/13

to

On 25/03/2013 16:40, goran...@googlemail.com wrote:
> { Reformatted; please limit your lines to 70 characters, and do not
> insert blank lines in quoted sections -mod }
>
> On Saturday, March 23, 2013 10:30:03 PM UTC+1, Leigh Johnston wrote:
>> Hi,
>>
>> Can we please change the ISO C++ Standard so that explicitly states
>> what happens to a sequence container's capacity() after calling
>> clear()?
>>
>> Currently the behaviour is unspecified and I know of at least one
>> implementation that deallocates on vector<T>::clear().
>>
>> If the behaviour remains unspecified then it is effectively
>> impossible to write portable code that uses clear() and you have to
>> hope things such as v.erase(v.begin(), v.end()) behave more
>> consistently across different implementations.
>
> Erm... I don't understand why is portability affected by what clear()
> does behind the scenes.

To see how it might affect portability consider how reallocations
invalidate iterators and element references.

>
> I understand consistency (e.g. performance considerations),
> though. But then, v = vector<TYPE>() does "the other" ;-) clear.
>
> (Out of curiosity, which implementation deallocates?)

QNX.

/Leigh

Tobias Müller

unread,

Mar 25, 2013, 4:45:42 PM3/25/13

to

Leigh Johnston <le...@i42.co.uk> wrote:
> On 25/03/2013 16:40, goran...@googlemail.com wrote:
>> Erm... I don't understand why is portability affected by what
>> clear() does behind the scenes.
>
> To see how it might affect portability consider how reallocations
> invalidate iterators and element references.

Iterators and references to elements of an _empty_ vector? What are
they good for?

Tobi

Öö Tiib

unread,

Mar 25, 2013, 4:41:09 PM3/25/13

to

On Monday, 25 March 2013 20:40:02 UTC+2, Leigh Johnston wrote:
> On 25/03/2013 16:40, goran...@googlemail.com wrote:
> > On Saturday, March 23, 2013 10:30:03 PM UTC+1, Leigh Johnston
> > wrote:
> >> Can we please change the ISO C++ Standard so that explicitly
> >> states what happens to a sequence container's capacity() after
> >> calling clear()?
> >>
> >> Currently the behaviour is unspecified and I know of at least one
> >> implementation that deallocates on vector<T>::clear().
> >>
> >> If the behaviour remains unspecified then it is effectively
> >> impossible to write portable code that uses clear() and you have
> >> to hope things such as v.erase(v.begin(), v.end()) behave more
> >> consistently across different implementations.
> >
> > Erm... I don't understand why is portability affected by what
> > clear() does behind the scenes.
>
> To see how it might affect portability consider how reallocations
> invalidate iterators and element references.

If that is important then you can use resize until that QNX fixes
their compiler. Unless they deallocate on 'v.resize(0)' too of course
... :-/

Leigh Johnston

unread,

Mar 25, 2013, 8:44:36 PM3/25/13

to

On 25/03/2013 20:45, Tobias Müller wrote:
>
> Leigh Johnston <le...@i42.co.uk> wrote:
>> On 25/03/2013 16:40, goran...@googlemail.com wrote:
>>> Erm... I don't understand why is portability affected by what
>>> clear() does behind the scenes.
>>
>> To see how it might affect portability consider how reallocations
>> invalidate iterators and element references.
>
> Iterators and references to elements of an _empty_ vector? What are
> they good for?

If you call push_back() multiple times on a vector element references
and iterators remain valid if capacity remains unchanged.

/Leigh

Tobias Müller

unread,

Mar 26, 2013, 6:38:19 AM3/26/13

to

Leigh Johnston <le...@i42.co.uk> wrote:

> On 25/03/2013 20:45, Tobias M�ller wrote:
>>
>> Leigh Johnston <le...@i42.co.uk> wrote:
>>> On 25/03/2013 16:40, goran...@googlemail.com wrote:
>>>> Erm... I don't understand why is portability affected by what
>>>> clear() does behind the scenes.
>>>
>>> To see how it might affect portability consider how reallocations
>>> invalidate iterators and element references.
>>
>> Iterators and references to elements of an _empty_ vector? What are
>> they good for?
>
> If you call push_back() multiple times on a vector element
> references and iterators remain valid if capacity remains unchanged.

Sure, but we are talking about clear(), not push_back(). After a call
to clear(), the vector is empty, that means no element exist. So all
references and iterators are invalid (dangling) anyway.

Tobi

goran...@googlemail.com

unread,

Mar 26, 2013, 6:49:41 AM3/26/13

to

{ Blank lines in quoted sections and quoted moderator comments
removed; please do so before submitting your article -mod }

On Monday, March 25, 2013 7:40:02 PM UTC+1, Leigh Johnston wrote:

> On 25/03/2013 16:40, go...@:::c...@googlemail.com wrote:
> > On Saturday, March 23, 2013 10:30:03 PM UTC+1, Leigh Johnston wrote:
> >> Hi,
> >>
> >> Can we please change the ISO C++ Standard so that explicitly states
> >> what happens to a sequence container's capacity() after calling
> >> clear()?
> >>
> >> Currently the behaviour is unspecified and I know of at least one
> >> implementation that deallocates on vector<T>::clear().
> >>
> >> If the behaviour remains unspecified then it is effectively
> >> impossible to write portable code that uses clear() and you have to
> >> hope things such as v.erase(v.begin(), v.end()) behave more
> >> consistently across different implementations.
> >
> > Erm... I don't understand why is portability affected by what
> > clear() does behind the scenes.
>
> To see how it might affect portability consider how reallocations
> invalidate iterators and element references.

That's what I thought one might think of, but (AFAIK)

* there is no guarantee whatsoever that an iterator that used to
"point" to an element, then stopped to point to it (because vector
was cleared) can become valid again when vector is filled back up.

* I can't see how holding references to destroyed objects can be a
good idea.

Now I think that you used undocumented behavior, that this behavior
generally leads to dubious practices (above), and that now you want a
way out without changing the design. I've done a fair amount of
relying on undocumented behavior (and will likely do it again in the
future), so I feel for you, but that doesn't make us right ;-).

Goran.

Leigh Johnston

unread,

Mar 26, 2013, 10:55:38 AM3/26/13

to

On 25/03/2013 16:49, Mathias Gaunard wrote:
>
> On Mar 23, 10:30 pm, Leigh Johnston <le...@i42.co.uk> wrote:
>> Hi,
>>
>> Can we please change the ISO C++ Standard so that explicitly states what
>> happens to a sequence container's capacity() after calling clear()?
>>
>> Currently the behaviour is unspecified and I know of at least one
>> implementation that deallocates on vector<T>::clear().
>>
>> If the behaviour remains unspecified then it is effectively impossible
>> to write portable code that uses clear() and you have to hope things
>> such as v.erase(v.begin(), v.end()) behave more consistently across
>> different implementations.
>
> Use shrink_to_fit if you want to reduce capacity to 0 after a clear.

You didn't grok my post properly. I want the opposite: I want the
capacity to be unchanged after calling clear().

/Leigh

Leigh Johnston

unread,

Mar 26, 2013, 10:40:45 PM3/26/13

to

On 26/03/2013 10:38, Tobias Müller wrote:
>
> Leigh Johnston <le...@i42.co.uk> wrote:
>> On 25/03/2013 20:45, Tobias M�ller wrote:
>>>
>>> Leigh Johnston <le...@i42.co.uk> wrote:
>>>> On 25/03/2013 16:40, goran...@googlemail.com wrote:
>>>>> Erm... I don't understand why is portability affected by what
>>>>> clear() does behind the scenes.
>>>>
>>>> To see how it might affect portability consider how reallocations
>>>> invalidate iterators and element references.
>>>
>>> Iterators and references to elements of an _empty_ vector? What are
>>> they good for?
>>
>> If you call push_back() multiple times on a vector element
>> references and iterators remain valid if capacity remains unchanged.
>
> Sure, but we are talking about clear(), not push_back(). After a call
> to clear(), the vector is empty, that means no element exist. So all
> references and iterators are invalid (dangling) anyway.

We are talking about calling push_back() *after* calling clear(). We
are talking about wanting a vector's capacity to be *unchanged* after
calling clear().

/Leigh

Leigh Johnston

unread,

Mar 26, 2013, 10:42:04 PM3/26/13

to

On 26/03/2013 10:49, goran...@googlemail.com wrote:
> { Blank lines in quoted sections and quoted moderator comments
> removed; please do so before submitting your article -mod }

{ please refer to the mod comment quoted above -mod }

You have totally got the wrong end of the stick; I am not relying on UB.
I am talking about iterators and references to elements added with,
for example, push_back() *after* calling clear(). I want clear() to
leave the capacity unchanged.

Consider:

v.reserve(2);
v.push_back(0);
/* do something with v */
v.clear(); // oops, implementation reset capacity to 0
v.push_back(1);
auto i = v.begin();
v.push_back(2);
// i might now be invalid as we may have reallocated

If the Standard made it explicit that clear() leaves a vector's capacity
unchanged then we would not have implementations that invalidated 'i' above.

/Leigh

Dave Harris

unread,

Mar 27, 2013, 7:55:11 AM3/27/13

to

In article <85745c86-b880-462e...@googlegroups.com>,

goran...@googlemail.com () wrote:
> On Monday, March 25, 2013 7:40:02 PM UTC+1, Leigh Johnston wrote:
> > To see how it might affect portability consider how reallocations
> > invalidate iterators and element references.
>
> That's what I thought one might think of, but (AFAIK)
>
> * there is no guarantee whatsoever that an iterator that used to
> "point" to an element, then stopped to point to it (because
> vector was cleared) can become valid again when vector is filled
> back up.
>
> * I can't see how holding references to destroyed objects can be a
> good idea.

That's not what Leigh means. Instead consider:
{
std::vector<int> v( 10 );
assert( v.size() == 10 && v.capacity() == 10 );
v.clear();
assert( v.size() == 0 && v.capacity() == 10 );
v.push_back( 1 );
std::vector<int>::iterator i = v.begin();
v.push_back( 2 );
cout << *i;
}

If clear() does not change capacity and both asserts are true, then
the iterator will still be valid after the final push_back(). If not,
not. For reasonable code, it matters.

-- Dave Harris, Nottingham, UK.

Seungbeom Kim

unread,

Mar 27, 2013, 7:50:27 AM3/27/13

to

On 2013-03-25 17:44, Leigh Johnston wrote:

> On 25/03/2013 20:45, Tobias M�ller wrote:
>>
>> Leigh Johnston <le...@i42.co.uk> wrote:
>>>
>>> To see how it might affect portability consider how reallocations
>>> invalidate iterators and element references.
>>
>> Iterators and references to elements of an _empty_ vector? What are
>> they good for?
>
> If you call push_back() multiple times on a vector element
> references and iterators remain valid if capacity remains unchanged.

You can do something like this:

size_type cap = vector.capacity();
vector.clear();
vector.reserve(cap);

// or when you know how many elements you'll have:
vector.clear();
vector.reserve(expected_number_of_elements);

After the call to clear(), the vector is essentially the same as one
constructed anew. So, just as you would call reserve() before a series
of calls to push_back() on a vector constructed anew, you can do the
same on a vector just clear()ed.

You may be concerned this involves a deallocation and an allocation
even when not necessary, but this has to happen many times in a tight
loop to actually affect the performance, which I don't see very
common. And it is likely that the deallocated memory is not even
returned to the OS immediately but kept in the user process just in
case it is requested again soon, which the method above can take
advantage of.

--
Seungbeom Kim

Bart van Ingen Schenau

unread,

Mar 27, 2013, 8:08:37 AM3/27/13

to

On Tue, 26 Mar 2013 19:42:04 -0700, Leigh Johnston wrote:

> If the Standard made it explicit that clear() leaves a vector's
> capacity unchanged then we would not have implementations that
> invalidated 'i' above.

That is not true. Even if the standard is made more explicit on how
clear can affect the capacity, that does not mean there can't be any
buggy implementations that get it wrong.

As ruled by the committee in their response to LWG issue 1102 (http://
cplusplus.github.com/LWG/lwg-closed.html#1102), the standard already
requires the behaviour you want, although the wording might not be as
convenient as you would like.

>
> /Leigh

Bart v Ingen Schenau

Daniel Krügler

unread,

Mar 27, 2013, 8:03:24 AM3/27/13

to

On 2013-03-27 03:42, Leigh Johnston wrote:
[..]

> I am talking about iterators and references to elements added with,
> for example, push_back() *after* calling clear(). I want clear() to
> leave the capacity unchanged.

[..]

> Consider:
>
> v.reserve(2);
> v.push_back(0);
> /* do something with v */
> v.clear(); // oops, implementation reset capacity to 0
> v.push_back(1);
> auto i = v.begin();
> v.push_back(2);
> // i might now be invalid as we may have reallocated
>
> If the Standard made it explicit that clear() leaves a vector's
> capacity unchanged then we would not have implementations that
> invalidated 'i' above.

Actually I had already responded to this thread, but somehow my reply
did not occur in this group (only on the comp.std.c++ group). The
problem that you are describing was already cause for a previous
library issue:

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#1102

Even NAD issues do have some value to document an existing discussion
of that matter. Personally I agree with you that the wording is less
than clear and should be improved. But for reopening this or for
opening a new issue, there needs to be further evidence that this is
still a problem as of today's compilers. Can you give some pointer for
a library implementation that still changes the capacity during
clear()? I know that previous Visual Studio Compiler libraries did so,
but I thought that this had been fixed in latter version. If you are
aware of newer Library implementations that show these effects that
would be a valuable hint.

There is one newer argument in my mind that could (additionally) be
used to justify a new discussion of that theme: With the acceptance of

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#704

clear() (in the sequence container specification) is no longer defined
in terms of erase(), so one could argue that we have lost further
information about the semantics of that function. This is possibly
only weak evidence, so I would prefer to collect further information
on that topic before reattempting to submit a library issue or to
reopen it.

Feel free to contact my privately in that matter.

HTH & Greetings from Bremen,

Daniel Kr�gler

Andy Champ

unread,

Mar 27, 2013, 8:11:15 AM3/27/13

to

On 27/03/2013 02:42, Leigh Johnston wrote:
> You have totally got the wrong end of the stick; I am not relying on
> UB. I am talking about iterators and references to elements added
> with, for example, push_back() *after* calling clear(). I want
> clear() to leave the capacity unchanged.
>
> Consider:
>
> v.reserve(2);
> v.push_back(0);
> /* do something with v */
> v.clear(); // oops, implementation reset capacity to 0
> v.push_back(1);
> auto i = v.begin();
> v.push_back(2);
> // i might now be invalid as we may have reallocated
>
> If the Standard made it explicit that clear() leaves a vector's
> capacity unchanged then we would not have implementations that
> invalidated 'i' above.
>
> /Leigh

I've just been looking through the standard (well, the last
pre-release draft) and I think it's not at all clear.

When you do a pop_back on a vector it cannot reallocate. If it did it
would invalidate iterators to items that are still there. The same for
erase - iterators to earlier items should still be valid (23.2.1 p 11)
although those after the erase won't be (23.3.6.5 p 3).

In string, but not in the containers, clear is defined as erase(begin,
end). There must be logic for this being left out - mustn't there?

It's interesting that assign is defined in terms of erase(begin, end)
rather than clear.

It's also interesting that while shrink_to_fit may reduce capacity
there is no statement that it may invalidate iterators to earlier
items. This will severely limit its use - the system cannot perform
any movement of objects to reduce heap fragmentation. I can imagine
someone allocating a vector which is big enough for any use, putting
some elements in it, then performing shrink_to_fit - and finding that
afterwards it has fragmented the heap horribly.

I'm kind of open on your suggestion. But I think it ought to be
explicit - when the size of a vector is reduced to zero, either by
pop_back, resize, or clear, this is a special case and it should state
what state the memory is in - even if this is to explicitly state that
it may or may not have been released.

I also think that shrink_to_fit ought to invalidate all the iterators
inside the collection, and have all the other riders that are present
for when reallocation happens.

Andy

Johannes Sixt

unread,

Mar 27, 2013, 10:55:46 AM3/27/13

to

On 23 Mrz., 22:30, Leigh Johnston <le...@i42.co.uk> wrote:
> Can we please change the ISO C++ Standard so that explicitly states
> what happens to a sequence container's capacity() after calling
> clear()?
>
> Currently the behaviour is unspecified and I know of at least one
> implementation that deallocates on vector<T>::clear().
>
> If the behaviour remains unspecified then it is effectively
> impossible to write portable code that uses clear() and you have to
> hope things such as v.erase(v.begin(), v.end()) behave more
> consistently across different implementations.

Call reserve(), then the implementation doesn't have a lot of freedom.
>From the specification of reserve():

It is guaranteed that no reallocation takes place during insertions
that happen after a call to reserve() until the time when an
insertion would make the size of the vector greater than the value
of capacity().

I take this to imply that after a call to reserve(), only
shrink_to_fit() and swap() are allowed to shrink capacity().

-- Hannes

alan_mc...@this.is.invalid

unread,

Mar 27, 2013, 11:01:34 AM3/27/13

to

On Wednesday, March 27, 2013 7:00:08 AM UTC-4, Seungbeom Kim wrote:

> You may be concerned this involves a deallocation and an allocation
> even when not necessary, but this has to happen many times in a
> tight loop to actually affect the performance, which I don't see
> very common.

You may not see it [the cost of allocation/deallocation] very often,
but it's pretty common to have to worry about it in the work I do.

We avoid using new and delete in the data processing phase of our
codes (it's OK in initialization) because when we do, our profiler
results show them being major CPU users. We avoid std::string for the
same reason. For that matter, we discovered that deleting a null
pointer had a pretty significant cost.

In fact, we use fixed-length arrays instead of std::vector if we know
the maximum length in advance, again, for performance reasons.[*] When
an extra 30% of performance can save you having to buy hardware that
costs a programmer's salary for a year, you care about all the stuff
that C++ gurus say doesn't matter any more.

[*] I really wish C++ would incorporate C99 variable-length arrays.
They would have _really_ simplified some sections of my code.

--

James K. Lowden

unread,

Mar 27, 2013, 5:54:25 PM3/27/13

to

On Wed, 27 Mar 2013 09:01:34 CST
alan_mc...@this.is.invalid wrote:

> In fact, we use fixed-length arrays instead of std::vector if we know
> the maximum length in advance, again, for performance reasons.[*] When
> an extra 30% of performance can save you having to buy hardware that
> costs a programmer's salary for a year, you care about all the stuff
> that C++ gurus say doesn't matter any more.

You've studied this, and you're in the very space where C++ is supposed
to shine: no more than 10% overhead above C. I'd like to know more.

Are you saying vector::operator[] takes 30% more time than a bare
pointer? Or that your program's *overall* performance is 30% faster
if you use C arrays instead of std::vector? Or something else?

I'm also curious about the economics. Even if I take your efficiency
figures at face value, I have to believe the problem you're addressing
yourself to grows over time (although, I hope, slower than Moore's
Law). If so, isn't pushing on the efficiency pedal a losing battle?
By squeezing 30%, you don't *avoid* buying the next machine; at best you
just defer it. By how long, do you reckon?

It takes some doing to spend $50,000 on a server these days. I'm
having trouble imagining of machine that costs "a programmer's salary",
especially because the programmer's salary is being spent avoiding the
hardware cost. Twice: once for efficiency, and again in lost
convenience and simplicity.

I'm not doubting you. I just live in a world where machines are cheap
and people are not. Except in very limited circumstance, merely using
C++ with any halfway decent data structures and algorithms is fast, and
speed beyond that is a function of hardware and problem size, plus
maybe a co-processor. To me, your message seems to have arrived from
1965. So I'm curious.

--jkl

Andy Champ

unread,

Mar 27, 2013, 5:52:43 PM3/27/13

to

On 27/03/2013 15:01, alan_mc...@this.is.invalid wrote:
> [*] I really wish C++ would incorporate C99 variable-length arrays.

> They would have_really_ simplified some sections of my code.

std::array won't help? Or doesn't your compiler support it?

Andy

Leigh Johnston

unread,

Mar 27, 2013, 5:51:56 PM3/27/13

to

The version of QNX I am using is the implementation that has a capacity
resetting vector<T>::clear().

/Leigh

alan_mc...@this.is.invalid

unread,

Mar 28, 2013, 6:16:24 PM3/28/13

to

On Wednesday, March 27, 2013 5:00:05 PM UTC-4, Andy Champ wrote:

> On 27/03/2013 15:01, alan_mckenney1 wrote:
>
> > [*] I really wish C++ would incorporate C99 variable-length arrays.
>
> > They would have_really_ simplified some sections of my code.
>
> std::array won't help? Or doesn't your compiler support it?

It doesn't seem to (no man page.)

But anyway, based on what documentation I've found on the web, the size of
a std::array is determined at compile time. I need it to be determined
at run time, but be allocated on the stack.

alan_mc...@this.is.invalid

unread,

Mar 28, 2013, 6:17:29 PM3/28/13

to

On Wednesday, March 27, 2013 5:00:02 PM UTC-4, James K. Lowden wrote:

> Are you saying vector::operator[] takes 30% more time than a bare
> pointer? Or that your program's *overall* performance is 30% faster
> if you use C arrays instead of std::vector? Or something else?

Something else. I'm saying that, for us, even 30% improvement is
worth switching to more cumbersome coding styles, if necessary.

Minimizing the frequency of heap allocation and deallocation can
gain us at least 30%, so I won't use std::vector as a local variable
in the processing phase, because that would mean that each function
call results in an allocation and deallocation.

By contrast, a variable-length array is allocated on the stack (no
heap overhead) and doesn't require us to guess what the maximum
size is that we'll need.

If std::vector::clear() results in a deallocation, we won't use it
in the processing phase, either.

> If so, isn't pushing on the efficiency pedal a losing battle?
> By squeezing 30%, you don't *avoid* buying the next machine; at best you
> just defer it. By how long, do you reckon?

By that logic, there's no point to going to the doctor when you're sick,
either, since you're only putting off the inevitable.

Leigh Johnston

unread,

Mar 28, 2013, 8:14:19 PM3/28/13

to

On 28/03/2013 22:16, alan_mc...@this.is.invalid wrote:
> On Wednesday, March 27, 2013 5:00:05 PM UTC-4, Andy Champ wrote:
>> On 27/03/2013 15:01, alan_mckenney1 wrote:
>>
>>> [*] I really wish C++ would incorporate C99 variable-length arrays.
>>
>>> They would have_really_ simplified some sections of my code.
>>
>> std::array won't help? Or doesn't your compiler support it?
>
> It doesn't seem to (no man page.)
>
> But anyway, based on what documentation I've found on the web, the size of
> a std::array is determined at compile time. I need it to be determined
> at run time, but be allocated on the stack.

Not sure if it quite fits your needs but you could try my "vecarray"
container: you specify a maximum size as a template parameter (compile
time) but actual number of elements can vary.

http://i42.co.uk/stuff/vecarray.htm

/Leigh

A. McKenney

unread,

Mar 29, 2013, 4:29:16 PM3/29/13

to

Leigh Johnston wrote:
> On 28/03/2013 22:16, alan_mckenney1 wrote:
> > ... the size of

> > a std::array is determined at compile time. I need it to be determined
> > at run time, but be allocated on the stack.
>
> Not sure if it quite fits your needs but you could try my "vecarray"
> container: you specify a maximum size as a template parameter (compile
> time) but actual number of elements can vary.

Nope.

If we knew the maximum size in advance (=at development time), the
problem would be trivial. In many cases, normal C arrays would be
fine.

What we're trying to get away from is having to guess at a reasonable
(less then 2Gb :-) ) upper bound on the size and hoping we don't ever
have to
deal with data longer than that. And from the performance hit of
heap storage for function-local data.

James K. Lowden

unread,

Mar 29, 2013, 4:30:52 PM3/29/13

to

On Thu, 28 Mar 2013 15:17:29 -0700 (PDT)
alan_mc...@this.is.invalid wrote:

> > Are you saying vector::operator[] takes 30% more time than a bare
> > pointer? Or that your program's *overall* performance is 30% faster
> > if you use C arrays instead of std::vector? Or something else?
>

> Minimizing the frequency of heap allocation and deallocation can
> gain us at least 30%, so I won't use std::vector as a local
> variable in the processing phase, because that would mean that each
> function call results in an allocation and deallocation.

I see. That's an astonishing figure. You really should publish
what you've learned.

> > If so, isn't pushing on the efficiency pedal a losing battle?
> > By squeezing 30%, you don't *avoid* buying the next machine; at
> > best you just defer it. By how long, do you reckon?
>
> By that logic, there's no point to going to the doctor when you're
> sick, either, since you're only putting off the inevitable.

Time in incompressible and money is fungible. You can spend your time
on efficiency gains or functionality, seldom both at once. You're
asserting that the loss in productivity is economically justified. All
I'm asking is for you to describe that environment, if you would,
because I've never heard of anything like it, except in the embedded
space.

--jkl

Andy Champ

unread,

Mar 30, 2013, 2:49:14 AM3/30/13

to

On 29/03/2013 20:29, A. McKenney wrote:
> If we knew the maximum size in advance (=at development time), the
> problem would be trivial. In many cases, normal C arrays would be
> fine.
>
> What we're trying to get away from is having to guess at a reasonable
> (less then 2Gb:-) ) upper bound on the size and hoping we don't ever
> have to
> deal with data longer than that. And from the performance hit of
> heap storage for function-local data.

A _lot_ less than 2Gb. That 2Gb has to have your code, heap and stack -
if you are on a 32-bit machine. (on 64-bit, all bets are off)

Can't you allocate the array once (as a std::vector) then pass
references to it around the place?

Andy

Francis Glassborow

unread,

Mar 30, 2013, 6:30:10 AM3/30/13

to

On 29/03/2013 20:30, James K. Lowden wrote:
> On Thu, 28 Mar 2013 15:17:29 -0700 (PDT)
> alan_mc...@this.is.invalid wrote:
>
>>> Are you saying vector::operator[] takes 30% more time than a bare
>>> pointer? Or that your program's *overall* performance is 30%
>>> faster if you use C arrays instead of std::vector? Or something
>>> else?
>>
>> Minimizing the frequency of heap allocation and deallocation
>> can gain us at least 30%, so I won't use std::vector as a local
>> variable in the processing phase, because that would mean that
>> each function call results in an allocation and deallocation.
>
> I see. That's an astonishing figure. You really should publish
> what you've learned.
>

Indeed it is. I suspect that the real problem is somewhere else. I
wonder how that figure was measured. It is no good just running a tst
program that does lots of heap allocation/deallocation because real
programs do other things.

Whilst I guess you cannot show the real code would you post:

1) The test programs and methodology you used to come up with that
claim.
2) The general nature of the software being produced.

I am reminded of a something I read a long time ago (back in the 90s)
on optimisation in C++ where the author pointed out that a lazy coding
style with std::string could badly impact on performance. Where
performance mattered it was important to avoid unnecessary copying of
strings (passing by value being just one of the causes of performance
degradation).

Perhaps your code suffers in a similar way (important to pass
std::vector by reference or const reference and only spin off a copy
when actually mutating the original in a context where the original
needs to be maintained.)

Heap allocation/deallocation used to be a serious time consumer, but
AFAIK that problem was largely overcome more than a decade ago.

Francis

A. McKenney

unread,

Apr 1, 2013, 1:25:27 AM4/1/13

to

Francis Glassborow wrote:

> Whilst I guess you cannot show the real code would you post:
> 1) The test programs and methodology you used to come up with that
> claim.
> 2) The general nature of the software being produced.

I do not work for an academic institution, so I hope you will
understand that I may feel the need to be circumspect with details,
and that my employer might not see any benefit to my publishing
anything. Their only interest is in not needing to buy more hardware,
network capacity, etc.

I think I can describe the general methodology, since IMHO it's pretty
obvious: Basically, we use recordings of actual data activity and
replay them for testing. We do profiling, make changes, and
ultimately we have to prove that we've actually reduced CPU time and
improved throughput by performance testing, again with recordings of
actual production data activity. Reducing the frequency of heap
allocation/deallocation and the use of std::string (other than
read-only access) tends to be the low-lying fruit.

I'm not interested in making any general claim about costs. I simply
wanted to indicate that the claim that allocation/deallocation costs
can be ignored is not true in all applications.

David Lowndes

unread,

Apr 1, 2013, 4:04:47 AM4/1/13

to

>Heap allocation/deallocation used to be a serious time consumer, but
>AFAIK that problem was largely overcome more than a decade ago.

It was?
Do you have a reference to what you're alluding to there?

Dave

Dave Harris

unread,

Apr 1, 2013, 3:04:59 PM4/1/13

to

In article <f0e3640c-e535-471f...@googlegroups.com>,

alan_mc...@this.is.invalid () wrote:
> For that matter, we discovered that deleting a null
> pointer had a pretty significant cost.

Really? Was:
delete p;

significantly worse than:
if (p)
delete p;

when p was null? Or is any test too slow? This is so surprising to
me that I have to wonder whether something is wrong somewhere.

I was going to ask whether you'd tried using a custom allocator.
If you can be sure of the last-in first-out allocation patterns
implied by your stack-based vectors, then the implementation could
be quite simple and have no loops and few if-statements. However,
if even a single if-statement is too slow, there's not much that
can be done.

-- Dave Harris, Nottingham, UK.

Francis Glassborow

unread,

Apr 1, 2013, 3:09:41 PM4/1/13

to

On 01/04/2013 06:25, A. McKenney wrote:
> Francis Glassborow wrote:
>
>> Whilst I guess you cannot show the real code would you post:
>> 1) The test programs and methodology you used to come up with that
>> claim.
>> 2) The general nature of the software being produced.
>
> I do not work for an academic institution, so I hope you will
> understand that I may feel the need to be circumspect with details,
> and that my employer might not see any benefit to my publishing
> anything. Their only interest is in not needing to buy more hardware,
> network capacity, etc.

Yes, I realised that there could well be a proprietary concern but if
you are unable to even describe the nature of the software it becomes
impossible to ascertain the validity of your assertions about performance.

>
> I think I can describe the general methodology, since IMHO it's pretty
> obvious: Basically, we use recordings of actual data activity and
> replay them for testing. We do profiling, make changes, and
> ultimately we have to prove that we've actually reduced CPU time and
> improved throughput by performance testing, again with recordings of
> actual production data activity. Reducing the frequency of heap
> allocation/deallocation and the use of std::string (other than
> read-only access) tends to be the low-lying fruit.

Yes that demonstrates that what you did improved the performance, but
what it does not do is show that what you did was the most appropriate
way to improve performance.

For example, there is no fundamental problem with using std::string (and
the optimisation for short strings greatly reduces the costs). However a
lazy programming style can result in unnoticed excessive
construction/destruction of strings.

There are a number of techniques for reducing the cost of heap
allocation which include ensuring that the program has a sufficiently
large default heap.

There are many ways of reducing heap allocation/deallocation. I wonder
what mechanism you have in place to protect against stack overflow (a
very real risk if you are dealing with large arrays)

>
> I'm not interested in making any general claim about costs. I simply
> wanted to indicate that the claim that allocation/deallocation costs
> can be ignored is not true in all applications.
>
>

I do not think that that was asserted. However you stated:

<quote>

In fact, we use fixed-length arrays instead of std::vector if we know
the maximum length in advance, again, for performance reasons.[*] When
an extra 30% of performance can save you having to buy hardware that
costs a programmer's salary for a year, you care about all the stuff
that C++ gurus say doesn't matter any more.

</quote>

Which sort of implies that using fixed length arrays was at least 30%
better than using a std::vector. Actually most experienced programmers
would use a C-style array if the length is known in advance and
performance is a critical issue. However that choice is not cost free
and does add to the requirements for maintenance (the programmer doing
that has to be familiar with C-style arrays and all the consequential
gotchas)

However all this is getting a long way from the origianl topic of this
thread.

Francis

Dave Harris

unread,

Apr 1, 2013, 3:06:34 PM4/1/13

to

In article <20130329101530.c...@speakeasy.net>,

jklo...@speakeasy.net (James K. Lowden) wrote:
> Time in incompressible and money is fungible. You can spend your
> time on efficiency gains or functionality, seldom both at once.
> You're asserting that the loss in productivity is economically
> justified. All I'm asking is for you to describe that environment,
> if you would, because I've never heard of anything like it, except
> in the embedded space.

Nowadays efficiency can also be important for cloud-based computing,
where you pay for the cycles you use, and for mobile devices, where
it affects battery life.

I don't know much about either, myself. I gather some people have big
problems, which they solve by renting compute servers from the likes
of Amazon. Some of these systems are huge. Think about the processing
that Amazon themselves must do, for example. Or think about running an
online game with half a million users.

-- Dave Harris, Nottingham, UK.

Öö Tiib

unread,

Apr 1, 2013, 2:14:08 PM4/1/13

to

On Monday, 1 April 2013 11:04:47 UTC+3, David Lowndes wrote:
> >Heap allocation/deallocation used to be a serious time consumer, but
> >AFAIK that problem was largely overcome more than a decade ago.
>
> It was?
> Do you have a reference to what you're alluding to there?

Allocations and deallocations are way cheaper than they were old
times. Dynamic memory still is not cost-free and there are still no
reasons to be wasteful. It is easy to measure what costs what. Million of
small allocations & deallocations will take like half a second on
modern hardware. For sane cases that is plenty but not for naive newbie
cases like for passing dynamic containers by value or for creating
temporary dynamic containers in tight loops.

For example ... just add two strings up to compare with third string, do
it millions of times and program hangs indeed. Such buggy algorithms
should be fixed instead of complaining that allocations are slow.

Francis Glassborow

unread,

Apr 1, 2013, 3:11:49 PM4/1/13

to

On 01/04/2013 09:04, David Lowndes wrote:
>> Heap allocation/deallocation used to be a serious time consumer, but
>> AFAIK that problem was largely overcome more than a decade ago.
>
> It was?
> Do you have a reference to what you're alluding to there?
>
> Dave
>

It is my understanding that heap managers and optimisers have become
very much more efficient than they were in the 80s and 90s. Of course it
is very hard to benchmark these because they often work best in exactly
the kind of code used for benchmarking :(

Francis

Robert Wessel

unread,

Apr 1, 2013, 11:43:43 PM4/1/13

to

On Mon, 1 Apr 2013 13:04:59 CST, bran...@googlemail.com (Dave
Harris) wrote:

>In article <f0e3640c-e535-471f...@googlegroups.com>,
>alan_mc...@this.is.invalid () wrote:
>> For that matter, we discovered that deleting a null pointer had a
>> pretty significant cost.
>
>Really? Was:
> delete p;
>
>significantly worse than:
> if (p)
> delete p;
>
>when p was null? Or is any test too slow? This is so surprising to me
>that I have to wonder whether something is wrong somewhere.

FWIW, with MSVC, the path length for a delete of a null void pointer
is 14 instructions, including a conditional branch that probably would
usually mispredict in that case (how often is a null pointer actually
free'd or deleted?). I could see that the second case might be
significantly faster if null pointers were actually common, and
particularly if nulls were common enough to get the branch to predict
correctly for that case.

If I were writing the library, I might be tempted to use some very
slow method for dealing with the free of a null pointer, unless I
could get some evidence that such was actually common enough to
matter, so long as I could speed up the common case a bit. For
example, if you had the traditional memory heap where a free basically
involves linking the space from the free'd block into the prior and
next blocks as appropriate, usually via some fields in the block's
header, you could eliminate any check of the input being a null
pointer so long as you ensured that the memory containing the header
was always unreadable if the input was a null pointer, thus triggering
an exception. For example, the OS might guarantee that the first and
last page in the address space are unreadable. Then you'd patch up
the null free in the (hardware) exception handler. That would likely
be on the order of 100 times slower than actually comparing the
pointer to zero, but would cost nothing if you had a real pointer.

David Lowndes

unread,

Apr 2, 2013, 8:18:30 AM4/2/13

to

>Allocations and deallocations are way cheaper than they were old
>times.

Why, what's magically changed?

Dave

David Lowndes

unread,

Apr 2, 2013, 8:19:15 AM4/2/13

to

>It is my understanding that heap managers and optimisers have become
>very much more efficient than they were in the 80s and 90s.

Have you got any references to qualify that?

There are some optimisations that are done for particular use cases,
but as far as I'm aware, no order of magnitude quantum leap for
general usage.

I believe that memory allocation for .NET & Java run-times is much
leaner - though you then get massive memory usage and unpredictable
performance hits when the compaction is done.

Dave

James K. Lowden

unread,

Apr 2, 2013, 1:39:38 PM4/2/13

to

On Mon, 1 Apr 2013 20:43:43 -0700 (PDT)
Robert Wessel <robert...@this.is.invalid> wrote:

> If I were writing the library, I might be tempted to use some very
> slow method for dealing with the free of a null pointer, unless I
> could get some evidence that such was actually common enough to
> matter, so long as I could speed up the common case a bit.

Hmm. Comparison to a literal constant (i.e. to the representation of a
null pointer) is extremely cheap. It adds very little to a bona fide
delete, and avoids operating on an invalid pointer with an attendant
special exception-handling branch. ISTM the "deal with simple cases
first" policy applies as well here as anywhere else.

> FWIW, with MSVC, the path length for a delete of a null void pointer
> is 14 instructions

That's acceptable, right? Stop me where I go wrong. Allowing 10 cycles
per instruction, that's 140 cycles at approximately 3,800,000,000
cycle/second, or 27,142,857 null pointer deletions per second. Each
such delete requires 0.000000036 seconds.

I'm having difficulty imagining an algorithm that needs to tickle the
heap at a rate of millions per second, where the difference between

0.000000036
and
0.0000000036

for a delete is both significant and unavoidable.

IOW, I can well imagine that efficient heap operations materially
affect the performance of a wide variety of applications. I cannot
imagine any situation for which 14 instructions to delete a null pointer
is even measurable.

--jkl

Dave Harris

unread,

Apr 2, 2013, 5:05:22 PM4/2/13

to

In article <kjfll8ddhttbf64o0...@4ax.com>,

Dav...@example.invalid (David Lowndes) wrote:
> >Allocations and deallocations are way cheaper than they were old
> >times.
>
> Why, what's magically changed?

Better algorithms. In the olden days, there would be a single heap
used for blocks of all sizes. Allocation code would have to search
for a free block of a suitable size, and potentially split it in
two, and deallocation code would potentially have to coalesce
adjacent free blocks.

I believe modern allocators keep, in effect, a free list for each
size of block. Allocation code is like:

void *alloc( size_t sz ) {
size_t index = sz / 8;
if (index < 256)
if (Block *pBlock = freeList[index]) {
freeList[index]->next = pBlock->next;
pBlock->next = (Block *)index;
return pBlock->memory;
}
// ...
}

void free( void *memory ) {
if (!memory)
return;
Block *pBlock = (Block *)((char *)memory - sizeof(Block *));
size_t index = (size_t) pBlock->next;
if (index < 256) {
pBlock->next = freeList[index];
freeList[index] = pBlock;
return;
}
// ...
}

So in the common case, there is no looping at all, and not many
if-statements.

-- Dave Harris, Nottingham, UK.

David Lowndes

unread,

Apr 3, 2013, 6:05:45 AM4/3/13

to

>Better algorithms.

OK, fair enough, newer heap allocators may be faster than old basic
ones, but they're still not trivial operations, and presumably will
still have to employ locking strategies for multi-threaded
environments?

i.e. In some situations they may be quite a lot better than less
optimal old implementations used to be - but you should always
consider them as having a notable performance impact?

Dave

Öö Tiib

unread,

Apr 3, 2013, 1:28:05 PM4/3/13

to

On Wednesday, 3 April 2013 13:05:45 UTC+3, David Lowndes wrote:
> >Better algorithms.
>
> OK, fair enough, newer heap allocators may be faster than old basic
> ones, but they're still not trivial operations, and presumably will
> still have to employ locking strategies for multi-threaded
> environments?

Lock-less concurrent memory allocators, thread-local fast lists etc.

> i.e. In some situations they may be quite a lot better than less
> optimal old implementations used to be - but you should always
> consider them as having a notable performance impact?

Viewing dynamic memory management as always notable performance issue
is wrong. It has been worked on by so lot of people and symposiums
dedicated to memory management are regularly held and it has been
fruitful, even open source memory managers work very well.

Doing anything that is unneeded has some cost unless compiler
can optimize it out. Even unneeded division of two ints can affect
overall performance observably on such cases.

I am not certain if needless memory management may be optimized out,
but I suspect that no. At least I haven't seen any C++ compilers
attempting to optimize that:

int main(){delete new char;}

Everywhere it calls operator new and operator delete on whatever
optimization level despite to human eye it looks pointless to call
those. That means that dynamic memory management operations are
somehow considered externally observable behavior and so doing it
needlessly is doubly bad.

Francis Glassborow

unread,

Apr 4, 2013, 12:02:51 AM4/4/13

to

On 03/04/2013 18:28, 嘱 Tiib wrote:

> I am not certain if needless memory management may be optimized out,
> but I suspect that no. At least I haven't seen any C++ compilers
> attempting to optimize that:
>
> int main(){delete new char;}
>
> Everywhere it calls operator new and operator delete on whatever
> optimization level despite to human eye it looks pointless to call
> those. That means that dynamic memory management operations are
> somehow considered externally observable behavior and so doing it
> needlessly is doubly bad.
>
>

It is quite hard to optimise these calls away because the new expression
involves a call to operator new which may have been replaced by one that
has some extra (observable) functionality. The same is true for the
delete expression.

Francis

Thomas Richter

unread,

Apr 4, 2013, 7:01:24 PM4/4/13

to

On 27.03.2013 22:54, James K. Lowden wrote:

> Are you saying vector::operator[] takes 30% more time than a bare
> pointer? Or that your program's *overall* performance is 30% faster
> if you use C arrays instead of std::vector? Or something else?

>From my personal experience: I once tried to replace raw C++ arrays
(and pointers to such) with the std::valarray, hoping that the guarantee
that valarray does not impose aliasing would have some impact on the
speed of the compiled program. It did not. In fact, std::valarray was
slower. I don't remember slower by how much, but at least so much that I
no longer consider using it.

The algorithms I'm using are digital signal processing (actually,
wavelet filters for image compression), and they better have to be fast.
All the fancy C++ mechanisms do not help here, their overhead is just
not acceptable.

> I'm also curious about the economics. Even if I take your efficiency
> figures at face value, I have to believe the problem you're addressing
> yourself to grows over time (although, I hope, slower than Moore's
> Law). If so, isn't pushing on the efficiency pedal a losing battle?
> By squeezing 30%, you don't *avoid* buying the next machine; at best you
> just defer it. By how long, do you reckon?

In my world, 30% slower would be utterly unacceptable. The point is not
whether the 30% slower could be compensated by buying a new machine next
year or not. The point is that 30% slower than the competitor means
loosing a lot of customers. We're selling algorithms and implementations
by their speed, not so much by the cleanness of their implementation -
and 30% would have a significant impact on sales.

Thus, STL is not an option.

If requiring a variable-length array on the stack rather the heap, most
C++ compilers offer an extension called "alloca" that allows you to do
exactly that. It is very low overhead (basically, it adds an offset the
stack pointer) and thus pretty fast.

> I'm not doubting you. I just live in a world where machines are cheap
> and people are not. Except in very limited circumstance, merely using
> C++ with any halfway decent data structures and algorithms is fast, and
> speed beyond that is a function of hardware and problem size, plus
> maybe a co-processor. To me, your message seems to have arrived from
> 1965. So I'm curious.

It's not so easy. It's a problem of the problem domain. If we're
speaking about codecs, then feel ensured that they sell by speed. The
C++ STL just gets in the way if you need it.

Greetings,
Thomas

Balog Pal

unread,

Apr 5, 2013, 1:46:55 AM4/5/13

to

On 4/4/2013 6:02 AM, Francis Glassborow wrote:
> On 03/04/2013 18:28, 嘱 Tiib wrote:
>
>> I am not certain if needless memory management may be optimized out,
>> but I suspect that no. At least I haven't seen any C++ compilers
>> attempting to optimize that:
>>
>> int main(){delete new char;}
>>
>> Everywhere it calls operator new and operator delete on whatever
>> optimization level despite to human eye it looks pointless to call
>> those. That means that dynamic memory management operations are
>> somehow considered externally observable behavior and so doing it
>> needlessly is doubly bad.
>
> It is quite hard to optimise these calls away because the new expression
> involves a call to operator new which may have been replaced by one that
> has some extra (observable) functionality. The same is true for the
> delete expression.

Hmm, an excellent point. And smells like a defect to the standard or at
least a good opportunity for improvement.

To me that case is the same as copy-elision. For that the standard
grants a license to remove a cctor/dtor pair despite it can be
user-defined and can have observable behavior. The supposed semantics
suggest the actions shall cancel out properly.

Is there a reason the same license to eliminate could not be provided
for op new/delete pair?

Daniel Krügler

unread,

Apr 5, 2013, 6:44:08 AM4/5/13

to

On 2013-04-05 07:46, Balog Pal wrote:
> To me that case is the same as copy-elision. For that the standard
> grants a license to remove a cctor/dtor pair despite it can be
> user-defined and can have observable behavior. The supposed
> semantics suggest the actions shall cancel out properly.
>
> Is there a reason the same license to eliminate could not be
> provided for op new/delete pair?

How do you define such a new/delete pair? new/delete calls are
completely independent and there exists no control by the core
language for them. You can write code of the form

T* p = nullptr;
if (condition1)
p = new T;
...
T* q = p;
...
if (condition2)
delete q;

Furthermore, we can call new in one function and delete on another.

This is different for implicitly called destructors, since we can
always refer them to a corresponding construction in a local context
and the compiler always knows about the relation of the "argument" of
the destructor relative to the constructor.

HTH & Greetings from Bremen,

Daniel Kr�gler

Francis Glassborow

unread,

Apr 5, 2013, 1:33:17 PM4/5/13

to

On 05/04/2013 00:01, Thomas Richter wrote:
> On 27.03.2013 22:54, James K. Lowden wrote:
>
>> Are you saying vector::operator[] takes 30% more time than a bare
>> pointer? Or that your program's *overall* performance is 30% faster
>> if you use C arrays instead of std::vector? Or something else?
>
>> From my personal experience: I once tried to replace raw C++ arrays
> (and pointers to such) with the std::valarray, hoping that the guarantee
> that valarray does not impose aliasing would have some impact on the
> speed of the compiled program. It did not. In fact, std::valarray was
> slower. I don't remember slower by how much, but at least so much that I
> no longer consider using it.
>
> The algorithms I'm using are digital signal processing (actually,
> wavelet filters for image compression), and they better have to be fast.
> All the fancy C++ mechanisms do not help here, their overhead is just
> not acceptable.

OK that makes perfect sense. This is one of a relatively small number of
areas where optimisation at almost any cost is worthwhile.

I am reminded of an early weather forecasting program that was producing
very good 24 hour forecasts. The problem was that on the then available
hardware it took 48 hours to run. Increase the performance by a factor
of 10 and a purely academic program becomes a viable one. Increase it by
a factor of 100 and it becomes possible to run it multiple times with
slightly different data (as is done these days) to determine the
stability of the forecast (allows forecasters to recognise when the
'butterfly' effect may be important.)

However note that the value of the output needs to be high to justify
the costs of programmer intensive optimisation and the consequential
maintenance costs.

Francis

Öö Tiib

unread,

Apr 5, 2013, 1:38:30 PM4/5/13

to

On Friday, 5 April 2013 13:44:08 UTC+3, Daniel Kr�gler wrote:
> On 2013-04-05 07:46, Balog Pal wrote:
> > To me that case is the same as copy-elision. For that the standard
> > grants a license to remove a cctor/dtor pair despite it can be
> > user-defined and can have observable behavior. The supposed
> > semantics suggest the actions shall cancel out properly.
> >
> > Is there a reason the same license to eliminate could not be
> > provided for op new/delete pair?
>
> How do you define such a new/delete pair?

For example like that:

template<class T>
T* factory( int i )
{
T* p = new T; // <- this is THE new of pair

// Some code dealing with *p and i
// that sometimes returns *p.
// when T is int it can't do nothing and so may be erased.

delete p; // <- this is THE delete of pair
return nullptr;
}

That is situation that stops compilers from optimizing all the
contents of 'factory<int>(42)' (and so call to it) out and so we have
to work on it and manually specialize.

> new/delete calls are completely independent and there exists no
> control by the core language for them.

Return values also can't be always optimized out. It could be far from
all pairs of new and delete. For example if compiler can prove that
everything done with *p between new and delete is pointless and *p
was default or copy-constructed then the pair may be elided.
Same like with copy constructor and destructor of return value.

Balog Pal

unread,

Apr 6, 2013, 12:23:43 AM4/6/13

to

On 4/5/2013 12:44 PM, Daniel Kr�gler wrote:
> On 2013-04-05 07:46, Balog Pal wrote:
>> To me that case is the same as copy-elision. For that the standard
>> grants a license to remove a cctor/dtor pair despite it can be
>> user-defined and can have observable behavior. The supposed
>> semantics suggest the actions shall cancel out properly.
>>
>> Is there a reason the same license to eliminate could not be
>> provided for op new/delete pair?
>
> How do you define such a new/delete pair?

Result of new passed to delete. How else? ;-)

> new/delete calls are
> completely independent and there exists no control by the core

> language for them. You can write code of the form ...

That's beside the point. If the implementation *can* figure out that a
result from new is passed to delete immediately, or between them only
irrelevant actions happen, is should be licenced to replace that with a
no-op. Cases where it can't or is too lazy to follow are not changed.

> Furthermore, we can call new in one function and delete on another.

Yep, and if the compiler can arrange to have all the code at some point
(maybe as late as linking or even execution), it can work with the info.
Like in the case of a local, if it's passed to a function by ref, but
the function is inlined and turns out the function does not touch that
variable, it can be eliminated entirely.

> This is different for implicitly called destructors, since we can
> always refer them to a corresponding construction in a local context
> and the compiler always knows about the relation of the "argument" of
> the destructor relative to the constructor.

I see only difference in chance to apply an optimisation in general --
and even that is diverging only for free-standing allocations, that are
pretty rare in many practices.

The main case for this scope is objects like string or vector, used as a
local. If I have an unused array<> it can be removed, but a similar
vector/string can not, because it has that mem-allocation inside. Even
better example is when you use unique_ptr or other smart pointers.

Our compilers do amazingly comprehensive data flow analysis for two
decades or more. Can unroll loops, chunk together operations, rearrange
by conditionals, just to name a few. It is common to create objects at
some distance from use, especially if conditions go between. With old
C++ it might be somewhat hosed in practice, but as constexpr gets
mainstream it will be even easier and more straightforward to optimize.
But operations involving new hoses that completely, as the involved
functions are volatile by nature -- and as mentioned can even be
replaced with unseen body. The effect is that we lose performance and/or
have to write more and messier code.

Francis Glassborow

unread,

Apr 6, 2013, 9:15:27 AM4/6/13

to

On 05/04/2013 06:46, Balog Pal wrote:
> On 4/4/2013 6:02 AM, Francis Glassborow wrote:
>> On 03/04/2013 18:28, 嘱 Tiib wrote:
>>
>>> I am not certain if needless memory management may be optimized
>>> out, but I suspect that no. At least I haven't seen any C++
>>> compilers attempting to optimize that:
>>>
>>> int main(){delete new char;}
>>>
>>> Everywhere it calls operator new and operator delete on whatever
>>> optimization level despite to human eye it looks pointless to call
>>> those. That means that dynamic memory management operations are
>>> somehow considered externally observable behavior and so doing it
>>> needlessly is doubly bad.
>>
>> It is quite hard to optimise these calls away because the new
>> expression involves a call to operator new which may have been
>> replaced by one that has some extra (observable) functionality. The
>> same is true for the delete expression.
>
> Hmm, an excellent point. And smells like a defect to the standard or
> at least a good opportunity for improvement.
>
> To me that case is the same as copy-elision.

Copy elision is a very special case because a copy ctor carries out a
conceptual process that has no side effects (the concept of copying
that is). C++ therefore issues a licence to elide copying even when
the actual ctor generates a side-effect and thereby warns the
programmer that they cannot in general rely on such a side-effect
occurring (though if you need it, there are ways to ensure that an
actual copy will be produced)

There is nothing in the concept of creating and destroying a dynamic
object that suggests there will not a be a side-effect (observable
behaviour) so the compiler optimiser has to fall back on the general
licence to remove code whose actions cannot result in a side-effect.

Note that in general a new-expression executes both a new operator
(that might be user provided) and a ctor. A delete expression
similarly executes both a delete operator and a dtor. Do we really
want to tell programmers that the optimiser is free to remove their
new/delete pair even when the execution of such a pair will produce
observable behaviour? IMO we do not (even the licence to elide copy
construction has been questioned in the past though I think most
programmers now think it justified in the contexts where it is
permitted).

Francis

Dave Harris

unread,

Apr 6, 2013, 9:21:41 AM4/6/13

to

In article <kjl9ud$v76$1...@news.ett.com.ua>, pa...@lib.hu (Balog Pal) wrote:
> > It is quite hard to optimise these calls away because the new
> > expression involves a call to operator new which may have been
> > replaced by one that has some extra (observable)
> > functionality. The same is true for the delete expression.
>
> Hmm, an excellent point. And smells like a defect to the standard or
> at least a good opportunity for improvement.
>
> To me that case is the same as copy-elision.

Copy-elision is a pretty clear win. It means eliminating overhead
altogether. Replacing heap allocation with stack allocation is not the
same. If the user has a small stack and a large heap, they might
/want/ to use the heap. If the code uses recursion heavily, we can
imagine cases where heap-based allocation works, and stack-based fails
from running out of stack. Given that the user requested heap-based by
using operator new(), it would arguably be a bug for the compiler to
presume it knew better. Especially for C++, which still has aspersions
for granting low-level control to the programmer.

-- Dave Harris, Nottingham, UK.

Balog Pal

unread,

Apr 6, 2013, 11:32:29 PM4/6/13

to

On 4/6/2013 3:15 PM, Francis Glassborow wrote:
>> Hmm, an excellent point. And smells like a defect to the standard or
>> at least a good opportunity for improvement.
>>
>> To me that case is the same as copy-elision.

> Copy elision is a very special case because a copy ctor carries out a
> conceptual process that has no side effects (the concept of copying
> that is). C++ therefore issues a licence to elide copying even when
> the actual ctor generates a side-effect and thereby warns the
> programmer that they cannot in general rely on such a side-effect
> occurring (though if you need it, there are ways to ensure that an
> actual copy will be produced)
>
> There is nothing in the concept of creating and destroying a dynamic
> object that suggests there will not a be a side-effect (observable
> behaviour) so the compiler optimiser has to fall back on the general
> licence to remove code whose actions cannot result in a side-effect.

And why is that exactly? I mean we start to mix the issue a little: when
we issue new and delete, we play with operator new/delete and also
ctor/dtor. The part I was (or intended to) talking about is the operator
new/delete, not the rest. That part is really internal and its semantics
is to manage raw storage. In the clean case with non-relpaced operators
to me it looks like a completely internal matter, and no reason to allow
it to be considered a side effect.

And when we extend that to replaced versions -- to me that is what looks
completely mapping to the cctor case. It also can be replaced, but the
compiler may assume it does nothing beyond copy.

I do not suggest to change anything about the ctor/dtor part itself,
that is regular code that is allowed to have side effects, but very
often does not (especially if used with new/delete). But if the compiler
can figure out it's free of side effects it can act ahead.

> Note that in general a new-expression executes both a new operator
> (that might be user provided) and a ctor. A delete expression
> similarly executes both a delete operator and a dtor. Do we really
> want to tell programmers that the optimiser is free to remove their
> new/delete pair even when the execution of such a pair will produce
> observable behaviour? IMO we do not (even the licence to elide copy
> construction has been questioned in the past though I think most
> programmers now think it justified in the contexts where it is
> permitted).

Of course not, without proper proof. As I see it, the real hoser is not
that, but the op new/delete with its access to memory pool, likely
involving synch objects; and the replaceability.
Also, IMO while we do use ctors/dtors aiming side effects (RAII!) op
new/delete has no such use, it really serve a pretty narrow purpose, and
intendedly a symmetric one.

Okay, not counting stuff like heap checkers/leak detectors, that love to
weigh in, but for their purpose the paired elision looks benign.

With that clarification do you still see a problem?

Balog Pal

unread,

Apr 6, 2013, 11:34:05 PM4/6/13

to

On 4/6/2013 3:21 PM, Dave Harris wrote:
> In article <kjl9ud$v76$1...@news.ett.com.ua>, pa...@lib.hu (Balog Pal) wrote:
>>> It is quite hard to optimise these calls away because the new
>>> expression involves a call to operator new which may have been
>>> replaced by one that has some extra (observable)
>>> functionality. The same is true for the delete expression.
>>
>> Hmm, an excellent point. And smells like a defect to the standard or
>> at least a good opportunity for improvement.
>>
>> To me that case is the same as copy-elision.
>
> Copy-elision is a pretty clear win. It means eliminating overhead
> altogether. Replacing heap allocation with stack allocation is not the
> same. If the user has a small stack and a large heap, they might
> /want/ to use the heap.

Exactly. There are valid motivations to rearrange code from direct
members to heap based ones. With all the same intended semantics, but
moved to a different memory pool for that budgeting reason. Only. If I
do that, I'd expect, and would be happy to retain the same level of
optimization support on my objects. What currently can't realistically
happen.

> If the code uses recursion heavily, we can
> imagine cases where heap-based allocation works, and stack-based fails
> from running out of stack.

Or even without recursion, the object may sit on a huge chunk of data.
And default stacks are pretty small these days (default is 1M on WIN32,
while you're free to roam in 2G memory).

> Given that the user requested heap-based by
> using operator new(), it would arguably be a bug for the compiler to
> presume it knew better.

Err, it knows better what exactly? We were talking about cases where the
object was just constructed, destructed, and in between only operations
that the compiler can figure out the end result at compile time. That
would be replaced with just result if no call to op new/delete pair was
sitting there.

> Especially for C++, which still has aspersions
> for granting low-level control to the programmer.

Please make up a case for the programmer where he would want operator
delete(operator new(42)) to be observable, and forced to actually
execute rather than be considered a no-op. I could not come up with
one. The closest motivation I could thinker was relying on presence of
synch primitives thus enjoying an effect of membar/fence, but IMO that
is not something worth encouraging.

{ quoted signature removed -mod }

Dave Harris

unread,

Apr 7, 2013, 8:38:10 AM4/7/13

to

In article <kjqam1$2hp9$1...@news.ett.com.ua>, pa...@lib.hu (Balog Pal)
wrote:

> > Given that the user requested heap-based by using operator new(),
> > it would arguably be a bug for the compiler to presume it knew
> > better.
>
> Err, it knows better what exactly?

Knows better whether the memory should come from heap or stack.

> We were talking about cases where the object was just constructed,
> destructed, and in between only operations that the compiler can
> figure out the end result at compile time.

I thought we were talking about replacing heap allocation with stack
allocation, not eliminating allocation altogether.

Even if heap allocations were not considered observable, that
shouldn't give the compiler license to replace heap allocation with
stack allocation. Which is what the earlier poster was concerned
about.

-- Dave Harris, Nottingham, UK.

Francis Glassborow

unread,

Apr 7, 2013, 8:40:56 AM4/7/13

to

On 07/04/2013 04:32, Balog Pal wrote:

> With that clarification do you still see a problem?

With that clarification I do not understand what you are trying to
fix. Consider this code fragment:

atype * at = new atype;
delete at;

The compiler treats this as

get enough memory for an instance of atype
construct a default instance of atype in that memory
store the location of the object in at.

Destroy the object of type atype found at the given location
release the memory obtained for the creation of an atype instance

Now, IIUC, you simply want a licence to somehow remove the memory
allocation/deallocation. How would you manage that? The instance has
to reside somewhere.

Please note that the standard does not mandate how the memory is
obtained so the default operator new and operator delete can work any
way that the implementer chooses. The implementation is entirely free
to use stack based memory if it can determine that the lifetime of the
instance will be entirely contained within the time that the stack
memory is valid. Note that this is only normally the case when the new
and delete expressions are in the same (brace limited) block.

Now, note that programmers use dynamic objects for several purposes
(for example, large objects that they do not want to reside on the
stack - a limited resource on some platforms). I cannot imagine
(maybe a failure of my imagination) a case where a competent
programmer would want a compiler to start overruling a decision to use
dynamic memory unless it can determine that doing so will have no
side-effects (note that performance is not a side-effect even though
you may be able to determine that an optimisation has been applied
because of the execution time of the code -- that is a serious problem
to some benchmark programs)

So my answer is that from what I currently understand I do not see any
need to broaden the already existing licence for optimisation as
regards dynamic objects.

Francis

Francis Glassborow

unread,

Apr 7, 2013, 8:42:53 AM4/7/13

to

On 07/04/2013 04:34, Balog Pal wrote:
> Err, it knows better what exactly? We were talking about cases where
> the object was just constructed, destructed, and in between only
> operations that the compiler can figure out the end result at
> compile time. That would be replaced with just result if no call to
> op new/delete pair was sitting there.

But the implementer already has a licence to use such an optimisation
providing that there are also no side-effects within the code for
operator new and and operator delete.

Actually I am always a little concerned when a compiler decides that
it can remove my code. At the very least it should warn me and provide
a mechanism by which I can insist that it does it my way.

Note that that implies that I should always be able to turn off link
time optimisations that are based on whole code analysis.

Also note that even copy ctor elision is strictly controlled and is
only permitted in certain well defined contexts.

Francis

Daniel Krügler

unread,

Apr 9, 2013, 1:28:46 PM4/9/13

to

Am 27.03.2013 22:51, schrieb Leigh Johnston:
> On 27/03/2013 12:03, Daniel Kr�gler wrote:
[..]
>> Actually I had already responded to this thread, but somehow my
>> reply did not occur in this group (only on the comp.std.c++
>> group). The problem that you are describing was already cause for a
>> previous library issue:
>>
>> http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#1102
>>
>> Even NAD issues do have some value to document an existing
>> discussion of that matter. Personally I agree with you that the
>> wording is less than clear and should be improved. But for
>> reopening this or for opening a new issue, there needs to be
>> further evidence that this is still a problem as of today's
>> compilers. Can you give some pointer for a library implementation
>> that still changes the capacity during clear()? I know that
>> previous Visual Studio Compiler libraries did so, but I thought
>> that this had been fixed in latter version. If you are aware of
>> newer Library implementations that show these effects that would be
>> a valuable hint.
>
> The version of QNX I am using is the implementation that has a
> capacity resetting vector<T>::clear().

I have been informed that this must be a rather old implementation and
that newer versions must be available that have fixed that problem.
Nonetheless I agree with James Kanze's response on comp.std.c++ that
the issue I have referred to above should be reopened to improve the
wording.

HTH & Greetings from Bremen,

Daniel Kr�gler

Dave Harris

unread,

Apr 17, 2013, 2:33:09 AM4/17/13

to

In article <Cpydnbpw07JyzfzM...@bt.com>,

francis.g...@btinternet.com (Francis Glassborow) wrote:
> Consider this code fragment:
>
> atype * at = new atype;
> delete at;

> [...]

>
> Now, IIUC, you simply want a licence to somehow remove the memory
> allocation/deallocation. How would you manage that? The instance has
> to reside somewhere.

If the compiler can show that atype's constructor and destructor have
no visible side effects, then it ought to be able to optimise away
both lines. As things stand, I thought we agreed, it cannot, because
the allocation and deallocation of memory count as a visible side
effect (unless the compiler can prove that the user has not defined
operator new and delete, which it usually can't; especially if the
user has). The compiler could replace it with something like:

operator delete( operator new( sizeof atype ) );

but it can't get rid of that.

So the suggestion is for calls to operator new and delete to not be
considered observable, even if they are defined by the user. To which
the counter was made, that it would allow the compiler to replace
heap-based allocation with stack-based, which the programmer may not
desire. This is true: however, it would not require the compile to do
so. It would become a quality of implementation issue. Best practice
might be for the compiler to only remove calls to operator new and
delete if it can do so safely; for example, in cases like the above
where the code would vanish entirely and no extra stack allocation
would be done.

-- Dave Harris, Nottingham, UK.

Daniel Krügler

unread,

Apr 21, 2013, 6:51:14 PM4/21/13

to

Am 17.04.2013 08:33, schrieb Dave Harris:
[...]

> If the compiler can show that atype's constructor and destructor have
> no visible side effects, then it ought to be able to optimise away
> both lines. As things stand, I thought we agreed, it cannot, because
> the allocation and deallocation of memory count as a visible side
> effect (unless the compiler can prove that the user has not defined
> operator new and delete, which it usually can't; especially if the
> user has). The compiler could replace it with something like:
>
> operator delete( operator new( sizeof atype ) );
>
> but it can't get rid of that.
>
> So the suggestion is for calls to operator new and delete to not be
> considered observable, even if they are defined by the user. To which
> the counter was made, that it would allow the compiler to replace
> heap-based allocation with stack-based, which the programmer may not
> desire. This is true: however, it would not require the compile to do
> so. It would become a quality of implementation issue. Best practice
> might be for the compiler to only remove calls to operator new and
> delete if it can do so safely; for example, in cases like the above
> where the code would vanish entirely and no extra stack allocation
> would be done.

It might be worth adding that the C++ Core Language group has recently
accepted a proposal during the Bristol meeting that relaxes the
constraints upon new expressions to provide better optimizations
choices. In particular, an implementation is now allowed to omit calls
to the replaceable global allocation function, it may instead provide
some "own" storage or it may extend the allocation of another new
expression. For details please read the proposal (N3664 will be
available in the post-meeting mailing on
http://www.open-std.org/jtc1/sc22/wg21/).

HTH & Greetings from Bremen

Daniel Krï¿œgler

Balog Pal

unread,

Apr 23, 2013, 12:31:41 AM4/23/13

to

On 4/7/2013 2:40 PM, Francis Glassborow wrote:
> On 07/04/2013 04:32, Balog Pal wrote:
>
>> With that clarification do you still see a problem?
>
> With that clarification I do not understand what you are trying to
> fix.

It appears that I was thinking along the lines as the recently accepted
N3664. It covers both my problem and supposed solution.

I hope soon most compilers will jump on it and start omitting and fusing
allocations.