
How to deal with running out of stack space?


Juha Nieminen

Jun 14, 2019, 8:44:11 AM
One thing I have always felt as being astonishingly under-defined
both in C and C++ is the concept of the stack, how much of it a
program has available, and especially, what happens if you run
out of it.

With dynamic memory allocation there's always a mechanism to deal
with the case that it fails: Either a null pointer is returned, or
an exception is thrown (depending on how exactly you are doing the
allocation). In either case, you can deal with the problem.

But what exactly happens if e.g. your recursion level is too big and
you run out of stack space?

To be honest, I actually don't know what the standard says about this
(and in fact, this is a question I'm presenting here.) I would imagine
it's UB or implementation-defined behavior. And I'm guessing there's
simply no way of knowing *when* you are going to run out of stack
space, or to gracefully recover from it.

How exactly should programs be designed, if one wanted to be extra
paranoid about the possibility of running out of stack space? Or is
it something that there's simply no recourse against?

Paavo Helde

Jun 14, 2019, 9:48:00 AM
With MSVC++ one can make stack overflow a recoverable error, though it's
tricky to get right and there is a danger of making the whole program
slower. A good starting point is the _resetstkoflw() documentation.

Another more portable solution is to measure the used stack space and
fail the operation if it is becoming too large. Measuring is actually
easy, you just compare the addresses of local variables in the current
function and in main() (or another thread start function).

In practice this measuring has to be done only in recursive functions
(direct or indirect), as long as one abstains from using evil things
like VLAs and alloca().

What's more tricky is to figure out what the max stack size is and how
much of it would be used by non-recursive functions called from the
function doing the check. A cheap and dirty way is to just test it: run
the program so that it gets into infinite recursion. Find out the number
of recursions it was able to do before crashing, then insert a check,
leaving some safety margin.

As always, unit tests are invaluable for detecting regressions caused by
unrelated code changes.


David Brown

Jun 14, 2019, 9:50:07 AM
On 14/06/2019 14:44, Juha Nieminen wrote:
> One thing I have always felt as being astonishingly under-defined
> both in C and C++ is the concept of the stack, how much of it a
> program has available, and especially, what happens if you run
> out of it.
>
> With dynamic memory allocation there's always a mechanism to deal
> with the case that it fails: Either a null pointer is returned, or
> an exception is thrown (depending on how exactly you are doing the
> allocation). In either case, you can deal with the problem.
>
> But what exactly happens if eg. your recursion level is too big and
> you run out of stack space?
>
> To be honest, I actually don't know what the standard says about this
> (and in fact, this is a question I'm presenting here.) I would imagine
> it's UB or implementation-defined behavior. And I'm guessing there's
> simply no way of knowing *when* you are going to run out of stack
> space, or to gracefully recover from it.

That's about right, yes. It is certainly possible for some
compiler/OS/cpu combinations to give you something extra here, but there
is no standard way to detect, predict, or handle out of stack failures.

This is intentional - run-time checking of stack problems would be very
difficult to specify at a language level, and could quickly be quite
costly (in terms of run-time and code space) to implement. Most
languages, I think, leave stack overflow as undefined and rely on the OS
to crash the program in a controlled manner via memory access faults.

>
> How exactly should programs be designed, if one wanted to be extra
> paranoid about the possibility of running out of stack space? Or is
> it something that there's simply no recourse against?
>

Standard techniques include:

1. Avoid recursion (direct or indirect).
2. If recursion is necessary (or it is the cleanest solution to a
problem), be /very/ careful that you know its depth will be limited.
3. Avoid VLAs, or make sure their size is within known limits.
4. Be careful with any large non-static local variables.
5. Use OS, compiler or target-specific features.
6. Test worst cases.
7. Use OS-specific calls or settings to reduce stack size during
testing, and increase it during production.
8. Use static stack size warnings or analysis.
9. Use dynamic stack checking and analysis.


Some possible gcc features that can help include:

-Wstack-usage=n
-fstack-check
-fstack-limit-register= / -fstack-limit-symbol=
-fstack-usage

Other compilers probably have features you can use.


Thomas David Rivers

Jun 14, 2019, 11:28:46 AM
Paavo Helde wrote:

>
>
> Another more portable solution is to measure the used stack space and
> fail the operation if it is becoming too large. Measuring is actually
> easy, you just compare the addresses of local variables in the current
> function and in main() (or another thread start function).
>
That is not a portable solution as there is no requirement that the stack be
a contiguous "chunk" of memory. On the mainframe, for example, it is
not... so you can't tell
how much stack is "used" by comparing addresses of automatic variables.

- Dave Rivers -

--
riv...@dignus.com Work: (919) 676-0847
Get your mainframe programming tools at http://www.dignus.com

Melzzzzz

Jun 14, 2019, 1:38:20 PM
That's undefined behavior.


--
press any key to continue or any other to quit...
I enjoy nothing as much as my status of INVALID -- Zli Zec
There actually wasn't that much violence in the Wild West, precisely
because everyone was armed. -- Mladen Gogala

Keith Thompson

Jun 14, 2019, 2:58:18 PM
Thomas David Rivers <riv...@dignus.com> writes:
> Paavo Helde wrote:
>> Another more portable solution is to measure the used stack space and
>> fail the operation if it is becoming too large. Measuring is actually
>> easy, you just compare the addresses of local variables in the current
>> function and in main() (or another thread start function).
>>
> That is not a portable solution as there is no requirement that the stack be
> a contiguous "chunk" of memory. On the mainframe, for example, it is
> not... so you can't tell
> how much stack is "used" by comparing addresses of automatic variables.

Even if the stack is contiguous, relational operators on pointers
to distinct objects have undefined behavior. It's likely to "work"
on most systems, but there's no guarantee. (And the stack could
grow either up or down.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Thiago Adams

Jun 14, 2019, 4:00:47 PM
On Friday, June 14, 2019 at 9:44:11 AM UTC-3, Juha Nieminen wrote:
> One thing I have always felt as being astonishingly under-defined
> both in C and C++ is the concept of the stack, how much of it a
> program has available, and especially, what happens if you run
> out of it.
> With dynamic memory allocation there's always a mechanism to deal
> with the case that it fails: Either a null pointer is returned, or
> an exception is thrown (depending on how exactly you are doing the
> allocation). In either case, you can deal with the problem.
>
> But what exactly happens if eg. your recursion level is too big and
> you run out of stack space?
> To be honest, I actually don't know what the standard says about this
> (and in fact, this is a question I'm presenting here.) I would imagine
> it's UB or implementation-defined behavior. And I'm guessing there's
> simply no way of knowing *when* you are going to run out of stack
> space, or to gracefully recover from it.

I believe some embedded hardware (the Arduino, for example) will not
give you any option: there is no check at all. The standard cannot help.

> How exactly should programs be designed, if one wanted to be extra
> paranoid about the possibility of running out of stack space? Or is
> it something that there's simply no recourse against?


If you don't allocate variable amounts of memory on the stack (VLAs,
alloca()) and you don't use recursion, then you can statically determine
how much stack memory each function needs.
(See 13.13 Stack Usage Report,
https://www.gimpel.com/html/manual.pdf)



But some hardware shares the same memory between stack and heap, so the
amount of stack available is variable.
The solution for extra safety is to use neither recursion nor dynamic
allocation.



Thiago Adams

Jun 14, 2019, 4:04:25 PM
On Friday, June 14, 2019 at 10:50:07 AM UTC-3, David Brown wrote:
....
> Standard techniques include:
>
> 1. Avoid recursion (direct or indirect).
> 2. If recursion is necessary (or it is the cleanest solution to a
> problem), be /very/ careful that you know its depth will be limited.
> 3. Avoid VLAs, or make sure their size is within known limits.
> 4. Be careful with any large non-static local variables.
> 5. Use OS, compiler or target-specific features.
> 6. Test worst cases.
> 7. Use OS-specific calls or settings to reduce stack size during
> testing, and increase it during production.
> 8. Use static stack size warnings or analysis.
> 9. Use dynamic stack checking and analysis.

What do you think about adding this one?

If your hardware shares the memory between heap and stack:

10. Don't use dynamic memory allocation

rick.c...@gmail.com

Jun 14, 2019, 4:07:05 PM
On Friday, June 14, 2019 at 8:44:11 AM UTC-4, Juha Nieminen wrote:
> One thing I have always felt as being astonishingly under-defined
> both in C and C++ is the concept of the stack, how much of it a
> program has available, and especially, what happens if you run
> out of it.
>
> With dynamic memory allocation there's always a mechanism to deal
> with the case that it fails: Either a null pointer is returned, or
> an exception is thrown (depending on how exactly you are doing the
> allocation). In either case, you can deal with the problem.
>
> But what exactly happens if eg. your recursion level is too big and
> you run out of stack space?

I don't see how this is a C/C++ issue at all. To my knowledge,
there is no guarantee of a stack in the C or C++ standards. They
are totally implementation-specific.

> To be honest, I actually don't know what the standard says about this
> (and in fact, this is a question I'm presenting here.) I would imagine
> it's UB or implementation-defined behavior. And I'm guessing there's
> simply no way of knowing *when* you are going to run out of stack
> space, or to gracefully recover from it.
>
> How exactly should programs be designed, if one wanted to be extra
> paranoid about the possibility of running out of stack space? Or is
> it something that there's simply no recourse against?

If you are running out of stack space, re-write your algorithm.
In post-80386's expand-down segment designs where the initial
stack pointer points to a logical top-of-memory address, and
where paging is employed, it will never be an issue anyone has
to deal with again, unless your algorithm is broken.

The only time I've gotten stack exceptions since the 1990s is
when I've had an algorithm error.

--
Rick C. Hodgin

Paavo Helde

Jun 14, 2019, 4:12:48 PM
On 14.06.2019 18:28, Thomas David Rivers wrote:
> Paavo Helde wrote:
>
>>
>>
>> Another more portable solution is to measure the used stack space and
>> fail the operation if it is becoming too large. Measuring is actually
>> easy, you just compare the addresses of local variables in the current
>> function and in main() (or another thread start function).
>>
> That is not a portable solution as there is no requirement that the
> stack be
> a contiguous "chunk" of memory. On the mainframe, for example, it is
> not... so you can't tell
> how much stack is "used" by comparing addresses of automatic variables.

Right. That's why I said "more portable", not "portable".


Paavo Helde

Jun 14, 2019, 4:17:51 PM
Which behaves much better than stack overflow, on many common platforms.

Alf P. Steinbach

Jun 14, 2019, 5:21:08 PM
On 14.06.2019 20:58, Keith Thompson wrote:
> Thomas David Rivers <riv...@dignus.com> writes:
>> Paavo Helde wrote:
>>> Another more portable solution is to measure the used stack space and
>>> fail the operation if it is becoming too large. Measuring is actually
>>> easy, you just compare the addresses of local variables in the current
>>> function and in main() (or another thread start function).
>>>
>> That is not a portable solution as there is no requirement that the stack be
>> a contiguous "chunk" of memory. On the mainframe, for example, it is
>> not... so you can't tell
>> how much stack is "used" by comparing addresses of automatic variables.

I've seen this alleged fact before.

Then it was a misunderstanding of a machine that linked up stack frames
obtained from a single contiguous machine stack.

So a concrete link to the spec of "the mainframe" would be nice.


> Even if the stack is contiguous, relational operators on pointers
> to distinct objects have undefined behavior. It's likely to "work"
> on most systems, but there's no guarantee. (And the stack could
> grow either up or down.)

Pointers to distinct objects can be compared using std::greater & family.

On a segmented architecture one can imagine this being done by comparing
first segment selectors, and if they're equal, the segment offsets.

However, since that doesn't necessarily involve a translation to a
single linear address space, or the possibility of doing that, that
doesn't imply that forming the difference of two such pointers is
necessarily possible on all platforms.


Cheers!,

- Alf

James Kuyper

Jun 14, 2019, 7:35:23 PM
On 6/14/19 8:44 AM, Juha Nieminen wrote:
> One thing I have always felt as being astonishingly under-defined
> both in C and C++ is the concept of the stack, how much of it a
> program has available, and especially, what happens if you run
> out of it.
>
> With dynamic memory allocation there's always a mechanism to deal
> with the case that it fails: Either a null pointer is returned, or
> an exception is thrown (depending on how exactly you are doing the
> allocation). In either case, you can deal with the problem.
>
> But what exactly happens if eg. your recursion level is too big and
> you run out of stack space?
>
> To be honest, I actually don't know what the standard says about this

Nothing. Almost all implementations of C++ use a stack structure to
store local variables, and many of them do so using a hardware stack,
but the standard says almost nothing about such issues.

The closest it comes to talking about such matters is to define the term
"implementation limits": "restrictions imposed upon programs by the
implementation". It takes some imagination to look at those words and
realize that a limited amount of stack space is one example of the kind
of limit it's talking about. The standard doesn't say anything about
what happens when you exceed an implementation limit, so the behavior is
undefined by "omission of any explicit definition of the behavior".

It's all very vague and open to dispute. Have fun!

Ian Collins

Jun 14, 2019, 8:14:57 PM
Don't use recursion, do use static analysis tools to check your stack depth!

--
Ian.

Ben Bacarisse

Jun 14, 2019, 9:26:22 PM
Recursion is not itself a problem. Tail call optimisation can eliminate
the stack costs for some recursive programs, and even without it, some
patterns of recursion are not inherently costly. For example, printing
a 32-bit number as a decimal by printing a tenth of it and then printing
the last digit won't recurse more than ten levels. Obviously, there are
limited environments where even that is unacceptable, but they are
probably the exception these days.

--
Ben.

Ian Collins

Jun 14, 2019, 9:30:22 PM
Recursion may not be a problem, but it can defeat static analysis tools.

--
Ian.

Thiago Adams

Jun 14, 2019, 10:20:55 PM
I don't know if any tool does that, but recursive functions could be
reported with a constant + number*N, where N is how deep we call the
function.

Chris Vine

Jun 15, 2019, 4:59:46 AM
What you say about tail call elimination is right for C, but not so
much for C++ because C++ has destructors, which means that most
recursive calls which look as if they are in tail position in fact
are not. To get tail call elimination to work in C++ you need to have
only built-in or trivial types (in other words, only C-like types) in
local scope when the recursive call is made.

That is one of the reasons why Rust has not implemented tail call
elimination despite it being heavily influenced by functional languages.
It has adopted C++'s approach of deterministic destruction/deallocation
of objects instead of garbage collection.

Melzzzzz

Jun 15, 2019, 6:02:19 AM
It had GC in the beginning, but they ditched it for performance reasons.
It also had segmented stack, which they also ditched for same reasons...

Bonita Montero

Jun 15, 2019, 7:59:41 AM
> But what exactly happens if eg. your recursion level is too big and
> you run out of stack space?

Unless you do massive alloca()s, on most platforms there will never be
such a deep recursion level that the stack space is exhausted.
Consider that if you have thousands of recursion levels, you will have
not only a corresponding depth of the stack but also a width of
recursions per node, i.e. your computations will last practically forever.

David Brown

Jun 15, 2019, 11:13:23 AM
On 14/06/2019 23:20, Alf P. Steinbach wrote:
> On 14.06.2019 20:58, Keith Thompson wrote:
>> Thomas David Rivers <riv...@dignus.com> writes:
>>> Paavo Helde wrote:
>>>> Another more portable solution is to measure the used stack space and
>>>> fail the operation if it is becoming too large. Measuring is actually
>>>> easy, you just compare the addresses of local variables in the current
>>>> function and in main() (or another thread start function).
>>>>
>>> That is not a portable solution as there is no requirement that the
>>> stack be
>>> a contiguous "chunk" of memory.  On the mainframe, for example, it is
>>> not... so you can't tell
>>> how much stack is "used" by comparing addresses of automatic variables.
>
> I've seen this alleged fact before.
>
> Then it was a misunderstanding of a machine that linked up stack frames
> obtained from a single contiguous machine stack.
>
> So a concrete link to the spec of "the mainframe" would be nice.

I don't know about mainframes, but I can tell you that gcc has support
for non-contiguous stacks on various architectures. Split stacks can be
particularly useful in multi-threaded code (since each thread needs its
own stack).

<https://gcc.gnu.org/wiki/SplitStacks>

(I haven't used this feature, nor can I say how common it is.)

>
>> Even if the stack is contiguous, relational operators on pointers
>> to distinct objects have undefined behavior.  It's likely to "work"
>> on most systems, but there's no guarantee.  (And the stack could
>> grow either up or down.)
>
> Pointers to distinct objects can be compared using std::greater & family.

True (for C++14 onwards). They can also be compared, on most
architectures, by converting first to uintptr_t and comparing as integer
types. But they can't be compared with standard relational operators
unless the compiler documentation specifically says so. You can expect
older and simpler compilers to do such comparisons in the obvious way,
but newer tools are making more use of optimisations and re-arrangements
based on such operations being undefined. (This is precisely why newer
C++ additions, like std::greater, explicitly allow such comparisons - so
that you can get them if you need them.)

>
> On a segmented architecture one can imagine this being done by comparing
> first segment selectors, and if they're equal, the segment offsets.
>

You could imagine that, yes. But it would not match the physical
ordering of addresses in an architecture such as 16-bit x86.

David Brown

Jun 15, 2019, 11:15:53 AM
Yes. On single-threaded embedded systems, it is common for the stack to
grow down from the top of memory, and the heap to grow upwards from the
top of the statically allocated data. You don't have to specify the sizes
of your heap and your stack, but bad things happen if they collide.

David Brown

Jun 15, 2019, 11:19:08 AM
The great majority of systems are not x86, and don't have any kind of
memory paging or virtual memory.

It is true that it is unlikely that you will have trouble running out of
stack space on a modern PC, unless you have some kind of runaway
recursion, but for other kinds of system it is a very relevant question.

David Brown

Jun 15, 2019, 11:22:08 AM
A destructor is just a function invocation - which may be a call, or it
may be inlined. If the compiler can arrange the recursive calls with
tail call optimisations, it will - destructors are not any different
from any other code.

Paavo Helde

Jun 15, 2019, 1:25:30 PM
I wish that were true. With our script interpreter I can get a stack
overflow with just calculating factorial of 210 with a simple recursive
procedure. It implements recursion on real stack, from the debugger I
see that each script-level stack frame takes 4848 bytes and uses 5 C++
function call frames. The full C++ stack in the debugger is 1067 frames
deep and uses up 1MB (the default stack size in MSVC++).

Oh, and it takes just 0.15 seconds to get this stack overflow. There is
no alloca() and there are no special coding efforts spent on optimizing
the stack usage, in either direction.

Fortunately this script interpreter is not meant to be used for such
silly recursive procedures, so there is no urgent need to fix anything.
The relevant issue has sat in the backlog for over 10 years.

This example just shows that using such strong words as "never" and
"infinitely long" is not quite appropriate; one can get a stack overflow
pretty easily. It's not a purely academic concern.

Alf P. Steinbach

Jun 15, 2019, 2:35:06 PM
On 15.06.2019 17:13, David Brown wrote:
> On 14/06/2019 23:20, Alf P. Steinbach wrote:
>> On 14.06.2019 20:58, Keith Thompson wrote:
>>> Thomas David Rivers <riv...@dignus.com> writes:
>>>> Paavo Helde wrote:
>>>>> Another more portable solution is to measure the used stack space and
>>>>> fail the operation if it is becoming too large. Measuring is actually
>>>>> easy, you just compare the addresses of local variables in the current
>>>>> function and in main() (or another thread start function).
>>>>>
>>>> That is not a portable solution as there is no requirement that the
>>>> stack be
>>>> a contiguous "chunk" of memory.  On the mainframe, for example, it is
>>>> not... so you can't tell
>>>> how much stack is "used" by comparing addresses of automatic variables.
>>
>> I've seen this alleged fact before.
>>
>> Then it was a misunderstanding of a machine that linked up stack
>> frames obtained from a single contiguous machine stack.
>>
>> So a concrete link to the spec of "the mainframe" would be nice.
>
> I don't know about mainframes, but I can tell you that gcc has support
> for non-continguous stacks on various architectures.  Spit stacks can be
> particularly useful in multi-threaded code (since each thread needs its
> own stack).
>
> <https://gcc.gnu.org/wiki/SplitStacks>
>
> (I haven't used this feature, nor can I say how common it is.)

It sounds challenging for debuggers. I wasn't aware of it. Thanks.


>>> Even if the stack is contiguous, relational operators on pointers
>>> to distinct objects have undefined behavior.  It's likely to "work"
>>> on most systems, but there's no guarantee.  (And the stack could
>>> grow either up or down.)
>>
>> Pointers to distinct objects can be compared using std::greater & family.
>
> True (for C++14 onwards).

`std::greater` was there, providing a total ordering, in the first C++
standard, C++98. It's necessary for ordered containers of pointers.

I don't know if it was part of the de-facto standard library, as
specified by the ARM, in the pre-standard days.

I could check but that would involve either a lot of physical work
finding the book in a heap of boxes containing all kinds of stuff, or
semi-illegally downloading a PDF of the ARM.


>  They can also be compared, on most
> architectures, by converting first to uintptr_t and comparing as integer
> types.  But they can't be compared with standard relational operators
> unless the compiler documentation specifically says so.  You can expect
> older and simpler compilers to do such comparisons in the obvious way,
> but newer tools are making more use of optimisations and re-arrangements
> based on such operations being undefined.  (This is precisely why newer
> C++ additions, like std::greater, explicitly allow such comparisons - so
> that you can get them if you need them.)
>
>>
>> On a segmented architecture one can imagine this being done by
>> comparing first segment selectors, and if they're equal, the segment
>> offsets.
>>
>
> You could imagine that, yes.  But it would not match the physical
> ordering of addresses in an architecture such as 16-bit x86.

I wouldn't call the original x86, with each physical memory address
belonging to (if my in-head arithmetic is correct) 4096 different
overlapping "segments", a segmented architecture. It was like a
receiver prepared for adaption to stereo, but not actually a stereo
receiver, in that it had registers and instructions that in a later
version, the 286, would be employed for segment handling. There were no
segment tables, just a simple arithmetic relationship (namely 16x)
between selector and physical segment start address.

Alf P. Steinbach

Jun 15, 2019, 2:41:26 PM
On 15.06.2019 17:22, David Brown wrote:
>> [snip]
>
> A destructor is just a function invocation - which may be a call, or it
> may be inlined.  If the compiler can arrange the recursive calls with
> tail call optimisations, it will - destructors are not any different
> from any other code.

In the context of tail recursion a destructor is different, in that it's
invoked implicitly, not explicitly visible in the source code.

That can lead a non-expert programmer to believe tail recursion
optimization will be employed, when in fact it can't be (easily) employed.

Well, there are other gotchas too, e.g. that at least a decade ago
Visual C++ stubbornly refused to optimize tail recursion for functions
with floating point results. So if one relies on tail recursion in C++
one had better know what one is doing, and know one's compilers.


>> That is one of the reasons why Rust has not implemented tail call
>> elimination despite it being heavily influenced by functional languages.
>> It has adopted C++'s approach of having determinative destruction/
>> deallocation of objects instead of garbage collection.

Interesting.


Cheers!,

- Alf

Keith Thompson

Jun 15, 2019, 4:03:24 PM
David Brown <david...@hesbynett.no> writes:
> On 14/06/2019 23:20, Alf P. Steinbach wrote:
>> On 14.06.2019 20:58, Keith Thompson wrote:
[...]
>>> Even if the stack is contiguous, relational operators on pointers
>>> to distinct objects have undefined behavior. It's likely to "work"
>>> on most systems, but there's no guarantee. (And the stack could
>>> grow either up or down.)
>>
>> Pointers to distinct objects can be compared using std::greater & family.
>
> True (for C++14 onwards).

As Alf writes in a followup, std::greater has been in the
standard for a long time. There was a minor change in C++14.
https://en.cppreference.com/w/cpp/utility/functional/greater

> They can also be compared, on most
> architectures, by converting first to uintptr_t and comparing as
> integer types.

Yes, but that has even fewer guarantees than std::greater. If p and
q are pointers, you could have p==q but (uintptr_t)p != (uintptr_t)q
if p and q hold different representations of the same value.

std::greater guarantees a total ordering of pointer values, but
there's no guarantee that that ordering is meaningful. If you
compare addresses of three objects with automatic storage in the
order in which they're created, you'll *usually* get either
(&x < &y < &z) or (&x > &y > &z) (this is pseudo-code, not C++
notation), but all 6 orderings are legal.

(Since the built-in relational operators < <= > >= are undefined
on pointers to distinct objects, an implementation for a segmented
architecture can compare just the offsets, assuming each object
is in a single segment. == and != might have to do more work to
distinguish between pointers to different segments with the same
offset.)

David Brown

Jun 16, 2019, 5:55:49 AM
Yes. (I misread a detail in a reference page - it wasn't added in
C++14, merely slightly modified.)
The architecture was called "segmented", whether you like the name or
not (and I tend to agree with you here).

As a more general point, systems can have multiple views of memory and
multiple ways to order it, and may not have a complete order for
addresses. Logical and physical addresses can have very different
mappings. You can have aliasing with different logical addresses
mapping to the same physical address. You can have the same logical
address mapping to multiple physical addresses in different
circumstances. You can have all sorts of combinations.

It can certainly be useful to have an ordering on addresses for things
like containers - all you need is an ordering, without any requirement
for it to be "real" in any sense. But it does not let you get
information about sizes of stacks or memories - each object is, in
effect, floating "somewhere" unconnected to any other object.

David Brown

Jun 16, 2019, 6:13:37 AM
On 15/06/2019 20:41, Alf P. Steinbach wrote:
> On 15.06.2019 17:22, David Brown wrote:
>>> [snip]
>>
>> A destructor is just a function invocation - which may be a call, or
>> it may be inlined.  If the compiler can arrange the recursive calls
>> with tail call optimisations, it will - destructors are not any
>> different from any other code.
>
> In the context of tail recursion a destructor is different, in that it's
> invoked implicitly, not explicitly visible in the source code.

In the source code, it is invoked implicitly, yes (that's the whole idea
of them). But as far as the compiler is concerned, when it translates
the source code into internal formats it fills in the destructor call
just like any other function invocation. /Then/ it starts to analyse
and manipulate the code, looking for optimisations such as tail recursion.

Destructors, being hidden in the source, might make it harder for
programmers to guess if a particular function is likely to be suitable
for tail recursion optimisation - but it makes no difference to the
compiler.

>
> That can lead a non-expert programmer to believe tail recursion
> optimization will be employed, when in fact it can't be (easily) employed.
>
> Well there are other gotcha's also, e.g. that at least a decade ago
> Visual C++ stubbornly refused to optimize tail recursion floating point
> type results. So if one relies on tail recursion in C++ one should
> better know what one is doing, and know one's compilers.
>

Exactly, yes.

Chris Vine

Jun 16, 2019, 6:35:56 AM
Yes, but generally it can't do this, because destructors may only be
called at the end of the local scope in which the object concerned has
its lifetime, and in the reverse order of construction of other objects
in that scope. Any observable effect of object destruction, as required
by the C++ standard, must occur after the recursive function call.

The result of that is that a recursive function call at the end of local
scope, which might appear to be in tail position, in fact isn't - the
destructors still remain to execute. I cannot see how an optimizer can
get around this where the objects concerned are other than trivial (in
the C++-standard sense of that word).

The proof of the pudding is in the eating: can you actually produce some
code where tail call elimination is applied with non-trivial objects in
local scope? If you can, a few lines of code should suffice to
demonstrate this.

Chris Vine
Jun 16, 2019, 7:10:29 AM
On Sun, 16 Jun 2019 11:35:40 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> The proof of the pudding is in the eating: can you actually produce some
> code where tail call elimination is applied with non-trivial objects in
> local scope? If you can, a few lines of code should suffice to
> demonstrate this.

The kind of case to which I am referring is this:

void recfunc() {
    std::string s;
    ... do something ...
    recfunc();
}

You can bring back tail recursion by massaging the code:

void recfunc() {
    {
        std::string s;
        ... do something ...
    }
    recfunc();
}

But in a case where recfunc() takes an argument, you cannot easily use
the string to provide an argument for the recursive function, because
it is no longer in scope at the time of the recursive call.

David Brown
Jun 16, 2019, 7:56:11 AM
Of course a compiler can't do magic - if the operation of the destructor
is too complicated to re-arrange to get tail recursion optimisation,
then you don't get the optimisation. My point - and I hope this is the
last time I have to write the same thing here - is that destructors are
not special. They are just like any other function invocation. If they
are simple enough, or if the compiler can figure out how to re-arrange
things - then they don't hinder the optimisation. (And many destructors
/are/ simple.) Destructors don't have to be called at the end of the
scope or in reverse order of construction - but their observable effects
must occur exactly as if they had been called in that place and order.
Just like any other function call.

Destructors that do complex things, like deallocate memory and
resources, make it unlikely that a compiler can do tail recursive
optimisations. But it is nothing special about destructors - it is the
same as any other code.

Alf P. Steinbach
Jun 16, 2019, 11:00:27 AM
The following shows what the compiler would have to do.

I am not sure if it's /allowed/ to do such a rewrite.

(To make this compile without the referred library, if that's desired,
just #include the necessary standard library headers and replace
`$use_std` with `using`-declarations.)


----------------------------------------------------------------------
#include <cppx-core/all.hpp>  // <url: https://github.com/alf-p-steinbach/cppx-core>

namespace my{
    $use_std( clog, endl, ostringstream, setw, stack, string );

    class Logger
    {
        static inline int level = 0;

        string m_indent;
        string m_funcname;
        string m_arglist;

    public:
        ~Logger()
        {
            --level;
            clog << m_indent << "< " << m_funcname << m_arglist << endl;
        }

        template< class... Args >
        Logger( const string& funcname, const Args&... args ):
            m_indent( 4*level, ' ' ),
            m_funcname( funcname )
        {
            if constexpr( sizeof...( args ) > 0 ) {
                ostringstream stream;
                stream << "(";
                auto output = [&stream, n = 0]( auto v ) mutable -> int
                {
                    if( n > 0 ) { stream << ", "; }
                    stream << v;
                    ++n;
                    return 0;
                };
                const int a[] = { output( args )... }; (void)a;
                stream << ")";
                m_arglist = stream.str();
            }
            clog << m_indent << "> " << m_funcname << m_arglist << endl;
            ++level;
        }
    };

    auto recursive_gcd( const int a, const int b )
        -> int
    {
        const Logger logging( __func__, a, b );
        const int r = a % b;
        if( r == 0 ) { return b; }
        return recursive_gcd( b, r );
    }

    auto tail_optimized_gcd( int a, int b )
        -> int
    {
        stack<Logger> logging_stack;
        for( ;; )
        {
            logging_stack.emplace( __func__, a, b );
            const int r = a % b;    // const int r = a % b;
            if( r == 0 )            // if( r == 0 ) { return b; }
            {
                while( not logging_stack.empty() ) { logging_stack.pop(); }
                return b;
            }
            a = b;  b = r;          // return recursive_gcd( b, r );
        }
    }

}  // namespace my

namespace app{
    $use_std( cout, endl );

    void run()
    {
        cout << "Recursive:" << endl;
        const int r_gcd = my::recursive_gcd( 60, 16 );
        cout << "recursive_gcd(60, 16) = " << r_gcd << endl;
        cout << endl;
        cout << "Tail-optimized:" << endl;
        const int to_gcd = my::tail_optimized_gcd( 60, 16 );
        cout << "tail_optimized_gcd(60, 16) = " << to_gcd << endl;
    }

}  // namespace app

auto main() -> int { app::run(); }
----------------------------------------------------------------------


Output:

Recursive:
> recursive_gcd(60, 16)
    > recursive_gcd(16, 12)
        > recursive_gcd(12, 4)
        < recursive_gcd(12, 4)
    < recursive_gcd(16, 12)
< recursive_gcd(60, 16)
recursive_gcd(60, 16) = 4

Tail-optimized:
> tail_optimized_gcd(60, 16)
    > tail_optimized_gcd(16, 12)
        > tail_optimized_gcd(12, 4)
        < tail_optimized_gcd(12, 4)
    < tail_optimized_gcd(16, 12)
< tail_optimized_gcd(60, 16)
tail_optimized_gcd(60, 16) = 4


Irrelevant to the question of tail optimization, but I was surprised
that one had to manually clear the stack to get LIFO destruction. Hm!


Cheers!,

- Alf

Chris Vine
Jun 16, 2019, 11:08:41 AM
On Sun, 16 Jun 2019 13:55:59 +0200
> then you don't get the optimisation. My point - and I hope this is the
> last time I have to write the same thing here - is that destructors are
> not special.

I never said they were special. I said that in C++ they generally
remove the ability of the compiler to perform tail call elimination
where non-trivial objects are in local scope when the recursive call is
made, because that call is (contrary to appearances) not in tail
position. I was right. Insofar as you are saying otherwise, I think you
are wrong. I hope this is the last time I have to write the same thing
here.

But you now state that you are saying something else. Me: "Cooked
apples are best served with custard." You: "No, pears are best served
with cream."

> They are just like any other function invocation. If they
> are simple enough, or if the compiler can figure out how to re-arrange
> things - then they don't hinder the optimisation. (And many destructors
> /are/ simple.) Destructors don't have to be called at the end of the
> scope or in reverse order of construction - but their observable effects
> must occur exactly as if they had been called in that place and order.
> Just like any other function call.
>
> Destructors that do complex things, like deallocate memory and
> resources, make it unlikely that a compiler can do tail recursive
> optimisations. But it is nothing special about destructors - it is the
> same as any other code.
>
> > The proof of the pudding is in the eating: can you actually produce some
> > code where tail call elimination is applied with non-trivial objects in
> > local scope? If you can, a few lines of code should suffice to
> > demonstrate this.

To continue the culinary analogy, I still await some code, if you say
the case exists. (But I think you are now saying that you agree after
all with the point I made three posts up-thread.)

David Brown
Jun 16, 2019, 5:01:03 PM
On 16/06/2019 17:08, Chris Vine wrote:
> On Sun, 16 Jun 2019 13:55:59 +0200
>>
>> Of course a compiler can't do magic - if the operation of the destructor
>> is too complicated to re-arrange to get tail recursion optimisation
>> then you don't get the optimisation. My point - and I hope this is the
>> last time I have to write the same thing here - is that destructors are
>> not special.
>
> I never said they were special. I said that in C++ they generally
> remove the ability of the compiler to perform tail call elimination
> where non-trivial objects are in local scope when the recursive call is
> made, because that call is (contrary to appearances) not in tail
> position. I was right. Insofar as you are saying otherwise, I think you
> are wrong. I hope this is the last time I have to write the same thing
> here.
>
> But you now state that you are saying something else. Me: "Cooked
> apples are best served with custard." You: "No, pears are best served
> with cream."
>

I think a better analogy would be Me: "Cooked apples are best served
with custard." You: "No, apples are not best served with cream. They
are best served with custard".

Let's leave it at that, because I think we agree on the actual technical
points and it helps no one to disagree about how we thought we were
saying different things.

Melzzzzz
Jun 16, 2019, 5:03:46 PM
You have to show that your claim is right, but it isn't. Besides
destructors, a bigger problem is exception handling: each function has
to install an exception handler, so you pay either a run-time penalty
or a memory-size penalty.

Juha Nieminen
Jun 17, 2019, 8:30:41 AM
Thiago Adams <thiago...@gmail.com> wrote:
> The solution for extra safety is is not to use recursion and dynamic allocations.

Some algorithms require one or the other.

(Some algorithms, even quite commonly used ones, are provably recursive,
meaning that they cannot be done iteratively. You can "emulate" the recursion
stack by creating your own stack data structure instead of relying on actual
recursive function calls, but it's still a recursive algorithm that requires
an amount of extra memory proportional to the amount of data to be processed.
An iterative algorithm is one that only requires a fixed amount of memory
regardless of the size of the data to be processed.)

Juha Nieminen
Jun 17, 2019, 8:41:14 AM
Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
> That can lead a non-expert programmer to believe tail recursion
> optimization will be employed, when in fact it can't be (easily) employed.
>
> Well there are other gotchas also, e.g. that at least a decade ago
> Visual C++ stubbornly refused to optimize tail recursion floating point
> type results. So if one relies on tail recursion in C++ one should
> better know what one is doing, and know one's compilers.

If relying on such compiler optimizations, one should actually check
what kind of asm the compiler is producing from the code, for the
current particular target.

Of course this is not a safe bet if targeting multiple platforms
(even when using the same compiler, eg. gcc.) For example what may
become optimized for x86-64 might not become equally optimized
for ARM64.

James Kuyper
Jun 17, 2019, 9:02:02 AM
On 6/16/19 6:35 AM, Chris Vine wrote:
...
> Yes, but generally it can't do this, because destructors may only be
> called at the end of the local scope in which the object concerned has
> its lifetime, and in the reverse order of construction of other objects
> in that scope. Any observable effect of object destruction, as required
> by the C++ standard, must occur after the recursive function call.
>
> The result of that is that a recursive function call at the end of local
> scope, which might appear to be in tail position, in fact isn't - the
> destructors still remain to execute. I cannot see how an optimizer can
> get around this where the objects concerned are other than trivial (in
> the C++-standard sense of that word).

You've already said the key phrase - but missed the implications. A
destructor might be non-trivial, yet calling it outside of the correct
order might also have no effect on the observable behavior of the
program. I suspect it's rare for non-trivial destructors to qualify for
such optimizations, but it's not impossible.

If that's the case, and particularly if the destructor is inline, an
implementation could inline the destructor call and execute the body of
the destructor before the recursive call.
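A contrived sketch of such a case (the Tally type and all names are invented for illustration): the destructor is non-trivial, but its only effect is incrementing a non-volatile counter that nothing reads between the calls, so executing it before the recursive call cannot change the observable behavior as defined by the standard. Whether a given compiler actually performs the elimination still has to be verified from the generated assembly.

```cpp
// Non-trivial destructor, but its side effect is not "observable
// behaviour" in the standard's sense (no volatile access, no I/O),
// so an implementation may legally run it before the recursive call.
static long destroyed = 0;

struct Tally {
    ~Tally() { ++destroyed; }
};

long count_down(long n) {
    Tally t;                    // nominally destroyed after the call below
    if (n == 0) return 0;
    return count_down(n - 1);   // candidate for tail call elimination
}
```

The final value of `destroyed` is the same whether the increments happen before or after each recursive call, which is what makes the reordering permissible.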

> The proof of the pudding is in the eating: can you actually produce some
> code where tail call elimination is applied with non-trivial objects in
> local scope? If you can, a few lines of code should suffice to
> demonstrate this.

Note that since this is only allowed if it has no effect on the
observable behavior of the program, the best way to prove that
tail-recursion optimization has been performed would be to look at the
generated code, not at the behavior of that code. Running out of stack
space has undefined behavior, and does not qualify as observable
behavior, so you can't prove the optimization was performed just by
"observing" that it didn't run out of stack space.

Chris Vine
Jun 17, 2019, 11:46:28 AM
On Mon, 17 Jun 2019 09:01:48 -0400
James Kuyper <james...@alumni.caltech.edu> wrote:
> On 6/16/19 6:35 AM, Chris Vine wrote:
> ...
> > Yes, but generally it can't do this, because destructors may only be
> > called at the end of the local scope in which the object concerned has
> > its lifetime, and in the reverse order of construction of other objects
> > in that scope. Any observable effect of object destruction, as required
> > by the C++ standard, must occur after the recursive function call.
> >
> > The result of that is that a recursive function call at the end of local
> > scope, which might appear to be in tail position, in fact isn't - the
> > destructors still remain to execute. I cannot see how an optimizer can
> > get around this where the objects concerned are other than trivial (in
> > the C++-standard sense of that word).
>
> You've already said the key phrase - but missed the implications. A
> destructor might be non-trivial, yet calling it outside of the correct
> order might also have no effect on the observable behavior of the
> program. I suspect it's rare for non-trivial destructors to qualify for
> such optimizations, but it's not impossible.
>
> If that's the case, and particularly if the destructor is inline, an
> implementation could inline the destructor call and execute the body of
> the destructor before the recursive call.

I hadn't missed the point. I just can't see that ever happening in
practice for an object which is not trivial (in the C++ sense), given
that the observable effects must be the same as if the destructor(s) had
executed after the recursive call(s). Optimization in such
circumstances is so theoretical as (in my view) likely to be
impossible. However, I was willing to be persuaded by actual code which
does it (see below).

> > The proof of the pudding is in the eating: can you actually produce some
> > code where tail call elimination is applied with non-trivial objects in
> > local scope? If you can, a few lines of code should suffice to
> > demonstrate this.
>
> Note that since this is only allowed if it has no effect on the
> observable behavior of the program, the best way to prove that
> tail-recursion optimization has been performed would be to look at the
> generated code, not at the behavior of that code. Running out of stack
> space has undefined behavior, and does not qualify as observable
> behavior, so you can't prove the optimization was performed just by
> "observing" that it didn't run out of stack space.

The way I would have been persuaded is by someone producing code
containing a recursive call with non-trivial object(s) remaining in
local scope which, when compiled to assembler (gcc -S), reuses the
existing stack frame, instead of constructing a new one and emitting a
call instruction. That should be trivial to see from the assembly code
emitted by the compiler.

Whilst that would persuade me that call elimination optimization which
is out of tail position is possible, I would still not want to rely on
it. It would be freak code.

Chris Vine
Jun 17, 2019, 12:01:23 PM
On Mon, 17 Jun 2019 16:46:11 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> The way I would have been persuaded is by someone producing code
> containing a recursive call with non-trivial object(s) remaining in
> local scope which, when compiled to assembler (gcc -S), reuses the
> existing stack frame, instead of constructing a new one and emitting a
> call instruction. That should be trivial to see from the assembly code
> emitted by the compiler.
>
> Whilst that would persuade me that call elimination optimization which
> is out of tail position is possible, I would still not want to rely on
> it. It would be freak code.

By the way, the following would not persuade me, because the compiler
could elide construction of the string on the grounds that it is never
used and then eliminate the tail call[1]. It would have to be code
which actually constructs the object in local scope when optimized and
does something with it.

void recfunc() {
    std::string s;
    recfunc();
}

int main() {
    recfunc(); // does this loop for ever or bust the stack?
}

[1] Surprisingly, neither gcc nor clang does so, even at -O3.

Paavo Helde
Jun 17, 2019, 12:19:55 PM
On 17.06.2019 19:01, Chris Vine wrote:
> On Mon, 17 Jun 2019 16:46:11 +0100

> By the way, the following would not persuade me, because the compiler
> could elide construction of the string on the grounds that it is never
> used and then eliminate the tail call[1].
>
> [1] Surprisingly, neither gcc nor clang does so, even at -O3.

I was recently surprised to see that MSVC++2017 with full optimizations
did not optimize away the string construction in this scenario when
foo<std::string>() was called:

template<typename T> bool foo(T x) {return false;}
template<> bool foo<double>(double x) {return isnan(x);}

The memory allocator code lit up in the profiler. This was easy to fix
by using pass by reference instead, but how comes the optimizer is so
stupid? Or is there any hidden reason why it could not eliminate a
useless object copy?


Chris Vine
Jun 17, 2019, 12:50:55 PM
It is odd, although the correct elimination of the construction of an
unused argument may be more difficult for the compiler to prove than the
correct elimination of the construction of an unused local object: the
observable behaviour has to be the same as if the argument were
constructed (I think).

Going back to unused local objects, it really surprised me that even at
-O3, this busts the stack with gcc and clang:

void recfunc() {
    std::string s;
    recfunc();
}

int main() {
    recfunc(); // does this loop for ever or bust the stack?
}

But this eliminates the call instruction and emits a jump instruction
instead (so looping forever).

void recfunc() {
    recfunc();
}

int main() {
    recfunc(); // does this loop for ever or bust the stack?
}

Does MSVC do any better with the first one?

Paavo Helde
Jun 17, 2019, 1:37:13 PM
Nope, the compiler warns:

warning C4717: 'recfunc': recursive on all control paths, function will
cause runtime stack overflow

And then dutifully goes and produces a stack overflow.

The second version issues the same warning, but then goes into infinite
jmp loop instead.


Chris Vine
Jun 17, 2019, 2:11:50 PM
On Mon, 17 Jun 2019 20:36:55 +0300
Paavo Helde <myfir...@osa.pri.ee> wrote:
> On 17.06.2019 19:49, Chris Vine wrote:
[snip]
> > Does MSVC do any better with the first one?
>
> Nope, the compiler warns:
>
> warning C4717: 'recfunc': recursive on all control paths, function will
> cause runtime stack overflow
>
> And then dutifully goes and produces a stack overflow.
>
> The second version issues the same warning, but then goes into infinite
> jmp loop instead.

Interesting.

Presumably the "thinking" of gcc, clang and MSVC is that because the
destructor of std::string might have an observable effect, and because
it is mandated by the C++ standard that any such effect must take place
after the recursive call, it must construct a std::string object even
though it knows that the object is not otherwise used, and therefore
also must not eliminate the recursive call instruction.

That's pretty pathetic. std::basic_string is a template and the
bodies of the constructor and destructor must surely therefore be
visible to the compiler at compile time, which can see that there is no
effect other than the construction and destruction of the string
object itself.

It also demonstrates the improbability of call elimination being carried
out where there is a non-trivial object in local scope which _is_
actually used (so that the call cannot be in tail position).

Paavo Helde
Jun 17, 2019, 3:28:13 PM
On 17.06.2019 19:49, Chris Vine wrote:
> On Mon, 17 Jun 2019 19:19:44 +0300
> Paavo Helde <myfir...@osa.pri.ee> wrote:
>> On 17.06.2019 19:01, Chris Vine wrote:
>>> On Mon, 17 Jun 2019 16:46:11 +0100
>>
>>> By the way, the following would not persuade me, because the compiler
>>> could elide construction of the string on the grounds that it is never
>>> used and then eliminate the tail call[1].
>>>
>>> [1] Surprisingly, neither gcc nor clang does so, even at -O3.
>>
>> I was recently surprised to see that MSVC++2017 with full optimizations
>> did not optimize away the string construction in this scenario when
>> foo<std::string>() was called:
>>
>> template<typename T> bool foo(T x) {return false;}
>> template<> bool foo<double>(double x) {return isnan(x);}
>>
>> The memory allocator code lit up in the profiler. This was easy to fix
>> by using pass by reference instead, but how comes the optimizer is so
>> stupid? Or is there any hidden reason why it could not eliminate a
>> useless object copy?
>
> It is odd, although the correct elimination of the construction of an
> unused argument may be more difficult for the compiler to prove than the
> correct elimination of the construction of an unused local object: the
> observable behaviour has to be the same as if the argument were
> constructed (I think).

I do not see how this might be the case. A local variable construction
of an otherwise unused object might be very significant (think of a
mutex lock object). So it cannot be optimized away unless the compiler
can really prove there is no change in observable behavior.

There are only limited situations where the standard allows to optimize
away code even with potentially observable behavior, and these are all
related to copy constructor elision. An unused argument might certainly
be constructed by a copy constructor, but for some reason this is not
listed in the standard as an allowed scenario for copy ctor elision.

So in both cases the observable behavior has to be retained, and I do
not see much difference in these scenarios.

I guess the reason why std::string construction cannot be optimized away
is that it potentially calls memory allocation and deallocation routines
in its ctors and dtor, which from the compiler viewpoint could have
unknown side effects.

Chris Vine
Jun 17, 2019, 4:10:16 PM
On Mon, 17 Jun 2019 22:28:02 +0300
I agree it would be very difficult to optimize out a mutex lock object,
because mutex locking definitely has observable side-effects - it
causes memory to synchronize and might cause the emission of a fence
instruction. That is not the case here.

> There are only limited situations where the standard allows to optimize
> away code even with potentially observable behavior, and these are all
> related to copy constructor elision. An unused argument might certainly
> be constructed by a copy constructor, but by some reason this is not
> listed in the standard as an allowed scenario for copy ctor elision.
>
> So in both cases the observable behavior has to be retained, and I do
> not see much difference in these scenarios.

As you say, copy elision is a special permission given by the C++
standard which applies notwithstanding that the elision might be
observable. Optimization elision is relevant where there aren't any
observable effects (where the observable behaviour of the program does
not differ depending on whether or not the object concerned is
constructed).

Copy elision at any rate used to apply to (a) certain return statements,
and (b) certain object initializations. In C++17 as I understand it the
first of those is now mandatory where the return statement is formed
from a prvalue (a temporary). These are not relevant to the cases we
have been discussing.

> I guess the reason why std::string construction cannot be optimized away
> is that it potentially calls memory allocation and deallocation routines
> in its ctors and dtor, which from the compiler viewpoint could have
> unknown side effects.

So I agree with you except on two points. First, I don't think
allocating and deallocating memory counts as an observable side
effect. It is a feature of the language. The fact that it might lead
to an exhaustion of memory for other allocations doesn't seem to me to
count as an observable effect. If I am wrong it explains why the
compilers would not optimize out the string object in the code we have
discussed, but I would take a lot of persuading that memory
allocation does count as a side effect in this way (but I don't mind
being proved wrong).

Secondly, I still suspect that it could be more difficult for a compiler
to establish that an unused argument has no side effects than an unused
local object. But I am not a compiler writer and I defer on that to
those who are.

Chris Vine
Jun 17, 2019, 4:36:27 PM
On Mon, 17 Jun 2019 21:09:58 +0100
Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
[snip]
> So I agree with you except on two points. First, I don't think
> allocating and deallocating memory counts as an observable side
> effect. It is a feature of the language. The fact that it might lead
> to an exhaustion of memory for other allocations doesn't seem to me to
> count as an observable effect. If I am wrong it explains why the
> compilers would not optimize out the string object in the code we have
> discussed, but I would take a lot of persuading that memory
> allocation does count as a side effect in this way (but I don't mind
> being proved wrong).

It follows that I can envisage a case where the existence of a
std::string object in local scope does not preclude call elimination
optimization even where the string is used, provided that the string is
not used in connection with the provision of an argument for the
recursive call (using it as an argument would prevent early
destruction).
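In concrete terms, the kind of code meant is something like this sketch (names invented): the string's whole lifetime is confined to an inner scope, so it has already been destroyed when the recursive call is made, leaving the call in genuine tail position.

```cpp
#include <cstdio>
#include <string>

// The string is fully used and destroyed inside the inner scope; it is
// not used to provide an argument for the recursive call, so nothing
// remains to run after that call.
int countdown(int n) {
    {
        std::string s = std::to_string(n);
        std::puts(s.c_str());
    }                          // s destroyed here, before the call below
    if (n == 0) return 0;
    return countdown(n - 1);   // genuine tail position
}
```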

I suppose one could think of this as the Socratic method in operation,
but I am not entirely happy with that conclusion.

Paavo Helde
Jun 17, 2019, 4:47:48 PM
On 17.06.2019 23:09, Chris Vine wrote:
>
> So I agree with you except on two points. First, I don't think
> allocating and deallocating memory counts as an observable side
> effect. It is a feature of the language. The fact that it might lead
> to an exhaustion of memory for other allocations doesn't seem to me to
> count as an observable effect.

A typical memory allocation routine modifies the state of the memory
allocator data structures, either global or thread-specific. I guess
this would qualify as an observable side effect, regardless of memory
exhaustion.

Chris Vine
Jun 17, 2019, 5:01:39 PM
I don't think so. I did think about whether the fact that an allocator
synchronizes its book-keeping data in a multi-threaded program, and
therefore as a side effect in some sense synchronizes between any
threads using the allocator, counts as an observable effect, and
concluded that it didn't. It seems to me that it is an observable
effect in the abstract machine which we are looking for. Perhaps the
one or two posters who read the C++ standard ten times before breakfast
could help us.

James Kuyper
Jun 17, 2019, 8:14:13 PM
On 6/17/19 4:09 PM, Chris Vine wrote:
...
> So I agree with you except on two points. First, I don't think
> allocating and deallocating memory counts as an observable side
> effect. It is a feature of the language.

It doesn't count as observable behavior, but not for that reason -
whether or not something is a language feature doesn't play any role in
the definition of "observable behavior":

"The least requirements on a conforming implementation are:
— Accesses through volatile glvalues are evaluated strictly according to
the rules of the abstract machine.
— At program termination, all data written into files shall be identical
to one of the possible results that execution of the program according
to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place
in such a fashion that prompting output is actually delivered before a
program waits for input. What constitutes an interactive device is
implementation-defined.
These collectively are referred to as the _observable behavior_ of the
program." (4.6p7)

The phrase "observable behavior" near the end of that clause is in
italics, an ISO convention which identifies this clause as the official
definition of that term.

Öö Tiib
Jun 18, 2019, 4:00:37 AM
I suspect that the implementation of standard allocators involves
(potentially) system calls that involve (potentially) accesses of
volatile glvalues.
The result is that code like "delete new int;" won't be optimized out.
For the same reason, usage of std::string won't be optimized out where
usage of std::string_view will be.

Bonita Montero
Jun 18, 2019, 4:54:27 AM
> I wish that were true. With our script interpreter I can get a stack
> overflow with just calculating factorial of 210 with a simple recursive
> procedure. It implements recursion on real stack, ...

Ok, that's one of the rare cases where this might happen.
But running an interpreter on the real stack is a design failure.
Usually you would parse the code into code for a stack machine and
interpret it iteratively, with the stack in an allocated memory block.

Bart
Jun 18, 2019, 6:48:54 AM
I write interpreters with a software stack, but up to now have never put
in a stack overflow check. It would be too expensive to do so on every push.

However I've just put in such a check for a 'call' operation (with a
margin of 1000 slots to allow for typical use within a function to
evaluate expressions, build argument lists, list constructors etc), but
I couldn't measure any significant difference.

I tried the highly recursive Ackermann (for stack depth) and Fibonacci
(for quantity of calls) benchmarks. The former seemed to run 1% faster
(thanks to the crazy way that x64 works), the latter up to 0.4% slower.

The use of the stack in such a language is different from C or C++, in
that the stack entries consist exclusively of 16-byte slots. There is no
arbitrary-sized data such as C structs or fixed arrays or VLAs which
could consume 100s or 1000s of bytes or more for one variable. Each
would at most be 16 bytes.
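A sketch of that scheme in C++ (all names and sizes are illustrative, not the actual interpreter): pushes stay on an unchecked fast path, and only the 'call' operation tests for space, leaving a margin for the pushes a function may perform between calls.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Slot { unsigned char bytes[16]; };   // uniform 16-byte entries

class VmStack {
    std::vector<Slot> block;                // heap-allocated stack memory
    std::size_t sp = 0;                     // next free slot
    static constexpr std::size_t call_margin = 1000;
public:
    explicit VmStack(std::size_t slots) : block(slots) {}

    void push(const Slot& s) { block[sp++] = s; }   // unchecked fast path
    Slot pop() { return block[--sp]; }

    // Checked only on 'call': require call_margin free slots so the
    // called function can push arguments and temporaries unchecked.
    void enter_call() const {
        if (sp + call_margin > block.size())
            throw std::runtime_error("VM stack overflow");
    }

    std::size_t depth() const { return sp; }
};
```

The point of the margin is that the per-push cost stays zero; only the comparatively rare call opcode pays for the bounds test.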

-------------------

BTW here's a program to calculate 210! It appears to need 4 slots for
each call level (for return value, return address, the 'n' argument, and
one to hold the 'n' to multiply against), or 3 slots if the
multiplication order is reversed, so will work easily with a stack of
1000 entries:

proc start=
    println factorial(210L)
end

function factorial(n)=
    return (n<=1 | 1 | n*factorial(n-1))
end

Output is 105823...00000 (some 300 digits). Although each 'n' needs to
be a big number, each stack slot is a constant 16 bytes.

Bonita Montero
Jun 18, 2019, 7:28:57 AM
> I write interpreters with a software stack, but up to now have never put
> in a stack overflow check. It would be too expensive to do so on every
> push.

If you simply use a vector<T> as a stack you won't have to explicitly
check for stack overflows. A stack overflow is then just an exception
in your code that can be bundled with several other memory-exhaustion
conditions.
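A minimal sketch of that idea (the class name and capacity handling are invented): cap a vector-backed stack so that overflow surfaces as an ordinary C++ exception rather than a hardware stack fault.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Interpreter stack backed by std::vector: the storage lives on the
// heap, and hitting the configured limit throws instead of crashing.
template <typename T>
class BoundedStack {
    std::vector<T> data;
    std::size_t limit;
public:
    explicit BoundedStack(std::size_t max_slots) : limit(max_slots) {}

    void push(const T& v) {
        if (data.size() >= limit)
            throw std::length_error("interpreter stack overflow");
        data.push_back(v);   // may also throw std::bad_alloc
    }
    T pop() { T v = data.back(); data.pop_back(); return v; }
    bool empty() const { return data.empty(); }
};
```

As Bart notes in his reply, the cost of such a check on every push is the concern; this sketch shows only the error-handling shape, not a tuned fast path.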

Bart
Jun 18, 2019, 8:07:19 AM
Actually that wasn't written in C++, and if it was targeted at C++, it
wouldn't use such a type.

With interpreters you tend not to need to use the higher level features
of the host language, as you want as much control as possible (eg.
CPython is written in C).

Does vector<T> do a bounds check on every access, even when not pushing
anything to the stack it represents? That might be too big a hit.

In mine, /everything/ revolves around the stack - you are accessing
elements that you /know/ are on the stack and not outside it.

(The stack also is pushed downwards, but a type like vector<T> would
suit an upward-growing stack better. This was for several reasons, but
at one time when coding with the help of assembly, it did borrow the
machine's stack instructions, with the hardware stack pointer switched
to point to the interpreter stack.)

Paavo Helde

unread,
Jun 18, 2019, 8:49:13 AM6/18/19
to
On 18.06.2019 15:07, Bart wrote:
> On 18/06/2019 12:28, Bonita Montero wrote:
>>> I write interpreters with a software stack, but up to now have never
>>> put in a stack overflow check. It would be too expensive to do so on
>>> every push.
>>
>> If you simply use a vector<T> as a stack you won't have to explicitly
>> check for stack overflows. Stack overflows are then just an exception
>> in your code that could be bundled with several other memory-exhaustion
>> conditions.

For emulating a stack std::deque would be a better option than std::vector.

>
> Actually that wasn't written in C++, and if it was targeted at C++, it
> wouldn't use such a type.
>
> With interpreters you tend not to need to use the higher level features
> of the host language, as you want as much control as possible (eg.
> CPython is written in C).

But tensorflow is written in C++, for example.

C++ gives you all the control you can imagine and is especially useful
in writing script interpreters. For example, I have written some C++
code interfacing Python at the C level, and the needed manual reference
counting was a pain. I ended up with writing a little C++ wrapper which
performs this reference counting automatically.

>
> Does vector<T> do a bounds check on every access, even when not pushing
> anything to the stack it represents? That might be too big a hit.

operator[] does not check bounds, the at() member function does. Also,
for some compilers you need extra tuning for suppressing all checks,
like '#define _SECURE_SCL 0' (however, from the docs it looks like they
have come to their senses by now and disabled checked iterators by
default in Release mode).
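The distinction can be shown in a small self-contained check (illustrative only):

```cpp
#include <vector>
#include <stdexcept>

// at() performs a bounds check and throws std::out_of_range on failure;
// operator[] performs no check, so v[3] below would be undefined behavior.
bool at_checks_bounds() {
    std::vector<int> v{1, 2, 3};
    try {
        (void)v.at(3);              // one past the end: throws
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```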


>
> In mine, /everything/ revolves around the stack - you are accessing
> elements that you /know/ are on the stack and not outside it.
>
> (The stack also is pushed downwards, but a type like vector<T> would
> suit an upward-growing stack better. This was for several reasons, but
> at one time when coding with the help of assembly, it did borrow the
> machine's stack instructions, with the hardware stack pointer switched
> to point to the interpreter stack.)

Using a single line of assembler will drastically reduce the portability
of the program and increase the cost of maintenance. I have not yet
encountered a performance problem which required assembler coding. The
closest I have come is to use compiler intrinsics like __builtin_clzll().


Bonita Montero

unread,
Jun 18, 2019, 10:59:17 AM6/18/19
to
> For emulating a stack std::deque would be a better option than std::vector.

A vector would be faster since it is extended only when it's grown beyond
its largest size so far. A deque<T> allocates new items on every push.

Bart

unread,
Jun 18, 2019, 12:15:55 PM6/18/19
to
The version of this particular project about a year ago (which is not
written in C), included an /optional/ 6000-line ASM module for x64. Yet
it could be translated to C (building as C++ would be feasible with
loads of extra casts, but the target would be C), and even run on ARM.

The ASM module provided an acceleration layer, which, even combined with
its non-optimised native language compiler for the bulk of the code, can
make many programs run 2-3 times as fast, compared with translating to C
and compiling with gcc-O3.

Some applications are special cases where use of ASM can make a real
difference, and an interpreter is one such case. Ie. where there are
clear bottlenecks.

(The trend now seems to be the use of special kinds of JIT compilation,
to dynamically turn some interpreted code paths into native code, but
that is different from just making byte-code run as fast as possible,
which is what I do.)

Paavo Helde

unread,
Jun 18, 2019, 12:53:24 PM6/18/19
to
No, it does not. Maybe you confuse it with std::list?

James Kuyper

unread,
Jun 18, 2019, 7:57:24 PM6/18/19
to
On 6/18/19 8:49 AM, Paavo Helde wrote:
...
> For emulating a stack std::deque would be a better option than std::vector.

Why?

Actually, I'd think that
std::stack<T, std::vector<T> > would be slightly better for emulating a
stack: it would allow you to use push() and pop().
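A minimal illustration of that adaptor:

```cpp
#include <stack>
#include <vector>

// std::stack adapted over std::vector; only push/pop/top are exposed,
// which matches pure stack usage (but not peeking below the top).
int top_after_push_pop() {
    std::stack<int, std::vector<int>> s;
    s.push(1);
    s.push(2);
    s.pop();                        // removes 2
    return s.top();                 // 1 is now on top
}
```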

Melzzzzz

unread,
Jun 19, 2019, 12:29:39 AM6/19/19
to
There is no point. Local variables and constants cannot be placed in
that vector. We will always depend on stack size being limited.
Compiler can always use segmented stacks, but alas for performance
reasons they don't want to do that....



--
press any key to continue or any other to quit...
U ničemu ja ne uživam kao u svom statusu INVALIDA -- Zli Zec
Na divljem zapadu i nije bilo tako puno nasilja, upravo zato jer su svi
bili naoruzani. -- Mladen Gogala

Paavo Helde

unread,
Jun 19, 2019, 1:38:36 AM6/19/19
to
On 19.06.2019 2:57, James Kuyper wrote:
> On 6/18/19 8:49 AM, Paavo Helde wrote:
> ...
>> For emulating a stack std::deque would be a better option than std::vector.
>
> Why?

The same reasons why std::stack defaults to using std::deque underneath.
With a stack you typically only work with a small part of it, std::deque
ensures that small parts are compact in memory, and at the same time it
does not waste time on reallocating and copying over all elements when
it needs to grow.

OTOH, std::deque has a bit more indirection when accessing the elements,
which might or might not be significant.

Anyway, the choice depends on the usage case and should be
measured/profiled.


> Actually, I'd think that
> std::stack<T, std::vector<T> > would be slightly better for emulating a
> stack: it would allow you to use push() and pop().

I have often tried to use std::stack, then discovered I still need to
access elements beneath top(), and switched over to plain std::deque or
std::vector.

David Brown

unread,
Jun 19, 2019, 5:18:43 AM6/19/19
to
On 19/06/2019 06:29, Melzzzzz wrote:
> On 2019-06-18, James Kuyper <james...@alumni.caltech.edu> wrote:
>> On 6/18/19 8:49 AM, Paavo Helde wrote:
>> ...
>>> For emulating a stack std::deque would be a better option than std::vector.
>>
>> Why?
>>
>> Actually, I'd think that
>> std::stack<T, std::vector<T> > would be slightly better for emulating a
>> stack: it would allow you to use push() and pop().
>
> There is no point. Local variables and constants cannot be placed in
> that vector. We will always depend on stack size being limited.
> Compiler can always use segmented stacks, but alas for performance
> reasons they don't want to do that....
>

Compilers /do/ use segmented stacks - with the right compiler, options,
target cpu, host OS, etc. Look up gcc's "split stack" feature for
example - and no doubt other compilers have support too. It is not a
feature you'd want without reason, as it complicates (and thus slows
down) the code - but in some cases it is worth using.


Tim Rentsch

unread,
Jun 19, 2019, 4:18:51 PM6/19/19
to
Juha Nieminen <nos...@thanks.invalid> writes:

> Alf P. Steinbach <alf.p.stein...@gmail.com> wrote:
>
>> That can lead a non-expert programmer to believe tail recursion
>> optimization will be employed, when in fact it can't be (easily) employed.
>>
>> Well there are other gotcha's also, e.g. that at least a decade ago
>> Visual C++ stubbornly refused to optimize tail recursion floating point
>> type results. So if one relies on tail recursion in C++ one should
>> better know what one is doing, and know one's compilers.
>
> If relaying on such compiler optimizations, one should actually check
> what kind of asm the compiler is producing from the code, [for all
> platforms being targeted].

Absolutely. And ideally with an automated tool so the check can
be done reliably.

Tim Rentsch

unread,
Jun 20, 2019, 9:29:05 AM6/20/19
to
Paavo Helde <myfir...@osa.pri.ee> writes:

> On 15.06.2019 14:59, Bonita Montero wrote:
>
>>> But what exactly happens if eg. your recursion level is too big and
>>> you run out of stack space?
>>
>> If you won't do massive alloca()s, on most platforms there will
>> never be such a deep recursion level that the stack space is
>> exhausted. Consider that if you have thousands of recursion levels
>> you will have not only a corresponding depth of the stack but
>> also a width of recursion per node, i.e. your computations will
>> last infinitely long.
>
> I wish that were true. With our script interpreter I can get a
> stack overflow with just calculating factorial of 210 with a simple
> recursive procedure. It implements recursion on real stack, from
> the debugger I see that each script-level stack frame takes 4848
> bytes and uses 5 C++ function call frames. The full C++ stack in
> the debugger is 1067 frames deep and uses up 1MB (the default stack
> size in MSVC++).

(<aside>A default of only 1MB for the stack is -- please excuse
my emphatic comment -- inexcusable in this day and age, and was
even 20 years ago.</aside>)

> Oh, and it takes just 0.15 seconds to get this stack overflow.
> There is no alloca() and there are no special coding efforts spent
> on optimizing the stack usage, in either direction.
>
> Fortunately this script interpreter is not meant to be used for such
> silly recursive procedures, so there is no urgent need to fix
> anything. The relevant case has sat in the backlog for over 10 years.
>
> This example just shows that using such strong words like "never"
> and "infinitely long" is not quite appropriate, one can get stack
> overflow pretty easily. It's not a purely academic concern.

Nice example. The conclusion is directly on point.

I think there is another lesson here that is worth pointing out.
The difficulty in this case is not recursion but using linear
recursion rather than logarithmic recursion. It is fairly
straightforward to write a recursive factorial algorithm that
divides the range into two halves at each step, giving a stack
depth that is logarithmic in N rather than linear in N. (Doing
this also gives a significant performance improvement in cases
where multiple-precision arithmetic is involved.) Recursion that
is logarithmic in problem size should never present a problem
with using too much stack space (not counting very severely
constrained execution environments, which have their own special
considerations).
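A sketch of such a range-splitting factorial (using plain uint64_t, which overflows past 20!, but the shape is the same with multiple-precision numbers):

```cpp
#include <cstdint>

// Product of the integers in [lo, hi], computed by halving the range.
// Recursion depth is O(log n) instead of the O(n) of n * f(n-1).
std::uint64_t range_product(std::uint64_t lo, std::uint64_t hi) {
    if (lo == hi) return lo;
    std::uint64_t mid = lo + (hi - lo) / 2;
    return range_product(lo, mid) * range_product(mid + 1, hi);
}

std::uint64_t factorial(std::uint64_t n) {
    return n < 2 ? 1 : range_product(1, n);
}
```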

Recursion is a powerful tool, but like any tool it should not be
used indiscriminately or naively. Which is basically the same
as what you were saying.

Tim Rentsch

unread,
Jun 20, 2019, 9:43:13 AM6/20/19
to
I believe this conclusion is correct. Observable behavior is
always about behavior specified in the abstract machine, not
about what might occur in the actual machine.

> Perhaps the one or two posters who read the C++ standard ten
> times before breakfast could help us.

My comment is based on a memory of an earlier reading in the C++
standard (and many earlier readings of the C standard). I
haven't gone back (at least not yet) and double checked this;
I'll try to post a followup if and when I do.

Paavo Helde

unread,
Jun 20, 2019, 10:30:26 AM6/20/19
to
On 20.06.2019 16:28, Tim Rentsch wrote:
> (<aside>A default of only 1MB for the stack is -- please excuse
> my emphatic comment -- inexcusable in this day and age, and was
> even 20 years ago.</aside>)

I remember I once tried to raise the stack size to 20 MB and my
colleagues were very upset when the programs started to run out of
virtual address space when trying to create about 100 threads.

To be honest, this was in the era of 32-bit programs (but still less
than 20 years back!). 64-bit address space is supposed to take away such
worries, but for example the current x86-64 actually supports only
48-bit virtual address space, which is only 65,000 times more - not so
infinite as it first feels.

David Brown

unread,
Jun 20, 2019, 11:34:21 AM6/20/19
to
That is a very good point.

> It is fairly
> straightforward to write a recursive factorial algorithm that
> divides the range into two halves at each step, giving a stack
> depth that is logarithmic in N rather than linear in N. (Doing
> this also gives a significant performance improvement in cases
> where multiple-precision arithmetic is involved.)

A fun challenge with this sort of thing is deciding how to split up the
calculations most efficiently. I did a quick test (using Python,
because it has large number support effortlessly) and saw a difference
in splitting strategies. Method 1 was:

Π(1, 2, 3, 4, 5, 6, 7, 8)
=> Π(1, 2, 3, 4) * Π(5, 6, 7, 8)
=> (Π(1, 2) * Π(3, 4)) * (Π(5, 6) * Π(7, 8))

Method 2 was:

Π(1, 2, 3, 4, 5, 6, 7, 8)
=> Π(1, 3, 5, 7) * Π(2, 4, 6, 8)
=> (Π(1, 5) * Π(3, 7)) * (Π(2, 6) * Π(4, 8))

This second split was 5% faster for 200000!. Pulling out powers of 2
here could probably speed things up some more and also allow memoizing.

Tim Rentsch

unread,
Jun 20, 2019, 11:58:02 AM6/20/19
to
The statement in the last sentence is too strong. Consider the
following scenario. With a particular code transformation applied
(for example, removing an unused local object, along with its
constructor/destructor calls), the observable behavior is X. If
the transformation were not applied, the observable behavior might
be X or might be Y (depending, say, on the timing of some OS
process scheduling, but not otherwise distinguishable). It cannot
be shown that there will be no change in the observable behavior,
because there might have been. Even so, it's okay to transform
the code (and remove the unused local), because the untransformed
behavior X could have happened serendipitously, and there is no
way for the program to tell otherwise.

Admittedly, in practice this distinction is unlikely to make a
difference. In most cases it's hard to show that a change _might_
not happen without also showing that a change _will_ not happen.
But the mathematician part of my brain feels obliged to point out
the distinction, even if in practice it might make a difference
only very rarely.

Tim Rentsch

unread,
Jun 22, 2019, 7:14:27 AM6/22/19
to
Point taken.

I still think 1MB of stack space is on the low side, by at least
a factor of 2, even for 32-bit environments. For 64-bit
environments, the default could be 1GB, and even 2500 threads
would use only 1% of the (usable) address space.

David Brown

unread,
Jun 22, 2019, 8:27:26 AM6/22/19
to
There is a cost in allocating virtual space to a stack, even if it is
not used (and therefore never mapped to physical memory). Since very
few programs ever need anything like 1 MB of stack space, it is not a
cost worth paying. It would make more sense to have a smaller stack
space by default (1 MB is probably much bigger than needed, especially
for threads other than the main thread) and automatically grow the stack
when needed. (I don't know if this is done at the moment - for the
small amount of PC programming I have done with C and C++, stack space
is not an issue.)

Tim Rentsch

unread,
Jun 22, 2019, 9:09:49 AM6/22/19
to
This has been an interesting thread. Out of curiosity I tried
some examples. Here are two, with some amusing results:

#include <iostream>
#include <string>

int
one( int x ){
    { std::string s; }
    return x ? one( x-1 ) : 7;
}

int
two( int x ){
    delete new int;
    return x ? two( x-1 ) : 5;
}

Compiling with clang (-O3), function two() compiles to just a
straight return of the value 5, and function one() compiles to
several instructions including a conditional branch and a
recursive call (bad! bad compiler!). So the 'delete new int;'
got optimized away, but the unused variable didn't.

Compiling with g++ (-O3), function one() compiles to just a
straight return of the value 7, and function two() compiles to a
loop with an allocation and deallocation each time around before
eventually always returning the value 5. So the unused variable
got optimized away, but the 'delete new int;' didn't (which didn't
prevent the tail call being optimized).

Ian Collins

unread,
Jun 22, 2019, 7:39:05 PM6/22/19
to
The default stack limit on Linux is 8MB, which can cause much amusement
on small systems when the abomination known as over-commit is disabled :)

--
Ian.

Nathaniel Tripp

unread,
Jun 22, 2019, 7:50:55 PM6/22/19
to
> How exactly should programs be designed, if one wanted to be extra
> paranoid about the possibility of running out of stack space? Or is
> it something that there's simply no recourse against?

Generally speaking this is handled by implementing a finite state
machine in your coding logic, e.g. for (;;) { switch (var) {} } and then
intermittently using things like setjmp/longjmp or similar (cringe).

Most all scripting languages have to cope with this, if you dig through
the source code for whichever you find easiest to read you'll find they
all mostly do the same thing. The short answer however is they mostly
avoid recursing where possible and emulate functionality via a FSM.

the *printf() family of functions is vaguely similar in form and function.
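A minimal sketch of the general pattern (illustrative only, not any particular interpreter's code): the "call frames" move onto an explicit heap-allocated stack driven by a loop, so depth is bounded by the heap rather than by the machine stack.

```cpp
#include <stack>
#include <cstdint>

// Recursion replaced by an explicit work stack: each 'frame' is just
// the pending multiplier. Exhaustion becomes a catchable allocation
// failure instead of an unrecoverable machine-stack overflow.
std::uint64_t factorial_iter(std::uint64_t n) {
    std::stack<std::uint64_t> pending;
    while (n > 1) {                 // "recursive descent": record frames
        pending.push(n);
        --n;
    }
    std::uint64_t r = 1;
    while (!pending.empty()) {      // "unwind": consume frames
        r *= pending.top();
        pending.pop();
    }
    return r;
}
```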

Chris M. Thomasson

unread,
Jun 23, 2019, 4:06:58 AM6/23/19
to
On 6/14/2019 5:44 AM, Juha Nieminen wrote:
> One thing I have always felt as being astonishingly under-defined
> both in C and C++ is the concept of the stack, how much of it a
> program has available, and especially, what happens if you run
> out of it.
>
> With dynamic memory allocation there's always a mechanism to deal
> with the case that it fails: Either a null pointer is returned, or
> an exception is thrown (depending on how exactly you are doing the
> allocation). In either case, you can deal with the problem.
>
> But what exactly happens if eg. your recursion level is too big and
> you run out of stack space?
>
> To be honest, I actually don't know what the standard says about this
> (and in fact, this is a question I'm presenting here.) I would imagine
> it's UB or implementation-defined behavior. And I'm guessing there's
> simply no way of knowing *when* you are going to run out of stack
> space, or to gracefully recover from it.
>
> How exactly should programs be designed, if one wanted to be extra
> paranoid about the possibility of running out of stack space? Or is
> it something that there's simply no recourse against?
>

Funny thing. One time I created two programs. A and B. A would run B on
the system during the "setup" phase, and see how far it could recurse
before B crashed. A would notice the failure, record the current number,
and try again with a slightly smaller number. If B runs to completion, A
recorded the recursion level as a so-called "critical max" to be
avoided. Of course I want to avoid recursion when possible. But, the
code had to use some third-party stuff that used it.

Bonita Montero

unread,
Jun 23, 2019, 9:36:57 AM6/23/19
to
> The default stack limit on Linux is 8MB, which can cause much amusement
> on small systems when the abomination known as over-commit is disabled :)

That's not an issue since the untouched pages are just subtracted from
the pagefile on creation of the stack. I.e. the address range is
logically allocated but not physically until the pages are written.


Alf P. Steinbach

unread,
Jun 23, 2019, 9:53:17 AM6/23/19
to
What's the difference between that (which I believe is called virtual
alloc) and over-commit?


Cheers!,

- Alf


Bonita Montero

unread,
Jun 23, 2019, 11:56:59 AM6/23/19
to
> What's the difference between that (which I believe is called virtual
> alloc) and over-commit?

With overcommit the pages aren't subtracted from the pagefile on
allocation.

Ian Collins

unread,
Jun 23, 2019, 3:48:48 PM6/23/19
to
If you are going to reply to my posts, don't snip the attributions.

--
Ian.

Tim Rentsch

unread,
Jun 23, 2019, 3:51:44 PM6/23/19
to
James Kuyper <james...@alumni.caltech.edu> writes:

> On 6/14/19 8:44 AM, Juha Nieminen wrote:
>
>> One thing I have always felt as being astonishingly under-defined
>> both in C and C++ is the concept of the stack, how much of it a
>> program has available, and especially, what happens if you run out
>> of it.
>>
>> With dynamic memory allocation there's always a mechanism to deal
>> with the case that it fails: Either a null pointer is returned, or
>> an exception is thrown (depending on how exactly you are doing the
>> allocation). In either case, you can deal with the problem.
>>
>> But what exactly happens if eg. your recursion level is too big and
>> you run out of stack space?
>>
>> To be honest, I actually don't know what the standard says about
>> this
>
> [...] The closest it comes to talking about such matters is to
> define the term "implementation limits": "restrictions imposed
> upon programs by the implementation". [...]

What happens when program execution runs out of stack space does
not involve implementation limits, because the software that
determines how much stack space is available is not part of the
implementation. Rather it is part of "a data-processing system"
that supports the implementation, and about which the standard is
specifically and explicitly mute (per the C standard, incorporated
normatively by the C++ standard in section 1 paragraph 2).

> It's all very vague and open to dispute. [...]

It is open to dispute. How vague it is is also open to dispute.

Tim Rentsch

unread,
Jun 23, 2019, 4:03:43 PM6/23/19
to
r...@zedat.fu-berlin.de (Stefan Ram) writes:

> James Kuyper <james...@alumni.caltech.edu> writes:
>
>> Nothing. Almost all implementations of C++ use a stack structure to
>> store local variables, and many of them do so using a hardware stack,
>> but the standard says almost nothing about such issues.
>
> There is,
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1637.pdf
>
> which should discuss this under
>
> VI. STRICTLY CONFORMING C PROGRAMS DO NOT EXIST

Their analysis is flawed. The program

int
main( void ){
    char a[65000];
    return 0;
}

has defined behavior, regardless of what happens when the
program is run.

> . The situation for C++ should be similar.

An argument could be made that "undefined behavior" means
something different in C++ than it does in C, so the same
inferences might not apply. However, AFAIAA everyone familiar
with the two languages either expects or assumes that the term
means the same thing in both languages, which implies that the
analysis is flawed for C++ also.

Chris M. Thomasson

unread,
Jun 23, 2019, 5:07:57 PM6/23/19
to
She always seemed to have a little problem with this. I tried to get her
to change way back in the following thread:

https://groups.google.com/d/msg/comp.lang.c++/kAz1VAxD2lI/hm6BMuEJAQAJ

;^)

Ian Collins

unread,
Jun 24, 2019, 12:05:53 AM6/24/19
to
If everyone ignored rude posters, they might mend their ways. There's
no reason for bad behaviour other than pigheadedness.

--
Ian.

Bonita Montero

unread,
Jun 24, 2019, 2:03:19 AM6/24/19
to
>>> The default stack limit on Linux is 8MB, which can cause much amusement
>>> on small systems when the abomination known as over-commit is
>>> disabled :)

>> That's not an issue since the untouched pages are just subtracted from
>> the pagefile on creation of the stack. I.e. the address range is
>> logically allocated but not physically until the pages are written.

> If you are going to reply to my posts, don't snip the attributions.

If you are going to reply to my posts, strip the attributions.
Otherwise I consider you as rude!!!!!!!!!!!!!1111

David Brown

unread,
Jun 24, 2019, 4:36:30 AM6/24/19
to
On 24/06/2019 08:03, Bonita Montero wrote:

> If you are going to reply to my posts, strip the attributions.
> Otherwise I consider you as rude!!!!!!!!!!!!!1111

I'm sorry, but if /you/ are wanting to join in a Usenet group, then
/you/ should follow the conventions of Usenet that were established some
forty years ago and which are followed by the huge majority of people
here. You don't get to change the rules.

However, I am sure plenty of people will comply with your request by
simply not replying to you, and we expect you to do the same by not
replying to others without proper attributions.

I will certainly welcome you back to the group when you have dropped
your unusual selfish and utterly pointless attitude.

Bonita Montero

unread,
Jun 24, 2019, 5:21:17 AM6/24/19
to
> However, I am sure plenty of people will comply with your request by
> simply not replying to you, ...

LOL; it won't make a difference because people think it's more important
to tell something than to sanction this gewgaw.


Scott Lurndal

unread,
Jun 24, 2019, 11:12:46 AM6/24/19
to
Hence it may not be available, causing the process to receive a fatal
(if uncaught) signal.

Bonita Montero

unread,
Jun 24, 2019, 3:25:52 PM6/24/19
to
> Nothing. Almost all implementations of C++ use a stack structure to
> store local variables, and many of them do so using a hardware stack,
> but the standard says almost nothing about such issues.

C and C++ allow recursion.
And recursion isn't possible without a stack.

Keith Thompson

unread,
Jun 24, 2019, 4:09:37 PM6/24/19
to
*plonk*

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Queequeg

unread,
Jun 24, 2019, 5:30:15 PM6/24/19
to
Bonita Montero <Bonita....@gmail.com> wrote:

> If you are going to reply to my posts, strip the attributions.
> Otherwise I consider you as rude!!!!!!!!!!!!!1111

Why is that, Bonita?

--
https://www.youtube.com/watch?v=9lSzL1DqQn0

Real Troll

unread,
Jun 24, 2019, 9:37:39 PM6/24/19
to
On 24/06/2019 22:30, Queequeg wrote:
> Bonita Montero <Bonita....@gmail.com> wrote:
>
>> If you are going to reply to my posts, strip the attributions.
>> Otherwise I consider you as rude!!!!!!!!!!!!!1111
> Why is that, Bonita?
>

Because she doesn't want the evidence of her being a nincompoop to
remain available to readers, even after she is no longer alive. The bad
thing is they can access it quickly and accurately because of that thing
called Google Search Engine.

Does this answer your question?



Bonita Montero

unread,
Jun 25, 2019, 2:08:42 AM6/25/19
to
>>>>> The default stack limit on Linux is 8MB, which can cause much amusement
>>>>> on small systems when the abomination known as over-commit is
>>>>> disabled :)

>>>> That's not an issue since the untouched pages are just subtracted from
>>>> the pagefile on creation of the stack. I.e. the address range is
>>>> logically allocated but not physically until the pages are written.

>>> If you are going to reply to my posts, don't snip the attributions.

>> If you are going to reply to my posts, strip the attributions.
>> Otherwise I consider you as rude!!!!!!!!!!!!!1111

> *plonk*

LOL.

Bo Persson

unread,
Jun 25, 2019, 4:44:08 AM6/25/19
to
But it doesn't have to be a *hardware* stack.

C and C++ are used on systems with no dedicated stack pointer and where
a subroutine call doesn't store the return address in memory.

IBM mainframes is one important example.


Bonita Montero

unread,
Jun 25, 2019, 7:19:46 AM6/25/19
to
>>> Nothing. Almost all implementations of C++ use a stack structure to
>>> store local variables, and many of them do so using a hardware stack,
>>> but the standard says almost nothing about such issues.

>> C and C++ allow recursions.
>> And recursions aren't possible without a stack.

> But it doesn't have to be a *hardware* stack.

There's no such thing as a "hardware stack"; it's just memory, plus
special instructions which make pushing onto and popping from the
stack easier.

> C and C++ are used on systems with no dedicated stack pointer and
> where a subroutine call doesn't store the return address in memory.

At least at the next call level, the instruction pointer stored in
a register in the meantime will be pushed onto something like a stack.
So there's really no way around a stack for full recursion
capabilities.

James Kuyper

unread,
Jun 25, 2019, 7:56:30 AM6/25/19
to
On Tuesday, June 25, 2019 at 7:19:46 AM UTC-4, Bonita Montero wrote:
> >>> Nothing. Almost all implementations of C++ use a stack structure to
> >>> store local variables, and many of them do so using a hardware stack,
> >>> but the standard says almost nothing about such issues.
>
> >> C and C++ allow recursions.
> >> And recursions aren't possible without a stack.
>
> > But it doesn't have to be a *hardware* stack.
>
> There's nothing like a "hardware stack", it's just memory, but only
> special instructions which alleviate you pushing and popping onto
> the stack.

That's what "hardware stack" means: memory that can be accessed by
special instructions for pushing and popping onto the stack. And it
isn't necessary for recursion.

> > C and C++ are used on systems with no dedicated stack pointer and
> > where a subroutine call doesn't store the return address in memory.
>
> At least on the next call level the instruction-pointer stored in
> a register in the meantime will be pushed on something like a stack.
> So there's really no way around a stack with full recursion-capa-
> bilities.

On some real-life machines, each function call uses a different
dynamically allocated block of memory called an activation record. Those
records are, as is typical of dynamically allocated memory, in no
guaranteed order, and certainly not required to be adjacent to each
other. Certainly the return address for each function call is stored
somewhere, but unlike a hardware stack, there's no particular guaranteed
relationship between the location where the return address of a called
function is stored, and where the return address for its calling
function is stored.

About the only thing these activation records share with a hardware
stack is Last In First Out (LIFO) semantics, as is implied by the
requirements of the C standard: since the lifetime of objects local to
a particular instance of a called function is a subset of the lifetime
of objects local to the calling function, the memory for the called
function must (modulo the as-if rule) be allocated after, and
deallocated before, the memory for the calling function. A hardware
stack is one way to implement LIFO semantics, but it's not the only
way. Those LIFO semantics mean that activation records could be
described as a logical stack, but they certainly do not qualify as
forming a hardware stack.

Bart

unread,
Jun 25, 2019, 8:00:14 AM6/25/19
to
On 25/06/2019 12:19, Bonita Montero wrote:
>>>> Nothing. Almost all implementations of C++ use a stack structure to
>>>> store local variables, and many of them do so using a hardware stack,
>>>> but the standard says almost nothing about such issues.
>
>>> C and C++ allow recursions.
>>> And recursions aren't possible without a stack.
>
>> But it doesn't have to be a *hardware* stack.
>
> There's nothing like a "hardware stack", it's just memory, but only
> special instructions which alleviate you pushing and popping onto
> the stack.

But a hardware stack (even if you define it as one using hardware
support) is very commonly used. It has to be, to avoid inefficiency.

And hardware stacks (like the ones on x86 and ARM) have certain
characteristics, one of which is not dealing gracefully with overflows.

>

Bonita Montero

unread,
Jun 25, 2019, 8:30:32 AM6/25/19
to
> On some real life machines, each function call uses a different
> dynamically allocated block of memory called activation records.

Whether something is a stack doesn't depend on its physical layout.

> Those records are, as is typical of dynamically allocated memory, in no
> guaranteed order, and certainly not required to be adjacent to each
> other. Certainly the return address for each function call is stored
> somewhere, but unlike a hardware stack, there's no particular guaranteed
> relationship between the location where the return address of a called
> function is stored, and where the return address for its calling
> function is stored.

You can also call this a stack.

> About the only thing these activation records share with a hardware
> stack is Last In First Out (LIFO) semantics, as is implied by the
> requirements of the C standard: since the lifetime of objects local to
> a particular instance of a called function is a subset of the lifetime
> of objects local to the calling function, the memory for the called
> function must (modulo the as-if rule) be allocated after, and
> deallocated before, the memory for the calling function. A hardware
> stack is one way to implement LIFO semantics, but it's not the only
> way. Those LIFO semantics mean that activation records could be
> described as a logical stack, but they certainly do not qualify as
> forming a hardware stack.

You were the one to define a hardware stack as something that could
only be LIFO contiguous memory. And I wasn't discussing hardware stacks
but only stacks in general. And without some kind of stack, a C/C++
language implementation is impossible.

Bonita Montero

unread,
Jun 25, 2019, 8:34:14 AM6/25/19
to
>> There's nothing like a "hardware stack"; it's just memory, with only
>> special instructions that ease pushing onto and popping from
>> the stack.

> But a hardware stack (even if you define it as one using hardware
> support) is very commonly used. It has to be, to avoid inefficiency.

The notion of a "hardware stack" is just a phantasm, as any architecture
could support stacks made of physically contiguous memory. The only
thing that makes some architectures different is the dedicated support
via call-/ret- and push-/pop-instructions.

> And hardware stacks (like the ones on x86 and ARM) have certain
> characteristics, one of which is not dealing gracefully with overflows.

That's not part of the stack support but of the virtual memory support
(non-readable/-writable pages).
