alloca()-support


Bonita Montero

Sep 30, 2019, 8:02:16 AM
I know alloca() is an official part of neither the C nor the C++
programming language. But I use it for performance reasons. Can anyone
name compilers that don't support alloca()?

Frederick Gotham

Sep 30, 2019, 10:02:49 AM
I'm curious, can you give an example of where you'd use this?

Bonita Montero

Sep 30, 2019, 10:09:32 AM
>> I know alloca() neither is an official part of the C nor C++ program-
>> ming-language. But I use it for performance-reasons. Can anyone name
>> compilers that don't support alloca()?

> I'm curious, can you give an example of where you'd use this?

At a point where I knew I needed only a small, variable-sized array
of pointers.

Anton Shepelev

Sep 30, 2019, 10:12:52 AM
Bonita Montero:

>I know alloca() neither is an official part of the C nor
>C++ programming-language. But I use it for performance-
>reasons. Can anyone name compilers that don't support
>alloca()?

At least, it is absent from TCC, but the specific compilers
are of no importance. The problem is that code with
`alloca()' is no longer starndard C and depends on specific
compilers, which harms the freedom of the users of your code
to compile it with *any* C compiler. Are the benefits of
`alloca()' worth it? Is not there a reasonably simple
solution within standard C, such as an array, a variable-
length array, or a pre-allocated memory block on the heap?

--
() ascii ribbon campaign - against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Bonita Montero

Sep 30, 2019, 10:16:22 AM
> At least, it is absent from TCC, but the specific compilers
> are of no importance. The problem is that code with
> `alloca()' is no longer starndard C and depends on specific
> compilers, which harms the freedom of the users of your code
> to compile it with *any* C compiler.

Depending on the set of compilers that support alloca(), this doesn't
count. And usually you have dependencies on functions beyond those
of standard C/C++ anyway.

> Are the benefits of `alloca(I)' worth it?

Yes, alloca() is fast.

> Is not there a reasonably simple solution within standard C,
> such as an array, a variable-length array, or a pre-allocated
> memory block on the heap?

VLAs aren't part of C++, and heap allocations have an order of
magnitude higher cost.


Scott Lurndal

Sep 30, 2019, 10:30:30 AM
Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
>Bonita Montero:
>
>>I know alloca() neither is an official part of the C nor
>>C++ programming-language. But I use it for performance-
>>reasons. Can anyone name compilers that don't support
>>alloca()?
>
> [snip]
> The problem is that code with
>`alloca()' is no longer starndard C and depends on specific
>compilers, which harms the freedom of the users of your code
>to compile it with *any* C compiler.

Good grief. 90% of all C code isn't open source and the programmer
is free to use whatever features necessary for the program to
function as required.

And, FWIW, alloca() has never been "starndard" C.

Anton Shepelev

Sep 30, 2019, 10:30:44 AM
Bonita Montero:

>Depending on the set of compilers that support `alloca()'
>this doesn't count.

I disagree. It is a difference between standard code and
compiler-dependent code. You are forcing users to use
specific compilers, regardless of how large a subset it is.

>And usually you have dependencies on functions becond those
>of standard-C/C++ anyway.

Those dependencies are available as C code or libraries,
whereas `alloca()' is not and cannot be.

>>Are the benefits of `alloca(I)' worth it?
>
>Yes, alloca() is fast.

I didn't say it wasn't, but suggested that you consider a
fast solution in standard C. By the way, have you made
certain that dynamic memory allocation would be the
bottleneck of your algorithm?

>>Is not there a reasonably simple solution within standard
>>C, such as an array, a variable-length array, or a pre-
>>allocated memory block on the heap?
>
>VLAs aren't part of C++ ->

You have written to comp.lang.c too.

>-> and heap-allocations haver a magnitude higher cost.

I proposed reusing the same heap block across many invocations
of your functions. Sometimes it is possible.
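
A minimal sketch of that idea in standard C, for illustration only. The names `struct scratch`, `scratch_get` and `scratch_free` are hypothetical and not taken from anyone's code in this thread. The block is grown only when a caller asks for more than the current capacity, so repeated calls with small sizes cost no further allocations:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: a scratch buffer that is allocated once and reused. */
struct scratch {
    void  *data;      /* heap block, NULL until first use */
    size_t capacity;  /* number of bytes currently allocated */
};

/* Return a block of at least `bytes` bytes. The buffer grows only when the
   current capacity is too small. Returns NULL if growing fails; the old
   block stays valid in that case. */
static void *scratch_get(struct scratch *s, size_t bytes)
{
    if (bytes > s->capacity) {
        void *p = realloc(s->data, bytes);
        if (p == NULL)
            return NULL;
        s->data = p;
        s->capacity = bytes;
    }
    return s->data;
}

/* Release the block when its owner is done with it. */
static void scratch_free(struct scratch *s)
{
    free(s->data);
    s->data = NULL;
    s->capacity = 0;
}
```

The obvious trade-off is that the buffer must live somewhere (a context object, a per-thread variable), and a single shared instance is neither reentrant nor thread-safe.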

Anton Shepelev

Sep 30, 2019, 10:34:01 AM
Scott Lurndal to Anton Shepelev:

>>The problem is that code with `alloca()' is no longer
>>starndard C and depends on specific compilers, which harms
>>the freedom of the users of your code to compile it with
>>*any* C compiler.
>
>Good grief. 90% of all C code isn't open source and the
>programmer is free to use whatever features necessary for
>the program to function as required.

Reliance on compiler features puts the same fetters on free
and commercial code.

>And, FWIW, alloca() has never been "starndard" C.

You misunderstood my usage of "no longer". I meant that
once you start using alloca() your program is no longer
standard C.

Bonita Montero

Sep 30, 2019, 10:45:08 AM
>> Depending on the set of compilers that support `alloca()'
>> this doesn't count.

> I disagree. It is a difference between standard code and
> compiler-dependent code. You are forcing users to use
> specific compilers, regardless of how large a subset it is.

Most programs use platform-specific means; that's rarely an issue.

>> And usually you have dependencies on functions becond
>> those of standard-C/C++ anyway.

> Those dependencies are available as C code or libraries,
> whereas `alloca()' is not and cannot be.

I only wanted to give an example showing that you often depend on
more than just C or C++. And alloca() is only one of these examples.

> I didn't say it wasn't, but suggested that you consider a
> fast solution in standard C.

I need it in C++. But the general issue applies to both C
and C++. So I posted it in both newsgroups.

>> VLAs aren't part of C++ ->

> You have written to comp.lang.c too.

I don't want to discuss VLAs but alloca().

David Brown

Sep 30, 2019, 10:56:41 AM
On 30/09/2019 16:33, Anton Shepelev wrote:
> Scott Lurndal to Anton Shepelev:
>
>>> The problem is that code with `alloca()' is no longer
>>> starndard C and depends on specific compilers, which harms
>>> the freedom of the users of your code to compile it with
>>> *any* C compiler.
>>
>> Good grief. 90% of all C code isn't open source and the
>> programmer is free to use whatever features necessary for
>> the program to function as required.
>
> Reliance on compiler features puts the same fetters on free
> and commerical code.

The great majority of C and C++ programs are dependent on features that
are limited to specific target processors, compilers, and/or OS's. Some
programs are written in a way to minimise these dependencies, or at
least to isolate them in a small number of files. For many other
programs, it really doesn't matter - the code will only be used with one
compiler or on one target.

And some features are non-standard but found on many compilers -
alloca() is one of these. If you are using a serious compiler, it will
support alloca(). Your alloca() code might not work on a tiny
microcontroller, or on a toy compiler like TCC, but it is highly likely
that a lot more of the code will have trouble there too.

>
>> And, FWIW, alloca() has never been "starndard" C.
>
> You misunderstood my usage of "no longer". I meant that
> once you start using alloca() your program is no longer
> standard C.
>

It is, I believe, extremely unlikely that use of alloca() will be the
deciding factor for which compilers can be used with the code.

Melzzzzz

Sep 30, 2019, 11:29:17 AM
On 2019-09-30, Anton Shepelev <anton.txt@g{oogle}mail.com> wrote:
> Bonita Montero:
>
>>I know alloca() neither is an official part of the C nor
>>C++ programming-language. But I use it for performance-
>>reasons. Can anyone name compilers that don't support
>>alloca()?
>
> At least, it is absent from TCC, but the specific compilers
> are of no importance. The problem is that code with
> `alloca()' is no longer starndard C and depends on specific
> compilers, which harms the freedom of the users of your code
> to compile it with *any* C compiler. Are the benefits of
> `alloca(I)' worth it? Is not there a reasonably simple
> solution within standard C, such as an array, a variable-
> length array, or a pre-allocated memory block on the heap?

VLAs eliminate the need for alloca...


--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
In the Wild West there wasn't really that much violence, precisely
because everyone was armed. -- Mladen Gogala

Bo Persson

Sep 30, 2019, 11:29:45 AM
If it is a *very* small number, you might of course always allocate the
upper limit (like 10), and just use the 5-7 needed by a specific invocation.

If the upper limit is significantly larger, the stack usage might make
alloca() less than 100% portable anyway.
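
As a rough sketch of the fixed-upper-limit approach in standard C: the limit of 10 comes from the example figure above, while `MAX_PTRS` and `sum_nonnegative` are made-up names for illustration. The array is always sized for the worst case and the function refuses inputs that would not fit:

```c
#include <stddef.h>

enum { MAX_PTRS = 10 };  /* assumed upper limit, "like 10" */

/* Hypothetical example: collect pointers to the non-negative items in a
   fixed-size local array, then sum what they point to. Returns -1 if
   `count` exceeds the fixed limit (a valid sum is never negative). */
static long sum_nonnegative(const int *items, size_t count)
{
    const int *selected[MAX_PTRS];  /* always sized for the worst case */
    size_t used = 0;
    long sum = 0;

    if (count > MAX_PTRS)
        return -1;

    for (size_t i = 0; i < count; ++i)
        if (items[i] >= 0)
            selected[used++] = &items[i];

    for (size_t i = 0; i < used; ++i)
        sum += *selected[i];
    return sum;
}
```

Only the first `used` slots are ever read, so the unused tail of the array costs stack space but no initialisation time.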



Bo Persson

Bonita Montero

Sep 30, 2019, 11:32:02 AM
On 30.09.2019 at 17:29, Melzzzzz wrote:
>>> I know alloca() neither is an official part of the C nor
>>> C++ programming-language. But I use it for performance-
>>> reasons. Can anyone name compilers that don't support
>>> alloca()?

>> At least, it is absent from TCC, but the specific compilers
>> are of no importance. The problem is that code with
>> `alloca()' is no longer starndard C and depends on specific
>> compilers, which harms the freedom of the users of your code
>> to compile it with *any* C compiler. Are the benefits of
>> `alloca(I)' worth it? Is not there a reasonably simple
>> solution within standard C, such as an array, a variable-
>> length array, or a pre-allocated memory block on the heap?

> VLA elliminates need for alloca...

C++ doesn't have VLAs.
I only want to discuss this issue in both groups because this is a
compiler-specific problem. A compiler that supports C++ usually also
compiles C code. And if it supports alloca(), it supports it in both
languages.

Bonita Montero

Sep 30, 2019, 11:34:02 AM
> If it is a *very* small number, you might of course always allocate
> the upper limit (like 10), and just use the 5-7 needed by a specific
> invocation.

No, the limit is higher, at most about 100 pointers.

> If the upper limit is significantly larger, the stack usage might
> make alloca() less than 100% portable anyway.

The code I write is not for Arduinos.

Bonita Montero

Sep 30, 2019, 12:05:10 PM
I just came across another issue:
With my primary compiler there's an internal function called __chkstk
that is called when I use alloca(). This function is usually called when
there is more than a page of variables in the stack frame. That's
because Windows recognizes the need for more stack space only when the
next unmapped page down the stack is touched. So why wasn't MS clever
enough to design Windows so that it would also recognize noncontiguous
accesses to the stack pages? Are these pages handled as if they were
overcommitted?
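
To illustrate the page-by-page touching described here, a sketch in portable C that walks an ordinary buffer from its highest address downwards. This is not the real __chkstk, which works on the stack pointer itself. The name `probe_down` and the 4096-byte step are assumptions made for the example:

```c
#include <stddef.h>

enum { PROBE_STEP = 4096 };  /* assumed page size */

/* Read one byte at the top of [base, base + size) and then one byte every
   PROBE_STEP bytes further down, finishing at offset 0. On a
   downward-growing stack this order would hit each new page in turn.
   Returns the number of bytes touched. */
static size_t probe_down(volatile const char *base, size_t size)
{
    size_t touched = 0;
    size_t offset;

    if (size == 0)
        return 0;

    offset = size - 1;  /* start at the highest byte */
    for (;;) {
        (void)base[offset];  /* volatile read, so it is not optimised away */
        ++touched;
        if (offset == 0)
            break;
        offset = (offset > PROBE_STEP) ? offset - PROBE_STEP : 0;
    }
    return touched;
}
```

Because consecutive probes are never more than one page apart, no page inside the range is skipped, which is the property the guard-page mechanism relies on.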

Kenny McCormack

Sep 30, 2019, 12:26:20 PM
In article <e2720f0b-ca77-4e65...@googlegroups.com>,
I think "man alloca" covers this well enough.

--
The randomly chosen signature file that would have appeared here is more than 4
lines long. As such, it violates one or more Usenet RFCs. In order to remain
in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/Infallibility

Soviet_Mario

Sep 30, 2019, 12:48:36 PM
Sorry, mine is not an answer but a related question.

Since when (if ever) have variable-size automatic
arrays been supported?

I mean a temporary array local to a function whose size
becomes known only at run time (it seems to me a generalization
of variadic functions, which also do not know at compile
time how much stack space they will consume).

If implemented, it seems equivalent to alloca(). Is there
then some limitation on the ORDER in which automatic
variables are created?

I mean, in my simple mind I'd guess such a variable-size
array should follow all the "sure" (known-size) variables, and
that there should be at most one dynamically sized variable (or at
least, to avoid strange and heavy overhead in access, not just _ESP, _EBP).

A single variable-size array on the stack with no further variables
does not seem that difficult to implement, or am I wrong?

--
1) Resist, resist, resist.
2) If everyone pays taxes, taxes are paid by everyone
Soviet_Mario - (aka Gatto_Vizzato)

Bonita Montero

Sep 30, 2019, 12:51:52 PM
> Since when (if at any time) was variable size automatic array supported ?

Since C99.

Soviet_Mario

Sep 30, 2019, 12:56:00 PM
Out of curiosity, does this feature have one or more of the
limitations I feared, or none of them?

Bonita Montero

Sep 30, 2019, 12:58:14 PM
>>> Since when (if at any time) was variable size automatic array
>>> supported ?

>> Since C99.

> for curiosity, this feature has one or more of the limitations I feared
> ... or even none of them ?

Yes, you could run out of stack-space, maybe hitting a guard-page
in the best case so that you wouldn't corrupt any other data.

Philipp Klaus Krause

Sep 30, 2019, 1:14:22 PM
On 30.09.19 at 18:51, Bonita Montero wrote:
>> Since when (if at any time) was variable size automatic array supported ?
>
> Since C99.
>
And C99 is the only version where support for them is mandatory.

As of C11, they are optional.

Bonita Montero

Sep 30, 2019, 1:16:28 PM
>> Since C99.

> And C99 is the only version where support for them is mandatory.

> As of C11, they are optional.

And with C++ you can ask the compiler via __STDC_NO_VLA__ if VLAs
are not supported.

Joe Pfeiffer

Sep 30, 2019, 1:20:40 PM
Soviet_Mario <Sovie...@CCCP.MIR> writes:

> On 30/09/2019 14:02, Bonita Montero wrote:
>> I know alloca() neither is an official part of the C nor C++
>> program-
>> ming-language. But I use it for performance-reasons. Can anyone name
>> compilers that don't support alloca()?
>
>
> sorry, mine is not an answer but a related question
>
> Since when (if at any time) was variable size automatic array
> supported ?
>
> I mean a temporary array local to a function whose size becomes known
> only at runtime (seems to me a generalization of variadic functions,
> which, too, does not know at compile time how much stack space will
> consume).
>
> If implemented, It seems equivalent to alloca () ... is it there some
> limitation then in ORDER of automatic variable creations ?

There is no such limitation on your C program. The VLA can be first,
last, or in the middle. And you can have more than one of them.

Where in the activation record a compiler chooses to put it is, AFAIK,
not specified.

Bonita Montero

Sep 30, 2019, 1:42:14 PM
>> If implemented, It seems equivalent to alloca () ... is it there some
>> limitation then in ORDER of automatic variable creations ?

> There is no such limitation on your C program. The VLA can be first,
> last, or in the middle. And you can have more than one of them.

alloca() can be called at almost any point in a function (with some
compilers, not within the arguments of a function call).

Scott Lurndal

Sep 30, 2019, 1:52:29 PM
Bonita Montero <Bonita....@gmail.com> writes:
>> Since when (if at any time) was variable size automatic array supported ?
>
>Since C99.
>

Albeit available pre-C99 standardization in several compilers.

Soviet_Mario

Sep 30, 2019, 1:53:30 PM
Hmm, I didn't mean THAT kind of risk (which is unavoidable), but the
possible difficulties in defining variables beyond the point
of declaration of the variable-size array.

I can't figure out how the compiler would reference them in
the standard efficient ways (using registers).

Anyway, would the difference from alloca() be safer
handling of possible collisions and of running out of stack
space (compared with plain declarations)?

Soviet_Mario

Sep 30, 2019, 1:54:59 PM
Ah! So you are suggesting a sort of internal reordering on
the stack (e.g. all known-size variables first, then the variable-sized ones)?

Soviet_Mario

Sep 30, 2019, 2:01:21 PM
On 30/09/2019 19:20, Joe Pfeiffer wrote:
I have Qt, which does not support them, but would somebody be
so kind as to post the generated asm for some minimal code like this:

int Funz (int A, int B)
{
    int C = 3;
    int Buf [A];
    int D = 2;
    Buf [A-1] = B;
    return (A + B * D - C);
}

int main (void)
{
    return Funz (4, 3);
}

I'd like to examine the generated assembler in order to
figure out how Buf (and D, apparently beyond it) are addressed.

Also, if it's not too annoying, please turn OFF most if not all
optimizations (maybe the example is so minimal that it would
otherwise be resolved at compile time :\)

Thanks in advance.

Soviet_Mario

Sep 30, 2019, 2:02:16 PM
It's not clear to me what the differences are between alloca() usage
and a plain declaration, then...

Keith Thompson

Sep 30, 2019, 2:13:16 PM
Soviet_Mario <Sovie...@CCCP.MIR> writes:
> On 30/09/2019 18:58, Bonita Montero wrote:
>>>>> Since when (if at any time) was variable size automatic
>>>>> array supported ?
>>
>>>> Since C99.
>>
>>> for curiosity, this feature has one or more of the
>>> limitations I feared ... or even none of them ?
>>
>> Yes, you could run out of stack-space, maybe hitting a
>> guard-page
>> in the best case so that you wouldn't corrupt any other data.
>
> uhm, I didn't mean THAT kind of risk (unavoidable), but the
> possible difficulties in defining variables beyond the point
> of declaration of the variable-size array.
>
> I can't figure out how the compiler would reference them in
> standard efficient ways (use of registers).
>
> Anyway, the difference with alloca () would be in a safer
> management of possible collisions and running out of stack
> space ? (compared with the plain declarations) ?

No. Neither VLAs nor alloca() provide any protection against running
out of space. Both have undefined behavior if there isn't enough memory
for the allocation.

alloca() (which, again, is non-standard) is a function that returns a
pointer. It has no provision to return a null pointer on failure.

Defining a VLA, on the other hand, creates an array object to which you
can meaningfully apply sizeof.

Also, space allocated by alloca() is deallocated on return from the
enclosing function, not at the end of the enclosing block.

Since alloca() is a function, it can be invoked in contexts where a VLA
can't be created. If you call alloca() as part of a function argument,
it's likely to corrupt the stack.

VLAs were introduced in C99 and made optional in C11 and are not
supported, except perhaps as an extension, in C++.
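
A small sketch of the sizeof difference described above, in C. It assumes a compiler with VLA support and a platform that provides alloca() through <alloca.h>, as glibc does. Both are assumptions about the target, and the function names are made up for the example:

```c
#include <alloca.h>  /* non-standard header; assumed to exist on the target */
#include <stddef.h>

/* sizeof applied to a VLA gives the size of the whole array, computed at
   run time. n must be greater than 0. */
static size_t vla_bytes(size_t n)
{
    int vla[n];
    return sizeof vla;  /* n * sizeof(int) */
}

/* alloca() only hands back a pointer, so sizeof sees the pointer type and
   the requested size has to be tracked separately. */
static size_t alloca_bytes(size_t n)
{
    int *p = alloca(n * sizeof *p);
    return sizeof p;  /* sizeof(int *), whatever n was */
}
```

Neither function can detect that the request was too large for the stack, which is the undefined-behavior case mentioned above.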

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson

Sep 30, 2019, 2:14:44 PM
Anton Shepelev <anton.txt@g{oogle}mail.com> writes:
[...]
> At least, it is absent from TCC, but the specific compilers
> are of no importance.

I just used alloca() with tcc on my system. It generates a call to the
implementation in the GNU C library. (With gcc, calls to alloca() are
inlined.)

William Ahern

Sep 30, 2019, 2:15:15 PM
The compiler does this to ensure the alloca'd region doesn't extend past the
stack guard page. The kernel can handle noncontiguous stack access, but what
happens if your block extends *past* the stack into the dynamic heap region?

See https://blog.qualys.com/securitylabs/2017/06/19/the-stack-clash
and https://lwn.net/Articles/725832/ for more info

Paavo Helde

Sep 30, 2019, 3:26:20 PM
On 30.09.2019 19:04, Bonita Montero wrote:
> I just came across another issue:
> With my primary compiler theres's an internal function called __chkstk
> called when I do alloca. This function is usually called when there are
> more than a page full of variables in the stack-frame. That's because
> Windows recognizes the need for more stack space only through touching
> the next unmapepd page down the stack. So why hasn't MS so be so clever
> as to design Windows that it would also recoginize noncontignous acces-
> ses to the stack-pages?

I guess this would mean setting up e.g. 64 guard pages instead of one or
two, and most of those pages would remain untouched and unused
(many-many service threads use extremely little stack space).

Also, this would not avoid the need for checks at alloca(), it would
still need to check that the allocation fits in the total free space of
the current stack.

> Are these pages handled are like being over-comitted?

As a little experiment shows, in Windows the virtual address space for
the stack is reserved, but not committed. Strictly speaking, you cannot
over-commit something which is not committed. But I can see how this
might feel similar.

Scott Lurndal

Sep 30, 2019, 3:39:07 PM
Soviet_Mario <Sovie...@CCCP.MIR> writes:
>On 30/09/2019 18:58, Bonita Montero wrote:
>>>>> Since when (if at any time) was variable size automatic
>>>>> array supported ?
>>
>>>> Since C99.
>>
>>> for curiosity, this feature has one or more of the
>>> limitations I feared ... or even none of them ?
>>
>> Yes, you could run out of stack-space, maybe hitting a
>> guard-page
>> in the best case so that you wouldn't corrupt any other data.
>
>uhm, I didn't mean THAT kind of risk (unavoidable), but the
>possible difficulties in defining variables beyond the point
>of declaration of the variable-size array.
>
>I can't figure out how the compiler would reference them in
>standard efficient ways (use of registers).


mov %rsp, %rax # rax now has starting address of alloca() region
sub %rcx, %rsp # rcx is the number of bytes being allocated.

....

mov %rsp, %rbx # rbx now has starting address of second alloca() region
sub %rdx, %rsp # rdx is the number of bytes being allocated
....
ret # all alloca storage is deallocated

(AT&T form)

It can always spill rax or rbx to the stack under register pressure.

Scott Lurndal

Sep 30, 2019, 3:46:45 PM
Paavo Helde <myfir...@osa.pri.ee> writes:
>On 30.09.2019 19:04, Bonita Montero wrote:
>> I just came across another issue:
>> With my primary compiler theres's an internal function called __chkstk
>> called when I do alloca. This function is usually called when there are
>> more than a page full of variables in the stack-frame. That's because
>> Windows recognizes the need for more stack space only through touching
>> the next unmapepd page down the stack. So why hasn't MS so be so clever
>> as to design Windows that it would also recoginize noncontignous acces-
>> ses to the stack-pages?
>
>I guess this would mean setting up e.g. 64 guard pages instead of one or

A "guard page" is just a PTE with the valid bit reset. In most cases,
that PTE is there regardless of whether that page is actually being
used (or has been allocated).

The OS can look at the faulting address vis-a-vis the stack region allocated at
process creation and heuristically determine if the stack should be
extended to include that address.

>two, and most of those pages would remain untouched and unused
>(many-many service threads use extremely little stack space).

Virtual memory is a rare commodity on 32-bit split virtual address
space systems - it's the virtual address space that is consumed by
guard pages; and often the stack grows down towards the heap, so allocating
a sufficiently large VLA or alloca() block may corrupt the heap if the OS
can't distinguish between the accesses (which is likely).



Joe Pfeiffer

Sep 30, 2019, 4:29:03 PM
Soviet_Mario <Sovie...@CCCP.MIR> writes:

> On 30/09/2019 19:20, Joe Pfeiffer wrote:
>> Soviet_Mario <Sovie...@CCCP.MIR> writes:
>>
>>> On 30/09/2019 14:02, Bonita Montero wrote:
>>>> I know alloca() neither is an official part of the C nor C++
>>>> program-
>>>> ming-language. But I use it for performance-reasons. Can anyone name
>>>> compilers that don't support alloca()?
>>>
>>>
>>> sorry, mine is not an answer but a related question
>>>
>>> Since when (if at any time) was variable size automatic array
>>> supported ?
>>>
>>> I mean a temporary array local to a function whose size becomes known
>>> only at runtime (seems to me a generalization of variadic functions,
>>> which, too, does not know at compile time how much stack space will
>>> consume).
>>>
>>> If implemented, It seems equivalent to alloca () ... is it there some
>>> limitation then in ORDER of automatic variable creations ?
>>
>> There is no such limitation on your C program. The VLA can be first,
>> last, or in the middle. And you can have more than one of them.
>>
>> Where in the activation record a compiler chooses to put it is, AFAIK,
>> not specified.
>
> ah ! So you are suggestion sort of internal reordering on the stack
> (like : all known types before, then the var-sized) ?

I'm saying there is nothing (I know of) prohibiting a compiler from
doing that; without thinking too deeply about it, I suspect that if I
were writing the compiler that would be my first approach (since, as you
suggest, that would permit the other variables to be accessed through
constant offsets).

James Kuyper

Sep 30, 2019, 9:36:33 PM
On 9/30/19 12:55 PM, Soviet_Mario wrote:
> On 30/09/2019 18:51, Bonita Montero wrote:
>>> Since when (if at any time) was variable size automatic
>>> array supported ?
>>
>> Since C99.
>>
>
> for curiosity, this feature has one or more of the
> limitations I feared ... or even none of them ?

There are several limitations on VLAs:
They were not part of standard C until C99, and became optional in
C2011. A C2011 implementation that does not support them will predefine
__STDC_NO_VLA__ with a value of 1. They have never yet been part of
standard C++.

The type name in a compound literal is not allowed to be a VLA (6.5.2.5p1).

A VLA can be declared with a size that is unspecified - this is
specified by using "*" for the size, and can only be done in the
declaration of a function parameter that is NOT part of the definition
of that function (6.7.6.2p4).

The expression for the size of a VLA must be an integer expression with
a value greater than 0 (6.7.6.2p5).

"Where a size expression is part of the operand of a sizeof operator and
changing the value of the size expression would not affect the result of
the operator, it is unspecified whether or not the size expression is
evaluated." (6.7.6.2p5)

Thus, portable code cannot rely upon the value of n being changed (or
unchanged) by evaluation of sizeof(int[n++]) - as a general rule, that
would make it a bad idea to write such an expression.

"If, in the nested sequence of declarators in a full declarator, there
is a declarator specifying a variable length array type, the type
specified by the full declarator is said to be variably modified.
Furthermore, any type derived by declarator type derivation from a
variably modified type is itself variably modified." (6.7.6p3)

For instance, int (*vla)[n] declares a pointer to a VLA, but is not
itself a VLA. But it does have a variably modified type.

A typedef for a VM type cannot be redefined, not even to the same type
(6.7p3).

"A member of a structure or union may have any complete object type
other than a variably modified type." (6.7.2.1p9)

"If an identifier is declared as having a variably modified type, it
shall be an ordinary identifier (as defined in 6.2.3), have no linkage,
and have either block scope or function prototype scope. If an
identifier is declared to be an object with static or thread storage
duration, it shall not have a variable length array type." (6.7.6.2p2)

"If a typedef name specifies a variably modified type then it shall have
block scope." (6.7.8p2)

"If a switch statement has an associated case or default label within
the scope of an identifier with a variably modified type, the entire
switch statement shall be within the scope of that identifier." (6.8.4.2p2)

"A goto statement shall not jump from outside the scope of an identifier
having a variably modified type to inside the scope of that identifier."
(6.8.6.1)

"... if the invocation of the setjmp macro was within the scope of an
identifier with variably modified type and execution has left that scope
in the interim, the behavior [of longjmp()] is undefined." (7.13.2.1p2)
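
To make two of these points concrete, a short C sketch follows. The `HAVE_VLA` macro and the `sum_matrix` function are hypothetical names chosen for the example. It tests __STDC_NO_VLA__ and then uses a pointer to a VLA as a parameter, i.e. a variably modified type that is not itself a VLA:

```c
#include <stddef.h>

/* __STDC_NO_VLA__ is only meaningful from C11 on. A pre-C99 compiler
   defines nothing here, so this check alone is an assumption that the
   compiler is at least C99. */
#if defined(__STDC_NO_VLA__) && __STDC_NO_VLA__ == 1
#define HAVE_VLA 0
#else
#define HAVE_VLA 1
#endif

#if HAVE_VLA
/* `rows` points to arrays of `c` ints, where `c` is known only at run
   time. No array object is created on the stack by this declaration. */
static long sum_matrix(size_t r, size_t c, int (*rows)[c])
{
    long sum = 0;
    for (size_t i = 0; i < r; ++i)
        for (size_t j = 0; j < c; ++j)
            sum += rows[i][j];
    return sum;
}
#endif
```

A caller with `int m[2][3]` can pass `m` directly as `sum_matrix(2, 3, m)`, because `int (*)[3]` is compatible with the parameter type when `c` is 3.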

Robert Wessel

Oct 1, 2019, 1:18:46 AM
Are C++ compilers actually obligated to define __STDC_NO_VLA__?

Bonita Montero

Oct 1, 2019, 1:19:48 AM
>> I just came across another issue:
>> With my primary compiler theres's an internal function called __chkstk
>> called when I do alloca. This function is usually called when there are
>> more than a page full of variables in the stack-frame. That's because
>> Windows recognizes the need for more stack space only through touching
>> the next unmapepd page down the stack. So why hasn't MS so be so clever
>> as to design Windows that it would also recoginize noncontignous acces-
>> ses to the stack-pages? Are these pages handled are like being over-
>> comitted?

> The compiler does this to ensure the alloca'd region doesn't extend past
> the stack guard page.

1. It's not the compiler that does this, but the OS.
2. The OS could do this with a single guard page, or maybe by reserving
a larger number of pages at the end of the stack.

> The kernel can handle noncontiguous stack access, ...

No, it can't, as I said. I'm talking about Windows.

Bonita Montero

Oct 1, 2019, 1:33:04 AM
I just wrote a little program:

1. #include <Windows.h>
2. #include <iostream>
3. #include <thread>
4. #include <mutex>
5. #include <vector>
6. #include <cstddef>
7. #include <functional>
8. #include <condition_variable>
9. using namespace std;
10.
11. int main()
12. {
13.     size_t const ALLOCSIZE = (size_t)1 << 30;
14.     void *p = VirtualAlloc( nullptr, ALLOCSIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE ); // reserve and commit 1GB
15.     memset( p, 0, ALLOCSIZE );
16.
17.     auto threadFn = []( mutex &mtx, condition_variable &cv, bool &start )
18.     {
19.         auto getSignal = []( mutex &mtx, condition_variable &cv, bool &start )
20.         {
21.             unique_lock<mutex> ul( mtx );
22.             while( !start )
23.                 cv.wait( ul );
24.         };
25.         getSignal( ref( mtx ), ref( cv ), ref( start ) );
26.         __try
27.         {
28.             size_t const PAGESIZE = 0x1000;
29.             char volatile c[1], *p;
30.             for( p = c; ; *p, p -= PAGESIZE ); // touch one byte per page, walking down the stack
31.         }
32.         __except( EXCEPTION_EXECUTE_HANDLER ) // the access past the end of the stack ends up here
33.         {
34.         }
35.         Sleep( INFINITE );
36.     };
37.     mutex mtx;
38.     condition_variable cv;
39.     bool start = false;
40.     vector<thread> vt;
41.
42.     for( size_t i = 0; i != 1000; ++i )
43.         vt.emplace_back( threadFn, ref( mtx ), ref( cv ), ref( start ) );
44.     {
45.         unique_lock<mutex> ul( mtx );
46.         start = true;
47.         cv.notify_all();
48.     }
49.
50.     Sleep( INFINITE );
51. }

If I stop at line 15, I can see with Process Explorer that the
virtual size as well as the private bytes of the process have been
raised by 1GB. So there's no overcommit. The memset at line 15 doesn't
change this.
Then I spawn 1,000 threads, and they first wait on a condvar.
The threads get the default stack size set by the linker, which
is 1MB. At the point where they wait, the virtual size of the process
has been raised by about 1GB, but the private size has been raised by
only a small amount; so there's definitely a difference in commit
behaviour between stack allocation and VirtualAlloc / MEM_RESERVE
| MEM_COMMIT. When I wake the threads, they read the whole stack
down to the guard page, and the access to this page is caught by
an SEH handler (I had to pack the waiting part into a lambda because
SEH can't be used in functions that need unwind handlers).
So Windows is definitely overcommitting stacks.

Bonita Montero

unread,
Oct 1, 2019, 1:39:44 AM10/1/19
to
> uhm, I didn't mean THAT kind of risk (unavoidable), but the possible
> difficulties in defining variables beyond the point of declaration of
> the variable-size array.

The compiler usually collects the sizes of all the "static" variables on
the stack and reserves the space for them in the function prologue. The
space for VLAs is reserved within the function at the point where they
are declared (it couldn't be handled differently, because their size is
only known at that point).

Bonita Montero

unread,
Oct 1, 2019, 1:42:01 AM10/1/19
to
> mov %rsp, %rax # rax now has starting address of alloca() region

The compiler usually uses RBP to remember the frame-pointer-address
at the prolog of the function. EBP is more suitable because it can
be restored in one step by the x86 LEAVE-instruction.

Bonita Montero

unread,
Oct 1, 2019, 1:43:18 AM10/1/19
to
>> And with C++ you can ask the compiler via __STDC_NO_VLA__ if VLAs
>> are not supported.

> Are C++ compilers actually obligated to define _STDC_NO_VLA_?

It's at least not part of the C++17-standard.


Paavo Helde

unread,
Oct 1, 2019, 2:08:58 AM10/1/19
to
On 1.10.2019 8:32, Bonita Montero wrote:

> so there's definitely a differnce in the commit
> -behaviour between stack-allocation and VirtualAlloc / MEM_RESERVE
> | MEM_COMMIT.

Yes, it looks like the whole stack space is reserved with MEM_RESERVE,
but is not committed with MEM_COMMIT.

> When I waken the threads the read the whole stack
> down to the guard page and the access to this page is catched by
> a SEH-handler

No, what happens is that the guard page is moved through the whole stack
until it reaches the end of it. The purpose of the guard page approach
is exactly to avoid the need to commit extra stack space until it
becomes needed.

> So Windows is definitely overcommitting stacks.

You cannot over-commit something that is not committed.

Bonita Montero

unread,
Oct 1, 2019, 2:35:52 AM10/1/19
to
>> When I waken the threads the read the whole stack
>> down to the guard page and the access to this page is catched by
>> a SEH-handler

> No, what happens that the guard page is moved through the whole stack
> until it reaches the end of it. ...

The guard-page will stop moving at the end.

>> So Windows is definitely overcommitting stacks.

> You cannot over-commit something what is not committed.

This is a flavour of overcommitting because I get a commit by touching
a page and the application will crash if the "dynamic commit" fails.

Melzzzzz

unread,
Oct 1, 2019, 3:03:38 AM10/1/19
to
On 2019-09-30, Bonita Montero <Bonita....@gmail.com> wrote:
> Am 30.09.2019 um 17:29 schrieb Melzzzzz:
>>>> I know alloca() neither is an official part of the C nor
>>>> C++ programming-language. But I use it for performance-
>>>> reasons. Can anyone name compilers that don't support
>>>> alloca()?
>
>>> At least, it is absent from TCC, but the specific compilers
>>> are of no importance. The problem is that code with
>>> `alloca()' is no longer starndard C and depends on specific
>>> compilers, which harms the freedom of the users of your code
>>> to compile it with *any* C compiler. Are the benefits of
>>> `alloca(I)' worth it? Is not there a reasonably simple
>>> solution within standard C, such as an array, a variable-
>>> length array, or a pre-allocated memory block on the heap?
>
>> VLA elliminates need for alloca...
>
> C++ hasn't VLAs.

No, neither has alloca...





--
press any key to continue or any other to quit...
I enjoy nothing as much as my status as an INVALID -- Zli Zec
There wasn't really that much violence in the Wild West, precisely
because everyone was armed. -- Mladen Gogala

David Brown

unread,
Oct 1, 2019, 3:38:46 AM10/1/19
to
Exactly. It is likely that the allocation would be handled something
like that. But the compiler can arrange the details in any way it deems
best.

Alternatives include allocating some fixed frame variables first, then
copying the stack pointer to a frame pointer (call it "fp"). Then the
alloca() or VLA adds to the stack, then there is some more stack data
(maybe another VLA or alloca() ). The first data can be accessed as "fp
+ offset", later data as "sp + offset". Intermediary anchor points can
be saved as necessary.

Mixing multiple VLA's and alloca's in the same function is likely to
give you inefficient code and wasted stack space (alloca data lives
until the end of the function, VLA's live until the end of the block).
But the compiler's prime job is to give you correct object code, and it
should handle that okay.
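
The lifetime point above can be checked with a small sketch. It assumes a
compiler that provides alloca() through the non-standard <alloca.h> header
(glibc and the BSDs do; MSVC spells it _alloca in <malloc.h>), and the
function name and the cap of 16 blocks are made up for the illustration.
Because alloca() memory is only released when the function returns, each
loop iteration should get a fresh block rather than reusing the previous one:

```cpp
#include <alloca.h>   // non-standard header (glibc, BSDs); an assumption of this sketch
#include <cassert>
#include <cstddef>

// Calls alloca() in a loop and counts how many distinct blocks come back.
// Hypothetical helper, written only to illustrate the lifetime rule.
std::size_t distinct_alloca_blocks(std::size_t iterations, std::size_t bytes)
{
    constexpr std::size_t kMaxBlocks = 16;   // arbitrary cap, keeps stack use small
    assert(bytes > 0);
    if (iterations > kMaxBlocks)
        iterations = kMaxBlocks;

    void *blocks[kMaxBlocks] = {};
    for (std::size_t i = 0; i < iterations; ++i) {
        blocks[i] = alloca(bytes);                      // released only at function return
        *static_cast<volatile char *>(blocks[i]) = 0;   // touch the block so it is really used
    }

    // Count the blocks whose address was not already handed out earlier.
    std::size_t distinct = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        bool repeated = false;
        for (std::size_t j = 0; j < i; ++j)
            if (blocks[j] == blocks[i])
                repeated = true;
        if (!repeated)
            ++distinct;
    }
    return distinct;
}
```

A VLA declared inside the loop body (in C, or in C++ as a compiler
extension) would instead be released at the end of each iteration, so the
stack would not keep growing.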


Bonita Montero

unread,
Oct 1, 2019, 4:57:27 AM10/1/19
to
>>> VLA elliminates need for alloca...

>> C++ hasn't VLAs.

> No, neiter has alloca...

Doesn't matter since most compilers support alloca().

Keith Thompson

unread,
Oct 1, 2019, 5:42:54 AM10/1/19
to
Robert Wessel <robert...@yahoo.com> writes:
[...]
> Are C++ compilers actually obligated to define _STDC_NO_VLA_?

No, __STDC_NO_VLA__ (note the double underscores) is specific to C11 and
later. C++ doesn't support VLAs at all, so there's no need for a C++
compiler to indicate its lack of support, and the C++ standard doesn't
require it to do so.

James Kuyper

unread,
Oct 1, 2019, 8:11:57 AM10/1/19
to
On 10/1/19 5:42 AM, Keith Thompson wrote:
> Robert Wessel <robert...@yahoo.com> writes:
> [...]
>> Are C++ compilers actually obligated to define _STDC_NO_VLA_?
>
> No, __STDC_NO_VLA__ (note the double underscores) is specific to C11 and
> later. C++ doesn't support VLAs at all, so there's no need for a C++
> compiler to indicate its lack of support, and the C++ standard doesn't
> require it to do so.

I'm sure you know this, but for the sake of other readers, I'd like to
point out that the fact that __STDC_NO_VLA__ is a C-specific macro does
not, in itself, prevent the C++ standard from saying anything about it.
The C++ standard simply failed to do so. It has said things about other
C pre#defined macros.

The C++ standard mandates the setting of __STDC_HOSTED__ to either 0 or
1, as appropriate, and allows the setting of __STDC_MB_MIGHT_NEQ_WC__,
__STDC__, __STDC_VERSION__, and __STDC_ISO_10646__. If
__STDC_MB_MIGHT_NEQ_WC__ is pre#defined, it's required to expand to the
integer literal 1, and if __STDC_ISO_10646__ is pre#defined, it's
required to expand to "An integer literal of the form yyyymmL".
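
As a rough illustration of the macros discussed above, the following sketch
reports what a given C++ compiler pre#defines. The function name is invented
for the example. Note that, as said earlier in the thread, a C++ compiler
leaving __STDC_NO_VLA__ undefined tells you nothing about whether it accepts
VLAs:

```cpp
#include <cassert>
#include <string>

// Builds a short report of C-inherited predefined macros as seen by this
// C++ compiler. Hypothetical helper, for illustration only.
std::string describe_stdc_macros()
{
    std::string out;

    // The C++ standard requires __STDC_HOSTED__ to be defined as 0 or 1.
    out += "__STDC_HOSTED__=" + std::to_string(__STDC_HOSTED__) + "\n";

    // C11 macro; the C++ standard does not mention it at all.
#ifdef __STDC_NO_VLA__
    out += "__STDC_NO_VLA__ defined\n";
#else
    out += "__STDC_NO_VLA__ not defined\n";
#endif

    // Optional in C++; if present it is an integer literal of the form yyyymmL.
#ifdef __STDC_ISO_10646__
    out += "__STDC_ISO_10646__=" + std::to_string(__STDC_ISO_10646__) + "\n";
#endif
    return out;
}
```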

Bonita Montero

unread,
Oct 1, 2019, 8:53:27 AM10/1/19
to
> No, __STDC_NO_VLA__ (note the double underscores) is specific to C11 and
> later. C++ doesn't support VLAs at all, so there's no need for a C++
> compiler to indicate its lack of support, and the C++ standard doesn't
> require it to do so.

It might help in porting C99 code to C++ if C++ defined this macro
appropriately. Code could conditionally disable VLA-dependent parts.

Scott Lurndal

unread,
Oct 1, 2019, 11:19:16 AM10/1/19
to
Bonita Montero <Bonita....@gmail.com> writes:
>> mov %rsp, %rax # rax now has starting address of alloca() region
>
>The compiler usually uses RBP to remember the frame-pointer-address
>at the prolog of the function. EBP is more suitable because it can
>be restored in one step by the x86 LEAVE-instruction.

So what? The RSP is restored automatically by the ret instruction.

int f(unsigned long len)
{
    char *f = alloca(len);

    for (int i=0; i < 0; i++) {
        *f++ = '\0';
    }
    return 0;
}

gcc output:

0000000000400540 <f>:
#include <alloca.h>
#include <stdlib.h>

int f(unsigned long len)
{
400540: 55 push %rbp
400541: 48 89 e5 mov %rsp,%rbp
400544: 48 83 ec 20 sub $0x20,%rsp
400548: 48 89 7d e8 mov %rdi,-0x18(%rbp)
char *f = alloca(len);
40054c: 48 8b 45 e8 mov -0x18(%rbp),%rax
400550: 48 8d 50 0f lea 0xf(%rax),%rdx
400554: b8 10 00 00 00 mov $0x10,%eax
400559: 48 83 e8 01 sub $0x1,%rax
40055d: 48 01 d0 add %rdx,%rax
400560: b9 10 00 00 00 mov $0x10,%ecx
400565: ba 00 00 00 00 mov $0x0,%edx
40056a: 48 f7 f1 div %rcx
40056d: 48 6b c0 10 imul $0x10,%rax,%rax


400571: 48 29 c4 sub %rax,%rsp <===== Alloca(len)
400574: 48 89 e0 mov %rsp,%rax <===== Remember f in %rax


400577: 48 83 c0 0f add $0xf,%rax <===== Align it
40057b: 48 c1 e8 04 shr $0x4,%rax
40057f: 48 c1 e0 04 shl $0x4,%rax
400583: 48 89 45 f8 mov %rax,-0x8(%rbp) <== And save in stack frame

for (int i=0; i < 0; i++) {
400587: c7 45 f4 00 00 00 00 movl $0x0,-0xc(%rbp)
40058e: eb 13 jmp 4005a3 <f+0x63>
*f++ = '\0';
400590: 48 8b 45 f8 mov -0x8(%rbp),%rax
400594: 48 8d 50 01 lea 0x1(%rax),%rdx
400598: 48 89 55 f8 mov %rdx,-0x8(%rbp)
40059c: c6 00 00 movb $0x0,(%rax)

40059f: 83 45 f4 01 addl $0x1,-0xc(%rbp)
4005a3: 83 7d f4 00 cmpl $0x0,-0xc(%rbp)
4005a7: 78 e7 js 400590 <f+0x50>
*f++ = '\0';
}
return 0;
4005a9: b8 00 00 00 00 mov $0x0,%eax
}
4005ae: c9 leaveq
4005af: c3 retq

Bonita Montero

unread,
Oct 1, 2019, 11:31:25 AM10/1/19
to
>> The compiler usually uses RBP to remember the frame-pointer-address
>> at the prolog of the function. EBP is more suitable because it can
>> be restored in one step by the x86 LEAVE-instruction.

> So what? The RSP is restored automatically by the ret instruction.

Yes, but LEAVE does three instructions in one step. Usually this
results in the same uOPs, but it is only one slot in the decoder.

Manfred

unread,
Oct 1, 2019, 11:45:30 AM10/1/19
to
On 9/30/2019 8:01 PM, Soviet_Mario wrote:
> On 30/09/2019 19:20, Joe Pfeiffer wrote:
>> Soviet_Mario <Sovie...@CCCP.MIR> writes:
>>
>>> On 30/09/2019 14:02, Bonita Montero wrote:
>>>> I know alloca() neither is an official part of the C nor C++
>>>> program-
>>>> ming-language. But I use it for performance-reasons. Can anyone name
>>>> compilers that don't support alloca()?
>>>
>>>
>>> sorry, mine is not an answer but a related question
>>>
>>> Since when (if at any time) was variable size automatic array
>>> supported ?
>>>
>>> I mean a temporary array local to a function whose size becomes known
>>> only at runtime (seems to me a generalization of variadic functions,
>>> which, too, does not know at compile time how much stack space will
>>> consume).
>>>
>>> If implemented, It seems equivalent to alloca () ... is it there some
>>> limitation then in ORDER of automatic variable creations ?
>>
>> There is no such limitation on your C program.  The VLA can be first,
>> last, or in the middle.  And you can have more than one of them.
>>
>> Where in the activation record a compiler chooses to put it is, AFAIK,
>> not specified.
>>
>
>
> I have QT that does not supports them, but, if sb would be so kind to
> copy a minimal asm generated code for some
>
> int Funz (int A, int B)
> {
> int C = 3;
> int Buf [A];
> int D = 2;
> Buf [A-1] = B;
> return (A + B * D - C);
> }
>
> int main (void)
> {
> return Funz (4, 3);
> }
>
> I'd like to examine the assembler generated in order to figure out how
> Buf (and D, beyond it apparently) are addressed
>
> ah, if not to annoying, TURNING OFF most if not all optimizationz (maybe
> the example is too minimal to be sort of resolved at compile time :\)
>
> tnx in advance
>

AFAIU this is a feature of the language and compiler, not of QT, so
probably your limitation is not because you use QT per se, but instead
because C++ does not have VLAs.

Trying to force VLAs via asm in C++ would probably open a can of worms
not worth the benefit.
A more viable alternative could be having the code that needs VLAs
implemented in C modules linked to C++, or switch to std::vector.
If it is directly coupled to QT itself, e.g. GUI handlers, then the
performance gap of using std::vector is most probably negligible.
(and, of course, there still is _alloca() if it is supported by the
compiler)
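
For what it's worth, the std::vector route for the quoted Funz example could
look roughly like this. It is only a sketch: the names are kept from the
quoted code, and the precondition A > 0 is an assumption that the original
snippet leaves implicit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standard C++ variant of the quoted Funz: the VLA "int Buf [A]" becomes a
// std::vector, so the buffer lives on the heap instead of the stack.
int Funz(int A, int B)
{
    assert(A > 0);   // assumed precondition; Buf[A-1] needs at least one element
    int C = 3;
    std::vector<int> Buf(static_cast<std::size_t>(A));
    int D = 2;
    Buf[static_cast<std::size_t>(A) - 1] = B;
    return A + B * D - C;
}
```

Calling Funz(4, 3) as in the quoted main gives 4 + 3 * 2 - 3.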

Melzzzzz

unread,
Oct 1, 2019, 2:50:10 PM10/1/19
to
Which one does not support VLAs but supports alloca?

Melzzzzz

unread,
Oct 1, 2019, 2:59:02 PM10/1/19
to
On 2019-10-01, Bonita Montero <Bonita....@gmail.com> wrote:
I never saw any compiler that uses `leave`. Especially since, as an
optimization, RBP doesn't have to be used as a frame pointer...

David Brown

unread,
Oct 1, 2019, 3:11:01 PM10/1/19
to
I don't know this case, but typically on modern cpus "complex"
instructions like "leave" are slower than manually expanding the
instructions using "pop", "ret", etc.

The easiest way to find out is to do a test compile (godbolt.org is
great for it) making sure you have cpu-specific tuning flags and
optimisations enabled, and look at the output. The compiler writers
know better than most of us which instruction patterns are the most
efficient.

Bonita Montero

unread,
Oct 1, 2019, 3:21:04 PM10/1/19
to
>>>> C++ hasn't VLAs.

>>> No, neiter has alloca...

>> Doesn't matter since most compilers support alloca().

> Which one does not supports VLA and supports alloca?

I'm using C++ and C++ hasn't VLAs.
But almost any C++-compiler supports alloca().

Bonita Montero

unread,
Oct 1, 2019, 3:24:41 PM10/1/19
to

>> Yes, but LEAVE does three instructions in one step. Usually this
>> results in the same uOPs, but it is only one slot in the decoder.

> I never saw any compiler that uses `leave`.

MSVC and gcc always use LEAVE when you have dynamic stack-frames
with VLAs (gcc) or alloca(). Here's an example of gcc with VLAs:

#include <stddef.h>

void f( size_t s )
{
    int volatile a[s];
    a[s - 1] = 123;
}


.file "x.c"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
.LFB0:
.cfi_startproc
leaq 18(,%rdi,4), %rax
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
subq $1, %rdi
andq $-16, %rax
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq %rax, %rsp
movl $123, (%rsp,%rdi,4)
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size f, .-f
.ident "GCC: (Debian 6.3.0-18+deb9u1) 6.3.0 20170516"
.section .note.GNU-stack,"",@progbits

Bonita Montero

unread,
Oct 1, 2019, 3:25:23 PM10/1/19
to
> I don't know this case, but typically on modern cpus "complex"
> instructions like "leave" are slower than manual expanding the
> instructions using "pop", "ret", etc.

That's wrong.

Scott Lurndal

unread,
Oct 1, 2019, 4:39:54 PM10/1/19
to
Bonita Montero <Bonita....@gmail.com> writes:
>>>>> C++ hasn't VLAs.
>
>>>> No, neiter has alloca...
>
>>> Doesn't matter since most compilers support alloca().
>
>> Which one does not supports VLA and supports alloca?
>
>I'm using C++ and C++ hasn't VLAs.

g++ does, as an extension.

Scott Lurndal

unread,
Oct 1, 2019, 4:44:23 PM10/1/19
to
From the Intel optimization guide:

The Intel 64 and IA-32 architectures have several commonly used instructions for
parameter passing and procedure entry and exit: PUSH, POP, CALL, LEAVE and RET.
These instructions implicitly update the stack pointer register (RSP), maintaining a
combined control and parameter stack without software intervention. These
instructions are typically implemented by several µops in previous microarchitectures.

The Stack Pointer Tracker moves all these implicit RSP updates to logic contained in
the decoders themselves. The feature provides the following benefits:

· Improves decode bandwidth, as PUSH, POP and RET are single µop instructions in Intel Core microarchitecture.
· Conserves execution bandwidth as the RSP updates do not compete for execution resources.
· Improves parallelism in the out of order execution engine as the implicit serial dependencies between µops are removed.
· Improves power efficiency as the RSP updates are carried out on small, dedicated hardware.

Assembly/Compiler Coding Rule 31. (ML impact, M generality) Avoid using
complex instructions (for example, enter, leave, or loop) that have more than four
µops and require multiple cycles to decode. Use sequences of simple instructions
instead.

Complex instructions may save architectural registers, but incur a penalty of 4 µops to
set up parameters for the microsequencer ROM.

David Brown

unread,
Oct 1, 2019, 5:27:05 PM10/1/19
to
As a general point, I don't think it is wrong - but from your posting it
looks like it does not apply in the case of "leave". At least, gcc uses
it, which is a reasonable indicator that it is more efficient than
separate instructions.

A little googling suggests that the "enter" instruction is slower than
the individual instructions, but that "leave" is quite fast. There are
plenty of other legacy CISC instructions that are slower than breaking
them apart into equivalent simpler instructions, such as the "LOOP"
instructions. I'm sure x86 assembly experts could give you more.

I've seen the same thing on other processors too. IIRC, on the 68060
the hardware integer division instruction was slower than a software
division routine.

Anton Shepelev

unread,
Oct 1, 2019, 5:36:22 PM10/1/19
to
Bonita Montero to Anton Shepelev:

> > > Depending on the set of compilers that support
> > > `alloca()' this doesn't count.
> >
> > I disagree. It is a difference between standard code
> > and compiler-dependent code. You are forcing users to
> > use specific compilers, regardless of how large a subset
> > it is.
>
> Most programs use platform-specific means; that's rarely
> an issue.

Platform-specific facilities, indeed, are not an issue
because they come with the platform. Some are even
standardised (POSIX), so that general-purpose software can
often be written in a largely platform-independent manner,
e.g. the NetPBM suite and the dcraw developer. Other
programs can be ported to a new platform by supplying custom
implementations of the platform-specific functions that they
use.

But `alloca()' is not a platform-specific facility, because
it may exist or be absent on the majority of platforms. Nor
is it a normal, suppliable, dependency because one cannot
simply implement it whenever needed.

--
() ascii ribbon campaign -- against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]

Joe Pfeiffer

unread,
Oct 1, 2019, 11:08:26 PM10/1/19
to
There was a time when it was true. I'd be pretty surprised if it still
were.

Melzzzzz

unread,
Oct 2, 2019, 12:15:53 AM10/2/19
to
On 2019-10-01, Bonita Montero <Bonita....@gmail.com> wrote:
Any C++11 compiler that supports C99 also supports VLA, I bet...

Bonita Montero

unread,
Oct 2, 2019, 12:28:23 AM10/2/19
to
> But `alloca()' is not a platform-pecific facility, because
> it may exist or be absent on the majority of platforms. ..

Doesn't matter, almost any compiler supports it.

Melzzzzz

unread,
Oct 2, 2019, 12:28:40 AM10/2/19
to
On 2019-10-01, Bonita Montero <Bonita....@gmail.com> wrote:
>
Yes, this is against optimisations as you use volatile...

Bonita Montero

unread,
Oct 2, 2019, 12:29:05 AM10/2/19
to

>> I'm using C++ and C++ hasn't VLAs.
>> But almost any C++-compiler supports alloca().

> Any C++11 compil.er that supports C99 also supports VLA, I bet...

Boy, you're such a mega-idiot.

Melzzzzz

unread,
Oct 2, 2019, 12:30:01 AM10/2/19
to
Look into mirror...

Bonita Montero

unread,
Oct 2, 2019, 12:30:33 AM10/2/19
to
I use volatile to prevent the compiler from optimizing away the
function body.

Melzzzzz

unread,
Oct 2, 2019, 12:36:46 AM10/2/19
to
On 2019-10-02, Bonita Montero <Bonita....@gmail.com> wrote:
Linus Torvalds has expressed his displeasure in the past over VLA usage
for arrays with predetermined small sizes, with comments like "USING
VLA'S IS ACTIVELY STUPID! It generates much more code, and much slower
code (and more fragile code), than just using a fixed key size would
have done." [6] With the Linux 4.20 kernel, Linux kernel is effectively
VLA-free.[7]

Bonita Montero

unread,
Oct 2, 2019, 12:39:54 AM10/2/19
to
>> That's wrong.

> As a general point, I don't think it is wrong - ...

CISC-like instructions that are used often in x86 are usually mapped to
the same uOPS, but they use only one slot in the decoder.

> A little googling suggests that the "enter" instruction is slower
> than the individual instructions, but that "leave" is quite fast.

ENTER isn't used because it is used to allocate static stack-frames,
i.e. stack-frames without variable parts like VLAs or alloca()-frames.
But static stack-frames are used by the compilers without EBP, so ENTER
isn't needed. But that doesn't mean that it is not efficient.

> There are plenty of other legacy CISC instructions that are slower
> than breaking them apart into equivalent simpler instructions, such
> as the "LOOP" instructions. ...

My Ryzen needs 2 clock-cycles for LOOP if the branch is taken; that's
the same number of clock cycles if you would assemble it from simpler
instructions.

Melzzzzz

unread,
Oct 2, 2019, 12:42:48 AM10/2/19
to
On 2019-10-02, Bonita Montero <Bonita....@gmail.com> wrote:
Try a more realistic example:
float read_val();
float process(int,float*);
float read_and_process(int n)
{
    float vals[n];

    for (int i = 0; i < n; ++i)
        vals[i] = read_val();

    return process(n, vals);
}

Where is `leave` now?

Frederick Gotham

unread,
Oct 2, 2019, 3:03:45 AM10/2/19
to
On Monday, September 30, 2019 at 1:02:16 PM UTC+1, Bonita Montero wrote:
> I know alloca() neither is an official part of the C nor C++ program-
> ming-language. But I use it for performance-reasons. Can anyone name
> compilers that don't support alloca()?


I wonder if this would be a good reason to add a new compile-time operator to the C++ language? We could re-use the keyword 'extern' as follows:

#include <some_header.hpp>

int main()
{
    if extern(alloca)
    {
        /* alloca is declared so we can use it */
    }
    else
    {
        /* Do some other trick */
    }
}
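
Something in this direction is already possible with existing tools,
although only at the header level: C++17's __has_include can test whether
<alloca.h> exists, which is not the same as testing whether the name alloca
is declared (MSVC, for example, offers _alloca in <malloc.h> and would take
the fallback path here). A minimal sketch, where HAVE_ALLOCA_H, sum_iota and
the 256-element cap are names and values invented for the example:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

#ifdef __has_include
#  if __has_include(<alloca.h>)
#    include <alloca.h>
#    define HAVE_ALLOCA_H 1   // hypothetical macro name, not a standard one
#  endif
#endif

// Sums 0..n-1 through a temporary buffer. Small buffers come from alloca()
// when the header is available; everything else falls back to std::vector.
long sum_iota(std::size_t n)
{
    constexpr std::size_t kStackLimit = 256;   // arbitrary cap on stack use
#ifdef HAVE_ALLOCA_H
    if (n != 0 && n <= kStackLimit) {
        int *buf = static_cast<int *>(alloca(n * sizeof(int)));
        std::iota(buf, buf + n, 0);
        return std::accumulate(buf, buf + n, 0L);
    }
#endif
    (void)kStackLimit;                         // unused when alloca is unavailable
    std::vector<int> buf(n);
    std::iota(buf.begin(), buf.end(), 0);
    return std::accumulate(buf.begin(), buf.end(), 0L);
}
```

The size cap matters: without it the alloca() path has no defence against a
large n overflowing the stack.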

David Brown

unread,
Oct 2, 2019, 4:02:02 AM10/2/19
to
On 02/10/2019 06:36, Melzzzzz wrote:
> On 2019-10-02, Bonita Montero <Bonita....@gmail.com> wrote:
>>>> MSVC and gcc always use LEAVE when you have dynamic stack-frames
>>>> with VLAs (gcc) or alloca(). Here's an example of gcc with VLAs:
>>>>
>>>> #include <stddef.h>
>>>>
>>>> void f( size_t s )
>>>> {
>>>> int volatile a[s];
>>>> a[s - 1] = 123;
>>>> }
>>>>
<snip>
>>
>>> Yes, this is against optimisations as you use volatile...
>>
>> I use volatile to prevent the compiler from optimizing away the
>> function body.

There are usually better ways to get this, such as giving a return value
that depends on the function inputs. If you have to use "volatile" to
ensure you are generating code that can be examined, then minimise its
use rather than making the whole array volatile.

(In a test function that used the VLA for calculations, gcc still
generated a "leave" instruction.)

> "
> Linus Torvalds has expressed his displeasure in the past over VLA usage
> for arrays with predetermined small sizes, with comments like "USING
> VLA'S IS ACTIVELY STUPID! It generates much more code, and much slower
> code (and more fragile code), than just using a fixed key size would
> have done." [6] With the Linux 4.20 kernel, Linux kernel is effectively
> VLA-free.[7]
> "
>

VLA's with pre-determined small sizes result in /identical/ code to
arrays with fixed sizes.

void foo1(void) {
    const int arr_size = 12;
    int arr[arr_size];
    ...
}

void foo2(void) {
    #define ARR_SIZE 12
    int arr[ARR_SIZE];
    ...
}


"foo1" has a VLA with a predetermined small size, while "foo2" has a
normal array with fixed size. These are different as far as the C
language is concerned, but identical in the generated code.


I /think/ what Torvalds is trying to say is that if you know that the
size "n" of your VLA is going to be at most "max_n", and you know "max_n"
is small, then you should use a fixed size array of size "max_n" instead
of a VLA. I believe that has both advantages and disadvantages. When
the array is of fixed size (i.e., known at compile time - even if it is
technically a VLA) the code for allocating and deallocating the space is
simpler. Code for accessing the contents of the array or other data
might be a little simpler. But it is all just one-time calculations.
Code that is run repeatedly - and if you have an array, you probably
have a loop - is going to be identical.

So you get the trade-off between a few extra instructions for the
variable VLA, balanced against the benefits of reduced cache use from a
smaller array. I'd not like to be categorical about which is best - it
would need measuring.


If the VLA is making the code fragile, then I expect it would be equally
fragile with a fixed size array.


(Fixed size arrays are also easier for analysis of stack usage, if that
is important.)
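
A sketch of the fixed-size alternative described above, written in C++ with
std::array. The bound kMaxN, the function name and the -1 error value are
assumptions made for the example; the point is only that the buffer size is
a compile-time constant while the loops still run over the runtime count n.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kMaxN = 16;   // hypothetical known upper bound ("max_n")

// Sums the squares 0*0 + 1*1 + ... + (n-1)*(n-1) through a fixed-size
// buffer. Returns -1 if n exceeds the bound instead of overrunning it.
long sum_of_squares(std::size_t n)
{
    if (n > kMaxN)
        return -1;

    std::array<long, kMaxN> buf{};   // always kMaxN elements, whatever n is
    for (std::size_t i = 0; i < n; ++i)
        buf[i] = static_cast<long>(i * i);

    long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += buf[i];
    return total;
}
```

The stack usage of this function is known at compile time, at the price of
always reserving kMaxN elements and needing an explicit bounds check.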

David Brown

unread,
Oct 2, 2019, 4:10:37 AM10/2/19
to
The compiler will use it if it is convenient, and not use it otherwise.
This sample code /does/ generate "leave" with gcc :

int f(int s )
{
    int a[s];
    for (int i = 0; i < s; i++) a[i] = i;
    int t = 0;
    for (int i = 0; i < s; i++) t += a[i];
    return t;
}

(clang does not generate a "leave" - it generates its usual unrolled
vector monstrosities that are highly efficient for huge loop counts, and
terrible for tiny counts.)


Basically, it seems that "leave" is equivalent to "mov %ebp, %esp" then
"pop %ebp". If the compiler would otherwise have generated these
instructions, then it uses "leave". If not, then it doesn't. "leave"
appears to be marginally more efficient than the separate instructions -
enough so to use it when there is the option, but not enough to go out
of its way to use it.

(The matching "enter" instruction is much slower than manual
manipulation of the registers, and is thus avoided.)

David Brown

unread,
Oct 2, 2019, 4:20:11 AM10/2/19
to
I'm not quite sure what you mean by that. You are surely not saying the
68060 hardware division has got faster recently, or that software
division code for it has got slower!


Paavo Helde

unread,
Oct 2, 2019, 4:30:28 AM10/2/19
to
Please, when proposing new extensions for C++, could we refrain
from things which would increase the danger of poorly
predictable nasty fatalities like stack overflows? There are already
enough ways to shoot someone's legs off in C++.

Ian Collins

unread,
Oct 2, 2019, 4:34:20 AM10/2/19
to
Not much use when a "function" is actually inlined code..

Best not to use dodgy things like alloca. Even its Linux man page
recommends against its use.

--
Ian.

David Brown

unread,
Oct 2, 2019, 4:35:07 AM10/2/19
to
On 02/10/2019 06:39, Bonita Montero wrote:
>>> That's wrong.
>
>> As a general point, I don't think it is wrong - ...
>
> CISC-like instructions that are used often in x86 are usually mapped to
> the same uOPS, but they use only one slot in the decoder.
>
>> A little googling suggests that the "enter" instruction is slower
>> than the individual instructions, but that "leave" is quite fast.
>
> ENTER isn't used because it is used to allocate static stack-frames,
> i.e. stack-frames without variable parts like VLAs or alloca()-frames.
> But static stack-frames are used by the compilers without EBP, so ENTER
> isn't needed. But that doesn't mean that it is not efficient.
>

"enter" is only potentially useful when you have a fixed stack frame but
want a frame pointer - that is true. And usually you don't bother with
a frame pointer (they used to be popular for debuggers, but debuggers
are smarter these days).

But "enter" is not used even when it could be used, because it is very
slow. It is microcoded, and takes 12 micro-ops on modern x86 devices -
far more than "push ebp", "mov ebp, esp", "sub esp, xxxx" that replace it.

>> There are plenty of other legacy CISC instructions that are slower
>> than breaking them apart into equivalent simpler instructions, such
>> as the "LOOP" instructions. ...
>
> My Ryzen needs 2 clock-cycles for LOOP if the branch is taken; that's
> the same number of clock cycles if you would assemble it from simpler
> instructions.

Interesting. On an Intel Skylake it is 7 µ-ops for a "loop" and 11 for
a "loopne".

<https://www.agner.org/optimize/instruction_tables.pdf>
<https://www.agner.org/optimize/>


This is why I rely on the compiler to generate good code for particular
chips. The x86 assembly world is too chaotic for my liking.

Frederick Gotham

unread,
Oct 2, 2019, 4:36:42 AM10/2/19
to
On Wednesday, October 2, 2019 at 9:34:20 AM UTC+1, Ian Collins wrote:

>
> Not much use what a "function" is actually inlined code..
>
> Best not to use dodgy things like alloca. Even its Linux man page
> recommends against its use.


That's as ridiculous as telling people not to use "goto".

I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code.

Ian Collins

unread,
Oct 2, 2019, 4:41:14 AM10/2/19
to
On 02/10/2019 21:36, Frederick Gotham wrote:
> On Wednesday, October 2, 2019 at 9:34:20 AM UTC+1, Ian Collins wrote:
>
>>
>> Not much use what a "function" is actually inlined code..
>>
>> Best not to use dodgy things like alloca. Even its Linux man page
>> recommends against its use.
>
>
> That's as ridiculous as telling people not to use "goto".

Using goto is a tar and feather offense in my team. There is never a
justifiable use case in C++.

alloca and VLAs are a bomb waiting to go off in your code.

> I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code.

And you admit it?

--
Ian.

Frederick Gotham

unread,
Oct 2, 2019, 5:28:37 AM10/2/19
to
On Wednesday, October 2, 2019 at 9:41:14 AM UTC+1, Ian Collins wrote:

> > I'm working on an embedded Linux x86_64 firmware program right now and I've got two "goto" calls in my code.
>
> And you admit it?


I could show you if you like.

Chris M. Thomasson

unread,
Oct 2, 2019, 5:35:31 AM10/2/19
to
On 9/30/2019 5:02 AM, Bonita Montero wrote:
> I know alloca() neither is an official part of the C nor C++ program-
> ming-language. But I use it for performance-reasons. Can anyone name
> compilers that don't support alloca()?

Fwiw, the last time I used alloca was to offset thread stacks, using a
thread id, to get around the 64k aliasing problem on old Intel
hyperthreaded processors. It was needed to get around false sharing wrt
thread stacks! alloca was a simple convenience hack to get the damn job
done. Iirc, it was mentioned in an Intel paper. Well, I have something
even more hackish. The region allocator that can be stack based. It can
be fed with memory reaped from alloca:

http://pastebin.com/raw/f37a23918

https://groups.google.com/forum/#!original/comp.lang.c/7oaJFWKVCTw/sSWYU9BUS_QJ
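
For readers who do not want to follow the links, the general idea of a
region allocator over caller-supplied memory can be sketched as below. This
is not the linked code: the class, its member names and the demo function
are invented for illustration, and a plain stack array stands in for memory
obtained from alloca() so that the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump ("region") allocator: hands out pieces of one memory
// block and can only free everything at once via flush().
class Region {
public:
    Region(void *mem, std::size_t size)
        : base_(static_cast<unsigned char *>(mem)), size_(size), used_(0) {}

    // Returns `bytes` bytes aligned to `align` (a power of two),
    // or nullptr when the region is exhausted.
    void *allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t))
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t cur = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::uintptr_t aligned =
            (cur + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
        std::size_t padding = static_cast<std::size_t>(aligned - cur);
        if (padding > size_ - used_ || bytes > size_ - used_ - padding)
            return nullptr;
        used_ += padding + bytes;
        return base_ + (used_ - bytes);
    }

    void flush() { used_ = 0; }   // the only form of deallocation

private:
    unsigned char *base_;
    std::size_t size_;
    std::size_t used_;
};

// Small demonstration; returns true if the region behaves as described.
bool region_demo()
{
    alignas(std::max_align_t) unsigned char backing[64];
    Region r(backing, sizeof backing);
    void *a = r.allocate(16);
    void *b = r.allocate(16);
    void *c = r.allocate(64);   // no longer fits
    r.flush();
    void *d = r.allocate(16);   // region is reused from the start
    return a == backing && b != nullptr && b != a && c == nullptr && d == a;
}
```

Feeding it from alloca() would only change where the backing block comes
from; the region must then not outlive the function that called alloca().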


Frederick Gotham

unread,
Oct 2, 2019, 5:43:23 AM10/2/19
to
On Wednesday, October 2, 2019 at 10:35:31 AM UTC+1, Chris M. Thomasson wrote:
> Well, I have something
> even more hackish. The region allocator that can be stack based. It can
> be fed with memory reaped from alloca:
>
> http://pastebin.com/raw/f37a23918


I still remember the little earthquake I felt in my head 10 years ago the first time I ever walked into a morgue. I'm not saying that looking at your code was in any way similar to the experience I had 10 years ago, but I definitely did feel a mild unbalancing shake in my head as I reluctantly clicked the link and scrolled downward.

Anton Shepelev

unread,
Oct 2, 2019, 9:08:56 AM10/2/19
to
Bonita Montero to Anton Shepelev:

An approach such as yours may lead to the unjustified
discrimination of any compiler that does not implement one
or another non-standard function that many other compilers
have implemented.

Mel

unread,
Oct 2, 2019, 9:13:37 AM10/2/19
to
This is not a realistic example either.

--
Press any key to continue or any other to quit

Mel

unread,
Oct 2, 2019, 9:15:05 AM10/2/19
to
On Tue, 1 Oct 2019 21:25:12 +0200, Bonita Montero
<Bonita....@gmail.com> wrote:
> > I don't know this case, but typically on modern cpus "complex"
> > instructions like "leave" are slower than manual expanding the
> > instructions using "pop", "ret", etc.


> That's wrong.

Experience with x86 tells the opposite. An obvious example is the loop
instruction...

Mel

unread,
Oct 2, 2019, 9:19:10 AM10/2/19
to
On Tue, 1 Oct 2019 17:31:09 +0200, Bonita Montero
<Bonita....@gmail.com> wrote:
> >> The compiler usually uses RBP to remember the frame-pointer-address
> >> at the prolog of the function. EBP is more suitable because it can
> >> be restored in one step by the x86 LEAVE-instruction.


> > So what? The RSP is restored automatically by the ret instruction.


> Yes, but LEAVE does three instructions in one step. Usually this
> results in the same uOPs, but it is only one slot in the decoder.

The problem is that you need rbp only for debugging purposes... One common
option is omit-frame-pointer... Reducing bloat significantly....

Bonita Montero

Oct 2, 2019, 9:21:10 AM
> The problem is that you need rbp only for debugging purposes... One common
> option is omit-frame-pointer... Reducing bloat significantly....

We're talking about dynamic stack-frames with alloca() or VLAs.
In this case EBP is usually used to store the frame-pointer even
with optimized code.

Anton Shepelev

Oct 2, 2019, 9:26:19 AM
Chris M. Thomasson:

> Well, I have something even more hackish. The region
> allocator that can be stack based. It can be fed with
> memory reaped from alloca:
> http://pastebin.com/raw/f37a23918

I find it hard to comprehend because the code has no
comments and uses macros extensively. Do I understand
aright that you have written, as it were, a secondary memory
manager that uses a C memory block instead of RAM and does
not support deallocation except by flushing the whole thing?

When is it more convenient than, say, local fixed arrays for
dynamic structures of limited size?
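
For concreteness, here is a minimal sketch of the kind of allocator described above: it hands out pieces of a caller-supplied block by bumping an offset and supports no deallocation except flushing everything. It is not the pastebin code; the names `region`, `region_init`, `region_alloc` and `region_flush` are hypothetical, and the block is assumed to be aligned for `max_align_t`. The block could be a local array or, where the compiler offers it, memory obtained from alloca().

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical names throughout; an illustration, not the pastebin code. */
struct region {
    unsigned char *base;  /* start of the caller-supplied block */
    size_t         size;  /* total bytes in the block */
    size_t         used;  /* bytes handed out so far */
};

/* The block is assumed to be aligned for max_align_t. */
static void region_init(struct region *r, void *block, size_t size)
{
    assert(r != NULL && block != NULL);
    r->base = block;
    r->size = size;
    r->used = 0;
}

/* Bump allocation: round the offset up to the strictest alignment,
   then advance it. Returns NULL when the block is exhausted. */
static void *region_alloc(struct region *r, size_t n)
{
    const size_t a = alignof(max_align_t);
    size_t start = (r->used + (a - 1)) & ~(a - 1);
    if (start > r->size || n > r->size - start)
        return NULL;
    r->used = start + n;
    return r->base + start;
}

/* The only form of deallocation: forget everything at once. */
static void region_flush(struct region *r)
{
    r->used = 0;
}
```

Compared with a set of local fixed arrays, one block serves objects of several types and sizes, at the price of giving up individual frees.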

Bonita Montero

Oct 2, 2019, 9:42:54 AM
>> Doesn't matter, almost any compiler supports it.

> An approach such as yours may lead to the unjustified
> discrimination of any compiler that does not implement one
> or another non-standard function that many other compilers
> have implemented.

That's ok if these compilers don't implement common
convenience-functions.

Chris M. Thomasson

Oct 2, 2019, 3:33:28 PM
On 10/2/2019 6:26 AM, Anton Shepelev wrote:
> Chris M. Thomasson:
>
>> Well, I have something even more hackish. The region
>> allocator that can be stack based. It can be fed with
>> memory reaped from alloca:
>> http://pastebin.com/raw/f37a23918
>
> I find it hard to comprehend because the code has no
> comments and uses macros extensively. Do I understand
> aright that you have written, as it were, a secondary memory
> manager that uses a C memory block instead of RAM and does
> not support deallocation except by flushing the whole thing?

Will have more time to get back to you later on tonight. However, take a
deep look at reaps:

https://people.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

Anton Shepelev

Oct 2, 2019, 6:48:04 PM
Chris M. Thomasson:

> Will have more time to get back to you later on tonight.
> However, take a deep look at reaps:
> https://people.cs.umass.edu/~emery/pubs/berger-oopsla2002.pdf

Their meaning of heap refers to the group's previous research
(Heap Layers) rather than to the general usage of the term
w.r.t. computer memory organisation. Heap layers seem to
comprise a sort of hierarchical and flexibly composable
memory manager where each layer allocates and deallocates
memory from the layer on top (!) of it, implemented as a
mix-in "superclass", but I don't understand why mix-ins in
particular and OOP in general should be essential to this
architecture. The rationale that they provide in the two
articles seems to be equally valid for conventional
dependency inversion. Remember at least the parser
combinators thread here (comp.lang.c) to see what level of
composability is possible in C.

This 12-page article devotes only a page-worth of space
(including diagrams) to the explanation of
reaps -- allegedly its central subject and the only original
contribution described. When an object in the region
part of the reap dies, it goes to an associated heap, which
is used for subsequent allocations until exhausted, when the
reap continues to grow. How exactly it works with respect
to objects of variable size is unclear, and the only example
does not include the case of resurrections (allocations from
the heap part).
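
The behaviour as read above can be sketched for the simple case of fixed-size objects; since the paper leaves the variable-size case unclear, this sketch does not attempt it. All names (`slot`, `reap`, `reap_alloc` and so on) are hypothetical, and this illustrates the description in this post rather than the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size "reap": a region that grows by bumping,
   plus a free list (the "heap" part) of objects that have died. */
union slot {
    union slot   *next;       /* link while the slot is on the free list */
    unsigned char bytes[32];  /* storage while the slot is live */
};

struct reap {
    union slot *slots;  /* caller-supplied array */
    size_t      cap;    /* number of slots in the array */
    size_t      top;    /* region part: slots [0, top) have been handed out */
    union slot *dead;   /* heap part: freed slots awaiting reuse */
};

static void reap_init(struct reap *r, union slot *slots, size_t cap)
{
    assert(r != NULL && slots != NULL);
    r->slots = slots;
    r->cap = cap;
    r->top = 0;
    r->dead = NULL;
}

/* Dead objects are reused first; only when none are left does the
   region part continue to grow. */
static void *reap_alloc(struct reap *r)
{
    if (r->dead != NULL) {
        union slot *s = r->dead;
        r->dead = s->next;
        return s;
    }
    if (r->top == r->cap)
        return NULL;
    return &r->slots[r->top++];
}

/* An object that dies goes to the associated free list. */
static void reap_free(struct reap *r, void *p)
{
    union slot *s = p;
    s->next = r->dead;
    r->dead = s;
}

/* Region-style release of everything at once. */
static void reap_clear(struct reap *r)
{
    r->top = 0;
    r->dead = NULL;
}
```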

In fig. 3b, which fails to explain the meaning of arrows,
`sbrk' is connected only with RegionHeap. Does it mean that
LeaHeap is composed directly of the removed chunks in the
region area? If so, whence does it take the memory to store
its metadata, which can hardly be of constant or upper-
bounded size? In general, LeaHeap seems to operate on the
"holes" in the region in order to prevent growing the region
unless absolutely unavoidable. I wonder, therefore, if this
allocator does not, after an initial period of region
growth, degrade to LeaHeap performance in case of an
approximately balanced alloc/free sequence, because
ClearOptimizedHeap, by definition, will then work with
LeaHeap most of the time.

My personal and, as usual, naive thought was to manage a
collection of custom relocatable pointers:

struct relpointer
{ void* _; };

or indirect pointers void**, and defragment the region from
time to time. The user will use these pointers and the
memory allocator will make sure they survive the rearrangement.
In order to avoid double indirection at every pointer access, a
lock mechanism can be introduced to ensure that a pointer is
not relocated while the lock holds. Locks should guard
performance-critical sections with intensive pointer access.
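
A rough sketch of this idea, under several simplifying assumptions: a fixed pool, handles that are never recycled, and a handle table that stays in address order because allocation only bumps. The `size` and `locks` fields and the names `pool`, `pool_alloc`, `pool_free`, `rel_lock`, `rel_unlock` and `pool_compact` are hypothetical additions to the `relpointer` struct above. The pool must be zero-initialized before use.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

enum { POOL_BYTES = 256, MAX_HANDLES = 16 };

struct relpointer {
    void    *_;      /* current address, NULL once freed */
    size_t   size;   /* size rounded up to the strictest alignment */
    unsigned locks;  /* while > 0 the block must not be moved */
};

struct pool {  /* must start out zero-initialized */
    alignas(max_align_t) unsigned char mem[POOL_BYTES];
    size_t            top;             /* bump offset into mem */
    struct relpointer h[MAX_HANDLES];  /* handles, in address order */
    size_t            nh;              /* handles issued so far */
};

static struct relpointer *pool_alloc(struct pool *p, size_t n)
{
    const size_t a = alignof(max_align_t);
    n = (n + a - 1) & ~(a - 1);
    if (p->nh == MAX_HANDLES || n > POOL_BYTES - p->top)
        return NULL;
    struct relpointer *r = &p->h[p->nh++];
    r->_ = p->mem + p->top;
    r->size = n;
    r->locks = 0;
    p->top += n;
    return r;
}

static void pool_free(struct relpointer *r)
{
    assert(r->locks == 0);
    r->_ = NULL;
}

/* The raw pointer returned here stays valid until the matching unlock. */
static void *rel_lock(struct relpointer *r)
{
    r->locks++;
    return r->_;
}

static void rel_unlock(struct relpointer *r)
{
    assert(r->locks > 0);
    r->locks--;
}

/* Slide every unlocked live block down over the holes below it and
   update its handle; locked blocks stay where they are. */
static void pool_compact(struct pool *p)
{
    unsigned char *dst = p->mem;
    for (size_t i = 0; i < p->nh; i++) {
        struct relpointer *r = &p->h[i];
        if (r->_ == NULL)
            continue;
        if (r->locks == 0 && (unsigned char *)r->_ != dst) {
            memmove(dst, r->_, r->size);
            r->_ = dst;
        }
        dst = (unsigned char *)r->_ + r->size;
    }
    p->top = (size_t)(dst - p->mem);
}
```

Outside a locked section the user goes through `r->_` on every access; inside one, the pointer returned by `rel_lock` can be kept in a local variable.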

In order to avoid serious slow-downs due to this occasional
tidying-up, one could do it no more than one "hole" at a
time using a fancy algorithm of step-wise defragmentation,
where each step is a fast operation that decreases
fragmentation in one of several ways:

1. by moving a hole up the stack, so that collapsing it
requires the relocation of a smaller memory block from
top of the region-stack,

2. by moving a hole up the stack to a place adjacent with
another hole,

3. by collapsing a hole (located sufficiently close to
top of stack) through shifting all the memory in front
of it to the left by the size of the hole,

4. by plugging a hole with a block or blocks of the same
total size somewhere up the stack,

5. by allocating the space of a hole for a new object of
the same size.

Existing algorithms for efficient real-time disk
defragmentation may serve as an inspiration.
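
As an illustration of a single step, the sketch below implements only way 3 from the list: each call collapses the one hole nearest the top of the stack, so the cost of a step is bounded by the amount of memory above that hole. The block table, the `id` lookup and all names (`rstack`, `rs_push`, `rs_kill`, `rs_addr`, `defrag_step`) are hypothetical, and alignment is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { MAX_BLOCKS = 8 };

struct block { int id; size_t off, size; bool live; };

struct rstack {
    unsigned char *mem;            /* caller-supplied storage */
    size_t         cap;            /* bytes available in mem */
    struct block   b[MAX_BLOCKS];  /* contiguous blocks, in address order */
    size_t         n;              /* number of blocks, holes included */
};

static struct block *rs_find(struct rstack *s, int id)
{
    for (size_t i = 0; i < s->n; i++)
        if (s->b[i].id == id)
            return &s->b[i];
    return NULL;
}

/* Allocate a new block on top of the stack. */
static void *rs_push(struct rstack *s, int id, size_t size)
{
    size_t off = s->n ? s->b[s->n - 1].off + s->b[s->n - 1].size : 0;
    assert(s->n < MAX_BLOCKS && size <= s->cap - off);
    s->b[s->n++] = (struct block){ id, off, size, true };
    return s->mem + off;
}

/* Freeing only marks the block as a hole. */
static void rs_kill(struct rstack *s, int id)
{
    struct block *b = rs_find(s, id);
    assert(b != NULL);
    b->live = false;
}

static void *rs_addr(struct rstack *s, int id)
{
    struct block *b = rs_find(s, id);
    return (b != NULL && b->live) ? s->mem + b->off : NULL;
}

/* One step: collapse the hole nearest the top by shifting everything
   above it down. Returns false when no holes remain. */
static bool defrag_step(struct rstack *s)
{
    size_t i = s->n;
    while (i > 0 && s->b[i - 1].live)
        i--;
    if (i == 0)
        return false;
    size_t hole = i - 1;
    size_t gap  = s->b[hole].size;
    size_t from = s->b[hole].off + gap;
    size_t end  = s->b[s->n - 1].off + s->b[s->n - 1].size;
    memmove(s->mem + s->b[hole].off, s->mem + from, end - from);
    for (size_t j = hole + 1; j < s->n; j++) {
        s->b[j].off -= gap;       /* the block moved down by the gap */
        s->b[j - 1] = s->b[j];    /* close the slot in the table too */
    }
    s->n--;
    return true;
}
```

A caller would run `defrag_step` once per allocation or per idle moment rather than looping until it returns false, which spreads the tidying-up over time.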

Melzzzzz

Oct 3, 2019, 12:09:34 AM
On 2019-10-02, Bonita Montero <Bonita....@gmail.com> wrote:
So, alloca works against optimizations...

Bonita Montero

Oct 3, 2019, 1:48:37 AM
>> We're talking about dynamic stack-frames with alloca() or VLAs.
>> In this case EBP is usually used to store the frame-pointer even
>> with optimized code.

> So, alloca works against optimizations...

alloca() is an optimization.


Melzzzzz

Oct 3, 2019, 1:49:25 AM
What kind of optimization is it if it slows the code down?

Bonita Montero

Oct 3, 2019, 1:53:05 AM
>> alloca() is an optimization.

> What kind of optimization is it if it slows the code down?

Boy, you're so stupid.

Melzzzzz

Oct 3, 2019, 2:46:34 AM
On 2019-10-03, Bonita Montero <Bonita....@gmail.com> wrote:
Look in the mirror...