On 16/01/17 16:41, BartC wrote:
> On 16/01/2017 13:29, David Brown wrote:
>> On 16/01/17 12:28, BartC wrote:
>
>>> But the adjustment isn't done at entry to a function with all the other
>>> fixed-size objects (or not at all if a conventional stack frame is not
>>> used).
>>
>> Why do you think adjustment is done at entry to a function for
>> fixed-size objects? Adjustment /may/ be done on entry, but it may be
>> done later.
>
> If there is a lot of code that has to be executed before the VLA size is
> determined, then it is reasonable to assume that any allocations for the
> local variables involved are done separately. Whether the earlier
> allocations are done at the actual function entry point or not, or are
> not done at all, is just nit-picking on your part.
/All/ considerations about when the allocation is done are nit-picking
with regard to the time spent (late allocation can be a good thing in
saving stack space overall). The actual allocation on the stack is
pretty much "free", whether it is a VLA or a constant size allocation.
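As a small illustration (the function names here are made up), both functions below reserve stack space for a local array. The only difference is where the size comes from: a compile-time constant for the fixed array, a run-time value for the VLA. That is also why sizeof on a VLA is evaluated at run time. This sketch says nothing about timing, which depends on the compiler and target.

```c
#include <stddef.h>

/* Hypothetical example: fixed-size local array.  The amount of stack
   reserved is a compile-time constant. */
static size_t fixed_array_bytes(void)
{
    double buf[64];
    (void)buf;
    return sizeof buf;            /* constant: 64 * sizeof(double) */
}

/* Hypothetical example: a VLA.  The amount of stack reserved is computed
   at run time from n (n must be greater than 0 for a VLA). */
static size_t vla_bytes(size_t n)
{
    double buf[n];
    (void)buf;
    return sizeof buf;            /* evaluated at run time for a VLA */
}
```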
>
>>> Furthermore, on some implementations such as gcc on x86, it can mean
>>> calling a function (__chkstk_ms()) to check the allocation isn't so
>>> large that it exceeds the stack guard page. This is not needed if it
> knows the local array will not be beyond a certain size (I think 4KB
>>> or less total stack allocation within a function).
>>>
>>
>> You mean "gcc on windows", not "gcc on x86".
>
> Why wouldn't gcc on non-Windows systems use such a function? And why doesn't
> gcc use the same fast method on Windows?
The function is only needed on Windows, where you only have a single
spare 4 KB or 8 KB virtual page beyond the current stack. On Linux, you
have (by default) 8 MB virtual memory for the stack - it's all available
(modulo memory over-commit issues) for use when you want it. In both
Linux and Windows you need to ask for a bigger stack if you want to
exceed the default limits (1 MB on Windows, 8 MB on Linux).
The clue here is in the name - __chkstk_ms - "check stack Microsoft".
>
> (As I understand it, it allows Windows programs to commit a large stack
> size without needing to allocate the memory until needed. There is
> already a mechanism to grow the stack automatically if it increases by
> small increments, but not if it skips one or more VM pages. So how does
> gcc on Linux for example manage it? Or does it have to allocate the
> maximum stack size right from the start?)
On Windows, the system only allocates a single /virtual/ page beyond the
current stack end. That gets a physical page when it is touched, and
another virtual page is allocated (up to a limit which I believe is 1 MB
on Windows - I guess a program can change that if it wants to). If a
program touches a stack page further down than that next page, without
first forcing the physical allocation of the pages in between, you get a
fault.
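The Windows behaviour can be illustrated with a toy model in C. This is a simplification written for this post, not Microsoft's or gcc's code. The names toy_stack, toy_touch and toy_probe_then_touch are made up, the page size is assumed to be 4 KB, and offsets are measured in bytes below the initial top of the stack. Jumping more than one page past the committed region faults, while touching each page in turn (roughly what a stack probe does) lets the stack grow.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE 4096u            /* assumed page size */

/* Toy model of a downward-growing stack with a single guard page. */
struct toy_stack {
    size_t committed;             /* bytes below the top already backed */
    size_t reserve;               /* maximum size the stack may reach */
};

/* Touch the byte at 'offset' below the stack top.
   Returns true if the access succeeds, false if it would fault. */
static bool toy_touch(struct toy_stack *s, size_t offset)
{
    if (offset < s->committed)
        return true;              /* already committed */
    /* The guard page is the one page just past the committed region. */
    if (offset < s->committed + TOY_PAGE &&
        s->committed + TOY_PAGE <= s->reserve) {
        s->committed += TOY_PAGE; /* commit it; the next page becomes the guard */
        return true;
    }
    return false;                 /* skipped past the guard page */
}

/* Touch every page between the committed region and 'offset', in order,
   before touching 'offset' itself. */
static bool toy_probe_then_touch(struct toy_stack *s, size_t offset)
{
    for (size_t p = s->committed; p < offset; p += TOY_PAGE)
        if (!toy_touch(s, p))
            return false;
    return toy_touch(s, offset);
}
```

Starting from one committed page, a direct access three pages down fails, whereas probing first succeeds and leaves four pages committed.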
On Linux, the system allocates 8 MB virtual space for the stack straight
away. Physical pages are mapped whenever a virtual page is touched.
The Linux method is more efficient and saves having an equivalent of
__chkstk_ms calls, but it relies on overcommit to avoid wasting real memory.
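On Linux (or any POSIX system) a program can query the stack limit it is running under with getrlimit(RLIMIT_STACK, ...). A minimal sketch follows; query_stack_limit is a made-up helper name. On a typical Linux system the soft limit is 8 MB (`ulimit -s` reports 8192, in KB), but that is only a default, and shells or administrators can change it.

```c
#include <sys/resource.h>

/* Query the soft stack-size limit for this process via POSIX getrlimit().
   Returns 0 on success.  If the limit is unlimited, *unlimited is set to 1
   and *bytes to 0; otherwise *unlimited is 0 and *bytes holds the limit. */
static int query_stack_limit(unsigned long long *bytes, int *unlimited)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    *unlimited = (rl.rlim_cur == RLIM_INFINITY);
    *bytes = *unlimited ? 0 : (unsigned long long)rl.rlim_cur;
    return 0;
}
```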
>
>> Yes, it is possible that __chkstk_ms() might be called more times than
>> necessary with a VLA than it would with fixed-size allocations, because
>> the compiler may not know an upper bound on the size for the VLA. It is
>> unlikely to be a big issue, because __chkstk_ms() is a fast function on
>> average.
>
> Nevertheless, it is an extra function call.
Yes, unless it can be inlined (it doesn't do much - it just touches memory
every 4K or 8K until the target size has been reached). And the call may
not be needed at all if the compiler knows a limit on the allocation size.
>
> I created one benchmark which ran twice as fast with a fixed-size array
> as with a VLA of the same size (I knew it was the same; the compiler
> didn't). On Linux however, which apparently doesn't use chkstk, it was
> only 10% faster.
Without any more information, it is hard to guess the details here.
>
>> And of course you can usually tell the compiler about the
>> limits for the VLA sizes (which you should know, to use them safely).
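One portable way to bound a VLA is simply to check the size before declaring it. The sketch below is only an illustration: MAX_SAMPLES and median_of are made-up names and the bound of 256 is arbitrary. The check caps the stack use at MAX_SAMPLES * sizeof(double) bytes; whether a particular compiler also uses that range information to simplify or drop a stack probe is not something to rely on. With gcc or clang you can instead state the bound with `if (n > MAX_SAMPLES) __builtin_unreachable();`, but then violating the bound is undefined behaviour.

```c
#include <stddef.h>

#define MAX_SAMPLES 256           /* hypothetical, programmer-chosen bound */

/* Median of n values, using a VLA as scratch space so src is not modified.
   Returns 0 on success, -1 if n is outside 1..MAX_SAMPLES. */
static int median_of(const double *src, size_t n, double *out)
{
    if (n == 0 || n > MAX_SAMPLES)
        return -1;                /* refuse sizes we have not budgeted for */

    double scratch[n];            /* VLA: at most MAX_SAMPLES doubles */
    for (size_t i = 0; i < n; i++) {
        /* insertion sort while copying */
        double v = src[i];
        size_t j = i;
        while (j > 0 && scratch[j - 1] > v) {
            scratch[j] = scratch[j - 1];
            j--;
        }
        scratch[j] = v;
    }
    *out = (n % 2) ? scratch[n / 2]
                   : (scratch[n / 2 - 1] + scratch[n / 2]) / 2.0;
    return 0;
}
```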
>
> You can't use them safely if you don't know what recursion depth is
> likely.
You can't use /any/ recursion safely unless you know the recursion
depth. You can't use /any/ function calls, or any local variables,
safely unless you know you are not going to exceed the stack limits.
VLAs are no different from fixed size allocations here, except that you
might need to think a little more to be sure of your limits for VLAs.
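The reasoning needed is mostly simple arithmetic. Here is a rough C sketch (fits_stack_budget is a made-up name, and the numbers below are hypothetical): take the worst-case stack use per call, counting the largest VLA the function may create, and check that the maximum recursion depth times that fits within the stack limit, with a margin left over for everything else. gcc's -fstack-usage option writes per-function stack usage to a .su file, which helps with the per-call figure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Rough budget check: do 'depth' nested calls, each using up to
   'frame_bytes' of stack (fixed locals plus the largest VLA), fit within
   'limit_bytes' while leaving 'reserve_bytes' for everything else? */
static bool fits_stack_budget(size_t depth, size_t frame_bytes,
                              size_t limit_bytes, size_t reserve_bytes)
{
    if (reserve_bytes >= limit_bytes)
        return false;
    size_t usable = limit_bytes - reserve_bytes;
    if (frame_bytes == 0)
        return true;
    /* Divide instead of multiplying, so depth * frame_bytes cannot overflow. */
    return depth <= usable / frame_bytes;
}
```

For example, with a 1 MB limit, a 64 KB reserve and 4 KB per call (say 2 KB of fixed locals plus a VLA of at most 2 KB), the budget allows 240 levels: fits_stack_budget(240, 4096, 1048576, 65536) is true, and a depth of 241 is not.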
> BTW, does the compiler actually check the VLA size against
> the limits at runtime? If so, this is no longer one or two instructions.
>
If a compiler implements stack size checking, then it probably does. If
the compiler does not implement stack size checking, then it will not
check the size of VLAs.
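gcc has a number of flags that can be used for tracing or checking stack
usage at compile time or run time (such as -fstack-usage, -Wstack-usage=,
-Wvla-larger-than= and -fstack-clash-protection), if that is what you mean
by "the" compiler.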