[LLVMdev] Adding a stack probe function attribute

John Kåre Alsaker

unread,

Jul 29, 2014, 2:31:53 AM7/29/14

to llv...@cs.uiuc.edu

Hi, I want to add a stack probe function attribute which would insert stack probes on all platforms, not just Windows. This will be useful for Rust since it must guarantee that the stack can't overflow, which it currently abuses the segmented stack support for. I'm not sure which kind of attribute is appropriate here. It must be added to the caller when inlined and clients of LLVM should be able to tell if code generation supports it. I would like some tips on how to implement this.

John Kåre Alsaker

unread,

Jul 31, 2014, 12:26:27 AM7/31/14

to llv...@cs.uiuc.edu, jab...@arxan.com

Here's an implementation of this: http://reviews.llvm.org/D4717

There's no way to tell if stack probes actually are being generated though...

Philip Reames

unread,

Jul 31, 2014, 12:44:39 PM7/31/14

to John Kåre Alsaker, llv...@cs.uiuc.edu

Giving a bit of background and motivation would be good here. What are you trying to accomplish and why?

Philip

On 07/28/2014 04:16 PM, John Kåre Alsaker wrote:

Hi, I want to add a stack probe function attribute which would insert stack probes on all platforms, not just Windows. This will be useful for Rust since it must guarantee that the stack can't overflow, which it currently abuses the segmented stack support for. I'm not sure which kind of attribute is appropriate here. It must be added to the caller when inlined and clients of LLVM should be able to tell if code generation supports it. I would like some tips on how to implement this.

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

John Kåre Alsaker

unread,

Jul 31, 2014, 10:30:59 PM7/31/14

to Philip Reames, llv...@cs.uiuc.edu

The point of this is to cheaply detect all stack overflows using a guard page. For a guard page to actually detect all stack overflows, we need to ensure that the code touches each page of the stack in the right order, otherwise it could skip the guard page and write outside the stack. That is very bad for languages such as Rust which provides memory safety, so it currently does an explicit comparison against the end of the stack for each function, which is again bad for performance. This would correspond to GCC's -fstack-check (if that worked).

Philip Reames

unread,

Aug 1, 2014, 5:15:54 PM8/1/14

to John Kåre Alsaker, llv...@cs.uiuc.edu

Thanks for the explanation. I'm used to hearing the term "stack banging" used for this mechanism, but I understand your objective.

I believe having a general mechanism here would be valuable, but only if the implementation doesn't make assumptions about runtime environment. For example, let's say my runtime uses a three page guard region and considers anything in that region to be a stack expansion. This should work with your attribute.

There's also different ways of implementing this. Depending on your runtime, you might want to a) call a function, b) emit some special loads. There's also numerous optimizations which apply for the later implementation choice.

Worth noting is that stack banging only when the stack size is larger than a page size is NOT sufficient unless you can *prove* that every smaller frame actually reads or writes to the frame before calling a subroutine.

As an example, consider the following recursive function:
int test(int i) { char buff[50]; if( i == 0 ) return 0; else return test(i-1); }

The arguments will be passed in registers. The buffer won't be initialized (assuming it's not compiled away), and you'll push a stack frame without touching the stack memory.

It would be nice if your attribute could also represent an explicit conditional check implementation as well.

Also, are you expecting the runtime to be able to throw an exception at the site of the check? If so, there's a bunch of other issues which need handled.

Straw man ideas:
- Start with an string attribute, work out the semantics and implementation, then propose a "real attribute" once we've settled on a workable implementation.
- Pick a more generic name. Possibly "StackOverflowGuard"?
- Use two parameters. First, "minimum guard region size" (non-negative integer number of bytes). Second, "test mechanism" (enum (FuncCall, Load, Store, ConditionalCheck)). For the conditional check version, you'd need a way to specify a failure handler. For the func call version, you need a way to set the routine.

Now, I realize this is well beyond what you originally wanted to implement. If you wanted to make the minimum change you could to support forcing the enable of the existing stack probe mechanism, we can discuss specifically that. I'd lean away from a general attribute for that purpose, but am open to being convinced otherwise. :)

Philip

John Kåre Alsaker

unread,

Aug 1, 2014, 8:41:45 PM8/1/14

to Philip Reames, llv...@cs.uiuc.edu

On Fri, Aug 1, 2014 at 11:12 PM, Philip Reames <list...@philipreames.com> wrote:

Thanks for the explanation. I'm used to hearing the term "stack banging" used for this mechanism, but I understand your objective.

I believe having a general mechanism here would be valuable, but only if the implementation doesn't make assumptions about runtime environment. For example, let's say my runtime uses a three page guard region and considers anything in that region to be a stack expansion. This should work with your attribute.

There's also different ways of implementing this. Depending on your runtime, you might want to a) call a function, b) emit some special loads. There's also numerous optimizations which apply for the later implementation choice.

Worth noting is that stack banging only when the stack size is larger than a page size is NOT sufficient unless you can *prove* that every smaller frame actually reads or writes to the frame before calling a subroutine.

As an example, consider the following recursive function:
int test(int i) { char buff[50]; if( i == 0 ) return 0; else return test(i-1); }

The arguments will be passed in registers. The buffer won't be initialized (assuming it's not compiled away), and you'll push a stack frame without touching the stack memory.

I don't think that will happen unless there's a separate call stack. Either the call instruction itself will write to the stack or it will be tail recursive call and the stack will be reused.

It would be nice if your attribute could also represent an explicit conditional check implementation as well.

The "split-stack" attribute can be used for that, and that's what Rust currently does.

Also, are you expecting the runtime to be able to throw an exception at the site of the check? If so, there's a bunch of other issues which need handled.

Straw man ideas:
- Start with an string attribute, work out the semantics and implementation, then propose a "real attribute" once we've settled on a workable implementation.

I seemed to have read that string attributes were for platform specific things, would that be incorrect?

- Pick a more generic name. Possibly "StackOverflowGuard"?
- Use two parameters. First, "minimum guard region size" (non-negative integer number of bytes). Second, "test mechanism" (enum (FuncCall, Load, Store, ConditionalCheck)). For the conditional check version, you'd need a way to specify a failure handler. For the func call version, you need a way to set the routine.

Now, I realize this is well beyond what you originally wanted to implement. If you wanted to make the minimum change you could to support forcing the enable of the existing stack probe mechanism, we can discuss specifically that. I'd lean away from a general attribute for that purpose, but am open to being convinced otherwise. :)

I think this should be limited to only lightweight stack probing, which either emits probe instructions or calls a function (provided by libgcc/compiler-rt) which does exactly the same. One extension I'm open for is to always force a probe which ensures the stack overflow happens in a well defined point in the prologue. That could be used for languages which can recover from stack overflows.

Philip Reames

unread,

Aug 4, 2014, 3:13:55 PM8/4/14

to John Kåre Alsaker, llv...@cs.uiuc.edu

On 08/01/2014 05:38 PM, John Kåre Alsaker wrote:

On Fri, Aug 1, 2014 at 11:12 PM, Philip Reames <list...@philipreames.com> wrote:

Thanks for the explanation. I'm used to hearing the term "stack banging" used for this mechanism, but I understand your objective.

I believe having a general mechanism here would be valuable, but only if the implementation doesn't make assumptions about runtime environment. For example, let's say my runtime uses a three page guard region and considers anything in that region to be a stack expansion. This should work with your attribute.

There's also different ways of implementing this. Depending on your runtime, you might want to a) call a function, b) emit some special loads. There's also numerous optimizations which apply for the later implementation choice.

Worth noting is that stack banging only when the stack size is larger than a page size is NOT sufficient unless you can *prove* that every smaller frame actually reads or writes to the frame before calling a subroutine.

As an example, consider the following recursive function:
int test(int i) { char buff[50]; if( i == 0 ) return 0; else return test(i-1); }

The arguments will be passed in registers. The buffer won't be initialized (assuming it's not compiled away), and you'll push a stack frame without touching the stack memory.

I don't think that will happen unless there's a separate call stack. Either the call instruction itself will write to the stack or it will be tail recursive call and the stack will be reused.

Er, yeah. Please ignore my momentary stupidity. You're entirely correct.

It would be nice if your attribute could also represent an explicit conditional check implementation as well.

The "split-stack" attribute can be used for that, and that's what Rust currently does.

Interesting. I did not know this. Using it for that purpose honestly feels like a bit of a hack. I'd rather factor the stack bounds checking and the handling separately.

Looking around, it looks like we also have an Erlang specific implementation which does essentially the same thing. Yuck.

Not saying we need to actually do so, just that I'd prefer that design in an ideal world. :)

Also, are you expecting the runtime to be able to throw an exception at the site of the check? If so, there's a bunch of other issues which need handled.

Straw man ideas:
- Start with an string attribute, work out the semantics and implementation, then propose a "real attribute" once we've settled on a workable implementation.

I seemed to have read that string attributes were for platform specific things, would that be incorrect?

Generally, they're either a) platform specific, and b) prototyping something. t.m.k. there is nothing that says a string attribute can't be used long term, but generally, they don't seem to be.

- Pick a more generic name. Possibly "StackOverflowGuard"?
- Use two parameters. First, "minimum guard region size" (non-negative integer number of bytes). Second, "test mechanism" (enum (FuncCall, Load, Store, ConditionalCheck)). For the conditional check version, you'd need a way to specify a failure handler. For the func call version, you need a way to set the routine.

Now, I realize this is well beyond what you originally wanted to implement. If you wanted to make the minimum change you could to support forcing the enable of the existing stack probe mechanism, we can discuss specifically that. I'd lean away from a general attribute for that purpose, but am open to being convinced otherwise. :)

I think this should be limited to only lightweight stack probing, which either emits probe instructions or calls a function (provided by libgcc/compiler-rt) which does exactly the same.

Given what you said about split-stack, I'm okay with this. The general cleanup is a worthwhile, but separate task.

One extension I'm open for is to always force a probe which ensures the stack overflow happens in a well defined point in the prologue. That could be used for languages which can recover from stack overflows.

This would need to be carefully specified.

Going back to your original proposal. I've come around to accepting the idea of adding a stack-probe attribute, but I'll request you improve the documentation some. In particular, mention the split-stack case, and mention that the called function is platform specific. I still think a better factored design is possibly, but you didn't create the mess we have, so I can't really expect you to fix it. :)

I'll add a couple of more detailed code comments on the review thread as well.

John Kåre Alsaker

unread,

Aug 5, 2014, 11:56:41 AM8/5/14

to Philip Reames, llv...@cs.uiuc.edu

Would the __probestack functions be a suitable addition to compiler-rt? Does it already have __chkstk or is that provided by something else on Windows? I noticed that libgcc implemented them in cygwin.S.

Is there a place for documentation of string attributes, or should that just go in the commit messages?

Reid Kleckner

unread,

Aug 5, 2014, 1:48:00 PM8/5/14

to John Kåre Alsaker, llv...@cs.uiuc.edu

On Tue, Aug 5, 2014 at 8:53 AM, John Kåre Alsaker <john.mai...@gmail.com> wrote:

Would the __probestack functions be a suitable addition to compiler-rt? Does it already have __chkstk or is that provided by something else on Windows? I noticed that libgcc implemented them in cygwin.S.

It seems reasonable to put them in compiler-rt/lib/builtins.

I don't think anyone is currently using compiler-rt's builtins on Windows, so it doesn't have __chkstk. Typically that is provided by the Cygwin, MinGW, or MSVC C runtime.

John Kåre Alsaker

unread,

Aug 6, 2014, 11:37:35 AM8/6/14

to Reid Kleckner, llv...@cs.uiuc.edu

I updated http://reviews.llvm.org/D4717 and also wrote an __probestack implementation: https://github.com/Zoxc/compiler-rt/compare/llvm-mirror:master...stprobe

Which instruction would be the preferable one to probe with? I used OR since that's what GCC/libgcc does, but I don't see why that would be better than a write.

John Kåre Alsaker

unread,

Jul 26, 2015, 5:24:49 AM7/26/15

to Reid Kleckner, llv...@cs.uiuc.edu

Since David Majnemer doesn't seem overly eager to merge my patches,
let's see if we can't figure things out here.

I noticed a string function attribute appeared in LangRef.rst, would
that be the correct place to document this?

For reference:
http://reviews.llvm.org/D9653
http://reviews.llvm.org/D9654
http://reviews.llvm.org/D9858

On Wed, Aug 6, 2014 at 5:34 PM, John Kåre Alsaker

<john.mai...@gmail.com> wrote:
> I updated http://reviews.llvm.org/D4717 and also wrote an __probestack
> implementation:
> https://github.com/Zoxc/compiler-rt/compare/llvm-mirror:master...stprobe
>
> Which instruction would be the preferable one to probe with? I used OR since
> that's what GCC/libgcc does, but I don't see why that would be better than a
> write.
>
>
>
> On Tue, Aug 5, 2014 at 7:34 PM, Reid Kleckner <r...@google.com> wrote:
>>
>> On Tue, Aug 5, 2014 at 8:53 AM, John Kåre Alsaker
>> <john.mai...@gmail.com> wrote:
>>>
>>> Would the __probestack functions be a suitable addition to compiler-rt?
>>> Does it already have __chkstk or is that provided by something else on
>>> Windows? I noticed that libgcc implemented them in cygwin.S.
>>
>>
>> It seems reasonable to put them in compiler-rt/lib/builtins.
>>
>> I don't think anyone is currently using compiler-rt's builtins on Windows,
>> so it doesn't have __chkstk. Typically that is provided by the Cygwin,
>> MinGW, or MSVC C runtime.
>
>

_______________________________________________

Reid Kleckner

unread,

Jul 27, 2015, 6:51:19 PM7/27/15

to John Kåre Alsaker, Joseph Tremoulet, llv...@cs.uiuc.edu

Yeah, the function attributes section of LangRef is a reasonable place to put stuff like this:

http://llvm.org/docs/LangRef.html#function-attributes

I think we should add this. I also know that LLILAC needs something like this as well. I propose the following:

- Add a string attribute called "stack-probe-symbol"="foo".

- The presence of this attribute indicates that stack probes should be emitted, even on non-Windows OSs.

- (future work) For LLILAC, if this attribute is present but the string is empty, this can be a signal that the check must be emitted inline, either as a sequence of stores or a loop.

This also addresses David's concern with the hardcoded __probestack symbol name.

LLVM user

unread,

Jul 27, 2015, 11:04:58 PM7/27/15

to Reid Kleckner, llv...@cs.uiuc.edu

For what it's worth, I need stack probes as well. My language usecase has two relevant properties:

1. The language guarantees total memory safety/sandboxing. Malicious programs are unable to negatively affect the rest of the system beyond consuming resources.

2. The user of the language is allowed significant expressive power, letting him manipulate the stack.

Stack probes prevent users of a language from skipping over the guard page; it's the last memory hole to be plugged once the rest of the system is secure. I think in general, any language that has these two properties will require either stack probes or a much more complicated system (like geordi, segmented stacks, or obsessive loading of allocated memory). A great number of languages fall into this situation.

John Kåre Alsaker

unread,

Jul 28, 2015, 5:29:18 AM7/28/15

to Reid Kleckner, llv...@cs.uiuc.edu

On Tue, Jul 28, 2015 at 12:44 AM, Reid Kleckner <r...@google.com> wrote:
> Yeah, the function attributes section of LangRef is a reasonable place to
> put stuff like this:
> http://llvm.org/docs/LangRef.html#function-attributes

I'll see if I can't sneak something in there.

>
> I think we should add this. I also know that LLILAC needs something like
> this as well. I propose the following:
> - Add a string attribute called "stack-probe-symbol"="foo".
> - The presence of this attribute indicates that stack probes should be
> emitted, even on non-Windows OSs.
> - (future work) For LLILAC, if this attribute is present but the string is
> empty, this can be a signal that the check must be emitted inline, either as
> a sequence of stores or a loop.
>
> This also addresses David's concern with the hardcoded __probestack symbol
> name.

First of all, LLVM should be free to choose how it does stack probes,
it could call ___chkstk_ms, ___chkstk_ms, __chkstk, _alloca, _chkstk,
__probestack or any other stack probe function it knows about, it
could unroll and inline it for smaller allocation amounts, it could
inline the function entirely or it could do nothing, for platforms
which does stack overflow checks in hardware.

I don't see why hardcoding __probestack is different from every other
hardcoded thing in LLVM. Furthermore since calls to it can be elided
it is not useful for clients to specify their own function, so they
would just point it to whatever the platform stack probing function
would be (replicating the ugly logic in
X86FrameLowering::emitStackProbeCall). If LLVM in the future always
inlined the call, the stack probe function would never be called and
the attribute argument is useless.

-

_______________________________________________

Reid Kleckner

unread,

Jul 28, 2015, 12:46:31 PM7/28/15

to John Kåre Alsaker, llv...@cs.uiuc.edu

The difference between __probestack and __chkstk etc is that we are happy to call into existing interfaces that are somehow guaranteed by the environment. Sometimes we do invent our own in compiler-rt for obscure cases like i128 division, but it's rare. After years of adapting to fit pre-existing interfaces, we are naturally very cautious to define our own. Since not everyone uses compiler-rt, I worry about a situation where people fight over the definition of __probestack, or where users want to override __probestack to call into their runtime, rather than dealing with signals.

John Kåre Alsaker

unread,

Jul 28, 2015, 5:04:51 PM7/28/15

to Reid Kleckner, llv...@cs.uiuc.edu

The code does need to go somewhere though.

> Since not everyone uses compiler-rt, I worry about a situation where people
> fight over the definition of __probestack

Wouldn't this be resolved by defining what __probestack does?

> , or where users want to override
> __probestack to call into their runtime, rather than dealing with signals.

As I said before, calls to __probestack are not guaranteed to be
emitted, so clients can't rely on it doing anything other than probing
the stack. Also clients must always deal with guard page faults. Those
will usually happen outside of __probestack, since functions with
large stack frames are rare.

John Kåre Alsaker

unread,

Aug 5, 2015, 4:56:58 PM8/5/15

to Reid Kleckner, llvm...@lists.llvm.org

Rust is going ahead and removing it's abuse of segmented stack support
for stack overflow checking
(https://github.com/rust-lang/rust/pull/27338). This leaves it without
foolproof stack overflow checking. So it would be nice if this could
land soonish.

I suggest the following:

Add a "probe-stack" attribute (which can become a proper attribute
named probestack later), leaving the implementation of stack probing
up to LLVM.

Implement stack probing for x86 on non-Windows platforms with a call
to __probestack, with __probestack being added to compiler-rt.

Implement stack probing for ARM using the same code as Windows only
with __chkstk renamed to __probestack.

Other backends should use __probestack as the name for the stack
probing function if needed.

If there turns out to be an issue with the stack probe function name,
add a "stack-probe-function" attribute later (which goes along with
"stack-probe-size") allowing users to use their own name for the stack
probe function.

_______________________________________________
LLVM Developers mailing list

llvm...@lists.llvm.org http://llvm.cs.uiuc.edu
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

John Kåre Alsaker

unread,

Aug 16, 2015, 2:25:42 PM8/16/15

to Reid Kleckner, llvm...@lists.llvm.org

I started to implement inlining of the stack probe function based on
Microsoft's inlined stack probes in
https://github.com/Microsoft/llvm/tree/MS.

Do we know why the stack pointer cannot be updated in a loop (which
results in ideal code)? I noticed that was commented in Microsoft's
code.
I suspect this is due to debug or unwinding information, since it is
allowed on Windows x86-32.

I ran into two issues while implementing this.

The epilog was inserted into the wrong basic block. This is because
the basic blocks to insert epilogs into is calculated before
emitPrologue is called.
I fixed this by adding the following code after the emitPrologue call
in PEI::insertPrologEpilogCode:

RestoreBlocks.clear();
calculateSets(Fn);

This doesn't seem like a very nice solution. It's also unclear to me
how Microsoft's branch handles this.

The other issue is with code generation. All the tests passed but the
one were segmented stacks are used and stack probes are required.
I don't see what is actually wrong here, and would like some help.
Here is the output:

# After Prologue/Epilogue Insertion & Frame Finalization
# Machine code for function test_large: Post SSA
Frame Objects:
fi#0: size=40000, align=4, at location [SP-40000]

BB#4:
%R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg
CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def>
JA_1 <BB#0>, %EFLAGS<imp-use>
Successors according to CFG: BB#3 BB#0

BB#3:
Predecessors according to CFG: BB#4
%R10<def> = MOV64ri 40040
%R11<def> = MOV64ri 32
CALL64pcrel32 <es:__morestack>, %RSP<imp-use>
MORESTACK_RET
Successors according to CFG: BB#0

BB#0: derived from LLVM BB %0
Predecessors according to CFG: BB#3 BB#4
%EAX<def> = MOV32ri 40040; flags: FrameSetup
%RDX<def> = MOV64rr %RAX; flags: FrameSetup
%RCX<def> = MOV64rr %RSP; flags: FrameSetup
Successors according to CFG: BB#1

BB#1: derived from LLVM BB %0
Predecessors according to CFG: BB#0 BB#1
OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>;
flags: FrameSetup
%RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
%RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup
Successors according to CFG: BB#2 BB#1

BB#2: derived from LLVM BB %0
Predecessors according to CFG: BB#1
%RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX, %EFLAGS<imp-def>;
flags: FrameSetup
SEH_StackAlloc 40040; flags: FrameSetup
SEH_EndPrologue; flags: FrameSetup
%RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg
%EDX<def> = MOV32r0 %EFLAGS<imp-def,dead>
CALL64pcrel32 <ga:@dummy_use>, <regmask>, %RSP<imp-use>,
%RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def>
SEH_Epilogue
%RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040, %EFLAGS<imp-def,dead>
RETQ

# End machine code for function test_large.

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: OR64mi8- operand 0: %RCX

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RCX<def,tied1> = SUB64ri32- operand 1: %RCX<tied0>

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RDX<def,tied1> = SUB64ri32- operand 1: %RDX<tied0>

*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#2 (0x40b1e20)
- instruction: %RSP<def,tied1> = SUB64rr- operand 2: %RAX
LLVM ERROR: Found 4 machine code errors.

Andy Ayers via llvm-dev

unread,

Aug 17, 2015, 1:46:39 PM8/17/15

to John Kåre Alsaker, Reid Kleckner, llvm...@lists.llvm.org

The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do.

I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack.

[from] c:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\intel\chkstk.asm

mov eax, esp ; current TOS
and eax, not ( _PAGESIZE_ - 1) ; Round down to current page boundary

cs10:
cmp ecx, eax ; Is new TOS
jb short cs20 ; in probed page?
mov eax, ecx ; yes.
pop ecx
xchg esp, eax ; update esp
mov eax, dword ptr [eax] ; get return address
mov dword ptr [esp], eax ; and put it at new TOS
ret

; Find next lower page and probe
cs20:
sub eax, _PAGESIZE_ ; decrease by PAGESIZE
test dword ptr [eax],eax ; probe page.
jmp short cs10

Another common pattern which we haven't optimized for yet is to probe and zero the newly grown stack region.

We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet.

-----Original Message-----
From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of John Kåre Alsaker via llvm-dev
Sent: Sunday, August 16, 2015 11:26 AM
To: Reid Kleckner <r...@google.com>
Cc: llvm...@lists.llvm.org
Subject: Re: [llvm-dev] [LLVMdev] Adding a stack probe function attribute

I started to implement inlining of the stack probe function based on Microsoft's inlined stack probes in https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fMicrosoft%2fllvm%2ftree%2fMS.&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=pm63Kdg2E%2bprrwKZKpcGpQNEYjwBaY7%2beZrX8eYfyYo%3d

llvm...@lists.llvm.org https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2fllvm.cs.uiuc.edu&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=iQjl5g5NZHMOzJ4%2fhFEN6ECfAnFBR5Evud7KBAWO%2b7Y%3d
https://na01.safelinks.protection.outlook.com/?url=http%3a%2f%2flists.llvm.org%2fcgi-bin%2fmailman%2flistinfo%2fllvm-dev%0a&data=01%7c01%7candya%40microsoft.com%7c7f3665f51a1a45a612c008d2a668163e%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=BiZb%2bhINDhjdc0YshtClHXKkOS10wY7Gjw6LWjzozP0%3d

John Kåre Alsaker

unread,

Aug 30, 2015, 4:16:48 PM8/30/15

to Andy Ayers, llvm...@lists.llvm.org

I've submited my patch to inline stack probes here:
http://reviews.llvm.org/D12483

On Mon, Aug 17, 2015 at 7:46 PM, Andy Ayers <an...@microsoft.com> wrote:
> The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do.
>
> I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack.

I was thinking of the fact that __chkstk modified SP on x86-32, but
not on x86-64.

> We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet.

Hm.. I did try just generating epilogs first, but I ran into test failures.

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org

http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Reply all

Reply to author

Forward