Hi, I want to add a stack probe function attribute which would insert stack probes on all platforms, not just Windows. This will be useful for Rust since it must guarantee that the stack can't overflow, which it currently abuses the segmented stack support for. I'm not sure which kind of attribute is appropriate here. It must be added to the caller when inlined and clients of LLVM should be able to tell if code generation supports it. I would like some tips on how to implement this.
_______________________________________________ LLVM Developers mailing list LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Thanks for the explanation. I'm used to hearing the term "stack banging" used for this mechanism, but I understand your objective.
I believe having a general mechanism here would be valuable, but only if the implementation doesn't make assumptions about runtime environment. For example, let's say my runtime uses a three page guard region and considers anything in that region to be a stack expansion. This should work with your attribute.
There's also different ways of implementing this. Depending on your runtime, you might want to a) call a function, b) emit some special loads. There's also numerous optimizations which apply for the later implementation choice.
Worth noting is that stack banging only when the stack size is larger than a page size is NOT sufficient unless you can *prove* that every smaller frame actually reads or writes to the frame before calling a subroutine.
As an example, consider the following recursive function:
int test(int i) { char buff[50]; if( i == 0 ) return 0; else return test(i-1); }
The arguments will be passed in registers. The buffer won't be initialized (assuming it's not compiled away), and you'll push a stack frame without touching the stack memory.
It would be nice if your attribute could also represent an explicit conditional check implementation as well.
Also, are you expecting the runtime to be able to throw an exception at the site of the check? If so, there's a bunch of other issues which need handled.
Straw man ideas:
- Start with an string attribute, work out the semantics and implementation, then propose a "real attribute" once we've settled on a workable implementation.
- Pick a more generic name. Possibly "StackOverflowGuard"?
- Use two parameters. First, "minimum guard region size" (non-negative integer number of bytes). Second, "test mechanism" (enum (FuncCall, Load, Store, ConditionalCheck)). For the conditional check version, you'd need a way to specify a failure handler. For the func call version, you need a way to set the routine.
Now, I realize this is well beyond what you originally wanted to implement. If you wanted to make the minimum change you could to support forcing the enable of the existing stack probe mechanism, we can discuss specifically that. I'd lean away from a general attribute for that purpose, but am open to being convinced otherwise. :)
On Fri, Aug 1, 2014 at 11:12 PM, Philip Reames <list...@philipreames.com> wrote:
Thanks for the explanation. I'm used to hearing the term "stack banging" used for this mechanism, but I understand your objective.
I believe having a general mechanism here would be valuable, but only if the implementation doesn't make assumptions about runtime environment. For example, let's say my runtime uses a three page guard region and considers anything in that region to be a stack expansion. This should work with your attribute.
There's also different ways of implementing this. Depending on your runtime, you might want to a) call a function, b) emit some special loads. There's also numerous optimizations which apply for the later implementation choice.
Worth noting is that stack banging only when the stack size is larger than a page size is NOT sufficient unless you can *prove* that every smaller frame actually reads or writes to the frame before calling a subroutine.
As an example, consider the following recursive function:
int test(int i) { char buff[50]; if( i == 0 ) return 0; else return test(i-1); }
The arguments will be passed in registers. The buffer won't be initialized (assuming it's not compiled away), and you'll push a stack frame without touching the stack memory.
I don't think that will happen unless there's a separate call stack. Either the call instruction itself will write to the stack or it will be tail recursive call and the stack will be reused.
It would be nice if your attribute could also represent an explicit conditional check implementation as well.
The "split-stack" attribute can be used for that, and that's what Rust currently does.
Also, are you expecting the runtime to be able to throw an exception at the site of the check? If so, there's a bunch of other issues which need handled.
Straw man ideas:
- Start with an string attribute, work out the semantics and implementation, then propose a "real attribute" once we've settled on a workable implementation.
I seemed to have read that string attributes were for platform specific things, would that be incorrect?
- Pick a more generic name. Possibly "StackOverflowGuard"?
- Use two parameters. First, "minimum guard region size" (non-negative integer number of bytes). Second, "test mechanism" (enum (FuncCall, Load, Store, ConditionalCheck)). For the conditional check version, you'd need a way to specify a failure handler. For the func call version, you need a way to set the routine.
Now, I realize this is well beyond what you originally wanted to implement. If you wanted to make the minimum change you could to support forcing the enable of the existing stack probe mechanism, we can discuss specifically that. I'd lean away from a general attribute for that purpose, but am open to being convinced otherwise. :)I think this should be limited to only lightweight stack probing, which either emits probe instructions or calls a function (provided by libgcc/compiler-rt) which does exactly the same.
One extension I'm open for is to always force a probe which ensures the stack overflow happens in a well defined point in the prologue. That could be used for languages which can recover from stack overflows.
Would the __probestack functions be a suitable addition to compiler-rt? Does it already have __chkstk or is that provided by something else on Windows? I noticed that libgcc implemented them in cygwin.S.
I noticed a string function attribute appeared in LangRef.rst, would
that be the correct place to document this?
For reference:
http://reviews.llvm.org/D9653
http://reviews.llvm.org/D9654
http://reviews.llvm.org/D9858
On Wed, Aug 6, 2014 at 5:34 PM, John Kåre Alsaker
<john.mai...@gmail.com> wrote:
> I updated http://reviews.llvm.org/D4717 and also wrote an __probestack
> implementation:
> https://github.com/Zoxc/compiler-rt/compare/llvm-mirror:master...stprobe
>
> Which instruction would be the preferable one to probe with? I used OR since
> that's what GCC/libgcc does, but I don't see why that would be better than a
> write.
>
>
>
> On Tue, Aug 5, 2014 at 7:34 PM, Reid Kleckner <r...@google.com> wrote:
>>
>> On Tue, Aug 5, 2014 at 8:53 AM, John Kåre Alsaker
>> <john.mai...@gmail.com> wrote:
>>>
>>> Would the __probestack functions be a suitable addition to compiler-rt?
>>> Does it already have __chkstk or is that provided by something else on
>>> Windows? I noticed that libgcc implemented them in cygwin.S.
>>
>>
>> It seems reasonable to put them in compiler-rt/lib/builtins.
>>
>> I don't think anyone is currently using compiler-rt's builtins on Windows,
>> so it doesn't have __chkstk. Typically that is provided by the Cygwin,
>> MinGW, or MSVC C runtime.
>
>
_______________________________________________
>
> I think we should add this. I also know that LLILAC needs something like
> this as well. I propose the following:
> - Add a string attribute called "stack-probe-symbol"="foo".
> - The presence of this attribute indicates that stack probes should be
> emitted, even on non-Windows OSs.
> - (future work) For LLILAC, if this attribute is present but the string is
> empty, this can be a signal that the check must be emitted inline, either as
> a sequence of stores or a loop.
>
> This also addresses David's concern with the hardcoded __probestack symbol
> name.
First of all, LLVM should be free to choose how it does stack probes,
it could call ___chkstk_ms, ___chkstk_ms, __chkstk, _alloca, _chkstk,
__probestack or any other stack probe function it knows about, it
could unroll and inline it for smaller allocation amounts, it could
inline the function entirely or it could do nothing, for platforms
which does stack overflow checks in hardware.
I don't see why hardcoding __probestack is different from every other
hardcoded thing in LLVM. Furthermore since calls to it can be elided
it is not useful for clients to specify their own function, so they
would just point it to whatever the platform stack probing function
would be (replicating the ugly logic in
X86FrameLowering::emitStackProbeCall). If LLVM in the future always
inlined the call, the stack probe function would never be called and
the attribute argument is useless.
-
_______________________________________________
> Since not everyone uses compiler-rt, I worry about a situation where people
> fight over the definition of __probestack
Wouldn't this be resolved by defining what __probestack does?
> , or where users want to override
> __probestack to call into their runtime, rather than dealing with signals.
As I said before, calls to __probestack are not guaranteed to be
emitted, so clients can't rely on it doing anything other than probing
the stack. Also clients must always deal with guard page faults. Those
will usually happen outside of __probestack, since functions with
large stack frames are rare.
I suggest the following:
Add a "probe-stack" attribute (which can become a proper attribute
named probestack later), leaving the implementation of stack probing
up to LLVM.
Implement stack probing for x86 on non-Windows platforms with a call
to __probestack, with __probestack being added to compiler-rt.
Implement stack probing for ARM using the same code as Windows only
with __chkstk renamed to __probestack.
Other backends should use __probestack as the name for the stack
probing function if needed.
If there turns out to be an issue with the stack probe function name,
add a "stack-probe-function" attribute later (which goes along with
"stack-probe-size") allowing users to use their own name for the stack
probe function.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org http://llvm.cs.uiuc.edu
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Do we know why the stack pointer cannot be updated in a loop (which
results in ideal code)? I noticed that was commented in Microsoft's
code.
I suspect this is due to debug or unwinding information, since it is
allowed on Windows x86-32.
I ran into two issues while implementing this.
The epilog was inserted into the wrong basic block. This is because
the basic blocks to insert epilogs into is calculated before
emitPrologue is called.
I fixed this by adding the following code after the emitPrologue call
in PEI::insertPrologEpilogCode:
RestoreBlocks.clear();
calculateSets(Fn);
This doesn't seem like a very nice solution. It's also unclear to me
how Microsoft's branch handles this.
The other issue is with code generation. All the tests passed but the
one were segmented stacks are used and stack probes are required.
I don't see what is actually wrong here, and would like some help.
Here is the output:
# After Prologue/Epilogue Insertion & Frame Finalization
# Machine code for function test_large: Post SSA
Frame Objects:
fi#0: size=40000, align=4, at location [SP-40000]
BB#4:
%R11<def> = LEA64r %RSP, 1, %noreg, -40040, %noreg
CMP64rm %R11, %noreg, 1, %noreg, 40, %GS, %EFLAGS<imp-def>
JA_1 <BB#0>, %EFLAGS<imp-use>
Successors according to CFG: BB#3 BB#0
BB#3:
Predecessors according to CFG: BB#4
%R10<def> = MOV64ri 40040
%R11<def> = MOV64ri 32
CALL64pcrel32 <es:__morestack>, %RSP<imp-use>
MORESTACK_RET
Successors according to CFG: BB#0
BB#0: derived from LLVM BB %0
Predecessors according to CFG: BB#3 BB#4
%EAX<def> = MOV32ri 40040; flags: FrameSetup
%RDX<def> = MOV64rr %RAX; flags: FrameSetup
%RCX<def> = MOV64rr %RSP; flags: FrameSetup
Successors according to CFG: BB#1
BB#1: derived from LLVM BB %0
Predecessors according to CFG: BB#0 BB#1
OR64mi8 %RCX, 1, %noreg, 0, %noreg, 0, %EFLAGS<imp-def>;
flags: FrameSetup
%RCX<def,tied1> = SUB64ri32 %RCX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
%RDX<def,tied1> = SUB64ri32 %RDX<tied0>, 4096,
%EFLAGS<imp-def>; flags: FrameSetup
JAE_1 <BB#1>, %EFLAGS<imp-use>; flags: FrameSetup
Successors according to CFG: BB#2 BB#1
BB#2: derived from LLVM BB %0
Predecessors according to CFG: BB#1
%RSP<def,tied1> = SUB64rr %RSP<tied0>, %RAX, %EFLAGS<imp-def>;
flags: FrameSetup
SEH_StackAlloc 40040; flags: FrameSetup
SEH_EndPrologue; flags: FrameSetup
%RCX<def> = LEA64r %RSP, 1, %noreg, 40, %noreg
%EDX<def> = MOV32r0 %EFLAGS<imp-def,dead>
CALL64pcrel32 <ga:@dummy_use>, <regmask>, %RSP<imp-use>,
%RCX<imp-use>, %EDX<imp-use,kill>, %RSP<imp-def>
SEH_Epilogue
%RSP<def,tied1> = ADD64ri32 %RSP<tied0>, 40040, %EFLAGS<imp-def,dead>
RETQ
# End machine code for function test_large.
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: OR64mi8- operand 0: %RCX
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RCX<def,tied1> = SUB64ri32- operand 1: %RCX<tied0>
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#1 (0x40b1d70)
- instruction: %RDX<def,tied1> = SUB64ri32- operand 1: %RDX<tied0>
*** Bad machine code: Using an undefined physical register ***
- function: test_large
- basic block: BB#2 (0x40b1e20)
- instruction: %RSP<def,tied1> = SUB64rr- operand 2: %RAX
LLVM ERROR: Found 4 machine code errors.
On Mon, Aug 17, 2015 at 7:46 PM, Andy Ayers <an...@microsoft.com> wrote:
> The stack pointer can't ever safely point beyond the valid range. I don't 100% know the exact reason but suspect the OS may verify this on context save/restore, which can happen unpredictably. This can be tricky to guarantee if you're modifying the stack pointer in flight without some extra math here and there, at which point you might as well use another register and defer the update like we do.
>
> I'm also not sure what you mean by updating the stack pointer being "allowed" for x86-32 -- if you look at the implementation of __chkstk for x86-32, which we provide in sources with VS, it does not modify ESP in the loop -- it uses EAX to probe the stack.
I was thinking of the fact that __chkstk modified SP on x86-32, but
not on x86-64.
> We have the same issue with epilogs that you saw. We've worked around it for now by generating epilogs first, but this seems a bit hacky. I'd like to see us fix this by delaying the inline expansion of a stack probe until after prolog/epilog generation. That also removes the need for us to update the prolog block in ::emitProlog, which is a bit clunky. But we haven't gotten around to this yet.
Hm.. I did try just generating epilogs first, but I ran into test failures.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org