[LLVMdev] Controlling the stack layout


Nicolas Geoffray

Dec 27, 2008, 5:28:30 PM
to LLVM Developers Mailing List
Hi everyone,

As a front-end developer, I'd like to add language-specific
information at a fixed location in each stack frame. The reason is that
I want to retrieve this information when dynamically walking the stack.

For example, X86 has the following stack layout for a function with two
arguments and two locals:

12(%ebp) - second function parameter
8(%ebp) - first function parameter
4(%ebp) - old %EIP (the function's "return address")
0(%ebp) - old %EBP (previous function's base pointer)
-4(%ebp) - first local variable
-8(%ebp) - second local variable


I'd like to generate this layout:

12(%ebp) - second function parameter
8(%ebp) - first function parameter
4(%ebp) - old %EIP (the function's "return address")
0(%ebp) - old %EBP (previous function's base pointer)
-4(%ebp) - My language specific information
-8(%ebp) - first local variable
-12(%ebp) - second local variable
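To illustrate why the fixed offset matters, here is a minimal sketch of a stack walker that relies on this layout. It is a toy model, not real stack-walking code: the stack is a vector of 4-byte words, "addresses" are word indices, index 0 plays the role of a null frame pointer, and the names `walkStack`, `makeDemoStack` and `kMethodIdWordsBelowFp` are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit stack: one vector element per 4-byte word.
using Word = std::uint32_t;

// The language-specific slot sits one word below the frame pointer,
// i.e. at -4(%ebp) in the layout above (assumption of this sketch).
constexpr std::size_t kMethodIdWordsBelowFp = 1;

// Walk the saved-frame-pointer chain and collect the method ID stored in
// every frame, innermost frame first. A frame pointer of 0 ends the walk.
std::vector<Word> walkStack(const std::vector<Word>& stack, std::size_t fp) {
    std::vector<Word> ids;
    while (fp != 0) {
        assert(fp >= kMethodIdWordsBelowFp && fp < stack.size());
        ids.push_back(stack[fp - kMethodIdWordsBelowFp]);  // -4(%ebp)
        fp = stack[fp];                                    //  0(%ebp): old %ebp
    }
    return ids;
}

// Build a fake stack holding three frames. Higher index = higher address.
// The innermost frame pointer is returned through `innermostFp`.
std::vector<Word> makeDemoStack(std::size_t& innermostFp) {
    std::vector<Word> stack(16, 0);
    stack[12] = 0;  stack[11] = 100;  // outermost frame: no caller, method 100
    stack[8]  = 12; stack[7]  = 200;  // middle frame: caller at 12, method 200
    stack[4]  = 8;  stack[3]  = 300;  // innermost frame: caller at 8, method 300
    innermostFp = 4;
    return stack;
}
```

Calling `makeDemoStack` and passing the result to `walkStack` yields the IDs 300, 200, 100. The walker never needs per-function metadata, which is the appeal of a fixed slot.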


Can I express this in LLVM without modifying llvm internals? I looked at
writing a machine function pass, but I can't register one when JITting.
Is the machine function pass the correct way of implementing this?

Thanks,
Nicolas
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nick Johnson

Dec 28, 2008, 11:31:22 AM
to LLVM Developers Mailing List
> I'd like to generate this layout:
>
> 12(%ebp) - second function parameter
> 8(%ebp) - first function parameter
> 4(%ebp) - old %EIP (the function's "return address")
> 0(%ebp) - old %EBP (previous function's base pointer)
> -4(%ebp) - My language specific information
> -8(%ebp) - first local variable
> -12(%ebp) - second local variable
>

Take a look at the register allocators, each of which is implemented
as a MachineFunctionPass. These are responsible for assigning stack
slots, and it shouldn't be too hard to modify them to reserve some
stack slots.

Also, I'd recommend that you consider the prevalence of this language
specific information, and re-evaluate whether you want to modify the
stack frame at all. Do you expect *every* function will need it, or
*at most* every function will need it? Could this information also be
encoded into a per-function local variable? Could your compiler
generate code to maintain a separate stack for such information?

Also, out of curiosity: are you working on something like Java
security contexts? Or perhaps something like ProPolice canary values?

--
Nick Johnson

Basile STARYNKEVITCH

Dec 28, 2008, 11:59:32 AM
to LLVM Developers Mailing List
Nick Johnson wrote (citing somebody else):

>> I'd like to generate this layout:
>>
>> 12(%ebp) - second function parameter
>> 8(%ebp) - first function parameter
>> 4(%ebp) - old %EIP (the function's "return address")
>> 0(%ebp) - old %EBP (previous function's base pointer)
>> -4(%ebp) - My language specific information
>> -8(%ebp) - first local variable
>> -12(%ebp) - second local variable
>>
>>
>
> Take a look at the register allocators, each of which is implemented
> as a MachineFunctionPass. These are responsible for assigning stack
> slots, and it shouldn't be too hard to modify them to reserve some
> stack slots.
>
> Also, I'd recommend that you consider the prevalence of this language
> specific information, and re-evaluate whether you want to modify the
> stack frame at all. Do you expect *every* function will need it, or
> *at most* every function will need it?

I don't know the motivation of the initial poster, but I do understand
the wish to put some language-specific information on the call stack. I
can even think of several reasons to do that:

1. easy support of a precise garbage collector: you want each call frame
to have, at some fixed offset, a pointer to a frame layout descriptor
(used by your copying garbage collector) that tells which slots in the
call frame hold pointers. Another way to do that is to have your GC
use the return address to find the routine and hence its frame
layout (IIRC, the OCaml native compiler does that), but this is much
harder to implement.

2. introspective/reflective facilities: your language is compiled yet
still has the ability to inspect each of its call stack frames. There are
many reasons to want that.

3. reification of continuations: your language implements continuations,
and perhaps facilities to inspect them, or dump or serialize them, ...
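
The alternative mentioned in point 1, finding the frame layout from the return address, can be sketched roughly as follows. This is only an illustration under simplifying assumptions: `FrameDescriptor` and `findDescriptor` are hypothetical names, the table is assumed to be sorted by code start address, and a real runtime would also need each routine's end address and per-call-site liveness.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical descriptor telling the GC which frame slots hold pointers.
struct FrameDescriptor {
    std::uintptr_t codeStart;         // first code address of the routine
    std::vector<int> pointerOffsets;  // byte offsets from the frame pointer
};

// Find the descriptor of the routine whose code contains `returnAddress`.
// `table` must be sorted by ascending codeStart. Returns nullptr when the
// address lies below every known routine.
const FrameDescriptor* findDescriptor(const std::vector<FrameDescriptor>& table,
                                      std::uintptr_t returnAddress) {
    // First entry whose codeStart is strictly greater than the address.
    auto it = std::upper_bound(
        table.begin(), table.end(), returnAddress,
        [](std::uintptr_t addr, const FrameDescriptor& d) {
            return addr < d.codeStart;
        });
    if (it == table.begin()) {
        return nullptr;
    }
    return &*(it - 1);  // the routine starting at or before the address
}
```

Compared with a descriptor pointer stored at a fixed frame offset, this costs a binary search per frame during the walk but adds no instructions to the function prologue.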

Happy New Year 2009 to everyone!

Regards.

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
member of APRIL, "promoting and defending free software"
Join its more than 3700 members now: http://www.april.org

Nicolas Geoffray

Dec 29, 2008, 1:54:27 AM
to LLVM Developers Mailing List
Hi Nick,

Nick Johnson wrote:
>> I'd like to generate this layout:
>>
>> 12(%ebp) - second function parameter
>> 8(%ebp) - first function parameter
>> 4(%ebp) - old %EIP (the function's "return address")
>> 0(%ebp) - old %EBP (previous function's base pointer)
>> -4(%ebp) - My language specific information
>> -8(%ebp) - first local variable
>> -12(%ebp) - second local variable
>>
>>
>
> Take a look at the register allocators, each of which is implemented
> as a MachineFunctionPass. These are responsible for assigning stack
> slots, and it shouldn't be too hard to modify them to reserve some
> stack slots.
>
>

Thanks, but I don't want to modify the register allocators. I'd like to
write an llvm-external machine pass.

> Also, I'd recommend that you consider the prevalence of this language
> specific information, and re-evaluate whether you want to modify the
> stack frame at all.

Yes, I do :) There are some alternatives, but this looks like the most
efficient. What I'm facing are engineering issues, since adding new
information to the stack frame is similar to adding the frame-pointer
information.

> Do you expect *every* function will need it, or
> *at most* every function will need it?

Every function that I compile with llvm.

> Could this information also be
> encoded into a per-function local variable?

No, because then you wouldn't know where the information is stored when
walking the stack.

> Could your compiler
> generate code to maintain a separate stack for such information?
>
>

Sure, but it's much more expensive than a simple push and pop.
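
For comparison, the separate-stack alternative could look roughly like the sketch below. The RAII guard stands in for code the compiler would emit at function entry and exit; `shadowStack`, `ShadowFrame` and the demo functions are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical side stack holding one method ID per active call.
thread_local std::vector<std::uint32_t> shadowStack;

// Stand-in for compiler-emitted entry/exit code: pushes the method ID on
// construction and pops it on destruction.
class ShadowFrame {
public:
    explicit ShadowFrame(std::uint32_t methodId) {
        shadowStack.push_back(methodId);
    }
    ~ShadowFrame() { shadowStack.pop_back(); }
    ShadowFrame(const ShadowFrame&) = delete;
    ShadowFrame& operator=(const ShadowFrame&) = delete;
};

// "Walk the stack": copy the active method IDs, innermost call first.
std::vector<std::uint32_t> snapshot() {
    return std::vector<std::uint32_t>(shadowStack.rbegin(), shadowStack.rend());
}

// Three nested calls, each registering its own method ID.
std::vector<std::uint32_t> inner()  { ShadowFrame f(300); return snapshot(); }
std::vector<std::uint32_t> middle() { ShadowFrame f(200); return inner(); }
std::vector<std::uint32_t> outer()  { ShadowFrame f(100); return middle(); }
```

Each call pays for a bounds check, a possible reallocation, and an access to thread-local storage, whereas the in-frame slot needs only a single store relative to the frame pointer.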

> Also, out of curiosity: are you working on something like Java
> security contexts? Or perhaps something like ProPolice canary values?
>
>

I'm working on VMKit, which implements a JVM on top of LLVM. And an easy
way to walk the stack is to have a methodID stored in each stack frame
to locate which method the frame belongs to.

Nicolas

Anton Korobeynikov

Dec 29, 2008, 3:27:42 AM
to LLVM Developers Mailing List
Hi, Nicolas

> Yes, I do :) There are some alternatives, but this looks like the most
> efficient. What I'm facing are engineering issues, since adding new
> information to the stack frame is similar to adding the frame-pointer
> information.

I don't see any huge problems with writing such a pass: just create
stack frame objects at fixed offsets inside your MF pass and you'll be
done. The only catch is that you need to do this early, before the
prologue/epilogue inserter runs, since afterwards the stack frame
layout is almost finalized (at a "high level") and you'd have to deal
with much more low-level and target-specific stuff.

I believe you can even use one of the prologue-epilogue inserter hooks
in order to do this...
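
The ordering constraint can be shown with a toy frame-layout model. It is deliberately not LLVM code: `FrameLayout`, `reserveFixed`, `addLocal` and `finalize` are hypothetical names. In LLVM itself the fixed slot would be created through MachineFrameInfo (it has a CreateFixedObject method, whose exact signature depends on the release).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One stack slot; `offset` is in bytes relative to the frame pointer.
struct Slot {
    std::string name;
    int size;
    int offset;
};

// Toy stand-in for frame layout; NOT LLVM's MachineFrameInfo.
class FrameLayout {
public:
    // Reserve a slot at a caller-chosen offset. Only legal before finalize().
    void reserveFixed(const std::string& name, int size, int offset) {
        assert(!finalized_ && "layout is already finalized");
        fixed_.push_back({name, size, offset});
    }

    // Register an ordinary local whose offset is chosen by finalize().
    void addLocal(const std::string& name, int size) {
        assert(!finalized_ && "layout is already finalized");
        locals_.push_back({name, size, 0});
    }

    // Place every local below the lowest fixed slot, in registration order.
    // Returns the fixed slots followed by the locals.
    std::vector<Slot> finalize() {
        assert(!finalized_ && "finalize() may only run once");
        finalized_ = true;
        int lowest = 0;
        for (const Slot& s : fixed_) {
            lowest = std::min(lowest, s.offset);
        }
        std::vector<Slot> out = fixed_;
        for (Slot s : locals_) {
            lowest -= s.size;
            s.offset = lowest;
            out.push_back(s);
        }
        return out;
    }

private:
    std::vector<Slot> fixed_;
    std::vector<Slot> locals_;
    bool finalized_ = false;
};
```

Reserving a 4-byte slot at offset -4 and then adding two 4-byte locals places the locals at -8 and -12, which is the layout requested at the start of the thread. Calling `reserveFixed` after `finalize` trips the assertion, mirroring the "do this early" advice.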

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Nicolas Geoffray

Dec 29, 2008, 3:59:33 AM
to LLVM Developers Mailing List
Hi Anton,

Anton Korobeynikov wrote:
> I don't see any huge problems with writing such a pass: just create
> stack frame objects at fixed offsets inside your MF pass and you'll be
> done. The only catch is that you need to do this early, before the
> prologue/epilogue inserter runs, since afterwards the stack frame
> layout is almost finalized (at a "high level") and you'd have to deal
> with much more low-level and target-specific stuff.
>
>

OK.

> I believe you can even use one of the prologue-epilogue inserter hooks
> in order to do this...
>

Could you point me where those hooks are in the llvm code? I didn't find
any.

Thanks!
Nicolas

Anton Korobeynikov

Dec 29, 2008, 6:39:38 AM
to LLVM Developers Mailing List
Hi, Nicolas

> Could you point me where those hooks are in the llvm code? I didn't find
> any.

Look into PrologEpilogInserter.cpp::PEI::runOnMachineFunction(). There
are calls to hooks inside TargetRegisterInfo:
TargetRegisterInfo::processFunctionBeforeCalleeSavedScan() and
TargetRegisterInfo::processFunctionBeforeFrameFinalized().

Maybe they are not so convenient when working via the JIT, but at least
you'll know where all the stack-related stuff is being cooked
:)

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Nicolas Geoffray

Dec 29, 2008, 7:27:16 AM
to LLVM Developers Mailing List
Anton Korobeynikov wrote:
> Hi, Nicolas
>
>
>> Could you point me where those hooks are in the llvm code? I didn't find
>> any.
>>
> Look into PrologEpilogInserter.cpp::PEI::runOnMachineFunction(). There
> are calls to hooks inside TargetRegisterInfo:
> TargetRegisterInfo::processFunctionBeforeCalleeSavedScan() and
> TargetRegisterInfo::processFunctionBeforeFrameFinalized().
>
>

Doesn't that involve modifying the target? E.g., modifying
X86RegisterInfo.cpp to allocate the stack entry?

Bill Wendling

Dec 29, 2008, 6:15:07 PM
to LLVM Developers Mailing List

This might help. See how stack protectors are implemented here:

lib/CodeGen/StackProtector.cpp

It places a special value at a specific place on the stack. You can
use the same trick to put your own information on a set stack
position. There's more to the code than just that .cpp file. It's done
with intrinsics. You'll also need to check out the
PrologEpilogInserter.cpp code.

-bw

Nicolas Geoffray

Dec 29, 2008, 6:32:49 PM
to LLVM Developers Mailing List
Hi Bill,

Bill Wendling wrote:
>
> This might help. See how "stack protectors" is implemented here:
>
> lib/CodeGen/StackProtector.cpp
>
> It places a special value at a specific place on the stack. You can
> use the same trick to put your own information on a set stack
> position. There's more to the code than just that .cpp file. It's done
> with intrinsics. You'll also need to check out the
> PrologEpilogInserter.cpp code.
>
>

Thanks. I've already looked at what stack protector does, and indeed the
need is similar. However, I'd like to add the functionality as a new
machine pass, so that I don't need to add new intrinsics and modify the
llvm code base. Do you think that's possible?

Nicolas

Bill Wendling

Dec 29, 2008, 8:36:47 PM
to LLVM Developers Mailing List
Hi Nicolas,

>> This might help. See how "stack protectors" is implemented here:
>>
>> lib/CodeGen/StackProtector.cpp
>>
>> It places a special value at a specific place on the stack. You can
>> use the same trick to put your own information on a set stack
>> position. There's more to the code than just that .cpp file. It's done
>> with intrinsics. You'll also need to check out the
>> PrologEpilogInserter.cpp code.
>>
>>
> Thanks. I've already looked at what stack protector does, and indeed the
> need is similar. However, I'd like to add the functionality as a new
> machine pass, so that I don't need to add new intrinsics and modify the
> llvm code base. Do you think that's possible?
>

I suppose that it's possible, though not as elegant. :-) What type of
information are you storing? Is it stuff that is known before we
convert the LLVM IR into its DAG format for the back-end? If so, then
it's probably cleaner to go the way of the stack protector. It's
easier to send this information to the back-end so that it knows how
to set up the prologue and epilogue. I'm not too familiar with the
JIT, so I don't know if making this a machine function pass will work.

In all, modifying LLVM to create a new intrinsic (a la stack
protectors) isn't all that much work. ;-) And modifying LLVM IR is
*much* easier than modifying its DAG format.

Nicolas Geoffray

Dec 30, 2008, 5:25:00 AM
to LLVM Developers Mailing List
Hi Bill,

Bill Wendling wrote:
>
> I suppose that it's possible, though not as elegant. :-) What type of
> information are you storing?

I want to store a methodID, so that when walking the stack I know which
function the frame belongs to.

> Is it stuff that is known before we
> convert the LLVM IR into its DAG format for the back-end?

Yes. If I didn't need a fixed offset, I'd do:

ptr = alloca(MethodType)
store MethodId, ptr

> If so, then
> it's probably cleaner to go the way of the stack protector. It's
> easier to send this information to the back-end so that it knows how
> to set up the prologue and epilogue. I'm not too familiar with the
> JIT, so I don't know if making this a machine function pass will work.
>
> In all, modifying LLVM to create a new intrinsic (a la stack
> protectors) isn't all that much work. ;-) And modifying LLVM IR is
> *much* easier than modifying its DAG format.
>

Yeah, the intrinsic way is the easiest. It just looked nice to write
that as a new pass and not modify the LLVM codebase.

Nicolas
