[llvm-dev] RFC: Dynamically Allocated "Callee Saved Registers" Lists


Ben Simhon, Oren via llvm-dev

Jan 9, 2017, 4:09:46 AM
to llvm...@lists.llvm.org

Dynamically Allocated “Callee Saved Registers” Lists

 

Each calling convention (CC) defines a static list of registers that a callee function must preserve. All other registers must be saved by the caller.

Some CCs add a condition: if a register is used to pass an argument or return a value, the caller needs to save it, even if it is part of the Callee Saved Registers (CSR) list.

For example, consider the following function:

void __regcall func(int a, int b, int c, int d, int e);

According to the RegCall CC, parameters d and e reside in registers EDI and ESI. The problem is that these registers also appear in the CSR list of the RegCall calling convention. Since the registers were used to pass arguments, the callee doesn’t have to preserve their values.

The current LLVM implementation doesn’t support this. It saves a register whenever it is part of the static CSR list, regardless of whether the register is used to pass an argument or return a value.

 

There are two types of static CSR lists:

1. A register mask array of the CSRs (including register aliases)

2. A register CSR list

 

The proposed solution is to dynamically allocate the CSR lists (only for these CCs). The lists will be updated to contain only the registers that the callee actually has to save.

Since the allocated lists need to live as long as the function exists, they should reside inside the Machine Register Info (MRI), which is a property of the Machine Function, is managed by it, and has the same life span.

 

The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
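
A minimal sketch of the idea above, not the actual patch: the per-function CSR list is derived from the static list by dropping every register that carries an argument or a return value. The `Reg` enum, the contents of the static list, and the name `buildDynamicCSRList` are all hypothetical; the zero terminator only mirrors the way static CSR lists are usually laid out.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical register ids for illustration; these are not LLVM's values.
enum Reg : unsigned { NoReg = 0, EAX, ECX, EDX, EBX, ESI, EDI, EBP };

// Builds the per-function CSR list: every register of the zero-terminated
// static list that is NOT used to pass an argument or return a value.
// The result is zero-terminated as well.
std::vector<Reg> buildDynamicCSRList(const Reg *StaticCSRs,
                                     const std::vector<Reg> &ArgOrRetRegs) {
  assert(StaticCSRs && "static CSR list must not be null");
  std::vector<Reg> Result;
  for (const Reg *R = StaticCSRs; *R != NoReg; ++R) {
    bool UsedForArgs = std::find(ArgOrRetRegs.begin(), ArgOrRetRegs.end(),
                                 *R) != ArgOrRetRegs.end();
    if (!UsedForArgs)
      Result.push_back(*R);
  }
  Result.push_back(NoReg); // keep the terminator
  return Result;
}
```

With an assumed static list of EBX, ESI, EDI, EBP and argument registers EAX, ECX, EDX, EDI, ESI (as in the `func` example), only EBX and EBP remain for the callee to save.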

 

Open Issue

 

Machine Instructions (MI) have an intermediate representation that can be printed and later parsed to recreate the MIs.

The MI printer and parser expect the register mask pointer to point into a predefined (static) list of RegMasks. Those lists are retrieved from the auto-generated file X86GenRegisterInfo.inc using the functions getRegMasks() and getRegMaskNames().

However, since we create a dynamically allocated register mask, its pointer will not appear in the static lists and no corresponding name can be found.

In that case, the MIPrinter will fail to emit the RegMask name.
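
The failure mode can be sketched as follows, assuming the printer resolves a mask to its name by pointer identity against the static tables. The tables, the mask names, and `lookupRegMaskName` are hypothetical stand-ins and not LLVM's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical static tables, standing in for the generated ones.
static const std::uint32_t MaskA[] = {0x0000000F};
static const std::uint32_t MaskB[] = {0x000000F0};
static const std::uint32_t *const StaticMasks[] = {MaskA, MaskB};
static const char *const StaticMaskNames[] = {"csr_a", "csr_b"};

// Returns the name of Mask if it is one of the static masks (compared by
// pointer, not by contents), or nullptr when no name is known.
const char *lookupRegMaskName(const std::uint32_t *Mask) {
  assert(Mask && "mask pointer must not be null");
  const std::size_t N = sizeof(StaticMasks) / sizeof(StaticMasks[0]);
  for (std::size_t I = 0; I < N; ++I)
    if (StaticMasks[I] == Mask)
      return StaticMaskNames[I];
  return nullptr;
}
```

A dynamically allocated mask has no entry in the table even when its bits equal those of a static mask, so a printer built this way has no name to emit for it.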

 

I would appreciate the community’s opinion on my solution and on possible solutions to the open issue.

 

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Bruce Hoult via llvm-dev

Jan 9, 2017, 4:47:15 AM
to Ben Simhon, Oren, llvm...@lists.llvm.org
Weird calling convention, but I see that's as documented at e.g. 


Silly question:

Is there any calling convention, on any supported platform, that requires the callee to preserve a register that was used to pass an argument?

Of course someone *could* define such a CC, but if there isn't currently one, then we don't need to support that possibility *yet*.

If no one has done that then no need to add a whole new special case just for IA32 __regcall. We could just for every function do something like (assuming a bitmap representation):

fnCalleeSavedRegs = archCSR & ~fnArgs
fnScratchRegs = ~fnCalleeSavedRegs & ~archSpecialRegs; // if not included in CSR

This is simple enough to be calculated whenever it's needed, not stored.
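
The two lines above can be sketched in C++ as follows, assuming one bit per register in a 32-bit word. The register numbering, `AllRegs`, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

using RegSet = std::uint32_t; // one bit per register (hypothetical numbering)

enum : RegSet {
  EAX = 1u << 0, ECX = 1u << 1, EDX = 1u << 2, EBX = 1u << 3,
  ESP = 1u << 4, EBP = 1u << 5, ESI = 1u << 6, EDI = 1u << 7,
  AllRegs = 0xFF
};

// fnCalleeSavedRegs = archCSR & ~fnArgs
constexpr RegSet calleeSavedRegs(RegSet ArchCSR, RegSet FnArgs) {
  return ArchCSR & ~FnArgs;
}

// fnScratchRegs = ~fnCalleeSavedRegs & ~archSpecialRegs,
// restricted to the registers that exist in this toy model.
constexpr RegSet scratchRegs(RegSet FnCalleeSaved, RegSet ArchSpecial) {
  return ~FnCalleeSaved & ~ArchSpecial & AllRegs;
}
```

For an assumed architectural CSR set of EBX, EBP, ESI, EDI and arguments in EAX, ECX, EDX, EDI, ESI, the callee-saved set shrinks to EBX and EBP, and ESI and EDI move to the scratch set.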


_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


Joerg Sonnenberger via llvm-dev

Jan 9, 2017, 6:45:23 AM
to llvm...@lists.llvm.org
On Mon, Jan 09, 2017 at 12:47:07PM +0300, Bruce Hoult via llvm-dev wrote:
> Is there any calling convention, on any supported platform, that requires
> the callee to preserve a register that was used to pass an argument?

The "this" parameter for constructors and possibly other things is
handled like that on some platforms. ARM EABI I think?

Joerg

David Chisnall via llvm-dev

Jan 9, 2017, 6:58:16 AM
to Joerg Sonnenberger, llvm...@lists.llvm.org
On 9 Jan 2017, at 11:45, Joerg Sonnenberger via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On Mon, Jan 09, 2017 at 12:47:07PM +0300, Bruce Hoult via llvm-dev wrote:
>> Is there any calling convention, on any supported platform, that requires
>> the callee to preserve a register that was used to pass an argument?
>
> The "this" parameter for constructors and possible other things is
> handled like that on some platforms. ARM EABI I think?

Do you model that as a callee-save register, or as a register that is both an argument and return register and is guaranteed to return the argument?

David

Joerg Sonnenberger via llvm-dev

Jan 9, 2017, 7:31:25 AM
to llvm...@lists.llvm.org
On Mon, Jan 09, 2017 at 11:58:07AM +0000, David Chisnall wrote:
> On 9 Jan 2017, at 11:45, Joerg Sonnenberger via llvm-dev <llvm...@lists.llvm.org> wrote:
> >
> > On Mon, Jan 09, 2017 at 12:47:07PM +0300, Bruce Hoult via llvm-dev wrote:
> >> Is there any calling convention, on any supported platform, that requires
> >> the callee to preserve a register that was used to pass an argument?
> >
> > The "this" parameter for constructors and possible other things is
> > handled like that on some platforms. ARM EABI I think?
>
> Do you model that as a callee-save register, or as a register that is
> both an argument and return register and is guaranteed to return the
> argument?

I think it is currently modeled as a return register with a guaranteed
value.

Joerg

Mehdi Amini via llvm-dev

Jan 9, 2017, 11:38:16 AM
to Ben Simhon, Oren, llvm...@lists.llvm.org
On Jan 9, 2017, at 1:09 AM, Ben Simhon, Oren via llvm-dev <llvm...@lists.llvm.org> wrote:

> The proposed solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
>
> Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
>
> The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.

Have you looked at how IPRA is implemented? It needs to dynamically allocate the register mask as well, so unless there is something fundamentally different I’m missing, we should share the mechanism.

— 
Mehdi



 


Ben Simhon, Oren via llvm-dev

Jan 9, 2017, 1:42:40 PM
to Bruce Hoult, llvm...@lists.llvm.org

Thanks Bruce for your comments.

 

Unfortunately there are calling conventions that require a callee to preserve a register that was used for argument passing.

For example: hhvm (passes and preserves R12), swift (passes and preserves R12 & R13).

 

Apparently there are other calling conventions that can use the proposed solution.

For example: Interrupt handler CC that needs to preserve all registers (instead of passed/returned arguments).

 

Regarding:

fnCalleeSavedRegs = archCSR & ~fnArgs

fnScratchRegs = ~fnCalleeSavedRegs & ~archSpecialRegs; // if not included in CSR

 

Your solution is the approach I started with, updating around 50 places where “fnCalleeSavedRegs” is calculated, but I encountered several issues:

1. Creating “fnArgs” every time is expensive because it requires calling the CC functions every time.

2. So “fnArgs” is a bitmap that needs to be created and stored somewhere.

3. Not every place that uses “archCSR” has (or can calculate) “fnArgs”, because in some cases we don’t have a handle to the Machine Function.

Basically, my proposed solution is: instead of calculating “fnArgs” and “fnCalleeSavedRegs” every time, calculate “fnCalleeSavedRegs” once and save it.
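
A rough sketch of "calculate once and save it", assuming a bitmap representation. `FunctionCSRInfo` and its callback are hypothetical; the sketch computes lazily on first use, whereas the proposal populates the lists during lowering and stores them in the MRI.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

using RegSet = std::uint32_t; // one bit per register (hypothetical)

// Per-function holder: the expensive argument analysis runs at most once.
class FunctionCSRInfo {
public:
  FunctionCSRInfo(RegSet ArchCSR, std::function<RegSet()> ComputeFnArgs)
      : ArchCSR(ArchCSR), ComputeFnArgs(std::move(ComputeFnArgs)) {
    assert(this->ComputeFnArgs && "need a way to compute argument registers");
  }

  // Returns archCSR & ~fnArgs, computing it on the first call only.
  RegSet calleeSavedRegs() {
    if (!Computed) {
      CalleeSaved = ArchCSR & ~ComputeFnArgs();
      Computed = true;
    }
    return CalleeSaved;
  }

private:
  RegSet ArchCSR;
  std::function<RegSet()> ComputeFnArgs;
  RegSet CalleeSaved = 0;
  bool Computed = false;
};
```

Every later query reads the stored value, so code that cannot recompute “fnArgs” only needs access to the holder.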

 

Thanks again,

Oren

Ben Simhon, Oren via llvm-dev

Jan 11, 2017, 10:06:50 AM
to mehdi...@apple.com, llvm...@lists.llvm.org

Hi Mehdi,

 

I wasn’t familiar with IPRA before, thank you for bringing it up.

After studying it, I have to say that IPRA is a wonderful idea and is well implemented.

 

I tried to reuse the mechanism for the last couple of days.

I implemented a solution using the IPRA mechanism and encountered a few issues:

1. IPRA uses an immutable analysis pass called “PhysicalRegisterUsageInfo”. Such passes should be used for optimization only. In my case, this is a functional issue: the analysis must run in order to be compatible with other compilers.

2. The IPRA passes are not enabled by default, and when they are enabled many tests fail for various reasons (mainly because of the CallGraph bottom-up approach).

3. The RegMasks generated by RegUsageInfoCollector are very different from the RegMasks that I need. Changing the current pass implementation would be an abuse, so a new pass is required.

4. When dumping the MIR after running the IPRA passes, an assertion is raised because the RegMask name is unknown (the same issue that I face with my solution; see “Open Issue”).

I might reuse the analysis pass of IPRA instead of saving the RegMask inside the Machine Function, but as mentioned in #1 it is not recommended.

Thus I believe that my current suggestion is more suitable for the issue I am resolving.

 

Best Regards,

Oren

 


Mehdi Amini via llvm-dev

Jan 11, 2017, 1:21:46 PM
to Ben Simhon, Oren, llvm...@lists.llvm.org
On Jan 11, 2017, at 7:06 AM, Ben Simhon, Oren <oren.be...@intel.com> wrote:

> Hi Mehdi,
>
> I wasn’t familiar with IPRA before, thank you for bringing it up.
> After studying it, I have to say that IPRA is a wonderful idea and is well implemented.
>
> I tried to reuse the mechanism for the last couple of days.
> I implemented a solution using IPRA mechanism and encountered few issues:

I didn’t know if the implementation would be perfectly suited to your needs, but the similarity makes me think that it is not desirable to have two components doing “almost the same thing” there.

> 1. IPRA uses immutable analysis pass called “PhysicalRegisterUsageInfo”. The usage of such passes should be optimization only.

Can you elaborate on what makes you say so? Do we have a rule that an analysis can’t be used to store `correctness` information?

> In my case, this is a functional issue. The analysis must run in order to be compatible with other compilers.
>
> 2. IPRA passes are not enabled by default and when they are enabled many tests fail due to various reasons (mainly because the CallGraph bottom up approach).
>
> 3. The manipulated RegMasks generated using RegUsageInfoCollector are very different than the manipulated RegMasks that I need. It will be an abuse to change the current pass implementation. So new pass is required.
>
> 4. When dumping the MIR after running IPRA passes, assertion is raised because the RegMask name is unknown (same issue that I face with my solution – see “open Issue”).
>
> I might reuse the analysis pass of IPRA instead of saving the RegMask inside the Machine Function but as mentioned by #1 it is not recommended.

At this point, I don’t see what in the issues you raise above justifies not sharing the infrastructure between IPRA and what you need (again, I’m not saying it is ready to do exactly what you need, but I’m against duplicating a similar mechanism instead of refactoring).

— 
Mehdi

Ben Simhon, Oren via llvm-dev

Jan 12, 2017, 2:27:43 AM
to mehdi...@apple.com, llvm...@lists.llvm.org

Hi Mehdi,

 

It is true that both IPRA and the proposed mechanism save RegMasks.

So you might say that the data structure in the immutable pass should be reused, but this is the only similarity.

 

Even this similarity is not exactly true.

I save register masks that doesn’t use passed/returned arguments while IPRA saves register masks for modified registers.

So how can they share the same mechanism?

 

Regarding immutable pass, I am not familiar with immutable passes that hold correctness information (can you share an example?).

 

Thanks,

Mehdi Amini via llvm-dev

Jan 12, 2017, 2:51:40 AM
to Ben Simhon, Oren, llvm...@lists.llvm.org
Hi Ben,


On Jan 11, 2017, at 11:26 PM, Ben Simhon, Oren <oren.be...@intel.com> wrote:

> Hi Mehdi,
>
> It is true that both IPRA and the proposed mechanism save RegMasks.
> So you might say that the data structure in the immutable pass should be reused,

This is not exactly what I’m saying, let me clarify: I’m saying that if the two high-level features need the same underlying feature (dynamic regmask), then the underlying feature should be shared (unless there is a good reason, which I may very well have missed). Whether the current way of doing it is flexible enough or the most convenient for what you want does not invalidate my point about sharing it, it just has to be adapted / transformed in

> but this is the only similarity.
>
> Even this similarity is not exactly true.
> I save register masks that doesn’t use passed/returned arguments while IPRA saves register masks for modified registers.
> So how can they share the same mechanism?

I’m not sure I understand what you refer to by "save register masks that doesn’t use passed/returned arguments", but keep in mind that I don’t play frequently with the MI level.
I just skimmed through your original RFC, and it mentions "The proposed solution is to dynamically allocate the CSR lists", which I thought is exactly what’s done in IPRA.

Back to my point, outside of implementation details: my high-level impression is that we need a "dynamic calling convention" mechanism in both cases (IPRA and the __regcall CC), and I’d approach this layer/feature independently as such (not just as the minimum ad-hoc structure to support one particular CC).

Best,

Mehdi

Ben Simhon, Oren via llvm-dev

Jan 12, 2017, 4:04:00 AM
to mehdi...@apple.com, llvm...@lists.llvm.org

Hi Mehdi,

 

I think that the subject of the RFC is misleading.

The true problem that we are trying to solve is to remove returned/passed arguments from the regmask (according to the calling convention).

IPRA and the CC-updated RegMask can’t use the same mechanism because they contradict each other.

I think that the following analogy will help to explain why I think that reuse is redundant:

Let’s assume two different functions (FuncA and FuncB) need to allocate an array of items of type X.

FuncA allocates XarrayA that contains all X items that are big.

FuncB allocates XarrayB that contains all X items that are square shaped.

Should both of them use the same array?!

 

I think that they shouldn’t. Same in our case.

It is true that both structures save register masks, but each one holds a different kind of register mask.

I don’t see how we can change the mechanism to make IPRA and my updated regmask mutually exclusive.

 

I hope it clarifies what I am trying to say.

Mehdi Amini via llvm-dev

Jan 12, 2017, 5:05:27 AM
to Ben Simhon, Oren, llvm...@lists.llvm.org
On Jan 12, 2017, at 1:03 AM, Ben Simhon, Oren <oren.be...@intel.com> wrote:

> Hi Mehdi,
>
> I think that the subject of the RFC is misleading.
> The true problem that we are trying to solve is to remove returned/passed arguments from the regmask (According to the calling convention).
> IPRA and CC updated RegMask can’t use the same mechanism because they contradict each other.

Can you clarify in which way they contradict each other?
Do you have a patch I could look at? (Or could you point at some piece of code in LLVM?)

> I think that the following analog will help to explain why I think that a reuse is redundant:
> Let’s assume two different functions (FuncA and FuncB) need to allocate an array of items of type X.
> FuncA allocates XarrayA that contains all X items that are big.
> FuncB allocates XarrayB that contains all X items that are square shaped.
> Should both of them use the same array?!

They may not use the same array, but they’ll both use malloc() :-)

— 
Mehdi

Ben Simhon, Oren via llvm-dev

Jan 12, 2017, 6:38:59 AM
to mehdi...@apple.com, llvm...@lists.llvm.org

Here is an example that explains the difference.

 

// Only a declaration, no implementation.
// Assume that the value is returned in EAX and the arguments are passed in EAX, ECX, EDX, ESI, EDI.
int __regcall callee(int a, int b, int c, int d, int e);

// Implemented in a different module.
void caller() {
  int x = callee(1, 2, 3, 4, 5);
}

 

What will the RegMask be when using the IPRA register usage collector?

Callee Saved Registers (from the static register mask) minus RAX.

What should the RegMask really be?

Callee Saved Registers (from the static register mask) minus RAX, ESI and EDI (and their sub-registers).
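
As an illustration of "minus ESI and EDI (and their sub-registers)", here is a small sketch that assumes the usual regmask layout of one bit per register, where a set bit means the register is preserved across the call. The register numbering, the sub-register table, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register numbering; not LLVM's real enum values.
enum Reg : unsigned {
  EAX = 1, AX, AL, AH, ESI, SI, SIL, EDI, DI, DIL, EBX, BX, BL, BH, NumRegs
};

using RegMask = std::vector<std::uint32_t>;

// A set bit means "preserved across the call".
bool isPreserved(const RegMask &Mask, unsigned R) {
  assert(R / 32 < Mask.size() && "register out of range for this mask");
  return (Mask[R / 32] & (1u << (R % 32))) != 0;
}

void clearReg(RegMask &Mask, unsigned R) {
  assert(R / 32 < Mask.size() && "register out of range for this mask");
  Mask[R / 32] &= ~(1u << (R % 32));
}

// Hypothetical sub-register table for the registers modeled here.
std::vector<Reg> subRegs(Reg R) {
  switch (R) {
  case EAX: return {AX, AL, AH};
  case ESI: return {SI, SIL};
  case EDI: return {DI, DIL};
  case EBX: return {BX, BL, BH};
  default:  return {};
  }
}

// Returns a copy of Mask with each register in Regs, and all of its
// sub-registers, marked as not preserved.
RegMask removeFromMask(RegMask Mask, const std::vector<Reg> &Regs) {
  for (Reg R : Regs) {
    clearReg(Mask, R);
    for (Reg S : subRegs(R))
      clearReg(Mask, S);
  }
  return Mask;
}
```

Clearing only the ESI and EDI bits would leave SI, SIL, DI, and DIL marked as preserved, which is why the sub-registers have to be removed as well.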

 

Do you think that I should fix IPRA collector?

 

Even after fixing the IPRA collector, I can run neither the collector nor the propagation pass (because many tests fail due to the bottom-up traversal).

So the only thing in common will be the data structure inside the immutable pass. Am I right?

 

You can see the Phabricator review that I uploaded yesterday here:

https://reviews.llvm.org/D28566

Mehdi Amini via llvm-dev

Jan 12, 2017, 11:06:43 AM
to Ben Simhon, Oren, llvm...@lists.llvm.org
On Jan 12, 2017, at 3:38 AM, Ben Simhon, Oren <oren.be...@intel.com> wrote:

> Here in an example that explains the difference.
>
> // Only declaration – No implementation
> // Assume that the value is returned in EAX and the arguments are passed in EAX, ECX, EDX, ESI, EDI.
> int __regcall callee (int a, int b, int c, int d, int e);
>
> // implemented in a different module
> void caller() {
>   x = callee(1,2,3,4,5);
> }
>
> What will be RegMask using IPRA register usage collector?
> Callee Saved Registers (from the static register mask) minus RAX.
> What should really be the RegMask?
> Callee Saved Registers (from the static register mask) minus RAX, ESI and EDI (and their sub registers).
>
> Do you think that I should fix IPRA collector?

I’m not saying IPRA as the optimization/analysis is the way to solve your problem, so likely no, although we should make sure they’ll be compatible.

> Even after fixing IPRA collector, I can’t run the collector nor the propogate (because many tests are failing due to the bottom up traversal).
> So the only thing in common will be the data structure inside the immutable pass. Am I right?

The data structure seems the same to me, but more importantly so does the mechanism by which the lowering attaches a regmask to a call. If you need to have a dynamic regmask in MRI or wherever, can’t IPRA use it *instead* of the immutable analysis?

— 
Mehdi


Ben Simhon, Oren via llvm-dev

Jan 12, 2017, 11:33:37 AM
to mehdi...@apple.com, llvm...@lists.llvm.org

I agree, I will make the change and upload it in another patch.

Mehdi Amini via llvm-dev

Jan 12, 2017, 11:36:15 AM
to Ben Simhon, Oren, llvm...@lists.llvm.org
Just to be clear: I was/am not requiring you to implement the change for IPRA. Just to make sure it *can* be done (i.e. provide the right abstraction to store/use a dynamic regmask).

— 
Mehdi