[LLVMdev] Named register variables GNU-style

Renato Golin

Mar 27, 2014, 9:55:47 AM

to LLVM Dev, Clang Dev

Folks,

I just had a discussion about __builtin_stack_pointer in the GCC list,
and there were a number of arguments against it, and it got me
thinking I didn't have strong arguments against GNU's named register
extension. Does anyone remember the arguments for not implementing
that extension?

My view is that making it an intrinsic (say @llvm.register(name))
would have the exact same semantics as __builtin_<register_name> has,
in that it'll be carried down all the way to SelectionDAG and be just
a CopyFromReg from that reg's name.

The fact that they remain intrinsics guarantees that they will
last until SelectionDAG and not be commoned up or heavily modified. I'm
not sure how to make Clang do that, but it shouldn't be too hard to
short-circuit the asm handler if we're dealing with a
declaration/instantiation and "register" is a specifier of the type.

The argument supporting the builtins is that, in the case of the stack
pointer, it's not target specific, thus avoiding ifdefs. The
counter-argument is that most usage of the named register extension is
already target specific (together with everything around it), so that
extra value is very limited. Also, since kernel and library code
(heavy users of named registers) will have to support old compilers,
this will *have* to be ifdefd anyway.

The arguments against builtins are that named registers are more
generic, are already in use for more than the stack pointer and are
reasonably straightforward to both understand and implement.

Neither builtins nor named registers give you the guarantees that
you would like from such a high-level construct, and users are already
aware that this is the case, so we don't have to worry that much about
it.
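
For readers unfamiliar with the extension, here is a minimal sketch of the GNU syntax under discussion, applied to the stack pointer. It assumes GCC or a compatible compiler. The variable name `current_sp`, the helper `read_stack_pointer` and the fallback to `__builtin_frame_address` on unlisted targets are illustrative choices, not something proposed in this thread.

```c
#include <stdint.h>

/* GNU global named register variable bound to the stack pointer.
   Register names are target specific, hence the ifdefs. */
#if defined(__x86_64__)
register uintptr_t current_sp __asm__("rsp");
#elif defined(__i386__)
register uintptr_t current_sp __asm__("esp");
#elif defined(__aarch64__) || defined(__arm__)
register uintptr_t current_sp __asm__("sp");
#else
#define NO_NAMED_SP 1 /* assumption: no known register name for this target */
#endif

/* Returns an approximation of the current stack pointer. */
uintptr_t read_stack_pointer(void) {
#ifndef NO_NAMED_SP
    return current_sp;
#else
    /* Fallback: the frame address is close to, but not equal to, SP. */
    return (uintptr_t)__builtin_frame_address(0);
#endif
}
```

Note that the ifdefs are needed either way, which is the point made above about this kind of code already being target specific.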

Also, reading back some comments, it seems that the biggest concern
was that inline asm wasn't really a first-class citizen in the LLVM
back-end, but I think that has changed with MC, right?

My questions are:

1. Were the initial concerns dealt with by the introduction of MC?

2. Is there any remaining argument against named registers that is
stronger than the ones supporting it?

3. Is my draft for implementing named registers acceptable?

cheers,
--renato
_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Rafael Espíndola

Mar 27, 2014, 11:30:20 AM

to Renato Golin, Clang Dev, LLVM Dev

On 27 March 2014 09:55, Renato Golin <renato...@linaro.org> wrote:
> Folks,
>
> I just had a discussion about __builtin_stack_pointer in the GCC list,
> and there were a number of arguments against it, and it got me
> thinking I didn't have strong arguments against GNU's named register
> extension. Does anyone remember the arguments for not implementing
> that extension?

In clang or llvm? I think we can implement it in clang by lowering
them with a similar trick that we do for local ones.

For local register variables, clang just keeps a note that it has to
add a constraint when it creates an inline assembly.

For global ones, it should also codegen every non inline asm to use an
llvm intrinsic (llvm.read_register/llvm.write_register for example).

This is not exactly the semantics gcc uses since the register would
still be allocatable, but should cover 99% of the uses, including
reading the stack pointer in the kernel.

I don't think we should implement this directly in LLVM, since it
introduces the really odd notion that reading of a value is
observable. For example, is it legal to move the read of rsp out of a
loop? By using an intrinsic at the llvm level we trivially represent
and preserve all the reads and writes from the source program.
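
To make the hoisting question concrete, here is a small portable model. It is only a sketch: `fake_reg` is a hypothetical stand-in for a machine register such as rsp, and `bump` stands in for any code that changes it between reads. The two functions show why a read that is treated as loop invariant gives a different answer from one that is performed every time.

```c
#include <assert.h>

/* Stand-in for a machine register; volatile makes every access observable. */
volatile long fake_reg;

/* Models code between reads that changes the register (e.g. a push). */
void bump(void) { fake_reg = fake_reg + 8; }

/* Reads the "register" on every iteration, as the source program asked. */
long sum_reading_each_time(int n) {
    assert(n >= 0);
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += fake_reg;
        bump();
    }
    return sum;
}

/* What results if the read is wrongly treated as loop invariant. */
long sum_with_hoisted_read(int n) {
    assert(n >= 0);
    long once = fake_reg; /* single read, moved out of the loop */
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += once;
        bump();
    }
    return sum;
}
```

Starting from the same register value, the two functions return different sums as soon as `n` is greater than 1.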

Cheers,
Rafael

Renato Golin

Mar 27, 2014, 12:17:20 PM

to Rafael Espíndola, Clang Dev, LLVM Dev

On 27 March 2014 15:30, Rafael Espíndola <rafael.e...@gmail.com> wrote:
> For global ones, it should also codegen every non inline asm to use an
> llvm intrinsic (llvm.read_register/llvm.write_register for example).

That's my idea, yes. I'm not sure how Clang would transform the named
registers into the intrinsic, but something along the lines of:

i8* @SP = "SP";

define void @step() nounwind {
entry:
%0 = call i32 @llvm.read_register(i8* @SP)
%1 = add i32 %0, i32 4
call void @llvm.write_register(i8* @SP, %1)
}

declare void @llvm.write_register(i8*, i32) nounwind readnone
declare i32 @llvm.read_register(i8*) nounwind readnone

> This is not exactly the semantics gcc uses since the register would
> still be allocatable, but should cover 99% of the uses, including
> reading the stack pointer in the kernel.

http://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html

It seems that the semantics is to avoid PCS registers, or they will be
clobbered...

Nevertheless, we can reserve the register on demand, as we already do
with R9, for instance.

> For example, is it legal to move the read of rsp out of a
> loop?

No. It should be a volatile read/write.

> By using an intrinsic at the llvm level we trivially represent
> and preserve all the reads and writes from the source program.

Exactly!

cheers,
--renato

Rafael Espíndola

Mar 27, 2014, 12:30:46 PM

to Renato Golin, Clang Dev, LLVM Dev

> That's my idea, yes. I'm not sure how Clang would transform the named
> registers into the intrinsic, but something along the lines of:
>
> i8* @SP = "SP";
>
> define void @step() nounwind {
> entry:
> %0 = call i32 @llvm.read_register(i8* @SP)
> %1 = add i32 %0, i32 4
> call void @llvm.write_register(i8* @SP, %1)
> }
>
> declare void @llvm.write_register(i8*, i32) nounwind readnone
> declare i32 @llvm.read_register(i8*) nounwind readnone

I would not produce any llvm global for it. So some insanity like

register long a asm("rsp");
long f(long x) {
  long ret = a;
  a = x;
  return ret;
}

would compile to

define i64 @f(i64 %x) {
  %ret = call i64 @llvm.read_register("rsp");
  call void @llvm.write_register("rsp", i64 %x)
  ret i64 %ret
}
declare void @llvm.write_register(i8*, i64)
declare i64 @llvm.read_register(i8*)

>> This is not exactly the semantics gcc uses since the register would
>> still be allocatable, but should cover 99% of the uses, including
>> reading the stack pointer in the kernel.
>
> http://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html
>
> It seems that the semantics is to avoid PCS registers, or they will be
> clobbered...

Yes, it is really odd. It says "Global register variables reserve
registers throughout the program.", which is obviously not the case
since not all compile units might see it.

>
>> For example, is it legal to move the read of rsp out of a
>> loop?
>
> No. It should be a volatile read/write.

Agreed. With the intrinsic the semantics are easy to represent.

Cheers,
Rafael

Hal Finkel

Mar 27, 2014, 12:52:54 PM

to Renato Golin, Clang Dev, LLVM Dev

----- Original Message -----
> From: "Renato Golin" <renato...@linaro.org>
> To: "LLVM Dev" <llv...@cs.uiuc.edu>, "Clang Dev" <cfe...@cs.uiuc.edu>
> Sent: Thursday, March 27, 2014 8:55:47 AM
> Subject: [LLVMdev] Named register variables GNU-style
>
> Folks,
>
> I just had a discussion about __builtin_stack_pointer in the GCC
> list,
> and there were a number of arguments against it,

Can you summarize?

> and it got me
> thinking I didn't have strong arguments against GNU's named register
> extension. Does anyone remember the arguments for not implementing
> that extension?
>
> My view is that making it an intrinsic (say @llvm.register(name))
> would have the exact same semantics as __builtin_<register_name> has,
> in that it'll be carried down all the way to SelectionDAG and be just
> a CopyFromReg from that reg's name.

I think this also would be a nice feature to have, and fairly straightforward to implement.

That having been said, are there not cases where only the backend knows in what register the stack pointer is held? A sophisticated backend might even spill the stack pointer during some portions of the function to create a range in which it was allocatable, and I certainly would not want to preclude such an implementation.

-Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Hal Finkel

Mar 27, 2014, 12:58:19 PM

to Rafael Espíndola, Clang Dev, LLVM Dev

+1

-Hal

>
> >> This is not exactly the semantics gcc uses since the register
> >> would
> >> still be allocatable, but should cover 99% of the uses, including
> >> reading the stack pointer in the kernel.
> >
> > http://gcc.gnu.org/onlinedocs/gcc/Global-Reg-Vars.html
> >
> > It seems that the semantics is to avoid PCS registers, or they will
> > be
> > clobbered...
>
> Yes, it is really odd. It says "Global register variables reserve
> registers throughout the program.", which is obviously not the case
> since not all compile units might see it.
>
> >
> >> For example, is it legal to move the read of rsp out of a
> >> loop?
> >
> > No. It should be a volatile read/write.
>
> Agreed. With the intrinsic the semantics are easy to represent.
>
> Cheers,
> Rafael


Renato Golin

Mar 27, 2014, 1:37:26 PM

to Rafael Espindola, clang-dev Developers, LLVM Developers Mailing List

On 27 Mar 2014 16:30, "Rafael Espíndola" <rafael.e...@gmail.com> wrote:
>
> > That's my idea, yes. I'm not sure how Clang would transform the named
> > registers into the intrinsic, but something along the lines of:
> >
> > i8* @SP = "SP";
> >
> > define void @step() nounwind {
> > entry:
> > %0 = call i32 @llvm.read_register(i8* @SP)
> > %1 = add i32 %0, i32 4
> > call void @llvm.write_register(i8* @SP, %1)
> > }
> >
> > declare void @llvm.write_register(i8*, i32) nounwind readnone
> > declare i32 @llvm.read_register(i8*) nounwind readnone
>
> I would not produce any llvm global for it. So some insanity like
>
> register long a asm("rsp");
> long f(long x) {
> long ret = a;
> a = x;
> return ret;
> }
>
> would compile to
>
> define i64 @f(i64 %x) {
> %ret = call i64 @llvm.read_register("rsp");
> call void @llvm.write_register("rsp", i64 %x)
> ret %ret
> }
> declare void @llvm.write_register(i8*, i64)
> declare i64 @llvm.read_register(i8*)

That was actually my first idea, but I got confused on the implementation. :-)

I'll try it again.

Cheers,
Renato

Tim Northover

Mar 27, 2014, 1:46:30 PM

to Hal Finkel, LLVM Dev, Clang Dev

>> I would not produce any llvm global for it. So some insanity like

>> %ret = call i64 @llvm.read_register("rsp");
>

> +1

Aren't you going to need some kind of "private unnamed_addr" thing
just to make syntactically valid IR?

Tim.

Rafael Espíndola

Mar 27, 2014, 2:02:45 PM

to Tim Northover, Clang Dev, LLVM Dev

> Aren't you going to need some kind of "private unnamed_addr" thing
> just to make syntactically valid IR?

I guess the best would probably be a metadata string, that way we are
sure we don't actually output anything.

Cheers,
Rafael

Krzysztof Parzyszek

Mar 27, 2014, 3:35:40 PM

to llv...@cs.uiuc.edu

On 3/27/2014 8:55 AM, Renato Golin wrote:
>
> I just had a discussion about __builtin_stack_pointer in the GCC list,
> and there were a number of arguments against it, and it got me
> thinking I didn't have strong arguments against GNU's named register
> extension. Does anyone remember the arguments for not implementing
> that extension?

Is there any sane reason to actually implement it?

Are there any cases when inline asm would work well enough? We have the
__builtin_stack_pointer now, which is somewhat questionable[1], however
this should not serve as a precedent to implement further "extensions" like
this. Argument against it..? "Why?"

-Krzysztof

[1] As someone has pointed out, the stack pointer builtin only makes
sense on targets that actually have a "stack pointer". Not all
architectures do, and even on those that do, the register could be used
for other purposes in some cases. I'm guessing that the main use
scenarios come from OS kernels or device drivers, but even there inline
asm would likely suffice.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

Renato Golin

Mar 27, 2014, 4:15:45 PM

to Krzysztof Parzyszek, LLVM Dev

On 27 March 2014 19:35, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:
> Is there any sane reason to actually implement it?

Bare metal code is never sane, but that's not an excuse not to write
it. C is a very complete language, especially modern variations like
C11, but there's still a lot that can't be done in C and for good
reasons. Kernel code and optimal libraries can stretch the compiler a
lot more than user code, and often enough, there's simply no way to
represent ideas in C. One of these examples is unwind code.

Then you ask...

> Are there any cases when inline asm would work well enough?

Well, assembly is very powerful and so, but it's also very hard to
understand what's going on. Inline assembly is an extension to the
language, and because of that, different compilers implement them in
different ways. Most of GCC's implementation is hidden in layers upon
layers of legacy code that not many people dare to touch, but that
compiles code that can't stop being compiled, nor easily migrated (for
technical and legal reasons).

The interaction between inline assembly and C is, therefore, not easy
to implement, and it's even harder to get it "right" (ie. similar to
the "other" compiler). The more you can write in C, the better for
*compilers*, and minimising the exposure by using register variables
is actually not a bad idea.

But mostly, the __builtin_stack_frame is, in essence, a special case
of the generic pattern of named registers, and gives us the same level
of guarantees, so, in the end, there isn't much *technical* difference
in doing one and the other, with the added benefit that named
registers are already widely used and you won't need to add ifdefs for
new compilers.

> We have the
> __builtin_stack_pointer now, which is somewhat questionable[1], however this
> should not serve a precedent to implement further "extensions" like this.

No we don't. Not yet. ;)

I was about to commit when I thought I should get a point of view from
all sides, including (and especially) GCC.

The first issue here is that, if GCC doesn't "buy in", having a
__builtin_stack_pointer in Clang is as good as not having it.

The second issue is that, if we're going to implement an extension,
and there is already a technically equivalent solution in existence,
there's no reason to create a different one.

Lastly, and (much) more importantly, their technical arguments were
significantly better than mine.

> Argument against it..? "Why?"

Why? Because the less asm and more C code we have, the better. Because
inline asm semantics is a lot more obtuse than named registers.
Because builtins are just special cases of named registers. Because
target-independence is not possible in that kind of code.

Some arguments...

** With builtins, we don't need ifdefs, if all compilers implement it.
-- Yes, we do, for old compilers (GCC and Clang), and kernel/glibc
will need it for decades to come

** Builtins make the code target-independent
-- No they don't, only the register part. The rest of the code
surrounding is highly target-specific. That's because you only need
that level of detail on code that is very target specific. That kind
of code is normally in separate files.

** Builtins remove the need for named registers
-- No they don't. glibc uses other registers for special purpose code.

** Builtins would work on all targets
-- No they won't. What do you do in targets that the compiler doesn't
implement the builtin because there is no equivalent of a stack
pointer?

Other arguments in favour of named registers:

-- Without this extension, it is e.g. hard to force some input or output
arguments of inline asm into specific machine registers, the only other way
is to use constraints, but most targets don't have constraints for every
specific register

-- The guarantees that a builtin would give us are very little more
than the ones given by named registers, and not enough value to justify
the implementation of yet another extension that other compilers won't
implement.
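
The first point above, forcing an inline asm operand into a specific machine register, can be sketched with a local register variable. This is a hedged illustration assuming GCC or Clang. The function name `add_one_pinned` and the choice of `rbx` and `x9` are arbitrary, and other targets fall back to plain C.

```c
/* Adds 1 to x while forcing the asm operand into one specific register. */
long long add_one_pinned(long long x) {
#if defined(__x86_64__)
    register long long v __asm__("rbx") = x; /* operand must live in rbx */
    __asm__ volatile("addq $1, %0" : "+r"(v) : : "cc");
    return v;
#elif defined(__aarch64__)
    register long long v __asm__("x9") = x;  /* operand must live in x9 */
    __asm__ volatile("add %0, %0, #1" : "+r"(v));
    return v;
#else
    /* No register pinning on targets this sketch does not cover. */
    return x + 1;
#endif
}
```

The generic "r" constraint only asks for some general-purpose register; the register variable is what narrows that choice to a single one.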

You see, I'm not a big fan of extensions, nor do I think we should
blindly follow GCC on whatever they do, but in this particular case,
we don't have a strong enough case to make. Even taking the legacy
argument off the table, the technical arguments comparing named
registers and specially crafted builtins are still in favour of named
registers, at least IMHO.

cheers,
--renato

Krzysztof Parzyszek

Mar 27, 2014, 5:20:23 PM

to Renato Golin, LLVM Dev

On 3/27/2014 3:15 PM, Renato Golin wrote:
> On 27 March 2014 19:35, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:
>> Is there any sane reason to actually implement it?
>
> Bare metal code is never sane, but that's not an excuse not to write
> it. C is a very complete language, especially modern variations like
> C11, but there's still a lot that can't be done in C and for good
> reasons. Kernel code and optimal libraries can stretch the compiler a
> lot more than user code, and often enough, there's simply no way to
> represent ideas in C. One of these examples is unwind code.

That's all fine, but I'm not sure how this supports having named
register builtins. The problem is that once we implement a feature, it
may be impossible to get rid of it, should it turn out to be a flop. Do
we understand all intended and unintended consequences of implementing this?

> Then you ask...
>
>> Are there any cases when inline asm would work well enough?

Sorry, that was meant to be "would not work well enough".

> Well, assembly is very powerful and so, but it's also very hard to
> understand what's going on. Inline assembly is an extension to the
> language, and because of that, different compilers implement them in
> different ways.

True, but I know of only one other compiler that allows register
variables (and it's not even a C compiler). The portability argument is
not a very strong one here, at least as I understand it.

> Most of GCC's implementation is hidden in layers upon
> layers of legacy code that not many people dare to touch, but that
> compiles code that can't stop being compiled, nor easily migrated (for
> technical and legal reasons).
>
> The interaction between inline assembly and C is, therefore, not easy
> to implement, and it's even harder to get it "right" (ie. similar to
> the "other" compiler). The most you could write in C the better for
> *compilers*, and minimising the exposure by using register variables
> is actually not a bad idea.

I'm not sure if I'm following your argument. The code that uses inline
asm that cannot be easily migrated will likely not be rewritten to use
named registers. Specifically, for those reasons, we need to be
implementation-compatible with GCC when it comes to inline asm. So,
whether we like it or not, we have to have that part working.

Maybe I'm missing something, but in the previous comments, you mention
__builtin_<register_name>, and in the examples given by others, the
register name is given as a string. Also, there is some example where
the register name is associated with a variable via "asm". All these
options are not exactly equivalent, but they all come with some issues.

If a register name is given via a string, as in "register long a
asm("rsp")", who will check the type of "a"? On PowerPC, "fpr0" is a
floating point register. Would it be legal to have "uint64_t a
asm("fpr0")"? How about "float a asm("fpr0")"? What's funny here is
that fpr0 is 64-bit long and is incapable of holding a 32-bit IEEE
value. If you load a 32-bit fp value into it, it will be automatically
extended to 64 bits. With VSX things are different, and if I remember
correctly, it is now possible to have single-precision values in some
set of registers. Do you want the front-end to deal with all this?
Actually, what's even funnier is that the official PPC assembler syntax
does not define "fpr0". It's just 0, and the meaning of it depends on
where it's placed. But I digress...

Another example: on Hexagon you can use pairs of registers, for example
r15:14. Some instructions can use 64-bit values given in even-odd pairs
like that. At the same time, you can use r14 and r15 separately, and
both of them will be aliased with the r15:14...
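
The aliasing issue can be pictured with a small model. This is a sketch only: the `regs` array is a hypothetical 32-bit register file and says nothing about how Hexagon or any compiler represents pairs. It shows that a named variable bound to r14 and another bound to r15:14 would share storage.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 32 };
uint32_t regs[NUM_REGS]; /* hypothetical 32-bit register file */

/* Reads the 64-bit pair r(even+1):r(even), e.g. r15:14 for even == 14. */
uint64_t read_pair(int even) {
    assert(even >= 0 && even % 2 == 0 && even + 1 < NUM_REGS);
    return ((uint64_t)regs[even + 1] << 32) | regs[even];
}

/* Writes the pair; both underlying 32-bit registers change. */
void write_pair(int even, uint64_t value) {
    assert(even >= 0 && even % 2 == 0 && even + 1 < NUM_REGS);
    regs[even] = (uint32_t)value;
    regs[even + 1] = (uint32_t)(value >> 32);
}
```

Any write to `regs[14]` is visible through `read_pair(14)`, so a front-end that accepted both names would have to know about the overlap.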

> But mostly, the __builtin_stack_frame is, in essence, a special case
> of the generic pattern of named registers, and gives us the same level
> of guarantees, so, in the end, there isn't much *technical* difference
> in doing one and the other, with the added benefit that named
> registers are already widely used and you won't need to add ifdefs for
> new compilers.

__builtin_frame_pointer specifies a register via the functionality, not
by name. In that sense, it is actually something more general than
named registers. While many architectures have something like "frame
pointer", not a lot of them have "rax".

>> Argument against it..? "Why?"
>
> Why? Because the less asm and more C code we have, the better.

Yes, but making it "more C" by keeping direct uses of registers and only
changing how that is accomplished, is, ahem, akin to porting code from C
to C++ by changing the file name from .c to .cpp.

> Because
> inline asm semantics is a lot more obtuse than named registers.

I'm not sure if that's true. The compiler still needs to conform to the
user's desire to have some value in R3, even if everything except
R3 would be a better choice.

> Because builtins are just special cases of named registers. Because
> target-independence is not possible in that kind of code.

Ok, so by now I'm confused. What exactly are we discussing here:
1. unsigned a asm("rax") : make a be an alias for "rax",
2. a = __builtin_register("rax") : copy "rax" to a,
3. a = __builtin_rax() : copy "rax" to a?

Is it about which of the above is superior to others?

As you can see, I don't like any of those, but some of them I like even
less than others. :)

The option 3 makes it easier to type-check, since each target can define
its own set of builtins with proper types. On the other hand, what do
you expect to get? The value of "rax" at that particular place in the
code? How would you control what is being loaded into rax?

-Krzysztof


Richard Smith

Mar 27, 2014, 5:52:27 PM

to Rafael Espíndola, Clang Dev, LLVM Dev

On Thu, Mar 27, 2014 at 9:30 AM, Rafael Espíndola <rafael.e...@gmail.com> wrote:

> > That's my idea, yes. I'm not sure how Clang would transform the named
> > registers into the intrinsic, but something along the lines of:
> >
> > i8* @SP = "SP";
> >
> > define void @step() nounwind {
> > entry:
> > %0 = call i32 @llvm.read_register(i8* @SP)
> > %1 = add i32 %0, i32 4
> > call void @llvm.write_register(i8* @SP, %1)
> > }
> >
> > declare void @llvm.write_register(i8*, i32) nounwind readnone
> > declare i32 @llvm.read_register(i8*) nounwind readnone
>
> I would not produce any llvm global for it. So some insanity like
>
> register long a asm("rsp");
> long f(long x) {
> long ret = a;
> a = x;
> return ret;
> }
>
> would compile to
>
> define i64 @f(i64 %x) {
> %ret = call i64 @llvm.read_register("rsp");
> call void @llvm.write_register("rsp", i64 %x)
> ret %ret
> }
> declare void @llvm.write_register(i8*, i64)
> declare i64 @llvm.read_register(i8*)

I don't think that works. Per the GCC documentation, a global register variable reserves the register entirely for use with that name in a translation unit. We don't seem to want exactly that model, but the approach you're suggesting doesn't seem to capture the semantics. For instance:

register long a asm("r12");

void f(long n) {
  n *= 3;
  a += n;
}

... could do the wrong thing if the multiplication happens to use r12.
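
The failure mode can be modelled in portable C. This is a hypothetical sketch, not how any register allocator works: `regs` is a pretend register file, `a` lives in `regs[R12]`, and `scratch` is whichever register the allocator happened to pick for the product.

```c
#include <assert.h>

enum { R11 = 11, R12 = 12, NUM_REGS = 16 };
long regs[NUM_REGS]; /* hypothetical register file; `a` lives in regs[R12] */

/* f() lowered by hand, with the product n * 3 placed in `scratch`. */
void f_lowered(long n, int scratch) {
    assert(scratch >= 0 && scratch < NUM_REGS);
    regs[scratch] = n * 3;                 /* n *= 3 */
    regs[R12] = regs[R12] + regs[scratch]; /* a += n */
}
```

With `a` equal to 10 and `n` equal to 2, choosing R11 as scratch yields the expected 16, while choosing R12 overwrites `a` first and yields 12.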

Renato Golin

Mar 27, 2014, 6:32:21 PM

to Richard Smith, LLVM Dev, Clang Dev

On 27 March 2014 21:52, Richard Smith <ric...@metafoo.co.uk> wrote:
> I don't think that works. Per the GCC documentation, a global register
> variable reserves the register entirely for use with that name in a
> translation unit. We don't seem to want exactly that model, but the approach
> you're suggesting doesn't seem to capture the semantics.

The IR in question has no mention that the register cannot be
reserved. In fact we do that already in the ARM back-end, reserving
the R9 for special purposes. We could very well reserve the register
in question to not be used. Some of us also mentioned that we should
reserve the register, and I think there's nothing stopping us from
reserving the registers on a compilation unit (module) level for
global named registers.

The problem here is that you can't reserve all registers. On ARM,
R0~R3, SP, LR and PC (and sometimes R9 or R11) cannot be fully
reserved, as they are part of the PCS/execution model, or are reserved
already. GCC docs state that "it's not safe" assuming those things
will be reserved, which is the same effect. The stack pointer can
still be safely used for reading, for example, as is the case of
unwinding.

cheers,
--renato

Renato Golin

Mar 27, 2014, 6:36:08 PM

to Krzysztof Parzyszek, LLVM Dev

On 27 March 2014 21:20, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:
> That's all fine, but I'm not sure how this supports having named register
> builtins. The problem is that once we implement a feature, it may be
> impossible to get rid of it, should it turn out to be a flop. Do we
> understand all intended and unintended consequences of implementing this?

I think you're misunderstanding... I'm not proposing named register
builtins, just named registers, like GCC.

cheers,
--renato

Krzysztof Parzyszek

Mar 27, 2014, 7:30:32 PM

to Renato Golin, LLVM Dev

On 3/27/2014 5:36 PM, Renato Golin wrote:
> On 27 March 2014 21:20, Krzysztof Parzyszek <kpar...@codeaurora.org> wrote:
>> That's all fine, but I'm not sure how this supports having named register
>> builtins. The problem is that once we implement a feature, it may be
>> impossible to get rid of it, should it turn out to be a flop. Do we
>> understand all intended and unintended consequences of implementing this?
>
> I think you're misunderstanding... I'm not proposing named register
> builtins, just named registers, like GCC.
>

Well, my main point is that as long as users can access individual
registers, the code won't be much closer to C, regardless what mechanism
they will use. Since we have to have working inline asm, I'm not sure
if having an extra set of builtins will really help a lot. By inserting
uses of physical registers in the code the user may impede register
allocation and scheduling. The user already has means to accomplish all
that and more.

-K


Rafael Espíndola

Mar 27, 2014, 10:06:42 PM

to Richard Smith, Clang Dev, LLVM Dev

> I don't think that works. Per the GCC documentation, a global register
> variable reserves the register entirely for use with that name in a
> translation unit. We don't seem to want exactly that model, but the approach
> you're suggesting doesn't seem to capture the semantics. For instance:
>
> register long a asm("r12");
> void f(long n) {
> n *= 3;
> a += n;
> }
>
> ... could do the wrong thing if the multiplication happens to use r12.

Correct. The proposed solution would work for all non allocatable
registers. We should probably still err if someone tries to use an
allocatable one.

Renato Golin

Mar 28, 2014, 5:31:07 AM

to Rafael Espíndola, Clang Dev, LLVM Dev

On 28 March 2014 02:06, Rafael Espíndola <rafael.e...@gmail.com> wrote:
> Correct. The proposed solution would work for all non allocatable
> registers. We should probably still err if someone tries to use an
> allocatable one.

AFAIK, GCC reserves the allocatable registers. If we're going to do
this we'd have to be as close as possible to the current behaviour to
avoid surprises.

--renato

Renato Golin

Mar 28, 2014, 6:05:42 AM

to Rafael Espíndola, LLVM Dev, Clang Dev

On 27 March 2014 18:02, Rafael Espíndola <rafael.e...@gmail.com> wrote:
> I guess the best would probably be a metadata string, that way we are
> sure we don't actually output anything.

Hi Rafael,

I don't get the risk here, why would that risk any output?

I can see that the @llvm.annotation intrinsics use metadata for text,
but they discard them on use. It may be simpler to get an MDNode and
convert it into text, but if the intrinsic is *always*
converted-or-fail, I don't understand how it could output any unwanted
values.

cheers,
--renato

Chandler Carruth

Mar 28, 2014, 6:17:50 AM

to Renato Golin, LLVM Dev, Clang Dev

On Fri, Mar 28, 2014 at 2:31 AM, Renato Golin <renato...@linaro.org> wrote:

> On 28 March 2014 02:06, Rafael Espíndola <rafael.e...@gmail.com> wrote:
> > Correct. The proposed solution would work for all non allocatable
> > registers. We should probably still err if someone tries to use an
> > allocatable one.
>
> AFAIK, GCC reserves the allocatable registers. If we're going to do
> this we'd have to be as close as possible to the current behaviour to
> avoid surprises.

This has been the long standing historical objection to the feature. It is a *really* invasive change to the register allocator to plumb this kind of register reservation through it. Worse, the semantics for it being inherently translation-unit based become deeply confusing in LLVM due to the potential for (partial) LTO.

Renato Golin

Mar 28, 2014, 6:39:08 AM

to Chandler Carruth, LLVM Dev, Clang Dev

On 28 March 2014 10:17, Chandler Carruth <chan...@google.com> wrote:
> This has been the long standing historical objection to the feature. It is a
> *really* invasive change to the register allocator to plumb this kind of
> register reservation through it.

Do you mean only the reserved part or the general named register idea?

About reserving registers, we already have the ability to reserve
certain registers (R9 on ARM, for instance), so it should not be too
hard to reserve arbitrary registers.

As far as I could find looking backwards, there were many more
impediments back then. The inability to represent this in IR
correctly, the idea that it should be part of the language (not an
intrinsic) thus breaking optimisation rules, the lack of inline asm
knowledge in the back-end (fixed by MC and IAS), and the lack of
register reservation mechanisms (which we have today).

So, while I appreciate there are vast historical reasons not to support
it, otherwise Mark and Behan would not have gone that far convincing
us of the __builtin_stack_pointer in the first place, I believe that
not only have the harder technical problems been fixed by now, but
the arguments towards builtins are less technically valid than the
ones supporting named registers.

> Worse, the semantics for it being
> inherently translation-unit based become deeply confusing in LLVM due to the
> potential for (partial) LTO.

I don't know LTO well enough to answer that, maybe Rafael can chime in.

But ultimately, this particular feature is draconian in itself, and
the GCC docs describe lots of cases where it's not safe to use it.
This, and the clear_cache builtins, are particularly dangerous but
necessary extensions for implementing low-level bare-metal code while
keeping the code at a manageable level (C-land).

We can only guarantee behaviour that is described in the standards we
implement. Anything else may have value but still be unsafe in certain
conditions, and in that category we have undefined behaviour,
implementation-defined behaviour and extensions. The fact that a few
things might break is no reason to avoid it, and undefined behaviour
is a particularly dangerous field that we all tread happily on a day
to day basis. Extensions shouldn't be different.

cheers,

Chandler Carruth

Mar 28, 2014, 7:16:49 AM

to Renato Golin, LLVM Dev, Clang Dev

On Fri, Mar 28, 2014 at 3:39 AM, Renato Golin <renato...@linaro.org> wrote:

> On 28 March 2014 10:17, Chandler Carruth <chan...@google.com> wrote:
>> This has been the long standing historical objection to the feature. It is a
>> *really* invasive change to the register allocator to plumb this kind of
>> register reservation through it.
>
> Do you mean only the reserved part or the general named register idea?

Just the reserved part.

> About reserving registers, we already have the ability to reserve
> certain registers (R9 on ARM, for instance), so it should not be too
> hard to reserve arbitrary registers.

I don't understand how taking registers out of the allocation set within the innards of the target definition itself is really comparable to making some functions register allocate with one set of registers and other functions register allocate with a different set of registers. I don't think this example really means anything.

> As far as I could find looking backwards, there were many more
> impediments back then. The inability to represent this in IR
> correctly, the idea that it should be part of the language (not an
> intrinsic) thus breaking optimisation rules, the lack of inline asm
> knowledge in the back-end (fixed by MC and IAS), and the lack of
> register reservation mechanisms (which we have today).

I don't understand this paragraph. It seems wrong.

- We have never had (and still don't have) a representation for this in the IR. But that's OK, the whole point has always been that such a representation would be invented.

- I don't recall anyone caring about "language" versus intrinsics.

- We have always had knowledge of the important part of inline asm: the constraints and clobbers. Neither MC nor IAS is relevant here.

- We don't have a generic register reservation mechanism today.

If you don't believe the last part, look at how many bugs and how much time it has taken for developers to try to support the frame pointer register usage on x86. That is actually the closest I know of to register reservation, and it has been an endless source of complexity and bugs despite not even needing to interact with the actual input code outside of inline assembly constraints and clobbers.

>> Worse, the semantics for it being
>> inherently translation-unit based become deeply confusing in LLVM due to the
>> potential for (partial) LTO.
>
> I don't know LTO well enough to answer that, maybe Rafael can chime in.
>
> But ultimately, this particular feature is draconian in itself, and
> the GCC docs describe lots of cases where it's not safe to use it.

And these seem like excellent reasons to not implement the dangerous feature and instead to provide a significantly safer feature and direct users either to not do the dangerous things, or if they are trying to do the safe thing, use the feature which was designed for it.

Also, to reiterate, this email is about reserving allocatable registers, not necessarily about every other way you might design global named registers, such as restricting them to unallocatable registers. I still think that is a silly way of representing things, but I don't have some deep concern over its complexity. I *do* have deep concerns over the complexity of making the register set essentially parameterized by user code. I think that is madness.

> This, and the clear_cache builtins, are particularly dangerous but
> necessary extensions for implementing low-level bare-metal code while
> keeping the code at a manageable level (C-land).

Global named register variables which reserve allocatable registers are not necessary for anything. Case in point, multiple operating systems today can be built entirely using a compiler which doesn't support them, down to and including their kernels. But I think you're going overboard trying to sell the importance of the general area when the objections are regarding specific aspects of the implementation. No one is arguing that we shouldn't support the concrete known use cases.

Renato Golin

Mar 28, 2014, 7:50:28 AM

to Chandler Carruth, LLVM Dev, Clang Dev

On 28 March 2014 11:16, Chandler Carruth <chan...@google.com> wrote:
> Just the reserved part.

Ok, in that case, I share your concerns.

We could easily only implement the reserved ones (stack pointer being
the case in hand). If there is any mad reason why allocatable ones
should be used (I heard glibc uses R8 for some special things, haven't
confirmed myself), we can discuss this topic later.

> I don't understand how taking registers out of the allocation set within the
> innards of the target definition itself is really comparable to making some
> functions register allocate with one set of registers and other functions
> register allocate with a different set of registers. I don't think this
> example really means anything.

The way the ARM back-end does it is to treat R9 as a special register
from birth, like the frame pointer, and use it to calculate register
pressure and to allow it to be allocated or not.

I agree this is not the same as doing it with *any* register, but it
shouldn't be particularly hard to add isReserved(Reg), add it to the
Reserved list, and avoid it during allocation on a per-module (not
function) granularity. We'd have to do that on all targets, yes, it
won't be easy, but it should be doable and the register allocator
already respects a restricted list of reserved registers dynamically.

That's not to say that I'm willing to do it now. I agree we should
start with the half-sane implementation of already reserved registers.

> I don't understand this paragraph. It seems wrong.

It probably is... ;)

> - We have never had (and still don't have) a representation for this in the IR.
> But that's OK, the whole point has always been that such a representation
> would be invented.
> - I don't recall anyone caring about "language" versus intrinsics.

So, from my defective memory, I remember people proposing to add a
`register` keyword for global variables with some metadata to identify
which register, and the loads and stores would have to behave
differently (similar to volatile, but with a specific register
attached). I don't remember when, or who, or where; it could have been
at an LLVM dev meeting. That was not a good idea.

Using intrinsics, we don't need to create the global variables at all,
and all reads and writes can be mapped to intrinsics, making this
change a lot simpler with zero changes to the IR except the creation
of two new intrinsics.
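
As a minimal sketch of what this looks like from the C side, the snippet below declares a GNU-style global named register variable for the stack pointer and reads it. The names `SP_NAME`, `HAVE_NAMED_SP`, `current_sp` and `read_stack_pointer` are made up for illustration, and the fallback branch (the address of a local) is only an approximation for targets not listed. Under the intrinsic approach described above, a front-end could turn each read of `current_sp` into a call to a read-register intrinsic carrying the register name, so no global storage would be created.

```c
#include <assert.h>
#include <stdint.h>

/* Stack pointer register name for a few common targets (illustrative list). */
#if defined(__GNUC__) && defined(__x86_64__)
#define SP_NAME "rsp"
#elif defined(__GNUC__) && defined(__i386__)
#define SP_NAME "esp"
#elif defined(__GNUC__) && (defined(__arm__) || defined(__aarch64__))
#define SP_NAME "sp"
#endif

#ifdef SP_NAME
#define HAVE_NAMED_SP 1
/* GNU named register variable: an alias for the register, not a memory object. */
register uintptr_t current_sp __asm__(SP_NAME);
#endif

/* Returns the current stack pointer, or an approximation on unknown targets. */
static inline uintptr_t read_stack_pointer(void)
{
#ifdef HAVE_NAMED_SP
    uintptr_t sp = current_sp;          /* a plain read of the register */
#else
    volatile char marker = 0;           /* fallback: a local lives near the stack top */
    uintptr_t sp = (uintptr_t)&marker;
#endif
    assert(sp != 0);                    /* sanity check: a running thread has a stack */
    return sp;
}
```

Because the stack pointer is already a reserved register, this use does not require the register allocator to reserve anything new.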

> - We have always had knowledge of the important part of inline asm: the
> constraints and clobbers. Neither MC nor IAS is relevant here.

What I meant here is that the MC layer allowed us to interpret inline
assembly more thoroughly and add checks (that we do now for textual
output as well, as you know).

One of the uses of named registers is to pinpoint inline asm variables
to specific registers (for instance to mark the stack pointer as
clobbered during an MRC call), and in the past, we couldn't guarantee
it because we didn't know what the inline asm had inside, as it was
just opaque text. Now we can warn users if they're using it wrong, or
if there's any danger of clobbering the wrong registers, because we
have that knowledge in the MC layer.
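
A small sketch of that use, assuming a GNU-compatible compiler: a local register variable pins an inline asm operand to a chosen register. The function name `increment_in_pinned_register` and the choice of `rcx` / `x9` are arbitrary illustrations, and other targets fall back to plain C.

```c
#include <assert.h>
#include <limits.h>

/* Adds one to x; on known targets the asm operand is pinned to a named register. */
static long increment_in_pinned_register(long x)
{
    assert(x < LONG_MAX);                       /* avoid signed overflow */
#if defined(__GNUC__) && defined(__x86_64__)
    register long value __asm__("rcx") = x;     /* operand must live in rcx */
    __asm__ volatile("addq $1, %0" : "+r"(value) : : "cc");
    return value;
#elif defined(__GNUC__) && defined(__aarch64__)
    register long value __asm__("x9") = x;      /* operand must live in x9 */
    __asm__ volatile("add %0, %0, #1" : "+r"(value));
    return value;
#else
    return x + 1;                               /* portable fallback, no pinning */
#endif
}
```

The compiler still only sees the constraints and clobbers; the pinning merely fixes which register satisfies the "r" constraint.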

Makes sense?

> - We don't have a generic register reservation mechanism today.

Well, TargetRegisterInfo::getAllocatableSet() queries the target's
getReservedRegs(), which could be made to take named register globals
into account. Since they're module globals, this could be done
once per compilation job, making it a lot simpler.
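
To make the idea concrete, here is a toy model in C, not LLVM code: register sets are bitmasks, the reserved set is computed once per module from the target's fixed registers plus any registers named by globals, and allocation only picks from what remains. Every name here (`RegSet`, `reserved_regs`, `allocatable_set`, `allocate_register`, the 16-register file) is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 16 };             /* hypothetical register file size */
typedef uint32_t RegSet;            /* bit i set means register i is in the set */

static RegSet reg_bit(unsigned reg)
{
    assert(reg < NUM_REGS);
    return (RegSet)1u << reg;
}

/* Reserved = target-fixed registers plus registers named by module globals. */
static RegSet reserved_regs(RegSet target_fixed, const unsigned *named, unsigned count)
{
    RegSet reserved = target_fixed;
    for (unsigned i = 0; i < count; ++i)
        reserved |= reg_bit(named[i]);
    return reserved;
}

/* Allocatable = every register that is not reserved. */
static RegSet allocatable_set(RegSet reserved)
{
    return ~reserved & (((RegSet)1u << NUM_REGS) - 1u);
}

static int is_reserved(RegSet reserved, unsigned reg)
{
    return (reserved & reg_bit(reg)) != 0;
}

/* Returns the lowest allocatable register not in use, or -1 if none is free. */
static int allocate_register(RegSet allocatable, RegSet in_use)
{
    RegSet free_regs = allocatable & ~in_use;
    for (unsigned reg = 0; reg < NUM_REGS; ++reg)
        if (free_regs & reg_bit(reg))
            return (int)reg;
    return -1;
}
```

The model deliberately ignores the hard parts raised in this thread, such as functions compiled with different reserved sets and LTO merging modules that disagree.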

> If you don't believe the last part...

I didn't say it would be simple... ;)

And I agree with you that we should not do it "just because". That's
where the technical reasons for not implementing it trump the reasons
for having it as a feature.

> And these seem like excellent reasons to not implement the dangerous feature
> and instead to provide a significantly safer feature and direct users either
> to not do the dangerous things, or if they are trying to do the safe thing,
> use the feature which was designed for it.

The point here is that __builtin_stack_pointer doesn't provide enough
additional value to be worth deviating from the norm.

True, you can only use the stack pointer and not allocatable
registers, and you don't have to name the register, so it's slightly
more target-independent, but those were the only reasons. Named
registers have existed (and will exist) for ages, and people who use
them know the constraints, which are reportedly not an issue.

We could add a warning if the user defines any allocatable register,
saying that the register will not be reserved...

Renato Golin

Mar 28, 2014, 8:11:43 AM

to Rafael Espíndola, LLVM Dev, Clang Dev

On 28 March 2014 10:05, Renato Golin <renato...@linaro.org> wrote:
> On 27 March 2014 18:02, Rafael Espíndola <rafael.e...@gmail.com> wrote:
>> I guess the best would probably be a metadata string, that way we are
>> sure we don't actually output anything.
>
> Hi Rafael,
>
> I don't get the risk here, why would that risk any output?

Ignore me. That was meant for the global variable, not the string on
the intrinsic.

Richard Smith

Apr 2, 2014, 6:35:32 PM

to Renato Golin, Clang Dev, LLVM Dev

On Fri, Mar 28, 2014 at 4:50 AM, Renato Golin <renato...@linaro.org> wrote:

> On 28 March 2014 11:16, Chandler Carruth <chan...@google.com> wrote:
>> Just the reserved part.
>
> Ok, in that case, I share your concerns.
>
> We could easily only implement the reserved ones (stack pointer being
> the case in hand). If there is any mad reason why allocatable ones
> should be used (I heard glibc uses R8 for some special things, haven't
> confirmed myself), we can discuss this topic later.

I'm a bit confused by this. The GCC documentation makes it pretty clear that *only* the allocatable registers are suitable for use as global register variables (and that things like the stack pointer make no sense here). From the doc:

"Defining a global register variable in a certain register reserves that register entirely for this use, at least within the current compilation. The register is not allocated for any other purpose in the functions in the current compilation, and is not saved and restored by these functions. Stores into this register are never deleted even if they appear to be dead, but references may be deleted or moved or simplified."

Obviously it's not possible to reserve the stack pointer register entirely for use as a global register variable, and any attempt to do so would be abusing the mechanism.

Is there some other specification that you guys are working from, or are you basing this on observations of GCC's actual behavior?

Renato Golin

Apr 2, 2014, 6:54:19 PM

to Richard Smith, Clang Dev, LLVM Dev

On 2 April 2014 23:35, Richard Smith <ric...@metafoo.co.uk> wrote:
> I'm a bit confused by this. The GCC documentation makes it pretty clear that
> *only* the allocatable registers are suitable for use as global register
> variables (and that things like the stack pointer make no sense here). From
> the doc:

That document is terribly confusing...

> Obviously it's not possible to reserve the stack pointer register entirely
> for use as a global register variable, and any attempt to do so would be
> abusing the mechanism.

Absolutely! We just can't do it.

> Is there some other specification that you guys are working from, or are you
> basing this on observations of GCC's actual behavior?

Observations and discussions in the GCC mailing list, unfortunately.
Most of the GNU extensions are either badly documented or completely
undocumented, and that's yet-another example of both!

About the implementation, I'd like to make clear that
__builtin_stack_pointer can still exist as an alias for
read_register/write_register. We can even create a special name in the
IR, "stack_pointer", so that each back end can resolve it and the
front-end is not forced to know what the stack pointer is on each
platform. In a way, I'm implementing both of them at the same time.

So, we'll start with the non-allocatable ones because that's what the
original idea (the builtin) implements, and because that's the problem
we have right now. After this is in, maybe even with an additional
builtin, we can think about the allocatable ones.
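
The "special name" idea can be sketched as a tiny lookup, written here as standalone C rather than real back-end code. The `Target` enum, the table and `resolve_register_name` are hypothetical; only the register names themselves ("rsp" on x86-64, "sp" on ARM and AArch64) are facts about those targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical target list, for illustration only. */
typedef enum { TARGET_X86_64, TARGET_ARM, TARGET_AARCH64, TARGET_COUNT } Target;

/* Each back end knows its own stack pointer register name. */
static const char *const stack_pointer_names[TARGET_COUNT] = { "rsp", "sp", "sp" };

/* Maps the generic name "stack_pointer" to the target's register;
   any other name is assumed to be target-specific and passed through. */
static const char *resolve_register_name(Target target, const char *name)
{
    assert(target < TARGET_COUNT && name != NULL);
    if (strcmp(name, "stack_pointer") == 0)
        return stack_pointer_names[target];
    return name;
}
```

With such a mapping, a builtin like __builtin_stack_pointer would only need to emit the generic name, and code naming a register explicitly would keep working unchanged.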

Richard Sandiford

Apr 7, 2014, 11:16:33 AM

to Richard Smith, Clang Dev, LLVM Dev

It might not matter, but MIPS linux is one case where a non-allocatable
register is used as a true global variable, rather than as a convenient
way of getting the value of a fixed-purpose register. The register used
there is $gp, which on bare-metal is usually reserved for small-data and
(in theory) GOT accesses. Since linux knows it doesn't use either,
it instead uses $gp to hold information about the current thread.

I realise that's probably an unusual case though.

Thanks,
Richard

Renato Golin

Apr 9, 2014, 10:03:09 AM

to Richard Smith, Renato Golin, Clang Dev, LLVM Dev, Richard Sandiford

On 7 April 2014 12:16, Richard Sandiford <rsan...@linux.vnet.ibm.com> wrote:
> It might not matter, but MIPS linux is one case where a non-allocatable
> register is used as a true global variable, rather than as a convenient
> way of getting the value of a fixed-purpose register. The register used
> there is $gp, which on bare-metal is usually reserved for small-data and
> (in theory) GOT accesses. Since linux knows it doesn't use either,
> it instead uses $gp to hold information about the current thread.

Hi Richard,

I think all cases for named registers are unusual, and that's the
point. I think this is a perfectly valid case for using named
registers, since there is no way in C to represent this logic and bare
metal code might make heavy use of that register's value.

cheers,
--renato

Renato Golin

Apr 27, 2014, 4:41:28 PM

to Edward Z. Yang, Clang Dev, LLVM Dev

On 19 April 2014 22:15, Edward Z. Yang <ezy...@mit.edu> wrote:
> Like the kernel, we use register variables to store thread-local state
> in our garbage collector [1]; on Clang, the lack of this capability
> costs us 30% in overall runtime of all programs. We can't use inline
> assembly, because that would require us rewriting our entire garbage
> collector (i.e. all code which uses the thread-local state) in inline
> assembly, but this is all code that should be written in C.

Hi Edward,

This is another good reason to have named registers implemented. Many
people replied to my original email with "write inline assembly
instead", but there are cases where doing so would make the project *a
lot* more complex.

> Perhaps this is simply an advertisement for Clang to optimize __thread
> variables in some way further; certainly this would solve our (and other
> folks) problem.

You can certainly propose an optimization that would be beneficial,
and if shown to be safe, you or someone else could implement it in
LLVM or Clang.
