[llvm-dev] RFC: Implementing the Swift calling convention in LLVM and Clang


John McCall via llvm-dev

Mar 1, 2016, 8:14:48 PM
to llvm-dev, Clang Dev, Joe Pamer
Hi, all.

Swift uses a non-standard calling convention on its supported platforms. Implementing this calling convention requires support from LLVM and (to a lesser degree) Clang. If necessary, we’re willing to keep that support in “private” branches of LLVM and Clang, but we feel it would be better to introduce it in trunk, both to (1) minimize the differences between our branches and trunk and (2) allow other language implementations to take advantage of that support.

We don’t expect this to be particularly controversial, at least in the abstract, since LLVM already includes support for a number of variant, language-specific calling conventions. Some of Swift's variations are more invasive than those existing conventions, however, so we want to make sure the community is on board before we start landing patches or sending them out for review.

Here’s a brief technical summary of the convention:

In general, the calling convention lowers onto an existing C calling convention; let’s call this the “intermediary convention”. The intermediary convention is not necessarily the target platform’s standard C convention; for example, we intend to use a VFP convention on iOS ARM targets. Aggregate arguments and results are translated to sequences of scalar types (possibly just an indirect argument/sret pointer) and, for the most part, passed and returned using the intermediary convention’s rules for a function with that signature. For example, if struct A expands to the sequence [i32,float,i32], a function type like (A, Int64) -> Bool would be lowered basically like the C function type bool(*)(int32_t, float, int32_t, int64_t).

There are four general points of deviation from the intermediary convention:

- We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For example, we want to return a value of struct A, above, purely in registers. For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention. But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.

- We sometimes have both direct results and indirect results. It would be nice to take advantage of the sret convention even in the presence of direct results on targets that do use a different (profitable) ABI treatment for it. I don’t know how well-supported this is in LLVM.

- We want a special “context” treatment for a certain argument. A pointer-sized value is passed in an integer register; the same value should be present in that register after the call. In some cases, the caller may pass a context argument to a function that doesn’t expect one, and this should not trigger undefined behavior. Both of these rules suggest that the context argument be passed in a register which is normally callee-save.

- We want a special “error” treatment for a certain argument/result. A pointer-sized value is passed in an integer register; a different value may be present in that register after the call. Much like the context treatment, the caller may use the error treatment with a function that doesn’t expect it; this should not trigger undefined behavior, and the existing value should be left in place. Like the context treatment, this suggests that the error value be passed and returned in a register which is normally callee-save.

Here’s a brief summary of the expected code impact for this.

The Clang impact is relatively minor; it is focused on allowing the Swift runtime to define functions that use the convention. It adds a new calling convention attribute, a few new parameter attributes constrained to that calling convention, and some relatively un-invasive call lowering code in IR generation.

The LLVM impact is somewhat larger.

Three things in the convention require a possible change to IR:

- Using sret together with a direct result may or may not “just work”. I certainly don’t see a reason why it shouldn’t work in the middle-end. Obviously, some targets can’t support it, but we can avoid doing this on those targets.

- Opting in to the two argument treatments requires new parameter attributes. We discussed using separate calling conventions; unfortunately, error and context arguments can appear either separately or together, so we’d really need several new conventions for all the valid combinations. Furthermore, calling a context-free function with an ignored context argument could turn into a call to a function using a mismatched calling convention, which LLVM IR generally treats as undefined behavior. Also, it wasn’t obvious that just a calling convention would be sufficient for the error treatment; see the next bullet.

- The “error” treatment requires some way to (1) pass and receive the value in the caller and (2) receive and change the value in the callee. The best way we could think of to represent this was to pretend that the argument is actually passed indirectly; the value is “passed” by storing to the pointer and “received” by loading from it. To simplify backend lowering, we require the argument to be a special kind of swifterror alloca that can only be loaded, stored, and passed as a swifterror argument; in the callee, swifterror arguments have similar restrictions. This ends up being fairly invasive in the backend, unfortunately.

The convention also requires a few changes to the targets that support the convention, to deal with the context and error treatments and to return more values in registers.

Anyway, I would appreciate your thoughts.

John.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Tim Northover via llvm-dev

Mar 2, 2016, 12:38:20 AM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
It's probably worth noting that the likely diffs involved can be
inferred from the "upstream-with-swift" branch at
"g...@github.com:apple/swift-llvm.git", which is a fairly regularly
merged copy of trunk with the swift changes.

I'm steering well clear of the policy decision.

Tim.

Renato Golin via llvm-dev

Mar 2, 2016, 4:34:37 AM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On 2 March 2016 at 01:14, John McCall via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Hi, all.

> - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For example, we want to return a value of struct A, above, purely in registers. For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention. But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.

Is this a back-end decision, or do you expect the front-end to tell
the back-end (via annotation) which parameters will be in regs? Unless
you also have back-end patches, I don't think the latter is going to
work well. For example, the ARM back-end has a huge section related to
passing structures in registers, which conforms to the ARM EABI, not
necessarily your Swift ABI.

Not to mention that this creates the versioning problem, where two
different LLVM releases can produce slightly different PCS register
usage (due to new features or bugs), and thus require re-compilation
of all libraries. This, however, is not a problem for your current
request, just a comment.


> - We sometimes have both direct results and indirect results. It would be nice to take advantage of the sret convention even in the presence of direct results on targets that do use a different (profitable) ABI treatment for it. I don’t know how well-supported this is in LLVM.

I'm not sure what you mean by direct or indirect results here. But if
this is a language feature, as long as the IR semantics are correct, I
don't see any problem.


> - We want a special “context” treatment for a certain argument. A pointer-sized value is passed in an integer register; the same value should be present in that register after the call. In some cases, the caller may pass a context argument to a function that doesn’t expect one, and this should not trigger undefined behavior. Both of these rules suggest that the context argument be passed in a register which is normally callee-save.

I think it's going to be harder to get all opts to behave in the way
you want them to. And may also require back-end changes to make sure
those registers are saved in the right frame, or reserved from
register allocation, or popped back after the call, etc.


> The Clang impact is relatively minor; it is focused on allowing the Swift runtime to define functions that use the convention. It adds a new calling convention attribute, a few new parameter attributes constrained to that calling convention, and some relatively un-invasive call lowering code in IR generation.

This sounds like a normal change to support language perks, no big
deal. But I'm not a Clang expert, nor have I seen the code.


> - Using sret together with a direct result may or may not “just work". I certainly don’t see a reason why it shouldn’t work in the middle-end. Obviously, some targets can’t support it, but we can avoid doing this on those targets.

All sret problems I've seen were back-end related (ABI conformance).
But I wasn't paying attention to the middle-end.


> - Opting in to the two argument treatments requires new parameter attributes. We discussed using separate calling conventions; unfortunately, error and context arguments can appear either separately or together, so we’d really need several new conventions for all the valid combinations. Furthermore, calling a context-free function with an ignored context argument could turn into a call to a function using a mismatched calling convention, which LLVM IR generally treats as undefined behavior. Also, it wasn’t obvious that just a calling convention would be sufficient for the error treatment; see the next bullet.

Why not treat context and error like C's default arguments? Or like
named arguments in Python?

Surely the front-end can easily re-order the arguments (according to
some ABI) and make sure every function that may be called with
context/error has them as the last arguments, defaulting them to null.
You can then later do an inter-procedural pass to clean it up for all
static functions that are never called with those arguments, etc.


> - The “error” treatment requires some way to (1) pass and receive the value in the caller and (2) receive and change the value in the callee. The best way we could think of to represent this was to pretend that the argument is actually passed indirectly; the value is “passed” by storing to the pointer and “received” by loading from it. To simplify backend lowering, we require the argument to be a special kind of swifterror alloca that can only be loaded, stored, and passed as a swifterror argument; in the callee, swifterror arguments have similar restrictions. This ends up being fairly invasive in the backend, unfortunately.

I think this logic is too high-level for the back-end to deal with.
This looks like a run-of-the-mill pointer argument that can be
null (and is by default); if it's not, the callee can change the
object pointed to but not the pointer itself, ie, "void foo(exception
* const Error = null)". I don't understand why you need this argument
to be a special kind of SDNode.

cheers,
--renato

Bruce Hoult via llvm-dev

Mar 2, 2016, 6:41:16 AM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
I've done the "environment passed in a callee-save register" thing before, just by using the C compiler's ability to reserve a register and map a particular C global variable to it.

As you say, there is then no problem when you call code (let's call it "library code") which doesn't expect it. The library code just automatically saves and restores that register if it needs it.

However -- and probably you've thought of this -- there is a problem with callbacks from library code that doesn't know about the environment argument. The library might save the environment register, put something else there, and then call back to your code that expects the environment to be set up. Boom!

This callback might be a function that you passed explicitly as an argument, a function pointed to by a global hook, or a virtual function of an object you passed (derived from a base class that the library knows about).

Any such callbacks need to either 1) not use the environment register, or 2) set up the environment register from somewhere else before using it or calling other code that uses it, or 3) be a wrapper/thunk that sets up the environment register before calling the real function.

John McCall via llvm-dev

Mar 2, 2016, 12:11:17 PM
to Bruce Hoult, llvm-dev, Joe Pamer, Clang Dev
On Mar 2, 2016, at 3:41 AM, Bruce Hoult <br...@hoult.org> wrote:

I've done the "environment passed in a callee-save register" thing before, just by using the C compiler's ability to reserve a register and map a particular C global variable to it.

As you say, there is then no problem when you call code (let's call it "library code") which doesn't expect it. The library code just automatically saves and restores that register if it needs it.

However -- and probably you've thought of this -- there is a problem with callbacks from library code that doesn't know about the environment argument. The library might save the environment register, put something else there, and then call back to your code that expects the environment to be set up. Boom!

Yes, that’s a well-known problem with trying to reserve a register for a ubiquitous environment.  That’s not what we’re doing here, though.  The error result is more like a special argument / result of the function, inasmuch as the value is only actually required to be in that register at the call boundary.

Swift uses a non-zero-cost exceptions scheme for its primary error handling; this result is used to indicate whether (and what) a function throws.  The basic idea is that the callee sets the register to either null, meaning it didn’t throw, or an error value, meaning it did; except actually it’s better for code size if the caller sets the register to null on entry.

The considerations on the choice of register are as follows:

1. We want to be able to freely convert an opaque function value that’s known not to throw to a function that can.  The idea here is that the caller initializes the register to null.  If the callee is dynamically potentially-throwing, it expects the register to be null on entry and may or may not set it to be some other value.  If the callee is not dynamically potentially-throwing, it leaves the register alone because it considers it to be callee-save.  So there’s a hard requirement that whatever register we choose be considered callee-save by the Swift convention.

2. Swift code frequently calls C code.  It’s good for performance and code size if a function doesn’t have to save and restore the error-result register just because it’s calling a C function.  So we really want it to be callee-save in the C convention, too.  This also means that we don’t have to worry about the dynamic linker messing us up.

3. We don’t want to penalize other Swift functions by claiming an argument/result register that they would otherwise use.  Of course, they wouldn’t normally use a callee-save register.

John.

John McCall via llvm-dev

Mar 2, 2016, 1:48:55 PM
to Renato Golin, llvm-dev, Joe Pamer, Clang Dev
> On Mar 2, 2016, at 1:33 AM, Renato Golin <renato...@linaro.org> wrote:
>
> On 2 March 2016 at 01:14, John McCall via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>> Hi, all.
>> - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers. For example, we want to return a value of struct A, above, purely in registers. For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention. But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.
>
> Is this a back-end decision, or do you expect the front-end to tell
> the back-end (via annotation) which parameters will be in regs? Unless
> you also have back-end patches, I don't think the latter is going to
> work well. For example, the ARM back-end has a huge section related to
> passing structures in registers, which conforms to the ARM EABI, not
> necessarily your Swift ABI.
>
> Not to mention that this creates the versioning problem, where two
> different LLVM releases can produce slightly different PCS register
> usage (due to new features or bugs), and thus require re-compilation
> of all libraries. This, however, is not a problem for your current
> request, just a comment.

The frontend will not tell the backend explicitly which parameters will be
in registers; it will just pass a bunch of independent scalar values, and
the backend will assign them to registers or the stack as appropriate.

Our intent is to completely bypass all of the passing-structures-in-registers
code in the backend by simply not exposing the backend to any parameters
of aggregate type. The frontend will turn a struct into (say) an i32, a float,
and an i8; if the first two get passed in registers and the last gets passed
on the stack, so be it.

The only difficulty with this plan is that, when we have multiple results, we
don’t have a choice but to return a struct type. To the extent that backends
try to infer that the function actually needs to be sret, instead of just trying
to find a way to return all the components of the struct type in appropriate
registers, that will be sub-optimal for us. If that’s a pervasive problem, then
we probably just need to introduce a swift calling convention in LLVM.

>> - We sometimes have both direct results and indirect results. It would be nice to take advantage of the sret convention even in the presence of direct results on targets that do use a different (profitable) ABI treatment for it. I don’t know how well-supported this is in LLVM.
>
> I'm not sure what you mean by direct or indirect results here. But if
> this is a language feature, as long as the IR semantics is correct, I
> don't see any problem.

A direct result is something that’s returned in registers. An indirect
result is something that’s returned by storing it in an implicit out-parameter.
I would like to be able to form calls like this:

%temp = alloca %my_big_struct_type
call i32 @my_swift_function(sret %my_big_struct_type* %temp)

This doesn’t normally happen today in LLVM IR because when C frontends
use an sret result, they set the direct IR result to void.

Like I said, I don’t think this is a serious problem, but I wanted to float the idea
before assuming that.

>> - We want a special “context” treatment for a certain argument. A pointer-sized value is passed in an integer register; the same value should be present in that register after the call. In some cases, the caller may pass a context argument to a function that doesn’t expect one, and this should not trigger undefined behavior. Both of these rules suggest that the context argument be passed in a register which is normally callee-save.
>
> I think it's going to be harder to get all opts to behave in the way
> you want them to. And may also require back-end changes to make sure
> those registers are saved in the right frame, or reserved from
> register allocation, or popped back after the call, etc.

I don’t expect the optimizer to be a problem, but I just realized that the main
reason is something I didn’t talk about in my first post. See below.

That this will require some support from the backend is a given.

>> The Clang impact is relatively minor; it is focused on allowing the Swift runtime to define functions that use the convention. It adds a new calling convention attribute, a few new parameter attributes constrained to that calling convention, and some relatively un-invasive call lowering code in IR generation.
>
> This sounds like a normal change to support language perks, no big
> deal. But I'm not a Clang expert, nor I've seen the code.
>
>
>> - Using sret together with a direct result may or may not “just work". I certainly don’t see a reason why it shouldn’t work in the middle-end. Obviously, some targets can’t support it, but we can avoid doing this on those targets.
>
> All sret problems I've seen were back-end related (ABI conformance).
> But I wasn't paying attention to the middle-end.
>
>
>> - Opting in to the two argument treatments requires new parameter attributes. We discussed using separate calling conventions; unfortunately, error and context arguments can appear either separately or together, so we’d really need several new conventions for all the valid combinations. Furthermore, calling a context-free function with an ignored context argument could turn into a call to a function using a mismatched calling convention, which LLVM IR generally treats as undefined behavior. Also, it wasn’t obvious that just a calling convention would be sufficient for the error treatment; see the next bullet.
>
> Why not treat context and error like C's default arguments? Or like
> named arguments in Python?

>
> Surely the front-end can easily re-order the arguments (according to
> some ABI) and make sure every function that may be called with
> context/error has it as the last arguments, and default them to null.
> You can then later do an inter-procedural pass to clean it up for all
> static functions that are never called with those arguments, etc.

Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges
these arguments to the end, which means the optimizer’s default behavior
of silently dropping extra call arguments ends up doing the right thing.

I’m reluctant to say that the convention always requires these arguments.
If we have to do that, we can, but I’d rather not; it would involve generating
a lot of unnecessary IR and would probably create unnecessary
code-generation differences, and I don’t think it would be sufficient for
error results anyway.

>> - The “error” treatment requires some way to (1) pass and receive the value in the caller and (2) receive and change the value in the callee. The best way we could think of to represent this was to pretend that the argument is actually passed indirectly; the value is “passed” by storing to the pointer and “received” by loading from it. To simplify backend lowering, we require the argument to be a special kind of swifterror alloca that can only be loaded, stored, and passed as a swifterror argument; in the callee, swifterror arguments have similar restrictions. This ends up being fairly invasive in the backend, unfortunately.
>
> I think this logic is too high-level for the back-end to deal with.
> This looks like a simple run of the mill pointer argument that can be
> null (and is by default), but if it's not, the callee can change the
> object pointed by but not the pointer itself, ie, "void foo(exception
> * const Error = null)". I don't understand why you need this argument
> to be of a special kind of SDNode.

We don’t want checking or setting the error result to actually involve memory
access.

An alternative to the pseudo-indirect-result approach would be to model
the result as an explicit result. That would really mess up the IR, though.
The ability to call a non-throwing function as a throwing function means
we’d have to provide this extra explicit result on every single function with
the Swift convention, because the optimizer is definitely not going to
gracefully handle result-type mismatches; so even a function as simple as
func foo() -> Int32
would have to be lowered into IR as
define { i32, i8* } @foo(i8*)

John.

John McCall via llvm-dev

Mar 2, 2016, 2:02:25 PM
to John McCall, llvm-dev, Clang Dev, Joe Pamer

Also, just a quick question. I’m happy to continue to talk about the actual
design and implementation of LLVM IR on this point, and I’d be happy to
put out the actual patch we’re initially proposing. Obviously, all of this code
needs to go through the normal LLVM/Clang code review processes. But
before we continue with that, I just want to clarify one important point: assuming
that the actual implementation ends up satisfying your technical requirements,
do you have any objections to the general idea of supporting the Swift CC
in mainline LLVM?

Renato Golin via llvm-dev

Mar 2, 2016, 2:05:37 PM
to John McCall, llvm-dev, Clang Dev, Joe Pamer
On 2 March 2016 at 19:01, John McCall <rjmc...@apple.com> wrote:
> Also, just a quick question. I’m happy to continue to talk about the actual
> design and implementation of LLVM IR on this point, and I’d be happy to
> put out the actual patch we’re initially proposing. Obviously, all of this code
> needs to go through the normal LLVM/Clang code review processes. But
> before we continue with that, I just want to clarify one important point: assuming
> that the actual implementation ends up satisfying your technical requirements,
> do you have any objections to the general idea of supporting the Swift CC
> in mainline LLVM?

I personally don't. I think we should treat Swift as any other
language that we support, and if we can't use existing mechanisms in
the back-end to lower Swift, then we need to expand the back-end to
support that.

That being said, if the Swift support starts to bit-rot (if, for
instance, Apple stops supporting it in the future), it will be harder
to clean up the back-end from its CC. But that, IMHO, is a very
far-fetched future and a small price to pay.

cheers,
--renato

John McCall via llvm-dev

Mar 2, 2016, 2:09:26 PM
to Renato Golin, llvm-dev, Clang Dev, Joe Pamer
> On Mar 2, 2016, at 11:04 AM, Renato Golin <renato...@linaro.org> wrote:
> On 2 March 2016 at 19:01, John McCall <rjmc...@apple.com> wrote:
>> Also, just a quick question. I’m happy to continue to talk about the actual
>> design and implementation of LLVM IR on this point, and I’d be happy to
>> put out the actual patch we’re initially proposing. Obviously, all of this code
>> needs to go through the normal LLVM/Clang code review processes. But
>> before we continue with that, I just want to clarify one important point: assuming
>> that the actual implementation ends up satisfying your technical requirements,
>> do you have any objections to the general idea of supporting the Swift CC
>> in mainline LLVM?
>
> I personally don't. I think we should treat Swift as any other
> language that we support, and if we can't use existing mechanisms in
> the back-end to lower Swift, then we need to expand the back-end to
> support that.
>
> That being said, if the Swift support starts to bit-rot (if, for
> instance, Apple stops supporting it in the future), it will be harder
> to clean up the back-end from its CC. But that, IMHO, is a very
> far-fetched future and a small price to pay.

Okay, thank you. Back to technical discussion. :)

John.

David Chisnall via llvm-dev

Mar 2, 2016, 2:10:24 PM
to Renato Golin, llvm-dev, Clang Dev
On 2 Mar 2016, at 19:04, Renato Golin via cfe-dev <cfe...@lists.llvm.org> wrote:
>
> I personally don't. I think we should treat Swift as any other
> language that we support, and if we can't use existing mechanisms in
> the back-end to lower Swift, then we need to expand the back-end to
> support that.
>
> That being said, if the Swift support starts to bit-rot (if, for
> instance, Apple stops supporting it in the future), it will be harder
> to clean up the back-end from its CC. But that, IMHO, is a very
> far-fetched future and a small price to pay.

The Swift calling model also seems to be quite generally useful. I can imagine that VMKit would have used it, if it had been available then.

My only concern is that the implicit contract between the front and back ends, with regard to calling convention, is already complex, already largely undocumented, and already difficult to infer even if you have the relevant platform ABI document in front of you. It is badly in need of some cleanup and I wonder if the desire to minimise diffs for Swift might provide Apple with some incentive to spend a little bit of engineering effort on it?

David

Renato Golin via llvm-dev

Mar 2, 2016, 2:11:51 PM
to David Chisnall, llvm-dev, Clang Dev
On 2 March 2016 at 19:09, David Chisnall <David.C...@cl.cam.ac.uk> wrote:
> It is badly in need of some cleanup and I wonder if the desire to minimise diffs for Swift might provide Apple with some incentive to spend a little bit of engineering effort on it?

That's a very good point. We don't get that many chances to refactor
largely forgotten and undocumented code.

--renato

Renato Golin via llvm-dev

Mar 2, 2016, 2:33:44 PM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On 2 March 2016 at 18:48, John McCall <rjmc...@apple.com> wrote:
> The frontend will not tell the backend explicitly which parameters will be
> in registers; it will just pass a bunch of independent scalar values, and
> the backend will assign them to registers or the stack as appropriate.

I'm assuming you already have code in the back-end that does that in
the way you want, since you said earlier you may want to use a
variable number of registers for the PCS.


> Our intent is to completely bypass all of the passing-structures-in-registers
> code in the backend by simply not exposing the backend to any parameters
> of aggregate type. The frontend will turn a struct into (say) an i32, a float,
> and an i8; if the first two get passed in registers and the last gets passed
> on the stack, so be it.

How do you differentiate the @foo's below?

struct A { i32, float };
struct B { float, i32 };

define @foo (A, i32) -> @foo(i32, float, i32);

and

define @foo (i32, B) -> @foo(i32, float, i32);


> The only difficulty with this plan is that, when we have multiple results, we
> don’t have a choice but to return a struct type. To the extent that backends
> try to infer that the function actually needs to be sret, instead of just trying
> to find a way to return all the components of the struct type in appropriate
> registers, that will be sub-optimal for us. If that’s a pervasive problem, then
> we probably just need to introduce a swift calling convention in LLVM.

Oh, yeah, some back-ends will fiddle with struct return. Not all
languages have single-value-return restrictions, but I think that ship
has sailed already for IR.

That's another reason to try to pass everything by pointer at the end
of the parameter list, instead of receiving them as arguments and
returning them.


> A direct result is something that’s returned in registers. An indirect
> result is something that’s returned by storing it in an implicit out-parameter.

Oh, I see. In that case, any assumption on the variable would have to
be invalidated, maybe use global volatile variables, or special
built-ins, so that no optimisation tries to get away with it. But that
would mess up your optimal code, especially if they have to get passed
in registers.


> Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges
> these arguments to the end, which means the optimizer’s default behavior
> of silently dropping extra call arguments ends up doing the right thing.

Excellent!


> I’m reluctant to say that the convention always requires these arguments.
> If we have to do that, we can, but I’d rather not; it would involve generating
> a lot of unnecessary IR and would probably create unnecessary
> code-generation differences, and I don’t think it would be sufficient for
> error results anyway.

This should be ok for internal functions, but maybe not for global /
public interfaces. The ARM ABI has specific behaviour guarantees for
public interfaces (like large alignment) that would be prohibitively
bad for all functions, but ok for public ones.

If all hell breaks loose, you could enforce that for public interfaces only.


> We don’t want checking or setting the error result to actually involve memory
> access.

And even though most of those accesses could be optimised away, there's
no guarantee.

Another option would be to have a special built-in to recognise
context/error variables, and plug in a late IR pass to clean up
everything. But I'd only recommend that if we can't find another way
around.


> The ability to call a non-throwing function as a throwing function means
> we’d have to provide this extra explicit result on every single function with
> the Swift convention, because the optimizer is definitely not going to
> gracefully handle result-type mismatches; so even a function as simple as
> func foo() -> Int32
> would have to be lowered into IR as
> define { i32, i8* } @foo(i8*)

Indeed, very messy.

I'm going on a tangent, here, may be all rubbish, but...

C++ handles exception handling with the exception being thrown
allocated in library code, not the program. If, like C++, Swift can
only handle one exception at a time, why can't the error variable be a
global?

The ARM back-end accepts the -arm-reserve-r9 option, and others seem to
have similar options, so you could use that to force your global
variable to live in the platform register.

That way, all your error handling built-ins deal with that global
variable, which the back-end knows is on registers. You will need a
special DAG node, but I'm assuming you already have/want one. You also
drop any problem with arguments and PCS, at least for the error part.

cheers,
--renato

John McCall via llvm-dev

Mar 2, 2016, 2:47:11 PM
to David Chisnall, llvm-dev, Clang Dev
> On Mar 2, 2016, at 11:09 AM, David Chisnall <David.C...@cl.cam.ac.uk> wrote:
> On 2 Mar 2016, at 19:04, Renato Golin via cfe-dev <cfe...@lists.llvm.org> wrote:
>>
>> I personally don't. I think we should treat Swift as any other
>> language that we support, and if we can't use existing mechanisms in
>> the back-end to lower Swift, then we need to expand the back-end to
>> support that.
>>
>> That being said, if the Swift support starts to bit-rot (if, for
>> instance, Apple stops supporting it in the future), it will be harder
>> to clean up the back-end from its CC. But that, IMHO, is a very
>> far-fetched future and a small price to pay.
>
> The Swift calling model also seems to be quite generally useful. I can imagine that VMKit would have used it, if it had been available then.
>
> My only concern is that the implicit contract between the front and back ends, with regard to calling convention, is already complex, already largely undocumented, and already difficult to infer even if you have the relevant platform ABI document in front of you. It is badly in need of some cleanup and I wonder if the desire to minimise diffs for Swift might provide Apple with some incentive to spend a little bit of engineering effort on it?

I have to say that, while I completely agree with you, I also deliberately made an effort in the design of our lowering to avoid as many of those existing complexities as I could. :) So I’m not sure we’d be an ideal vehicle for cleaning up the C lowering model. I’m also wary about turning this project — already somewhat complex — into a massive undertaking, which I’m afraid that changing general CC lowering rules would be. Furthermore, I’m concerned that anything we did here would just turn into an *extra* dimension of complexity for the backend, rather than replacing the current complexity, because it’s not obvious that targets would be able to simply drop their existing ad-hoc interpretation rules. But if you have concrete ideas about this, maybe we can find a way to work them in.

The basic tension in CC lowering is between wanting simple cases to just work without further annotations and the need to cover the full gamut of special-case ABI rules. If we didn’t care about the former, we could just require every call and the function to completely describe the ABI to use — "argument 1 is in R0, argument 2 is in R12, argument 3 is at offset 48 on the stack, and we need 64 bytes on the stack and it has to be 16-byte-aligned at the call”. But dealing with that level of generality at every single call boundary would be a huge pain for backends, and we’d still need special code for things like varargs. So instead we’ve evolved all these informal protocols between frontends and backends. The informal protocols are… annoying, but I think the bigger problem is that they’re undocumented, and it’s unclear to everybody involved what’s supposed to happen when you go outside them. So the first step, I think, would just be to document as many of those informal, target-specific protocols as we can, and then from there maybe we can find commonalities that can be usefully generalized.

John.

Tian, Xinmin via llvm-dev

Mar 2, 2016, 2:49:16 PM
to llvm...@lists.llvm.org, Clang Dev, llvm-dev...@lists.llvm.org
Proposal for function vectorization and loop vectorization with function calls
==============================================================================
Intel Corporation (3/2/2016)

This is a proposal for initial work towards a Clang and LLVM implementation of
vectorizing a function annotated with OpenMP 4.5's "#pragma omp declare simd"
(named SIMD-enabled function) and its associated clauses based on the VectorABI
[2]. On the caller side, we propose to improve LLVM loopVectorizer such that
the code that calls the SIMD-enabled function can be vectorized. On the callee
side, we propose to add Clang FE support for "#pragma omp declare simd" syntax
and a new pass to transform the SIMD-enabled function body into a SIMD loop.
This newly created loop can then be fed to LLVM loopVectorizer (or its future
enhancement) for vectorization. This work does leverage LLVM's existing
LoopVectorizer.


Problem Statement
=================
Currently, if a loop calls a user-defined function or a 3rd party library
function, the loop can't be vectorized unless the function is inlined. In the
example below the LoopVectorizer fails to vectorize the k loop due to its
function call to "dowork" because "dowork" is an external function. Note that
inlining the "dowork" function may result in vectorization for some of the
cases, but that is not a generally applicable solution. Also, there may be
reasons why the compiler may not (or can't) inline the "dowork" function call.
Therefore, there is value in being able to vectorize the loop with a call to
"dowork" function in it.

#include <stdio.h>
extern float dowork(float *a, int k);

float a[4096];
int main()
{
  int k;
#pragma clang loop vectorize(enable)
  for (k = 0; k < 4096; k++) {
    a[k] = k * 0.5;
    a[k] = dowork(a, k);
  }
  printf("passed %f\n", a[1024]);
}

sh-4.1$ clang -c -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize
-Rpass-analysis=loop-vectorize loopvec.c
loopvec.c:15:12: remark: loop not vectorized: call instruction cannot be
vectorized [-Rpass-analysis]
a[k] = dowork(a, k);
^
loopvec.c:13:3: remark: loop not vectorized: use -Rpass-analysis=loop-vectorize
for more info (Force=true) [-Rpass-missed=loop-vectorize]
for (k = 0; k < 4096; k++) {
^
loopvec.c:13:3: warning: loop not vectorized: failed explicitly specified
loop vectorization [-Wpass-failed]
1 warning generated.


New functionality of Vectorization
==================================
New functionalities and enhancements are proposed to address the issues
stated above which include: a) Vectorize a function annotated by the
programmer using OpenMP* SIMD extensions; b) Enhance LLVM's LoopVectorizer
to vectorize a loop containing a call to SIMD-enabled function.

For example, when writing:

#include <stdio.h>

#pragma omp declare simd uniform(a) linear(k)
extern float dowork(float *a, int k);

float a[4096];
int main()
{
  int k;
#pragma clang loop vectorize(enable)
  for (k = 0; k < 4096; k++) {
    a[k] = k * 0.5;
    a[k] = dowork(a, k);
  }
  printf("passed %f\n", a[1024]);
}

the programmer asserts that
a) there will be a vector version of "dowork" available for the compiler to
use (link with, with appropriate signature, explained below) when
vectorizing the k loop; and that
b) no loop-carried backward dependencies are introduced by the "dowork"
call that prevent the vectorization of the k loop.

The expected vector loop (shown as pseudo code, ignoring leftover iterations)
resulting from LLVM's LoopVectorizer is

... ...
vectorized_for (k = 0; k < 4096; k += VL) {
  a[k:VL] = {k, k+1, k+2, ..., k+VL-1} * 0.5;
  a[k:VL] = _ZGVbN4ul_dowork(a, k);
}
... ...

In this example "_ZGVbN4ul_dowork" is a special name mangling where:
_ZGV is a prefix based on the C/C++ name mangling rule suggested by the GCC
community,
'b' indicates "xmm" (assume we vectorize here to 128-bit xmm vector registers),
'N' indicates that the function is vectorized without a mask ('M' would
indicate that the function is vectorized with a mask),
'4' is VL (assume we vectorize here for length 4),
'u' indicates that the first parameter has the "uniform" property,
'l' indicates that the second argument has the "linear" property.

More details (including the name mangling scheme) can be found in
reference [2].

References
==========

1. OpenMP SIMD language extensions:
   http://www.openmp.org/mp-documents/openmp-4.5.pdf

2. VectorABI Documentation:
   https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf
   https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt

[[Note: VectorABI was reviewed at X86-64 System V Application Binary Interface
mailing list. The discussion was recorded at
https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4 ]]

3. The first paper on SIMD extensions and implementations:
"Compiling C/C++ SIMD Extensions for Function and Loop Vectorization on
Multicore-SIMD Processors" by Xinmin Tian, Hideki Saito, Milind Girkar,
Serguei Preis, Sergey Kozhukhov, et al., IPDPS Workshops 2012, pages 2349--2358
[[Note: the first implementation and the paper were done before VectorABI was
finalized with the GCC community and Redhat. The latest VectorABI
version for OpenMP 4.5 is ready to be published]]


Proposed Implementation
=======================
1. Clang FE parses "#pragma omp declare simd [clauses]" and generates mangled
name including these prefixes as vector signatures. These mangled name
prefixes are recorded as function attributes in LLVM function attribute
group. Note that it may be possible to have several mangled names associated
with the same function, which correspond to several desired vectorized
versions. Clang FE generates all function attributes for expected vector
variants to be generated by the back-end. E.g.,

#pragma omp declare simd uniform(a) linear(k)
float dowork(float *a, int k)
{
  return a[k] = sinf(a[k]) + 9.8f;
}

define __stdcall f32 @_dowork(f32* %a, i32 %k) #0
... ...
attributes #0 = { nounwind uwtable "_ZGVbM4ul_" "_ZGVbN4ul_" ...}

2. A new vector function generation pass is introduced to generate vector
variants of the original scalar function based on VectorABI (see [2, 3]).
For example, one vector variant is generated for "_ZGVbN4ul_" attribute
as follows (pseudo code):

define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0
{
  #pragma clang loop vectorize(enable)
  for (int %t = %k; %t < %k + 4; %t++) {
    %a[%t] = sinf(%a[%t]) + 9.8f;
  }
  vec_load xmm0, %a[%k:VL]
  return xmm0;
}

The body of the function is wrapped inside a loop having VL iterations,
which correspond to the vector lanes.

The LLVM LoopVectorizer will vectorize the generated %t loop, expected
to produce the following vectorized code eliminating the loop (pseudo code):

define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0
{
  vec_load xmm1, %a[%k:VL]
  xmm2 = call __svml_sinf(xmm1)
  xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f]
  store %a[%k:VL], xmm0
  return xmm0;
}

[[Note: Vectorizer support for the Short Vector Math Library (SVML)
functions will be a separate proposal. ]]

3. The LLVM LoopVectorizer is enhanced to
a) identify loops with calls that have been annotated with
"#pragma omp declare simd" by checking function attribute groups;
b) analyze each call instruction and its parameters in the loop, to
determine if each parameter has the following properties:
* uniform
* linear + stride
* vector
* aligned
* called inside a conditional branch or not
... ...
Based on these properties, the signature of the vectorized call is
generated; and
c) perform signature matching to obtain the suitable vector variant
among the signatures available for the called function. If no such
signature is found, the call cannot be vectorized.

Note that a similar enhancement can and should be made also to LLVM's
SLP vectorizer.

For example:

#pragma omp declare simd uniform(a) linear(k)
extern float dowork(float *a, int k);

... ...
#pragma clang loop vectorize(enable)
for (k = 0; k < 4096; k++) {
a[k] = k * 0.5;
a[k] = dowork(a, k);
}
... ...

Step a: "dowork" function is marked as SIMD-enabled function
attributes #0 = { nounwind uwtable "_ZGVbM4ul_" "_ZGVbN4ul_" ...}

Step b: 1) 'a' is uniform, as it is the base address of array 'a'
2) 'k' is linear, as 'k' is the induction variable with stride=1
3) SIMD "dowork" is called unconditionally in the candidate k loop.
4) it is compiled for SSE4.1 with the Vector Length VL=4.
Based on these properties, the signature is "_ZGVbN4ul_".

[[Note: A conditional call in the loop needs masking support; the
implementation details can be seen in references [1][2][3]. ]]

Step c: Check if the signature "_ZGVbN4ul_" exists in function attribute #0;
        if yes, the suitable vectorized version has been found and will be
        linked with.

The loop below is expected to be produced by the LoopVectorizer:
... ...
vectorized_for (k = 0; k < 4096; k += 4) {
  a[k:4] = {k, k+1, k+2, k+3} * 0.5;
  a[k:4] = _ZGVbN4ul_dowork(a, k);
}
... ...



GCC and ICC Compatibility
=========================
With this proposal the callee function and the loop containing a call to it
can each be compiled and vectorized by a different compiler, including
Clang+LLVM with its LoopVectorizer as outlined above, GCC and ICC. The
vectorized loop will then be linked with the vectorized callee function.
Of course, each of these compilers can also be used to compile both the loop
and the callee function.


Current Implementation Status and Plan
======================================
1. The Clang FE work (#1) is done by the Intel Clang FE team. Note: the
   Clang FE syntax-processing patch is implemented and under community review
   (http://reviews.llvm.org/D10599). In general, the review feedback from the
   Clang community is very positive.

2. A new pass for function vectorization is implemented to support #2 and is
   being prepared for LLVM community review.

3. Work is in progress to teach LLVM's LoopVectorizer to vectorize a loop
with user-defined function calls according to #3.

Call for Action
===============
1. Please review this proposal and provide constructive feedback on its
direction and key ideas.

2. Feel free to ask any technical questions related to this proposal and
to read the associated references.

3. Help is also highly welcome and appreciated in the development and
upstreaming process.

Reid Kleckner via llvm-dev

Mar 2, 2016, 3:01:10 PM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On Tue, Mar 1, 2016 at 5:14 PM, John McCall via llvm-dev <llvm...@lists.llvm.org> wrote:
There are four general points of deviation from the intermediary convention:

  - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers.  For example, we want to return a value of struct A, above, purely in registers.  For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention.  But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.

You're suggesting that backends shouldn't try to turn returns of {i32, float, i32} into sret automatically if the C ABI would require that struct to be returned indirectly. I know there are many users of LLVM out there that wish that LLVM would just follow the C ABI for them in "simple" cases like this, even though in general it's a lost cause. I think if you hide this new behavior under your own swiftcc then we can keep those people happy, ish.

  - We sometimes have both direct results and indirect results.  It would be nice to take advantage of the sret convention even in the presence of direct results on targets that do use a different (profitable) ABI treatment for it.  I don’t know how well-supported this is in LLVM.

LLVM insists that sret functions be void because the C convention requires the sret pointer to be returned in the normal return register. X86 Sys V requires this, though LLVM does not leverage it, and was non-conforming for most of its life. I don't see why Swift would need to use the 'sret' attribute for indirect results, though, if it doesn't need to conform to that part of the x86 convention. Am I missing something profitable about reusing our sret support?
 
  - We want a special “context” treatment for a certain argument.  A pointer-sized value is passed in an integer register; the same value should be present in that register after the call.  In some cases, the caller may pass a context argument to a function that doesn’t expect one, and this should not trigger undefined behavior.  Both of these rules suggest that the context argument be passed in a register which is normally callee-save.

As discussed later, these arguments would come last. I thought it was already legal to call a C function with too many arguments without invoking UB, so I think we have to keep this working in LLVM anyway.
 
  - The “error” treatment requires some way to (1) pass and receive the value in the caller and (2) receive and change the value in the callee.  The best way we could think of to represent this was to pretend that the argument is actually passed indirectly; the value is “passed” by storing to the pointer and “received” by loading from it.  To simplify backend lowering, we require the argument to be a special kind of swifterror alloca that can only be loaded, stored, and passed as a swifterror argument; in the callee, swifterror arguments have similar restrictions.  This ends up being fairly invasive in the backend, unfortunately.

This seems unfortunate. I guess you've already rejected returning an FCA. I wonder if we should ever go back to the world of "only calls can produce multiple values" as a special case, since that's what really happens at the MI level. I wonder if operand bundles or tokens could help solve this problem.

---

In general, yes, I'm in favor of upstreaming this Swift CC support. The main awkwardness here to me is the error value which is in memory in the mid-level, but in registers in the backend. I feel like intrinsic/token/bundle/glue stuff might actually be better.

John McCall via llvm-dev

Mar 2, 2016, 3:03:36 PM
to Renato Golin, llvm-dev, Joe Pamer, Clang Dev
> On Mar 2, 2016, at 11:33 AM, Renato Golin <renato...@linaro.org> wrote:
> On 2 March 2016 at 18:48, John McCall <rjmc...@apple.com> wrote:
>> The frontend will not tell the backend explicitly which parameters will be
>> in registers; it will just pass a bunch of independent scalar values, and
>> the backend will assign them to registers or the stack as appropriate.
>
> I'm assuming you already have code in the back-end that does that in
> the way you want, since you said earlier you may want to use a variable
> number of registers for the PCS.
>
>
>> Our intent is to completely bypass all of the passing-structures-in-registers
>> code in the backend by simply not exposing the backend to any parameters
>> of aggregate type. The frontend will turn a struct into (say) an i32, a float,
>> and an i8; if the first two get passed in registers and the last gets passed
>> on the stack, so be it.
>
> How do you differentiate the @foo's below?
>
> struct A { i32, float };
> struct B { float, i32 };
>
> define @foo (A, i32) -> @foo(i32, float, i32);
>
> and
>
> define @foo (i32, B) -> @foo(i32, float, i32);

We don’t need to. We don't use the intermediary convention’s rules for aggregates.
The Swift rule for aggregate arguments is literally “if it’s too complex according to
<foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and
pass them separately”. If that means it’s partially passed in registers and partially
on the stack, that’s okay; we might need to re-assemble it in the callee, but the
first part of the rule limits how expensive that can ever get.

>> The only difficulty with this plan is that, when we have multiple results, we
>> don’t have a choice but to return a struct type. To the extent that backends
>> try to infer that the function actually needs to be sret, instead of just trying
>> to find a way to return all the components of the struct type in appropriate
>> registers, that will be sub-optimal for us. If that’s a pervasive problem, then
>> we probably just need to introduce a swift calling convention in LLVM.
>
> Oh, yeah, some back-ends will fiddle with struct return. Not all
> languages have single-value-return restrictions, but I think that ship
> has sailed already for IR.
>
> That's another reason to try and pass everything by pointer at the end of the
> parameter list, instead of receiving it as an argument and returning it.

That’s pretty sub-optimal compared to just returning in registers. Also, most
backends do have the ability to return small structs in multiple registers already.

>> A direct result is something that’s returned in registers. An indirect
>> result is something that’s returned by storing it in an implicit out-parameter.
>
> Oh, I see. In that case, any assumption on the variable would have to
> be invalidated, maybe use global volatile variables, or special
> built-ins, so that no optimisation tries to get away with it. But that
> would mess up your optimal code, especially if they have to get passed
> in registers.

I don’t understand what you mean here. The out-parameter is still explicit in
LLVM IR. Nothing about this is novel, except that C frontends generally won’t
combine indirect results with direct results. Worst case, if pervasive LLVM
assumptions prevent us from combining the sret attribute with a direct result,
we just won’t use the sret attribute.

>> Oh, sorry, I forgot to talk about that. Yes, the frontend already rearranges
>> these arguments to the end, which means the optimizer’s default behavior
>> of silently dropping extra call arguments ends up doing the right thing.
>
> Excellent!
>
>
>> I’m reluctant to say that the convention always requires these arguments.
>> If we have to do that, we can, but I’d rather not; it would involve generating
>> a lot of unnecessary IR and would probably create unnecessary
>> code-generation differences, and I don’t think it would be sufficient for
>> error results anyway.
>
> This should be ok for internal functions, but maybe not for global /
> public interfaces. The ARM ABI has specific behaviour guarantees for
> public interfaces (like large alignment) that would be prohibitively
> bad for all functions, but ok for public ones.
>
> If all hell breaks loose, you could enforce that for public interfaces only.
>
>
>> We don’t want checking or setting the error result to actually involve memory
>> access.
>
> And even though most of those accesses could be optimised away, there's
> no guarantee.

Right. The backend isn’t great about removing memory operations that survive to it.

> Another option would be to have a special built-in to recognise
> context/error variables, and plug in a late IR pass to clean up
> everything. But I'd only recommend that if we can't find another way
> around.
>
>
>> The ability to call a non-throwing function as a throwing function means
>> we’d have to provide this extra explicit result on every single function with
>> the Swift convention, because the optimizer is definitely not going to
>> gracefully handle result-type mismatches; so even a function as simple as
>> func foo() -> Int32
>> would have to be lowered into IR as
>> define { i32, i8* } @foo(i8*)
>
> Indeed, very messy.
>
> I'm going on a tangent, here, may be all rubbish, but...
>
> C++ handles exception handling with the exception being thrown
> allocated in library code, not the program. If, like C++, Swift can
> only handle one exception at a time, why can't the error variable be a
> global?
>
> The ARM back-end accepts the -arm-reserve-r9 option, and others seem to
> have similar options, so you could use that to force your global
> variable to live in the platform register.
>
> That way, all your error handling built-ins deal with that global
> variable, which the back-end knows is on registers. You will need a
> special DAG node, but I'm assuming you already have/want one. You also
> drop any problem with arguments and PCS, at least for the error part.

Swift does not run in an independent environment; it has to interact with
existing C code. That existing code does not reserve any registers globally
for this use. Even if that were feasible, we don’t actually want to steal a
register globally from all the C code on the system that probably never
interacts with Swift.

John.

John McCall via llvm-dev

Mar 2, 2016, 3:10:40 PM
to Reid Kleckner, llvm-dev, Joe Pamer, Clang Dev
On Mar 2, 2016, at 12:00 PM, Reid Kleckner <r...@google.com> wrote:
On Tue, Mar 1, 2016 at 5:14 PM, John McCall via llvm-dev <llvm...@lists.llvm.org> wrote:
There are four general points of deviation from the intermediary convention:

  - We sometimes want to return more values in registers than the convention normally does, and we want to be able to use both integer and floating-point registers.  For example, we want to return a value of struct A, above, purely in registers.  For the most part, I don’t think this is a problem to layer on to an existing IR convention: C frontends will generally use explicit sret arguments when the convention requires them, and so the Swift lowering will produce result types that don’t have legal interpretations as direct results under the C convention.  But we can use a different IR convention if it’s necessary to disambiguate Swift’s desired treatment from the target's normal attempts to retroactively match the C convention.

You're suggesting that backends shouldn't try to turn returns of {i32, float, i32} into sret automatically if the C ABI would require that struct to be returned indirectly. I know there are many users of LLVM out there that wish that LLVM would just follow the C ABI for them in "simple" cases like this, even though in general it's a lost cause. I think if you hide this new behavior under your own swiftcc then we can keep those people happy, ish.

Yes, that may be best.

  - We sometimes have both direct results and indirect results.  It would be nice to take advantage of the sret convention even in the presence of direct results on targets that do use a different (profitable) ABI treatment for it.  I don’t know how well-supported this is in LLVM.

LLVM insists that sret functions be void because the C convention requires the sret pointer to be returned in the normal return register. X86 Sys V requires this, though LLVM does not leverage it, and was non-conforming for most of its life. I don't see why Swift would need to use the 'sret' attribute for indirect results, though, if it doesn't need to conform to that part of the x86 convention. Am I missing something profitable about reusing our sret support?

For most platforms, it’s not profitable.  On some platforms, there’s a register reserved for the sret argument; it would be nice to take advantage of that for several reasons, including just being nicer to existing tools (debuggers, etc.) in common cases.

  - We want a special “context” treatment for a certain argument.  A pointer-sized value is passed in an integer register; the same value should be present in that register after the call.  In some cases, the caller may pass a context argument to a function that doesn’t expect one, and this should not trigger undefined behavior.  Both of these rules suggest that the context argument be passed in a register which is normally callee-save.

As discussed later, these arguments would come last. I thought it was already legal to call a C function with too many arguments without invoking UB, so I think we have to keep this working in LLVM anyway.

Formally, no, it’s UB to call (non-variadic, of course) C functions with extra arguments.  But you’re right, it does generally work at runtime, and LLVM doesn’t get in the way.
 
  - The “error” treatment requires some way to (1) pass and receive the value in the caller and (2) receive and change the value in the callee.  The best way we could think of to represent this was to pretend that the argument is actually passed indirectly; the value is “passed” by storing to the pointer and “received” by loading from it.  To simplify backend lowering, we require the argument to be a special kind of swifterror alloca that can only be loaded, stored, and passed as a swifterror argument; in the callee, swifterror arguments have similar restrictions.  This ends up being fairly invasive in the backend, unfortunately.

This seems unfortunate. I guess you've already rejected returning an FCA.

See one of my recent responses to Renato.  It’s really hard to make that work cleanly with the ability to call functions that lack the result.

I wonder if we should ever go back to the world of "only calls can produce multiple values" as a special case, since that's what really happens at the MI level. I wonder if operand bundles or tokens could help solve this problem.

---

In general, yes, I'm in favor of upstreaming this Swift CC support. The main awkwardness here to me is the error value which is in memory in the mid-level, but in registers in the backend. I feel like intrinsic/token/bundle/glue stuff might actually be better.

I agree that the error-result stuff is the most intrinsically awkward part of our current patch, and I would love to find alternatives.

John.

Richard Smith via llvm-dev

Mar 2, 2016, 3:22:46 PM
to John McCall, llvm-dev, Clang Dev

I consider it completely reasonable for Clang to support this calling
convention and the associated attributes, especially given the minor
impact you describe above.

Mikhail Zolotukhin via llvm-dev

Mar 2, 2016, 5:43:22 PM
to Tian, Xinmin, llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org, Clang Dev
Hi Tian,

Thanks for the writeup, it sounds very interesting! Please find some questions/comments inline:

I think it should be possible to vectorize such a loop even without OpenMP clauses. We just need to gather a vector value from several scalar calls, and the vectorizer already knows how to do that; we just need to not bail out early. Dealing with calls is tricky, but in this case we have the pragma, so we can assume it should be fine. What do you think, does it make sense to start from here?

Am I getting it right that you're going to emit declarations for all possible vector variants, and then implement only the used ones? If not, how does the frontend know which vector width to use? If the dowork function and its caller are in different modules, how does the compiler communicate which vector widths are needed?


>
> The body of the function is wrapped inside a loop having VL iterations,
> which correspond to the vector lanes.
>
> The LLVM LoopVectorizer will vectorize the generated %t loop, expected
> to produce the following vectorized code eliminating the loop (pseudo code):
>
> define __stdcall <4 x f32> @_ZGVbN4ul_dowork(f32* %a, i32 %k) #0
> {
> vec_load xmm1, %a[k: VL]
> xmm2 = call __svml_sinf(xmm1)
> xmm0 = vec_add xmm2, [9.8f, 9.8f, 9.8f, 9.8f]
> store %a[k:VL], xmm0
> return xmm0;
> }
>
> [[Note: Vectorizer support for the Short Vector Math Library (SVML)
> functions will be a separate proposal. ]]

The Loop Vectorizer already supports math functions and math function libraries. You might just need to extend this support to SVML (i.e. add tables of correspondence between scalar and vector function variants).

Again, thanks for writing it up. I think this would be a valuable improvement of the vectorizer and I'm looking forward to further discussion and/or patches!

Best regards,
Michael

Tian, Xinmin via llvm-dev

Mar 2, 2016, 6:48:57 PM
to mzolo...@apple.com, llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org, Clang Dev
Hi Michael. Thanks for your feedback and questions/comments. See below.

>>>>>I think it should be possible to vectorize such a loop even without OpenMP clauses. We just need to gather a vector value from several scalar calls, and the vectorizer already knows how to do that; we just need to not bail out early. Dealing with calls is tricky, but in this case we have the pragma, so we can assume it should be fine. What do you think, does it make sense to start from here?

Yes, we can vectorize this loop by calling the scalar code VL times as an emulation of SIMD execution if a SIMD version does not exist to start with. See the example below; we need this functionality anyway as a fallback for vectorizing this loop when no SIMD version of dowork exists. E.g.

#pragma clang loop vectorize(enable)
for (k = 0; k < 4096; k++) {
a[k] = k * 0.5;
a[k] = dowork(a, k);
}

==>

Vectorized_for (k = 0; k < 4096; k+=VL) { // assume VL = 4. No vector version of the dowork function exists.
a[k:VL] = {k, k+1, k+2, k+3} * 0.5; // Broadcast 0.5 to a SIMD register, vector multiply with {k, k+1, k+2, k+3}, vector store to a[k:VL]
t0 = dowork(a, k) // emulate SIMD execution with scalar calls.
t1 = dowork(a, k+1)
t2 = dowork(a, k+2)
t3 = dowork(a, k+3)
a[k:VL] = {t0, t1, t2, t3}; // SIMD store
}

>>>>>Am I getting it right that you're going to emit declarations for all possible vector variants, and then implement only the used ones? If not, how does the frontend know which vector width to use? If the dowork function and its caller are in different modules, how does the compiler communicate which vector widths are needed?

Yes, you are right in general; that is defined by the vector function ABI (VectorABI) used by GCC and ICC. E.g., GCC generates 7 versions by default for x86: scalar, SSE (masked, unmasked), AVX (masked, unmasked), AVX2 (masked, unmasked). There are several options we can use to reduce the number of versions we need to generate w.r.t. compile time and code size. We can provide detailed info.

>>>>> The Loop Vectorizer already supports math functions and math function libraries. You might just need to extend this support to SVML (i.e. add tables of correspondence between scalar and vector function variants).

Correct, that is the Step 3 in the doc we are working on.

>>>>> Again, thanks for writing it up. I think this would be a valuable improvement of the vectorizer and I'm looking forward to further discussion and/or patches!

Thanks for the positive feedback! We are also looking forward to further discussion and sending patches with help from you and other LLVM community members.

Thanks,
Xinmin

Michael Zolotukhin via llvm-dev

Mar 2, 2016, 7:08:13 PM
to Tian, Xinmin, llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org, Clang Dev
Hi Tian,


> On Mar 2, 2016, at 3:48 PM, Tian, Xinmin <xinmi...@intel.com> wrote:
>
> Hi Michael. Thanks for your feedback and questions/comments. See below.
>
>>>>>> I think it should be possible to vectorize such a loop even without OpenMP clauses. We just need to gather a vector value from several scalar calls, and the vectorizer already knows how to do that; we just need to not bail out early. Dealing with calls is tricky, but in this case we have the pragma, so we can assume it should be fine. What do you think, does it make sense to start from here?
>
> Yes, we can vectorize this loop by calling the scalar code VL times as an emulation of SIMD execution if a SIMD version does not exist to start with. See the example below; we need this functionality anyway as a fallback for vectorizing this loop when no SIMD version of dowork exists. E.g.
>
> #pragma clang loop vectorize(enable)
> for (k = 0; k < 4096; k++) {
> a[k] = k * 0.5;
> a[k] = dowork(a, k);
> }
>
> ==>
>
> Vectorized_for (k = 0; k < 4096; k+=VL) { // assume VL = 4. No vector version of the dowork function exists.
> a[k:VL] = {k, k+1, k+2, k+3} * 0.5; // Broadcast 0.5 to a SIMD register, vector multiply with {k, k+1, k+2, k+3}, vector store to a[k:VL]
> t0 = dowork(a, k) // emulate SIMD execution with scalar calls.
> t1 = dowork(a, k+1)
> t2 = dowork(a, k+2)
> t3 = dowork(a, k+3)
> a[k:VL] = {t0, t1, t2, t3}; // SIMD store
> }

Yes, that’s what I meant.

>>>>>> Am I getting it right that you're going to emit declarations for all possible vector variants, and then implement only the used ones? If not, how does the frontend know which vector width to use? If the dowork function and its caller are in different modules, how does the compiler communicate which vector widths are needed?
>
> Yes, you are right in general; that is defined by the vector function ABI (VectorABI) used by GCC and ICC. E.g., GCC generates 7 versions by default for x86: scalar, SSE (masked, unmasked), AVX (masked, unmasked), AVX2 (masked, unmasked).

How does it play with other architectures? Should it be described in more general terms, like vector/element width? I realize that you might be mostly concerned about x86, but this feature looks pretty generic, so I think it should be kept target-independent.

> There are several options we can use to reduce the number of versions we need to generate w.r.t. compile time and code size. We can provide detailed info.

I’ll be interested in looking into this, as I find this part the most challenging in this changeset (other parts look to me like clear improvements of what we have now).

Thanks,
Michael

Tian, Xinmin via llvm-dev

Mar 2, 2016, 7:15:32 PM
to mzolo...@apple.com, llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org, Clang Dev
>>>>I’ll be interested in looking into this, as I find this part the most challenging in this changeset (other parts look to me like clear improvements of what we have now).

Right, this is a challenging part. We will work closely with you to get a good implementation. Thanks.

>>>>How does it play with other architectures? Should it be described in more general terms, like vector/element width? I realize that you might be mostly concerned about x86, but this feature looks pretty generic, so I think it should be kept target-independent.

Yes, our experience is mainly with x86; we will work with you on the generic support.

John McCall via llvm-dev

Mar 2, 2016, 7:34:14 PM
to Richard Smith, llvm-dev, Clang Dev

Great, thanks.

Since the response so far has been very positive on the idea, I think it’s probably time to start sending out patches for review. Manman will be leading that on the LLVM side, since she did most of the work there. On the Clang side, I’ll land what I have and then progressively work on it in trunk.

John.

Renato Golin via llvm-dev

Mar 3, 2016, 5:00:21 AM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On 2 March 2016 at 20:03, John McCall <rjmc...@apple.com> wrote:
> We don’t need to. We don't use the intermediary convention’s rules for aggregates.
> The Swift rule for aggregate arguments is literally “if it’s too complex according to
> <foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and
> pass them separately”. If that means it’s partially passed in registers and partially
> on the stack, that’s okay; we might need to re-assemble it in the callee, but the
> first part of the rule limits how expensive that can ever get.

Right. My worry is, then, how this plays out with ARM's AAPCS.

As you said below, you *have* to interoperate with C code, so you will
*have* to interoperate with AAPCS on ARM.

AAPCS's rules on aggregates are not simple, but they also allow part
of it in registers, part on the stack. I'm guessing you won't have the
same exact rules, but similar ones, which may prove harder to
implement than the former.


> That’s pretty sub-optimal compared to just returning in registers. Also, most
> backends do have the ability to return small structs in multiple registers already.

Yes, but not all of them can return more than two, which may constrain
you if you have both error and context values in a function call, in
addition to the return value.


> I don’t understand what you mean here. The out-parameter is still explicit in
> LLVM IR. Nothing about this is novel, except that C frontends generally won’t
> combine indirect results with direct results.

Sorry, I had understood this, but your reply (for some reason) made me
think it was a hidden contract, not an explicit argument. Ignore me,
then. :)


> Right. The backend isn’t great about removing memory operations that survive to it.

Precisely!


> Swift does not run in an independent environment; it has to interact with
> existing C code. That existing code does not reserve any registers globally
> for this use. Even if that were feasible, we don’t actually want to steal a
> register globally from all the C code on the system that probably never
> interacts with Swift.

So, as Reid said, usage of built-ins might help you here.

Relying on LLVM's ability to not mess up your fiddling with variable
arguments seems unstable. Adding specific attributes to functions or
arguments seems too invasive. So a solution would be to add a built-in
at the beginning of the function to mark those arguments as special.

Instead of alloca %a + load -> store + return, you could have
llvm.swift.error.load(%a) -> llvm.swift.error.return(%a), which
survives most middle-end passes intact, and a late pass then changes
the function to return a composite type, either a structure or a
larger type, that will be lowered into more than one register.

This makes sure error propagation won't be optimised away, and that
you can receive the error in any register (or even stack), but will
always return it in the same registers (ex. on ARM, R1 for i32, R2+R3
for i64, etc).

I understand this might be far off what you guys did, and I'm not
trying to re-write history, just brainstorming a bit.

IMO, both David and Richard are right. This is likely not a huge deal
for the CC code, but we'd be silly not to take this opportunity to
make it less fragile overall.

cheers,
--renato

Andrey Bokhanko via llvm-dev

Mar 3, 2016, 5:28:23 AM
to Tian, Xinmin, David Majnemer, Richard Smith, llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org, Clang Dev
Hi David [Majnemer], Richard [Smith],

Front-end wise, the biggest change in this proposal is introduction of
new mangling for vector functions.

May I ask you to look at the mangling part (sections 1 and 2 in the
"Proposed Implementation" chapter) and review it?

(Obviously, others who are concerned with how mangling is done in
Clang are welcome to chime in as well!)

Yours,
Andrey

John McCall via llvm-dev

Mar 3, 2016, 12:37:23 PM
to Renato Golin, llvm-dev, Joe Pamer, Clang Dev
> On Mar 3, 2016, at 2:00 AM, Renato Golin <renato...@linaro.org> wrote:
>
> On 2 March 2016 at 20:03, John McCall <rjmc...@apple.com> wrote:
>> We don’t need to. We don't use the intermediary convention’s rules for aggregates.
>> The Swift rule for aggregate arguments is literally “if it’s too complex according to
>> <foo>, pass it indirectly; otherwise, expand it into a sequence of scalar values and
>> pass them separately”. If that means it’s partially passed in registers and partially
>> on the stack, that’s okay; we might need to re-assemble it in the callee, but the
>> first part of the rule limits how expensive that can ever get.
>
> Right. My worry is, then, how this plays out with ARM's AAPCS.
>
> As you said below, you *have* to interoperate with C code, so you will
> *have* to interoperate with AAPCS on ARM.

I’m not sure of your point here. We don’t use the Swift CC to call C functions.
It does not matter, at all, whether the frontend lowering of an aggregate under
the Swift CC resembles the frontend lowering of the same aggregate under AAPCS.

I brought up interoperation with C code as a counterpoint to the idea of globally
reserving a register.

> AAPCS's rules on aggregates are not simple, but they also allow part
> of it in registers, part on the stack. I'm guessing you won't have the
> same exact rules, but similar ones, which may prove harder to
> implement than the former.


>> That’s pretty sub-optimal compared to just returning in registers. Also, most
>> backends do have the ability to return small structs in multiple registers already.
>
> Yes, but not all of them can return more than two, which may constrain
> you if you have both error and context values in a function call, in
> addition to the return value.

We do actually use a different swiftcc calling convention in IR. I don’t see any
serious interop problems here. The “intermediary” convention is just the original
basis of swiftcc on the target.

>> I don’t understand what you mean here. The out-parameter is still explicit in
>> LLVM IR. Nothing about this is novel, except that C frontends generally won’t
>> combine indirect results with direct results.
>
> Sorry, I had understood this, but your reply (for some reason) made me
> think it was a hidden contract, not an explicit argument. Ignore me,
> then. :)
>
>
>> Right. The backend isn’t great about removing memory operations that survive to it.
>
> Precisely!
>
>
>> Swift does not run in an independent environment; it has to interact with
>> existing C code. That existing code does not reserve any registers globally
>> for this use. Even if that were feasible, we don’t actually want to steal a
>> register globally from all the C code on the system that probably never
>> interacts with Swift.
>
> So, as Reid said, usage of built-ins might help you here.
>
> Relying on LLVM's ability to not mess up your fiddling with variable
> arguments seems unstable. Adding specific attributes to functions or
> arguments seems too invasive.

I’m not sure why you say that. We already do have parameter ABI override
attributes with target-specific behavior in LLVM IR: sret and inreg.

I can understand being uneasy with adding new swiftcc-specific attributes, though.
It would be reasonable to make this more general. Attributes can be parameterized;
maybe we could just say something like abi(“context”), and leave it to the CC to
interpret that?

Having that sort of ability might make some special cases easier for C lowering,
too, come to think of it. Imagine an x86 ABI that — based on type information
otherwise erased by the conversion to LLVM IR — sometimes returns a float in
an SSE register and sometimes on the x86 stack. It would be very awkward to
express that today, but some sort of abi(“x87”) attribute would make it easy.

> So a solution would be to add a built-in
> in the beginning of the function to mark those arguments as special.
>
> Instead of alloca %a + load -> store + return, you could have
> llvm.swift.error.load(%a) -> llvm.swift.error.return(%a), which
> survives most of middle-end passes intact, and a late pass then change
> the function to return a composite type, either a structure or a
> larger type, that will be lowered in more than one register.
>
> This makes sure error propagation won't be optimised away, and that
> you can receive the error in any register (or even stack), but will
> always return it in the same registers (ex. on ARM, R1 for i32, R2+R3
> for i64, etc).
>
> I understand this might be far off what you guys did, and I'm not
> trying to re-write history, just brainstorming a bit.
>
> IMO, both David and Richard are right. This is likely not a huge deal
> for the CC code, but we'd be silly not to take this opportunity to
> make it less fragile overall.

The lowering required for this would be very similar to the lowering that Manman’s
patch does for swift-error: the backend basically does special value
propagation. The main difference is that it’s completely opaque to the middle-end
by default instead of looking like a load or store that ordinary memory optimizations
can handle. That seems like a loss, since those optimizations would actually do
the right thing.

John.

Renato Golin via llvm-dev

Mar 3, 2016, 1:07:21 PM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On 3 March 2016 at 17:36, John McCall <rjmc...@apple.com> wrote:
> I’m not sure of your point here. We don’t use the Swift CC to call C functions.
> It does not matter, at all, whether the frontend lowering of an aggregate under
> the Swift CC resembles the frontend lowering of the same aggregate under AAPCS.

Right, ignore me, then.


> I’m not sure why you say that. We already do have parameter ABI override
> attributes with target-specific behavior in LLVM IR: sret and inreg.

Their meaning is somewhat confused and hard-coded in the back-end. I
once wanted to use inreg for lowering register-based divmod in
SelectionDAG, but ended up implementing custom lowering in the ARM
back-end because inreg wasn't used correctly. It's possible that now
it's better, but you'll always be at the mercy of what the back-end
does with the attributes, especially in custom lowering.

Also, for different back-ends, "inreg" means different things. If the
PCS allows multiple argument/return registers, then sret inreg is
possible for a structure with up to X/Y words, where X and Y are
different for different targets and could very well be zero.

Example, in a machine with *two* PCS registers:

i64 @foo (i32)

returning in registers becomes: sret { i32, i32 } @foo (inreg i32)

then you add your error: sret { i32, i32, i8* } @foo (inreg i32, inreg i8*)

You can fit the two arguments in registers, but you can't fit the
result + error in your sret.

Targets will have to deal with that in the DAG, if you don't do that
in IR. The ARM target would put the error pointer in the stack, which
is not where you want it to go.

You'd probably need a way to mark portions of your sret as *must be
inreg* and others to be "nice to be inreg", so that you can spill the
result and not the error, if that's what you want.


> Having that sort of ability might make some special cases easier for C lowering,
> too, come to think of it. Imagine an x86 ABI that — based on type information
> otherwise erased by the conversion to LLVM IR — sometimes returns a float in
> an SSE register and sometimes on the x86 stack. It would be very awkward to
> express that today, but some sort of abi(“x87”) attribute would make it easy.

If this is kept in Swift PCS only, and if the compiler always agree on
which registers you're using, that's ok.

But if you call a C function, or a new version of LLVM decides to use
a different register, you'll have run-time problems.

That's why ARM has different standards for hard and soft float, which
cannot mix.

cheers,
--renato

John McCall via llvm-dev

Mar 3, 2016, 4:09:10 PM
to Renato Golin, llvm-dev, Joe Pamer, Clang Dev

Right, this is one very good reason I would prefer to keep the error-result
modelled as a parameter rather than mixing it in with the return value.

Also, recall that the error-result is supposed to be assigned to a register
that isn’t normally used for return values (or arguments, for that matter).

>> Having that sort of ability might make some special cases easier for C lowering,
>> too, come to think of it. Imagine an x86 ABI that — based on type information
>> otherwise erased by the conversion to LLVM IR — sometimes returns a float in
>> an SSE register and sometimes on the x86 stack. It would be very awkward to
>> express that today, but some sort of abi(“x87”) attribute would make it easy.
>
> If this is kept in Swift PCS only, and if the compiler always agree on
> which registers you're using, that's ok.
>
> But if you call a C function, or a new version of LLVM decides to use
> a different register, you'll have run-time problems.

A new version of LLVM really can’t just decide to use a different register
once there’s an agreed interpretation. It sounds like the problem you were
running into with “inreg” was that the ARM backend didn’t have a stable meaning
for it, probably because the ARM target doesn’t allow the frontend features
(regparm/sseregparm) that inreg is designed for. But there are targets — i386,
chiefly — where inreg has a defined, stable meaning precisely because regparm
has a defined, stable meaning. It seems to me that an abi(“context”) attribute
would be more like the latter than the former: any target that supports swiftcc
would also have to assign a stable meaning for abi(“context”).

John.

David Chisnall via llvm-dev

Mar 4, 2016, 3:54:32 AM
to John McCall, llvm-dev, Clang Dev
On 2 Mar 2016, at 19:46, John McCall <rjmc...@apple.com> wrote:
>
> I have to say that, while I completely agree with you, I also deliberately made an effort in the design of our lowering to avoid as many of those existing complexities as I could. :) So I’m not sure we’d be an ideal vehicle for cleaning up the C lowering model. I’m also wary about turning this project — already somewhat complex — into a massive undertaking, which I’m afraid that changing general CC lowering rules would be. Furthermore, I’m concerned that anything we did here would just turn into an *extra* dimension of complexity for the backend, rather than replacing the current complexity, because it’s not obvious that targets would be able to simply drop their existing ad-hoc interpretation rules. But if you have concrete ideas about this, maybe we can find a way to work them in.
>
> The basic tension in CC lowering is between wanting simple cases to just work without further annotations and the need to cover the full gamut of special-case ABI rules. If we didn’t care about the former, we could just require every call and the function to completely describe the ABI to use — "argument 1 is in R0, argument 2 is in R12, argument 3 is at offset 48 on the stack, and we need 64 bytes on the stack and it has to be 16-byte-aligned at the call”. But dealing with that level of generality at every single call boundary would be a huge pain for backends, and we’d still need special code for things like varargs. So instead we’ve evolved all these informal protocols between frontends and backends. The informal protocols are… annoying, but I think the bigger problem is that they’re undocumented, and it’s unclear to everybody involved what’s supposed to happen when you go outside them. So the first step, I think, would just be to document as many of those informal, target-specific protocols as we can, and then from there maybe we can find commonalities that can be usefully generalized.

To be absolutely clear: I’m not suggesting that merging the Swift CC should be conditional on Apple fixing all of the associated ugliness in all of the calling convention logic. I support merging the Swift CC, but I also think that it is going to add some complexity to an already complex part of LLVM and think that it would be good if it could come along with a plan for reducing that complexity.

For the current logic, there are two interrelated issues:

- The C ABI defines how to map things to registers / stack slots.

- Other language ABI documents (including C++) are typically defined in terms of lowering to the platform’s C calling convention. Even when the core language is not, the C FFI usually is.

There are a few smaller issues, such as the complexity required for each pass to work out what the return value of a call / invoke instruction is (is it the return value, is it a load of some alloca that is passed via an sret argument?).

There are two separable parts of this problem:

- What does the representation of a call with a known set of C types look like in LLVM?

- What are the APIs that we use for constructing a function that has these calls?

Clang already has APIs to abstract a lot of this. Given a C type and a set of LLVM values that represent these C values, it can deconstruct the values into the relevant LLVM types and, on the callee side, reassemble LLVM values that correspond to the C types. It’s been proposed a few times before to have some kind of ABIBuilder class that would encapsulate this behaviour, probably pulling some code out of clang. It would then be the responsibility of backend maintainers to ensure that the ABIBuilder is kept in sync with any changes to how they represent their ABI in IR. It would probably also help to have some introspection APIs of the same form (e.g. for getting the return value).

David

Renato Golin via llvm-dev

Mar 4, 2016, 9:04:01 AM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On 3 March 2016 at 21:08, John McCall <rjmc...@apple.com> wrote:
> Right, this is one very good reason I would prefer to keep the error-result
> modelled as a parameter rather than mixing it in with the return value.

Ok, so we're on the same page here.


> Also, recall that the error-result is supposed to be assigned to a register
> that isn’t normally used for return values (or arguments, for that matter).

This looks more complicated, though.

The back-end knows how to lower the standard PCS, so you'll have to
teach it that this particular argument violates that agreement in a
very predictable fashion, i.e. another ABI.

If the non-PCS register you use is always the same (say the platform
register), then you'll have to save/restore whenever you cross the
boundaries between using/not-using (ex. between C and Swift
functions). This sounds hard to get right.

One way to know would be to identify all calls in IR that have
different number of parameters, and do the save/restore there.
Example:

define @foo() {
...
call @bar(i32, i8*)
...
}

define @bar(i32)

You'd need to change the frame lowering code to identify the
difference and, instead of bailing out, create the additional spills.


> A new version of LLVM really can’t just decide to use a different register
> once there’s an agreed interpretation.

Good, so there will be a defined ABI.

It originally sounded like you could choose "any register", but it
seems you're actually going to define the exact behaviour in all
supported platforms.


> It seems to me that an abi(“context”) attribute
> would be more like the latter than the former: any target that supports swiftcc
> would also have to assign a stable meaning for abi(“context”).

Makes sense.

John McCall via llvm-dev

Mar 4, 2016, 2:18:00 PM
to Renato Golin, llvm-dev, Joe Pamer, Clang Dev
> On Mar 4, 2016, at 6:03 AM, Renato Golin <renato...@linaro.org> wrote:
> On 3 March 2016 at 21:08, John McCall <rjmc...@apple.com> wrote:
>> Right, this is one very good reason I would prefer to keep the error-result
>> modelled as a parameter rather than mixing it in with the return value.
>
> Ok, so we're on the same page here.
>
>
>> Also, recall that the error-result is supposed to be assigned to a register
>> that isn’t normally used for return values (or arguments, for that matter).
>
> This looks more complicated, though.
>
> The back-end knows how to lower standard PCS, so you'll have to teach
> that this particular argument violates that agreement in a very
> predictable fashion, ie. another ABI.

We are using a different swiftcc convention in IR already, and we are fine with
locking the error-result treatment to that CC.

> If the non-PCS register you use is always the same (say the platform
> register), then you'll have to save/restore whenever you cross the
> boundaries between using/not-using (ex. between C and Swift
> functions). This sounds hard to get right.
>
> One way to know would be to identify all calls in IR that have
> different number of parameters, and do the save/restore there.
> Example:
>
> define @foo() {
> ...
> call @bar(i32, i8*)
> ...
> }
>
> define @bar(i32)
>
> You'd need to change the frame lowering code to identify the
> difference and, instead of bailing out, create the additional spills.

I don’t think we can make this depend on statically recognizing when we’re
passing extra arguments. That’s why, in our current implementation, whether
or not the register is treated as an ordinary callee-save register or the magic
error result is based on whether there’s an argument to the call (or function
on the callee side) with that specific parameter attribute.

>> A new version of LLVM really can’t just decide to use a different register
>> once there’s an agreed interpretation.
>
> Good, so there will be a defined ABI.
>
> It originally sounded like you could choose "any register", but it
> seems you're actually going to define the exact behaviour in all
> supported platforms.

Right.

John.

Renato Golin via llvm-dev

Mar 4, 2016, 2:23:45 PM
to John McCall, llvm-dev, Joe Pamer, Clang Dev
On 4 March 2016 at 19:17, John McCall <rjmc...@apple.com> wrote:
> We are using a different swiftcc convention in IR already, and we are fine with
> locking the error-result treatment to that CC.

Makes sense.


> I don’t think we can make this depend on statically recognizing when we’re
> passing extra arguments. That’s why, in our current implementation, whether
> or not the register is treated as an ordinary callee-save register or the magic
> error result is based on whether there’s an argument to the call (or function
> on the callee side) with that specific parameter attribute.

Right, and you set it up even if the caller doesn't use the error
argument, which is expected.

I think all my questions were answered, and I'm happy with it. Thanks
for the time! :)

I'll look into Manman's patch soon, but it seems quite straightforward.
No changes on the ARM side at all so far.

Thanks!
--renato

PS: Nice to see you're using X86 like ARM, not the other way around... :)

via llvm-dev

Mar 18, 2016, 6:03:56 AM
to Tian, Xinmin, David Majnemer, Richard Smith, llvm...@lists.llvm.org, llvm-dev...@lists.llvm.org, Clang Dev
Pinging David and Richard!

Yours,
Andrey

> On Mar 3, 2016, at 11:27, Andrey Bokhanko <andreyb...@gmail.com> wrote:
