[llvm-dev] ABI attributes on arguments vs parameters

166 views
Skip to first unread message

Arthur Eubanks via llvm-dev

unread,
Jun 22, 2021, 9:06:17 PM6/22/21
to llvm-dev
Currently ABI attributes are weirdly handled if an argument to a function is missing an ABI attribute that is present on the called function's corresponding parameter. e.g.

declare void @f(i32* byval(i32))
define void @g() {
 %a = alloca i32
 call void @f(i32* %a) ; missing the byval(i32) attribute
 ret void
}

CallBase::isByValArgument(unsigned ArgNo) forwards to CallBase::paramHasAttr(), which first checks the argument attributes, then if the call is a direct call, checks the called function's parameter attributes. The existing implementation of CallBase::paramHasAttr() makes sense for optimization attributes like nocapture, but doesn't really make sense for ABI attributes like byval. It's weird that lowering a call may be different depending on whether or not the call is direct.

I attempted to only have lowering inspect the ABI attributes on the argument and not look through at a potentially direct callee, but there were cases where LLVM was generating direct calls to functions with ABI attributes but without properly setting the ABI attributes on the arguments. I fixed a couple of these but ended up reverting everything since it was still unclear if we wanted to go in this direction.

Should we go down the path of ignoring ABI attributes on direct callees and only looking at the attributes on the arguments? And in that case we may generate code that crashes at runtime if ABI attributes don't properly match. Otherwise we should document the existing behavior in the LangRef. The LangRef only mandates that ABI attributes match on a musttail call.

David Blaikie via llvm-dev

unread,
Jun 23, 2021, 2:01:18 AM6/23/21
to Arthur Eubanks, Reid Kleckner, llvm-dev
Might be worth CC'ing any already interested parties from previous discussions & linking to those threads (& maybe linking to this thread from those ones).

It does seem pretty questionable that behavior changes if a function becomes indirect - do you have any rough idea of how deep the rabbit hole goes if we were to try to finish the work you started of not looking at the callee to determine these attributes?

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Madhur Amilkanthwar via llvm-dev

unread,
Jun 23, 2021, 4:31:06 AM6/23/21
to David Blaikie, Matthew....@amd.com, llvm-dev
Hi Arthur,
As of today, which among parameters and arguments are used for codegen when there is a mismatch between ABI attributes? I don't think we have a thorough documentation for that.

My2c:
IMHO, the codegen should honor the argument attributes rather than parameter ones as it would work for indirect calls too. For indirect calls to work seamlessly, not inspecting any parameter attributes should be the way forward. Mismatch on ABI attributes should be UB ideally, but we could have a compatibility matrix between ABI attributes so that caller can set compatible argument attributes rather than strictly adhering to parameter attributes. (Or could even have a call-site mechanism to override)

Nevertheless, we should document this in LangRef once we have consensus.



--
Disclaimer: Views, concerns, thoughts, questions, ideas expressed in this mail are of my own and my employer has no take in it. 
Thank You.
Madhur D. Amilkanthwar

Reid Kleckner via llvm-dev

unread,
Jun 23, 2021, 4:01:12 PM6/23/21
to Arthur Eubanks, llvm-dev
I think the trouble we are having comes from the fact that we don't make it easy for IR producers (frontends or instrumentation passes) to do the right thing. If we make it easy (almost automatic), then maybe it will be more reasonable to declare in LangRef that function prototype mismatch is in fact UB.

Another developer suggested that the IRBuilder should take responsibility for setting the attributes on the call site, but I don't like the idea of IRBuilder::CreateCall inspecting the Callee argument with dyn_cast to decide its behavior.

This might be an infeasibly difficult ABI breaking change, but what if we had two overloads, one for direct calls, and one for indirect calls? Maybe we already have that in the current set of overloads:
  CallInst *CreateCall(FunctionType *FTy, Value *Callee,
                       ArrayRef<Value *> Args = None, const Twine &Name = "",
                       MDNode *FPMathTag = nullptr);
  CallInst *CreateCall(FunctionType *FTy, Value *Callee, ArrayRef<Value *> Args,
                       ArrayRef<OperandBundleDef> OpBundles,
                       const Twine &Name = "", MDNode *FPMathTag = nullptr);
  CallInst *CreateCall(FunctionCallee Callee, ArrayRef<Value *> Args = None,
                       const Twine &Name = "", MDNode *FPMathTag = nullptr);
  CallInst *CreateCall(FunctionCallee Callee, ArrayRef<Value *> Args,
                       ArrayRef<OperandBundleDef> OpBundles,
                       const Twine &Name = "", MDNode *FPMathTag = nullptr);

Assuming the vast majority of callers use the FunctionCallee overload, which is implicitly constructible from a Function*, we could extend FunctionCallee to carry the calling convention and an attribute list. The CC and the attributes are actually part of clang::FunctionType, they just aren't part of llvm::FunctionType. FunctionCallee is sort of incomplete without them.

So, the task would be to add non-optional parameters to the FunctionType/Value overload (used for indirect calls) and then fix up all the call sites. Frontends that generate indirect calls would observe this as build breakage, not changes to generated code. Instrumentation passes would magically start doing the right thing without any source changes. This would also address the long-standing usability issue of calling conventions [1], which are optimized to unreachable if you forget to set the calling convention on the call site.


---

Having written this up, this seems like a good direction. Credit goes to the person that originally suggested the IRBuilder approach, I just had to think harder about it. :)

On Tue, Jun 22, 2021 at 6:06 PM Arthur Eubanks via llvm-dev <llvm...@lists.llvm.org> wrote:

Nicolai Hähnle via llvm-dev

unread,
Jun 24, 2021, 12:59:58 PM6/24/21
to Reid Kleckner, llvm-dev
This may be a silly question, but... why aren't all ABI-relevant facts about functions encoded in the function type? If you zoom back out from the problem sufficiently far, it seems that today, some ABI-relevant facts are part of the pointer type (argument & return types) while others aren't. It seems to me that a saner design would have everything in the same place. Which means one of:

1. All ABI-relevant information, including calling convention and byval, is part of the function type.

2. Function types become "opaque" (with opaque pointers, perhaps FunctionType is eliminated entirely?) and all relevant information is stored with the call instruction, independent of direct vs. indirect calls.

Cheers,
Nicolai
--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

David Blaikie via llvm-dev

unread,
Jun 24, 2021, 1:31:09 PM6/24/21
to Nicolai Hähnle, llvm-dev
On Thu, Jun 24, 2021 at 10:00 AM Nicolai Hähnle via llvm-dev <llvm...@lists.llvm.org> wrote:
This may be a silly question, but... why aren't all ABI-relevant facts about functions encoded in the function type? If you zoom back out from the problem sufficiently far, it seems that today, some ABI-relevant facts are part of the pointer type (argument & return types) while others aren't. It seems to me that a saner design would have everything in the same place. Which means one of:

1. All ABI-relevant information, including calling convention and byval, is part of the function type.

2. Function types become "opaque" (with opaque pointers, perhaps FunctionType is eliminated entirely?) and all relevant information is stored with the call instruction, independent of direct vs. indirect calls.

I guess it'll amount to the same thing with opaque pointers - the only places that'll encode the ABI will be a function itself and a call. Whether they encode it via a function type or not will be a smaller detail compared to now where pointers can be function typed, etc.

It might be a hassle to push these things into the function type (deal with adding more bitcasts, etc, for old code) now - but might be a reasonable cleanup/refactor once opaque pointers have been fully implemented.
 

Reid Kleckner via llvm-dev

unread,
Jun 24, 2021, 1:50:42 PM6/24/21
to Nicolai Hähnle, llvm-dev
On Thu, Jun 24, 2021 at 9:59 AM Nicolai Hähnle <nhae...@gmail.com> wrote:
This may be a silly question, but... why aren't all ABI-relevant facts about functions encoded in the function type? If you zoom back out from the problem sufficiently far, it seems that today, some ABI-relevant facts are part of the pointer type (argument & return types) while others aren't. It seems to me that a saner design would have everything in the same place. Which means one of:

1. All ABI-relevant information, including calling convention and byval, is part of the function type.

When I started working on LLVM, I was also surprised that the calling convention is not part of the function type. The rationale at the time was that LLVM value types are immutable: they can only be changed by creating a new value and doing RAUW. Keeping these ABI details out of the type allows passes to update them. See the way GlobalOpt applies fastcc to all non-address-taken internal functions. It just calls `setCallingConvention`.

However, we have many other passes that change FunctionTypes with the complex RAUW operation, and keeping a parallel list of attributes is actually a challenge for them. See the way DAE and ArgPromotion build new function prototypes, create a new Function, and update all call sites.
 
2. Function types become "opaque" (with opaque pointers, perhaps FunctionType is eliminated entirely?) and all relevant information is stored with the call instruction, independent of direct vs. indirect calls.

I believe that, today, call instructions carry the FunctionType that will be used to lower the call. If you use IRBuilder to build a call with a Function*, the type is implicitly taken from the Function. The FunctionCallee is implicitly constructed, and that overload is used. If you pass in a Value* (so, maybe an indirect call), the caller must supply a FunctionType. I'm proposing that we build on that distinction and require callers that pass a Value* to also pass the other prototype information (CC and attrs).

I think your idea makes sense, but for practical reasons, I think it makes sense to introduce something new, tentatively named a FunctionPrototype, that does not inherit from llvm::Type. Not being a Type allows it to be mutable. It should carry everything needed for argument lowering: CC, ABI attrs, and return and argument types. Functions and Calls would carry a FunctionPrototype, and they should match.

The goal is to make FunctionPrototype mismatch become immediate UB (-> unreachable), just like mismatched CCs are today. However, we are not ready for that, and we may always want to tolerate things like incompatible `zeroext/signext` prototypes to permit minor C prototype mismatches between `int` and `unsigned int` when only positive integer arguments are passed.

Once we have opaque pointers, I think that will greatly simplify argument promotion: changing the argument list will not require changing the type of the function. The passes still have to update all callers, but it should no longer require allocating a new Function.
Reply all
Reply to author
Forward
0 new messages