[llvm-dev] Opaque Pointers Help Wanted

Arthur Eubanks via llvm-dev

unread,

Jun 21, 2021, 12:03:38 PM6/21/21

to llvm-dev

For the opaque pointers project, https://llvm.org/docs/OpaquePointers.html#transition-plan contains high level steps for what to do before we can enable opaque pointers. (Looks like the page hasn't been rebuilt in a while, https://github.com/llvm/llvm-project/blob/main/llvm/docs/OpaquePointers.rst#transition-plan contains some more concrete steps)

Essentially almost all of the work boils down to figuring out how to remove calls to `PointerType::getElementType()` and `Type::getPointerElementType()`. Everything else derives from that.

Any help with this is welcome and appreciated!

Craig Topper via llvm-dev

unread,

Jun 21, 2021, 12:35:59 PM6/21/21

to Arthur Eubanks, llvm-dev

If I remember correctly, these 3 intrinsics need to be redesigned as they use a type overload on the pointer argument to recover a type. See code in BPFAbstractMemberAccess::IsPreserveDIAccessIndexCall.

def int_preserve_array_access_index : DefaultAttrsIntrinsic<[llvm_anyptr_ty],
[llvm_anyptr_ty, llvm_i32_ty,
llvm_i32_ty],
[IntrNoMem,
ImmArg<ArgIndex<1>>,
ImmArg<ArgIndex<2>>]>;
def int_preserve_union_access_index : DefaultAttrsIntrinsic<[llvm_anyptr_ty],
[llvm_anyptr_ty, llvm_i32_ty],
[IntrNoMem,
ImmArg<ArgIndex<1>>]>;
def int_preserve_struct_access_index : DefaultAttrsIntrinsic<[llvm_anyptr_ty],
[llvm_anyptr_ty, llvm_i32_ty,
llvm_i32_ty],
[IntrNoMem,
ImmArg<ArgIndex<1>>,
ImmArg<ArgIndex<2>>]>;

~Craig

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Kaylor, Andrew via llvm-dev

unread,

Jun 21, 2021, 5:27:57 PM6/21/21

to Arthur Eubanks, llvm...@lists.llvm.org

This has probably been discussed somewhere, but I missed it. Can you elaborate a bit on this?

Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer type (not to be turned on until ready)

To support legacy bitcode, such as legacy stores/loads, we need to track pointee types for all values since legacy instructions may infer the types from a pointer operand's pointee type

I‘m specifically trying to understand what will happen when typed pointer support is removed. How will IR with typed pointers be auto-upgraded to pure opaque pointer IR? Will the bitcode reader keep some level of typed pointer support indefinitely?

Also, do you have a plan for replacing intrinsics that currently rely on pointee types? For example, the load instruction was updated to take an explicit type operand but I don’t think we can do the same thing for an intrinsic like llvm.masked.load since there is Value for Type. This is an easy problem to work around for something like masked.load, but more complicated if anyone has a downstream GEP-like intrinsic that needs more than the size of an element (spoiler alert: I do have such an intrinsic). Would you use a metadata argument?

Thanks,

Andy

Arthur Eubanks via llvm-dev

unread,

Jun 21, 2021, 5:59:52 PM6/21/21

to Craig Topper, llvm-dev

Yes that's an issue, I commented on https://reviews.llvm.org/D61524.

One random hacky idea I had was to add an overloaded parameter that we always pass undef/poison to but is of the type we care about. There's probably a better way though.

David Blaikie via llvm-dev

unread,

Jun 21, 2021, 6:00:47 PM6/21/21

to Kaylor, Andrew, craig....@gmail.com, llvm...@lists.llvm.org

It looks like Craig's pointed out some cases that come up in LLVM. Perhaps you could try fixing one of the in-tree examples to help explore/set the direction for your out of tree work?

Metadata arguments sound like a direction worth exploring. (& would be consistent with the way parameter attributes have been approached - still using a type argument even though the length is all that matters in those cases)

_______________________________________________

Nicolai Hähnle via llvm-dev

unread,

Jun 22, 2021, 3:07:50 AM6/22/21

to Arthur Eubanks, llvm-dev

Hi Arthur,

What's the status of the C API? I don't see any mention of it in the transition plan.

Cheers,

Nicolai

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--

Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Arthur Eubanks via llvm-dev

unread,

Jun 22, 2021, 12:01:13 PM6/22/21

to Nicolai Hähnle, llvm-dev

I haven't looked at the C API closely, but of course we will need to make sure the C API is covered. Looks like there's already been some work, e.g. https://reviews.llvm.org/D56559, but there may be more to do. The opaque pointers work left to do list isn't exhaustive, it's mostly a list of random things that I've thought of/noticed and deemed worthy to put on a list. But anybody can go ahead and update the list. The important part is that we end up removing all calls to get a pointer type's pointee type.

Arthur Eubanks via llvm-dev

unread,

Jun 22, 2021, 12:15:36 PM6/22/21

to Kaylor, Andrew, llvm...@lists.llvm.org

On Mon, Jun 21, 2021 at 3:27 PM Kaylor, Andrew <andrew...@intel.com> wrote:

This has probably been discussed somewhere, but I missed it. Can you elaborate a bit on this?

Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer type (not to be turned on until ready)

To support legacy bitcode, such as legacy stores/loads, we need to track pointee types for all values since legacy instructions may infer the types from a pointer operand's pointee type

I‘m specifically trying to understand what will happen when typed pointer support is removed. How will IR with typed pointers be auto-upgraded to pure opaque pointer IR? Will the bitcode reader keep some level of typed pointer support indefinitely?

Yes, the plan is something along the lines of associating each Value with a possible pointee type inside the bitcode reader.

Also, do you have a plan for replacing intrinsics that currently rely on pointee types? For example, the load instruction was updated to take an explicit type operand but I don’t think we can do the same thing for an intrinsic like llvm.masked.load since there is Value for Type. This is an easy problem to work around for something like masked.load, but more complicated if anyone has a downstream GEP-like intrinsic that needs more than the size of an element (spoiler alert: I do have such an intrinsic). Would you use a metadata argument?

I don't think metadata can reference an LLVM type. My previous hacky suggestion was to add a new overloaded parameter with the type you want and pass undef/poison to it. In any case, we'll have to find a way to fix these sorts of intrinsics, we shouldn't block the opaque pointers project on some intrinsics.

LLVM already (mostly) treats memory as untyped, what is your intrinsic attempting to do?

Kaylor, Andrew via llvm-dev

unread,

Jun 22, 2021, 12:16:49 PM6/22/21

to David Blaikie, craig....@gmail.com, llvm...@lists.llvm.org

Honestly, I didn’t (and still don’t) understand enough about the intrinsics Craig mentioned to even be sure they were analogous to the case I brought up. It seemed like they might be more interested in debug types than IR types and may or may not be going down the same paths of reasoning, but I just don’t understand what the intrinsics are doing.

We’re scrambling to get our out-of-tree code caught up with the opaque pointer transition. I think the scatter/gather intrinsics are likely to be our first useful overlap with the in-tree work to be done, though now that I’ve thought about it a bit more, I think the type in that can just be inferred from non-pointer arguments. In any case, I guess talking about it and agreeing on an approach is a good first step.

I started trying to mock up some IR for how this might work, but I wasn’t entirely happy with it. Specifically, how do you describe an IR type in metadata. We could use an undef value in the metadata, similar to what Arthur proposed as an argument, but that only moves the ugliness. This is kind of what I was thinking

define void @f() {

ptr %p = <however we got %p>

call void @llvm.some.intrinsic(ptr %p, metadata !1)

<…>

return void

}

define void @llvm.some.intrinsic(ptr, metadata)

%MyElementTy = type { <whatever> }

!1 = !{ %MyElementTy undef }

I guess that would work, but I don’t really like it. I looked at the way TBAA and debug info describe types, but that’s not describing the IR type and so it seems simultaneously fragile and too verbose. Is there a better way to do this? What about introducing a parameter attribute that would be considered required for the intrinsic? Something like this:

define void @f() {

ptr %p = <however we got %p>

call void @llvm.some.intrinsic(ptr elementtype(%MyElementTy) %p)

<…>

return void

}

; Requires ‘elementype’ parameter attribute

define void @llvm.some.intrinsic(ptr)

%MyElementTy = type { <whatever> }

Does that feel like a step backward to you? I don’t know how hard it would be for front end’s to generate something like that, but it hasn’t the benefit that the elementtype would have intentional and well-defined semantic meaning.

Thanks,

Andy

Arthur Eubanks via llvm-dev

unread,

Jun 22, 2021, 12:21:45 PM6/22/21

to Kaylor, Andrew, llvm...@lists.llvm.org

Oh yes, an attribute type definitely makes much more sense, I like that. It's essentially the solution used for things like byval pointers.

As for the timeline, we'll have to support mixed opaque pointers and legacy pointers for a while, we don't want out of tree users to immediately break as soon as the opaque pointers work is finished up in-tree.

Nikita Popov via llvm-dev

unread,

Jun 22, 2021, 12:23:38 PM6/22/21

to Kaylor, Andrew, llvm...@lists.llvm.org

On Tue, Jun 22, 2021 at 6:16 PM Kaylor, Andrew via llvm-dev <llvm...@lists.llvm.org> wrote:

Honestly, I didn’t (and still don’t) understand enough about the intrinsics Craig mentioned to even be sure they were analogous to the case I brought up. It seemed like they might be more interested in debug types than IR types and may or may not be going down the same paths of reasoning, but I just don’t understand what the intrinsics are doing.

We’re scrambling to get our out-of-tree code caught up with the opaque pointer transition. I think the scatter/gather intrinsics are likely to be our first useful overlap with the in-tree work to be done, though now that I’ve thought about it a bit more, I think the type in that can just be inferred from non-pointer arguments. In any case, I guess talking about it and agreeing on an approach is a good first step.

I started trying to mock up some IR for how this might work, but I wasn’t entirely happy with it. Specifically, how do you describe an IR type in metadata. We could use an undef value in the metadata, similar to what Arthur proposed as an argument, but that only moves the ugliness. This is kind of what I was thinking

define void @f() {

ptr %p = <however we got %p>

call void @llvm.some.intrinsic(ptr %p, metadata !1)

<…>

return void

}

define void @llvm.some.intrinsic(ptr, metadata)

%MyElementTy = type { <whatever> }

!1 = !{ %MyElementTy undef }

I guess that would work, but I don’t really like it. I looked at the way TBAA and debug info describe types, but that’s not describing the IR type and so it seems simultaneously fragile and too verbose. Is there a better way to do this? What about introducing a parameter attribute that would be considered required for the intrinsic? Something like this:

define void @f() {

ptr %p = <however we got %p>

call void @llvm.some.intrinsic(ptr elementtype(%MyElementTy) %p)

<…>

return void

}

; Requires ‘elementype’ parameter attribute

define void @llvm.some.intrinsic(ptr)

%MyElementTy = type { <whatever> }

Does that feel like a step backward to you? I don’t know how hard it would be for front end’s to generate something like that, but it hasn’t the benefit that the elementtype would have intentional and well-defined semantic meaning.

Maybe the byref attribute is suitable for this? That is @llvm.some.intrinsic(ptr byref(%ElemTy) %p).

I believe byref() doesn't carry any particular semantics apart from providing the element type. It's currently used to carry ABI information in the presence of opaque pointers (for AMDGPU kernels), but I think it would be suitable for use with intrinsics as well.

Regards,

Nikita

Kaylor, Andrew via llvm-dev

unread,

Jun 22, 2021, 12:39:00 PM6/22/21

to Arthur Eubanks, llvm...@lists.llvm.org

> I don't think metadata can reference an LLVM type. My previous hacky suggestion was to add a new overloaded parameter with the type you want and pass undef/poison to it. In any case, we'll have to find a way to fix these sorts of intrinsics, we shouldn't block the opaque pointers project on some intrinsics.

It looks like I just missed your response before getting my response to David in flight. Yes, I see the metadata problem. I don’t like the undef/poison approach, but I agree that it would technically work.

I completely agree that we should solve this problem rather than block the opaque pointer project because of it. Nevertheless, we’ll need to solve the problem.

> LLVM already (mostly) treats memory as untyped, what is your intrinsic attempting to do?

It’s kind of like a GEP but more specialized. It’s doing a pointer calculation based on multidimensional composite types with a shape that is known at runtime but unknown at compile time, and we have a front end that’s able to supply enough info to make this useful.

> As for the timeline, we'll have to support mixed opaque pointers and legacy pointers for a while, we don't want out of tree users to immediately break as soon as the opaque pointers work is finished up in-tree.

Is there any consensus on the scope of “for a while”? Like, how many major releases after the opaque pointer work is complete? My manager would very much like to know the answer to this question. :-) I’ve been trying to prepare for the buffer being as little as one major release after the in-tree work is done but hoping for more. I expect there are others with a similar interest.

From: Arthur Eubanks <aeub...@google.com>
Sent: Tuesday, June 22, 2021 12:15 PM
To: Kaylor, Andrew <andrew...@intel.com>

Kaylor, Andrew via llvm-dev

unread,

Jun 22, 2021, 12:56:36 PM6/22/21

to Nikita Popov, llvm...@lists.llvm.org

> Maybe the byref attribute is suitable for this? That is @llvm.some.intrinsic(ptr byref(%ElemTy) %p).

I’m always hesitant to repurpose existing attributes, but byref does seem very close. The line in the LangRef saying explicitly that it is intended for ABI constraints and shouldn’t be used for optimization worries me.

Also, byref assumes the pointer is pointing to a singe element. For a GEP-like intrinsic, that’s not what I’d want. It would take some time to craft a precise definition, but I was thinking something like “Within the scope of this function call, the pointer may be assumed to point to zero or more elements of the specified type. This attribute does not guarantee the amount of dereferenceable memory, but whatever memory is dereferenceable may be assumed to be in increments of the size of the element type.” It may need to say more than that.

David Blaikie via llvm-dev

unread,

Jun 22, 2021, 1:12:03 PM6/22/21

to Kaylor, Andrew, llvm...@lists.llvm.org

On Tue, Jun 22, 2021 at 9:39 AM Kaylor, Andrew via llvm-dev <llvm...@lists.llvm.org> wrote:

> I don't think metadata can reference an LLVM type. My previous hacky suggestion was to add a new overloaded parameter with the type you want and pass undef/poison to it. In any case, we'll have to find a way to fix these sorts of intrinsics, we shouldn't block the opaque pointers project on some intrinsics.

It looks like I just missed your response before getting my response to David in flight. Yes, I see the metadata problem. I don’t like the undef/poison approach, but I agree that it would technically work.

I completely agree that we should solve this problem rather than block the opaque pointer project because of it. Nevertheless, we’ll need to solve the problem.

> LLVM already (mostly) treats memory as untyped, what is your intrinsic attempting to do?

It’s kind of like a GEP but more specialized. It’s doing a pointer calculation based on multidimensional composite types with a shape that is known at runtime but unknown at compile time, and we have a front end that’s able to supply enough info to make this useful.

> As for the timeline, we'll have to support mixed opaque pointers and legacy pointers for a while, we don't want out of tree users to immediately break as soon as the opaque pointers work is finished up in-tree.

Is there any consensus on the scope of “for a while”? Like, how many major releases after the opaque pointer work is complete? My manager would very much like to know the answer to this question. :-) I’ve been trying to prepare for the buffer being as little as one major release after the in-tree work is done but hoping for more. I expect there are others with a similar interest.

By default I'd vote for one major release - but likely the ongoing burden of carrying it for a bit longer if some significant contributors would benefit from extra time it's probably going to be pretty harmless to keep it around a bit longer, I think?

From: Arthur Eubanks <aeub...@google.com>
Sent: Tuesday, June 22, 2021 12:15 PM
To: Kaylor, Andrew <andrew...@intel.com>
Cc: llvm...@lists.llvm.org
Subject: Re: [llvm-dev] Opaque Pointers Help Wanted

On Mon, Jun 21, 2021 at 3:27 PM Kaylor, Andrew <andrew...@intel.com> wrote:

This has probably been discussed somewhere, but I missed it. Can you elaborate a bit on this?

Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer type (not to be turned on until ready)

To support legacy bitcode, such as legacy stores/loads, we need to track pointee types for all values since legacy instructions may infer the types from a pointer operand's pointee type

I‘m specifically trying to understand what will happen when typed pointer support is removed. How will IR with typed pointers be auto-upgraded to pure opaque pointer IR? Will the bitcode reader keep some level of typed pointer support indefinitely?

Yes, the plan is something along the lines of associating each Value with a possible pointee type inside the bitcode reader.

Also, do you have a plan for replacing intrinsics that currently rely on pointee types? For example, the load instruction was updated to take an explicit type operand but I don’t think we can do the same thing for an intrinsic like llvm.masked.load since there is Value for Type. This is an easy problem to work around for something like masked.load, but more complicated if anyone has a downstream GEP-like intrinsic that needs more than the size of an element (spoiler alert: I do have such an intrinsic). Would you use a metadata argument?

I don't think metadata can reference an LLVM type. My previous hacky suggestion was to add a new overloaded parameter with the type you want and pass undef/poison to it. In any case, we'll have to find a way to fix these sorts of intrinsics, we shouldn't block the opaque pointers project on some intrinsics.

LLVM already (mostly) treats memory as untyped, what is your intrinsic attempting to do?

_______________________________________________

Nikita Popov via llvm-dev

unread,

Jun 30, 2021, 2:03:04 PM6/30/21

to David Blaikie, llvm...@lists.llvm.org

On Tue, Jun 22, 2021 at 7:12 PM David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:

On Tue, Jun 22, 2021 at 9:39 AM Kaylor, Andrew via llvm-dev <llvm...@lists.llvm.org> wrote:

> I don't think metadata can reference an LLVM type. My previous hacky suggestion was to add a new overloaded parameter with the type you want and pass undef/poison to it. In any case, we'll have to find a way to fix these sorts of intrinsics, we shouldn't block the opaque pointers project on some intrinsics.

It looks like I just missed your response before getting my response to David in flight. Yes, I see the metadata problem. I don’t like the undef/poison approach, but I agree that it would technically work.

I completely agree that we should solve this problem rather than block the opaque pointer project because of it. Nevertheless, we’ll need to solve the problem.

> LLVM already (mostly) treats memory as untyped, what is your intrinsic attempting to do?

It’s kind of like a GEP but more specialized. It’s doing a pointer calculation based on multidimensional composite types with a shape that is known at runtime but unknown at compile time, and we have a front end that’s able to supply enough info to make this useful.

> As for the timeline, we'll have to support mixed opaque pointers and legacy pointers for a while, we don't want out of tree users to immediately break as soon as the opaque pointers work is finished up in-tree.

Is there any consensus on the scope of “for a while”? Like, how many major releases after the opaque pointer work is complete? My manager would very much like to know the answer to this question. :-) I’ve been trying to prepare for the buffer being as little as one major release after the in-tree work is done but hoping for more. I expect there are others with a similar interest.

By default I'd vote for one major release - but likely the ongoing burden of carrying it for a bit longer if some significant contributors would benefit from extra time it's probably going to be pretty harmless to keep it around a bit longer, I think?

Opaque pointers are not like the pass manager switch, where we can retain support for the legacy pass manager at close to zero cost. Nearly all tests work the same on both pass managers, so there is little ongoing maintenance cost.

This is not going to be the case with opaque pointers. Doing the switch will require changes to nearly the whole test suite. This means that either typed pointers will end up being entirely untested (or very weakly tested), or we need to duplicate a very large fraction of our tests.

My expectation is that after the opaque pointer migration is complete, the time the non-opaque pointer mode remains available will be measured in days, not years.

Regards,

Nikita

via llvm-dev

unread,

Jul 1, 2021, 10:39:39 AM7/1/21

to nikit...@gmail.com, dbla...@gmail.com, llvm...@lists.llvm.org

| My expectation is that after the opaque pointer migration is complete, the time the non-opaque pointer mode remains available will be measured in days, not years.

That can be true for textual IR, but we need to retain backward compatibility (auto-upgrade) for bitcode for much longer. I don’t have a clear idea of what’s involved (haven’t really been tracking this project) but many users/scenarios require bitcode compatibility for long periods. I don’t think we’ve had a true format break since about 4.0?

--paulr

Nikita Popov via llvm-dev

unread,

Jul 1, 2021, 11:00:38 AM7/1/21

to paul.r...@sony.com, llvm-dev

On Thu, Jul 1, 2021 at 4:39 PM <paul.r...@sony.com> wrote:

| My expectation is that after the opaque pointer migration is complete, the time the non-opaque pointer mode remains available will be measured in days, not years.

That can be true for textual IR, but we need to retain backward compatibility (auto-upgrade) for bitcode for much longer. I don’t have a clear idea of what’s involved (haven’t really been tracking this project) but many users/scenarios require bitcode compatibility for long periods. I don’t think we’ve had a true format break since about 4.0?

--paulr

Sure, we definitely need to retain bitcode compatibility (i.e. auto-upgrade to opaque pointers).

My comment was targeted at maintainers of out-of-free frontends and targets, who shouldn't assume there will be a long-term period of dual compatibility, like there was/is with the new pass manager. If you don't want to be stuck on an old toolchain, you need to be ready by the time the switch happens.

Regards,

Nikita

David Blaikie via llvm-dev

unread,

Jul 1, 2021, 1:44:07 PM7/1/21

to Nikita Popov, llvm-dev

On Thu, Jul 1, 2021 at 8:00 AM Nikita Popov <nikit...@gmail.com> wrote:

On Thu, Jul 1, 2021 at 4:39 PM <paul.r...@sony.com> wrote:

| My expectation is that after the opaque pointer migration is complete, the time the non-opaque pointer mode remains available will be measured in days, not years.

That can be true for textual IR, but we need to retain backward compatibility (auto-upgrade) for bitcode for much longer. I don’t have a clear idea of what’s involved (haven’t really been tracking this project) but many users/scenarios require bitcode compatibility for long periods. I don’t think we’ve had a true format break since about 4.0?

--paulr

Sure, we definitely need to retain bitcode compatibility (i.e. auto-upgrade to opaque pointers).

My comment was targeted at maintainers of out-of-free frontends and targets, who shouldn't assume there will be a long-term period of dual compatibility, like there was/is with the new pass manager. If you don't want to be stuck on an old toolchain, you need to be ready by the time the switch happens.

Correct me if I'm wrong, but to clarify, I think: LLVM IR API likely won't have a long period of overlap supporting both opaque and non-opaque pointers and especially the ability to mix/match in IR may never exist (hard cutover) - as discussed with things like intrinsics recently, we'll likely have to have a hard cutover to opaque pointers. Textual IR may be able to be auto-upgraded for a time, and Bitcode certainly will be - so, in the worst case, if a downstream project needed to pickup a newer version of LLVM but didn't have time to invest it migrating their IR generation - maybe they could get away with linking an older LLVM into their frontend, then serializing roundtrip via bitcode back into a process with the newer LLVM that would convert to opaque pointers on loading.

Is that about right?

- Dave

Nikita Popov via llvm-dev

unread,

Jul 1, 2021, 2:06:51 PM7/1/21

to David Blaikie, llvm-dev

On Thu, Jul 1, 2021 at 7:43 PM David Blaikie <dbla...@gmail.com> wrote:

On Thu, Jul 1, 2021 at 8:00 AM Nikita Popov <nikit...@gmail.com> wrote:
On Thu, Jul 1, 2021 at 4:39 PM <paul.r...@sony.com> wrote:

| My expectation is that after the opaque pointer migration is complete, the time the non-opaque pointer mode remains available will be measured in days, not years.

That can be true for textual IR, but we need to retain backward compatibility (auto-upgrade) for bitcode for much longer. I don’t have a clear idea of what’s involved (haven’t really been tracking this project) but many users/scenarios require bitcode compatibility for long periods. I don’t think we’ve had a true format break since about 4.0?

--paulr

Sure, we definitely need to retain bitcode compatibility (i.e. auto-upgrade to opaque pointers).

My comment was targeted at maintainers of out-of-free frontends and targets, who shouldn't assume there will be a long-term period of dual compatibility, like there was/is with the new pass manager. If you don't want to be stuck on an old toolchain, you need to be ready by the time the switch happens.

Correct me if I'm wrong, but to clarify, I think: LLVM IR API likely won't have a long period of overlap supporting both opaque and non-opaque pointers and especially the ability to mix/match in IR may never exist (hard cutover) - as discussed with things like intrinsics recently, we'll likely have to have a hard cutover to opaque pointers. Textual IR may be able to be auto-upgraded for a time, and Bitcode certainly will be - so, in the worst case, if a downstream project needed to pickup a newer version of LLVM but didn't have time to invest it migrating their IR generation - maybe they could get away with linking an older LLVM into their frontend, then serializing roundtrip via bitcode back into a process with the newer LLVM that would convert to opaque pointers on loading.

Is that about right?

Yeah, I believe that should work.

Kaylor, Andrew via llvm-dev

unread,

Jul 1, 2021, 5:49:03 PM7/1/21

to Nikita Popov, llvm...@lists.llvm.org

> My comment was targeted at maintainers of out-of-free frontends and targets, who shouldn't assume there will be a long-term period of dual compatibility, like there was/is with the new pass manager. If you don't want to be stuck on an old toolchain, you need to be ready by the time the switch happens.

I have serious concerns about this. I understand why you want to limit the time non-opaque pointers are supported, but having a sharp cutoff feels somewhat hostile to those who have a significant amount of out-of-tree code.

I know the opaque pointer work has been well-publicized and ongoing for quite some time, but that’s part of the problem. A lot of work has gone into making opaque pointers work, and a corresponding amount if work may be required for out-of-tree projects. My company, for example, has been following the work in LLVM and working on our own updates, but we’ve been allocating resources based on the pace of the open source work, which until recently has been fairly slow. We are adjusting our plans in accordance with the new push, but it is going to take us some time to catch up.

At a minimum, I think we need to agree upon a schedule for the transition so people can plan their work. Saying that we’re going to cut off support for a fundamental feature like this “when the migration is done” makes planning impossible. Beyond that, I would vote for keeping the duplicate tests around for some period of time.

-Andy

David Blaikie via llvm-dev

unread,

Jul 2, 2021, 4:53:09 PM7/2/21

to Kaylor, Andrew, llvm...@lists.llvm.org

On Thu, Jul 1, 2021 at 2:49 PM Kaylor, Andrew via llvm-dev <llvm...@lists.llvm.org> wrote:

> My comment was targeted at maintainers of out-of-free frontends and targets, who shouldn't assume there will be a long-term period of dual compatibility, like there was/is with the new pass manager. If you don't want to be stuck on an old toolchain, you need to be ready by the time the switch happens.

I have serious concerns about this. I understand why you want to limit the time non-opaque pointers are supported, but having a sharp cutoff feels somewhat hostile to those who have a significant amount of out-of-tree code.

I know the opaque pointer work has been well-publicized and ongoing for quite some time, but that’s part of the problem. A lot of work has gone into making opaque pointers work, and a corresponding amount if work may be required for out-of-tree projects. My company, for example, has been following the work in LLVM and working on our own updates, but we’ve been allocating resources based on the pace of the open source work, which until recently has been fairly slow. We are adjusting our plans in accordance with the new push, but it is going to take us some time to catch up.

At a minimum, I think we need to agree upon a schedule for the transition so people can plan their work. Saying that we’re going to cut off support for a fundamental feature like this “when the migration is done” makes planning impossible. Beyond that, I would vote for keeping the duplicate tests around for some period of time.

I think there's still some confusion about what "dual compatibility" means - some things are going to be a hard cutover (like intrinsics) - that doesn't mean it's going to be sprung on you - we'll have all the pieces in place for you to do as much prep as possible, and a healthy amount of warning (probably at least a release or so, I should think) - but then it'll be a hard cut.

We'll leave tests around for bitcode upgrade, for instance - because that must continue working. I'm not sure if it makes sense to leave tests covering other LLVM functionality (like transformation passes) using old IR (textual or bitcode) - if the actual in-memory IR that it's parsed into is the modern typeless pointer IR anyway.

It might be possible to support opaque pointers optionally and have IR passes do the right thing for either representation - but that may be significantly more work than is practical/suitable, I'm not sure (other folks doing more of the work right now might have a better feel for this). I can appreciate the desire for this, but I'm not sure it's feasible.

- Dave

Kaylor, Andrew via llvm-dev

unread,

Jul 6, 2021, 11:22:37 AM7/6/21

to David Blaikie, llvm...@lists.llvm.org

I understand the hard cutover for things like intrinsics and other API that are being marked as deprecated. I’m more concerned about the ability to continue generating (from clang or other front ends) IR with pseudo-typed pointers (by which I mean pointers as they have been and mostly are now) and bitcasts that our out-of-tree passes are currently depending on and having in-tree passes continue to be able to optimize that. I’d like to especially emphasize this latter point. If key in-tree passes suddenly stop optimizing code that contains non-opaque pointers and bitcasts, that’s effectively a hard cutover for the entire optimizer even if the IR is still technically legal.

I see the problems with doing this. I really do. Once the opaque pointer work is complete it will be possible to remove a lot of code that is only there to support a “dead” form of the IR, and obviously we’d like to do that as soon as we can. I guess what I’m asking is that the non-opaque pointer IR form be considered deprecated rather than unsupported in the optimizer for at least one release after the work is complete and the code to optimize it be kept around for that long. If we make improvements that don’t also improve performance with the old form, that’s fine. I just don’t want to lose what’s there now.

It would also be very helpful if we could make a reasonable estimate of when the opaque pointer work will be complete. I know we have a list of what’s left to be done, but it would be very helpful to me if those working on this could say when they would like to have it done and when they realistically think it will be done.

Finally, if anyone else shares my concerns it would be helpful to hear that, because I realize that just one LLVM contributor (by which I mean my company, not me as an individual) can only expect to have so much influence.

Thanks,

Andy

Krzysztof Parzyszek via llvm-dev

unread,

Jul 6, 2021, 11:36:00 AM7/6/21

to Kaylor, Andrew, David Blaikie, llvm-dev

It would also be very helpful if we could make a reasonable estimate of when the opaque pointer work will be complete. I know we have a list of what’s left to be done, but it would be very helpful to me if those working on this could say when they would like to have it done and when they realistically think it will be done.

Is this list posted somewhere?

--

Krzysztof Parzyszek kpar...@quicinc.com AI tools development

Neil Henning via llvm-dev

unread,

Jul 6, 2021, 11:49:19 AM7/6/21

to llvm-dev

Finally, if anyone else shares my concerns it would be helpful to hear that, because I realize that just one LLVM contributor (by which I mean my company, not me as an individual) can only expect to have so much influence.

Echoing Andy Kaylor's remarks - this will be initially painful for us to support in Unity's HPC# Burst compiler.

I guess what I’m asking is that the non-opaque pointer IR form be considered deprecated rather than unsupported in the optimizer for at least one release after the work is complete and the code to optimize it be kept around for that long.

For us it'd be great to at least have a single LLVM point release that has both opaque pointers and the existing typed-pointer support. That'd give us ~6 months to port over all our custom optimization passes and ensure compliance on the 25-ish platforms we support (EG. LLVM 13 has opaque pointers, LLVM 14 has only opaque pointers).

Cheers,

-Neil.

--

Neil Henning

Senior Software Engineer Compiler

unity.com

Nikita Popov via llvm-dev

unread,

Jul 6, 2021, 11:50:53 AM7/6/21

to Kaylor, Andrew, llvm...@lists.llvm.org

On Tue, Jul 6, 2021 at 5:22 PM Kaylor, Andrew <andrew...@intel.com> wrote:

I understand the hard cutover for things like intrinsics and other API that are being marked as deprecated. I’m more concerned about the ability to continue generating (from clang or other front ends) IR with pseudo-typed pointers (by which I mean pointers as they have been and mostly are now) and bitcasts that our out-of-tree passes are currently depending on and having in-tree passes continue to be able to optimize that. I’d like to especially emphasize this latter point. If key in-tree passes suddenly stop optimizing code that contains non-opaque pointers and bitcasts, that’s effectively a hard cutover for the entire optimizer even if the IR is still technically legal.

I see the problems with doing this. I really do. Once the opaque pointer work is complete it will be possible to remove a lot of code that is only there to support a “dead” form of the IR, and obviously we’d like to do that as soon as we can. I guess what I’m asking is that the non-opaque pointer IR form be considered deprecated rather than unsupported in the optimizer for at least one release after the work is complete and the code to optimize it be kept around for that long. If we make improvements that don’t also improve performance with the old form, that’s fine. I just don’t want to lose what’s there now.

Let me clarify how I envision this working:

1. We finalize support for opaque pointers in LLVM.

2. Now we're in a state where both typed pointers and opaque pointers are supported at the same time, but only typed pointers are heavily tested.

3. There is some period of time where we continue to default to typed pointers, while opaque pointers already fully work.

4. We switch to opaque pointers by default, and switch all tests to use opaque pointers.

5. Shortly afterwards, support for typed pointers is dropped.

I really don't think we should compromise on step 5. Or at least, the only form I would be willing to allow this is that we promise to not actively/intentionally break typed pointers, but all the testing and fixing burden is on a 3rd-party.

Where we do have some leeway is step 3, where we have opaque pointer support that is complete but not enabled by default yet. I think that with all relevant APIs deprecated, we can be reasonably confident that opaque pointers continue to work. Personally, I would like to keep this period as short as possible as well, but if we want to extend the transition, I think it should be here.

It would also be very helpful if we could make a reasonable estimate of when the opaque pointer work will be complete. I know we have a list of what’s left to be done, but it would be very helpful to me if those working on this could say when they would like to have it done and when they realistically think it will be done.

My current target would be to complete full opaque pointer support sometime before the LLVM 14 release, and do the switch shortly after LLVM 14 is branched. Take this estimate with a grain of salt though :)

Regards,

Nikita

Kaylor, Andrew via llvm-dev

unread,

Jul 6, 2021, 12:26:59 PM7/6/21

to Nikita Popov, llvm...@lists.llvm.org

> Or at least, the only form I would be willing to allow this is that we promise to not actively/intentionally break typed pointers, but all the testing and fixing burden is on a 3rd-party.

That’s fair, and I could probably live with that.

If I understand correctly, lagging out-of-tree projects would effectively continue existing in a state between steps 3 and 4, while the main source base moved on. That state between steps 3 and 4 seems a lot like where we were for a couple of years with the new and legacy pass manager -- the support was all there and many downstream projects were using it by default but we delayed changing the default in trunk until all issues were resolved.

Nikita Popov via llvm-dev

unread,

Sep 4, 2021, 4:39:03 AM9/4/21

to Kaylor, Andrew, llvm...@lists.llvm.org

On Tue, Jun 22, 2021 at 6:16 PM Kaylor, Andrew via llvm-dev <llvm...@lists.llvm.org> wrote:

Honestly, I didn’t (and still don’t) understand enough about the intrinsics Craig mentioned to even be sure they were analogous to the case I brought up. It seemed like they might be more interested in debug types than IR types and may or may not be going down the same paths of reasoning, but I just don’t understand what the intrinsics are doing.

We’re scrambling to get our out-of-tree code caught up with the opaque pointer transition. I think the scatter/gather intrinsics are likely to be our first useful overlap with the in-tree work to be done, though now that I’ve thought about it a bit more, I think the type in that can just be inferred from non-pointer arguments. In any case, I guess talking about it and agreeing on an approach is a good first step.

I started trying to mock up some IR for how this might work, but I wasn’t entirely happy with it. Specifically, how do you describe an IR type in metadata. We could use an undef value in the metadata, similar to what Arthur proposed as an argument, but that only moves the ugliness. This is kind of what I was thinking

define void @f() {

ptr %p = <however we got %p>

call void @llvm.some.intrinsic(ptr %p, metadata !1)

<…>

return void

}

define void @llvm.some.intrinsic(ptr, metadata)

%MyElementTy = type { <whatever> }

!1 = !{ %MyElementTy undef }

I guess that would work, but I don’t really like it. I looked at the way TBAA and debug info describe types, but that’s not describing the IR type and so it seems simultaneously fragile and too verbose. Is there a better way to do this? What about introducing a parameter attribute that would be considered required for the intrinsic? Something like this:

define void @f() {

ptr %p = <however we got %p>

call void @llvm.some.intrinsic(ptr elementtype(%MyElementTy) %p)

<…>

return void

}

; Requires ‘elementype’ parameter attribute

define void @llvm.some.intrinsic(ptr)

%MyElementTy = type { <whatever> }

Does that feel like a step backward to you? I don’t know how hard it would be for front end’s to generate something like that, but it hasn’t the benefit that the elementtype would have intentional and well-defined semantic meaning.

Thanks,

Andy

For the record, we did end up adding the elementtype() attribute for use by intrinsics in https://reviews.llvm.org/D105407 / https://reviews.llvm.org/D106008 and a sample usage for BPF intrinsics in https://reviews.llvm.org/D106184.

Would be great if you can confirm whether this works for your use-case as well!

Regards,

Nikita

James Dutton via llvm-dev

unread,

Sep 5, 2021, 6:26:41 PM9/5/21

to Kaylor, Andrew, llvm...@lists.llvm.org

Hi,

Which version of LLVM and CLANG do I need to test opaque pointers?

What command line options do I need to test opaque pointers?

When I say "test", I mean more like "experiment with".

I would like to use the LLVM IR Builder to create IR code that uses opaque pointers.

I would also like to use CLANG to create LLVM IR code that uses opaque pointers.

Kind Regards

James

Kaylor, Andrew via llvm-dev

unread,

Sep 7, 2021, 1:14:11 AM9/7/21

to Nikita Popov, llvm...@lists.llvm.org

Hi Nikita,

We saw the elementtype attribute changes and have been working on some patches to make use of it. This is what led us to the change in https://reviews.llvm.org/D108796, which I plan to commit for Chenyang tomorrow. Apart from that minor problem, it looks like it will work very well for us. Thanks!

-Andy

From: Nikita Popov <nikit...@gmail.com>
Sent: Saturday, September 04, 2021 1:39 AM
To: Kaylor, Andrew <andrew...@intel.com>

Cc: David Blaikie <dbla...@gmail.com>; craig....@gmail.com; llvm...@lists.llvm.org
Subject: Re: [llvm-dev] Opaque Pointers Help Wanted

On Tue, Jun 22, 2021 at 6:16 PM Kaylor, Andrew via llvm-dev <llvm...@lists.llvm.org> wrote:

Reply all

Reply to author

Forward