[llvm-dev] RFC: Opaque pointer status and future direction

Tim Northover via llvm-dev

unread,

Dec 18, 2019, 10:16:04 AM12/18/19

to LLVM Developers Mailing List

Hi all,

At the dev meeting I promised to update everyone on where my work with opaque
pointers is right now. It's taken me a while, but at least it's the same year!

Current Status
==============

I've put two branches up at https://github.com/TNorthover/llvm-project:
"opaque-ptr" which has most of the real work so far; and "opaque-ptr-always"
that additionally has a patch to force every pointer to be opaque and see what
falls over. It's about 40 patches on top of master in a few categories.

1. Serialization: bitcode <-> in-memory <-> textual IR[0].
2. Relaxing assertions in Instruction constructors and the Verifier so
that we don't assume every pointer has an element type.
3. Modifying passes and other components to get their element types from other
sources when needed. This is where I see the bulk of the future work in LLVM
itself.

All of them are very much from my dev machine and not prepared for real review.

To give an idea of the work ahead, on "opaque-ptr-always", running "ninja check"
there are about 4500 failures.

Many of these are of course CHECK lines looking for typed pointers that LLVM
will never again print; we'll need some kind of script to automate
converting as many
of those tests as possible.

Byeond that, there are still about 800 assertion failures, but looking at the
backtraces I think that there are "only" 75ish distinct callsites[1] that would
have to be fixed, plus whatever's revealed behind them.

Future Direction
================

I think this work needs to happen more incrementally. It's really not great that
I've built up a backlog of 40 patches that only I have access to and can work
on.

So at a high level I think we should put the serialization and Instruction
changes in sooner rather than later, giving us a largely undocumented[2] dialect
of IR with opaque pointers that we can write tests against to upstream the rest
of what I've done (and others can use to continue work in parallel if they're
inclined).

The risk is of course that this becomes yet another unfinished feature we drag
around for years, with a corresponding maintenance burden. And it's a real risk,
I unfortunately don't have the go ahead to work on this full time.

But I don't think the alternatives are much better. Even full time I don't think
I could do it completely alone because some choices will need input from
experts. Even if I could, it would finish with a patch bomb even bigger than
what I'm dropping here.

Proposal
========

Short term (because otherwise we can't do it for another six months):

1. Add inalloca(<ty>) support.
2. Document for January release the planned removal of:
* Old style byval
* Old style inalloca
* Typeless CreateCall, CreateLoad, CreateGEP.
3. Soon after January branch, strip out those bits. The third in
particular should prevent front-end regression, I had to fix a fair
few new deprecated callsites in Clang when rebasing everything this
week.

Short/medium term:

1. Commit serialization and Instruction changes.
2. Use that to add tests for patches I already have and upstream them.
3. Keep fixing the issues, but no-one not working on opaque pointers should need
to change their behaviour other than a general encouragement to not use
getElementType unless they have to.

Long term:

As we get close to everything working, we should shift the expectations so that
new uses of getElementPtr aren't allowed in LLVM.

Front-ends (including Clang) will need more work I suspect.

No doubt there will be performance issues where having a pointee type helps some
heuristic be a bit better. We'll have to decide what to do about those.

[0] See attached opaque.ll for some proposed IR.
[1] See attached asserts.txt if interested.
[2] Or perhaps more likely documented with "don't use it unless you're working
on opaque pointers" warnings.

asserts.txt

opaque.ll

David Blaikie via llvm-dev

unread,

Dec 18, 2019, 6:16:16 PM12/18/19

to Tim Northover, LLVM Developers Mailing List

My, admittedly rather vague, plan was to change the API down to the point where there was only a primitive for "propagating pointee type from one place to another" but without the ability to query it otherwise - well, with a deprecated way to query it that we could chase down calls to as the main migration. Once we got to zero "getElementType" callers we could figure out the actual IR migration piece.

Do you think that wouldn't be viable & that introducing the new opaque pointer type sooner would be better/more viable?

The risk is of course that this becomes yet another unfinished feature we drag
around for years, with a corresponding maintenance burden. And it's a real risk,
I unfortunately don't have the go ahead to work on this full time.

But I don't think the alternatives are much better. Even full time I don't think
I could do it completely alone because some choices will need input from
experts. Even if I could, it would finish with a patch bomb even bigger than
what I'm dropping here.

Proposal
========

Short term (because otherwise we can't do it for another six months):

1. Add inalloca(<ty>) support.

Is byval already fixed/changed? I may've lost track of some of these changes, but I knew that was next on my list. (& how was byval addressed - byval(<ty>) or byval(byte count)? I guess inalloca goes/should go the same way as byval)

2. Document for January release the planned removal of:
* Old style byval
* Old style inalloca
* Typeless CreateCall, CreateLoad, CreateGEP.

Sounds good to me.

3. Soon after January branch, strip out those bits. The third in
particular should prevent front-end regression, I had to fix a fair
few new deprecated callsites in Clang when rebasing everything this
week.

Short/medium term:

1. Commit serialization and Instruction changes.

2. Use that to add tests for patches I already have and upstream them.
3. Keep fixing the issues, but no-one not working on opaque pointers should need
to change their behaviour other than a general encouragement to not use
getElementType unless they have to.

Long term:

As we get close to everything working, we should shift the expectations so that
new uses of getElementPtr aren't allowed in LLVM.

Front-ends (including Clang) will need more work I suspect.

No doubt there will be performance issues where having a pointee type helps some
heuristic be a bit better. We'll have to decide what to do about those.

[0] See attached opaque.ll for some proposed IR.
[1] See attached asserts.txt if interested.
[2] Or perhaps more likely documented with "don't use it unless you're working
on opaque pointers" warnings.

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Nicolai Hähnle via llvm-dev

unread,

Dec 19, 2019, 3:53:55 AM12/19/19

to Tim Northover, LLVM Developers Mailing List

Hi Tim,

I very much like this. One note:

On Wed, Dec 18, 2019 at 4:16 PM Tim Northover via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Short term (because otherwise we can't do it for another six months):
> 1. Add inalloca(<ty>) support.
> 2. Document for January release the planned removal of:
> * Old style byval
> * Old style inalloca
> * Typeless CreateCall, CreateLoad, CreateGEP.
> 3. Soon after January branch, strip out those bits. The third in
> particular should prevent front-end regression, I had to fix a fair
> few new deprecated callsites in Clang when rebasing everything this
> week.

Can we please have a transition period where those Create* functions
are marked as LLVM_ATTRIBUTE_DEPRECATED? And same for the
corresponding C API functions. This would be tremendously helpful for
out-of-tree frontends that track master.

This transition period doesn't necessarily have to be long. As far as
I'm concerned it wouldn't even have to be a full release cycle, though
others might disagree. But keeping those methods around as explicitly
deprecated (in a way that causes compiler noise so that it doesn't get
missed!) for a month or two would be good.

> Short/medium term:
>
> 1. Commit serialization and Instruction changes.
> 2. Use that to add tests for patches I already have and upstream them.
> 3. Keep fixing the issues, but no-one not working on opaque pointers should need
> to change their behaviour other than a general encouragement to not use
> getElementType unless they have to.

I think "general encouragement" is too weak here. Getting rid of
getPointerElementType() seems to me the hardest piece of the puzzle
because it creates architectural issues in a lot of different places.
For example, I noticed that some of the sanitizers lean on being able
to use this function, and this cannot be fixed locally in just the
backend sanitizer implementation. Our own AMDGPU backend also has some
issues like that, and I'd expect that more are lurking somewhere.

Most getPointerElementType() uses seem to be fairly harmless and
fixable locally, but we need a stronger lever to get those
architectural issues fixed.

Cheers,
Nicolai

>
> Long term:
>
> As we get close to everything working, we should shift the expectations so that
> new uses of getElementPtr aren't allowed in LLVM.
>
> Front-ends (including Clang) will need more work I suspect.
>
> No doubt there will be performance issues where having a pointee type helps some
> heuristic be a bit better. We'll have to decide what to do about those.
>
> [0] See attached opaque.ll for some proposed IR.
> [1] See attached asserts.txt if interested.
> [2] Or perhaps more likely documented with "don't use it unless you're working
> on opaque pointers" warnings.

> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.

Kevin Neal via llvm-dev

unread,

Dec 19, 2019, 1:44:12 PM12/19/19

to llvm-dev

The IRBuilder's CreateConstrainedFPCall() also needs to be changed.

I got stuck when I found that Intrinsic::getDeclaration() threw away the info I needed, and changing it was going to be a rather large task due to the number of callers.

--
Kevin P. Neal
SAS/C and SAS/C++ Compiler
Host Research and Development
SAS Institute, Inc.

-----Original Message-----
From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Nicolai Hähnle via llvm-dev
Sent: Thursday, December 19, 2019 3:53 AM
To: Tim Northover <t.p.no...@gmail.com>
Cc: LLVM Developers Mailing List <llvm...@lists.llvm.org>
Subject: Re: [llvm-dev] RFC: Opaque pointer status and future direction

EXTERNAL

> https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.llvm.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fllvm-dev&data=02%7C01%7Ckevin.neal%40sas.com%7Cc42d435f4d694bedb98508d78460f6b0%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C637123424303727580&sdata=SFb%2BeNkdgDLxKf8cJ1P7PBnQMET2h%2BOZ1DSQ9cShXRs%3D&reserved=0

--
Lerne, wie die Welt wirklich ist,
aber vergiss niemals, wie sie sein sollte.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org

https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.llvm.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fllvm-dev&data=02%7C01%7Ckevin.neal%40sas.com%7Cc42d435f4d694bedb98508d78460f6b0%7Cb1c14d5c362545b3a4309552373a0c2f%7C0%7C0%7C637123424303727580&sdata=SFb%2BeNkdgDLxKf8cJ1P7PBnQMET2h%2BOZ1DSQ9cShXRs%3D&reserved=0

Tim Northover via llvm-dev

unread,

Dec 19, 2019, 2:15:52 PM12/19/19

to Kevin Neal, llvm-dev

Hi Kevin,

On Thu, 19 Dec 2019 at 18:44, Kevin Neal via llvm-dev
<llvm...@lists.llvm.org> wrote:
> The IRBuilder's CreateConstrainedFPCall() also needs to be changed.
>
> I got stuck when I found that Intrinsic::getDeclaration() threw away the info I needed, and changing it was going to be a rather large task due to the number of callers.

Could you elaborate on that? I did actually discover that function
needed modification the other day, but converting it to take a
"Function *" seemed to satisfy Clang (and make the implementation
simpler).

My current understanding is that Intrinsic::getDeclaration returns a
Function that *does* have the necessary info
(Function::getFunctionType, though CreateCall can be used directly).

Cheers.

Tim.

Tim Northover via llvm-dev

unread,

Dec 19, 2019, 2:16:57 PM12/19/19

to Nicolai Hähnle, LLVM Developers Mailing List

On Thu, 19 Dec 2019 at 08:53, Nicolai Hähnle <nhae...@gmail.com> wrote:
> Can we please have a transition period where those Create* functions
> are marked as LLVM_ATTRIBUTE_DEPRECATED? And same for the
> corresponding C API functions. This would be tremendously helpful for
> out-of-tree frontends that track master.

Yep, that sounds like something we ought to be able to get in by
January so it can be in for a release cycle. I'll add it to my list.

> I think "general encouragement" is too weak here. Getting rid of
> getPointerElementType() seems to me the hardest piece of the puzzle
> because it creates architectural issues in a lot of different places.
> For example, I noticed that some of the sanitizers lean on being able
> to use this function, and this cannot be fixed locally in just the
> backend sanitizer implementation. Our own AMDGPU backend also has some
> issues like that, and I'd expect that more are lurking somewhere.
>
> Most getPointerElementType() uses seem to be fairly harmless and
> fixable locally, but we need a stronger lever to get those
> architectural issues fixed.

Fair point. I'll try to come up with something a bit more strongly worded.

Cheers.

Tim.

Tim Northover via llvm-dev

unread,

Dec 19, 2019, 2:27:38 PM12/19/19

to David Blaikie, LLVM Developers Mailing List

>> So at a high level I think we should put the serialization and Instruction
>> changes in sooner rather than later, giving us a largely undocumented[2] dialect
>> of IR with opaque pointers that we can write tests against to upstream the rest
>> of what I've done (and others can use to continue work in parallel if they're
>> inclined).
>
> My, admittedly rather vague, plan was to change the API down to the point where there was only a primitive for "propagating pointee type from one place to another" but without the ability to query it otherwise - well, with a deprecated way to query it that we could chase down calls to as the main migration. Once we got to zero "getElementType" callers we could figure out the actual IR migration piece.
>
> Do you think that wouldn't be viable & that introducing the new opaque pointer type sooner would be better/more viable?

I think it'd be viable, but would turn many of the patches into NFC
with no way to test them. I'm quite happy to do that for strictly
obvious changes, but I get a bit more nervous when we get to larger
scale refactorings. I'd like to be able to do more than just hope
people don't regress to getElementPtr, ideally.

But I think that approach definitely pushes the IR changes from
short/medium term to medium term at the very least, and I'll pursue
the kind of patches you're suggesting for now. Perhaps revisit the
question when things start getting tricky.

> Is byval already fixed/changed? I may've lost track of some of these changes, but I knew that was next on my list. (& how was byval addressed - byval(<ty>) or byval(byte count)? I guess inalloca goes/should go the same way as byval)

byval(<ty>) is in, and used by Clang now. In retrospect I should have
done inalloca at the same time, but for some reason I decided it
wasn't necessary. I think it'll need the same conversion, and then
both must be made compulsory.

Cheers.

Tim.

David Blaikie via llvm-dev

unread,

Dec 23, 2019, 6:16:41 PM12/23/19

to Tim Northover, LLVM Developers Mailing List

On Thu, Dec 19, 2019 at 11:27 AM Tim Northover <t.p.no...@gmail.com> wrote:

>> So at a high level I think we should put the serialization and Instruction
>> changes in sooner rather than later, giving us a largely undocumented[2] dialect
>> of IR with opaque pointers that we can write tests against to upstream the rest
>> of what I've done (and others can use to continue work in parallel if they're
>> inclined).
>
> My, admittedly rather vague, plan was to change the API down to the point where there was only a primitive for "propagating pointee type from one place to another" but without the ability to query it otherwise - well, with a deprecated way to query it that we could chase down calls to as the main migration. Once we got to zero "getElementType" callers we could figure out the actual IR migration piece.
>
> Do you think that wouldn't be viable & that introducing the new opaque pointer type sooner would be better/more viable?

I think it'd be viable, but would turn many of the patches into NFC
with no way to test them.

Hmm, I get where you're coming from & would hope they'd sort of be NFC - but yeah, if we made these changes with an opaque pointer type available, we could write tests demonstrating the support for opaque pointers by writing test cases with mismatched pointers... except they wouldn't be mismatched anyway because there'd be no type. So I guess at best we could demonstrate a new optimization available with the opaque pointers that couldn't be expressed with typed pointers.

I'm quite happy to do that for strictly
obvious changes, but I get a bit more nervous when we get to larger
scale refactorings. I'd like to be able to do more than just hope
people don't regress to getElementPtr, ideally.

Please go ahead with whatever way you think works best - I can totally see it both/either way.

Nicolai Hähnle via llvm-dev

unread,

Feb 15, 2020, 3:55:26 PM2/15/20

to David Blaikie, LLVM Developers Mailing List

Re-upping this thread from almost two months ago...

As I expected, one new use of a deprecated IRBuilder::CreateCall
method has cropped up, and probably more of the CreateLoad variety. To
put an end to this whack-a-mole, I suggested back then to mark those
methods deprecated for real with an attribute.
https://reviews.llvm.org/D74676 (+ parent revisions on the stack) does
this for CreateCall, which is the least painful one to address.

Cheers,
Nicolai

> _______________________________________________
> LLVM Developers mailing list
> llvm...@lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--

Lerne, wie die Welt wirklich ist,

aber vergiss niemals, wie sie sein sollte.

Reply all

Reply to author

Forward