[llvm-dev] [RFC] Introducing the opaque pointer type

Arthur Eubanks via llvm-dev

May 3, 2021, 8:38:58 PM
to llvm-dev
For background on opaque pointer types, see [1] and many other patches/threads searchable with "opaque pointers".

While there's been lots of work toward supporting opaque pointers, we don't actually have a type like that in LLVM yet. https://reviews.llvm.org/D101704 introduces the opaque pointer type within LLVM so we can start playing around with it and see what goes wrong. Much of the patch above is based on TNorthover's branch from a couple of years ago [2].

The opaque pointer type is essentially just a PointerType with a null pointee type. Calling getElementType() on an opaque pointer asserts.
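A toy Python model of the idea above: a pointer type whose pointee is None counts as opaque, and asking it for an element type trips an assertion. This is not LLVM's C++ API; `PointerType` and `get_element_type` here are hypothetical stand-ins that only mirror the names used in the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PointerType:
    """Toy stand-in for LLVM's PointerType (hypothetical, not the real class)."""
    address_space: int = 0
    # None models an opaque pointer: no pointee type is recorded.
    element_type: Optional[str] = None

    def is_opaque(self) -> bool:
        return self.element_type is None

    def get_element_type(self) -> str:
        # Mirrors the described behaviour: asking an opaque pointer for
        # its pointee type is a programming error, so it asserts.
        assert not self.is_opaque(), "getElementType() called on an opaque pointer"
        return self.element_type


typed = PointerType(address_space=0, element_type="i32")
opaque = PointerType(address_space=0)

print(typed.get_element_type())  # i32
print(opaque.is_opaque())        # True
```

Because the toy type is a frozen dataclass, two opaque pointers in the same address space compare equal; in this sketch the address space is the only thing left to tell them apart.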

Since the bitcode representation for non-opaque pointers contains the pointee type, we need a new bitcode type code for opaque pointers, which only contains the address space.
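A rough Python sketch of the bitcode point: a typed pointer's type-table record has to reference its pointee's entry, while an opaque pointer's record needs only the address space. The record names, the operand order, and the helper `encode_pointer_type` are assumptions made for illustration; the real record codes are defined in LLVM's bitcode headers and are not reproduced here.

```python
from typing import List, Optional

# Hypothetical record kinds for this sketch only.
TYPED_POINTER_CODE = "POINTER"
OPAQUE_POINTER_CODE = "OPAQUE_POINTER"


def encode_pointer_type(address_space: int, pointee_type_id: Optional[int]) -> List:
    """Build a toy type-table record for a pointer type.

    A typed pointer must reference its pointee's entry in the type table,
    so its record has two operands. An opaque pointer has no pointee, so a
    separate record kind holds just the address space.
    """
    if pointee_type_id is None:
        return [OPAQUE_POINTER_CODE, address_space]
    return [TYPED_POINTER_CODE, pointee_type_id, address_space]


print(encode_pointer_type(0, 3))     # ['POINTER', 3, 0]
print(encode_pointer_type(5, None))  # ['OPAQUE_POINTER', 5]
```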

For the textual IR representation, the current proposal is to spell an opaque pointer type as "ptr", optionally followed by "addrspace(N)". This seems consistent with existing uses of "addrspace(N)", and "ptr" seems like the right name.
There are a couple of alternatives. TNorthover's version uses "pN", where "N" is the address space, so most pointers would be "p0" and a pointer in address space 5 would be "p5". I initially attempted something like "ptr(N)", but that spelling is slightly ambiguous with function types. We could also simply use a void pointer, which LLVM currently does not allow [3].
Feel free to bikeshed.
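To compare the candidate spellings, here is a hypothetical toy parser (not LLVM's LLParser) that maps each one to an address space. Following the proposal, a bare `ptr` defaults to address space 0, whereas `pN` always states it. `ptr(N)` is left out, presumably because `ptr (` already begins a function type returning `ptr`, which is the kind of ambiguity mentioned above.

```python
import re

# Toy parser for the spellings discussed in this thread (a sketch only).
_PTR_RE = re.compile(r"ptr(?:\s+addrspace\((\d+)\))?")
_PN_RE = re.compile(r"p(\d+)")


def parse_pointer_type(text: str) -> int:
    """Return the address space named by an opaque pointer type spelling."""
    text = text.strip()
    m = _PTR_RE.fullmatch(text)
    if m:
        # A bare "ptr" means the default address space (0 in this proposal).
        return int(m.group(1)) if m.group(1) is not None else 0
    m = _PN_RE.fullmatch(text)
    if m:
        # "pN" always spells the address space explicitly; there is no default.
        return int(m.group(1))
    raise ValueError(f"not an opaque pointer type: {text!r}")


for spelling in ["ptr", "ptr addrspace(5)", "p0", "p5"]:
    print(spelling, "->", parse_pointer_type(spelling))
```

The `pN` branch has no defaulting path at all, which is the property argued for later in this thread; the `ptr` branch keeps today's behaviour of an implicit address space 0.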

[1]: https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html
[2]: https://lists.llvm.org/pipermail/llvm-dev/2019-December/137684.html
[3]: https://llvm.org/docs/LangRef.html#pointer-type

Nicolai Hähnle via llvm-dev

May 4, 2021, 2:02:12 AM
to Arthur Eubanks, llvm-dev
Hi Arthur,

On Tue, May 4, 2021 at 2:39 AM Arthur Eubanks via llvm-dev <llvm...@lists.llvm.org> wrote:
> For the textual IR representation, the current proposal is to spell an opaque pointer type as "ptr", optionally followed by "addrspace(N)". This seems consistent with existing uses of "addrspace(N)", and "ptr" seems like the right name.
> There are a couple of alternatives. TNorthover's version uses "pN", where "N" is the address space, so most pointers would be "p0" and a pointer in address space 5 would be "p5". I initially attempted something like "ptr(N)", but that spelling is slightly ambiguous with function types. We could also simply use a void pointer, which LLVM currently does not allow [3].

Thank you for doing this, and the approach seems largely good to me, except for one important point: We've been moving steadily towards making addrspace 0 be non-special for a long time now, so I *strongly* prefer a spelling that always has an address space. I don't care too much about the exact spelling, pN and ptr(N) both seem fine to me assuming technical issues can be sorted out. pN has the benefit of already being used in codegen contexts, so count that as a *mild* preference for that spelling.

Cheers,
Nicolai
 
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev


--
Learn what the world is really like,
but never forget how it ought to be.

James Y Knight via llvm-dev

May 4, 2021, 10:10:12 AM
to Nicolai Hähnle, llvm-dev
On Tue, May 4, 2021 at 2:02 AM Nicolai Hähnle via llvm-dev <llvm...@lists.llvm.org> wrote:
> Thank you for doing this, and the approach seems largely good to me, except for one important point: We've been moving steadily towards making addrspace 0 be non-special for a long time now, so I *strongly* prefer a spelling that always has an address space. I don't care too much about the exact spelling, pN and ptr(N) both seem fine to me assuming technical issues can be sorted out. pN has the benefit of already being used in codegen contexts, so count that as a *mild* preference for that spelling.

There are many other places in the textual IR where we use the "addrspace(N)" syntax -- and AFAIK they all default to 0 right now. So my first inclination would be to agree with Arthur that it's a shame to have this syntax diverge from that. But -- do you have plans to change the behavior of those other contexts in the future?

Arthur Eubanks via llvm-dev

May 4, 2021, 2:20:16 PM
to James Y Knight, llvm-dev
Somebody pointed out to me that there's very little actual documentation on opaque pointer types. I'll try to write up some documentation so that the motivation and tradeoffs can be better discussed.

 
> There are many other places in the textual IR where we use the "addrspace(N)" syntax -- and AFAIK they all default to 0 right now. So my first inclination would be to agree with Arthur that it's a shame to have this syntax diverge from that. But -- do you have plans to change the behavior of those other contexts in the future?
+1 from somebody not super familiar with address spaces. 

Tom Stellard via llvm-dev

May 4, 2021, 2:32:52 PM
to Nicolai Hähnle, Arthur Eubanks, llvm-dev
On 5/3/21 11:01 PM, Nicolai Hähnle via llvm-dev wrote:
> Thank you for doing this, and the approach seems largely good to me, except for one important point: We've been moving steadily towards making addrspace 0 be non-special for a long time now, so I *strongly* prefer a spelling that always has an address space. I don't care too much about the exact spelling, pN and ptr(N) both seem fine to me assuming technical issues can be sorted out. pN has the benefit of already being used in codegen contexts, so count that as a *mild* preference for that spelling.
>

I think requiring an address space would be too confusing for a majority of use
cases. Would it help if instead of defaulting to 0, the default address space
was target dependent?

- Tom


David Chisnall via llvm-dev

May 7, 2021, 11:40:36 AM
to llvm...@lists.llvm.org
On 04/05/2021 19:32, Tom Stellard via llvm-dev wrote:
> I think requiring an address space would be too confusing for a majority
> of use cases. Would it help if instead of defaulting to 0, the default
> address space was target dependent?

For CHERI targets, the default address space is ABI dependent: AS0 is a
64-bit integer that's relative to the default data capability, AS200 is
a 128-bit capability (on 64-bit platforms). It can also differ between
code, heap, and stack.

If this is purely a syntactic thing in the text serialisation, would it
be possible to put something in the DataLayout that is ignored by
everything except the pretty-printer / parser?

David

Arthur Eubanks via llvm-dev

May 7, 2021, 2:20:41 PM
to David Chisnall, llvm-dev
On Fri, May 7, 2021 at 8:40 AM David Chisnall via llvm-dev <llvm...@lists.llvm.org> wrote:
> If this is purely a syntactic thing in the text serialisation, would it
> be possible to put something in the DataLayout that is ignored by
> everything except the pretty-printer / parser?
Could you give an example?


Also, perhaps we should separate the opaque pointer types transition from any changes to address spaces. Currently the proposal leaves pointer address spaces unchanged from the status quo. We definitely should have a "default" pointer type in some shape or form which is represented by "ptr", or else writing IR tests is too cumbersome. Currently that means AS0, but we can change that in the future if we want, independently of opaque pointers.

David Blaikie via llvm-dev

May 7, 2021, 2:27:36 PM
to Arthur Eubanks, llvm-dev
On Fri, May 7, 2021 at 11:20 AM Arthur Eubanks via llvm-dev

+1 to this - pointers already carry their address space with explicit
syntax and I think it's OK to do that for this transition. Though I
wouldn't be opposed to a change in the future to roll it into the
pointer type name if that seems suitable.

- Dave

Arthur Eubanks via llvm-dev

May 10, 2021, 6:29:00 PM
to David Blaikie, Nicolai Hähnle, llvm-dev
If there's a larger effort around address spaces, then I'd be happy to change the representation, since mass-updating tests once is better than twice. But I'm worried that this may start becoming intertwined with more address space work, and the opaque pointers project has gone on long enough (like many other LLVM projects).

And of course, there's always time before we do mass test updates to easily change the textual representation.

Duncan P. N. Exon Smith via llvm-dev

May 10, 2021, 8:35:30 PM
to Arthur Eubanks, LLVM Dev
I agree. I think it would be a mistake to add an unnecessary difference vs. typed pointers along some other axis (address space, or otherwise). Opaque pointers have enough of their own challenges to solve.


pawel k. via llvm-dev

May 11, 2021, 2:59:51 AM
to Duncan P. N. Exon Smith, LLVM Dev
I am very much a beginner with opaque pointers, but I am also a minimalist, in the sense that entities shouldn't be multiplied but rather divided where applicable.

Can someone point me to article(s) describing what problems opaque pointers solve that can't be solved with forward declarations, typed pointers, etc.?

My first gut feeling, when learning about the idea of opaque pointers, was that they're not much more than void*, with all its issues from a static analysis, compiler design, code readability, code quality, and code security perspective. Can someone correct a newbie? I'm very open to changing my mind.

-Pawel

David Blaikie via llvm-dev

May 11, 2021, 3:20:59 AM
to pawel...@gmail.com, LLVM Dev
On Mon, May 10, 2021 at 11:59 PM pawel k. via llvm-dev <llvm...@lists.llvm.org> wrote:
> My first gut feeling, when learning about the idea of opaque pointers, was that they're not much more than void*

Yep, that's basically what they are. Though this is only relative to the IR design, not source language design.
 
> with all its issues from a static analysis, compiler design, code readability, code quality, and code security perspective. Can someone correct a newbie? I'm very open to changing my mind.

LLVM doesn't provide any guarantees based on pointer types. This is unlike, say, C++, which has type-based aliasing rules: in C++, if you have an int* and a float*, the compiler may assume they don't point to the same object. That property isn't true in LLVM IR. (This information can be carried separately in type-based alias analysis metadata, but it's not inherent in LLVM IR pointers themselves.) So the pointee type provides limited value (it's somewhat useful for frontends generating IR, since they can carry some intended type information around in the IR while it's being constructed) and it inhibits optimizations: converting between pointer types involves instructions (GEPs or bitcasts), and optimizations have to know to skip over or look through those instructions.

So instead, we're moving to a model where pointers don't have a type (since it's not informative to optimizations anyway) - and operations carry type information (instead of "load from this int pointer" it'll be "load an integer from this opaque pointer").

If you look at the LLVM IR today, you'll see these explicit types on operations (eg: the load instruction has an explicit type parameter to it, which currently looks redundant with the type of the pointer parameter that's passed to the load instruction - but in the future that pointer parameter won't carry any pointee type information and the load will rely entirely on the explicit type parameter it has).
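A Python sketch of the difference described above, emitting the textual IR for loading an i32 through a pointer %p that was produced as an i8*. The typed lines use existing LLVM instruction syntax, and the opaque line uses the `ptr` spelling proposed in this thread; the helper `load_i32_from_byte_pointer` is hypothetical and only builds strings.

```python
from typing import List


def load_i32_from_byte_pointer(opaque: bool) -> List[str]:
    """Emit textual IR that loads an i32 through %p (hypothetical helper)."""
    if opaque:
        # The pointer has no pointee type; the load names the type itself.
        return ["%v = load i32, ptr %p"]
    # With typed pointers, %p must first be recast from i8* to i32*.
    return [
        "%c = bitcast i8* %p to i32*",
        "%v = load i32, i32* %c",
    ]


for mode in (False, True):
    print("opaque:" if mode else "typed:")
    for line in load_i32_from_byte_pointer(mode):
        print("  " + line)
```

With typed pointers the bitcast is an extra instruction that optimizations must look through; with opaque pointers the type lives only on the load, and the cast disappears.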

- Dave 

pawel k. via llvm-dev

May 11, 2021, 4:10:07 AM
to David Blaikie, LLVM Dev
OK, cool. If that makes LLVM better, it's cool with me. Just don't spread it to the language spec. One void* issue in a programming language spec is more than enough trouble from the perspective of a dude working on static analysis and the other topics I mentioned.

-Pawel

David Chisnall via llvm-dev

May 11, 2021, 4:56:25 AM
to Arthur Eubanks, llvm-dev
On 07/05/2021 19:20, Arthur Eubanks wrote:
>
>
> On Fri, May 7, 2021 at 8:40 AM David Chisnall via llvm-dev
> <llvm...@lists.llvm.org <mailto:llvm...@lists.llvm.org>> wrote:
> If this is purely a syntactic thing in the text serialisation, would it
> be possible to put something in the DataLayout that is ignored by
> everything except the pretty-printer / parser?
>
> Could you give an example?

An example of what?

> Also, perhaps we should separate the opaque pointer types transition
> from any changes to address spaces. Currently the proposal is basically
> unchanged from the current status quo in terms of pointer address
> spaces. We definitely should have a "default" pointer type in some shape
> or form which is represented by "ptr", or else writing IR tests is too
> cumbersome. Currently that means AS0, but we can change that in the
> future if we want independently of opaque pointers.

I agree that doing this incrementally is probably the right thing, but I
disagree on the tests side. If we used a p{address space} notation, then
writing p0 is less to type than ptr, so writing tests that want AS0 is
less effort, and writing tests that want another address space saves even
more: p42 versus `ptr addrspace(42)`.
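A quick character count behind the tests argument above, using 42 as a stand-in for any non-zero address space (the number itself is arbitrary):

```python
# Characters needed for each spelling: the p{address space} form versus
# the proposed "ptr" / "ptr addrspace(N)" form.
spellings = {
    "AS0": ("p0", "ptr"),
    "AS42": ("p42", "ptr addrspace(42)"),
}

for space, (p_form, ptr_form) in spellings.items():
    print(f"{space}: {p_form!r} = {len(p_form)} chars, "
          f"{ptr_form!r} = {len(ptr_form)} chars")
```

The saving is one character for address space 0 but grows to fourteen for a non-zero address space, because `ptr addrspace(N)` adds a keyword and parentheses on top of the number.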

David Chisnall via llvm-dev

May 11, 2021, 5:19:45 AM
to llvm...@lists.llvm.org

There are a few problems with the current representation, and they
largely mirror the old problem with signed vs unsigned integers in the
IR 15 years ago. In early versions of LLVM, integer types were explicitly
signed or unsigned. This meant that the IR was cluttered with bitcasts
from signed to unsigned integers, which slowed down analysis and didn't
convey any useful semantics. Worse, a bunch of things were conflated: for
example, does unsigned imply wrapping? Some time in the 2.x series (2.0?
My memory is fuzzy here), LLVM moved to just i{size} types for integers
and moved all of the semantics to the operations. It's now explicit
whether an operation is signed or unsigned, whether overflow wraps or
has undefined behaviour, and so on.
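A small Python model of the integer point above: the value is just 32 bits (like an `i32`), and it is the operation, modelled loosely on LLVM's `udiv` and `sdiv`, that decides how those bits are read. The helper names are hypothetical; this is a sketch, not LLVM's implementation.

```python
MASK32 = 0xFFFFFFFF


def as_signed(bits: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def udiv(a: int, b: int) -> int:
    # Unsigned division: both operands are read as unsigned 32-bit values.
    return ((a & MASK32) // (b & MASK32)) & MASK32


def sdiv(a: int, b: int) -> int:
    # Signed division truncating toward zero, as LLVM's sdiv does
    # (division by zero and INT_MIN / -1 are not handled in this sketch).
    x, y = as_signed(a), as_signed(b)
    q = abs(x) // abs(y)
    if (x < 0) != (y < 0):
        q = -q
    return q & MASK32


bits = 0xFFFFFFFF  # one i32 value; its type says nothing about its sign
print(hex(udiv(bits, 2)))  # 0x7fffffff
print(hex(sdiv(bits, 2)))  # 0x0  (-1 / 2 truncates to 0)
```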

Pointers have a similar set of problems. Pointers carry a type, but
that type doesn't actually carry any semantics. There are a lot of
things that don't care about the type of the pointer, but they have no
way of specifying this and generally use i8*. This means that the IR is
full of bitcasts from {something}* to i8* and then back again.

This is particularly important for code that wants to use non-zero
address spaces, because a lot of code does casts via i8* and forgets to
change this to i8*-in-another-address-space.

The fact that a pointer is a pointer to some struct type currently
doesn't imply anything about the pointed-to data, and it's completely
valid for an optimisation to bitcast a pointer to a random type and back
again. The real type info (where applicable) is carried by TBAA metadata,
dereferenceability info by attributes, and so on.

TL;DR: The pointee type has no (or worse, misleading) semantics and
forces a load of bitcasts. Opaque pointers remove this.

David

pawel k. via llvm-dev

May 11, 2021, 9:23:57 AM
to David Chisnall, llvm-dev
OK, cool. I'm starting to understand now. Thank you.

-Pawel

Arthur Eubanks via llvm-dev

May 11, 2021, 8:28:18 PM
to llvm-dev
A quick doc on opaque pointers: https://reviews.llvm.org/D102292