For background on opaque pointer types, see [1] and many other patches/threads searchable with "opaque pointers".
While there's been a lot of work on making opaque pointers work, we don't actually have such a type in LLVM yet. https://reviews.llvm.org/D101704 introduces the opaque pointer type in LLVM so we can start playing around with it and see what goes wrong. Much of the patch is based on TNorthover's branch from a couple of years ago [2].
The opaque pointer type is essentially just a PointerType with a null pointee type. Calling getElementType() on an opaque pointer asserts.
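As a rough illustration of "a PointerType with a null pointee type", here is a self-contained C++ toy model. `ToyType` and `ToyPointerType` are hypothetical stand-ins, not LLVM's actual classes; the sketch only assumes what the paragraph above states: opaqueness is encoded as a missing pointee, and asking an opaque pointer for its element type asserts.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a pointee type (in LLVM this would be a Type*).
struct ToyType {
  std::string name;
};

// Hypothetical model of the proposal: a single pointer class covers both
// kinds, and an opaque pointer is simply one whose pointee is null.
class ToyPointerType {
public:
  ToyPointerType(const ToyType *pointee, unsigned addrSpace)
      : pointee_(pointee), addrSpace_(addrSpace) {}

  bool isOpaque() const { return pointee_ == nullptr; }
  unsigned getAddressSpace() const { return addrSpace_; }

  // Mirrors the described behaviour: asking an opaque pointer for its
  // element type is a programming error and trips an assertion.
  const ToyType *getElementType() const {
    assert(!isOpaque() && "opaque pointers have no element type");
    return pointee_;
  }

private:
  const ToyType *pointee_; // nullptr for opaque pointers
  unsigned addrSpace_;     // every pointer still carries an address space
};
```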
Since the bitcode representation for non-opaque pointers contains the pointee type, we need a new bitcode type code for opaque pointers, which only contains the address space.
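A similarly hedged sketch of why the bitcode needs a separate type code: a typed-pointer record has to name its pointee, while an opaque-pointer record has nothing to store except the address space. The record layout and the numeric codes below are placeholders chosen for illustration, not LLVM's actual bitcode constants.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder record codes for illustration only; the real values live in
// LLVM's bitcode headers and are not reproduced here.
enum ToyTypeCode : uint64_t {
  TOY_TYPE_CODE_POINTER = 1,        // operands: [pointee type id, address space]
  TOY_TYPE_CODE_OPAQUE_POINTER = 2, // operands: [address space]
};

struct ToyRecord {
  uint64_t code;
  std::vector<uint64_t> operands;
};

// A typed pointer must reference its pointee through a type-table index.
ToyRecord encodeTypedPointer(uint64_t pointeeTypeId, uint64_t addrSpace) {
  return {TOY_TYPE_CODE_POINTER, {pointeeTypeId, addrSpace}};
}

// An opaque pointer has no pointee, so its record holds only the address space.
ToyRecord encodeOpaquePointer(uint64_t addrSpace) {
  return {TOY_TYPE_CODE_OPAQUE_POINTER, {addrSpace}};
}
```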
For the textual IR representation, the current proposal is to spell an opaque pointer type as "ptr", optionally followed by "addrspace(N)". This seems consistent with existing uses of "addrspace(N)", and "ptr" seems like the right name.
There are a couple of alternatives. TNorthover's version uses "pN", where "N" is the address space, so most pointers would be "p0" and a pointer in address space 5 would be "p5". I initially tried something like "ptr(N)", but that spelling is slightly ambiguous with function types. We could also simply use a void pointer, which LLVM currently does not allow [3].
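To make the competing spellings concrete, the following toy parser (a hypothetical helper, not LLVM's LLParser) accepts "ptr", "ptr addrspace(N)" and "pN" and returns the address space each one denotes. It assumes, as the proposal does, that bare "ptr" means address space 0, it deliberately rejects "ptr(N)", and it does no overflow checking.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Parses a decimal number from s[pos, end); nullopt if empty or non-numeric.
// No overflow checking: this is a sketch, not a production parser.
static std::optional<unsigned> parseNumber(const std::string &s,
                                           std::size_t pos, std::size_t end) {
  if (pos >= end) return std::nullopt;
  unsigned value = 0;
  for (std::size_t i = pos; i < end; ++i) {
    if (!std::isdigit(static_cast<unsigned char>(s[i]))) return std::nullopt;
    value = value * 10 + static_cast<unsigned>(s[i] - '0');
  }
  return value;
}

// Returns the address space denoted by an opaque pointer type spelling:
//   "ptr"              -> 0 (the proposal's implicit default)
//   "ptr addrspace(N)" -> N
//   "pN"               -> N (TNorthover's spelling; no implicit default)
std::optional<unsigned> parseOpaquePtrType(const std::string &s) {
  if (s == "ptr") return 0u;
  const std::string prefix = "ptr addrspace(";
  if (s.size() > prefix.size() && s.compare(0, prefix.size(), prefix) == 0 &&
      s.back() == ')')
    return parseNumber(s, prefix.size(), s.size() - 1);
  if (s.size() > 1 && s[0] == 'p')
    return parseNumber(s, 1, s.size()); // "ptr(5)" fails here on 't'
  return std::nullopt;
}
```

With the "pN" form, address space 0 is always written out, which is the property Nicolai argues for later in the thread.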
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
Hi Arthur,
On Tue, May 4, 2021 at 2:39 AM Arthur Eubanks via llvm-dev <llvm...@lists.llvm.org> wrote:
Thank you for doing this, and the approach seems largely good to me, except for one important point: we've been moving steadily towards making address space 0 non-special for a long time now, so I *strongly* prefer a spelling that always includes an address space. I don't care too much about the exact spelling; pN and ptr(N) both seem fine to me, assuming the technical issues can be sorted out. pN has the benefit of already being used in codegen contexts, so count that as a *mild* preference for that spelling.
There are many other places in the textual IR where we use the "addrspace(N)" syntax, and AFAIK they all default to 0 right now. So my first inclination would be to agree with Arthur that it's a shame to have this syntax diverge from that. But do you have plans to change the behavior of those other contexts in the future?
I think requiring an address space would be too confusing for most use
cases. Would it help if, instead of defaulting to 0, the default address
space were target-dependent?
- Tom
> Cheers,
> Nicolai
>
> Feel free to bikeshed.
>
> [1]: https://lists.llvm.org/pipermail/llvm-dev/2015-February/081822.html
> [2]: https://lists.llvm.org/pipermail/llvm-dev/2019-December/137684.html
> [3]: https://llvm.org/docs/LangRef.html#pointer-type
For CHERI targets, the default address space is ABI-dependent: AS0 is a
64-bit integer that's relative to the default data capability, while AS200
is a 128-bit capability (on 64-bit platforms). It can also differ between
code, heap, and stack.
If this is purely a syntactic thing in the text serialisation, would it
be possible to put something in the DataLayout that is ignored by
everything except the pretty-printer / parser?
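A minimal sketch of that idea, assuming a hypothetical printer-only setting (`ToyPrinterConfig::defaultAddrSpace`, not an existing DataLayout field): the IR still records every pointer's address space, and the setting only decides when the printer may drop the explicit "addrspace(N)". The AS200 default used below is an illustrative CHERI-like choice, not a claim about any real CHERI DataLayout.

```cpp
#include <cassert>
#include <string>

// Hypothetical printer-only setting: it changes how pointers are spelled in
// textual IR, not what they mean.
struct ToyPrinterConfig {
  unsigned defaultAddrSpace = 0;
};

// Prints bare "ptr" when the address space matches the configured default and
// spells it out otherwise. A matching parser would have to apply the same
// default for the text to round-trip.
std::string printOpaquePtr(unsigned addrSpace, const ToyPrinterConfig &cfg) {
  if (addrSpace == cfg.defaultAddrSpace) return "ptr";
  return "ptr addrspace(" + std::to_string(addrSpace) + ")";
}
```

With a default of 200, AS200 prints as "ptr" while AS0 is spelled out as "ptr addrspace(0)".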
David
On 04/05/2021 19:32, Tom Stellard via llvm-dev wrote:
> I think requiring an address space would be too confusing for a majority
> of use cases. Would it help if instead of defaulting to 0, the default
> address space was target dependent?
+1 to this: pointers already carry their address space with explicit
syntax, and I think it's OK to keep doing that for this transition. That
said, I wouldn't be opposed to a future change that rolls it into the
pointer type name, if that seems suitable.
- Dave
I am very much a beginner with opaque pointers, but I am also a minimalist, in the sense that entities shouldn't be multiplied but rather divided where applicable. Can someone point me to article(s) describing what problems opaque pointers solve that can't be solved with forward declarations, typed pointers, etc.? My first gut feeling, on learning about the idea of opaque pointers, was that they're not much more than void*,
with all its issues from the static analysis, compiler design, code readability, code quality, and code security perspectives. Can someone correct a newbie? I'm very open to changing my mind.
An example of what?
> Also, perhaps we should separate the opaque pointer types transition
> from any changes to address spaces. Currently the proposal is basically
> unchanged from the current status quo in terms of pointer address
> spaces. We definitely should have a "default" pointer type in some shape
> or form which is represented by "ptr", or else writing IR tests is too
> cumbersome. Currently that means AS0, but we can change that in the
> future if we want independently of opaque pointers.
I agree that doing this incrementally is probably the right thing, but I
disagree on the tests side. With a p{address space} notation, p0 is less
to type than ptr, so tests that want AS0 take less effort to write, and
tests that want another address space save even more: compare p42 with
`ptr addrspace(42)`.
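The character counts behind that argument can be checked with two small hypothetical helpers that print each spelling. They assume the proposal's rule that bare "ptr" means address space 0.

```cpp
#include <cassert>
#include <string>

// Proposal spelling: bare "ptr" for AS0, explicit addrspace(N) otherwise.
std::string ptrSpelling(unsigned as) {
  return as == 0 ? std::string("ptr")
                 : "ptr addrspace(" + std::to_string(as) + ")";
}

// Alternative spelling: the address space is always part of the name.
std::string pNSpelling(unsigned as) { return "p" + std::to_string(as); }
```

For AS0 that is 2 characters ("p0") against 3 ("ptr"); for AS42 it is 3 ("p42") against 17 ("ptr addrspace(42)").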
There are a few problems with the current representation, and they
largely mirror the old problem with signed vs unsigned integers in the
IR 15 years ago. In early versions of LLVM, integer types were explicitly
signed or unsigned. This meant that the IR was cluttered with bitcasts
between signed and unsigned integers, which slowed down analysis and
didn't convey any useful semantics. Worse, several distinct properties
were conflated: for example, does unsigned imply wrapping? Some time in
the 2.x series (2.0? My memory is fuzzy here), LLVM moved to plain i{size}
types for integers and moved all of the semantics to the operations. It's
now explicit whether an operation is signed or unsigned, whether overflow
wraps or has undefined behaviour, and so on.
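The "semantics live on the operations" point can be seen in a small C++ sketch: one 32-bit bit pattern and two division operations, modelled loosely on LLVM's `udiv` and `sdiv` instructions. The function names are hypothetical helpers; the signed version assumes two's-complement conversion (guaranteed since C++20).

```cpp
#include <cassert>
#include <cstdint>

// The value itself is just 32 bits (like i32); each operation decides how
// those bits are interpreted.
uint32_t udiv32(uint32_t a, uint32_t b) { return a / b; }

uint32_t sdiv32(uint32_t a, uint32_t b) {
  // Reinterpret the same bits as two's-complement for the signed operation.
  // As with sdiv, INT32_MIN / -1 and division by zero are undefined here.
  return static_cast<uint32_t>(static_cast<int32_t>(a) /
                               static_cast<int32_t>(b));
}
```

The bit pattern 0xFFFFFFFF divided by 2 gives 0x7FFFFFFF under the unsigned operation and 0 under the signed one (-1 / 2 truncates to 0), with no type-level cast in between.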
Pointers have a similar set of problems. Pointers carry a type, but
that type doesn't actually carry any semantics. There are a lot of
things that don't care about the type of the pointer, but they have no
way of specifying this and generally use i8*. This means that the IR is
full of bitcasts from {something}* to i8* and then back again.
This is particularly important for code that wants to use non-zero
address spaces, because a lot of code does casts via i8* and forgets to
change this to i8*-in-another-address-space.
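A toy model of both problems, using hypothetical types and helpers rather than LLVM's IRBuilder API: a typed pointer passed to an "i8*"-style helper needs a cast on the way in and another on the way out, an opaque pointer needs none, and a helper that builds its generic pointer type with the default address space silently picks the wrong one. In real LLVM IR, changing address space requires `addrspacecast` rather than `bitcast`; the model only tracks whether two pointer types differ.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Toy pointer type: pointee name for typed pointers (nullopt when opaque),
// plus the address space.
struct ToyPtr {
  std::optional<std::string> pointee;
  unsigned addrSpace;
  bool operator==(const ToyPtr &o) const {
    return pointee == o.pointee && addrSpace == o.addrSpace;
  }
};

// Toy rule: passing a pointer where a different pointer type is expected
// requires a cast instruction.
bool needsCast(const ToyPtr &from, const ToyPtr &to) { return !(from == to); }

// Casts needed for the {something}* -> i8* -> {something}* round trip
// through a helper whose parameter type is `helperParam`.
int roundTripCasts(const ToyPtr &arg, const ToyPtr &helperParam) {
  return needsCast(arg, helperParam) ? 2 : 0;
}

// The easy-to-write bug: building the generic "i8*" in the default address
// space instead of the argument's address space.
ToyPtr buggyGenericPtr(const ToyPtr &) { return {std::string("i8"), 0}; }
ToyPtr correctGenericPtr(const ToyPtr &arg) {
  return {std::string("i8"), arg.addrSpace};
}
```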
The fact that a pointer is a pointer to some struct type currently
doesn't imply anything about the type of the pointed-to data, and it's
completely valid for an optimisation to bitcast a pointer to a random
type and back again. The real type info (where applicable) is carried by
TBAA metadata, dereferenceability info by attributes, and so on.
TL;DR: The pointee type has no (or worse, misleading) semantics and
forces a load of bitcasts. Opaque pointers remove this.
David