[LLVMdev] make DataLayout a mandatory part of Module

Nick Lewycky

unread,

Jan 29, 2014, 6:40:33 PM1/29/14

to LLVM Developers Mailing List

The LLVM Module has an optional target triple and target datalayout. Without them, an llvm::DataLayout can't be constructed with meaningful data. The benefit to making them optional is to permit optimization that would work across all possible DataLayouts, then allow us to commit to a particular one at a later point in time, thereby performing more optimization in advance.

This feature is not being used. Instead, every user of LLVM IR in a portability system defines one or more standardized datalayouts for their platform, and shims to place calls with the outside world. The primary reason for this is that independence from DataLayout is not sufficient to achieve portability because it doesn't also represent ABI lowering constraints. If you have a system that attempts to use LLVM IR in a portable fashion and does it without standardizing on a datalayout, please share your experience.

The cost to keeping this feature around is that we have to pass around the DataLayout object in many places, test for its presence, in some cases write different optimizations depending on whether we have DataLayout, and in the worst case I can think of, we have two different canonical forms for constant expressions depending on whether DL is present. Our canonical IR is different with and without datalayout, and we have two canonicalizers fighting it out (IR/ConstantFold.cpp and Analysis/ConstantFolding.cpp).

I'm trying to force the issue. Either this is a useful feature to maintain in which case I want to see a design on how to defer ABI decisions until a later point in time, or else we do not support it and target triple and target datalayout become a mandatory part of a valid Module again. I think the correct direction is to make them mandatory, but this is a large change that warrants debate.

If we decide that target information should be a mandatory part of a module, there's another question about what we should do with existing .bc and .ll files that don't have one. Load in a default of the host machine?

Nick

Jim Grosbach

unread,

Jan 29, 2014, 6:53:07 PM1/29/14

to Nick Lewycky, LLVM Developers Mailing List

Hi Nick,

The main use case I’ve seen is that it makes writing generic test cases for ‘opt’ easier in that it’s not necessary to specify a target triple on the command line or have a data layout in the .ll/.bc file. That is, in my experience, it’s more for convenience and perhaps historical layering considerations.

I have no philosophical objection to the direction you’re suggesting.

For modules without a data layout, use the host machine as you suggest. That’s consistent with what already happens with llc, so extending that to opt and other such tools seems reasonable to me.

-Jim

> _______________________________________________
> LLVM Developers mailing list
> LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

_______________________________________________
LLVM Developers mailing list
LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nick Lewycky

unread,

Jan 29, 2014, 7:04:16 PM1/29/14

to Jim Grosbach, LLVM Developers Mailing List

On 29 January 2014 15:53, Jim Grosbach <gros...@apple.com> wrote:

Hi Nick,

The main use case I’ve seen is that it makes writing generic test cases for ‘opt’ easier in that it’s not necessary to specify a target triple on the command line or have a data layout in the .ll/.bc file. That is, in my experience, it’s more for convenience and perhaps historical layering considerations.

I have no philosophical objection to the direction you’re suggesting.

For modules without a data layout, use the host machine as you suggest. That’s consistent with what already happens with llc, so extending that to opt and other such tools seems reasonable to me.

This is also what many clang tests do, where TUs get parsed using the host triple. If we keep target datalayout out of the test files and fill it in with the host's information, then our test coverage expands as our buildbot diversity grows, which is a neat property.

Nick

David Chisnall

unread,

Jan 30, 2014, 5:10:55 AM1/30/14

to Nick Lewycky, LLVM Developers Mailing List

On 30 Jan 2014, at 00:04, Nick Lewycky <nlew...@google.com> wrote:

> This is also what many clang tests do, where TUs get parsed using the host triple. If we keep target datalayout out of the test files and fill it in with the host's information, then our test coverage expands as our buildbot diversity grows, which is a neat property.

Unfortunately, reproducibility suffers. You commit a change, a test fails on two buildbots but passes on all of the others and on your local system. Now what do you do? I've already hit this problem in clang, with host-defined tool search paths leaking into the tests and causing them to fail on Windows only. It's hard to fix a bug that causes a buildbot failure if you can't reproduce it. At the very least, the target / data layout should be in the failure message that the test suite generates in case of failure so that you can reproduce it locally if a buildbot reports failure.

David

Matt Arsenault

unread,

Jan 30, 2014, 10:24:25 AM1/30/14

to David Chisnall, LLVM Developers Mailing List

Why not default to using a generic datalayout that just uses the defaults for everything?. There are already defaults, since not every option needs to be specified in it, you just don’t get them when you don’t have one at all. Some places without one already make some assumptions like that.

Philip Reames

unread,

Jan 30, 2014, 12:55:04 PM1/30/14

to Nick Lewycky, LLVM Developers Mailing List

On 1/29/14 3:40 PM, Nick Lewycky wrote:
> The LLVM Module has an optional target triple and target datalayout.
> Without them, an llvm::DataLayout can't be constructed with meaningful
> data. The benefit to making them optional is to permit optimization
> that would work across all possible DataLayouts, then allow us to
> commit to a particular one at a later point in time, thereby
> performing more optimization in advance.
>
> This feature is not being used. Instead, every user of LLVM IR in a
> portability system defines one or more standardized datalayouts for
> their platform, and shims to place calls with the outside world. The
> primary reason for this is that independence from DataLayout is not
> sufficient to achieve portability because it doesn't also represent
> ABI lowering constraints. If you have a system that attempts to use
> LLVM IR in a portable fashion and does it without standardizing on a
> datalayout, please share your experience.

Nick, I don't have a current system in place, but I do want to put
forward an alternate perspective.

We've been looking at doing late insertion of safepoints for garbage
collection. One of the properties that we end up needing to preserve
through all the optimizations which precede our custom rewriting phase
is that the optimizer has not chosen to "hide" pointers from us by using
ptrtoint and integer math tricks. Currently, we're simply running a
verification pass before our rewrite, but I'm very interested long term
in constructing ways to ensure a "gc safe" set of optimization passes.

One of the ways I've been thinking about - but haven't actually
implemented yet - is to deny the optimization passes information about
pointer sizing. Under the assumption that an opto pass can't insert an
ptrtoint cast without knowing a safe integer size to use, this seems
like it would outlaw a class of optimizations we'd be broken by.

My understanding is that the only current way to do this would be to not
specify a DataLayout. (And hack a few places with built in
assumptions. Let's ignore that for the moment.) With your proposed
change, would there be a clean way to express something like this?

p.s. From reading the mailing list a while back, I suspect that the SPIR
folks might have similar needs. (i.e. hiding pointer sizes, etc..)
Pure speculation on my part though.

Philip

Reid Kleckner

unread,

Jan 30, 2014, 1:08:35 PM1/30/14

to David Chisnall, LLVM Developers Mailing List

On Thu, Jan 30, 2014 at 2:10 AM, David Chisnall <David.C...@cl.cam.ac.uk> wrote:

On 30 Jan 2014, at 00:04, Nick Lewycky <nlew...@google.com> wrote:

> This is also what many clang tests do, where TUs get parsed using the host triple. If we keep target datalayout out of the test files and fill it in with the host's information, then our test coverage expands as our buildbot diversity grows, which is a neat property.

Unfortunately, reproducibility suffers. You commit a change, a test fails on two buildbots but passes on all of the others and on your local system. Now what do you do? I've already hit this problem in clang, with host-defined tool search paths leaking into the tests and causing them to fail on Windows only. It's hard to fix a bug that causes a buildbot failure if you can't reproduce it. At the very least, the target / data layout should be in the failure message that the test suite generates in case of failure so that you can reproduce it locally if a buildbot reports failure.

I don't think this will be a problem for opt or other LLVM tools. If opt has a dependence on the host's default triple and datalayout, reproducing the failure should be a simple matter of running the test locally with a manually specified triple. It doesn't have implicit header search paths or other weird host dependencies.

Rafael Espíndola

unread,

Jan 30, 2014, 4:07:47 PM1/30/14

to Nick Lewycky, LLVM Developers Mailing List

I don't think we can reasonably express all the information needed by
ABIs at the LLVM level. Given that, It would *love* to see DataLayout
become a mandatory part of the IR!

> If we decide that target information should be a mandatory part of a module,
> there's another question about what we should do with existing .bc and .ll
> files that don't have one. Load in a default of the host machine?

For tools that don't link with target (llvm-as and llvm-dis being the
most extreme cases) it would have to be the default "". For opt I
would be ok with "" or the host triple.

Thanks,
Rafael

Jeremy Lakeman

unread,

Jan 30, 2014, 4:56:15 PM1/30/14

to Matt Arsenault, LLVM Developers Mailing List

In a very general sense, I would recommend this approach.

Push all of the existing "No datalayout" behaviour decisions into a default data layout. Keep the behaviour, but tidy up the API.

While LLVM is not designed to be a target independent IR, particularly for compiling C. Some other frontend languages may wish to use it that way.

Nick Lewycky

unread,

Jan 31, 2014, 8:21:31 PM1/31/14

to David Chisnall, LLVM Developers Mailing List

On 30 January 2014 02:10, David Chisnall <David.C...@cl.cam.ac.uk> wrote:

On 30 Jan 2014, at 00:04, Nick Lewycky <nlew...@google.com> wrote:

> This is also what many clang tests do, where TUs get parsed using the host triple. If we keep target datalayout out of the test files and fill it in with the host's information, then our test coverage expands as our buildbot diversity grows, which is a neat property.

Unfortunately, reproducibility suffers. You commit a change, a test fails on two buildbots but passes on all of the others and on your local system. Now what do you do?

There's two issues here. One is what to do if we encounter a .ll/.bc with no target data. We're obliged to support llvm 3.0 bitcode files, so we need to have an answer to this question.

Second is what to do in our test suite. If the answer to the first question is "make it use the host target data" then the second part is a choice either to leave the tests with no explicit layout and thereby use the host target, or to require that tests in the testsuite specify their datalayout. The tradeoff is that in one case we get more coverage across different machines, and in the other case we get better reproducibility, which is important for a regression suite or for a new user to verify that their build of llvm is valid.

I've already hit this problem in clang, with host-defined tool search paths leaking into the tests and causing them to fail on Windows only. It's hard to fix a bug that causes a buildbot failure if you can't reproduce it. At the very least, the target / data layout should be in the failure message that the test suite generates in case of failure so that you can reproduce it locally if a buildbot reports failure.

Exactly. As long as it's easy to grab the target datalayout from a buildbot, we can slap it into our .ll file and reproduce the failure.

I can see both sides, and I know my preference, but I'd like to form consensus.

Nick

Nick Lewycky

unread,

Jan 31, 2014, 8:23:35 PM1/31/14

to Philip Reames, LLVM Developers Mailing List

On 30 January 2014 09:55, Philip Reames <list...@philipreames.com> wrote:

On 1/29/14 3:40 PM, Nick Lewycky wrote:

The LLVM Module has an optional target triple and target datalayout. Without them, an llvm::DataLayout can't be constructed with meaningful data. The benefit to making them optional is to permit optimization that would work across all possible DataLayouts, then allow us to commit to a particular one at a later point in time, thereby performing more optimization in advance.

This feature is not being used. Instead, every user of LLVM IR in a portability system defines one or more standardized datalayouts for their platform, and shims to place calls with the outside world. The primary reason for this is that independence from DataLayout is not sufficient to achieve portability because it doesn't also represent ABI lowering constraints. If you have a system that attempts to use LLVM IR in a portable fashion and does it without standardizing on a datalayout, please share your experience.

Nick, I don't have a current system in place, but I do want to put forward an alternate perspective.

We've been looking at doing late insertion of safepoints for garbage collection. One of the properties that we end up needing to preserve through all the optimizations which precede our custom rewriting phase is that the optimizer has not chosen to "hide" pointers from us by using ptrtoint and integer math tricks. Currently, we're simply running a verification pass before our rewrite, but I'm very interested long term in constructing ways to ensure a "gc safe" set of optimization passes.

As a general rule passes need to support the whole of what the IR can support. Trying to operate on a subset of IR seems like a losing battle, unless you can show a mapping from one to the other (ie., using code duplication to remove all unnatural loops from IR, or collapsing a function to having a single exit node).

What language were you planning to do this for? Does the language permit the user to convert pointers to integers and vice versa? If so, what do you do if the user program writes a pointer out to a file, reads it back in later, and uses it?

One of the ways I've been thinking about - but haven't actually implemented yet - is to deny the optimization passes information about pointer sizing.

Right, pointer size (address space size) will become known to all parts of the compiler. It's not even going to be just the optimizations, ConstantExpr::get is going to grow smarter because of this, as lib/Analysis/ConstantFolding.cpp merges into lib/IR/ConstantFold.cpp. That is one of the major benefits that's driving this. (All parts of the compiler will also know endian-ness, which means we can constant fold loads, too.)

Under the assumption that an opto pass can't insert an ptrtoint cast without knowing a safe integer size to use, this seems like it would outlaw a class of optimizations we'd be broken by.

Optimization passes generally prefer converting ptrtoint and inttoptr to GEPs whenever possible. I expect that we'll end up with *fewer* ptr<->int conversions with this change, because we'll know enough about the target to convert them into GEPs.

My understanding is that the only current way to do this would be to not specify a DataLayout. (And hack a few places with built in assumptions. Let's ignore that for the moment.) With your proposed change, would there be a clean way to express something like this?

I think your GC placement algorithm needs to handle inttoptr and ptrtoint, whichever way this discussion goes. Sorry. I'd be happy to hear others chime in -- I know I'm not an expert in this area or about GCs -- but I don't find this rationale compelling.

p.s. From reading the mailing list a while back, I suspect that the SPIR folks might have similar needs. (i.e. hiding pointer sizes, etc..) Pure speculation on my part though.

The SPIR spec specifies two target datalayouts, one for 32 bits and one for 64 bits.

Nick

Chandler Carruth

unread,

Feb 1, 2014, 3:57:09 AM2/1/14

to Nick Lewycky, LLVM Developers Mailing List

FWIW, I strongly support making this a mandatory part of the module. There is *so* much code to delete, this clearly simplifies the IR model.

On Fri, Jan 31, 2014 at 5:21 PM, Nick Lewycky <nlew...@google.com> wrote:

On 30 January 2014 02:10, David Chisnall <David.C...@cl.cam.ac.uk> wrote:

On 30 Jan 2014, at 00:04, Nick Lewycky <nlew...@google.com> wrote:

> This is also what many clang tests do, where TUs get parsed using the host triple. If we keep target datalayout out of the test files and fill it in with the host's information, then our test coverage expands as our buildbot diversity grows, which is a neat property.

Unfortunately, reproducibility suffers. You commit a change, a test fails on two buildbots but passes on all of the others and on your local system. Now what do you do?

There's two issues here. One is what to do if we encounter a .ll/.bc with no target data. We're obliged to support llvm 3.0 bitcode files, so we need to have an answer to this question.

Second is what to do in our test suite. If the answer to the first question is "make it use the host target data" then the second part is a choice either to leave the tests with no explicit layout and thereby use the host target, or to require that tests in the testsuite specify their datalayout. The tradeoff is that in one case we get more coverage across different machines, and in the other case we get better reproducibility, which is important for a regression suite or for a new user to verify that their build of llvm is valid.

Since you mentioned this I've been of two minds, but increasingly I think following the host is the wrong behavior here.

Following the host makes lots of sense for Clang because a) there is a reasonable portable subset of C and C++ in which we can (and should) write test cases, and b) there is a need for Clang itself to default to the host, so exercising that behavior seems reasonable. Still, we try to isolate many parts of it by using the CC1 layer, etc, to make things more explicit and more reproducible. It isn't perfect, but the tradeoff makes sense to me.

With llc we have a different tradeoff -- without following the host there is really nothing sensible to do. While I would have a mild preference for failing to specify a target being an error, :: shrug ::, this just doesn't seem to matter much.

But with opt, I feel like there is a different and better tradeoff. Here, the primary use case is driving, testing, experimenting, and regression analysis of the optimizer. Also, there seems to be a very good default of a completely boring or "typical" layout with minimal information. This might correspond to "", or not, it doesn't matter to me. So I would set up a stable and reliable default. That way test cases and other things can be simpler and totally reproducible. We can generate RUN lines with different triples and/or data layouts to get testing across different configurations, but it doesn't really seem like the primary role we need from the opt tool.

So to sum up, I'm increasingly a fan of what makes testing and playing with optimizations easier. A simple, representative default that is consistent across all platforms seems to fit that bill really nicely.

Philip Reames

unread,

Feb 5, 2014, 12:45:02 PM2/5/14

to Nick Lewycky, LLVM Developers Mailing List

On 1/31/14 5:23 PM, Nick Lewycky wrote:

On 30 January 2014 09:55, Philip Reames <list...@philipreames.com> wrote:

On 1/29/14 3:40 PM, Nick Lewycky wrote:

The LLVM Module has an optional target triple and target datalayout. Without them, an llvm::DataLayout can't be constructed with meaningful data. The benefit to making them optional is to permit optimization that would work across all possible DataLayouts, then allow us to commit to a particular one at a later point in time, thereby performing more optimization in advance.

This feature is not being used. Instead, every user of LLVM IR in a portability system defines one or more standardized datalayouts for their platform, and shims to place calls with the outside world. The primary reason for this is that independence from DataLayout is not sufficient to achieve portability because it doesn't also represent ABI lowering constraints. If you have a system that attempts to use LLVM IR in a portable fashion and does it without standardizing on a datalayout, please share your experience.

Nick, I don't have a current system in place, but I do want to put forward an alternate perspective.

We've been looking at doing late insertion of safepoints for garbage collection. One of the properties that we end up needing to preserve through all the optimizations which precede our custom rewriting phase is that the optimizer has not chosen to "hide" pointers from us by using ptrtoint and integer math tricks. Currently, we're simply running a verification pass before our rewrite, but I'm very interested long term in constructing ways to ensure a "gc safe" set of optimization passes.

As a general rule passes need to support the whole of what the IR can support. Trying to operate on a subset of IR seems like a losing battle, unless you can show a mapping from one to the other (ie., using code duplication to remove all unnatural loops from IR, or collapsing a function to having a single exit node).

What language were you planning to do this for? Does the language permit the user to convert pointers to integers and vice versa? If so, what do you do if the user program writes a pointer out to a file, reads it back in later, and uses it?

Java - which does not permit arbitrary pointer manipulation. (Well, without resorting to mechanism like JNI and sun.misc.Unsafe. Doing so would be explicitly undefined behavior though.) We also use raw pointer manipulations in our implementation (which is eventually inlined), but this happens after the safepoint insertion rewrite.

We strictly control the input IR. As a result, I can insure that the initial IR meets our subset requirements. In practice, all of the opto passes appear to preserve these invariants (i.e. not introducing inttoptr), but we'd like to justify that a bit more.

One of the ways I've been thinking about - but haven't actually implemented yet - is to deny the optimization passes information about pointer sizing.

Right, pointer size (address space size) will become known to all parts of the compiler. It's not even going to be just the optimizations, ConstantExpr::get is going to grow smarter because of this, as lib/Analysis/ConstantFolding.cpp merges into lib/IR/ConstantFold.cpp. That is one of the major benefits that's driving this. (All parts of the compiler will also know endian-ness, which means we can constant fold loads, too.)

I would argue that all of the pieces you mentioned are performing optimizations. :) However, the exact semantics are unimportant for the overall discussion.

Under the assumption that an opto pass can't insert an ptrtoint cast without knowing a safe integer size to use, this seems like it would outlaw a class of optimizations we'd be broken by.

Optimization passes generally prefer converting ptrtoint and inttoptr to GEPs whenever possible.

This is good to hear and helps us.

I expect that we'll end up with *fewer* ptr<->int conversions with this change, because we'll know enough about the target to convert them into GEPs.

Er, I'm confused by this. Why would not knowing the size of a pointer case a GEP to be converted to a ptr <-> int conversion?

Or do you mean that after the change conversions in the original input IR are more likely to be recognized?

My understanding is that the only current way to do this would be to not specify a DataLayout. (And hack a few places with built in assumptions. Let's ignore that for the moment.) With your proposed change, would there be a clean way to express something like this?

I think your GC placement algorithm needs to handle inttoptr and ptrtoint, whichever way this discussion goes. Sorry. I'd be happy to hear others chime in -- I know I'm not an expert in this area or about GCs -- but I don't find this rationale compelling.

The key assumption I didn't initially explain is that the initial IR couldn't contain conversions. With that added, do you still see concerns? I'm fairly sure I don't need to handle general ptr <-> int conversions. If I'm wrong, I'd really like to know it.

p.s. From reading the mailing list a while back, I suspect that the SPIR folks might have similar needs. (i.e. hiding pointer sizes, etc..) Pure speculation on my part though.

The SPIR spec specifies two target datalayouts, one for 32 bits and one for 64 bits.

Good to know. Thanks.

Nick

Philip

Nick Lewycky

unread,

Feb 10, 2014, 8:25:16 PM2/10/14

to Philip Reames, LLVM Developers Mailing List

On 5 February 2014 09:45, Philip Reames <list...@philipreames.com> wrote:

On 1/31/14 5:23 PM, Nick Lewycky wrote:

On 30 January 2014 09:55, Philip Reames <list...@philipreames.com> wrote:

On 1/29/14 3:40 PM, Nick Lewycky wrote:

The LLVM Module has an optional target triple and target datalayout. Without them, an llvm::DataLayout can't be constructed with meaningful data. The benefit to making them optional is to permit optimization that would work across all possible DataLayouts, then allow us to commit to a particular one at a later point in time, thereby performing more optimization in advance.

This feature is not being used. Instead, every user of LLVM IR in a portability system defines one or more standardized datalayouts for their platform, and shims to place calls with the outside world. The primary reason for this is that independence from DataLayout is not sufficient to achieve portability because it doesn't also represent ABI lowering constraints. If you have a system that attempts to use LLVM IR in a portable fashion and does it without standardizing on a datalayout, please share your experience.

Nick, I don't have a current system in place, but I do want to put forward an alternate perspective.

We've been looking at doing late insertion of safepoints for garbage collection. One of the properties that we end up needing to preserve through all the optimizations which precede our custom rewriting phase is that the optimizer has not chosen to "hide" pointers from us by using ptrtoint and integer math tricks. Currently, we're simply running a verification pass before our rewrite, but I'm very interested long term in constructing ways to ensure a "gc safe" set of optimization passes.

As a general rule passes need to support the whole of what the IR can support. Trying to operate on a subset of IR seems like a losing battle, unless you can show a mapping from one to the other (ie., using code duplication to remove all unnatural loops from IR, or collapsing a function to having a single exit node).

What language were you planning to do this for? Does the language permit the user to convert pointers to integers and vice versa? If so, what do you do if the user program writes a pointer out to a file, reads it back in later, and uses it?

Java - which does not permit arbitrary pointer manipulation. (Well, without resorting to mechanism like JNI and sun.misc.Unsafe. Doing so would be explicitly undefined behavior though.) We also use raw pointer manipulations in our implementation (which is eventually inlined), but this happens after the safepoint insertion rewrite.

We strictly control the input IR. As a result, I can insure that the initial IR meets our subset requirements. In practice, all of the opto passes appear to preserve these invariants (i.e. not introducing inttoptr), but we'd like to justify that a bit more.

One of the ways I've been thinking about - but haven't actually implemented yet - is to deny the optimization passes information about pointer sizing.

Right, pointer size (address space size) will become known to all parts of the compiler. It's not even going to be just the optimizations, ConstantExpr::get is going to grow smarter because of this, as lib/Analysis/ConstantFolding.cpp merges into lib/IR/ConstantFold.cpp. That is one of the major benefits that's driving this. (All parts of the compiler will also know endian-ness, which means we can constant fold loads, too.)

I would argue that all of the pieces you mentioned are performing optimizations. :) However, the exact semantics are unimportant for the overall discussion.

Under the assumption that an opto pass can't insert an ptrtoint cast without knowing a safe integer size to use, this seems like it would outlaw a class of optimizations we'd be broken by.

Optimization passes generally prefer converting ptrtoint and inttoptr to GEPs whenever possible.

This is good to hear and helps us.

I expect that we'll end up with *fewer* ptr<->int conversions with this change, because we'll know enough about the target to convert them into GEPs.

Er, I'm confused by this. Why would not knowing the size of a pointer case a GEP to be converted to a ptr <-> int conversion?

Having target data means we can convert inttoptr/ptrtoint into GEPs, particularly in constant expression folding.

Or do you mean that after the change conversions in the original input IR are more likely to be recognized?

My understanding is that the only current way to do this would be to not specify a DataLayout. (And hack a few places with built in assumptions. Let's ignore that for the moment.) With your proposed change, would there be a clean way to express something like this?

I think your GC placement algorithm needs to handle inttoptr and ptrtoint, whichever way this discussion goes. Sorry. I'd be happy to hear others chime in -- I know I'm not an expert in this area or about GCs -- but I don't find this rationale compelling.

The key assumption I didn't initially explain is that the initial IR couldn't contain conversions. With that added, do you still see concerns? I'm fairly sure I don't need to handle general ptr <-> int conversions. If I'm wrong, I'd really like to know it.

So we met at the social and talked about this at length. I'll repeat most of the conversation so that it's on the mailing list, and also I've had some additional thoughts since then.

You're using the llvm type system to detect when something is a pointer, and then you rely on knowing what's a pointer to deduce garbage collection roots. We're supposed to have the llvm.gcroots intrinsic for this purpose, but you note that it prevents gc roots from being in registers (they must be in memory somewhere, usually on the stack), and that fixing it is more work than is reasonable.

Your IR won't do any shifty pointer-int conversion shenanigans, and you want some assurance that an optimization won't introduce them, or that if one does then you can call it out as a bug and get it fixed. I think that's reasonable, but I also think it's something we need to put forth before llvm-dev.

Note that pointer-to-int conversions aren't necessarily just the ptrtoint/inttoptr instructions (and constant expressions), there's also casting between { i64 }* and { i8* }* and such. Are there legitimate reasons an optz'n would introduce a cast? I think that anywhere in the mid-optimizer, conflating integers and pointers is only going to be bad for both the integer optimizations and the pointer optimizations.

It may make sense as part of lowering -- suppose we find two alloca's, one i64 and one i8* and find that their lifetimes are distinct, and i64 and i8* are the same size, so we merge them. Because of how this would interfere, I don't think this belongs anywhere in the mid-optimizer, it would have to happen late, after lowering. That suggests that there's a point in the pass pipeline where the IR is "canonical enough" that this will actually work.

Is that reasonable? Can we actually guarantee that, that any pass which would break this goes after a common gc-root insertion spot? Do we need (want?) to push back and say "no, sorry, make GC roots better instead"?

Nick

David Chisnall

unread,

Feb 11, 2014, 6:27:23 AM2/11/14

to Nick Lewycky, LLVM Developers Mailing List

On 11 Feb 2014, at 01:25, Nick Lewycky <nlew...@google.com> wrote:

> Your IR won't do any shifty pointer-int conversion shenanigans, and you want some assurance that an optimization won't introduce them, or that if one does then you can call it out as a bug and get it fixed. I think that's reasonable, but I also think it's something we need to put forth before llvm-dev.
>
> Note that pointer-to-int conversions aren't necessarily just the ptrtoint/inttoptr instructions (and constant expressions), there's also casting between { i64 }* and { i8* }* and such. Are there legitimate reasons an optz'n would introduce a cast? I think that anywhere in the mid-optimizer, conflating integers and pointers is only going to be bad for both the integer optimizations and the pointer optimizations.
>
> It may make sense as part of lowering -- suppose we find two alloca's, one i64 and one i8* and find that their lifetimes are distinct, and i64 and i8* are the same size, so we merge them. Because of how this would interfere, I don't think this belongs anywhere in the mid-optimizer, it would have to happen late, after lowering. That suggests that there's a point in the pass pipeline where the IR is "canonical enough" that this will actually work.
>
> Is that reasonable? Can we actually guarantee that, that any pass which would break this goes after a common gc-root insertion spot? Do we need (want?) to push back and say "no, sorry, make GC roots better instead"?

I am not currently working on GC, but I am working on a back end for an architecture in which pointer integrity is enforced in hardware and pointers and integers have different representations and different representations and so I would also find much of this contract for optimisations useful. Round tripping via an int involves data loss on my architecture and having optimisations insert these can be annoying (and break security properties). I imagine that the situation is similar for most software-enforced memory safety tools, not just GC.

David

Chandler Carruth

unread,

Feb 11, 2014, 6:37:45 AM2/11/14

to David Chisnall, LLVM Developers Mailing List

While I find all of these things very interesting from the perspective of security and/or hardware constraints, I don't think we should try to deal with that here.

Today, even without a datalayout, I suspect LLVM is not providing nearly the guarantee that either of these use cases is looking for. It may well work by happenstance, but hope isn't a strategy. If we want to add this constraint to LLVM, let's discuss that separately. I don't think we have it today, and I don't think making datalayout mandatory meaningfully moves us further from having it. At worst it causes already possible random failures to become more common.

David Chisnall

unread,

Feb 11, 2014, 6:41:22 AM2/11/14

to Chandler Carruth, LLVM Developers Mailing List

On 11 Feb 2014, at 11:37, Chandler Carruth <chan...@google.com> wrote:

> I find all of these things very interesting from the perspective of security and/or hardware constraints, I don't think we should try to deal with that here.
>
> Today, even without a datalayout, I suspect LLVM is not providing nearly the guarantee that either of these use cases is looking for. It may well work by happenstance, but hope isn't a strategy. If we want to add this constraint to LLVM, let's discuss that separately. I don't think we have it today, and I don't think making datalayout mandatory meaningfully moves us further from having it. At worst it causes already possible random failures to become more common.

I agree. Having the DataLayout present makes these easier to enforce. It would also be nice if the data layout could encode a little bit more information about pointers (is an integer representation, casts to these address spaces are valid), but that's a separate discussion.

Philip Reames

unread,

Feb 14, 2014, 1:56:39 PM2/14/14

to Chandler Carruth, David Chisnall, LLVM Developers Mailing List

I agree that this is not the right place to continue this discussion. I had intended to write up a proposal last week, but instead got distracted by actually writing code. I should have separate proposal along these lines to the mailing list today. I think we found a working middle ground in offline discussion, I'm hoping it won't be particularly controversial.

Philip

Philip Reames

unread,

Feb 14, 2014, 1:59:38 PM2/14/14

to Nick Lewycky, LLVM Developers Mailing List

Nick,

Thanks for writing up the summary of our conversation. I have a couple of small clarifications to make, but I'm going to move that into a separate thread since the discussion has largely devolved from the original topic.

To repeat my comment from last week, I support your proposed change w.r.t. DataLayout.

Philip

Philip Reames

unread,

Feb 14, 2014, 8:55:24 PM2/14/14

to LLVM Developers Mailing List

Splitting out a conversation which started in "make DataLayout a mandatory part of Module" since the topic has decidedly changed. This also relates to the email "RFC: GEP as canonical form for pointer addressing" I just sent.

Correct.

We're supposed to have the llvm.gcroots intrinsic for this purpose, but you note that it prevents gc roots from being in registers (they must be in memory somewhere, usually on the stack), and that fixing it is more work than is reasonable.

This is slightly off, but probably close to what I actually said even if not quite what I meant. :)

I'm going to skip this and respond with a fuller explanation Monday. I'd written an explanation once, realized it was wrong, and decided I should probably revisit when fully awake.

Fundamentally, I believe that gc.roots could be made to work, even with decent (but not optimal) performance in the end. We may even contribute some patches towards fixing issues with the gc.root mechanism just to make a fair comparison. I just don't believe it's the right approach or the best way to reach the end goal.

Your IR won't do any shifty pointer-int conversion shenanigans, and you want some assurance that an optimization won't introduce them, or that if one does then you can call it out as a bug and get it fixed. I think that's reasonable, but I also think it's something we need to put forth before llvm-dev.

Correct and agreed. I split this part off into a separate proposal under the subject "RFC: GEP as canonical form for pointer addressing".

Note that pointer-to-int conversions aren't necessarily just the ptrtoint/inttoptr instructions (and constant expressions), there's also casting between { i64 }* and { i8* }* and such. Are there legitimate reasons an optz'n would introduce a cast? I think that anywhere in the mid-optimizer, conflating integers and pointers is only going to be bad for both the integer optimizations and the pointer optimizations.

It may make sense as part of lowering -- suppose we find two alloca's, one i64 and one i8* and find that their lifetimes are distinct, and i64 and i8* are the same size, so we merge them. Because of how this would interfere, I don't think this belongs anywhere in the mid-optimizer, it would have to happen late, after lowering. That suggests that there's a point in the pass pipeline where the IR is "canonical enough" that this will actually work.

I agree this is possible, even with my proposal. In fact, we already have a stack colouring pass in tree which does exactly what your example illustrates. However, this is done well after CodeGenPrepare and is thus after we start relaxing canonical form anyway.

A couple of other transforms which could potentially be problematic:
- load widening
- vectorization (when the vector element type looses the 'pointerness')

In each of these cases, we have clear ways of expressing the transformation in ways which preserve type information. (i.e. struct types, vector element types, etc..) I would hope we could move towards these cleaner representations. (Note: I haven't checked the current implementations. I should do so.)

My view of this is that any optimization which lost type information in such a manner without good cause would be poor style to begin with. I would hope that patches to remove such information loss would be accepted so long as there was a reasonable alternative. (I'm assuming this is already true; if it's not, let me know.)

(In case it's not clear, being past CodeGenPrepare and lowering for a specific target would be a "good reason".)

Is that reasonable? Can we actually guarantee that, that any pass which would break this goes after a common gc-root insertion spot? Do we need (want?) to push back and say "no, sorry, make GC roots better instead"?

I think it is, but am open to being convinced otherwise. :)

Philip

Philip Reames

unread,

Feb 18, 2014, 2:55:17 PM2/18/14

to LLVM Developers Mailing List

One thing I thought of over the weekend: all of the transformations discussed above are already illegal unless they explicitly preserve the address space of the pointer.

Doesn't prevent them from existing, but it does increase the odds they're already buggy.

Philip

Philip Reames

unread,

Feb 21, 2014, 1:37:52 PM2/21/14

to LLVM Developers Mailing List

On 02/14/2014 05:55 PM, Philip Reames wrote:

Splitting out a conversation which started in "make DataLayout a mandatory part of Module" since the topic has decidedly changed. This also relates to the email "RFC: GEP as canonical form for pointer addressing" I just sent.

On 02/10/2014 05:25 PM, Nick Lewycky wrote:

...

We're supposed to have the llvm.gcroots intrinsic for this purpose, but you note that it prevents gc roots from being in registers (they must be in memory somewhere, usually on the stack), and that fixing it is more work than is reasonable.

This is slightly off, but probably close to what I actually said even if not quite what I meant. :)

I'm going to skip this and respond with a fuller explanation Monday. I'd written an explanation once, realized it was wrong, and decided I should probably revisit when fully awake.

Fundamentally, I believe that gc.roots could be made to work, even with decent (but not optimal) performance in the end. We may even contribute some patches towards fixing issues with the gc.root mechanism just to make a fair comparison. I just don't believe it's the right approach or the best way to reach the end goal.

So, not quite on Monday, but I did get around to writing up an explanation of what's wrong with using gcroot. It turned out to be much longer than I expected, so I turned it into a blog post:
http://www.philipreames.com/Blog/2014/02/21/why-not-use-gcroot/

The very short version: gcroot loses roots (for any GC) due to bad interaction with the optimizer, and gcroot doesn't capture all copies of a pointer root which fundamentally breaks collectors which relocate roots. The only way I know to make gcroot (in its current form) work reliably for all collectors is to insert safepoints very early, which has highly negative performance impacts. There are some (potentially) cheaper but ugly hacks available if you don't need to relocate roots.

There's also going to be a follow up post on implementation problems, but that's completely separate from the fundamental problems.

Philip

Andrew Trick

unread,

Feb 24, 2014, 3:45:16 AM2/24/14

to Philip Reames, LLVM Developers Mailing List

Thanks for the writeup. FWIW my understanding of gcroot has always been that the call to invoke GC is “extern” and not readonly, so we can’t do store->load forwarding on the escaped pointer across it. I have never used gcroot myself.

-Andy

Philip Reames

unread,

Feb 24, 2014, 2:17:19 PM2/24/14

to Andrew Trick, LLVM Developers Mailing List

Andy, I'm not clear what you're trying to say here. Could you rephrase? In particular, what do you mean by "call to invoke GC"?

Philip

Andrew Trick

unread,

Feb 24, 2014, 2:27:10 PM2/24/14

to Philip Reames, LLVM Developers Mailing List

I mean a call site that we think of as a safepoint could potentially call to the runtime and block while GC runs. We can’t let LLVM optimize hoist loads or sink stores across that call.

-Andy

Philip Reames

unread,

Feb 24, 2014, 4:26:28 PM2/24/14

to Andrew Trick, LLVM Developers Mailing List

Ah, okay. I think get where you're coming from.

For call safepoints, if you assume the call itself prevents the optimization, you're mostly okay. This is problematic if you want to have a safepoint on a read-only call (for example), but could be hacked around.

The problem comes up with backedge, function entry, and function return safepoints. Given there is no example in tree (or out of tree that I know of) which uses these, it's a little hard to tell how it's supposed to work. My belief is that the findCustomSafePoints callback on GCStrategy is supposed to insert these. The problem is that this pass is a MachineFunction pass and this runs long after optimization.

The alternate approach - which I believe you're assuming - is to insert calls for each safepoint explicitly before optimization. (In the blog post, I refer to this as "early safepoint insertion".) Unless I'm badly misreading both documentation and code, that's not how gcroot is expecting to be used. I agree that e.s.p. would work from a correctness standpoint for the first example I listed. It doesn't solve the second case though.

Philip

Andrew Trick

unread,

Feb 24, 2014, 4:48:13 PM2/24/14

to Philip Reames, LLVM Developers Mailing List

On Feb 24, 2014, at 1:26 PM, Philip Reames <list...@philipreames.com> wrote:

On 02/24/2014 11:27 AM, Andrew Trick wrote:

On Feb 24, 2014, at 11:17 AM, Philip Reames <list...@philipreames.com> wrote:

On 02/24/2014 12:45 AM, Andrew Trick wrote:

On Feb 21, 2014, at 10:37 AM, Philip Reames <list...@philipreames.com> wrote:

On 02/14/2014 05:55 PM, Philip Reames wrote:

Splitting out a conversation which started in "make DataLayout a mandatory part of Module" since the topic has decidedly changed. This also relates to the email "RFC: GEP as canonical form for pointer addressing" I just sent.

On 02/10/2014 05:25 PM, Nick Lewycky wrote:

...

We're supposed to have the llvm.gcroots intrinsic for this purpose, but you note that it prevents gc roots from being in registers (they must be in memory somewhere, usually on the stack), and that fixing it is more work than is reasonable.

This is slightly off, but probably close to what I actually said even if not quite what I meant. :)

I'm going to skip this and respond with a fuller explanation Monday. I'd written an explanation once, realized it was wrong, and decided I should probably revisit when fully awake.

Fundamentally, I believe that gc.roots could be made to work, even with decent (but not optimal) performance in the end. We may even contribute some patches towards fixing issues with the gc.root mechanism just to make a fair comparison. I just don't believe it's the right approach or the best way to reach the end goal.

So, not quite on Monday, but I did get around to writing up an explanation of what's wrong with using gcroot. It turned out to be much longer than I expected, so I turned it into a blog post:
http://www.philipreames.com/Blog/2014/02/21/why-not-use-gcroot/

The very short version: gcroot loses roots (for any GC) due to bad interaction with the optimizer, and gcroot doesn't capture all copies of a pointer root which fundamentally breaks collectors which relocate roots. The only way I know to make gcroot (in its current form) work reliably for all collectors is to insert safepoints very early, which has highly negative performance impacts. There are some (potentially) cheaper but ugly hacks available if you don't need to relocate roots.

There's also going to be a follow up post on implementation problems, but that's completely separate from the fundamental problems.

Thanks for the writeup. FWIW my understanding of gcroot has always been that the call to invoke GC is “extern” and not readonly, so we can’t do store->load forwarding on the escaped pointer across it. I have never used gcroot myself.

Andy, I'm not clear what you're trying to say here. Could you rephrase? In particular, what do you mean by "call to invoke GC"?

I mean a call site that we think of as a safepoint could potentially call to the runtime and block while GC runs. We can’t let LLVM optimize hoist loads or sink stores across that call.

Ah, okay. I think get where you're coming from.

For call safepoints, if you assume the call itself prevents the optimization, you're mostly okay. This is problematic if you want to have a safepoint on a read-only call (for example), but could be hacked around.

The problem comes up with backedge, function entry, and function return safepoints. Given there is no example in tree (or out of tree that I know of) which uses these, it's a little hard to tell how it's supposed to work. My belief is that the findCustomSafePoints callback on GCStrategy is supposed to insert these. The problem is that this pass is a MachineFunction pass and this runs long after optimization.

I claim that findCustomSafePoints needs to be limited to non-readonly calls. If you don’t already have a call where you want a safepoint, you need to insert one early, before optimization, as you say below.

-Andy

Philip Reames

unread,

Feb 24, 2014, 4:55:29 PM2/24/14

to Andrew Trick, LLVM Developers Mailing List

I think we're in agreement here with regards to correctness.

I'll note for the record, two points:
1) Early safepoint placement does not appear to be the intended usage of gcroot from the documentation.
2) Doing early safepoint insertion is really really bad for optimization/performance. Consider:

int sum = 0;
for(int i = 0; i < 2000; i++) {
sum++;
gc_safepoint();

Reply all

Reply to author

Forward