This RFC asks whether the community is interested in further discussion of non-8-bit (iN) byte support. The issue was last on the agenda in May, when the discussion was triggered by Jesper Antonsson's patches (see https://lists.llvm.org/pipermail/llvm-dev/2019-May/132080.html).
It seems that, while some downstream projects benefit from non-8-bit byte support, the feature is barely maintainable upstream given the lack of in-tree targets that use it. The reason I would like to raise the matter again is that we, the TON Labs team, would like to upstream our backend.
The backend generates code for the TON virtual machine, which is designed to run smart contracts on the TON blockchain (see the original specifications for TVM and TON at https://test.ton.org/tvm.pdf and https://test.ton.org/tblkch.pdf, respectively).
The target's key particularity for this discussion is that the TVM only operates on 257-bit-wide integers, so we changed LLVM downstream to use a 257-bit byte. At the moment we have a hacky implementation with the new byte size hardcoded; for reference, the change touched approximately 20 files in LLVM and about a dozen in Clang. Later on, we plan to integrate the byte size into the data layout, along the lines of https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/. If the community decides to move forward, we will upstream and maintain it.
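For illustration only, a module built for such a target might carry a data layout along these lines; the "b:257" byte-width field and the "tvm" triple are hypothetical sketches of the kind of extension we have in mind, not existing LLVM syntax:

    ; Hypothetical sketch: "b:257" (byte width in bits) is not an
    ; existing data layout field, and the triple is invented here.
    target datalayout = "E-b:257-p:257:257-i257:257:257"
    target triple = "tvm"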
We realize that a 257-bit byte is quite unusual, but smart contracts routinely need integers of at least 256 bits. The leading smart-contract VM, the Ethereum VM, introduced this practice and other blockchain VMs followed. Thus, while TVM might be the first LLVM-based blockchain target that needs the feature, it is unlikely to be the last. We also found mentions of 12-, 16-, and 24-bit bytes in past non-8-bit-byte discussions (in reverse chronological order: https://lists.llvm.org/pipermail/llvm-dev/2019-May/132080.html, http://lists.llvm.org/pipermail/llvm-dev/2017-January/109335.html, http://lists.llvm.org/pipermail/llvm-dev/2017-January/108901.html, http://lists.llvm.org/pipermail/llvm-dev/2015-March/083177.html, http://lists.llvm.org/pipermail/llvm-dev/2014-September/076543.html, http://lists.llvm.org/pipermail/llvm-dev/2009-September/026027.html).
Our toolchain will be based entirely on OSS, so the backend can be used without any proprietary software. We also hope that implementing a target as unusual as TVM will help generalize some concepts in LLVM and make the framework as a whole better suit non-mainstream architectures.
Aside from non-i8 bytes, we would like to bring stack machine support to the target-independent code generator. That topic will be discussed at the developers' meeting; see http://llvm.org/devmtg/2019-10/talk-abstracts.html#bof2.
LLVM and Clang for TVM are available at https://github.com/tonlabs/TON-Compiler. The port is currently based on LLVM 7 and can only produce assembly; we have not specified our object file format yet. We have also introduced custom IR types to model the Tuples, Slices, Builders, and Cells from the specification. We plan to update to a recent LLVM and to consider using opaque types before starting to upstream.
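As a sketch of that direction, opaque named struct types (existing IR syntax) could stand in for the specification types; the helper declaration below is hypothetical, not part of the current backend:

    ; Opaque named types are real IR syntax; values of such types can
    ; only be manipulated behind pointers.
    %Cell = type opaque
    %Slice = type opaque
    %Builder = type opaque
    ; Hypothetical helper signature, for illustration only:
    declare %Slice* @tvm_cell_to_slice(%Cell*)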
--
Kind regards, Dmitry Borisenkov
To add to what JF says:
Typically, a byte means some combination of:
1. The smallest unit that can be indexed in memory (irrelevant for you,
you have no memory).
2. The smallest unit that can be stored in a register in such a way that
its representation is opaque to software (i.e. you can't tell the bit
order of a byte in a multi-byte word). For you, it's not clear if this
is 257 bits or something smaller.
3. The smallest unit that is used to build complex types in software.
Since you have no memory, it's not clear that you can build structs or
arrays, and therefore this doesn't seem to apply.
From your description of your VM, it doesn't sound as if you can translate from any language with a vaguely C-like abstract machine, so I'm not certain why the size of a byte actually matters to you. LLVM IR has a quite C-like abstract machine, and several of these features seem like they will be problematic for you. There is quite a limited subset of LLVM IR that can be expressed for your VM, and it would be helpful if you could enumerate what you expect to be able to support (and why going via LLVM is useful, given that you are unlikely to be able to take advantage of any existing front ends, many optimisations, or most of the target-agnostic code generator).
David
Right. A 257-bit target is a bit crazy, but there are lots of other targets that only have 16-bit or 32-bit addressable memory. I’ve heard various people say that they all have out-of-tree patches to support non-8-bit-byte targets, but because there is no in-tree target that uses them, it is very difficult to merge these patches upstream.
I for one would love to see some of these patches get upstreamed. If the only problem is one of testing, then maybe we could create a virtual target, or maybe we could accept the patches without test cases (so long as they don’t break 8-bit-byte targets, obviously).
-Chris
I'm not great at history, are there any historically iconic targets
that aren't 8-bit but are otherwise sane? I'd prefer to spend the
project's resources supporting something like that than either an
invented target or a speculative crypto-currency oddity.
Cheers.
Tim.
PDP10 is iconic enough?
Joerg
> On Oct 29, 2019, at 3:39 PM, Joerg Sonnenberger via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> On Tue, Oct 29, 2019 at 07:19:25PM +0000, Tim Northover via llvm-dev wrote:
>> On Tue, 29 Oct 2019 at 19:11, Dmitriy Borisenkov via llvm-dev
>> <llvm...@lists.llvm.org> wrote:
>>> 2. Test with a dummy target. It might work if we have a group of contributors who is willing to rewrite and upstream some of their downstream tests as well as to design and implement the target itself. The issue here might be in functional tests, so we'd probably need to implement a dummy virtual machine to run them because lit tests are unlikely to catch all issues from paragraphs (2) and (3) of the scope described.
>>> 3. TON labs can provide its crazy target or some lightweight version of it. From the testing point of view, it works similar to the second solution, but it doesn't require any inventions. I could create a separate RFC about the target to find out if the community thinks it's appropriate.
>>
>> I'm not great at history, are there any historically iconic targets
>> that aren't 8-bit but are otherwise sane? I'd prefer to spend the
>> project's resources supporting something like that than either an
>> invented target or a speculative crypto-currency oddity.
>
> PDP10 is iconic enough?
Is it relevant to any modern compiler though?
I strongly agree with Tim. As I said in previous threads, unless people will have actual testable targets for this type of thing, I think we shouldn’t add maintenance burden. This isn’t really C or C++ anymore because so much code assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that we’re supporting a different language. IMO they should use a different language, and C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small values of CHAR_BIT).
Strongly agreed here as well.
Philip
> On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere <Jeroen.D...@synopsys.com> wrote:
>
>
>>
>> From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of JF Bastien via
> [..]
>> Is it relevant to any modern compiler though?
>>
>> I strongly agree with Tim. As I said in previous threads, unless people will have
>> actual testable targets for this type of thing, I think we shouldn’t add
>> maintenance burden. This isn’t really C or C++ anymore because so much code
>> assumes CHAR_BIT == 8, or at a minimum CHAR_BIT % 8 == 0, that we’re
>> supporting a different language. IMO they should use a different language, and
>> C / C++ should only allow CHAR_BIT % 8 == 0 (and only for small values of
>> CHAR_BIT).
>
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kinds of architectures that is acceptable.
That’s the kind of use case I’d happily support if we had upstream testing, say through a backend. I’m also happy if we remove magic numbers.
Can you share the values you see for CHAR_BIT?
> (Besides that, llvm is also more than just c/c++)
Agreed, I bring up C and C++ because they were the languages discussed in the previous proposals.
My main concern in this discussion is that we're conflating several
concepts of a 'byte':
- The smallest unit that can be loaded / stored at a time.
- The smallest unit that can be addressed with a raw pointer in a
specific address space.
- The largest unit whose encoding is opaque to anything above the ISA.
- The type used to represent `char` in C.
- The type that has a size that all other types are a multiple of.
In POSIX C (which imposes some extra constraints not found in ISO C),
when lowered to LLVM IR, all of these are the same type:
- Loads and stores of values smaller than i8 or not a multiple of i8
may be widened to a multiple of i8. Bitfield fields that are smaller
than i8 must use i8 or wider operations and masking.
- GEP indexes are not well defined for anything that is not a multiple
of i8.
- There is no defined bit order of i8 (or bit order for larger types,
only an assumption that, for example, i32 is 4 i8s in a specific order
specified by the data layout).
- char is lowered to i8.
- All ABI-visible types have a size that is a multiple of 8 bits.
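To make the first point concrete, here is a minimal sketch (standard IR, 8-bit bytes assumed) of a 3-bit bit-field store widened to i8 operations with masking:

    ; Store the low 3 bits of %v into a 3-bit field at the bottom of *%p.
    define void @set_field(i8* %p, i8 %v) {
      %old = load i8, i8* %p
      %cleared = and i8 %old, -8   ; clear the low 3 bits (mask 0xF8)
      %field = and i8 %v, 7        ; keep only the low 3 bits of %v
      %new = or i8 %cleared, %field
      store i8 %new, i8* %p
      ret void
    }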
It's not clear to me whether saying 'a byte is 257 bits' means changing all
of these to 257, or changing only some of them (and if so, which?). For
example, when compiling C for 16-bit-addressable historic
architectures, typically:
- char is 8 bits.
- char* and void* are represented as a pointer plus a 1-bit offset
(sometimes encoded in the low bit, so the load / store sequence is a
right shift by one, a load, and then a mask, or a mask and shift,
depending on the low bit).
- Other pointer types are 16-bit aligned.
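A sketch of that char load sequence, written in ordinary IR purely to show the shape (a real word-addressed target would of course not express its word addressing through standard byte-addressed pointers like this):

    ; %packed is a char* with the half-word selector in its low bit.
    define i16 @load_char(i16 %packed) {
      %waddr = lshr i16 %packed, 1   ; right shift by one: word address
      %p = inttoptr i16 %waddr to i16*
      %word = load i16, i16* %p      ; load the 16-bit word
      %low = and i16 %packed, 1
      %shamt = mul i16 %low, 8       ; 0 or 8, depending on the low bit
      %shifted = lshr i16 %word, %shamt
      %ch = and i16 %shifted, 255    ; mask down to the 8-bit char
      ret i16 %ch
    }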
IBM's 36-bit word machines used a broadly similar strategy, though with
some important differences, and I would imagine that most Synopsys cores
use some variation on this approach.
A model with 257-bit registers probably calls for quite a different
design; most of these concerns don't exist if you don't have memory
that can store byte arrays, and so the design decisions involved are very different.
TL;DR: A proposal for supporting non-8-bit bytes needs to explain what
their expected lowerings are and what they mean by a byte.
David
> On Oct 30, 2019, at 3:07 AM, Jeroen Dobbelaere via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> [...]
>
> We (Synopsys ASIP Designer team) and our customers tend to disagree: our customers do create plenty of cpu architectures
> with non-8-bit characters (and non-8-bit addressable memories). We are able to provide them with a working c/c++ compiler solution.
> Maybe some support libraries are not supported out of the box, but for these kinds of architectures that is acceptable.
> (Besides that, llvm is also more than just c/c++)
I agree - there are a lot of weird accelerators with LLVM backends, and many of them aren’t targeted by C compilers/code. The ones that do have C frontends often use weird dialects or lots of builtins, but they are still useful to support.
I find this thread a bit confusing: people seem aware that such chips exist (even today), but some folks are reluctant to add generic support for them. While I can see the concern about inventing new backends just for testing, I don’t see an argument against generalizing the core and leaving it untested (in master). If any bugs creep in, then people with downstream targets can fix them in core.
-Chris
So, if I understand correctly, your memory is a key-value store where
the keys are 257-bit values and the values are arrays of 257-bit values?
Or are the values single 257-bit values? To help the discussion, please can
you explain how the following are represented:
- A pointer to an object.
- A pointer to a field in an object.
- An arbitrary void*.
- The C string "hello world"
> So regarding concepts of byte, all 5 statements you gave are true for
> our target. Either due to the specification or because of
> performance (gas consumption) issues. But if there are architectures
> that need less from the notion of byte, we should try to figure out the
> common denominator. It's probably ok to be less restrictive about a byte.
It seems odd to encode a C string as an array of 257-bit values, rather
than as an array of 8-bit values that are stored in 32-char chunks.
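Concretely, the two encodings look something like this as IR constants (byte order within the packed chunk is illustrative):

    ; "hi" as one character per 257-bit value:
    @wide = constant [3 x i257] [i257 104, i257 105, i257 0]
    ; "hi" as 8-bit chars packed into a 256-bit chunk (0x6968 = 26984):
    @packed = constant [1 x i256] [i256 26984]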
A summary of the discussion so far.
It seems that there are two possible solutions for how to move forward with non-8-bit bytes:
1. Commit changes without tests. Chris Lattner, Mikael Holmen, Jeroen Dobbelaere, and Jesper Antonsson support this idea. James Y Knight says that at least magic numbers should be removed, "at least where it arguably helps code clarity". This might not be exactly the scope of the changes discussed, but it's probably worth discussing code clarity once there are concrete patches.
GCC (according to James Y Knight) has the same practice: non-8-bit bytes are supported, but there are no tests upstream, and downstream contributors fix the bugs when they appear in the core.
David Chisnall raised the question of what to count as a byte (which defines the scope of the changes), and we suggest using all 5 criteria he listed:
> - The smallest unit that can be loaded / stored at a time.
> - The smallest unit that can be addressed with a raw pointer in a specific address space.
> - The largest unit whose encoding is opaque to anything above the ISA.
> - The type used to represent `char` in C.
> - The type that has a size that all other types are a multiple of.
But if DSPs are less restrictive about what a byte is, some of these criteria could be dropped.
2. Use an iconic target. PDP10 was suggested as a candidate. This option found support from Tim Northover, Joerg Sonnenberger, Mehdi AMINI, and Philip Reames. It's not clear, though, whether this camp opposes upstreaming non-8-bit-byte support without tests, or only the dummy-target and TVM-target options.
So if the people supporting an iconic target have no strong opposition to solution 1, we could probably move on to the patches.
I also support option 1, although having an in-tree target would certainly be better. That having been said, I'm assuming that the byte size will be part of the DataLayout and, as a result, we can certainly have unit tests and tests for IR-level changes
in the usual way. We should only omit tests where we can't currently have them.
-Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
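For example, a lit test of the kind Hal describes could look like this; everything here is ordinary opt/FileCheck usage except the "b:16" data layout field, which stands in for the hypothetical byte-size extension under discussion:

    ; RUN: opt -S -instcombine %s | FileCheck %s
    ; Hypothetical "b:16" byte-size field in an otherwise normal layout:
    target datalayout = "e-b:16-p:16:16"
    ; CHECK-LABEL: @add_zero(
    ; CHECK: ret i16 %x
    define i16 @add_zero(i16 %x) {
      %r = add i16 %x, 0
      ret i16 %r
    }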
> On Nov 1, 2019, at 3:41 AM, Dmitriy Borisenkov via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> [...]
> So if there is no strong opposition to the solution 1 from the people supporting an iconic target option, we could probably move to the patches.
I'm in camp (2). Any changes that are not tested are an invitation to upstream developers to "simplify" the code, not knowing that those changes are important. Anyone who commits untested changes to LLVM will inevitably face an uphill battle against benevolent NFC refactorings that break these changes because the expectation of how the code is supposed to behave is not codified in a test. In the short term option (1) sounds more appealing because they can start right away, but I'm going to predict that it will be more expensive for the downstream maintainers of non 8-bit targets in the long term.
-- adrian
On Fri, Nov 1, 2019 at 8:40 AM Adrian Prantl via llvm-dev <llvm...@lists.llvm.org> wrote:
> [...]
> I'm in camp (2). Any changes that are not tested are an invitation to upstream developers to "simplify" the code, not knowing that those changes are important. [...]
I've worked on multiple codebases where an option existed in order to satisfy an extremely small userbase, with little or no testing,
and as such, I'm adamantly opposed to repeating it.
In addition to what Adrian said, where the weird option exists but is constantly being incidentally broken, I've seen the opposite problem: people become afraid to refactor a section of code because it might break the weird option. You might say "if there aren't any tests, people should feel free to refactor the code; their only responsibility is to make sure the tests will still work", but honestly, I've seen the opposite: without tests, it's just presumed that touching certain parts of code is likely to break things, and so after a time, an aura of untouchability starts to surround regions of the code. And then, the more time goes by, the more that code becomes unfamiliar to everyone (because no one is actively maintaining it). In the long run, the cost of an unmaintained option may be far more than the cost of a maintained one.
In short: please don't commit changes without tests. Even if the test is nothing but making sure this works:

    int main(int argc, char *argv[]) {
        return argv[argc - 1][0];
    }

That at least would give some freedom from the guilt of breaking something important.
-- Jorg
Hi,

To throw my hat into the ring, we also maintain a downstream target that has a subset of this problem - it has non-8-bit-addressable memory. This means that %uglygeps don't work, pointers can't be cast to i8* and expected to be byte-addressed (conversions to memcpy/memset that use i8* aren't welcome either), and GEP lowering code is slightly different.

We maintain this currently as a pile of known technical debt. We have a CGP pass to decompose GEPs and custom-expand them taking our word size into account, but everything before that is just waiting for InstCombine to break something. AliasAnalysis also makes offsetting assumptions that are totally bogus, because our pointers are word-addressed and therefore so are pointer offsets.

We'd be really keen to help here. We're keen to upstream anything we possibly can (and have been, over the past few months). We've had several discussions about how best to approach upstream with this, and the sticking point has always been the lack of a testing target. It's always felt to me that the idea of an addressable granule should be a fairly reasonable DataLayout addition; we can test DataLayout changes purely via opt without requiring a target that uses them. Lowering to instructions was always the testing sticking point.

We'd be keen to help with whatever the community decides to do here. I personally feel it's reasonable that:
- LangRef/DataLayout is updated with semantically coherent changes.
- The midend optimizer is updated by someone who cares about those changes, and tests are added that use the new DataLayout.
- Developers that don't care about those changes maintain a best-effort approach, which is exactly what we do right now; there are features that are tested but are still esoteric enough that I might reasonably break them without realising (OperandBundles come to mind), so I don't think there's any change in mindset here.
- Developers that care perform downstream testing and provide review feedback / reverts if really bad / fixes. Again, this is how LLVM works right now - I'd guess that >80% of our real-world test coverage comes from downstream users deploying ToT LLVM rather than the upstream LIT tests / builders.
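For reference, a minimal example of the "uglygep" pattern James mentions, using LLVM-7-era typed pointers; the optimizer rewrites a typed GEP as an i8* offset, baking an 8-bit byte into the offset arithmetic:

    ; The constant 8 is a byte offset (&base[2] on an 8-bit-byte target);
    ; on a word-addressed target the correct offset would differ.
    define i32* @uglygep(i32* %base) {
      %0 = bitcast i32* %base to i8*
      %uglygep = getelementptr i8, i8* %0, i64 8
      %1 = bitcast i8* %uglygep to i32*
      ret i32* %1
    }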
On Mon, Nov 4, 2019 at 4:00 PM James Molloy <ja...@jamesmolloy.co.uk> wrote:
> [...]
This last bit is the bit I'd worry about if the bug were only visible in an out-of-tree target. If it's visible with a DataLayout change that can be tested upstream, then I'd be OK with an upstream patch being reverted given a test case with custom DataLayout showing the failure.
But if the failure is for, say, non-8-bit-bytes that are only actually exercised downstream, I'm less OK with patches being reverted upstream no matter the severity of the breakage to such targets downstream.
On Mon, Nov 4, 2019 at 4:07 PM David Blaikie via llvm-dev <llvm...@lists.llvm.org> wrote:
> [...]
> But if the failure is for, say, non-8-bit-bytes that are only actually exercised downstream, I'm less OK with patches being reverted upstream no matter the severity of the breakage to such targets downstream.
Suppose that the failure is build breakage for the downstream target. This could cause the entirety of the downstream LLVM/Clang build to break in a way that isn’t easy to untangle or fix. This could prevent an entire organization from integrating LLVM for an extended period of time. So it isn’t necessarily “no matter the severity of the breakage for such *targets* downstream” but could be the entire organization unless a tactical downstream revert of the offending change can be made.
This makes perfect sense to me - if you want the mid-level optimizer to be aware of this, then it makes sense for it to be part of DataLayout.
Question though: given that approach, wouldn’t that make this whole thing in-tree-testable?
What would not be testable given the appropriate data layout extensions?
-Chris
Hi Chris,

SelectionDAGBuilder, DAGCombine, and anything in CodeGen (though I can't think of any examples there offhand, come to think of it). I must admit that until I wrote up that email, it hadn't quite occurred to me how much of the change would be testable. After the writeup, it doesn't sound quite as contentious as I'd considered it.

SelectionDAGBuilder is a simple non-invasive change to visitGetElementPtrInst(); DAGCombiner may have some nasties lurking in it. Inability to test a DAGCombine is hardly a new problem though :)