Separately, teaching LLVM about unusual-sized bytes seems fine to me, if the maintenance burden is low enough and the targets are supported in-tree and are maintained. I agree that you can’t just plop in a target without support, so it makes sense to first clean things up and then land a target. However, I don’t think a mock target makes sense. I’d much rather see a real target.
Are we only talking about powers-of-two here, or “anything goes”? What restrictions are you proposing to impose?
I’m really not convinced by this “magic number” argument. 8 really isn’t that bad to see.
> On May 2, 2019, at 10:21 AM, paul.r...@sony.com wrote:
>
>> -----Original Message-----
>> From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of JF
>> Bastien via llvm-dev
>>
>> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
>> Realistically, C and C++ on such targets are different languages from 8-
>> bit-per-byte C and C++, and therefore code isn’t portable from one to the
>> other.
>
> Having done it, I promise you that it's reasonable to write portable C
> targeting both 7-bit and 8-bit 'char'. It was too long ago to remember
> anything in detail, but the brain cells still remaining from that era
> believe it was pretty clean.
I agree it’s *possible*, same way I’ve seen some correct uses of volatile, and the same way I’m sure some code supported non-two’s-complement machines correctly. What I’m saying is that most code simply isn’t, often in subtle ways. The code you wrote is therefore a subset of C which is incompatible with C at large.
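To make the "subtle ways" concrete, here is a minimal standalone C++ sketch, not taken from any code in this thread. The helper names bitWidth and octetOf are hypothetical. The first avoids hard-coding 8 when computing a type's width; the second extracts octets with shifts and masks, so it does not depend on how many bits a 'char' holds.

```cpp
#include <climits>
#include <cstdint>

// Width of T in bits, without assuming CHAR_BIT == 8.
// (Hypothetical helper for illustration.)
template <typename T> constexpr unsigned bitWidth() {
  return sizeof(T) * CHAR_BIT;
}

// Extracts octet Index (0 = least significant, valid range 0..3) from a
// 32-bit value. The 8 here is the width of an octet, which is fixed by
// definition, not the width of a 'char'.
constexpr unsigned octetOf(std::uint_least32_t Value, unsigned Index) {
  return static_cast<unsigned>((Value >> (8 * Index)) & 0xFFu);
}
```

Code that instead writes sizeof(T) * 8, or reads a value back through an unsigned char pointer and assumes each element is an octet, compiles on both kinds of target but only means the same thing on one of them.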
Jeroen Dobbelaere via llvm-dev <llvm...@lists.llvm.org> writes:
Hi Jesper,
Thanks, these are interesting differences. The CHAR_BIT and byte relation is established in the C standard and I would prefer the byte terminology. It means the same thing as addressable unit but is a bit shorter and probably more widely known.
Looking purely from a c/c++ language viewpoint, this makes sense. We settled on using 'addressable unit size', but any abstraction will already be helpful.
Given that f18 has just been accepted as an LLVM project, we probably shouldn't be using C/C++ or any specific language terminology in LLVM.
Regardless of f18, we shouldn't anyway, unless it's terminology we independently define in our language reference. LLVM has supported many different language frontends for a long time :-)
-Hal
I agree, addressable unit size is probably a better abstraction.
However, in the lib/CodeGen directory alone, there's some 785 uses of
the word "byte" and a significant fraction of the code that we want to
modify is using the byte terminology today. An example of unmodified
code from my showcase patch set:
    assert((Shift & 0x7) == 0 &&
           "Shifts not aligned on Bytes are not supported.");
    uint64_t Offset = Shift / 8;
    unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8;
    assert((Origin->getValueSizeInBits(0) & 0x7) == 0 &&
           "The size of the original loaded type is not a "
           "multiple of a byte.");
How would you prefer we handle this? If we only remove the magic
numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd
get some mixed terminology. If the community is ok with that, we're
happy to do this. If we would go for changing the terminology overall,
then the work and the patch sizes would grow considerably.
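For readers who want a picture of the "only remove the magic numbers" option, here is a rough standalone sketch. It is not LLVM code and not from the patch set: the AddrUnitInfo struct is a hypothetical stand-in for wherever getAddressableUnitSize() would live, and the helper names isUnitAligned and bitsToUnits are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for a target description; not an LLVM class.
struct AddrUnitInfo {
  unsigned BitsPerUnit; // 8 on conventional targets, e.g. 16 or 24 elsewhere

  constexpr unsigned getAddressableUnitSize() const { return BitsPerUnit; }

  // True if Bits is a whole number of addressable units.
  // Stands in for checks of the form (Bits & 0x7) == 0.
  constexpr bool isUnitAligned(std::uint64_t Bits) const {
    return Bits % BitsPerUnit == 0;
  }

  // Converts a bit count to addressable units (truncating division).
  // Stands in for expressions of the form Bits / 8.
  constexpr std::uint64_t bitsToUnits(std::uint64_t Bits) const {
    return Bits / BitsPerUnit;
  }
};
```

The sketch uses a modulo rather than a mask so that it does not quietly assume a power-of-two unit size; whether non-power-of-two sizes are in scope is one of the open questions in this thread.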
I don't think you'll have much luck trying to make that stick for a
general audience, or even a general compiler-writer audience. Byte is
far too strongly associated with 8 bits these days.
> Octets are only ever 8 bits.
You might be able to convert all uses of byte to octet and abandon
byte entirely, but at that point why bother? It feels like a change
just for the sake of pedantry.
I like the "addressable unit" name, though it's a bit long (AddrUnit
seems OK). It at least signals to a reader that there might be
something weird going on. Getting someone writing new code to think in
those terms is a different matter, of course, but I don't think any of
the changes under discussion really help there.
BTW, is there an open source backend (in a fork, I assume) that does
this? So that we can get some kind of idea of the real scope of the
changes needed.
Cheers.
Tim.
Strongly agreed.
My personal take is that this is an invasive enough change, with enough
likely ongoing maintenance fallout, to require substantial justification
before the work is undertaken upstream. An open source backend proposed for
inclusion upstream would be one part of that. Active contribution from
the sponsors in other areas would also be a key factor.
> On 08.05.2019 at 09:52, Jesper Antonsson via llvm-dev <llvm...@lists.llvm.org> wrote:
>
>> What are you using for the executable file format for machines whose
>> byte size is not 8? Looks like the ELF spec assumes that a byte is 8
>> bits long.
>
> We use ELF. Architectures can have a different byte-size to the on-disk
> representation in ELF/DWARF, and the ELF/DWARF specs are not good at
> differentiating between octets and bytes. Thus it's probably easier to
> keep ELF/DWARF in the 8-bit byte world and we have to convert from
> machine byte width to 8-bit bytes/octets at some point. This might be
> one additional reason to use the "addressable unit" terminology
> instead.
We don't even have an explicit executable file format. Basically the executable is the OS on the core. And we use the term "bits per addressable unit" (BPAU), because to us a byte is still 8 bits wide.
Thanks,
Boris
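The conversion between machine addressable units and the 8-bit octets of an on-disk format, as described in the quoted text above, might be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes the units are packed back to back with no padding, which a real toolchain may not do.

```cpp
#include <cstdint>

// Number of 8-bit octets needed to hold NumUnits addressable units of
// BitsPerUnit bits each, assuming dense packing (an assumption of this
// sketch). Rounds up to a whole octet.
constexpr std::uint64_t unitsToOctets(std::uint64_t NumUnits,
                                      unsigned BitsPerUnit) {
  return (NumUnits * BitsPerUnit + 7) / 8;
}

// Index of the addressable unit that contains the first bit of the octet
// at OctetOffset, under the same dense-packing assumption.
constexpr std::uint64_t octetOffsetToUnit(std::uint64_t OctetOffset,
                                          unsigned BitsPerUnit) {
  return (OctetOffset * 8) / BitsPerUnit;
}
```

The literal 8 in this sketch is the octet width of the file format, which stays fixed, while BitsPerUnit carries the target-specific quantity (the BPAU in the terminology above).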
Ok.
p.s. If my wording came across as implying any disrespect, sorry! I was
making a general point, not thinking about how it might be read in
context.
On 3 May 2019, at 5:18, Jesper Antonsson via llvm-dev wrote:
> On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:
>> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
>> Realistically, C and C++ on such targets are different languages from
>> 8-bit-per-byte C and C++, and therefore code isn’t portable from
>> one
>> to the other. I intend to propose that C++23 support only 8 bits per
>> byte, ditto C. I’m therefore not a fan of teaching clang about
>> this.
>
> On portability, the same is true for byte order and more. Also, the
> standard is what it is currently and the non-8-bit byte targets do
> exist. However, we don't suggest clang changes for now.
Clang already largely does not make assumptions about 8-bit bytes
outside of LLVM IR generation. I'm sure assumptions continue to
sneak in here and there, but the bulk of this work is already done
for the frontend.
John.
-eric
On Thu, May 9, 2019 at 10:30 AM JF Bastien via llvm-dev
I don’t think you have consensus to move forward at this point in time. My expectation, which I think represents LLVM’s historical approach, is that a path to full support be planned out before this effort starts. Concretely, I expect a real-world backend to be committed to LLVM as a necessary step. What I meant upthread was: yes it makes sense to do cleanups before landing a backend, but someone has to commit to upstreaming a backend before you start the cleanups. When I say a backend I don’t mean a toy, I mean a real backend.
Right now we have no commitment on anybody landing a backend, and we don’t really have a concrete idea of what you’re even proposing to change or how. You’re focusing on “magic numbers” like everyone agrees 8 is the root of all evil, and it’s really not. Let’s say someone promises to upstream a backend, what concretely do you need to change, and in which projects, to get there? Are you changing clang, and how? What about libc++? Linker? LLVM, and how? Is IR going to change? If not, do you keep all the i8* around, and how do you work around not having void* in IR?
Just to point out, we already have a lot of precedent for cover
functions to hide bits to bytes conversions where doing so improves
readability. Changes to improve readability of existing code would be
welcome, regardless of the outcome of the rest of this discussion. The
concern being raised is about introducing new concepts, not about
cleaning up existing code. There's a big difference between using
getSizeInBytes vs getSizeInBits as a stylistic cleanup vs requiring it
for correctness reasons.
A good starting place would be to start with places where we already
have the appropriate cover functions (lots of DataLayout accessors for
instance), and audit their users.
Keep in mind that patches of this variety will be evaluated purely on
readability/style grounds until a broader consensus around the proposed
direction has been reached.
Philip
In practice, we generally use a loose consensus model. We don't require
full consensus, but we do require there to be a) an agreement on
direction being generally valuable to more than the contributor, b) a
lack of strong objections from established contributors, and c) an
(often implicit) commitment from the proposer to invest enough effort in
the project to justify any downsides other contributors might experience.
Point (c) is frequently where a lot of large proposals from new
contributors fail. Unless there's someone else strongly motivated to
drive it forward who can step in to satisfy (c), then a proposal is
likely to die on the vine. Some of the most important feedback any
proposal gets is about how to reduce the cost to other contributors. If
that feedback is ignored, then the proposal is almost certainly dead. (I
see this all the time.)
Point (b) is a major stumbling block, but can almost always be worked
through. There are a few implicit points worth noting. 1) We
require that objections be largely technical in nature. 2) Anyone
objecting strongly is expected to be a long-term contributor
who has been through the process before. 3) There's a lot of horse trading
which goes on behind the scenes, and 4) because a strong objection is so
powerful, it's frequently waived. Objections based on (1) can
frequently be addressed through offline direct conversation, and
frequently lead to revisions in proposal. (Usually, a reduction in
initial scope, sometimes the opposite.) (3) is a practical necessity as
no one has the time to review and think through *everything*; I'm much
more likely to invest time in a proposal coming from someone who I've
worked closely with in the past than a stranger. (4) shows up in subtle
ways. If you've seen me or someone else say something to the effect of
"Drive by comment", "minor concern", "deferring to X on point Y", those
are all indications that I've chosen explicitly *not* to express a
strong objection.
Point (a) may seem like the core point, but it's generally the easiest.
Once (b) has been addressed, (a) almost always follows.
Again, speaking only for myself.
Philip