[llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes

86 views
Skip to first unread message

Jesper Antonsson via llvm-dev

unread,
May 2, 2019, 8:20:42 AM5/2/19
to llvm...@lists.llvm.org
A. This RFC outlines a proposal regarding non-8-bit-byte support that
got positive reception at a Round Table at EuroLLVM19. The general
topic has been brought up several times before and one good overview
can be found in a FOSDEM 2017 presentation by Jones and Cook:
https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/

In a nutshell, the proposal is for the llvm community to
allow/encourage interested parties to gradually remove "magic numbers",
e.g. assumptions on the size of bytes from the codebase. Overview,
rationale and some example refactorings follows.

Overview:

LLVM currently assumes 8-bit bytes, while there exist a few out-of-tree
llvm targets that utilize bytes of other sizes, including our
(Ericsson's) proprietary target. The main issues are the magic number 8
and "/8" and "*8" all over the place and the use of i8 pointers.

There's considerable agreement that the use of magic numbers is not
good coding style, and removing these ones would be of particular
benefit, even though the effort would not be complete and no in-tree
target with tests exist to guarantee that all gains are maintained.

Ericsson is willing to drive this effort. During EuroLLVM19, there
seemed to be sufficient positive interest from other companies for us
to expect help with reviewing patch sets. Ericsson has been performing
nightly integration towards top-of-tree with this backend for years,
catching and fixing new 8-bit-byte continuously. Thus we're able to
commit to doing similar upstream fixes for the long haul in a no-drama
way.

Rationale:

Benefits of moving toward a byte-size agnostic llvm include:
* Less magic numbers in the codebase.
* A reduced effort to maintain out-of-tree targets with non-8-bit bytes
as contributors follow the established patterns. (One company has told
us that they created but eventually gave up on a 16-bit byte target due
to too-high integration burden.)
* A reduction in duplicate efforts as some of the adaptation work would
happen in-tree rather than in several out-of-tree targets.
* For up-and-coming targets that have non-8-bit-byte sizes, time to
market using llvm would be far quicker.
* A higher probability of LLVM being the compiler of choice for such
targets.
* Eventually, as the patch set required to make llvm fully byte size
agnostic becomes small enough, the effort to provide a mock in-tree
target with some other byte size should be surmountable.

As cons, one could see a burden for the in-tree community to maintain
whatever gains that have been had. However the onus should be on
interested parties to mend any bit-rot. The impact of not having as
much magic numbers and such should if anything make the code more easy
to understand. The permission to go ahead would be under the condition
that significant added complexities are avoided. Another con would be
added compilation time e.g. in cases where the byte size is a run-time
variable rather than a constant. However, this cost seems negligible in
practice.

Refactoring examples:
https://reviews.llvm.org/D61432

Best Regards,
Jesper
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

JF Bastien via llvm-dev

unread,
May 2, 2019, 12:43:49 PM5/2/19
to Jesper Antonsson, llvm...@lists.llvm.org
I’m not a fan of C and C++ supporting anything but 8 bits per byte. Realistically, C and C++ on such targets are different languages from 8-bit-per-byte C and C++, and therefore code isn’t portable from one to the other. I intend to propose that C++23 support only 8 bits per byte, ditto C. I’m therefore not a fan of teaching clang about this.

Separately, teaching LLVM about unusual-sized bytes seems fine to me, if the maintenance burden is low enough and the targets are supported in-tree and are maintained. I agree that you can’t just plop in a target without support, so it makes sense to first clean things up and then land a target. However, I don’t think a mock target makes sense. I’d much rather see a real target.

Are we only talking about powers-of-two here, or “anything goes”? What restrictions are you proposing to impose?

I’m really not convinced by this “magic number” argument. 8 really isn’t that bad to see.

via llvm-dev

unread,
May 2, 2019, 1:21:43 PM5/2/19
to jfba...@apple.com, jesper.a...@ericsson.com, llvm...@lists.llvm.org
> -----Original Message-----
> From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of JF
> Bastien via llvm-dev
>
> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
> Realistically, C and C++ on such targets are different languages from 8-
> bit-per-byte C and C++, and therefore code isn’t portable from one to the
> other.

Having done it, I promise you that it's reasonable to write portable C
targeting both 7-bit and 8-bit 'char'. It was too long ago to remember
anything in detail, but the brain cells still remaining from that era
believe it was pretty clean.

> I intend to propose that C++23 support only 8 bits per byte, ditto
> C. I’m therefore not a fan of teaching clang about this.

My impression is that non-8-bit-byte machines are (these days) generally
small and likely for embedded or other special purposes, so a proposal
to stop trying to squeeze the bloated monster that C++ has become onto
those is probably fine. C, on the other hand, appears to be the language
of choice for all sorts of weird things, and that's less likely to fly.
(Just sayin'. I'm not on either committee and have no vested interest.)
--paulr

JF Bastien via llvm-dev

unread,
May 2, 2019, 1:25:56 PM5/2/19
to paul.r...@sony.com, llvm...@lists.llvm.org

> On May 2, 2019, at 10:21 AM, paul.r...@sony.com wrote:
>
>> -----Original Message-----
>> From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of JF
>> Bastien via llvm-dev
>>
>> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
>> Realistically, C and C++ on such targets are different languages from 8-
>> bit-per-byte C and C++, and therefore code isn’t portable from one to the
>> other.
>
> Having done it, I promise you that it's reasonable to write portable C
> targeting both 7-bit and 8-bit 'char'. It was too long ago to remember
> anything in detail, but the brain cells still remaining from that era
> believe it was pretty clean.

I agree it’s *possible*, same way I’ve seen some correct uses of volatile, and the same way I’m sure some code supported non-two’s-complement machines correctly. What I’m saying is that most code simply isn’t, often in subtle ways. The code you wrote is therefore a subset of C which is incompatible with C at large.

via llvm-dev

unread,
May 2, 2019, 1:31:54 PM5/2/19
to jfba...@apple.com, llvm...@lists.llvm.org
I don't think it was really that severe of a "subset" and would
dispute that a "subset" is inherently "incompatible" but that
is straying too far from the topic at hand to tolerate.
--paulr

Pavel Šnobl via llvm-dev

unread,
May 2, 2019, 1:55:05 PM5/2/19
to Jesper Antonsson, llvm...@lists.llvm.org
Hi Jesper,

thank you for working on this. My company (Codasip) would definitely be interested in having this feature upstream. I think that this is actually important for a suprisingly large number of people who currently have to maintain their changes downstream. I have a couple of questions and comments:

1. Do you plan on supporting truly arbitrary values as the byte size or are there in fact going to be limitations (e.g. the value has to be a multiple of 8 and lower or equal to 64)? I recall that we had a customer asking about 36-bit bytes.
2. If you define a byte to be e.g. 16 bits wide, does it mean that "char" is also 16 bits wide? If yes then how to do you define types like int8_t from stdint.h?
3. Have you thought about the possibility to support different byte sizes for data and code?
4. I realize that this is a separate issue but fully supporting non-8-bit bytes requires also changes to other parts of a typical toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you maintain out-of-tree changes in this area as well?

Thank you,
Pavel

David Greene via llvm-dev

unread,
May 2, 2019, 5:05:00 PM5/2/19
to JF Bastien via llvm-dev
JF Bastien via llvm-dev <llvm...@lists.llvm.org> writes:

> I’m not a fan of C and C++ supporting anything but 8 bits per
> byte. Realistically, C and C++ on such targets are different languages
> from 8-bit-per-byte C and C++, and therefore code isn’t portable from
> one to the other. I intend to propose that C++23 support only 8 bits
> per byte, ditto C. I’m therefore not a fan of teaching clang about
> this.

It depends on what you mean by "byte." Is "byte" a unit of addressing
or a unit of data?

We have worked with machines that had word-level (32-bit) addressing.
8-bit data works just fine on such machines, you simply treat them like
bitfields. Types like sint8_t are reasonable on such machines. Types
like sint8_t * can also be reasonable.

I think it would be pretty bad to restrict either language to 8-bit
bytes.

-David

Sean Kilmurray via llvm-dev

unread,
May 3, 2019, 4:27:31 AM5/3/19
to llvm...@lists.llvm.org
Hi Jesper,

My company (CML Microsystems) would definitely be interested in having this feature upstream too.

We currently maintain an out of tree backend that has a minimum addressable size of 16 bits and this is implemented using the method outlined by Jones and Cook of Embecosm that you refer to in the RFC.

Our implementation is slightly different than the one you’ve proposed in that we used the concept of bitPerChar and only support multiples of 8 bits for that char width.

I would be happy to help out with the work in any way I can.

Regards
Sean Kilmurray

Disclaimer

The information contained in this communication from the sender is confidential. It is intended solely for use by the recipient and others authorized to receive it. If you are not the recipient, you are hereby notified that any disclosure, copying, distribution or taking action in relation of the contents of this information is strictly prohibited and may be unlawful.

This email has been scanned for viruses and malware, and may have been automatically archived by Mimecast Ltd, an innovator in Software as a Service (SaaS) for business. Providing a safer and more useful place for your human generated data. Specializing in; Security, archiving and compliance. To find out more Click Here.

CML disclaimer.txt

Jesper Antonsson via llvm-dev

unread,
May 3, 2019, 5:18:30 AM5/3/19
to jfba...@apple.com, llvm...@lists.llvm.org
On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:
> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
> Realistically, C and C++ on such targets are different languages from
> 8-bit-per-byte C and C++, and therefore code isn’t portable from one
> to the other. I intend to propose that C++23 support only 8 bits per
> byte, ditto C. I’m therefore not a fan of teaching clang about this.

On portability, the same is true for byte order and more. Also, the
standard is what it is currently and the non-8-bit byte targets do
exist. However, we don't suggest clang changes for now.

> Separately, teaching LLVM about unusual-sized bytes seems fine to me,
> if the maintenance burden is low enough and the targets are supported
> in-tree and are maintained. I agree that you can’t just plop in a
> target without support, so it makes sense to first clean things up
> and then land a target. However, I don’t think a mock target makes
> sense. I’d much rather see a real target.

I'd also much rather see a real target. Hopefully, the cleanup will
make it more likely to happen.

> Are we only talking about powers-of-two here, or “anything goes”?
> What restrictions are you proposing to impose?

We're proposing "anything goes" larger than 8 as that's what the
standards says, and as we've talked to people having non-power-of-two
architectures. Also, we feel there's no major disadvantage of going for
that. Yes, we can't use masks and shifts the same way, but we feel that
won't have a big impact. (However, our target has 16-bit bytes, so if
the community would rather see powers-of-two, we could live with that.)

> I’m really not convinced by this “magic number” argument. 8 really
> isn’t that bad to see.

Though the meaning isn't always clear, i.e. if it's handling bytes or
octets. And perhaps it doesn't have to be, for as long as you have an
8-bit byte architecture, but when you start to clean up for another
architecture, it becomes a pain and is not always obvious. Especially
not when you're mucking around with Dwarf. Also, there's "& 7", ">> 3"
as well for instance. Not that bad either (as magic numbers often
aren't in context), but if you grep for them, you often have to look a
bit extra to see if it's e.g. a flag, a byte or an octet.

Regards,
Jesper


>
>
> > On May 2, 2019, at 5:20 AM, Jesper Antonsson via llvm-dev <
> > llvm...@lists.llvm.org> wrote:
> >
> > A. This RFC outlines a proposal regarding non-8-bit-byte support
> > that
> > got positive reception at a Round Table at EuroLLVM19. The
> > general
> > topic has been brought up several times before and one good
> > overview
> > can be found in a FOSDEM 2017 presentation by Jones and Cook:
> >
https://protect2.fireeye.com/url?k=b58a506e-e9015b52-b58a10f5-86ef624f95b6-937d68ba77c32042&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/

Jeroen Dobbelaere via llvm-dev

unread,
May 3, 2019, 6:33:15 AM5/3/19
to Jesper Antonsson, llvm...@lists.llvm.org
Hi Jesper,

We (ASIP Designer team, Synopsys) are also interested in a cleaner approach without those magic constants.

Instead of 'n-bit bytes', we tend to talk about 'word addressing' and 'addressable unit size'.

We support C/C++ for various architectures that our customers create with our tools.
Some of those have multiple memories where the addressable unit size depends on the memory (address space).

NOTE: the addressable unit size can be any value (like 10 or 41 or 2048)

We also keep the bits in 'char' separate from the addressable unit size. As such, an 8 bit char can
have a 'storage size' of 24 bits in one memory and 64 bits in another.
(but a char can also be 24 bits, or 10 or something else).

I think that a cleaner abstraction in datalayout is in general useful.

We use:
unsigned getAddresssableUnitSize(unsigned AddressSpace=0);

which, for the main llvm, could be implemented as :
unsigned getAddresssableUnitSize(unsigned AS=0) { return 8; }

Greetings,

Jeroen Dobbelaere


> -----Original Message-----
> From: llvm-dev <llvm-dev...@lists.llvm.org> On Behalf Of Jesper
> Antonsson via llvm-dev
> Sent: Thursday, May 2, 2019 14:21
> To: llvm...@lists.llvm.org
> Subject: [llvm-dev] RFC: On removing magic numbers assuming 8-bit bytes
>

Jesper Antonsson via llvm-dev

unread,
May 3, 2019, 7:22:58 AM5/3/19
to sn...@codasip.com, llvm...@lists.llvm.org
On Thu, 2019-05-02 at 19:54 +0200, Pavel Šnobl wrote:

> Hi Jesper,
>
> thank you for working on this. My company (Codasip) would definitely
> be interested in having this feature upstream. I think that this is
> actually important for a suprisingly large number of people who
> currently have to maintain their changes downstream. I have a couple
> of questions and comments:
>
> 1. Do you plan on supporting truly arbitrary values as the byte size
> or are there in fact going to be limitations (e.g. the value has to
> be a multiple of 8 and lower or equal to 64)? I recall that we had a
> customer asking about 36-bit bytes.

We plan on supporting arbitrary sizes with a lower limit of 8, not
necessarily power-of-two or multiples of 8. I have to admit that I
haven't thought very much about what the upper limit might be. We might
leave it up to other interested parties to explore that and if we
receive suggestions on how to generalize also in that respect, we'll
certainly consider them.

> 2. If you define a byte to be e.g. 16 bits wide, does it mean that
> "char" is also 16 bits wide? If yes then how to do you define types
> like int8_t from stdint.h?

Yes, char is the same. The int8_t type is optional according to the
standard and we don't define it for our OOT target. The int_least8_t is
required, but we just define it to be byte sized.

> 3. Have you thought about the possibility to support different byte
> sizes for data and code?

Not really, but I saw that Jeroen Dobbelaere just suggested supporting
memory spaces with different byte sizes.

> 4. I realize that this is a separate issue but fully supporting non-
> 8-bit bytes requires also changes to other parts of a typical
> toolchain, namely linker (ld/lld) and debugger (gdb/lldb). Do you
> maintain out-of-tree changes in this area as well?

That's true, we do. I've also seen some community interest in those
areas, e.g. from Embecosm:
https://www.embecosm.com/2018/02/26/how-much-does-a-compiler-cost/

and from within Ericsson:
https://www.youtube.com/watch?v=HAqtEZmci70

Thanks,
Jesper

Jesper Antonsson via llvm-dev

unread,
May 3, 2019, 10:52:05 AM5/3/19
to Jeroen.D...@synopsys.com, llvm...@lists.llvm.org
Hi Jeroen,

Thanks, these are interesting differences. The CHAR_BIT and byte
relation is established in the C standard and I would prefer the byte
terminology. It means the same thing as addressable unit but is a bit
shorter and probably more widely known.

Do you suggest any in-tree changes at some call sites using the
suggested AdressSpace parameter or would we rely on the default value
always? I could definitely include this, but if the parameter is never
used, perhaps we can leave it out-of-tree for now?

Best Regards,
Jesper
https://protect2.fireeye.com/url?k=2d1c7421-7195ae31-2d1c34ba-0cc47ad93e2a-4cae3fa5d066a45a&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/

Jeroen Dobbelaere via llvm-dev

unread,
May 3, 2019, 2:01:00 PM5/3/19
to Jesper Antonsson, llvm...@lists.llvm.org
Hi Jesper,

> Thanks, these are interesting differences. The CHAR_BIT and byte
> relation is established in the C standard and I would prefer the byte
> terminology. It means the same thing as addressable unit but is a bit
> shorter and probably more widely known.

Looking purely from a c/c++ language viewpoint, this makes sense.
We settled on using 'addressable unit size', but any abstraction will already be helpful.

>
> Do you suggest any in-tree changes at some call sites using the
> suggested AdressSpace parameter or would we rely on the default value
> always? I could definitely include this, but if the parameter is never
> used, perhaps we can leave it out-of-tree for now?

Whenever optimizations are done on store/load/pointers, the address space should be taken into account.
For us it is fine to manage that part out-of-tree.

Greetings,

Jeroen Dobbelaere

David Greene via llvm-dev

unread,
May 3, 2019, 3:58:50 PM5/3/19
to Jeroen Dobbelaere via llvm-dev
Jeroen Dobbelaere via llvm-dev <llvm...@lists.llvm.org> writes:

> Hi Jesper,
>
>> Thanks, these are interesting differences. The CHAR_BIT and byte
>> relation is established in the C standard and I would prefer the byte
>> terminology. It means the same thing as addressable unit but is a bit
>> shorter and probably more widely known.
>
> Looking purely from a c/c++ language viewpoint, this makes sense. We
> settled on using 'addressable unit size', but any abstraction will
> already be helpful.

Given that f18 has just been accepted as an LLVM project, we probably
shouldn't be using C/C++ or any specific language terminology in LLVM.
It seems useful to distinguish "addressable unit size" from "data size"
and talk about things using such abstract terminology.

-David

Finkel, Hal J. via llvm-dev

unread,
May 3, 2019, 4:18:02 PM5/3/19
to David Greene, Jeroen Dobbelaere via llvm-dev
On 5/3/19 2:58 PM, llvm-dev wrote:
Jeroen Dobbelaere via llvm-dev <llvm...@lists.llvm.org> writes:

Hi Jesper,

Thanks, these are interesting differences. The CHAR_BIT and byte
relation is established in the C standard and I would prefer the byte
terminology. It means the same thing as addressable unit but is a bit
shorter and probably more widely known.
Looking purely from a c/c++ language viewpoint, this makes sense.  We
settled on using 'addressable unit size', but any abstraction will
already be helpful.
Given that f18 has just been accepted as an LLVM project, we probably
shouldn't be using C/C++ or any specific language terminology in LLVM.


Regardless of f18, we shouldn't anyway, unless it's terminology we independently define in our language reference. LLVM has supported many different language frontends for a long time :-)

 -Hal

Jesper Antonsson via llvm-dev

unread,
May 6, 2019, 4:25:58 AM5/6/19
to d...@cray.com, hfi...@anl.gov, llvm...@lists.llvm.org
I agree, addressable unit size is probably a better abstraction.
However, in the lib/CodeGen directory alone, there's some 785 uses of
the word "byte" and a significant fraction of the code that we want to
modify is using the byte terminology today. An example of unmodified
code from my showcase patch set:

assert(!(Shift & 0x7) == 0 &&
"Shifts not aligned on Bytes are not supported.");
uint64_t Offset = Shift / 8;
unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8;
assert(!(Origin->getValueSizeInBits(0) & 0x7) == 0 &&
"The size of the original loaded type is not a
"multiple of a byte.");

How would you prefer we handle this? If we only remove the magic
numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd
get some mixed terminology. If the community is ok with that, we're
happy to do this. If we would go for changing the terminology overall,
then the work and the patch sizes would grow considerably.

BR,
Jesper

James Courtier-Dutton via llvm-dev

unread,
May 6, 2019, 5:13:26 AM5/6/19
to Jesper Antonsson, d...@cray.com, llvm...@lists.llvm.org
On Mon, 6 May 2019 at 09:26, Jesper Antonsson via llvm-dev <llvm...@lists.llvm.org> wrote:
I agree, addressable unit size is probably a better abstraction.
However, in the lib/CodeGen directory alone, there's some 785 uses of
the word "byte" and a significant fraction of the code that we want to
modify is using the byte terminology today. An example of unmodified
code from my showcase patch set:

    assert(!(Shift & 0x7) == 0 &&
           "Shifts not aligned on Bytes are not supported.");
    uint64_t Offset = Shift / 8;
    unsigned TySizeInBytes = Origin->getValueSizeInBits(0) / 8;
    assert(!(Origin->getValueSizeInBits(0) & 0x7) == 0 &&
           "The size of the original loaded type is not a
           "multiple of a byte.");

How would you prefer we handle this? If we only remove the magic
numbers using getAddressableUnitSize() instead of getBitsPerByte() we'd
get some mixed terminology. If the community is ok with that, we're
happy to do this. If we would go for changing the terminology overall,
then the work and the patch sizes would grow considerably.



Although the above is mentioning bytes, looking at the "/ 8"   and "& 0x7" makes it look like the author meant octets and not bytes.
Bytes can be any size of bits. Octets are only ever 8 bits.



Tim Northover via llvm-dev

unread,
May 6, 2019, 5:43:34 AM5/6/19
to James Courtier-Dutton, d...@cray.com, llvm...@lists.llvm.org
On Mon, 6 May 2019 at 10:13, James Courtier-Dutton via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Although the above is mentioning bytes, looking at the "/ 8" and "& 0x7" makes it look like the author meant octets and not bytes.
> Bytes can be any size of bits.

I don't think you'll have much luck trying to make that stick for a
general audience, or even a general compiler-writer audience. Byte is
far too strongly associated with 8 bits these days.

> Octets are only ever 8 bits.

You might be able to convert all uses of byte to octet and abandon
byte entirely, but at that point why bother? It feels like a change
just for the sake of pedantry.

I like the "addressable unit" name, though it's a bit long (AddrUnit
seems OK). It at least signals to a reader that there might be
something weird going on. Getting someone writing new code to think in
those terms is a different matter, of course, but I don't think any of
the changes under discussion really help there.

BTW, is there an open source backend (in a fork, I assume) that does
this? So that we can get some kind of idea of the real scope of the
changes needed.

Cheers.

Tim.

Philip Reames via llvm-dev

unread,
May 6, 2019, 6:57:05 PM5/6/19
to Tim Northover, James Courtier-Dutton, d...@cray.com, llvm...@lists.llvm.org

On 5/6/19 2:43 AM, Tim Northover via llvm-dev wrote:
> On Mon, 6 May 2019 at 10:13, James Courtier-Dutton via llvm-dev
> <llvm...@lists.llvm.org> wrote:
>> Although the above is mentioning bytes, looking at the "/ 8" and "& 0x7" makes it look like the author meant octets and not bytes.
>> Bytes can be any size of bits.
> I don't think you'll have much luck trying to make that stick for a
> general audience, or even a general compiler-writer audience. Byte is
> far too strongly associated with 8 bits these days.
+1 Please don't try; insisting on a distinction will confuse many new
contributors.

>
>> Octets are only ever 8 bits.
> You might be able to convert all uses of byte to octet and abandon
> byte entirely, but at that point why bother? It feels like a change
> just for the sake of pedantry.
>
> I like the "addressable unit" name, though it's a bit long (AddrUnit
> seems OK). It at least signals to a reader that there might be
> something weird going on. Getting someone writing new code to think in
> those terms is a different matter, of course, but I don't think any of
> the changes under discussion really help there.
>
> BTW, is there an open source backend (in a fork, I assume) that does
> this? So that we can get some kind of idea of the real scope of the
> changes needed.

Strongly agreed.

My personal take is this is an invasive enough change with enough likely
ongoing maintenance fall out to require substantial justification before
the work was undertaken upstream.  A open source backend proposed for
inclusion upstream would be one part of that.  Active contribution from
the sponsors in other areas would also be a key factor.

Rui Ueyama via llvm-dev

unread,
May 7, 2019, 1:24:16 AM5/7/19
to Jesper Antonsson, llvm...@lists.llvm.org
From: Jesper Antonsson via llvm-dev <llvm...@lists.llvm.org>
Date: Fri, May 3, 2019 at 8:23 PM
To: sn...@codasip.com
Cc: llvm...@lists.llvm.org

What are you using for the executable file format for machines whose byte size is not 8? Looks like the ELF spec assumes that a byte is 8 bits long.

Jesper Antonsson via llvm-dev

unread,
May 8, 2019, 3:52:19 AM5/8/19
to ru...@google.com, llvm...@lists.llvm.org
We use ELF. Architectures can have a different byte-size to the on-disk
representation in ELF/DWARF, and the ELF/DWARF specs are not good at
differentiating between octets and bytes. Thus it's probably easier to
keep ELF/DWARF in the 8-bit byte world and we have to convert from
machine byte width to 8-bit bytes/octets at some point. This might be
one additional reason to use the "addressable unit" terminology
instead.

Jesper Antonsson via llvm-dev

unread,
May 8, 2019, 4:04:25 AM5/8/19
to t.p.no...@gmail.com, james....@gmail.com, d...@cray.com, llvm...@lists.llvm.org
On Mon, 2019-05-06 at 10:43 +0100, Tim Northover wrote:
> On Mon, 6 May 2019 at 10:13, James Courtier-Dutton via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> > Although the above is mentioning bytes, looking at the "/ 8" and
> > "& 0x7" makes it look like the author meant octets and not bytes.
> > Bytes can be any size of bits.
>
> I don't think you'll have much luck trying to make that stick for a
> general audience, or even a general compiler-writer audience. Byte is
> far too strongly associated with 8 bits these days.
>
> > Octets are only ever 8 bits.
>
> You might be able to convert all uses of byte to octet and abandon
> byte entirely, but at that point why bother? It feels like a change
> just for the sake of pedantry.
>
> I like the "addressable unit" name, though it's a bit long (AddrUnit
> seems OK). It at least signals to a reader that there might be
> something weird going on. Getting someone writing new code to think
> in
> those terms is a different matter, of course, but I don't think any
> of
> the changes under discussion really help there.
>
> BTW, is there an open source backend (in a fork, I assume) that does
> this? So that we can get some kind of idea of the real scope of the
> changes needed.

Unfortunately, there is not AFAIK. If you like, we can try to expand on
the showcase patch to show more, but the changes are numerous enough to
make it difficult to assemble something that will show it all.

Thanks,
Jesper

Boris Boesler via llvm-dev

unread,
May 8, 2019, 4:05:34 AM5/8/19
to llvm...@lists.llvm.org
Hi

> Am 08.05.2019 um 09:52 schrieb Jesper Antonsson via llvm-dev <llvm...@lists.llvm.org>:
>
>> What are you using for the executable file format for machines whose
>> byte size is not 8? Looks like the ELF spec assumes that a byte is 8
>> bits long.
>
> We use ELF. Architectures can have a different byte-size to the on-disk
> representation in ELF/DWARF, and the ELF/DWARF specs are not good at
> differentiating between octets and bytes. Thus it's probably easier to
> keep ELF/DWARF in the 8-bit byte world and we have to convert from
> machine byte width to 8-bit bytes/octets at some point. This might be
> one additional reason to use the "addressable unit" terminology
> instead.

We don't even have an explicit executable file format. Basically the executable is the OS on the core. And we use the term "bits per addressable unit" (BPAU), because to us a byte is still 8 bits wide.

Thanks,
Boris

Jesper Antonsson via llvm-dev

unread,
May 8, 2019, 4:25:20 AM5/8/19
to t.p.no...@gmail.com, james....@gmail.com, list...@philipreames.com, d...@cray.com, llvm...@lists.llvm.org
On Mon, 2019-05-06 at 15:56 -0700, Philip Reames via llvm-dev wrote:
> >

> On 5/6/19 2:43 AM, Tim Northover via llvm-dev wrote:
> > On Mon, 6 May 2019 at 10:13, James Courtier-Dutton via llvm-dev
> > <llvm...@lists.llvm.org> wrote:
> > > Although the above is mentioning bytes, looking at the "/
> > > 8" and "& 0x7" makes it look like the author meant octets and
> > > not bytes.
> > > Bytes can be any size of bits.
> >
> > I don't think you'll have much luck trying to make that stick for a
> > general audience, or even a general compiler-writer audience. Byte
> > is
> > far too strongly associated with 8 bits these days.
>
> +1 Please don't try; insisting on a distinction will confuse many new
> contributors.

Yes, my interpretation is that the community is leaning toward
addressable unit as terminology.

> > > Octets are only ever 8 bits.
> >
> > You might be able to convert all uses of byte to octet and abandon
> > byte entirely, but at that point why bother? It feels like a change
> > just for the sake of pedantry.
> >
> > I like the "addressable unit" name, though it's a bit long
> > (AddrUnit
> > seems OK). It at least signals to a reader that there might be
> > something weird going on. Getting someone writing new code to think
> > in
> > those terms is a different matter, of course, but I don't think any
> > of
> > the changes under discussion really help there.
> >
> > BTW, is there an open source backend (in a fork, I assume) that
> > does
> > this? So that we can get some kind of idea of the real scope of the
> > changes needed.
>
> Strongly agreed.
>
> My personal take is this is an invasive enough change with enough
> likely
> ongoing maintenance fall out to require substantial justification
> before
> the work was undertaken upstream.

My hope and belief is that having good names instead of these magical
numbers won't be a burden but rather an improvement long-term.

> A open source backend proposed for
> inclusion upstream would be one part of that.

That is not on the table right now. However, as the work required to
make such an inclusion happen will be reduced by this cleanup, the
likelihood that it happens in the future should increase.

> Active contribution from
> the sponsors in other areas would also be a key factor.

I'm not sure how to interpret that, but our team here at Ericsson is
fairly large, we have been working with this out-of-tree backend since
2011 and as a group, we contribute to upstream e.g. by helping out with
the fixedpoint upstreaming, by solving and filing TRs (we're pretty
good at testing I'd say), improving debug information and more.

Philip Reames via llvm-dev

unread,
May 8, 2019, 2:12:55 PM5/8/19
to Jesper Antonsson, t.p.no...@gmail.com, james....@gmail.com, d...@cray.com, llvm...@lists.llvm.org
Given this, I'm not sure the community as a whole should take on the
burden of supporting non b-byte addressable units.  I see this as a
precondition.  To be clear, I don't care *which* backend there is,
doesn't have to be yours, but the presence of at least one would seem
necessary for testing if nothing else.

>
>> Active contribution from
>> the sponsors in other areas would also be a key factor.
> I'm not sure how to interpret that, but our team here at Ericsson is
> fairly large, we have been working with this out-of-tree backend since
> 2011 and as a group, we contribute to upstream e.g. by helping out with
> the fixedpoint upstreaming, by solving and filing TRs (we're pretty
> good at testing I'd say), improving debug information and more.

Ok.

p.s. If my wording came across as implying any disrespect, sorry!  I was
making a general point, not thinking about how it might be read in
context. 

Jesper Antonsson via llvm-dev

unread,
May 9, 2019, 8:29:38 AM5/9/19
to t.p.no...@gmail.com, james....@gmail.com, list...@philipreames.com, d...@cray.com, llvm...@lists.llvm.org
I agree that an in-tree target is needed for actual support. However,
we're merely suggesting a gradual cleanup of magic numbers in order to
make the code a bit more readable and make life easier for a number of
downstream targets. It will not result in support, but it would make
any effort to create support (or maintain support downstream)
significantly smaller. This would also make it a bit more likely that
LLVM is the compiler of choice for such targets, some of which might
want to upstream eventually.

The onus is on interested parties to maintain any gains, and Ericsson
is offering to do that in a no-drama way with the help of other
companies that have voiced their interest. We continuously merge and
test against top of tree and would act accordingly, if allowed.

As the discussion is subsiding, I'm unsure about how to conclude this
RFC. Several parties have said they support this effort, others have
pitched in with suggestions on terminology and such (which perhaps
indicates that they are not opposed in general). JF Bastien and you ask
for in-tree targets, although JF did indicate that it made sense to
first clean up.

On "byte" vs "addressable unit", we've been thinking a bit and are
leaning toward staying with the prevalent "byte" terminology for as
long as upstream is 8-bit-only to avoid mixed terminology or larger
patches. However, we're flexible on this, and I've uploaded a twin
patch (in D61725) to my original showcase showing how "addressable
unit" could look.

> >
> > > Active contribution from
> > > the sponsors in other areas would also be a key factor.
> >
> > I'm not sure how to interpret that, but our team here at Ericsson
> > is
> > fairly large, we have been working with this out-of-tree backend
> > since
> > 2011 and as a group, we contribute to upstream e.g. by helping out
> > with
> > the fixedpoint upstreaming, by solving and filing TRs (we're pretty
> > good at testing I'd say), improving debug information and more.
>
> Ok.
>
> p.s. If my wording came across as implying any disrespect, sorry! I
> was
> making a general point, not thinking about how it might be read in
> context.

No problem, thanks!

JF Bastien via llvm-dev

unread,
May 9, 2019, 1:30:36 PM5/9/19
to Jesper Antonsson, d...@cray.com, llvm...@lists.llvm.org
I don’t think you have consensus to move forward at this point in time. My expectation, which I think represents LLVM’s historical approach, is that a path to full support be planned out before this effort starts. Concretely, I expect a real-world backend to be committed to LLVM as a necessary step. What I meant upthread was: yes it makes sense to do cleanups before landing a backend, but someone has to commit to upstreaming a backend before you start the cleanups. When I say a backend I don’t mean a toy, I mean a real backend.

Right now we have no commitment on anybody landing a backend, and we don’t really have a concrete idea of what you’re even proposing to change or how. You’re focusing on “magic numbers” like everyone agrees 8 is the root of all evil, and it’s really not. Let’s say someone promises to upstream a backend, what concretely do you need to change, and in which projects, to get there? Are you changing clang, and how? What about libc++? Linker? LLVM, and how? Is IR going to change? If not, do you keep all the i8* around, and how do you work around not having void* in IR?

The above is, I think, necessary but not sufficient to moving forward.

John McCall via llvm-dev

unread,
May 9, 2019, 3:10:16 PM5/9/19
to Jesper Antonsson, llvm...@lists.llvm.org

On 3 May 2019, at 5:18, Jesper Antonsson via llvm-dev wrote:

> On Thu, 2019-05-02 at 09:43 -0700, JF Bastien wrote:
>> I’m not a fan of C and C++ supporting anything but 8 bits per byte.
>> Realistically, C and C++ on such targets are different languages from
>> 8-bit-per-byte C and C++, and therefore code isn’t portable from
>> one
>> to the other. I intend to propose that C++23 support only 8 bits per
>> byte, ditto C. I’m therefore not a fan of teaching clang about
>> this.
>
> On portability, the same is true for byte order and more. Also, the
> standard is what it is currently and the non-8-bit byte targets do
> exist. However, we don't suggest clang changes for now.

Clang already largely does not make assumptions about 8-bit bytes
outside of LLVM IR generation. I'm sure assumptions continue to
sneak in here and there, but the bulk of this work is already done
for the frontend.

John.

Eric Christopher via llvm-dev

unread,
May 9, 2019, 3:47:14 PM5/9/19
to JF Bastien, d...@cray.com, llvm...@lists.llvm.org
I agree that consensus seems to be missing. There's definitely some
assumptions, and more in particular, API and usage assumptions around
8 bit bytes in the backends. Also: How do you plan on keeping these
assumptions from creeping back in?

-eric

On Thu, May 9, 2019 at 10:30 AM JF Bastien via llvm-dev

James Y Knight via llvm-dev

unread,
May 9, 2019, 4:00:12 PM5/9/19
to JF Bastien, d...@cray.com, llvm...@lists.llvm.org
From: JF Bastien via llvm-dev <llvm...@lists.llvm.org>
Date: Thu, May 9, 2019 at 1:30 PM
To: Jesper Antonsson
Cc: d...@cray.com, llvm...@lists.llvm.org

I don’t think you have consensus to move forward at this point in time. My expectation, which I think represents LLVM’s historical approach, is that a path to full support be planned out before this effort starts. Concretely, I expect a real-world backend to be committed to LLVM as a necessary step. What I meant upthread was: yes it makes sense to do cleanups before landing a backend, but someone has to commit to upstreaming a backend before you start the cleanups. When I say a backend I don’t mean a toy, I mean a real backend.

IMO, the argument of removing "magic constants", and replacing them with a semantically meaningful value does have some merit, even lacking any backend that uses a number other than "8".

I guess I'd say that my feeling is that if llvm's codebase already had a "bitsPerUnit()" accessor (as GCC calls this concept), I would probably not say that we should replace all the calls with literal '8's across the code-base simply because no current target uses any value other than 8. It's an abstraction which does make sense on its own, even if it's currently always 8.

So, I'm somewhat agnostic here -- the change itself has little value upstream, but if the state of the codebase afterwards is not degraded by the change (and on the contrary, can be ever-so-marginally improved), that seems like it could be basically okay.

Right now we have no commitment on anybody landing a backend, and we don’t really have a concrete idea of what you’re even proposing to change or how. You’re focusing on “magic numbers” like everyone agrees 8 is the root of all evil, and it’s really not. Let’s say someone promises to upstream a backend, what concretely do you need to change, and in which projects, to get there? Are you changing clang, and how? What about libc++? Linker? LLVM, and how? Is IR going to change? If not, do you keep all the i8* around, and how do you work around not having void* in IR?

This is a great list of questions that it would be good to have answers to, though.

Jesper Antonsson via llvm-dev

unread,
May 10, 2019, 6:58:04 AM5/10/19
to jfba...@apple.com, jykn...@google.com, d...@cray.com, llvm...@lists.llvm.org
This is a salient point, I think: If we had the abstraction, I could
never hope to get consensus to replace it with 8. While there is no
agreement, it seems to me that the community reaction leans toward the
abstraction-positive.

As to the argument that a path to full support is a prerequisite, and
that that is how it has always been done, I think every suggestion is
worth considering on it's own merits. If this abstraction is a slight
improvement in itself, is closing a gap towards gcc and represents a
significant improvement for OOT backends, is there sufficient rationale
for not allowing it?

> > Right now we have no commitment on anybody landing a backend, and
> > we don’t really have a concrete idea of what you’re even proposing
> > to change or how. You’re focusing on “magic numbers” like everyone
> > agrees 8 is the root of all evil, and it’s really not. Let’s say
> > someone promises to upstream a backend, what concretely do you need
> > to change, and in which projects, to get there? Are you changing
> > clang, and how? What about libc++? Linker? LLVM, and how? Is IR
> > going to change? If not, do you keep all the i8* around, and how do
> > you work around not having void* in IR?
> >
>
> This is a great list of questions that it would be good to have
> answers to, though.

By putting a patch in Phabricator and linking it from the RFC, I tried
to convey a very concrete idea of what we propose to change and how. We
don't propose changes outside of llvm.

The 8's are not the root of all evil, but again, magic numbers are
generally considered bad style. Also, these 8's represent the bulk of
our changes to handle 16-bit bytes in llvm and as they are typically
straightforward to remove, they represent low-hanging fruit. John
McCall mentioned that clang needs fewer changes, which is something we
can confirm and we also experience very few merge conflicts there.

The questions are indeed good in a sense, and I could make a writeup on
them, e.g llvm IR is general and good as it is, except that you may
want an additon to the DataLayout string to signal AU width. However,
as we don't suggest providing support, only this abstraction, it seems
premature to go into these other details. Of course, when someone is
ready to upstream a target with non-8-bit addressable units, we'll
engage and help out with concrete suggestions as we've been doing with
fixedpoint numbers recently.

Jesper Antonsson via llvm-dev

unread,
May 10, 2019, 7:34:55 AM5/10/19
to jfba...@apple.com, echr...@gmail.com, d...@cray.com, llvm...@lists.llvm.org
On Thu, 2019-05-09 at 12:46 -0700, Eric Christopher wrote:
> I agree that consensus seems to be missing. There's definitely some
> assumptions, and more in particular, API and usage assumptions around
> 8 bit bytes in the backends. Also: How do you plan on keeping these
> assumptions from creeping back in?

We propose gradually abstracting the magic numbers only. We merge and
test our OOT backend nightly toward top-of-tree so we would catch new
ones and fix them. However, contributors tend to follow the example of
pre-existing code, so it should happen less frequently than it does
today.

BR, Jesper

JF Bastien via llvm-dev

unread,
May 10, 2019, 12:18:25 PM5/10/19
to Jesper Antonsson, d...@cray.com, llvm...@lists.llvm.org
A single patch isn’t a path forward, it’s one point in what I understand to be a long list. Sure your change sounds nice, but what else is needed? If all you do is upstream this then you’re leading other implementors into believing that everything else works for them. It works for you, but really nobody else, even if you’re making their lives slightly easier. You’re making it easy to break your own expectations because we have no idea what assumptions your upstream changes have. You’re going to come in to random patches, and ask for changes that are undocumented guarantees that we provide for the sake of your platform (or you’re going to proactively fix up breakage after). You say you’re going to maintain things, I want to believe you (the person), but without an actual backend upstream I find it hard to believe a company is actually committed to maintaining this.

That’s not a good way to go about.

I asked a handful of questions, I think you should answer them before moving forward. I’m sure it’s not comprehensive, therefore please document all changes you’d require. Further, I really think someone needs to commit to a real-world backend going upstream.

Jesper Antonsson via llvm-dev

unread,
May 10, 2019, 1:47:21 PM5/10/19
to jfba...@apple.com, d...@cray.com, llvm...@lists.llvm.org
There is no long list. We only propose and plan to do what's in the
patch and very similar magic-8-removals. Honestly, no mission creep.

> Sure your change sounds nice, but what else is needed? If all you do
> is upstream this then you’re leading other implementors into
> believing that everything else works for them.

Yes, it does sound nice! And you're right, we should make a big
disclaimer at the getter, and also clearly spell out in the
documentation that there's no general support for address units/bytes
of different sizes. The risk that people who would like to use llvm for
e.g. 16-bit AUs are misled by the abstraction has to be weighed against
the fact that the very same people will be significantly helped by it,
should they choose llvm.

> It works for you, but really nobody else, even if you’re making their
> lives slightly easier.

Everyone I've talked to with some involvement in non-8-bit addressable
unit backends have been quite positive. I think it will work similarly
for us.

> You’re making it easy to break your own expectations because we have
> no idea what assumptions your upstream changes have. You’re going to
> come in to random patches, and ask for changes that are undocumented
> guarantees that we provide for the sake of your platform (or you’re
> going to proactively fix up breakage after).

This worry about broken assumptions again seems to imply some mission
creep. To my mind, the RFC's magic number removal proposal doesn't
carry significant complexities. Yes, we will fix new magic 8-s and if
we happen to encounter patches that introduce them, we will gently ask
for them to use the abstraction. But again, no drama.

> You say you’re going to maintain things, I want to believe you (the
> person), but without an actual backend upstream I find it hard to
> believe a company is actually committed to maintaining this.
>
> That’s not a good way to go about.

I agree it's not optimal, at least, but my hope is that it's good
enough for the community to accept. Please consider the ease with which
other interested parties can help out and the low impact if a few 8-s
make their way in again.

> I asked a handful of questions, I think you should answer them before
> moving forward. I’m sure it’s not comprehensive, therefore please
> document all changes you’d require.

With all due respect, I have to insist that this request is outside the
scope of the RFC.

> Further, I really think someone needs to commit to a real-world
> backend going upstream.
>
>
> > The 8's are not the root of all evil, but again, magic numbers are
> > generally considered bad style. Also, these 8's represent the bulk
> > of
> > our changes to handle 16-bit bytes in llvm and as they are
> > typically
> > straightforward to remove, they represent low-hanging fruit. John
> > McCall mentioned that clang needs fewer changes, which is something
> > we
> > can confirm and we also experience very few merge conflicts there.
> >
> > The questions are indeed good in a sense, and I could make a
> > writeup on
> > them, e.g llvm IR is general and good as it is, except that you may
> > want an additon to the DataLayout string to signal AU width.
> > However,
> > as we don't suggest providing support, only this abstraction, it
> > seems
> > premature to go into these other details. Of course, when someone
> > is
> > ready to upstream a target with non-8-bit addressable units, we'll
> > engage and help out with concrete suggestions as we've been doing
> > with
> > fixedpoint numbers recently.
>
>

Philip Reames via llvm-dev

unread,
May 10, 2019, 3:31:51 PM5/10/19
to Eric Christopher, JF Bastien, d...@cray.com, llvm...@lists.llvm.org
Jasper,

Just to point out, we already have a lot of precedent for cover
functions to hide bits to bytes conversions where doing so improves
readability.  Changes to improve readability of existing code would be
welcome, regardless of the outcome of the rest of this discussion.  The
concern being raised is about introducing new concepts, not about
cleaning up existing code.  There's a big difference between using
getSizeInBytes vs getSizeInBits as a stylistic cleanup vs requiring it
for correctness reasons. 

A good starting place would be to start with places where we already
have the appropriate cover functions (lots of DataLayout accessors for
instance), and audit their users. 

Keep in mind that patches of this variety will be evaluated purely on
readability/style grounds until a broader consensus around the proposed
direction has been reached. 

Philip

Jesper Antonsson via llvm-dev

unread,
May 14, 2019, 5:02:28 AM5/14/19
to jfba...@apple.com, list...@philipreames.com, echr...@gmail.com, d...@cray.com, llvm...@lists.llvm.org
Philip, thank you. It's true that there's some of that, and we're
certainly willing to explore whatever the community will agree to in
terms of cleanups.

It seems the discussion is fizzling out and I'll send out an email to
try to summarize.

BR, Jesper

Jesper Antonsson via llvm-dev

unread,
May 14, 2019, 7:09:02 AM5/14/19
to llvm...@lists.llvm.org
Thanks to everyone that have weighed in on this! There have been
remarks along the lines that the proposal lacks consensus to move
forward, and that might very well be true. However, the decision model
is not fully clear to me, so before giving up, I'd like to summarize
the discussion to the best of my ability and ask for clarification on
how the community makes its decisions.

(First, I would like to emphasize that this is merely my
interpretation, which may be flawed. If so, my apologies.) The RFC
tl;dr is "without supporting non-8-bit bytes, we want to abstract the
magic 8's used for bytesize to a DataLayout getter".

It garnered some five supporters, including four companies representing
out-of-tree backends: my own Ericsson, Synopsys, CML Microsystems and
Codasip. I also count James Y Knight's (Google) remarks as positive.

Two commenters I interpret as clearly against, among other things
citing the need for a committment for an in-tree-backend using non-8-
bit bytes to undertake such a change: JF Bastien and Philip Reames.

Nine additional commenters provided side remarks to the discussion, and
my interpretation was that five were in some small regard abstraction-
positive as they provided input on the details of the abstraction,
whereas two were more skeptical, and two were neither.


Now to the decision model: If full consensus is required, we clearly
don't have it, and we can stop here. However, if the abstraction were
in place, we'd have a large majority against replacing them with 8s.
And I guess it would be hard to move forward on very many things if
everyone in a huge community can veto?

Or should we take an insider-outsider perspective, then perhaps the
opposers are in majority as their in-tree voices are what counts?

Or should we take a company perspective, then there's four companies
backing this, while I'm unsure if e.g. JF is speaking for Apple?

Or should we apply some reputation weights, so some people can veto?

Or is it an informal mix of all of the above? I'm sorry if I'm being
obnoxious, I really don't mean to. I'd just sleep better with clarity
in closure, so that the discussion don't just fizzle out and I give up
while having a sense that the community is leaning toward the positive
on this.

BR,
Jesper


On Thu, 2019-05-02 at 12:20 +0000, Jesper Antonsson via llvm-dev wrote:
> A. This RFC outlines a proposal regarding non-8-bit-byte support
> that
> got positive reception at a Round Table at EuroLLVM19. The
> general
> topic has been brought up several times before and one good
> overview
> can be found in a FOSDEM 2017 presentation by Jones and Cook:
>
https://protect2.fireeye.com/url?k=0092ea36-5c19e105-0092aaad-865bb277df6a-f2acface76df5bb9&u=https://archive.fosdem.org/2017/schedule/event/llvm_16_bit/
> * A higher probability of LLVM being the compiler of choice for such
> targets.
> * Eventually, as the patch set required to make llvm fully byte size
> agnostic becomes small enough, the effort to provide a mock in-tree
> target with some other byte size should be surmountable.
>
> As cons, one could see a burden for the in-tree community to maintain
> whatever gains that have been had. However the onus should be on
> interested parties to mend any bit-rot. The impact of not having as
> much magic numbers and such should if anything make the code more
> easy
> to understand. The permission to go ahead would be under the
> condition
> that significant added complexities are avoided. Another con would be
> added compilation time e.g. in cases where the byte size is a run-
> time
> variable rather than a constant. However, this cost seems negligible
> in
> practice.
>
> Refactoring examples:
> https://reviews.llvm.org/D61432
>
> Best Regards,
> Jesper

James Y Knight via llvm-dev

unread,
May 23, 2019, 5:02:47 PM5/23/19
to Jesper Antonsson, llvm...@lists.llvm.org
You have raised an interesting question here about model for decision-making, which unfortunately I don't think anyone really has an answer for. I certainly don't. 

Anyways, yes, my response was indeed intended to be mostly-positive encouragement.

It seems to me that LLVM, as a project, generally encourages doing as much development in the upstream project as possible, while also being generally welcoming and helpful to closed-source downstream forks. That said, we cannot ignore the critical requirement that upstream developers continue to be able to make changes without having to remember all of the arbitrary untested/undocumented constraints closed-source downstream project may have.

That's why I like the formulation of this proposal as a "code clean-up", rather than a "feature". As a feature, it really cannot exist without a backend using it upstream, because we have no ability to test it, so it will be necessarily be broken. On the other hand, as a cleanup, it's entirely acceptable that only some of the code has abstracted away the number 8.

I'd suggest that this proposed cleanup should be allowed to happen -- but that upstream, we do NOT expose it as an overrideable value (e.g., don't put it in a target hook). Instead, make a non-overrideble function which always just returns 8. Document it as returning the number of bits in a character -- and that it's always 8 right now, but with the possibility that someday llvm may handle other bit-widths.

Any downstream users who wish to change the value now, will need to patch it to be a target hook (or whatever) in their fork -- and also fix anything that's not using the abstraction properly, because we don't actually support any other values upstream (due to no ability to test them).

Philip Reames via llvm-dev

unread,
May 28, 2019, 2:20:48 PM5/28/19
to Jesper Antonsson, llvm...@lists.llvm.org
Jesper, on the abstract decision model side of things, let me share my
understanding.  Note that I'm specifically not speaking for the project
as a whole, just the way I think about things.

In practice, we generally use a loose consensus model.  We don't require
full consensus, but we do require there to be a) an agreement on
direction being generally valuable to more than the contributor, b) a
lack of strong objections from established contributors, and c) an
(often implicit) commitment from the proposer to invest enough effort in
the project to justify any downsides other contributors might experience. 

Point (c) is frequently where a lot of large proposals from new
contributors fail.  Unless there's someone else strongly motivated to
drive it forward who can step in to satisfy (c), then a proposal is
likely to die on the vine.   Some of the most important feedback any
proposal gets is about how to reduce the cost to other contributors.  If
that feedback is ignored, then the proposal is almost certain dead.  (I
see this all the time.)

Point (b) is a major stumbling block, but can almost always be worked
through.  There's a couple of implicit points worth noting.  1) We
require that objections be largely technical in nature.  2) Anyone
objecting strongly is expect to be themselves a long term contributor
whose been through the process before. 3) There's a lot of horse trading
which goes on behind the scenes, and 4) because a strong objection is so
powerful, it's frequently waived.  Objections based on (1) can
frequently be addressed through offline direct conversation, and
frequently lead to revisions in proposal.  (Usually, a reduction in
initial scope, sometimes the opposite.)  (3) is a practical necessity as
no one has the time to review and think through *everything*; I'm much
more likely to invest time in a proposal coming from someone who I've
worked closely with in the past than a stranger.  (4) shows up in subtle
ways.  If you've seen me or someone else say something to the effect of
"Drive by comment", "minor concern", "deferring to X on point Y", those
are all indications that I've chosen explicitly *not* to express a
strong objection.

Point (a) may seem like the core point, but it's generally the easiest. 
Once (b) has been addressed, (a) almost always follows. 

Again, speaking only for myself. 

Philip

Jesper Antonsson via llvm-dev

unread,
May 29, 2019, 5:57:50 AM5/29/19
to jykn...@google.com, llvm...@lists.llvm.org
James, thank you, this sounds reasonable. We'll proceed eventually with
some suggestions for cleanup patches along these lines and along the
lines Philip Reames suggested earlier. We'll aim for transparency and
to not overstep the bounds given by this discussion.

BR, Jesper

Jesper Antonsson via llvm-dev

unread,
May 29, 2019, 7:41:47 AM5/29/19
to list...@philipreames.com, llvm...@lists.llvm.org
Philip, thank you for your eludication. The loose consensus model that
you outline probably works well in most cases. It does seem to favor
status quo while disfavoring newcomers so I imagine a number of useful
efforts will be left by the wayside and this to suppress community
growth somewhat. OTOH, it's easy to see advantages and there are
perhaps no better decison models or tradeoffs to be had.

Thanks again,
Jesper
Reply all
Reply to author
Forward
0 new messages