Bit Manipulation and Big Endian support


Allen Baum

Jun 8, 2018, 5:02:38 PM
to RISC-V ISA Dev
I've heard more than once that in Japan (and perhaps China), the overwhelming majority of microcontrollers and embedded processors are big-endian. This is, I believe, inhibiting support for RISC-V in those geographies (not eliminating it, but certainly slowing it down).

The only big-endian support that I've heard of came from the currently defunct Bit Manipulation WG, and even there the support was for swapping bytes after they've been loaded and before they are stored.

a. Is that adequate?
b. If not, do we expect anyone who wants native BigEndian support to develop their own custom extension?
c. If not, have there been any discussions of a standard BigEndian extension?

Tommy Thorn

Jun 8, 2018, 5:05:54 PM
to Allen Baum, RISC-V ISA Dev
Hi Allen,

One data point: RISC-V was originally bi-endian, but the western world has overwhelmingly settled on little-endian, and dropping big-endian greatly simplified the standard. I don't think adding native BigEndian makes sense, but adding support for various swaps does make a ton of sense.

Tommy

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAF4tt%3DCtyw7%2B8vbgVj3rkod7tOnPt-huj8v4WdKg%3DuhPY%3DtHxw%40mail.gmail.com.

Samuel Falvo II

Jun 8, 2018, 5:43:54 PM
to Tommy Thorn, Allen Baum, RISC-V ISA Dev
On Fri, Jun 8, 2018 at 2:05 PM, Tommy Thorn
<tommy...@esperantotech.com> wrote:
> Hi Allen,
>
> One data point: RISC-V was originally bi-endian, but overwhelmingly the
> western world have settled on little and it greatly simplified the standard
> to drop it. I don't think adding native BigEndian makes sense but adding
> support for various swaps does make a ton of sense.

Just my opinion on this matter.

While adding swaps is useful in some cases, more useful is having
native big- and little-endian memory accessors. If you look at the
overwhelming majority of use-cases for byte-swap operations, it's
always to "fix" the endianness of a fetched field before subsequent
processing, and again prior to storing results back into the field
(e.g., BSD sockets' hton-family of macros). Eliminating those swaps
seems more useful. Being able to declare inside structures which
fields are explicitly big- or little-endian also vastly improves the
readability of program source listings.
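
The fetch, swap, process, swap, store pattern described above can be sketched in C roughly as follows. This is only an illustration: bump_be32_counter is a made-up helper, while ntohl/htonl are the usual POSIX byte-order functions from <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* ntohl, htonl (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: increment a 32-bit big-endian counter stored at
 * buf[0..3], the way a protocol field would be updated in place. */
static uint32_t bump_be32_counter(unsigned char *buf)
{
    uint32_t wire, host;

    assert(buf != NULL);
    memcpy(&wire, buf, sizeof wire);  /* fetch the field as stored */
    host = ntohl(wire);               /* "fix" endianness before processing */
    host += 1;                        /* the actual work */
    wire = htonl(host);               /* swap again before storing */
    memcpy(buf, &wire, sizeof wire);  /* store the field back */
    return host;
}
```

With native big-endian accessors, the two swap steps would fold into the load and the store themselves.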

--
Samuel A. Falvo II

Allen Baum

Jun 8, 2018, 5:50:21 PM
to Samuel Falvo II, Tommy Thorn, RISC-V ISA Dev
So 4 possible levels of support:
 - none
 - swap instructions
 - BigEndian load/store mode
 - BigEndian load/store instructions

Luke Kenneth Casson Leighton

Jun 8, 2018, 8:17:32 PM
to Allen Baum, RISC-V ISA Dev
On Fri, Jun 8, 2018 at 10:02 PM, Allen Baum
<allen...@esperantotech.com> wrote:

> i've heard more than once that in Japan (and perhaps China), the
> overwhelming majority of microcontrollers and embedded processors are
> Big-Endian.

in japan: PowerPC, yes. at barcelona i met someone from japan who
turned out to be the unofficial host of the powerpc-be debian port.
he was specifically there to gauge the practicality of making RISC-V
bi-endian. ( also, just worth observing: Andes V3 is bi-endian )

note that that's not *if* to make RISC-V bi-endian, but *how* to make
RISC-V bi-endian.

i did not take notes unfortunately, so i do not know his name. he
mentioned that he was returning to japan with a report, with a view to
applying for government funding to get this done. i introduced him to
manuel (mafm on OFTC #debian-riscv) so he could get a rough idea of
how much work would be involved in debootstrapping a riscv-be debian
port, and i noticed he was talking to yunsup as well, i did not take
part in that conversation.

basically the powerpc-be community in japan is so enormous and the
software base so large that they cannot just "drop everything" and
convert to little-endian architectures: they *need* bi-endian-ness (in
some fashion).

l.

Shumpei Kawasaki

Jun 8, 2018, 8:23:04 PM
to Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev

I made that comment in the marketing members' meeting. 

99 percent of PC, mobile and data center applications are little-endian, but it is also true that 90 percent of industrial and infrastructure applications are big-endian. ARM users actively use its big-endian mode and will continue to do so.

The bi-endian feature can improve performance or simplify the logic of networking devices and software. Many architectures (ARM, PowerPC, Alpha, SPARC V9, MIPS, PA-RISC, SuperH SH-4 and IA-64) feature a setting which allows for switchable endianness in data segments, code segments or both (Source: https://en.wikipedia.org/wiki/Endianness).  

The GNU Compiler Collection, binutils, Linux, UEFI and other cross tools and OSes support bi-endian operation cleanly. What is needed is more cross-tool work, along with work in hardware. We can start some groundwork to provide a bi-endian platform for RISC-V; RISC-V GCC shows no prior bi-endian work, so developers will need to work with the community. We know that this reduces the porting work involved in converting applications to RISC-V.

This feature on RISC-V will create an easier transition path from PowerPC, SH, 68K, and Coldfire.


Jacob Bachmeyer

Jun 8, 2018, 9:39:55 PM
to Shumpei Kawasaki, Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev
Shumpei Kawasaki wrote:
> I made that comment in the marketing members' meeting.
>
> 99 percent of PC, mobile and data center applications are
> little-endian but it is also true that 90 percent industrial and
> infrastructure applications are big-endian. ARM users actively use its
> big-endian mode and will continue to do so.
>
> The bi-endian feature can improve performance or simplify the logic of
> networking devices and software. Many architectures (ARM, PowerPC,
> Alpha, SPARC V9, MIPS, PA-RISC, SuperH SH-4 and IA-64) feature a
> setting which allows for switchable endianness in data segments, code
> segments or both (Source: https://en.wikipedia.org/wiki/Endianness).
>
> GNU Compiler Collections, binutils, Linux, UEFI and other cross tools
> and OSes support bi-endian in clean manners. It is more cross tool
> work that is needed and work in hardware. We can start some ground
> work to provide a bi-endian platform for RISC-V and RISC-V GCC shows
> no prior bi-endian work so developers will need to work with
> community. We know that this reduces porting work involved in
> convert applications to RISC-V.

Would big-endian data memory access opcodes be a good solution? There
would be the small complexity that RISC-V program text would always be
little endian, but that is needed due to the instruction length encoding.

There is no room in the 32-bit opcode space to put big-endian
LOAD/STORE, but an extension could easily add these as 48-bit or 64-bit
opcodes. The big advantage I see from such a bi-endian extension is
that it would make RISC-V truly bi-endian with native memory access in
either order as needed. (For the extreme embedded case, new standard
long-form big-endian memory access opcodes could be "aliased" into
CUSTOM-0/CUSTOM-1 to fit them on a 32-bit-instruction-only machine.)


-- Jacob

Luke Kenneth Casson Leighton

Jun 8, 2018, 10:01:38 PM
to Jacob Bachmeyer, Shumpei Kawasaki, Allen Baum, RISC-V ISA Dev
On Sat, Jun 9, 2018 at 2:39 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

> There is no room in the 32-bit opcode space to put big-endian LOAD/STORE,
> but an extension could easily add these as 48-bit or 64-bit opcodes.

the consequences of that are that it would make big-endian a "second
rate citizen"... although at this point it's almost too late.
https://en.wikipedia.org/wiki/Endianness#Bi-endianness seems to me to
imply that the equivalent of CSRs may have been used historically to
set endian-ness.

l.

Jacob Bachmeyer

Jun 8, 2018, 10:13:15 PM
to Luke Kenneth Casson Leighton, Shumpei Kawasaki, Allen Baum, RISC-V ISA Dev
I do not see a serious problem here, since that proverbial ship has
arguably already sailed: RISC-V program text is little-endian, and
changing *that* would make a huge mess. Further, the use of additional
big-endian memory access opcodes would make RISC-V truly bi-endian, with
the big-endian/little-endian distinction being made at runtime and
encoded into the program text, rather than being an implicit parameter.
I argue that this is a better fit, since it would allow/require the
expected byte order for data to be explicitly stated in the program.

Lastly, (and this ties back to the extensible assembler database I
proposed earlier) standardizing big-endian memory access as 48-bit or
64-bit opcodes does not preclude implementations from "aliasing" those
long-form standard opcodes into the 32-bit opcode space as non-standard
encodings of standard instructions.


-- Jacob

Bruce Hoult

Jun 8, 2018, 10:27:10 PM
to Jacob Bachmeyer, Shumpei Kawasaki, Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev
I imagine it might be possible to find room for simple sized big endian load and store without any offset (or indexing). It would need 14 bits of opcode space: 1 for load/store, 3 for size/type, 5 for pointer register, 5 for src/dest register.  Or 2x 13 bits, obviously.

The performance-critical uses (where a swap instruction MIGHT not be enough) are likely to be stepping through data in a loop, and not need an offset.
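
The 14-bit budget above (1 + 3 + 5 + 5) can be checked with a small C sketch. The field order and the name pack_be_memop are assumptions made purely for illustration; this is not a proposed RISC-V encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field widths for a register-indirect big-endian memory op. */
enum {
    REG_BITS   = 5,  /* pointer register, and src/dest register */
    SIZE_BITS  = 3,  /* size/type */
    DIR_BITS   = 1,  /* load vs. store */
    TOTAL_BITS = DIR_BITS + SIZE_BITS + REG_BITS + REG_BITS
};

/* Pack the four fields into the low TOTAL_BITS bits (low bits first:
 * data register, base register, size/type, load/store). */
static uint32_t pack_be_memop(unsigned is_store, unsigned size_type,
                              unsigned base_reg, unsigned data_reg)
{
    assert(is_store < 2 && size_type < 8 && base_reg < 32 && data_reg < 32);
    return (uint32_t)data_reg
         | ((uint32_t)base_reg  << REG_BITS)
         | ((uint32_t)size_type << (2 * REG_BITS))
         | ((uint32_t)is_store  << (2 * REG_BITS + SIZE_BITS));
}
```

Dropping the load/store bit and giving loads and stores their own opcode groups gives the "2x 13 bits" variant.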

Jim Wilson

Jun 8, 2018, 10:27:17 PM
to Jacob Bachmeyer, Shumpei Kawasaki, Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev
On Fri, Jun 8, 2018 at 6:39 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> Would big-endian data memory access opcodes be a good solution? There would
> be the small complexity that RISC-V program text would always be little
> endian, but that is needed due to the instruction length encoding.

On ARMv7 and later, the code is always little endian, even when the
processor is in big-endian mode. It is only the data accesses that
change, as it is only the data accesses that matter to end users. I
think ARMv6 has support for both big and little endian code, depending
on a mode bit, but the big-endian code stuff was only for backwards
compatibility with older ARM processors, and was dropped in ARMv7.

Jim

Shumpei Kawasaki

Jun 8, 2018, 10:27:54 PM
to jcb6...@gmail.com, Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev, Oleg Endo, Akira Tsukamoto

Compared to languages like Pascal, C lets you access data in more than one way, e.g. through structures, unions, etc. Handling these constructs involves bit arrangements on top of byte arrangements. Microsoft and Hitachi ported the .NET Micro Framework, originally little-endian, to big-endian SHs. It took the engineers four times longer than we initially anticipated, with a very long time spent shaking out issues. The programmers involved were all systems programmers.

SH has a swap instruction. ARM and PowerPC have endian swap instructions for handling endianness, and all of them also offer bi-endian options. Using endian swap instructions from high-level language code is not that straightforward. Linux network driver code is layered in such a way that high-level functions abstract out endianness, leaving the endian-aware code in the bottom layer of functions.

-Shumpei


Andrew Waterman

Jun 8, 2018, 11:31:06 PM
to jcb6...@gmail.com, Allen Baum, Luke Kenneth Casson Leighton, RISC-V ISA Dev, Shumpei Kawasaki
IMO, bi-endianness isn’t enough of a goal to give the big-endian loads and stores 12-bit offsets. If they are just register-indirect, they can be encoded more cheaply in the 32-bit space.

(FWIW, I still favor byte-swap instructions for this purpose. That’s what we/Tommy proposed in the original B extension proposal years ago.)







Samuel Falvo II

Jun 8, 2018, 11:35:50 PM
to Andrew Waterman, Jacob Bachmeyer, Allen Baum, Luke Kenneth Casson Leighton, RISC-V ISA Dev, Shumpei Kawasaki
On Fri, Jun 8, 2018 at 8:30 PM, Andrew Waterman <and...@sifive.com> wrote:
> (FWIW, I still favor byte-swap instructions for this purpose. That’s what
> we/Tommy proposed in the original B extension proposal years ago.)

I won't contest this. My comment did clearly state that it was an
opinion, and thus, not really backed by any kind of science. I
suppose it's possible to fuse load-then-swap and swap-then-store
sequences to get comparable performance benefits; the disadvantage, of
course, would be greater space consumption.

Richard Herveille

Jun 9, 2018, 12:11:59 AM
to Allen Baum, Samuel Falvo II, Tommy Thorn, RISC-V ISA Dev, Richard Herveille

 

On 08/06/2018, 23:50, "Allen Baum" <allen...@esperantotech.com> wrote:

> So 4 possible levels of support:
>  - none
>  - swap instructions
>  - BigEndian load/store mode
>  - BigEndian load/store instructions

I doubt little vs big-endian is an issue for adoption. We see a lot of requests from China, admittedly less from Japan. But then Japan is considered conservative.

Instead of declaring new opcodes for big-endian access, wouldn’t declaring the memory space/region big-endian be sufficient? That could be encoded in the MMU record, the PMA and/or the PMP records.

This requires no changes to the CPU pipeline. The CPU then just always works in little-endian mode. There’s no need for new opcodes or additional byte-swap instructions. The data is just loaded/stored in little/big endian format.
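
As a rough software model of this idea, the sketch below picks the byte order of a 32-bit load from a per-region attribute. Everything in it is hypothetical: struct region merely stands in for whatever an MMU, PMA or PMP record would carry, and the model ignores accesses that straddle a region boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region attribute, standing in for a PMA/PMP/MMU entry. */
struct region {
    uint32_t base;
    uint32_t size;
    int      big_endian;
};

static const struct region *find_region(const struct region *tab, size_t n,
                                        uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr - tab[i].base < tab[i].size)  /* unsigned wrap rejects addr < base */
            return &tab[i];
    return NULL;
}

/* Model of one and the same 32-bit load; only the region decides byte order. */
static uint32_t load32(const uint8_t *mem, const struct region *tab, size_t n,
                       uint32_t addr)
{
    const struct region *r = find_region(tab, n, addr);
    const uint8_t *p = mem + addr;

    assert(r != NULL);
    if (r->big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
             | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```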

 

Cheers,

Richard

 


Albert Cahalan

Jun 9, 2018, 12:27:43 AM
to Tommy Thorn, Allen Baum, RISC-V ISA Dev
On 6/8/18, Tommy Thorn <tommy...@esperantotech.com> wrote:

> One data point: RISC-V was originally bi-endian, but overwhelmingly the
> western world have settled on little and it greatly simplified the standard
> to drop it. I don't think adding native BigEndian makes sense but adding
> support for various swaps does make a ton of sense.

I have three suggestions for what "various swaps" might be.

The first is that an immediate value determines the swapping.
Viewing the register as an array of bits, the source index for
each destination bit is determined by XORing the immediate
value with the destination index of that bit. Thus a value of 0x00
does nothing, a value of 0x01 swaps adjacent bits, a value of
0x04 swaps adjacent nibbles, a value of 0x08 swaps adjacent
bytes (a vector htons), a value of 0x18 does 1 to 4 htonl
operations in parallel (depending on register width), a value
of 0x07 does bit reversal within bytes, etc.

The second is that sign extension might commonly follow a
swapping operation. The above would need an extra 3 bits
to specify the size, for a total of 10 on 128-bit RISC-V. Alone,
it only takes 2 bits. It is less important to have unsigned versions
because sign bits are conveniently cleared by smaller-sized
stores to memory, though that would just take another bit.
Since RISC-V I-type immediates are 12 bits wide, that bit is available.

The third is that shuffle instructions can handle byte swapping.
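
The first suggestion above can be modelled in a few lines of C. This sketch assumes a 32-bit register and uses a made-up function name, xor_permute32; it works bit by bit for clarity, which is not how hardware would do it.

```c
#include <stdint.h>

/* Destination bit i is taken from source bit (i XOR k). */
static uint32_t xor_permute32(uint32_t x, unsigned k)
{
    uint32_t r = 0;

    for (unsigned i = 0; i < 32; i++) {
        unsigned src = (i ^ k) & 31u;      /* source index for destination bit i */
        r |= ((x >> src) & 1u) << i;
    }
    return r;
}
```

For instance, xor_permute32(0x11223344, 0x18) gives 0x44332211 (one htonl on a little-endian machine), and xor_permute32(0x11223344, 0x08) gives 0x22114433 (two htons side by side).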

Jacob Bachmeyer

Jun 9, 2018, 12:44:47 AM
to Shumpei Kawasaki, Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev, Oleg Endo, Akira Tsukamoto
Shumpei Kawasaki wrote:
> Compared to the languages like Pascal, C would let you access a data
> in more than one way e.g. structure, union, etc.. Handling these
> constructs involve bit arrangements on the top of byte arrangements.
> Microsoft and Hitachi ported .NET Micro-framework, originally
> little-endian, to big-endian SHs. It took engineers four times longer
> from what we initially anticipated taking very long time to shake out
> issues. The programmers involved were all systems programmers.
>
> SH has swap instruction. ARM and PowerPC have endian swap instructions
> for handling endian, and all also offer bi-endian options. Enabling
> endian swap instructions in high-level language programming is not
> that straightforward. Linux network driver code is layered in such a
> way high-level functions abstract out endian and then endian-aware
> code at the bottom layer of functions.

I appear to have been misunderstood. I am proposing additional
big-endian LOAD/STORE opcodes. In high-level code (C is high-level
enough) endianness would be indicated per-datum, possibly using
attributes and defaulting to little-endian if unspecified. The compiler
then uses the big-endian memory access instructions when accessing data
that is big-endian according to its type.

For example, a TCP header could be a simple struct with
attribute((big_endian)) applied. GCC would then know to access
multi-byte fields in that struct using big-endian opcodes.
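
Without any such attribute or new opcodes, the per-datum idea can be approximated in portable C with wrapper types whose accessors always use big-endian byte order. The sketch below is only that, an approximation: be16_t, be32_t, their helpers and struct tcp_start (the first three fields of a TCP header) are names invented for illustration.

```c
#include <stdint.h>

/* Field types that are big-endian by construction. */
typedef struct { uint8_t b[2]; } be16_t;
typedef struct { uint8_t b[4]; } be32_t;

static uint16_t be16_get(be16_t v)
{
    return (uint16_t)((v.b[0] << 8) | v.b[1]);
}

static uint32_t be32_get(be32_t v)
{
    return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16)
         | ((uint32_t)v.b[2] << 8)  |  (uint32_t)v.b[3];
}

static be32_t be32_make(uint32_t x)
{
    be32_t v = {{ (uint8_t)(x >> 24), (uint8_t)(x >> 16),
                  (uint8_t)(x >> 8),  (uint8_t)x }};
    return v;
}

/* The declaration itself documents which fields are big-endian. */
struct tcp_start {
    be16_t src_port;
    be16_t dst_port;
    be32_t seq;
};
```

A compiler that knew the byte order from the type could replace each accessor with a single big-endian load or store where the hardware has one.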


-- Jacob

Jacob Bachmeyer

Jun 9, 2018, 12:51:49 AM
to Andrew Waterman, Allen Baum, Luke Kenneth Casson Leighton, RISC-V ISA Dev, Shumpei Kawasaki
Andrew Waterman wrote:
> [...]
> IMO, bi-endianness isn’t enough of a goal to give the big-endian loads
> and stores 12-bit offsets. If they are just register-indirect, they
> can be encoded more cheaply in the 32-bit space.
>
> (FWIW, I still favor byte-swap instructions for this purpose. That’s
> what we/Tommy proposed in the original B extension proposal years ago.)

I only offered the suggestion because it appears that there is interest
from parties who seem to find byte-swap insufficient and I was trying to
keep them "at parity" with the baseline LOAD/STORE as much as possible.

This suggests that big-endian LOAD/STORE could be assembler
pseudo-instructions combining a byte-swap and an ordinary LOAD/STORE.
Those assembler pseudo-instructions could be overridden using the
extensible assembler database for hardware that actually does define
(non-standard) encodings for big-endian LOAD/STORE. As Sam Falvo seems
to have suggested in his reply, the "standard 64-bit encoding" for
big-endian LOAD/STORE could be a pair of 32-bit instructions.


-- Jacob

ron minnich

Jun 9, 2018, 1:09:22 AM
to jcb6...@gmail.com, Shumpei Kawasaki, Luke Kenneth Casson Leighton, Allen Baum, RISC-V ISA Dev, Oleg Endo, Akira Tsukamoto
On Fri, Jun 8, 2018 at 9:44 PM Jacob Bachmeyer <jcb6...@gmail.com> wrote:


> For example, a TCP header could be a simple struct with
> attribute((big_endian)) applied.  GCC would then know to access
> multi-byte fields in that struct using big-endian opcodes.


as regards something like this, I've never seen a convincing argument re performance that we need to tag data with endian attributes. 

And I'm mentioned as one of the guys who pushed such a bad idea in the now-withdrawn https://standards.ieee.org/findstds/standard/1596.5-1993.html, so in my dark past, I even believed in this kind of thing. Oops.

We did a test a few years back and as of gcc 6, it's pretty smart about turning certain sequences of byte access into single word load/store.  

As regards most code that thinks it needs to be endian-aware, this particular note is useful:

I've found that Rob's note is correct far more often than not. 

Andrew Waterman

Jun 9, 2018, 4:46:50 AM
to Samuel Falvo II, Allen Baum, Jacob Bachmeyer, Luke Kenneth Casson Leighton, RISC-V ISA Dev, Shumpei Kawasaki
Yeah, I agree with the intuition behind your previous email. What underpins my preference for the byte-swap instruction approach is that it gets the lion’s share of the benefit, and it’s an easier ask of both HW implementors and software-stack maintainers.

The fusion argument is relevant, since big-endian memory ops will either be 48-bit instructions or 32-bit instructions with limited addressing modes. The addressing mode might make it become a 48- or 64-bit sequence, anyway. So fusing an RVI or RVC memory access with a 32-bit byte-swap instruction could be similarly efficient in many cases.

Luke Kenneth Casson Leighton

Jun 9, 2018, 5:06:12 AM
to Andrew Waterman, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
it does mean that big-endian would come with an instruction-cache and
power-usage hit. would anyone have an idea of what kind of ratios
such big-endian load/stores would be in terms of total numbers of
instructions executed?

l.

Andrew Waterman

Jun 9, 2018, 5:31:11 AM
to Luke Kenneth Casson Leighton, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
My point is that there will be such a hit, no matter which approach is
taken. There's effectively no room in RVC to encode new big-endian
loads and stores. There's effectively no room in RVI to encode new
big-endian loads and stores with 12-bit offsets. So, you're left with
either wider instructions or two-instruction sequences. Both are
defensible, though the latter is less onerous.

>
> l.

Luke Kenneth Casson Leighton

Jun 9, 2018, 5:45:43 AM
to Andrew Waterman, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
On Sat, Jun 9, 2018 at 10:30 AM, Andrew Waterman <and...@sifive.com> wrote:

> My point is that there will be such a hit, no matter which approach is
> taken. There's effectively no room in RVC to encode new big-endian
> loads and stores. There's effectively no room in RVI to encode new
> big-endian loads and stores with 12-bit offsets. So, you're left
> either wider instructions or two-instruction sequences.

there is another option: the conflict-resolution scheme. it was
discussed a couple months back, and is "effectively" as if 32-bit (or
other sized) opcodes had been extended (by some hidden bits that are
set with a CSR).

using that scheme the actual meaning of existing opcodes may be
"redirected" to a completely different execution engine, *without*
impact on the pipeline speed or introducing extra latency [1], and,
crucially, allowing the processor to be switched back to "standard"
meanings very very quickly.

conceptually it's exactly like c++ namespaces "using ABC".

... now that i think about it, any existing processor that switches
implicitly between big-endian and little-endian execution meanings of
its instructions probably has something near-identical to this going
on under the hood.

would there be anything in RISC-V that prevented or prohibited the
creation of a "using bigendian" namespace, such that the select few
instructions which needed different behaviour would be redirected to
alternative execution engines?

l.

[1] several people raised the concern during the discussion that extra
latency would be introduced into the decode phase: (a) this isn't true
as the decode muxer just has a couple of extra hidden bits into the
selection AND gate (b) MISA *already* enables/disables instructions so
the concept of switching instructions on / off is required and
well-understood, and there have been no complaints from implementors
about MISA introducing pipeline latency.
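
A toy decoder can illustrate the "hidden bits" idea: the match key is the ordinary opcode fields extended by one extra bit that would come from a CSR. The ns parameter, the KEY macro and the enum names are all invented for this sketch; only the LOAD/STORE major opcodes (0x03, 0x23) and the funct3 value 2 for LW/SW are taken from the base ISA.

```c
#include <stdint.h>

/* Match key: hidden namespace bit, then funct3, then the 7-bit major opcode. */
#define KEY(ns, funct3, opcode) (((ns) << 10) | ((funct3) << 7) | (opcode))

enum op { OP_LW_LE, OP_LW_BE, OP_SW_LE, OP_SW_BE, OP_OTHER };

/* ns: hypothetical CSR-held bit; 0 = standard meanings, 1 = "using bigendian". */
static enum op decode(uint32_t insn, unsigned ns)
{
    unsigned opcode = insn & 0x7fu;         /* bits [6:0]   */
    unsigned funct3 = (insn >> 12) & 0x7u;  /* bits [14:12] */

    switch (KEY(ns & 1u, funct3, opcode)) {
    case KEY(0u, 2u, 0x03u): return OP_LW_LE;
    case KEY(1u, 2u, 0x03u): return OP_LW_BE;  /* same bits, redirected meaning */
    case KEY(0u, 2u, 0x23u): return OP_SW_LE;
    case KEY(1u, 2u, 0x23u): return OP_SW_BE;
    default:                 return OP_OTHER;
    }
}
```

The instruction word 0x00032283 (lw t0, 0(t1)) decodes as OP_LW_LE with ns = 0 and as OP_LW_BE with ns = 1; nothing in the program text changes.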

Luke Kenneth Casson Leighton

Jun 9, 2018, 5:53:17 AM
to Andrew Waterman, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
On Sat, Jun 9, 2018 at 10:45 AM, Luke Kenneth Casson Leighton
<lk...@lkcl.net> wrote:
> On Sat, Jun 9, 2018 at 10:30 AM, Andrew Waterman <and...@sifive.com> wrote:
>
>> My point is that there will be such a hit, no matter which approach is
>> taken. There's effectively no room in RVC to encode new big-endian
>> loads and stores. There's effectively no room in RVI to encode new
>> big-endian loads and stores with 12-bit offsets. So, you're left
>> either wider instructions or two-instruction sequences.
>
> there is another option: the conflict-resolution scheme. it was
> discussed a couple months back, and is "effectively" as if 32-bit (or
> other sized) opcodes had been extended (by some hidden bits that are
> set with a CSR).

p.s. jacob already came up with a corresponding / matching scheme for
compilers / binutils, which takes the hidden prefix into account and
walks it through from gcc to binutils to actual assembler.

Andrew Waterman

Jun 9, 2018, 5:54:50 AM
to Luke Kenneth Casson Leighton, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
On Sat, Jun 9, 2018 at 2:45 AM, Luke Kenneth Casson Leighton
<lk...@lkcl.net> wrote:
> On Sat, Jun 9, 2018 at 10:30 AM, Andrew Waterman <and...@sifive.com> wrote:
>
>> My point is that there will be such a hit, no matter which approach is
>> taken. There's effectively no room in RVC to encode new big-endian
>> loads and stores. There's effectively no room in RVI to encode new
>> big-endian loads and stores with 12-bit offsets. So, you're left
>> either wider instructions or two-instruction sequences.
>
> there is another option: the conflict-resolution scheme. it was
> discussed a couple months back, and is "effectively" as if 32-bit (or
> other sized) opcodes had been extended (by some hidden bits that are
> set with a CSR).
>
> using that scheme the actual meaning of existing opcodes may be
> "redirected" to a completely different execution engine, *without*
> impact on the pipeline speed or introducing extra latency [1], and,
> crucially, allowing the processor to be switched back to "standard"
> meanings very very quickly.

I agree that extending the opcode by a few bits will not materially
exacerbate decode latency.

But this issue isn't anywhere near important enough to merit such an
elaborate strategy. Either Sam's or my/Tommy's solution is
sufficient.

>
> conceptually it's exactly like c++ namespaces "using ABC".
>
> ... now that i think about it, any existing processor that switches
> implicitly between big-endian and little-endian execution meanings of
> its instructions probably has something near-identical to this going
> on under the hood.
>
> would there be anything in RISC-V that prevented or prohibited the
> creation of a "using bigendian" namespace, such that the select few
> instructions which needed different behaviour would be redirected to
> alternative execution engines?
>
> l.
>
> [1] several people raised the concern during the discussion that extra
> latency would be introduced into the decode phase: (a) this isn't true
> as the decode muxer just has a couple of extra hidden bits into the
> selection AND gate (b) MISA *already* enables/disables instructions so
> the concept of switching instructions on / off is required and
> well-understood, and there have been no complaints from implementors
> about MISA introducing pipeline latency.
>

Luke Kenneth Casson Leighton

Jun 9, 2018, 6:23:08 AM
to Andrew Waterman, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


On Sat, Jun 9, 2018 at 10:54 AM, Andrew Waterman <and...@sifive.com> wrote:
> On Sat, Jun 9, 2018 at 2:45 AM, Luke Kenneth Casson Leighton
> <lk...@lkcl.net> wrote:
>> On Sat, Jun 9, 2018 at 10:30 AM, Andrew Waterman <and...@sifive.com> wrote:
>>
>>> My point is that there will be such a hit, no matter which approach is
>>> taken. There's effectively no room in RVC to encode new big-endian
>>> loads and stores. There's effectively no room in RVI to encode new
>>> big-endian loads and stores with 12-bit offsets. So, you're left
>>> either wider instructions or two-instruction sequences.
>>
>> there is another option: the conflict-resolution scheme. it was
>> discussed a couple months back, and is "effectively" as if 32-bit (or
>> other sized) opcodes had been extended (by some hidden bits that are
>> set with a CSR).
>>
>> using that scheme the actual meaning of existing opcodes may be
>> "redirected" to a completely different execution engine, *without*
>> impact on the pipeline speed or introducing extra latency [1], and,
>> crucially, allowing the processor to be switched back to "standard"
>> meanings very very quickly.
>
> I agree that extending the opcode by a few bits will not materially
> exacerbate decode latency.
>
> But this issue isn't anywhere near important enough to merit such an
> elaborate strategy.

if it's considered elaborate then it's been completely misunderstood:
the scheme is simply a generalisation of a well-used (but probably not
that well-documented) technique. i would go so far as to speculate
that it is so *un*elaborate, being quite literally no more than putting a
couple extra bits into the AND gate of a given instruction at decode
phase, that teams using the technique to create dynamic bi-endian
processors didn't see fit to give it a name! :)

> Either Sam's or my/Tommy's solution is sufficient.

... with performance / power penalties that may or may not be
acceptable to an implementor.

luckily the conflict-resolution scheme fits within the RISC-V rules
(which say that even standard opcodes may be given different meanings)
so there is no conflict even with the RISC-V ISA Manual, even to the
point where a processor may apply for (and receive) a Conformance
Certificate. i.e. it doesn't need the RISC-V Foundation's approval to
implement.

l.

Andrew Waterman

Jun 9, 2018, 6:56:53 AM
to Luke Kenneth Casson Leighton, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki
On Sat, Jun 9, 2018 at 3:22 AM, Luke Kenneth Casson Leighton
The hardware's trivial. It's elaborate because dynamically
repurposing opcodes significantly complicates the software story.


Gavin Stark

Jun 9, 2018, 7:39:18 AM6/9/18
to RISC-V ISA Dev, and...@sifive.com, sam....@gmail.com, allen...@esperantotech.com, jcb6...@gmail.com, shumpei....@swhwc.com
It is with great hesitation that I join the fray here... I have many scars...

Firstly, endianness is really only about how one interprets units of size X in as another entity of size Y. This is only a hardware issue if there is hardware that does this interpreting.

From what I can tell the instruction stream is already defined for any endianness; this is because the instruction stream regards the memory as a stream of 16-bit entities, and a RISC-V instruction is constructed from 'byte' address Z as its bottom 16 bits and (if required) from 'byte' address Z+2 as its next 16 bits, and so on. This is in section 1.2. How the 16 bits are stored in the two bytes at Z is 'according to the implementation’s natural endianness'.
Effectively this means that instructions are units of size 16 bits and may be interpreted as another entity of size 16 bits, or 32 bits, or 48 bits; this is a hardware issue, and the specification is clear on how it should be done.
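A minimal sketch of the parcel rule described above, for illustration only. It assumes the instruction stream has already been read as an array of 16-bit parcels (so the byte order within each parcel is whatever the implementation chose), and it handles only the 16-bit and 32-bit length classes. The function names `insn_length` and `assemble32` are made up for this example.

```c
#include <stdint.h>

/* Length in bytes of the instruction whose first 16-bit parcel is p0.
 * Only the 16-bit and 32-bit classes are handled; anything longer
 * returns 0 in this sketch. */
static unsigned insn_length(uint16_t p0)
{
    if ((p0 & 0x3u) != 0x3u)
        return 2;               /* low two bits != 11: 16-bit encoding */
    if ((p0 & 0x1cu) != 0x1cu)
        return 4;               /* bits [4:2] != 111: 32-bit encoding */
    return 0;                   /* 48-bit and longer: not handled here */
}

/* Build a 32-bit instruction from two parcels: the parcel at the lower
 * address supplies bits 15:0, the next parcel supplies bits 31:16. */
static uint32_t assemble32(const uint16_t *parcels)
{
    return (uint32_t)parcels[0] | ((uint32_t)parcels[1] << 16);
}
```

For example, `addi a0, a0, 1` (0x00150513) is the parcel 0x0513 followed by the parcel 0x0015, regardless of how the two bytes inside each parcel are laid out in memory.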

As a side note, this is an issue for JIT compilers and the toolchain; as noted above, the ARM implementation is similar in that instructions are stored in 'little endian'. 

For the data side I don't believe there is an endianness issue in the ISA, with one caveat: section 2.6 states 'RV32I provides a 32-bit user address space that is byte-addressed and little-endian.' Without this statement RISC-V would be bi-endian; it is an unnecessary restriction.

Again, having said that, there is in every hardware implementation of the CPU somewhere where memories are accessed and data presented over a bus. If this bus is 32 bits wide, with a 32-bit word address, and a byte write is being performed then a particular subsection of the bus is expected to be written, probably dependent on a byte-enable signal. To generate that byte-enable signal requires a choice of which byte address corresponds to which byte lane, and that therefore means the hardware is interpreting units of size 8 as an entity of size 32.

What is the approach if the bus is 64 bits wide? What if it is 128 bits wide? Or, for the embedded space, just 16 bits wide?
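To make the byte-lane choice concrete, here is a rough sketch of how a byte-enable mask might be derived for a single-byte write. It is an illustration under stated assumptions, not a description of any particular bus: lane 0 is taken to carry data bits 7:0, the bus width is a power of two of at most 32 bytes, and the name `byte_enable` is hypothetical.

```c
#include <stdint.h>

/* Byte-enable mask for a one-byte write on a bus that is bus_bytes wide.
 * Bit i of the result enables byte lane i; lane 0 carries data bits 7:0.
 * bus_bytes must be a power of two no larger than 32 (assumption). */
static uint32_t byte_enable(uint64_t byte_addr, unsigned bus_bytes,
                            int big_endian)
{
    /* Position of the byte within one bus-wide word. */
    unsigned offset = (unsigned)(byte_addr & (uint64_t)(bus_bytes - 1u));

    /* Little-endian: lowest address on lane 0.
     * Big-endian: lowest address on the highest lane. */
    unsigned lane = big_endian ? (bus_bytes - 1u - offset) : offset;

    return (uint32_t)1u << lane;
}
```

The same byte address lands on a different lane for each bus width in the big-endian case, which is why the 16-, 64- and 128-bit questions each need an explicit answer.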

So the statement in section 2.6 is relevant - but it is effectively a statement about the platform, not the ISA.

I've stated the above as background. I've been building embedded CPUs for over 20 years now, and have had the big-endian/little-endian question many times over, from both a platform and CPU perspective.
One of the current implementations I am responsible for is what used to be the Intel micro engine, which is a 32-bit network processor core. The question often comes up as 'is it big-endian or little-endian'. Well, it isn't either. It is a 32-bit word processor. It does not interpret 32-bit words as bytes. All memory transactions are in the form of 32-bit quantities.
Except... The main memory subsystem of the surrounding platform is 64-bit, and it can *sometimes* be accessed using a 'byte' address. In this case *much* of the operation is done LWBE - little-word-big-byte-endian. The byte endianness *only* matters if transactions (which are in terms of 32-bit quantities) are *not* aligned to a 32-bit word boundary; in fact, most memory transactions in our implementation that support such unaligned transactions support both little- and big-endian understanding of the bottom 2 bits of address and of the data buses - but this is not a processor issue, this is a memory module issue. Yet since the memory is 32-bit word-addressed there has to be an 'endianness of the databus', in terms of which of the 32-bits corresponds to odd 32-bit memory addresses and which to even 32-bit memory addresses (hence the 'little-word' endianness).

Now, Allen's initial questions were, therefore, *really good*, as they were not related to doing much in the processor (the ISA should be agnostic...). He asked:

>The only support that I've heard of for big-endian is the currently defunct BitManipulation WG, and even there the support was for swapping bytes after they've been loaded and before they are store.
>
>a. Is that adequate?
>b. If not, do we expect anyone who wants native BigEndian support to develop their own custom extension?
>c. if not - have there been any discussions for a standard BigEndian discussion?

and his follow-on

>So 4 possible levels of support:
> - none
> - swap instructions
> - BigEndian load/store mode
> - BigEndian load/store instructions

And so my answers (or input to the discussion):

* I think that the platform definition should be explicitly for a fully little-endian platform; there might be an additional option for a fully big-endian platform, but the endianness of 128-bit and 64-bit memory subsystems may need to be explicit.

* To provide extra support for interpretation of data as a different endianness a byte swap instruction is handy.

* To add a big-endian mode one still has to be explicit about what it means. I have seen three solutions in the past.

1. A pin on the processor; this is just an input to the (data) memory access subsystem to tell it how to interpret addresses, and is inflexible, but could be standardised quite easily (from a hardware perspective).

2. A register in the processor (usually a CSR); this is a 'dynamic' input to the (data) memory access subsystem to tell it how to interpret addresses. This impacts the ISA in the RISC-V terminology (since it defines the CSRs) but perhaps just for particular platforms. This seems doable.

3. An MMU bit in the page tables that identify an endianness; this seems to me to be more complex than is required, since the target would (at best) be to support embedded processor designs which would be of a single endianness throughout.

* To add big-endian load/store instructions one has to be explicit about what this means - is it fully big-endian (128-bit big-endian, 64-bit big-endian, 32-bit big-endian, 16-bit big-endian (!), etc.)? This scares me in the sense that it (as Andrew has said) is a lot of instructions to add. It has been suggested that an 'extended instruction encoding CSR' could be used; this would require saving across interrupts and system calls, and would be a source of considerable bugginess in software (since most of the time it has no effect). And this whole path (of instructions knowing endianness) makes it sound like the *processor* has an endianness, when it really doesn't - it is the memory subsystem and the software.

FWIW the networking space was, going back 10-15 years, wholly big-endian. Cisco's IOS was big-endian only, which meant that there was no way they could utilise any x86 technology. There was an IOS port to big-endian ARM (i.e. a port to the ISA, not a port of endianness), but only for specific ARM designs as most of them did not support big-endian at the time. Cisco eventually moved to be biendian, and they reaped the benefits. x86 never moved... :-)
Nowadays the networking space is (I would estimate) 90% little-endian linux.
I'm not suggesting the embedded space should all jump over, but I would say that making big-endian much of a processor issue would be unnecessary; keep it as a memory subsystem issue, and possibly define mechanisms in a platform specification. And if you like, add a swap instruction coz it's nice to have.

--Gavin

Guy Lemieux

Jun 9, 2018, 10:58:03 AM6/9/18
to ron minnich, Akira Tsukamoto, Allen Baum, Luke Kenneth Casson Leighton, Oleg Endo, RISC-V ISA Dev, Shumpei Kawasaki, jcb6...@gmail.com
excellent post!

this needs to be sticky so everyone can find it. 

On Fri, Jun 8, 2018 at 10:09 PM ron minnich <rmin...@gmail.com> wrote:

as regards something like this, I've never seen a convincing argument re performance that we need to tag data with endian attributes. 

And I'm mentioned as one of the guys who pushed such a bad idea in the now-withdrawn https://standards.ieee.org/findstds/standard/1596.5-1993.html, so in my dark past, I even believed in this kind of thing. Oops.

We did a test a few years back and as of gcc 6, it's pretty smart about turning certain sequences of byte access into single word load/store.  

As regards most code that thinks it needs to be endian-aware, this particular note is useful:

I've found that Rob's note is correct far more often than not. 

i like it!

but it only discusses encoded data streams.

one case it didn’t discuss is how to access peripherals which have endian issues. eg, a 24b DAC with a control register (part of an IP block) using a different endian than the host cpu (a different IP block). these aren’t “data streams”, but require loads and stores to do the right thing.
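a rough sketch of what such an access could look like in software, assuming a 32-bit memory-mapped control register whose byte order is the opposite of the host's. the accessor names `periph_read32` and `periph_write32` are hypothetical, and `__builtin_bswap32` is the GCC/Clang builtin; with a hardware swap instruction it would presumably compile to a single operation next to the load or store.

```c
#include <stdint.h>

/* Hypothetical accessors for a device register stored in the opposite
 * byte order from the host CPU. 'volatile' keeps the compiler from
 * merging or dropping the device accesses. */
static inline uint32_t periph_read32(const volatile uint32_t *reg)
{
    return __builtin_bswap32(*reg);   /* load, then swap into host order */
}

static inline void periph_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = __builtin_bswap32(value);  /* swap into device order, then store */
}
```

each access is still exactly one load or one store to the device, which is the property a byte-by-byte data-stream decoder would not give you.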

guy

Madhu

Jun 9, 2018, 11:42:53 AM6/9/18
to Guy Lemieux, ron minnich, Akira Tsukamoto, Allen Baum, Luke Kenneth Casson Leighton, Oleg Endo, RISC-V ISA Dev, Shumpei Kawasaki, jcb6...@gmail.com
In some cases for these kinds of peripherals, it is simpler to add some interface logic to the IP block to do the conversion. We just converted an 80s control system to RISC-V and had to do this for the peripherals.

Even in networking (especially PPC based), accelerators often do the low level packet manipulation and it is only control and exception packets that come to the core. Swap support will suffice for this. Our team is probably the most affected by this since we have to convert a whole host of legacy systems to RISC-V, but we do not yet see any need for anything more than swap instructions.

In general do not worship at the altar of legacy support !



--
Regards,
Madhu

Michael Clark

Jun 9, 2018, 11:48:09 AM6/9/18
to Andrew Waterman, Luke Kenneth Casson Leighton, Samuel Falvo II, Allen Baum, Jacob Bachmeyer, RISC-V ISA Dev, Shumpei Kawasaki


On 9/06/2018, at 9:54 PM, Andrew Waterman <and...@sifive.com> wrote:

On Sat, Jun 9, 2018 at 2:45 AM, Luke Kenneth Casson Leighton
<lk...@lkcl.net> wrote:
On Sat, Jun 9, 2018 at 10:30 AM, Andrew Waterman <and...@sifive.com> wrote:

My point is that there will be such a hit, no matter which approach is
taken.  There's effectively no room in RVC to encode new big-endian
loads and stores.  There's effectively no room in RVI to encode new
big-endian loads and stores with 12-bit offsets.  So, you're left
with either wider instructions or two-instruction sequences.

there is another option: the conflict-resolution scheme.  it was
discussed a couple months back, and is "effectively" as if 32-bit (or
other sized) opcodes had been extended (by some hidden bits that are
set with a CSR).

using that scheme the actual meaning of existing opcodes may be
"redirected" to a completely different execution engine, *without*
impact on the pipeline speed or introducing extra latency [1], and,
crucially, allowing the processor to be switched back to "standard"
meanings very very quickly.

I agree that extending the opcode by a few bits will not materially
exacerbate decode latency.

But this issue isn't anywhere near important enough to merit such an
elaborate strategy.  Either Sam's or my/Tommy's solution is
sufficient.

We need a simple BSWAP and it’s quite important based on the amount of code required to swap a 32-bit or 64-bit word on RISC-V presently.

Load store instructions are a nice to have but the relative improvement on a BSWAP instruction is minor.

Compiler attributes are *incredibly* hard to implement. GCC has __attribute__((scalar_storage_order("big-endian"))) but there are all sorts of restrictions due to various complexities such as [what if I take the address of a pointer to a word that is of this endianness and pass it to a function that takes a pointer to a word]. Someone from Intel wrote about implementing bi-endian support in ICC on the LLVM mailing list and the conclusion was “don’t do it”.

Most of the use cases for “portable code” are covered by having a fast instruction for the builtins, i.e. __builtin_bswap16, __builtin_bswap32 and __builtin_bswap64 [noting that bswap32 will be very frequent on RV64 in both crypto and network code]. The remainder is supported by idiomatic lifting of swap patterns, i.e. the compiler can detect open-coded swaps and lift them into dedicated instructions: https://cx.rv8.io/g/ucLL1v
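For readers who have not seen one, this is roughly what an open-coded 32-bit swap looks like next to the builtin. It is a sketch, not taken from the linked example; the function names are made up, and `__builtin_bswap32` is the GCC/Clang builtin mentioned above. Whether a given compiler version recognises the open-coded form is something to check in its output rather than assume.

```c
#include <stdint.h>

/* Open-coded byte swap: four shift-and-mask terms. Without a dedicated
 * swap instruction this becomes a sequence of shifts, ANDs and ORs. */
static uint32_t swap32_open_coded(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u)
         | (x << 24);
}

/* Same result via the compiler builtin, which can map directly onto a
 * swap instruction when the target has one. */
static uint32_t swap32_builtin(uint32_t x)
{
    return __builtin_bswap32(x);
}
```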

I would stress that I don’t think we should penalise 32-bit swaps on RV64. We already have some cases where the compiler doesn’t work well with 32-bit types such as int. I’m also still seeing lots of redundant sign extensions and missed shift coalescing opportunities (from sign or zero extension expansion that happen after shift coalescing passes) from the current GCC versions.

We also need __builtin_clz(ll), __builtin_ctz(ll), __builtin_popcount(ll) and rotates.

The rest of the more obscure bit manipulation stuff is quite powerful and very interesting but there is simply no large base of code that uses or will benefit from it. I’m fine with BSWAP being implemented as GREVI, but if I had to choose between that or losing CTZ, I’d favour CTZ simply because I can find a lot more code in the wild that actually uses CTZ and very little that uses GREVI (despite all of the theoretical uses it has). Bit reversal doesn’t show up in typical code either, and ANDC is essentially trying to extend the base ISA, i.e. it is 2 instructions and could be macro-op fused.
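As a reference point for the BSWAP-as-GREVI remark, here is a C model of a 32-bit generalized reverse in the style of the bit-manipulation draft. Treat it as a sketch of the idea rather than a normative definition: each bit of the shift amount enables one butterfly stage, so a shift amount of 24 (stages 8 and 16) is a byte swap and 31 (all stages) is a full bit reversal.

```c
#include <stdint.h>

/* Generalized reverse: bit k of shamt swaps adjacent groups of 2^k bits. */
static uint32_t grev32(uint32_t x, unsigned shamt)
{
    shamt &= 31u;
    if (shamt & 1u)  x = ((x & 0x55555555u) << 1)  | ((x & 0xAAAAAAAAu) >> 1);
    if (shamt & 2u)  x = ((x & 0x33333333u) << 2)  | ((x & 0xCCCCCCCCu) >> 2);
    if (shamt & 4u)  x = ((x & 0x0F0F0F0Fu) << 4)  | ((x & 0xF0F0F0F0u) >> 4);
    if (shamt & 8u)  x = ((x & 0x00FF00FFu) << 8)  | ((x & 0xFF00FF00u) >> 8);
    if (shamt & 16u) x = ((x & 0x0000FFFFu) << 16) | ((x & 0xFFFF0000u) >> 16);
    return x;
}
```

So `grev32(x, 24)` gives the same result as a 32-bit byte swap, which is why a single GREVI encoding can stand in for a dedicated BSWAP.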

ron minnich

Jun 9, 2018, 11:53:24 AM6/9/18
to Madhu, Guy Lemieux, Allen Baum, RISC-V ISA Dev
On Sat, Jun 9, 2018 at 8:42 AM Madhu <ma...@macaque.in> wrote:

In general do not worship at the altar of legacy support !


Yes. In way too many cases (the point I was trying to make with my note, driven from Rob's note) the endian issue is something people worry about that's almost always not worth worrying about, that can be easily addressed without modifying compilers and adding modes to CPUs.  I've seen (fixed) so much code just by removing all the broken attempts to deal with endianness, and the fix is almost always to make it NON-endian aware.

e.g., if you literally have this:
const unsigned char *a;
uint32_t b;
b = (uint32_t)a[0] | (uint32_t)a[1] << 8 | (uint32_t)a[2] << 16 | (uint32_t)a[3] << 24;

because you are doing some kind of endian conversion, we've seen that the compilers are so smart now they'll turn that into a word load if the endianness allows it. I expect the compilers are reasonably smart if you give them the kind of instructions that Andrew mentioned. I'm just not convinced that we need to extend the compiler to allow endianness tags, and further add a bunch of extensions for bi-endianness.
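A self-contained version of that pattern, as a sketch. The helper names `load_le32` and `load_be32` are made up for this example. The bytes are read as `unsigned char` and widened before shifting so that a signed `char` cannot sign-extend into the upper bits. The code never asks what the host byte order is; it only states the byte order of the data.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer.
 * Works unchanged on hosts of either endianness and at any alignment. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Same idea for big-endian data, e.g. network byte order. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

On a host whose byte order matches the data, a compiler may reduce one of these to a plain word load; on a mismatched host with a swap instruction, to a load plus a swap.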

It took me too long but I finally realized it by the end of the 90s.

So do people have the hard numbers, driven from measurement, that show this is a big problem? Or just seems to be a big problem?

ron

Samuel Falvo II

Jun 9, 2018, 12:34:17 PM6/9/18
to Gavin Stark, RISC-V ISA Dev, Andrew Waterman, Allen Baum, Jacob Bachmeyer, Shumpei Kawasaki
On Sat, Jun 9, 2018 at 4:39 AM, Gavin Stark <atthec...@gmail.com> wrote:
> For the data side I don't believe there is an endianness issue in the ISA,
> with one caveat: section 2.6 states ' RV32I provides a 32-bit user address
> space that is byte-addressed and little-endian.' Without this statement
> RISC-V would be biendian; it is an unnecessary restriction.

If I understood your message correctly, I think this turns out to be a
necessary restriction if you wish to reconcile how to lay out
instructions in a Von Neumann architecture machine. It is also a
requirement if you stipulate binary compatibility across RISC-V
implementations.

> So the statement in section 2.6 is relevant - but it is effectively a
> statement about the platform, not the ISA.

Here's a great reason why section 2.6 makes perfect sense for
belonging in the ISA. Observe that all units are naturally aligned,
and even type-safe (e.g., 32-bit words are *only* accessed with LW/SW,
etc.).

I can write a compiler that translates C into RISC-V assembly
language. C lacks any explicit keywords for specifying big- or
little-endian numbers; a long is a long, and an int is an int, etc.
These correspond to the natural representation of these in memory,
where natural is often taken to mean most run-time efficient, or put
another way, requiring the least amount of instructions to manipulate.
Naturally, that's exactly the kind of code my C compiler will produce.

This compiler can be made completely portable across big- and
little-endian RISC-V variants. And, provided software built from this
compiler runs *only* on the same platform on which the compiler itself
runs, you'll never notice any difference between a big- and
little-endian processor. For all intents and purposes, RISC-V is thus
a "portable" architecture.

The issue comes when I want to run my software (built on a
little-endian RISC-V) on your computer (a big-endian RISC-V
processor). In the best case scenario, the processor will throw an
illegal instruction trap on the very first instruction it looks at,
because my program's instruction layout will be different from what
your hardware expects. In the worst case scenario, my generated code
will have an instruction which just *happens* to form an
unintended-but-valid instruction encoding for your processor. This
possibility guarantees that even software emulation through repeated
illegal instruction traps is not a viable solution, and will lead to
wrong code being executed.

This violates the requirement that all RISC-V implementations support
the unprivileged instruction set corresponding to its XLEN. Ergo, a
firm decision on endianness *is* required to establish compatibility
guarantees between different implementations of the ISA, regardless of
specific platform the ISA is used with.

> * To add big-endian load/store instructions one has to be explicit about
> what this means - is it fully big-endian (128-bit bigendian, 64-bit
> big-endian, 32-bit big-endian, 16-bit big endian (!) etc). This scares me in
> the sense that it (as Andrew has said) is a lot of instructions to add. It

To be clear, you're almost certainly going to need multiple swap
instructions too, especially so as to minimize the overhead of fusion
logic in the instruction decoder; HSWAP to swap the lower bytes of a
halfword, WSWAP to swap the lowest four bytes of a 32-bit word, DSWAP
for the lowest 8-bytes of a 64-bit dword, and so forth.

Allen Baum

Jun 9, 2018, 12:57:29 PM6/9/18
to Samuel Falvo II, Gavin Stark, RISC-V ISA Dev, Andrew Waterman, Jacob Bachmeyer, Shumpei Kawasaki
The cases you outline are not the problematic ones. The nastiness is dealing with data outside the platform, e.g. network traffic that comes in a specific endian mode that your chip has no control over and isn’t aligned, or data structures that aren’t aligned.
It’s ugly legacy code and protocols- the kind that are deeply entrenched and you don’t get to modify.

-Allen

Samuel Falvo II

Jun 9, 2018, 1:48:56 PM6/9/18