Counter regs clarifications in User-Level ISA spec

Clifford Wolf

Jan 21, 2017, 6:29:51 AM
to RISC-V ISA Dev
Hi,

Volume I of the ISA manual handles the User-Level ISA, according to the
title of the document. Usually in ISAs the modes with higher privileges
support a super-set of the user-mode functionality. So a reader with some
experience with ISAs would probably assume that everything in that document
would cover functionality that is available on all RISC-V implementations
in all privilege modes.

However, this is not true! At least the RDCYCLE[H], RDTIME[H], and
RDINSTRET[H] instructions (CYCLE[H], TIME[H], and INSTRET[H] CSRs) are not
available in an M-mode-only system, as I learned today on Twitter [1].

I think it would be worth mentioning in sec 2.8 Control and Status Register
Instructions that those are User-Mode CSRs that might not be available on
machines that do not actually implement a user-mode.

Sure, M-mode-only systems provide the mcycle, etc. CSRs instead. But
someone learning about RISC-V by starting reading on page 1 of Volume I of
the ISA manual might be misled at this point, especially considering the
strong wording of this Commentary:

--snip--
We mandate these basic counters be provided in all implementations as they
are essential for basic performance analysis, adaptive and dynamic
optimization, and to allow an application to work with real-time streams. [...]
--snap--

Also: AFAIR the v2.0 User-Level ISA spec did not mention CSRs at all. In
that document those were simply listed as RD... instructions. The v2.1 ISA
spec now lists all CSR.. instructions, but only mandates the timer and
counter CSRs. However, at least for me it is unclear from the wording whether
instruction patterns other than the CSRR pseudo-instruction (CSRRS with x0
as rs1) must be supported by a compatible core.
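To make the distinction concrete, here is a minimal C sketch, assuming the standard SYSTEM-opcode I-type layout used by the CSR instructions. It shows that RDCYCLE is only the CSRR pseudo-instruction (CSRRS with rs1 = x0) applied to CSR 0xC00, so a core could recognise that one read pattern without honouring CSRRW/CSRRC on the same CSR. The function names are illustrative, not taken from any existing toolchain.

```c
#include <stdint.h>

enum { OPC_SYSTEM = 0x73, F3_CSRRW = 1, F3_CSRRS = 2, F3_CSRRC = 3 };

/* CSR instructions use the I-type layout:
 * csr[31:20] | rs1[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0] */
static uint32_t encode_csr(uint32_t funct3, uint32_t rd, uint32_t rs1, uint32_t csr)
{
    return ((csr & 0xFFFu) << 20) | ((rs1 & 0x1Fu) << 15) |
           ((funct3 & 0x7u) << 12) | ((rd & 0x1Fu) << 7) | OPC_SYSTEM;
}

/* CSRR rd, csr is shorthand for CSRRS rd, csr, x0 (read, no write). */
static uint32_t encode_csrr(uint32_t rd, uint32_t csr)
{
    return encode_csr(F3_CSRRS, rd, 0, csr);
}

/* True only for the CSRR read pattern on the given CSR, for any rd. */
static int is_csrr_of(uint32_t insn, uint32_t csr)
{
    return (insn & 0x7Fu) == OPC_SYSTEM &&
           ((insn >> 12) & 0x7u) == F3_CSRRS &&
           ((insn >> 15) & 0x1Fu) == 0 &&
           (insn >> 20) == csr;
}
```

For example, `encode_csrr(10, 0xC00)` yields 0xC0002573, the word a disassembler shows as `rdcycle a0`; CSRRS with a non-zero rs1 or CSRRW on 0xC00 produce different words that `is_csrr_of` rejects.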

The spec seems to mandate the CSRs, not just the read instructions. But
that would mean that v2.1 and v2.0 of the RV32I and RV64I ISA are
incompatible, as v2.0 only mandated the read (pseudo-)instructions.

regards,
- clifford

[1] https://twitter.com/SiFiveInc/status/822655480149508096

--
"Perfection [in design] is achieved not when there is nothing left to
add, but rather when there is nothing left to take away."
- Antoine de Saint-Exupery

Alex Bradbury

Jan 21, 2017, 8:07:29 AM1/21/17
to Clifford Wolf, RISC-V ISA Dev
On 21 January 2017 at 11:29, Clifford Wolf <clif...@clifford.at> wrote:
> Hi,
>
> Volume I of the ISA manual handles the User-Level ISA, according to the
> title of the document. Usually in ISAs the modes with higher privileges
> support a super-set of the user-mode functionality. So a reader with some
> experience with ISAs would probably assume that everything in that document
> would cover functionality that is available on all RISC-V implementations
> in all privilege modes.
>
> However, this is not true! At least the RDCYCLE[H], RDTIME[H], and
> RDINSTRET[H] instructions (CYCLE[H], TIME[H], and INSTRET[H] CSRs) are not
> available in an M-mode-only system, as I learned today on twitter [1].
>
> I think it would be worth mentioning in sec 2.8 Control and Status Register
> Instructions that those are User-Mode CSRs that might not be available on
> machines that do not actually implement a user-mode.
>
> Sure, M-mode-only systems provide the mcycle, etc. CSRs instead. But
> someone learning about RISC-V by starting reading on page 1 of Volume I of
> the ISA manual might be misled at this point.

For what it's worth, as someone who has spent a lot of time with the
RISC-V user spec this is surprising to me also. The RV32E section adds
to the impression that these rdcycle[h], rdtime[h], rdinstret[h]
instructions are mandatory in the phrasing it uses to contrast RV32E
and RV32I:
"A further simplification is that the counter instructions
(rdcycle[h], rdtime[h], rdinstret[h]) are no longer mandatory."

The commentary immediately following that statement further adds to
the impression these instructions are mandatory:
"The mandatory counters require additional registers and logic, and
can be replaced with more application-specific facilities."

If a given "RV32I" implementation may choose to not provide these
counters, perhaps the ISA subset naming conventions in chapter 11
need to be expanded. Even the verbose RV32I2p0M2p0A2p0F2p0D2p0
description doesn't capture whether code including rdcycle can be
expected to run or not.
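As an illustration of that point, a small C parser for the verbose naming convention. It is a sketch: it handles only single-letter standard extensions and treats a bare major version as minor 0 (an assumption about the convention). The parsed result holds nothing but letters and version numbers, so it has no way to say whether rdcycle will execute.

```c
#include <ctype.h>
#include <string.h>

struct isa_ext { char letter; int major; int minor; };

/* Reads a decimal number starting at s into *v; returns the first unread char. */
static const char *read_uint(const char *s, int *v)
{
    *v = 0;
    while (isdigit((unsigned char)*s))
        *v = *v * 10 + (*s++ - '0');
    return s;
}

/* Parses e.g. "RV32I2p0M2p0A2p0F2p0D2p0" into XLEN plus (letter, major,
 * minor) entries. An absent version is recorded as -1/-1; a major version
 * without "p<minor>" gets minor 0. Returns the entry count, or -1 if the
 * string is malformed or has more than max entries. */
static int parse_isa(const char *s, int *xlen, struct isa_ext *out, int max)
{
    int n = 0;

    if (strncmp(s, "RV", 2) != 0 || !isdigit((unsigned char)s[2]))
        return -1;
    s = read_uint(s + 2, xlen);

    while (*s) {
        if (!isupper((unsigned char)*s) || n == max)
            return -1;
        out[n].letter = *s++;
        out[n].major = out[n].minor = -1;
        if (isdigit((unsigned char)*s)) {
            s = read_uint(s, &out[n].major);
            out[n].minor = 0;
            if (*s == 'p') {
                if (!isdigit((unsigned char)s[1]))
                    return -1;
                s = read_uint(s + 1, &out[n].minor);
            }
        }
        n++;
    }
    return n;
}
```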

Best,

Alex

Bruce Hoult

Jan 21, 2017, 8:35:03 AM1/21/17
to Clifford Wolf, RISC-V ISA Dev
Whaaaaaaat?

Given that the transistors have in fact been spent to provide counters, the U instructions should at least be aliased to the M ones if there are no separate U counters.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/20170121112949.GA8596%40clifford.at.

Bruce Hoult

Jan 21, 2017, 8:40:22 AM1/21/17
to Clifford Wolf, RISC-V ISA Dev
If I were designing it, I think I'd make the common case of "get counter for the current mode" always be the same instruction, and if you specifically wanted a counter for a less privileged mode than the current mode, you'd ask for that in a special way.



Alex Bradbury

Jan 21, 2017, 9:20:17 AM1/21/17
to Clifford Wolf, RISC-V ISA Dev
On 21 January 2017 at 11:29, Clifford Wolf <clif...@clifford.at> wrote:
> Hi,
>
> Volume I of the ISA manual handles the User-Level ISA, according to the
> title of the document. Usually in ISAs the modes with higher privileges
> support a super-set of the user-mode functionality. So a reader with some
> experience with ISAs would probably assume that everything in that document
> would cover functionality that is available on all RISC-V implementations
> in all privilege modes.
>
> However, this is not true! At least the RDCYCLE[H], RDTIME[H], and
> RDINSTRET[H] instructions (CYCLE[H], TIME[H], and INSTRET[H] CSRs) are not
> available in an M-mode-only system, as I learned today on twitter [1].
>
> I think it would be worth mentioning in sec 2.8 Control and Status Register
> Instructions that those are User-Mode CSRs that might not be available on
> machines that do not actually implement a user-mode.
>
> Sure, M-mode-only systems provide the mcycle, etc. CSRs instead. But
> someone learning about RISC-V by starting reading on page 1 of Volume I of
> the ISA manual might be misled at this point.

Thinking more about it, if a core doesn't implement rdcycle but can
transparently trap to a handler that provides the functionality via
reading mcycle then the RV32I 'contract' would be upheld. This is
similar to how a core can implement RV32IM, but trap to a handler to
support divide. One thing that isn't clear to me - would a core
claiming to be "RV32IM" have to provide such a handler in the ROM? If
it didn't, you would lose code compatibility without recompiling or at
least relinking.
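One way to picture that trap-based approach: a C sketch of the emulation step an illegal-instruction handler might perform. It assumes RV64 (so no cycleh/instreth), 32-bit (non-compressed) instruction encodings, a hypothetical `struct trap_frame` holding the saved integer registers and mepc, and a handler that has already read mcycle/minstret and passes the values in. rdtime is deliberately left out, since time may not be backed by a CSR at all. Note that an emulated instret includes the handler's own instructions unless the handler compensates.

```c
#include <stdint.h>

/* Hypothetical saved context: integer registers x0..x31 and the trapping PC. */
struct trap_frame { uint64_t x[32]; uint64_t mepc; };

enum { CSR_CYCLE = 0xC00, CSR_INSTRET = 0xC02 };

/* Called from an illegal-instruction handler with the faulting instruction
 * word and counter values the handler has just read from mcycle/minstret.
 * Emulates "csrrs rd, cycle|instret, x0" (rdcycle/rdinstret): writes rd,
 * skips the 4-byte instruction and returns 1. Anything else returns 0 so
 * the caller can treat it as a genuine illegal instruction. */
static int emulate_counter_read(struct trap_frame *tf, uint32_t insn,
                                uint64_t mcycle, uint64_t minstret)
{
    uint32_t opcode = insn & 0x7F, rd = (insn >> 7) & 0x1F;
    uint32_t funct3 = (insn >> 12) & 0x7, rs1 = (insn >> 15) & 0x1F;
    uint32_t csr = insn >> 20;
    uint64_t val;

    if (opcode != 0x73 || funct3 != 2 || rs1 != 0)
        return 0;                 /* only the CSRR (CSRRS ..., x0) form */
    if (csr == CSR_CYCLE)
        val = mcycle;
    else if (csr == CSR_INSTRET)
        val = minstret;
    else
        return 0;

    if (rd != 0)
        tf->x[rd] = val;          /* x0 stays hardwired to zero */
    tf->mepc += 4;                /* resume after the emulated instruction */
    return 1;
}
```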

Best,

Alex

Stefan O'Rear

Jan 21, 2017, 2:13:50 PM1/21/17
to Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
On Sat, Jan 21, 2017 at 5:40 AM, Bruce Hoult <br...@hoult.org> wrote:
> If I was designing it, I'd think I'd make the common case of "get counter
> for the current mode" be always the same instruction, and if you
> specifically wanted a counter for a less privileged mode than the current
> mode then you'd ask for that in a special way.

(1) There is no such thing as a "counter for a less privileged mode".
There is only one MCYCLE register, and it's encoded as if it were an
M-mode CSR, but it's actually visible from multiple privilege levels
(always in M-mode, possibly also in S-mode or U-mode depending on the
value of MUCOUNTEREN and MSCOUNTEREN). (rocket-chip appears to
provide cycle as a read-only alias of mcycle, but only if usingUser is
set?)
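For reference, the privileged spec's CSR address convention (bits [11:10] == 0b11 mean read-only; bits [9:8] give the lowest privilege level allowed access) can be decoded as in this C sketch; the helper names are made up. The address check alone would permit M-mode to read 0xC00; whether that CSR exists at all on an M-mode-only core is the separate question this thread is about.

```c
#include <stdint.h>

/* Bits [11:10] == 0b11 mark a read-only CSR. */
static int csr_is_read_only(uint16_t csr) { return ((csr >> 10) & 3) == 3; }

/* Bits [9:8]: lowest privilege level allowed access (0 = U, 1 = S, 3 = M). */
static int csr_min_priv(uint16_t csr) { return (csr >> 8) & 3; }

/* Static, address-based check only. Counter-enable registers such as
 * mucounteren/mscounteren can further restrict U/S access to counters. */
static int csr_access_ok(uint16_t csr, int priv, int is_write)
{
    if (priv < csr_min_priv(csr))
        return 0;
    if (is_write && csr_is_read_only(csr))
        return 0;
    return 1;
}
```

So cycle (0xC00) decodes as a read-only user-level CSR, while mcycle (0xB00) decodes as a read/write machine-level CSR.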

(2) Alex, Clifford: there seems to be an expectation that M-mode is
almost, but not quite, a superset of U-mode. I know of one other
place on the Berkeley cores where supersetting fails: U-mode spec says
that misaligned memory access is allowed, but it actually traps to
M-mode on rocket-chip, so if your code is already in M-mode and can't
handle a horizontal trap you need to avoid misaligned accesses.

The mcycle/cycle thing appears to be another place where M-mode is not
quite a superset; software-managed TLBs (with trap to M-mode for
refill) have also been discussed, but AFAIK implemented by no-one.

-s

Michael Clark

Jan 21, 2017, 3:00:07 PM1/21/17
to Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
Yes.

Implementation M-mode can be a subset of specification M-mode if there is firmware with an illegal instruction trap handler that can emulate the missing M-mode pieces.

This reminds us of the hypothetical M-mode and U-mode implementation that relies on:

- the SYSTEM opcode trapping to a hardwired vector, say 0x1000
- trap code has access to some hidden scratch memory (for CSR emulation)
- trap code has access to an MMIO region with counters and timer compare register
- ability to switch off access to the hidden scratch and MMIO region during a trap.

All of the CSRs could be implemented in software (including mtvec) with a trapping SYSTEM opcode, assuming there is an MMIO aperture that contains these:

- mtime
- mcycle
- minstret
- mtimecmp

There is no technical need for a hardware mtvec register other than trap performance. SYSTEM trapping to 0x1000 and some careful M-mode firmware is enough.
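A C sketch of the software CSR file such firmware would need, assuming a hypothetical `struct counter_mmio` aperture holding the four registers listed above plus hidden scratch storage for the M-mode CSRs. It follows the CSRRW/CSRRS/CSRRC read-modify-write rules, including the rule that CSRRS/CSRRC with rs1 = x0 do not write. mtimecmp stays memory-mapped only. Real MMIO accesses would need `volatile`; this model uses plain memory so it can run on a host.

```c
#include <stddef.h>
#include <stdint.h>

enum { F3_CSRRW = 1, F3_CSRRS = 2, F3_CSRRC = 3 };

/* Hypothetical counter/timer aperture (plain memory in this model). */
struct counter_mmio { uint64_t mtime, mcycle, minstret, mtimecmp; };

/* Software-emulated state: M-mode CSRs kept in hidden scratch memory. */
struct emu_state {
    uint64_t mtvec, mscratch, mepc, mcause;
    struct counter_mmio *mmio;
};

/* Maps a CSR number to its backing storage, or NULL if not emulated.
 * mtimecmp has no CSR number and is reachable only via the aperture. */
static uint64_t *csr_slot(struct emu_state *st, uint32_t csr)
{
    switch (csr) {
    case 0x305: return &st->mtvec;
    case 0x340: return &st->mscratch;
    case 0x341: return &st->mepc;
    case 0x342: return &st->mcause;
    case 0xB00: return &st->mmio->mcycle;
    case 0xB02: return &st->mmio->minstret;
    case 0xC00: return &st->mmio->mcycle;    /* cycle: read-only view */
    case 0xC01: return &st->mmio->mtime;     /* time: read-only view */
    case 0xC02: return &st->mmio->minstret;  /* instret: read-only view */
    default:    return NULL;
    }
}

/* Emulates CSRRW/CSRRS/CSRRC. Returns 0 and the old CSR value in *old on
 * success; -1 for a bad funct3, an unknown CSR, or a write to a read-only
 * CSR (address bits [11:10] == 0b11), which should raise illegal-instruction. */
static int emulate_csr(struct emu_state *st, uint32_t funct3, uint32_t csr,
                       uint64_t rs1_val, int rs1_is_x0, uint64_t *old)
{
    uint64_t *slot;
    int writes;

    if (funct3 < F3_CSRRW || funct3 > F3_CSRRC)
        return -1;
    slot = csr_slot(st, csr);
    if (slot == NULL)
        return -1;
    writes = (funct3 == F3_CSRRW) || !rs1_is_x0;  /* CSRRS/C with x0 only read */
    if (writes && ((csr >> 10) & 3) == 3)
        return -1;

    *old = *slot;
    if (funct3 == F3_CSRRW)
        *slot = rs1_val;
    else if (funct3 == F3_CSRRS && !rs1_is_x0)
        *slot |= rs1_val;
    else if (funct3 == F3_CSRRC && !rs1_is_x0)
        *slot &= ~rs1_val;
    return 0;
}
```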

Michael.

Michael Clark

Jan 21, 2017, 3:08:15 PM1/21/17
to Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
A few more details are needed, such as the ability to re-enable a single global interrupt-enable flag.

Emulating MRET in software is tricky unless there is a delay of n cycles before interrupts become re-enabled.

Andrew Waterman

Jan 21, 2017, 4:42:57 PM1/21/17
to Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
Right. It can't be turtles all the way down, and M-mode is where
missing hardware features are emulated, so the missing features are
rightly unavailable there.

I understand the lack of U-mode counters in M-mode is, at the very
least, surprising. The next draft of the privilege spec will not be
mum on this point.


Alex Bradbury

Jan 22, 2017, 7:55:17 AM1/22/17
to Andrew Waterman, Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
For an M-mode only RISC-V implementation, why would you not have the
CSR addressed by rdcycle (as defined in the base RV32I ISA) alias the
mcycle counter? If that's not desirable, I'd say there's an argument
that rdmcycle should be in RV32I (possibly instead of the current
rdcycle).

Alex

Andrew Waterman

Jan 22, 2017, 8:37:14 AM1/22/17
to Alex Bradbury, Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
Some M-mode implementations might choose to alias cycle to mcycle and
instret to minstret. That likely won't be the case for time/mtime,
which may not be backed by a CSR at all.

In any case, M-mode software must be written with knowledge of the
underlying hardware platform, and generally won't be portable, so this
is a very minor quibble.

Alex Bradbury

Jan 23, 2017, 9:23:07 AM1/23/17
to Andrew Waterman, Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
On 22 January 2017 at 13:36, Andrew Waterman <and...@sifive.com> wrote:
> Some M-mode implementations might choose to alias cycle to mcycle and
> instret to minstret. That likely won't be the case for time/mtime,
> which may not be backed by a CSR at all.
>
> In any case, M-mode software must be written with knowledge of the
> underlying hardware platform, and generally won't be portable, so this
> is a very minor quibble.

Thanks Andrew, I've started a new thread to try to further clarify
this issue of what a developer or compiler can expect from an
M-mode-only implementation
<https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/I3og57V1aHg/WVSjLVOzAQAJ>.
Surely you don't mean that code that might run in M-mode
(e.g. on a RISC-V microcontroller) must be written with knowledge
of which instructions are handled by trapping on a certain
implementation (and indeed, whether misaligned loads/stores trap)?

Best,

Alex

Andrew Waterman

Jan 23, 2017, 3:35:45 PM1/23/17
to Alex Bradbury, Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
Unavoidably, the authors of the support code that emulates missing
instructions must know which features are implemented without
trapping. Likewise, the authors of any trap handlers that don't spill
enough of the context to recover from an exception must know.

The authors of application code running in M-mode can be ignorant of
this information, if they are willing to pay for the support code to
be linked in. But some people will want to run M-mode code without
linking in the support library (e.g., to reduce code size, or reduce
the size of the trusted code base), in which case they also must be
aware of what the hardware actually supports. I think we want to
cater to both use cases.

We need to have some way of describing what features the hardware
actually supports. Perhaps the Foundation should create the notion of
implementation profiles, which are standard sets of features that
M-mode programmers can expect to exist. These can form compiler
targets, so C programmers need not know the details.
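The two use cases Andrew describes (with and without the support library) could be captured by recording, per feature, whether it is native or merely emulated. A hypothetical C sketch: the feature names and struct are invented for illustration and are not any existing Foundation profile format.

```c
#include <stdint.h>

/* Hypothetical feature bits; not an existing Foundation format. */
enum {
    FEAT_MULDIV     = 1u << 0,  /* M extension instructions */
    FEAT_COUNTERS   = 1u << 1,  /* cycle/instret readable */
    FEAT_TIME       = 1u << 2,  /* time readable */
    FEAT_MISALIGNED = 1u << 3,  /* misaligned loads/stores succeed */
};

/* Per-feature status on a given implementation. */
struct impl_profile {
    uint32_t native;    /* works without trapping */
    uint32_t emulated;  /* traps, but a support library can emulate it */
};

/* Whether code requiring `required` can run: every feature must be native,
 * or emulated with the support library linked in. */
static int profile_can_run(const struct impl_profile *p, uint32_t required,
                           int support_lib_linked)
{
    uint32_t available = p->native | (support_lib_linked ? p->emulated : 0u);
    return (required & ~available) == 0;
}
```

A compiler target could then be defined by the `native` set alone (no support library) or by `native | emulated` (support library linked in).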

>
> Best,
>
> Alex

Alex Bradbury

Jan 23, 2017, 4:49:47 PM1/23/17
to Andrew Waterman, Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
On 23 January 2017 at 20:35, Andrew Waterman <and...@sifive.com> wrote:
> On Mon, Jan 23, 2017 at 6:23 AM, Alex Bradbury <a...@asbradbury.org> wrote:
>> On 22 January 2017 at 13:36, Andrew Waterman <and...@sifive.com> wrote:
>>> Some M-mode implementations might choose to alias cycle to mcycle and
>>> instret to minstret. That likely won't be the case for time/mtime,
>>> which may not be backed by a CSR at all.
>>>
>>> In any case, M-mode software must be written with knowledge of the
>>> underlying hardware platform, and generally won't be portable, so this
>>> is a very minor quibble.
>>
>> Thanks Andrew, I've started a new thread to try to further clarify
>> this issue of what a developer or compiler can expect from an
>> M-mode-only implementation
>> <https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/I3og57V1aHg/WVSjLVOzAQAJ>.
>> Surely you don't mean that code that might run in M-mode
>> (e.g. on a RISC-V microcontroller) must be written with knowledge
>> of which instructions are handled by trapping on a certain
>> implementation (and indeed, whether misaligned loads/stores trap)?
>
> Unavoidably, the authors of the support code that emulates missing
> instructions must know which features are implemented without
> trapping. Likewise, the authors of any trap handlers that don't spill
> enough of the context to recover from an exception must know.

Fully agreed, trap handlers and the like must be written with
awareness of core implementation details such as which instructions
might cause another trap.

> The authors of application code running in M-mode can be ignorant of
> this information, if they are willing to pay for the support code to
> be linked in. But some people will want to run M-mode code without
> linking in the support library (e.g., to reduce code size, or reduce
> the size of the trusted code base), in which case they also must be
> aware of what the hardware actually supports. I think we want to
> cater to both use cases.

This is obviously something people can do as a last resort to
try to save some binary size, or on a deeply embedded system where the
implementer is willing to pay the cost of moving away from being
'standard' RISC-V. However I'd be really concerned about this becoming
the norm for standalone M-mode-only RISC-V microcontrollers.

If I author, for instance, an MP3 decoder library in a mix of C and
RISC-V assembly then surely I should be confident that same library
will work without source modification across anything that claims to
support a sufficient RISC-V ISA subset. This means that, ignoring
performance, I shouldn't have to care if it gets compiled for an
M-mode-only system, if some instructions are handled by trapping, or
if the library might produce misaligned memory accesses. The
specification seems quite clear that to be RISC-V, misaligned access
must be supported: "The base ISA supports misaligned accesses, but
these might run extremely slowly depending on the implementation."
(p18 https://content.riscv.org/wp-content/uploads/2016/06/riscv-spec-v2.1.pdf).
If a core+SDK can't handle misaligned accesses, it loses compatibility
with the rest of the RISC-V ecosystem and I don't see how it can
reasonably be described as being RISC-V compliant.
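For contrast, this is the usual portable workaround when a library cannot rely on misaligned accesses being handled: assemble multi-byte values from single-byte loads and stores, which are never misaligned. A minimal C sketch; the names are illustrative. A compiler that believes the target handles misaligned accesses cheaply may fuse these into a single word access, while one that assumes strict alignment will not.

```c
#include <stdint.h>

/* Loads a 32-bit little-endian value from any byte address using only
 * byte loads, so alignment of p never matters. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Stores a 32-bit value in little-endian order using only byte stores. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```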

> We need to have some way of describing what features the hardware
> actually supports. Perhaps the Foundation should create the notion of
> implementation profiles, which are standard sets of features that
> M-mode programmers can expect to exist. These can form compiler
> targets, so C programmers need not know the details.

Having a standard way of describing what does/doesn't trap could
certainly be helpful both for compiler code generation and for people
trying to write efficient assembly routines for M-mode targets.
Echoing my concern above: defining profiles that don't provide minimal
traps for misaligned memory accesses and unimplemented instructions
risks fragmenting the ecosystem by providing additional subsets to the
"base" RV32I/E functionality. Of course it should be possible for
people to do this just as they can pursue other non-standard
extensions or modifications, but calling these subsets "RISC-V" would
rather destroy the idea of RV32I/E providing a common minimal
baseline.

Best,

Alex

Michael Clark

Jan 23, 2017, 9:08:56 PM1/23/17
to Alex Bradbury, Andrew Waterman, Stefan O'Rear, Bruce Hoult, Clifford Wolf, RISC-V ISA Dev
Yes. Any operating-system-level code that reads processor control and status registers such as cycle counters needs to be careful. I noticed that the architectural performance counters on ARM are not even accessible on iOS, and may in fact trap, so one has to use a library routine to get access to high-resolution time, and a developer may not get access to the cycle counter at all. This would be the equivalent of rdcycle being disabled for user mode and requiring the use of an OS interface. On x86 there are similar issues that make it more convenient to call an OS function such as clock_gettime, mach_absolute_time or QueryPerformanceCounter: whether RDTSC or the newer RDTSCP is available depends on the version of the silicon, and the rate of the cycle counter depends on the core frequency and on dynamic clock-speed adjustments. These low-level primitives are operating-system-level primitives, so one would expect errata, workarounds and multiple code paths depending on processor model and extensions.

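For the OS-interface route described above, a POSIX C sketch that asks the kernel for monotonic time instead of reading a cycle counter directly. CLOCK_MONOTONIC reports nanoseconds of elapsed time rather than cycles, so it is unaffected by core frequency changes; the helper name is illustrative.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic nanoseconds from the OS; returns 0 if the clock is unavailable. */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
```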
Alignment is particularly interesting.

Most low-level code still works on the assumption of aligned accesses even if the host platform supports misaligned accesses /for most instructions/.

While x86 supports misaligned accesses in the Base ISA, it seems almost all of the tooling produces aligned accesses by default, likely due to the performance penalty for misaligned accesses. One needs to use special structure-packing attributes to produce code that will create misaligned accesses, and it is relatively rare to do so. Codegen for aligned accesses is the norm on x86. I recently discovered that Clang on macOS mandates 16-byte alignment from the host allocator, because the compiler emits SSE4.2 instructions that trap on misaligned addresses, and this is the default on a present-day OS. I only discovered this because I was using a custom memory allocator with an 8-byte alignment quantum. I was using a custom allocator for a RISC-V interpreter so I could move the host heap into high memory for a shared address space model (vs. a soft MMU model). In any case it turned out I couldn’t use that allocator library due to its deep assumptions regarding 8-byte alignment, so I switched to another malloc implementation which could be configured for 16-byte alignment. So: a chip that supports misaligned accesses, a modern OS, and yet a misaligned access traps.

So even on platforms whose ISAs support misaligned accesses, (a) it is common for all accesses to be aligned, and (b) platforms may trap rather than emulate some misaligned accesses, due to more widespread use of optimised instructions that require aligned memory.

> > We need to have some way of describing what features the hardware
> > actually supports. Perhaps the Foundation should create the notion of
> > implementation profiles, which are standard sets of features that
> > M-mode programmers can expect to exist. These can form compiler
> > targets, so C programmers need not know the details.
>
> Having a standard way of describing what does/doesn't trap could
> certainly be helpful both for compiler code generation and for people
> trying to write efficient assembly routines for M-mode targets.
> Echoing my concern above: defining profiles that don't provide minimal
> traps for misaligned memory accesses and unimplemented instructions
> risks fragmenting the ecosystem by providing additional subsets to the
> "base" RV32I/E functionality. Of course it should be possible for
> people to do this just as they can pursue other non-standard
> extensions or modifications, but calling these subsets "RISC-V" would
> rather destroy the idea of RV32I/E providing a common minimal
> baseline.
>
> Best,
>
> Alex


Bruce Hoult

Jan 24, 2017, 7:22:32 AM1/24/17
to Michael Clark, Alex Bradbury, Andrew Waterman, Stefan O'Rear, Clifford Wolf, RISC-V ISA Dev
16 byte alignment for malloc()'d data and also for the stack pointer is mandated just about everywhere these days, at least on 64 bit CPUs, but also on some 32 bit CPUs.

glibc returns 16 byte aligned pointers on every 64 bit system.

AArch64 requires 16-byte SP alignment at any point at which memory is accessed via SP. RISC-V requires 16-byte SP alignment. PowerPC requires 16-byte SP alignment on both OS X and Linux.
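The 16-byte rules above amount to rounding sizes and addresses up to a power-of-two boundary. A minimal C sketch; helper names are illustrative, and C11 `aligned_alloc` is used only to obtain a 16-byte-aligned block (it expects the size to be a multiple of the alignment, hence the rounding).

```c
#include <stdint.h>
#include <stdlib.h>

/* Rounds x up to a multiple of align, which must be a power of two. */
static uintptr_t align_up(uintptr_t x, uintptr_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* True if p is a multiple of align (a power of two). */
static int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* 16-byte-aligned allocation with the size rounded as C11 expects. */
static void *alloc_aligned16(size_t size)
{
    return aligned_alloc(16, align_up(size, 16));
}
```

For example, a function needing 40 bytes of locals would reserve `align_up(40, 16)`, i.e. 48 bytes, to keep SP 16-byte aligned.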


Alex Bradbury

Jan 24, 2017, 7:31:32 AM1/24/17
to Bruce Hoult, Michael Clark, Andrew Waterman, Stefan O'Rear, Clifford Wolf, RISC-V ISA Dev
On 24 January 2017 at 12:22, Bruce Hoult <br...@hoult.org> wrote:
> 16 byte alignment for malloc()'d data and also for the stack pointer is
> mandated just about everywhere these days, at least on 64 bit CPUs, but also
> on some 32 bit CPUs.
>
> glibc returns 16 byte aligned pointers on every 64 bit system.
>
> Aarch64 requires 16 byte SP alignment at any point at which memory is
> accessed via SP. RISC-V requires 16 byte SP alignment. PowerPC requires 16
> byte SP alignment on both OS X and Linux.

These would definitely support an argument for RISC-V to disallow
unaligned memory accesses, but that's not the choice that was made in
what is now meant to be a 'stable' specification. To change it now
would mean a break in compatibility. Unless the versioning policy
suggested in the spec (see section 11.4), this would require
incrementing the major version number to produce a v3.0 spec.

Best,

Alex

Alex Bradbury

Jan 24, 2017, 7:33:12 AM1/24/17
to Bruce Hoult, Michael Clark, Andrew Waterman, Stefan O'Rear, Clifford Wolf, RISC-V ISA Dev
Sorry, I didn't finish the final sentence properly:

Unless the versioning policy suggested in the spec (see section 11.4)
was also dropped, this would require incrementing the major version number.

Stefan O'Rear

Jan 24, 2017, 2:03:21 PM1/24/17
to Alex Bradbury, Bruce Hoult, Michael Clark, Andrew Waterman, Clifford Wolf, RISC-V ISA Dev
Relevant thread from earlier, in which gcc does not assume unaligned
access is permitted:

https://groups.google.com/a/groups.riscv.org/d/msg/sw-dev/gRPLXGWNt2A/19CoEmafHwAJ

-s

Michael Clark

Jan 24, 2017, 6:38:08 PM1/24/17
to Stefan O'Rear, Alex Bradbury, Bruce Hoult, Andrew Waterman, Clifford Wolf, RISC-V ISA Dev
I think it’s wise to keep the compilers generating code that assumes strict alignment even if the platform allows (slow) unaligned accesses. I don’t know of any platforms today that don’t follow strict structure padding alignment rules unless __attribute__((packed)) is used.

I would assume SLOW_UNALIGNED_ACCESS essentially defaults to generating code that follows strict alignment rules but allows __attribute__((packed)) to override default (strict) structure alignment. I read that gcc will prevent pointers to members of __attribute__((packed)) structures on systems where STRICT_ALIGNMENT is set, as dereferencing them will cause a bus error.
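To illustrate the packed-structure case, a C sketch using the GCC/Clang `__attribute__((packed))` extension. The layout described assumes a typical ABI where `uint32_t` is 4-byte aligned (true for both x86_64 and RISC-V). Copying the member out with `memcpy` avoids forming a misaligned `uint32_t *`, which is the pattern that faults on strict-alignment targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Natural layout: the compiler pads after `tag` so `value` is 4-byte aligned. */
struct natural_rec { uint8_t tag; uint32_t value; };

/* GCC/Clang extension: no padding, so `value` sits at offset 1. */
struct __attribute__((packed)) packed_rec { uint8_t tag; uint32_t value; };

/* Copies the packed member out with memcpy, which has no alignment
 * requirement, instead of dereferencing a possibly misaligned uint32_t *. */
static uint32_t packed_value(const struct packed_rec *r)
{
    uint32_t v;

    memcpy(&v, (const unsigned char *)r + offsetof(struct packed_rec, value), sizeof v);
    return v;
}
```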


There seem to be nuances on platforms that allow unaligned accesses in their Base ISA but have configured compilers and allocators to assume strict alignment, and that now use more modern instruction subsets which require strict alignment, e.g. MOVDQA, MOVAPD and MOVAPS emitted by Clang for x86_64/SSE on macOS.

Even on hardware that allows misaligned accesses there are penalties, e.g. extra pressure on the L1/L2 caches for unaligned words crossing cache-line or TLB (page) boundaries (causing double latency on loads that miss in L1), and likely further costs down at the memory-bus level. So SLOW_UNALIGNED_ACCESS seems to have two magnitudes: trap-and-emulate for misaligned accesses is going to be an order of magnitude slower than slow misaligned accesses supported by the memory system.
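The boundary-crossing penalty can be stated precisely: an access straddles a cache line (or page) when its first and last bytes fall in different blocks. A C sketch assuming 64-byte lines and 4 KiB pages (common choices, but neither is mandated by the ISA).

```c
#include <stdint.h>

/* Assumed geometry: 64-byte cache lines and 4 KiB pages. */
#define LINE_BYTES 64u
#define PAGE_BYTES 4096u

/* True if an access of `size` bytes (size >= 1) at `addr` touches two
 * different blocks of `block` bytes (block > 0). */
static int crosses(uint64_t addr, uint32_t size, uint32_t block)
{
    return (addr / block) != ((addr + size - 1) / block);
}
```

A naturally aligned access no larger than the block size can never cross, which is part of why aligned codegen remains the default.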

It seems we could document that the ABI assumes strict alignment; however, I guess this is really a compliance issue for the Base ISA, i.e. if misaligned support is specified, then there need to be test cases checking that misaligned loads and stores succeed.

I would suspect that some riscv (versus RISC-V) systems might fail.