On 19 Dec 2017, at 00:36, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
>
> If David Chisnall's earlier message is accurate, we very much *do* want programs to fault in RISC-V if they have misaligned synchronization variables -- x86 allows such operations _but_ _does_ _not_ _guarantee_ _that_ _misaligned_ _"atomics"_ _are_ _actually_ _atomic_ any more. This is a far more fertile source of bugs than simply immediately crashing a program that asks for something that the hardware cannot deliver.
I don’t know that this is the case with recent x86, but it certainly was around 5-6 years ago. Given that they now have hardware transactional memory, it’s entirely possible that they crack atomic RMW instructions into micro-ops that use the transactional hardware if they span multiple cache lines / pages (if they’re in the same page but different cache lines then it’s possible that they might simply lock two cache lines in the exclusive state, though that adds some complexity to the cache coherency mechanism).
> Further, even if we allow that misaligned AMOs should be permitted, why should they make ordinary unaligned load/store atomic and therefore much more expensive? If you have a synchronization variable that you cannot ensure is aligned, use only AMOs to access it.
If other loads and stores are not (relaxed consistency) atomic then you are going to hit a lot of fun corner cases in trying to implement the atomic versions, because even doing a piecewise atomic compare and exchange can interact with the lack of atomicity in the other loads and stores.
As a colleague recently pointed out to me, it’s not very helpful to think of atomicity in the abstract, without defining what it is atomic with respect to. If an atomic increment is not atomic with respect to a non-atomic load, then this is problematic. There is a lot of C code that assumes that it can do non-atomic loads and stores of variables and get either the ‘before’ or ‘after’ versions[1], so being able to do an atomic increment but load a value from another thread that is neither the incremented nor the non-incremented value will cause confusion.
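To make the "neither before nor after" case concrete, here is a minimal sketch in C. It does not use real threads; it only models a 32-bit counter that straddles a boundary as two 16-bit halves and computes what a reader would observe if it loaded one half before the writer's update and the other half after it. The type and helper names (`split_u32`, `torn_lo_old_hi_new`, and so on) are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model (assumption): a misaligned 32-bit counter is accessed by the
 * hardware as two independent 16-bit halves. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} split_u32;

static split_u32 split(uint32_t v)
{
    split_u32 s = { (uint16_t)(v & 0xFFFFu), (uint16_t)(v >> 16) };
    return s;
}

static uint32_t join(uint16_t lo, uint16_t hi)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Reader loads the low half before the update and the high half after. */
static uint32_t torn_lo_old_hi_new(uint32_t before, uint32_t after)
{
    return join(split(before).lo, split(after).hi);
}

/* Reader loads the low half after the update and the high half before. */
static uint32_t torn_lo_new_hi_old(uint32_t before, uint32_t after)
{
    return join(split(after).lo, split(before).hi);
}
```

With `before = 0x0000FFFF` and `after = 0x00010000` (a single increment), the two torn reads yield `0x0001FFFF` and `0x00000000`, neither of which the counter ever held.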
It is far better to simply trap if you can’t do this safely than to subtly break code. There are basically three options here:
- Impose very complex requirements on the hardware that will add cost for little benefit to software.
- Make a small amount of software fail in subtle and very difficult to debug ways.
- Make a small amount of software trap and fail with a useful error message.
My vote would be for the third one. Anyone who really wants to do atomic operations spanning cache lines should wait for the transactional memory extension and use that.
David
[1] Yes, this is undefined behaviour in C. If you can find one nontrivial C program that doesn’t rely on undefined behaviour and is not seL4, then I’ll accept this as a counter argument.
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/7555A963-A12D-42E0-A36D-CA21AE7B48E9%40cl.cam.ac.uk.
> It is far better to simply trap if you can’t do this safely than to subtly break code. There are basically three options here:
> - Impose very complex requirements on the hardware that will add cost for little benefit to software.
> - Make a small amount of software fail in subtle and very difficult to debug ways.
> - Make a small amount of software trap and fail with a useful error message.
Door #3 remains our proposal for the standard Unix-like platforms.
Jose Renau wrote:
> X86 supports misaligned and ARMv8 recently changed (8.4) the spec to
> support misaligned. Before 8.4, the spec said exception. The fact that
> they changed
> and that there are some apps around there is a strong hint that we
> should provide some way to support it because apps need it.
We had a way to support misaligned AMO: the misaligned AMO trap can be
delegated to the supervisor. No program truly needs misaligned AMOs --
synchronization variables can always be aligned by simply inserting
padding. Find a counterexample to that before saying "apps need it".
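As an illustration of the padding argument, the following C11 sketch forces a synchronization variable onto its natural alignment inside a record whose preceding field would otherwise leave it at an odd offset. The struct and function names are hypothetical; the only standard facilities used are `alignas`, `offsetof`, and `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a one-byte tag followed by a shared counter.
 * alignas(8) makes the compiler insert padding after `tag`, so the
 * counter can never end up misaligned. */
struct counter_record {
    uint8_t tag;
    alignas(8) _Atomic uint64_t counter;
};

/* Fail at compile time if the layout ever stops being aligned. */
_Static_assert(offsetof(struct counter_record, counter) % 8 == 0,
               "counter must be naturally aligned");

/* Relaxed atomic increment; returns the new value. */
static uint64_t bump(struct counter_record *r)
{
    return atomic_fetch_add_explicit(&r->counter, 1,
                                     memory_order_relaxed) + 1;
}
```

The cost is seven bytes of padding per record in this layout, which is the trade the paragraph above argues is always available.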
Lazy developers may *like* misaligned AMOs, but apps do not *need*
misaligned AMOs and the current wording removes an option for efficient
handling of ordinary unaligned load/store that many programs actually do
need for hot code paths that must process packed structures such as
network packet headers. Efficient hardware support for unaligned load
replaces a seven instruction sequence: adjust pointer, aligned load,
shift, mask, second aligned load, mask, OR. Trapping to the monitor is
much slower, but that gives an incentive to develop hardware that can
handle (most) unaligned accesses, while leaving the tough edge cases
(like spanning pages) for the monitor to handle.
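For readers who have not seen it, the software sequence described above looks roughly like the following C sketch for a 32-bit load. It is not taken from any RISC-V monitor; `load_u32_unaligned` is a hypothetical helper that treats an array of aligned words as a little-endian byte stream, and the unsigned shifts do the work of the mask steps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load a 32-bit value starting at byte offset `off` using only aligned
 * word loads. Byte k of the stream is defined as
 * (words[k / 4] >> (8 * (k % 4))) & 0xFF, i.e. little-endian order.
 * The caller must ensure words[off / 4 + 1] exists when off % 4 != 0. */
static uint32_t load_u32_unaligned(const uint32_t *words, size_t off)
{
    size_t idx = off / 4;                   /* adjust pointer downwards */
    unsigned shift = (unsigned)(off % 4) * 8;
    uint32_t lo = words[idx];               /* first aligned load */

    if (shift == 0)
        return lo;                          /* already aligned */

    uint32_t hi = words[idx + 1];           /* second aligned load */

    /* Drop the low bytes of `lo`, drop the high bytes of `hi`, combine. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

For example, with `words = {0x44332211, 0x88776655}`, an offset of 1 yields `0x55443322` and an offset of 3 yields `0x77665544`. The two word loads are independent, which is exactly why this emulation is not atomic.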
Handling misaligned AMOs in hardware is *far* more complex than handling
unaligned (and non-atomic!) loads and stores. The current wording
*requires* *all* unaligned accesses to a location to trap if AMOs would
trap. This eliminates an entire class of useful implementations.
> The current solution says that hardware support is optional. The
> software solution can handle it, so not extra complexity if the CPU
> designer wants to avoid besides
> the exception handler which is practically the same hardware as the
> misaligned for not atomics.
The problem is that the current wording attempts to make unaligned
load/store atomic with respect to misaligned AMO. This results in
forbidding the previously encouraged behavior of implementing some
(fast) subset of unaligned load/store in hardware while leaving
misaligned AMOs to trap. Worse, this results in a significant
performance penalty for the monitor code path that handles unaligned
load/store, since it now must serialize access and contain a critical
section. The correct change to this is to restore allowing unaligned
load/store to be entirely non-atomic.
-- Jacob
The problem for the ecosystem is what happens if one implementation supports unaligned atomic RMW operations in hardware at any location, whereas others only support them within a cache line. This, as with the optional TSO thing, will provide pressure on other implementers to support them at any granularity.
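The distinction between "misaligned but within one cache line" and "spanning cache lines" can be expressed as a simple address check. The sketch below assumes a 64-byte line, which is an assumption of this example and not something RISC-V mandates; the function names are likewise invented here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration only; real implementations vary. */
#define ASSUMED_LINE_BYTES 64u

/* True if `addr` is a multiple of the access width (width must be nonzero). */
static bool is_naturally_aligned(uintptr_t addr, size_t width)
{
    return (addr % width) == 0;
}

/* True if the bytes [addr, addr + width) touch more than one line. */
static bool spans_cache_line(uintptr_t addr, size_t width)
{
    uintptr_t first = addr / ASSUMED_LINE_BYTES;
    uintptr_t last = (addr + width - 1) / ASSUMED_LINE_BYTES;
    return first != last;
}
```

An 8-byte access at address 60 is both misaligned and line-spanning, while a 4-byte access at address 57 is misaligned but stays inside one line. Code that happens to work on an implementation supporting only the second case would trap or tear on the first.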
If Thread 1 wrote instead of read, then there could possibly be a tear:
thread2 reads old bytes3..0
thread1 writes new bytes3..0
thread2 writes newer bytes3..1, and old byte0.
But the only legal combinations are newer3..1, new0 or new3..0
But, can't that be fixed with a simple LR/SC combination?
If unaligned Ld/St is implemented in HW, but non-atomic, the only example I can come up with that can't be emulated is when unaligned St and AMOs overlap in both halves - because you need to reserve two addresses. I think the LRM proposal could solve this if it can be made to work.
(maybe even possible if two aligned stores overlap both halves of an AMO?)
On Wed, Dec 20, 2017 at 7:38 PM Cesar Eduardo Barros <ces...@cesarb.eti.br> wrote:
"If, for a given address and access width, a misaligned LR/SC or AMO
generates a misaligned address exception, then {\em all} loads, stores,
LRs/SCs, and AMOs using that address and access width must generate
misaligned address exceptions."
For PMAs that forbid misaligned AMOs altogether (which, for some platforms, could be for all addresses), these new constraints need not apply. I agree the spec does not permit this as written, but I also agree there’s no reason to forbid option B on platforms that have no need for misaligned AMOs.
Any boundary for an 8-bit access (locked or otherwise).
16-bit boundary for locked word accesses.
32-bit boundary for locked doubleword accesses.
64-bit boundary for locked quadword accesses.
Locked operations are atomic with respect to all other memory operations and all externally visible events. Only instruction fetch and page table accesses can pass locked instructions. Locked instructions can be used to synchronize data written by one processor and read by another processor.
For the P6 family processors, locked operations serialize all outstanding load and store operations (that is, wait for them to complete). This rule is also true for the Pentium 4 and Intel Xeon processors, with one exception. Load operations that reference weakly ordered memory types (such as the WC memory type) may not be serialized.
Locked instructions should not be used to ensure that data written can be fetched as instructions. "
> On 22/12/2017, at 9:26 PM, Jonas Oberhauser <s9jo...@gmail.com> wrote:
>
> Not compiling with LOCK here may be a compiler bug.
> This is what I get with clang:
>
> mov qword ptr [rbp - 24], 1
> mov rcx, qword ptr [rbp - 24]
> lock
> xadd qword ptr [rbp - 15], rcx
> mov qword ptr [rbp - 32], rcx
This is what I get from clang for the relaxed atomics, which, when misaligned, are UB on x86 (“Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors”), hence the compiler warnings:
> In any case the question is not just "what would we do" but whether there is existing code out there that a RISCV user might want to run which needs emulation of misaligned atomics.
> As far as I understand, the answer to that question is (sadly) yes.
Do you have evidence for this? I find this hard to believe, and if so the buggy code should be fixed. Any pointers to real code with this property?
Trapping seems to be the most sensible thing to do.
2) The PMA forbids [all] AMOs; they raise store access
exceptions. Misaligned loads & stores to this PMA are supported, and
can either be executed non-atomically in HW, or can raise misalignment
exceptions and be emulated non-atomically.
Option 2 is more or less the status quo.