On 17 Jun 2017, at 7:01 PM, Michael Clark <michae...@mac.com> wrote:

A load/store unit in a single-issue CPU is going to take 8 sequential 64-bit stores to zero a 64-byte cache line, but I suspect a cache could mux or broadcast a zero line in one cache transaction, e.g. if the width between the L1 and L2 or the memory system is cache-line width.
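A minimal sketch of the store sequence in question, assuming a 64-byte line and 8-byte stores (the function name and line size are illustrative, not from any spec):

```c
#include <stdint.h>
#include <stddef.h>

#define LINE_BYTES 64u  /* assumed cache-line size */

/* Zero one cache-line-sized region with eight sequential 64-bit
   stores -- the work a single-issue load/store unit must serialize,
   and what a hypothetical PREZERO-style operation could collapse
   into a single cache transaction. */
static void zero_line(uint64_t *line)
{
    for (size_t i = 0; i < LINE_BYTES / sizeof(uint64_t); i++)
        line[i] = 0;    /* 8 iterations: 64 bytes / 8 bytes per store */
}
```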
On 19 Jun 2017, at 2:58 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:

The catch is that PMAs are supposed to be baked into hardware, not configuration items, and are unrelated to paging in RISC-V. (PMAs are hardware characteristics; PMP is controlled by M-mode; paging is controlled by S-mode.) Nonetheless, cache policies are not specified in the RISC-V ISA, so instructions must avoid depending on them, and if PREZERO is only useful with writeback caches, it needs to be rethought.
The cache-control instructions we are thinking about are summarized in the following table:
VA based | icache                | dcache
---------+-----------------------+-----------------------------------
         | invalidate (+ unlock) | invalidate (+ unlock)
         |                       | writeback
         |                       | writeback & invalidate (+ unlock)
         | lock                  | lock
         | unlock                | unlock
The PIN or LOCK operation should return a status indicating whether it succeeded, so that software cannot inadvertently lock out all ways of a multi-way cache. A simple implementation can decline to support cache locking and always return "fail" for the lock operation.
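A toy model of why the status return matters, using one set of a hypothetical 4-way cache (the function and policy are illustrative, not a proposed encoding): granting every lock request could pin all ways of a set and leave nothing for normal allocation, so the operation refuses and reports failure instead.

```c
#include <stdbool.h>

/* Model of one set of a 4-way cache.  cache_lock_way() stands in for
   the proposed LOCK/PIN operation: it locks a way if possible, but
   always keeps at least one way unlocked, reporting failure so that
   software can fall back (e.g. to not locking at all). */
#define NUM_WAYS 4
static bool locked[NUM_WAYS];

static bool cache_lock_way(void)
{
    int free_ways = 0;
    for (int w = 0; w < NUM_WAYS; w++)
        if (!locked[w]) free_ways++;
    if (free_ways <= 1)            /* keep at least one way unlocked */
        return false;              /* report "fail" to software */
    for (int w = 0; w < NUM_WAYS; w++)
        if (!locked[w]) { locked[w] = true; return true; }
    return false;                  /* unreachable given the check above */
}
```

An implementation that does not support locking at all would simply be the degenerate case that always returns false.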
The following comment from Guy is a good idea for the INVALIDATE operation:
“Of the above, only INVALIDATE is a "destructive" operation. The range
specifier will be precise, so rounding the start or ending addresses
to align with cache line boundaries will be benign for FLUSH and
WRITEBACK, but it will have dire consequences with INVALIDATE. The
easy thing to do is to writeback the first cache line and the last
cache line (if they are dirty) before invalidating them, ie behave
like FLUSH on those two cache lines but behave like INVALIDATE in the
middle.”
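The suggestion above can be sketched as follows. This is a model, not a proposed implementation: `flush_line()` and `invalidate_line()` are stand-ins for the proposed per-line operations, and the 64-byte line size is assumed. Only partially covered boundary lines get FLUSH semantics; fully covered lines in the middle are safe to discard outright.

```c
#include <stdint.h>

#define LINE 64u  /* assumed line size; real code would query the core */

/* Counters let us observe which operation the model chose per line. */
static int flushes, invals;
static void flush_line(uintptr_t a)      { (void)a; flushes++; }
static void invalidate_line(uintptr_t a) { (void)a; invals++; }

/* Destructive INVALIDATE over [start, end): round outward to line
   boundaries, but treat a partially covered first or last line as
   FLUSH (writeback + invalidate) so bytes outside the range survive. */
static void safe_invalidate(uintptr_t start, uintptr_t end)
{
    uintptr_t first = start & ~(uintptr_t)(LINE - 1);     /* round down  */
    uintptr_t last  = (end - 1) & ~(uintptr_t)(LINE - 1); /* last line   */

    for (uintptr_t a = first; a <= last; a += LINE) {
        int partial_head = (a == first && start != first);
        int partial_tail = (a == last  && end   != last + LINE);
        if (partial_head || partial_tail)
            flush_line(a);        /* preserve bytes outside the range */
        else
            invalidate_line(a);   /* fully covered: safe to discard   */
    }
}
```

For example, invalidating 200 bytes starting at offset 0x10 touches four lines; the first and last are partial and get flushed, while the two middle lines are purely invalidated.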
The PREZERO and PREFETCH operations are not our current focus. Could they go in a separate extension, or be optional within the same extension?
-- Chuanhua
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/3ee76b4b-8cf8-45a2-ba26-44a3bef1b10e%40groups.riscv.org.
On 19 Jul 2017, at 10:34 AM, Bruce Hoult <br...@hoult.org> wrote:

I'm not in favour of implicit looping solutions either, i.e. "invalidate everything between lo and hi". It seems far better to me to have an instruction that only promises to invalidate (for example) *something* in the range between lo and hi, and returns how much work it did (e.g. an updated value for lo).
You should be able to implement it as a move from the “high” source register to the destination register, I believe. Hopefully not that much of a burden? That said, we may be past the point where this could go in the base ISA, just because it can’t be implemented as a NOP (I don’t recall how RISC-V deals with undefined opcodes; if it’s always a trap, it might be OK…).
Thanks,
Alex
From: Sean Halle [mailto:sean...@gmail.com]
Sent: Tuesday, July 18, 2017 3:56 PM
To: Bruce Hoult <br...@hoult.org>
Cc: Guy Lemieux <glem...@vectorblox.com>; chuanhua.chang <chuanhu...@gmail.com>; RISC-V ISA Dev <isa...@groups.riscv.org>; Jacob Bachmeyer <jcb6...@gmail.com>
Subject: Re: [isa-dev] Proposal: Explicit cache-control instructions (draft 2 after feedback)
Hi, I have been following this a bit; it looks like you're making progress. I was wondering what your thinking is about base ISA vs. extension? If they are all in the base ISA and the compiler targets them, then if we implement them all as NOPs, we will end up with binaries out there that fail, yes? What is the thinking around scenarios like that?
Thanks,
Sean
On Tue, Jul 18, 2017 at 3:34 PM, Bruce Hoult <br...@hoult.org> wrote:
On Tue, Jul 18, 2017 at 9:01 PM, Guy Lemieux <glem...@vectorblox.com> wrote:
Thanks for the suggestion Chuanhua.
What does "VA based" mean? Virtual Address based?
You have 1 operation for icache (invalidate) and 3 operations for dcache (invalidate, writeback, writeback+invalidate).
These correspond to ones I have been advocating, except with the name writeback+invalidate = flush.
The instructions advocated by Jacob and myself are all range-based. Typical ISAs operate on a single cache line and require software to know the cache line size, which leads to software bugs (as infamously reported earlier: a big.LITTLE ARM system had different cache line sizes in its cores, and software that forgot to check when it migrated between cores hit a bug).
It's not a question of forgetting to check. You can't realistically check! An app doesn't get any notification of when it is migrated from one core to another. Even if the app polls for the current core's cache line size immediately before using the cache control instruction there is still a possibility of being migrated between any two instructions.
I'm not in favour of implicit looping solutions either, i.e. "invalidate everything between lo and hi". It seems far better to me to have an instruction that only promises to invalidate (for example) *something* in the range between lo and hi, and returns how much work it did (e.g. an updated value for lo).
It is then software's responsibility to check if lo is still lower than hi, and loop to do more work if so.
Some implementations might choose to do everything in one instruction (perhaps only if not interrupted), but I'd expect many or most to only do one cache line at a time.
The important point is that whatever CPU runs the cache-invalidate instruction at that moment knows its own cache line size, and thus updates lo appropriately. So it doesn't matter if the process gets migrated in the middle of the loop.
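The scheme described above can be modeled like this. The function names are hypothetical: `cache_invalidate_step()` stands in for the proposed instruction (here taking the line size as an explicit parameter to model that each core applies its *own* line size), and the outer loop is the software's responsibility.

```c
#include <stdint.h>

/* Stand-in for the proposed instruction: invalidate one cache line
   containing lo (the invalidation itself is omitted in this model)
   and return the address of the next line, using the current core's
   line size.  A migration between iterations is harmless because the
   next iteration simply uses the new core's line size. */
static uintptr_t cache_invalidate_step(uintptr_t lo, uintptr_t line_size)
{
    return (lo & ~(line_size - 1)) + line_size;
}

/* Software loop: keep issuing the instruction until lo reaches hi.
   Returns the number of steps taken, to make the progress visible. */
static int invalidate_range(uintptr_t lo, uintptr_t hi, uintptr_t line_size)
{
    int steps = 0;
    while (lo < hi) {
        lo = cache_invalidate_step(lo, line_size);
        steps++;
    }
    return steps;
}
```

An implementation is free to do more than one line per step (the instruction would just return a larger updated lo), and the same software loop still terminates correctly.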
On 19 Jul 2017, at 5:34 PM, Allen J. Baum <allen...@esperantotech.com> wrote:

At 7:15 PM -0500 7/18/17, Jacob Bachmeyer wrote: .....
Instruction cacheline pinning is proposed to address an issue that was raised with the HiFive board that I suspect other implementations may also have, where writes to flash preclude concurrently running from flash, but there is an instruction cache available that will make this work, if the flash-write code is cached before the process starts.
As a data point, Intel processors will load chunks of code into cache and execute there, because at boot they would otherwise be looping on (very) slow serial EPROM, and boot would take forever. In that particular case, I can’t recall if they actually lock anything or are just very, very careful to make sure the code can't miss (by knowing cache size and wayness, and allocating addresses carefully).
In contrast, wouldn't a TCM/scratchpad RAM almost always solve all of these problems? It can be done on a per-system basis (where needed) without changing the ISA, and without forcing all implementations to carry the baggage of the extra instruction decode, cache coherence state, etc. One thing that may be desired is an easy way to query a system whether it contains a TCM/scratchpad, but this should be done at the OS level, eg in a device tree or such.
For a small amount of code, a scratchpad can be a very significant amount of area.
I don't see why it should be slower; it is basically saying that if there is an eviction, choose someone else. That's off the critical path. Muxing in scratchpad data may actually slow down access to the cache, however, since you've just added a mux and a bunch of logic that has to turn off cache accesses (and if you can't do that fast enough, you haven't saved any power).
--
**************************************************
* Allen Baum tel. (908)BIT-BAUM *
* 248-2286 *
**************************************************
What does "VA based" mean? Virtual Address based?
Also, I have not taken care to distinguish between icache, dcache, or both as targets. Presumably, an invalidate on any address in the dcache would also need to invalidate the icache. Likewise for flush (writeback + invalidate), which does an invalidate on the icache. Writeback would only operate on the dcache. Do you see a reason there must be explicit targets (icache, dcache, or even both) in the ISA, rather than letting hardware manage this implicitly?
Finally, your Lock proposal is similar to Jacob's Pin. Generally, however, I don't see why locking is necessary or attractive when a TCM/scratchpad is superior in most cases. To add locking/pinning of cache lines, someone has to think through all of the cases of processors with coherent caches, non-coherent caches, multithreading (sharing), VMs/hypervisors, etc. Also, FPGAs almost always use direct-mapped caches, in which case both obeying the hint and ignoring it may have negative performance consequences, so it becomes difficult to decide which to do. Locking almost always has negative performance consequences which must be guarded against, and almost always uses more power than a TCM/scratchpad.

In contrast, wouldn't a TCM/scratchpad RAM almost always solve all of these problems? It can be done on a per-system basis (where needed) without changing the ISA, and without forcing all implementations to carry the baggage of the extra instruction decode, cache coherence state, etc. One thing that may be desired is an easy way to query a system as to whether it contains a TCM/scratchpad, but this should be done at the OS level, e.g. in a device tree or such.
Thanks to Allen for offering this answer:
“For a small amount of code, a scratchpad can be a very significant amount of area. I don't see why it should be slower; it is basically saying that if there is an eviction, choose someone else. That's off the critical path. Muxing in scratchpad data may actually slow down access to the cache, however, since you've just added a mux and a bunch of logic that has to turn off cache accesses (and if you can't do that fast enough, you haven't saved any power)”
To have a competitive commercial product, our customers use different approaches for their designs. Not everyone wants to have both cache and TCM on their chip. In some use cases, the performance of locked code is more important than the non-locked code. The cases vary a lot. Having a cache locking instruction is a good tool for our customers to tune performance. Sure, locking too much will definitely impact performance negatively. But this is an advanced expert feature and users have to use it with care.