Cache flush/invalidate instructions

Joel Vandergriendt

Jun 9, 2017, 1:39:56 PM

to isa...@groups.riscv.org

Hi all,

I am wondering if anyone has created a core with cache invalidation/flush instructions. I think I read somewhere that the spec assumes coherency is always handled by hardware. We plan to ignore that for various reasons, but if anyone has a method for software-controlled caches that they would like to share, I think we could benefit from it.

Joel Vandergriendt

Vectorblox Computing Inc.

Guy Lemieux

Jun 15, 2017, 6:23:17 PM

to Joel Vandergriendt, isa...@groups.riscv.org

To follow up on Joel's query: we are implementing caches for a microcontroller-class RISC-V core and find some parts of the spec unclear.

Our caches are simple, not intended to be cache coherent / snoopy, but there will be IO devices with DMA.

(1) Is FENCE intended to force a writeback of the data cache in a non-coherent system? This seems to be a requirement so that external IO devices using DMA can observe writes done before the FENCE.

(2) Is FENCE intended to invalidate the data cache in a non-coherent system? The spec is very vague here. On the one hand, you could argue this is required so that when a DMA device writes fresh data to memory, the CPU picks it up because it is forced to miss in the cache. On the other hand, this amounts to implementing a coherence event, which the spec does not strictly require.

(3) Is FENCE.I intended to force a writeback of the data cache AND invalidate the instruction cache in a non-coherent system? This seems to be true, because instruction fetches need to pick up the latest values the CPU may have written prior to the FENCE.I. On the other hand, the spec should not necessarily force a coherence event in a non-coherent system, so although a data cache writeback would be prudent, it may not be strictly necessary.

(4) We propose adding the following new instructions for cache management:

INVAL rd,rs1,rs2

WBACK rd,rs1,rs2

FLUSH rd,rs1,rs2

where rs1 and rs2 define a memory range (the cache lines starting at rs1 and ending at rs2, inclusive)

INVAL invalidates the data cache lines in the range (dirty data is discarded)

WBACK writes back dirty lines in the data cache, marking them clean and valid

FLUSH writes back dirty lines in the data cache, marking them invalid
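A toy model can make the three proposed semantics concrete. The sketch below is an illustration only, not the proposers' design. It assumes a 32-byte line, gives every memory line its own cache slot (so capacity and eviction are ignored), and stores one word per line. The names inval, wback, flush, store and line_of are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* assumed line size in bytes */
#define NUM_LINES 8u   /* toy memory: 8 lines, one word each */

typedef struct {
    bool valid;
    bool dirty;
    int data;
} line_t;

static int memory[NUM_LINES];   /* backing store */
static line_t cache[NUM_LINES]; /* one cache slot per memory line */

/* Index of the line containing byte address addr. */
static unsigned line_of(uintptr_t addr) { return (unsigned)(addr / LINE_SIZE); }

/* Write-allocate store: the line becomes valid and dirty. */
static void store(uintptr_t addr, int v) {
    line_t *l = &cache[line_of(addr)];
    l->valid = true;
    l->dirty = true;
    l->data = v;
}

/* INVAL rs1,rs2: drop every line in [rs1, rs2]; dirty data is discarded. */
static void inval(uintptr_t rs1, uintptr_t rs2) {
    assert(rs1 <= rs2 && line_of(rs2) < NUM_LINES);
    for (unsigned i = line_of(rs1); i <= line_of(rs2); i++)
        cache[i].valid = cache[i].dirty = false;
}

/* WBACK rs1,rs2: write dirty lines to memory; they stay valid, now clean. */
static void wback(uintptr_t rs1, uintptr_t rs2) {
    assert(rs1 <= rs2 && line_of(rs2) < NUM_LINES);
    for (unsigned i = line_of(rs1); i <= line_of(rs2); i++)
        if (cache[i].valid && cache[i].dirty) {
            memory[i] = cache[i].data;
            cache[i].dirty = false;
        }
}

/* FLUSH rs1,rs2: write dirty lines to memory, then invalidate them. */
static void flush(uintptr_t rs1, uintptr_t rs2) {
    wback(rs1, rs2);
    inval(rs1, rs2);
}
```

For example, after store(0, 10), wback(0, 31) leaves line 0 valid and clean with memory[0] == 10. inval(0, 31) would instead have thrown the 10 away.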

We also think the following instructions will be useful:

FENCE rs1,rs2

FENCE.I rs1,rs2

These FENCE variants work as before, but only across a defined memory region. That is, they do not invalidate the entire cache, only the cache lines that hold data within the defined region. Thus, you can use FENCE.I rs1,rs2 to invalidate a region of memory that was written by self-modifying code without destroying the whole i-cache footprint. Likewise, you can use FENCE rs1,rs2 to prepare a DMA buffer region before issuing a command to an IO device to do a DMA READ from memory.
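One possible reading of a ranged FENCE.I is: "write back the dirty d-cache lines in the range, then invalidate only the i-cache lines in the range". The sketch below models that reading with separate toy i- and d-caches (one word per line, no capacity limits). The reading is an assumption, and fence_i_range, ifetch and the other helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* assumed line size in bytes */
#define NUM_LINES 16u  /* toy memory: 16 lines, one word each */

typedef struct { bool valid; bool dirty; uint32_t data; } line_t;

static uint32_t memory[NUM_LINES];
static line_t dcache[NUM_LINES];  /* write-back data cache */
static line_t icache[NUM_LINES];  /* instruction cache (never dirty) */

static unsigned line_of(uintptr_t a) { return (unsigned)(a / LINE_SIZE); }

/* Stores go through the d-cache only (write-allocate, write-back). */
static void store(uintptr_t a, uint32_t v) {
    line_t *l = &dcache[line_of(a)];
    l->valid = true;
    l->dirty = true;
    l->data = v;
}

/* Instruction fetch refills the i-cache from memory, never from the d-cache. */
static uint32_t ifetch(uintptr_t pc) {
    line_t *l = &icache[line_of(pc)];
    if (!l->valid) {
        l->data = memory[line_of(pc)];
        l->valid = true;
    }
    return l->data;
}

/* Hypothetical ranged FENCE.I: publish dirty d-cache lines in [rs1, rs2],
   then invalidate only the i-cache lines in that range. */
static void fence_i_range(uintptr_t rs1, uintptr_t rs2) {
    assert(rs1 <= rs2 && line_of(rs2) < NUM_LINES);
    for (unsigned i = line_of(rs1); i <= line_of(rs2); i++) {
        if (dcache[i].valid && dcache[i].dirty) {
            memory[i] = dcache[i].data;
            dcache[i].dirty = false;
        }
        icache[i].valid = false;
    }
}

/* How much of the i-cache footprint is still valid. */
static unsigned icache_valid_lines(void) {
    unsigned n = 0;
    for (unsigned i = 0; i < NUM_LINES; i++)
        n += icache[i].valid ? 1u : 0u;
    return n;
}
```

Patching one line of code and fencing just that line costs one i-cache line in this model. A whole-cache FENCE.I would have invalidated all sixteen.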

We have thought about adding the above instructions by mapping them to CSR writes. Unfortunately, each one requires a range of addresses, and we wish to issue the command in an atomic and stateless fashion.

The CPU will stall during the above events.

We can allow INVAL/WBACK/FLUSH instructions to be interrupted, using rd to store intermediate state so we can resume after the interrupt.
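The message does not say exactly how rd would carry the intermediate state, so the following is only one possible scheme. The instruction processes lines until an interrupt arrives. It then writes to rd the base address of the first line it has not processed, and software re-issues it with rs1 set to that value. If rd and rs1 name the same register, simply re-executing the instruction after the trap returns would resume it. In the sketch, budget stands in for the number of lines handled before an interrupt. wback_step and wback_range are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* assumed line size in bytes */
#define NUM_LINES 64u

static bool dirty[NUM_LINES];  /* toy dirty bits, one per line */
static unsigned writebacks;    /* lines written to memory so far */

static uintptr_t line_base(uintptr_t a) { return a - a % LINE_SIZE; }

/* Hypothetical interruptible WBACK rd,rs1,rs2: handles at most `budget`
   lines before an (assumed) interrupt, then returns in rd the base address
   of the first line it did not handle. A result greater than rs2 means the
   whole range is done. */
static uintptr_t wback_step(uintptr_t rs1, uintptr_t rs2, unsigned budget) {
    uintptr_t a = line_base(rs1);
    for (; a <= rs2 && budget > 0; a += LINE_SIZE, budget--) {
        unsigned i = (unsigned)(a / LINE_SIZE);
        assert(i < NUM_LINES);
        if (dirty[i]) {
            dirty[i] = false;  /* written back; this progress is kept */
            writebacks++;
        }
    }
    return a;
}

/* Re-issue with rs1 = previous rd until the range is complete.
   Returns how many times the instruction was issued. */
static unsigned wback_range(uintptr_t start, uintptr_t end, unsigned budget) {
    assert(budget > 0);  /* otherwise no forward progress is possible */
    unsigned issues = 0;
    for (uintptr_t rd = start; rd <= end; rd = wback_step(rd, end, budget))
        issues++;
    return issues;
}
```

With ten dirty lines and an interrupt after every third line, wback_range issues the instruction four times. Each line is still written back exactly once.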

Can we allow FENCE and FENCE.I to be interrupted? If they are actually flushing caches, then this could create a very long/nondeterministic interrupt service latency. It's unclear whether we can interrupt them without starting over after the interrupt is serviced.

Guy

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CABq8-it2rHnTr9N2Hym%2By%3DSFztEYzUJqeCgSOW7tXPA6vSDH-Q%40mail.gmail.com.

Jacob Bachmeyer

Jun 15, 2017, 7:40:26 PM

to Guy Lemieux, Joel Vandergriendt, isa...@groups.riscv.org

I do not see where INVAL could ever be useful in RISC-V.

> WBACK writes dirty lines in data cache, marking them clean and valid
> FLUSH writes dirty lines in data cache, marking them invalid

Why distinguish these cases? Both of these write dirty cachelines to
memory and leave those cachelines available for reallocation.

> we also think the following instructions will be useful:
>
> FENCE rs1,rs2
> FENCE.I rs1,rs2
>
> These FENCE variants work as before, but only across a defined memory
> region. That is, they do not invalidate the entire cache, only cache
> lines that hold data within the defined memory region. Thus, you can
> use FENCE.I rs1,rs2 to invalidate a region of memory that was written
> by self-modifying code, without destroying the whole i-cache
> footprint. Likewise, you can FENCE rs1,rs2 to prepare a DMA buffer
> region prior to issuing a command to an IO device to do a DMA READ
> from memory.

FENCE instructions are I-type and do not have an rs2 field. Changing
them to S-type would break backwards compatibility.
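The sketch below decodes the two existing MISC-MEM encodings using the base-ISA bit positions to show why the fields collide. In FENCE, bits 24:20 are where an S-type or R-type instruction keeps rs2. Here they are part of the I-type immediate: they hold the low bit of pred and all of succ. 0x0FF0000F is the standard encoding of plain `fence` (fence iorw, iorw) and 0x0000100F is `fence.i`. The helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Standard encodings of two existing MISC-MEM instructions. */
#define OP_MISC_MEM     0x0Fu
#define FENCE_IORW_IORW 0x0FF0000Fu  /* fence iorw, iorw (plain "fence") */
#define FENCE_I         0x0000100Fu  /* fence.i */

/* I-type fields. */
static unsigned opcode(uint32_t insn) { return insn & 0x7Fu; }
static unsigned rd(uint32_t insn)     { return (insn >> 7) & 0x1Fu; }
static unsigned funct3(uint32_t insn) { return (insn >> 12) & 0x7u; }
static unsigned rs1(uint32_t insn)    { return (insn >> 15) & 0x1Fu; }

/* Bits 24:20: rs2 in S-type/R-type, but part of imm[11:0] in I-type. */
static unsigned rs2_slot(uint32_t insn) { return (insn >> 20) & 0x1Fu; }

/* FENCE's immediate: pred in bits 27:24, succ in bits 23:20. */
static unsigned pred(uint32_t insn) { return (insn >> 24) & 0xFu; }
static unsigned succ(uint32_t insn) { return (insn >> 20) & 0xFu; }
```

Decoding plain `fence` through an S-type view yields rs2 = x31. That is why giving the existing FENCE encoding a register rs2 would change the meaning of existing binaries.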

> We have thought about the adding the above instructions by mapping
> them to CSR writes. Unfortunately, they each require a range of
> addresses, and we wish to issue that command in an atomic and
> stateless fashion.
>
> The CPU will stall during the above events.
>
> We can allow INVAL/WBACK/FLUSH instructions to be interrupted, using
> rd to store intermediate state so we can resume after the interrupt.
>
> Can we allow FENCE and FENCE.I to be interrupted? If they are actually
> flushing caches, then this could create a very long/nondeterministic
> interrupt service latency. It's unclear we can interrupt without
> starting over again after the interrupt is serviced.

Interrupts are loosely specified, as I understand it, and implementations
choose whether to take the interrupt before executing the next
instruction or to discard a partially completed instruction. Either
way, *epc is loaded with the address of the instruction where execution
should continue after the interrupt. Starting over should be fine,
as the cache will have fewer dirty lines after the ISR returns.


Guy Lemieux

Jun 15, 2017, 7:54:33 PM

to jcb6...@gmail.com, Joel Vandergriendt, isa...@groups.riscv.org

>> (4) We propose adding the following new instructions for cache management:
>> INVAL rd,rs1,rs2
>> WBACK rd,rs1,rs2
>> FLUSH rd,rs1,rs2
>>
>> where rs1,rs2 defines a Memory Range (cache line starting with rs1, ending with rs2, inclusive)
>>
>> INVAL invalidates the data cache (dirty data discarded)
>
> I do not see where INVAL could ever be useful in RISC-V.

Scenario: a device buffer has been allocated in memory, and you should remove its contents from the data cache. However, there is no need to write back any existing content that may be in the data cache, since the IO device will clobber it anyway, so a FLUSH or FENCE would be excessive overhead.
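The receive scenario can be sketched as follows, under these assumptions: a four-line DMA buffer, a non-coherent CPU cache holding one word per line, and a device that writes RAM directly. Discarding the stale lines with INVAL before the transfer makes the CPU's later loads miss and fetch the device's data, with no writebacks. FLUSH reaches the same end state but writes back data the device then overwrites. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_LINES 4  /* toy DMA buffer: 4 lines, one word each */

typedef struct { bool valid; bool dirty; int data; } line_t;

static int memory[BUF_LINES];    /* the buffer in RAM */
static line_t cache[BUF_LINES];  /* CPU's non-coherent copy of it */
static unsigned mem_writes;      /* CPU writebacks to the buffer */

static int load(int i) {
    if (!cache[i].valid)
        cache[i] = (line_t){ true, false, memory[i] };  /* miss: fill from RAM */
    return cache[i].data;
}

static void store(int i, int v) { cache[i] = (line_t){ true, true, v }; }

/* INVAL over the buffer: discard lines, including dirty ones. */
static void inval_buf(void) {
    for (int i = 0; i < BUF_LINES; i++)
        cache[i].valid = cache[i].dirty = false;
}

/* FLUSH over the buffer: write back dirty lines, then discard them. */
static void flush_buf(void) {
    for (int i = 0; i < BUF_LINES; i++) {
        if (cache[i].valid && cache[i].dirty) {
            memory[i] = cache[i].data;
            mem_writes++;
        }
        cache[i].valid = cache[i].dirty = false;
    }
}

/* The device writes RAM directly; the CPU cache does not see it. */
static void device_dma_write(int first_value) {
    for (int i = 0; i < BUF_LINES; i++)
        memory[i] = first_value + i;
}
```

In a real write-back cache, discarding dirty lines before the transfer also prevents a later eviction from overwriting the device's data. The toy model has no evictions.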

>> WBACK writes dirty lines in data cache, marking them clean and valid
>> FLUSH writes dirty lines in data cache, marking them invalid
>
> Why distinguish these cases? Both of these write dirty cachelines to memory and leave those cachelines available for reallocation.

WBACK does not erase the data from the cache footprint, so reads can still be accelerated.

Consider the case where you must ensure all data has been written to a DMA buffer before a device copies it from RAM to disk. Performing a complete FLUSH is excessive overhead, so the WBACK would be a faster version.
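The transmit case can be sketched the same way, with the same toy assumptions and hypothetical names. WBACK makes the CPU's writes visible to a device that reads RAM directly, while keeping the lines valid so later CPU loads still hit. FLUSH also publishes the data but forces those loads to miss.

```c
#include <assert.h>
#include <stdbool.h>

#define BUF_LINES 4  /* toy DMA buffer: 4 lines, one word each */

typedef struct { bool valid; bool dirty; int data; } line_t;

static int memory[BUF_LINES];    /* the buffer in RAM */
static line_t cache[BUF_LINES];  /* CPU's non-coherent copy of it */
static unsigned misses;          /* CPU loads that had to go to RAM */

static int load(int i) {
    if (!cache[i].valid) {
        misses++;
        cache[i] = (line_t){ true, false, memory[i] };
    }
    return cache[i].data;
}

static void store(int i, int v) { cache[i] = (line_t){ true, true, v }; }

/* Write back dirty lines over the buffer. keep_valid = true models WBACK
   (lines stay cached, now clean); false models FLUSH (lines are dropped). */
static void writeback_buf(bool keep_valid) {
    for (int i = 0; i < BUF_LINES; i++) {
        if (cache[i].valid && cache[i].dirty) {
            memory[i] = cache[i].data;
            cache[i].dirty = false;
        }
        if (!keep_valid)
            cache[i].valid = false;
    }
}

/* The device reads RAM directly, e.g. when copying the buffer to disk. */
static int device_dma_read_sum(void) {
    int sum = 0;
    for (int i = 0; i < BUF_LINES; i++)
        sum += memory[i];
    return sum;
}
```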

>> we also think the following instructions will be useful:
>>
>> FENCE rs1,rs2
>> FENCE.I rs1,rs2
>>
>> These FENCE variants work as before, but only across a defined memory region. That is, they do not invalidate the entire cache, only cache lines that hold data within the defined memory region. Thus, you can use FENCE.I rs1,rs2 to invalidate a region of memory that was written by self-modifying code, without destroying the whole i-cache footprint. Likewise, you can FENCE rs1,rs2 to prepare a DMA buffer region prior to issuing a command to an IO device to do a DMA READ from memory.
>
> FENCE instructions are I-type and do not have an rs2 field. Changing them to S-type would break backwards compatibility.

We are not changing the existing FENCE instructions, which have rd and rs1 fields.

We would be adding new FENCE instructions that would not be I-type. Perhaps I should change the names to FENCEMR and FENCEMR.I to emphasize their different encoding, but we have not yet determined suitable names.

>> Can we allow FENCE and FENCE.I to be interrupted? If they are actually flushing caches, then this could create a very long/nondeterministic interrupt service latency. It's unclear we can interrupt without starting over again after the interrupt is serviced.
>
> Interrupts are loosely-specified as I understand and implementations choose whether to take the interrupt before executing the next instruction or to discard a partially-completed instruction. Either way, *epc is loaded with the address of the instruction where execution should continue after an interrupt. Starting over again should be fine, as the cache will have fewer dirty lines after the ISR returns.

There must be a guarantee of forward progress; otherwise you can get livelock. The ISR can introduce new dirty lines, which need to be flushed during FENCE/FENCE.I. The FENCE semantics talk about predecessor sets, but it isn't clear that instructions in an interrupt service routine would be deemed predecessors if the FENCE is interrupted.
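The disagreement about restarting can be made concrete with a deliberately simplified model. These are assumptions, not a description of any real core:

- The flush cleans per_interval lines between interrupts.
- Lines it has cleaned stay clean even though the instruction restarts.
- Each ISR dirties isr_dirty previously clean lines.
- The cache holds cache_lines lines.

Under these assumptions, restarting converges when the flush outpaces the ISR, which is Jacob's expectation. It livelocks when the ISR dirties lines at least as fast as they are cleaned, which is Guy's concern. flush_completes is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a restart-on-interrupt flush finishes within
   max_intervals interrupt intervals under the assumptions above. */
static bool flush_completes(unsigned dirty, unsigned per_interval,
                            unsigned isr_dirty, unsigned cache_lines,
                            unsigned max_intervals) {
    assert(dirty <= cache_lines && per_interval > 0);
    for (unsigned t = 0; t < max_intervals; t++) {
        if (dirty <= per_interval)
            return true;          /* finishes before the next interrupt */
        dirty -= per_interval;    /* cleaned lines stay clean across restart */
        dirty += isr_dirty;       /* the ISR dirties new lines */
        if (dirty > cache_lines)
            dirty = cache_lines;  /* cannot exceed cache capacity */
    }
    return false;                 /* no completion within the bound */
}
```

If isr_dirty < per_interval, the dirty count falls by their difference each interval, so the flush eventually completes. If isr_dirty >= per_interval, the count never drops, and only a resumable or bounded scheme guarantees progress.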

Thanks,

Guy

Tommy Thorn

Jun 15, 2017, 8:05:09 PM

to Guy Lemieux, jcb6...@gmail.com, Joel Vandergriendt, isa...@groups.riscv.org

> Scenario: a device buffer has been allocated in memory, you should remove its contents from the data cache. However, there is no need to write back any existing content that may be in the data cache, since the IO device will clobber it anyways, so a FLUSH or FENCE would be excessive overhead.

Common scenario #2: a two-space garbage collector that has finished migrating objects from one half to the other has no need for the old copies of the migrated objects to clutter up the cache. Without such an instruction, the cache might evict useful lines while useless lines remain (especially true for an LRU cache).

In this scenario, an efficient way to *validate* a cache line and make it dirty would also be very helpful. Of course, for security reasons, the line would probably have to be cleared, making this roughly equivalent to

invalidate(dest);
memset(dest, 0, linesize);

but more efficient.
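The comparison can be sketched in a toy write-allocate cache in which a store miss first reads the whole line from memory. That fill-on-store-miss policy is an assumption. Zeroing a line with invalidate followed by ordinary stores costs one line fill whose data is immediately overwritten. A hypothetical "validate and zero" operation allocates the line dirty and zero-filled without reading memory, and the zero fill is what prevents stale data from leaking. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 32  /* assumed line size in bytes */
#define NUM_LINES 4

typedef struct { bool valid; bool dirty; unsigned char bytes[LINE_SIZE]; } line_t;

static unsigned char memory[NUM_LINES][LINE_SIZE];
static line_t cache[NUM_LINES];
static unsigned fills;  /* whole-line reads from memory */

/* Write-allocate store of one byte: a miss first fills the whole line. */
static void store_byte(unsigned line, unsigned off, unsigned char v) {
    line_t *l = &cache[line];
    if (!l->valid) {
        memcpy(l->bytes, memory[line], LINE_SIZE);
        l->valid = true;
        fills++;
    }
    l->bytes[off] = v;
    l->dirty = true;
}

static void invalidate_line(unsigned line) {
    cache[line].valid = cache[line].dirty = false;
}

/* invalidate(dest); memset(dest, 0, linesize); done with ordinary stores. */
static void zero_via_stores(unsigned line) {
    invalidate_line(line);
    for (unsigned off = 0; off < LINE_SIZE; off++)
        store_byte(line, off, 0);
}

/* Hypothetical "validate and zero": allocate the line dirty and zeroed
   without reading memory, so no stale data is ever exposed. */
static void zero_allocate_line(unsigned line) {
    line_t *l = &cache[line];
    memset(l->bytes, 0, LINE_SIZE);
    l->valid = true;
    l->dirty = true;
}
```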

Tommy

Michael Clark

Jun 15, 2017, 8:28:22 PM

to Guy Lemieux, jcb6...@gmail.com, Joel Vandergriendt, isa...@groups.riscv.org

On 16 Jun 2017, at 11:53 AM, Guy Lemieux <glem...@vectorblox.com> wrote:

>>> (4) We propose adding the following new instructions for cache management:
>>> INVAL rd,rs1,rs2
>>> WBACK rd,rs1,rs2
>>> FLUSH rd,rs1,rs2
>>>
>>> where rs1,rs2 defines a Memory Range (cache line starting with rs1, ending with rs2, inclusive)
>>>
>>> INVAL invalidates the data cache (dirty data discarded)
>>
>> I do not see where INVAL could ever be useful in RISC-V.
>
> Scenario: a device buffer has been allocated in memory, you should remove its contents from the data cache. However, there is no need to write back any existing content that may be in the data cache, since the IO device will clobber it anyways, so a FLUSH or FENCE would be excessive overhead.

This makes sense. It removes the overhead of WRITEBACK or FLUSH.

INVAL is like an Acquire.

>>> WBACK writes dirty lines in data cache, marking them clean and valid
>>> FLUSH writes dirty lines in data cache, marking them invalid
>>
>> Why distinguish these cases? Both of these write dirty cachelines to memory and leave those cachelines available for reallocation.
>
> WBACK does not erase the data from the cache footprint, so reads can still be accelerated.
>
> Consider the case where you must ensure all data has been written to a DMA buffer before a device copies it from RAM to disk. Performing a complete FLUSH is excessive overhead, so the WBACK would be a faster version.

I agree. The distinction is important in a cache incoherent system, as successive reads after a FLUSH will cause fetches from the memory system and can be used for synchronisation. WRITEBACK would be used in the case where you want to make sure the memory range is synchronised with the backing store but the contents remain in cache.

WRITEBACK is like a Release.

FLUSH is like a Release-Acquire.
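The analogy can be illustrated with two CPUs that have private, non-coherent caches over shared memory. This is a toy model with one word per line, and all names are hypothetical. The producer publishes its payload and then a flag with WBACK, which plays the role of the release. The consumer must INVAL its copy of the flag before it can observe the update, and INVAL the payload before reading it; that is the acquire side. The model is sequential and does not capture reordering. A real implementation would also need the payload writeback to be ordered before the flag store.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_LINES 2  /* line 0: payload, line 1: ready flag */

typedef struct { bool valid; bool dirty; int data; } line_t;
typedef struct { line_t l[NUM_LINES]; } cache_t;  /* one CPU's private cache */

static int memory[NUM_LINES];  /* shared memory */

static int load(cache_t *c, int i) {
    if (!c->l[i].valid)
        c->l[i] = (line_t){ true, false, memory[i] };  /* miss: fill from memory */
    return c->l[i].data;
}

static void store(cache_t *c, int i, int v) { c->l[i] = (line_t){ true, true, v }; }

/* WBACK: push a dirty line to memory ("release" side). */
static void wback(cache_t *c, int i) {
    if (c->l[i].valid && c->l[i].dirty) {
        memory[i] = c->l[i].data;
        c->l[i].dirty = false;
    }
}

/* INVAL: drop the local copy so the next load reads memory ("acquire" side). */
static void inval(cache_t *c, int i) { c->l[i].valid = c->l[i].dirty = false; }
```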

In fact, I don’t like the term “cache incoherent” and prefer “explicit cache control”. Many GPU architectures use explicit cache control combined with tiers of various memory types. e.g. registers or SRAM for local memory, SRAM for cluster/partitioned shared memory, and GDDR for global memory.


Jacob Bachmeyer

Jun 15, 2017, 8:54:57 PM

to Guy Lemieux, Joel Vandergriendt, isa...@groups.riscv.org

Guy Lemieux wrote:
>
> (4) We propose adding the following new instructions for cache
> management:
> INVAL rd,rs1,rs2
> WBACK rd,rs1,rs2
> FLUSH rd,rs1,rs2
>
> where rs1,rs2 defines a Memory Range (cache line starting with
> rs1, ending with rs2, inclusive)
>
> INVAL invalidates the data cache (dirty data discarded)
>
>
> I do not see where INVAL could ever be useful in RISC-V.
>
>
> Scenario: a device buffer has been allocated in memory, you should
> remove its contents from the data cache. However, there is no need to
> write back any existing content that may be in the data cache, since
> the IO device will clobber it anyways, so a FLUSH or FENCE would be
> excessive overhead.

Fair enough. I had only been thinking about MP synchronization and had
forgotten about I/O.

>
> WBACK writes dirty lines in data cache, marking them clean and
> valid
> FLUSH writes dirty lines in data cache, marking them invalid
>
>
> Why distinguish these cases? Both of these write dirty cachelines
> to memory and leave those cachelines available for reallocation.
>
>
> WBACK does not erase the data from the cache footprint, so reads can
> still be accelerated.
>
> Consider the case where you must ensure all data has been written to a
> DMA buffer before a device copies it from RAM to disk. Performing a
> complete FLUSH is excessive overhead, so the WBACK would be a faster
> version.

Would a ranged I/O FENCE also be appropriate here?

> we also think the following instructions will be useful:
>
> FENCE rs1,rs2
> FENCE.I rs1,rs2
>
> These FENCE variants work as before, but only across a defined
> memory region. That is, they do not invalidate the entire
> cache, only cache lines that hold data within the defined
> memory region. Thus, you can use FENCE.I rs1,rs2 to
> invalidate a region of memory that was written by
> self-modifying code, without destroying the whole i-cache
> footprint. Likewise, you can FENCE rs1,rs2 to prepare a DMA
> buffer region prior to issuing a command to an IO device to do
> a DMA READ from memory.
>
>
> FENCE instructions are I-type and do not have an rs2 field.
> Changing them to S-type would break backwards compatibility.
>
>
> We are not changing the existing FENCE instructions which have rd and
> rs1 fields.
>
> We would be adding new FENCE instructions which would not be I-type.
> Perhaps I should change the name to FENCEMR and FENCEMR.I to emphasize
> their different encoding, but I we have not yet determined suitable names.

We just got rid of a similar case of "same mnemonic produces different
instructions", where the assembler would produce ADDI if ADD was given an
immediate instead of a register. I would prefer not to introduce more
of those. On a side note, I just sent a proposal to the list that calls
these instructions FENCE.RD and FENCE.RI.

>
> Can we allow FENCE and FENCE.I to be interrupted? If they are
> actually flushing caches, then this could create a very
> long/nondeterministic interrupt service latency. It's unclear
> we can interrupt without starting over again after the
> interrupt is serviced.
>
>
> Interrupts are loosely-specified as I understand and
> implementations choose whether to take the interrupt before
> executing the next instruction or to discard a partially-completed
> instruction. Either way, *epc is loaded with the address of the
> instruction where execution should continue after an interrupt.
> Starting over again should be fine, as the cache will have fewer
> dirty lines after the ISR returns.
>
>
> You must be guaranteed to make forward progress or experience
> livelock. The ISR can introduce new dirty lines, which need to be
> flushed during FENCE/FENCE.I. The FENCE semantics talk about
> predecessor sets, but it isn't clear that instructions in an interrupt
> service routine would be deemed a predecessor if the FENCE is interrupted.

This depends on interrupt load, but you are correct. In my experience,
however, if a cache flush can't complete between interrupts, interrupt
load is probably excessive and the system design needs to be rethought.
I expect a well-designed system to have interrupts sufficiently
infrequently that "interrupt during cache flush" is rare and simply
repeating the cache flush will be the right answer.

-- Jacob
