cache maintenance instruction or operation


chuanhua.chang

Jun 8, 2017, 5:23:41 AM
to RISC-V ISA Dev
I cannot find any cache maintenance instruction or operations defined in the RISC-V architecture specification.

Why is that? Do you plan to add such instructions or operations in the near future?

Thanks!

Chuanhua

Bruce Hoult

Jun 8, 2017, 6:18:54 AM
to chuanhua.chang, RISC-V ISA Dev
I can understand why the simplest implementations might not want to have such instructions, but you'd imagine having caches but no cache control instructions might limit commercial success.

I already found this annoying on my HiFive1 board. It has an instruction cache and normally executes user code from addresses in memory-mapped SPI flash. If you want to write anything to flash -- to have some kind of file system, for example -- then you need to temporarily disable the memory mapping. That means you either have to be very sure the code that writes to flash is in the instruction cache and stays there, or else copy it into the 16 KB SRAM and execute it from there. Once you know the range of code addresses you need, copying it to SRAM is approximately as easy as preloading (and locking?) code in the instruction cache would be, given suitable instructions. But then you have that much less space for buffers, program variables, etc.

Cache control instructions normally only require a register containing an address and an operation code (preload, flush, lock, unlock), so don't take up much instruction encoding space.

Most of the operations you'd want are semantically no-ops, so they can be in an existing no-op instruction encoding space and thus have zero impact on the complexity of simple cache-less processors. Of the usual operations, only "allocate as zeroed" is not a no-op. Even "invalidate" is maybe ok if it's undefined whether you end up with the old or new contents -- after all it's unsafe to rely on new contents having been definitely discarded, as a form of "undo", because an interrupt, thread switch, or just some other code may already have caused changes to be written to main memory.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/fea83558-920d-4d62-a584-b7df5bac24aa%40groups.riscv.org.

Jacob Bachmeyer

Jun 9, 2017, 10:37:38 PM
to Bruce Hoult, chuanhua.chang, RISC-V ISA Dev
If I understand correctly, FENCE already provides flush semantics. LOAD
to x0 seems like an ideal preload instruction encoding--a simple
implementation can execute it as written and it will have the correct
semantics.

I would be concerned about lock and unlock either opening side channels
or being useful as a denial-of-service, so those would probably need to
be privileged instructions. Further, these will require two register
operands, a base address and a length, because the cacheline size is
absolutely not part of the RISC-V user ISA and should never be allowed
to be an implicit operand to any instruction.


-- Jacob

Andrew Waterman

Jun 10, 2017, 6:35:30 AM
to Bruce Hoult, jcb6...@gmail.com, RISC-V ISA Dev, chuanhua.chang
Note, though, that loads targeting x0 still have side effects (page faults, MMIO actions, etc.).
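
For comparison, C compilers already expose a prefetch that is a pure hint. A minimal sketch, assuming GCC or Clang: `__builtin_prefetch` is a real builtin in those compilers and is documented not to fault on an invalid address, unlike a load whose result is discarded. The function name `sum_with_prefetch` and the look-ahead distance are illustrative choices, not anything proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance, in elements; a real value would be tuned. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that elements further ahead will be read soon.
 * __builtin_prefetch(addr, rw, locality): rw = 0 means "prefetch for read",
 * locality = 3 means "keep in all cache levels". It is only a hint, so
 * removing the call does not change the result. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3);
        sum += data[i];
    }
    return sum;
}
```

On a target without a prefetch instruction the builtin is expected to compile to nothing, which is the behaviour one would want from a hint encoding.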




Po-wei Huang

Jun 10, 2017, 11:13:37 PM
to Andrew Waterman, Bruce Hoult, jcb6...@gmail.com, RISC-V ISA Dev, chuanhua.chang
Cache instructions are useful, but some implementations have no cache and instead use memory-mapped scratchpad memory or TCM.
Their purpose is to reduce area overhead.
I guess cache instructions might be useful for 90% of players, but they shouldn’t be mandatory.

Po-wei

chuanhua.chang

Jun 11, 2017, 11:08:40 PM
to RISC-V ISA Dev, br...@hoult.org, chuanhu...@gmail.com, jcb6...@gmail.com
I agree with Bruce that a base address register operand alone is enough. A length operand can reduce the number of executed cache maintenance instructions, but it is not necessary. A programmer can simply issue a cache maintenance instruction for each address, stepping by an increment smaller than the smallest cache line size (e.g., every 4 or 8 bytes), to cover the address range.

Chuanhua


On Saturday, June 10, 2017 at 10:37:38 AM UTC+8, Jacob Bachmeyer wrote:
Bruce Hoult wrote:

> Cache control instructions normally only require a register containing
> an address and an operation code (preload, flush, lock, unlock), so
> don't take up much instruction encoding space.


Jacob Bachmeyer

Jun 11, 2017, 11:30:38 PM
to chuanhua.chang, RISC-V ISA Dev, br...@hoult.org
chuanhua.chang wrote:
> I agree with Bruce that only a base address register operand is
> enough. Length operand can reduce the number of executed cache
> maintenance instructions, but it is not necessary. A programmer can
> just use a cache maintenance instruction for every address with size
> increment smaller than the smallest cache line size (e.g., every 4
> bytes or 8 bytes) to cover the address range.

That would make prefetch hugely inefficient.


-- Jacob

Albert Cahalan

Jun 11, 2017, 11:59:48 PM
to chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
On 6/11/17, chuanhua.chang <chuanhu...@gmail.com> wrote:
> On Saturday, June 10, 2017 at 10:37:38 AM UTC+8, Jacob Bachmeyer wrote:

>> Further, these will require two register
>> operands, a base address and a length, because the cacheline size is
>> absolutely not part of the RISC-V user ISA and should never be allowed
>> to be an implicit operand to any instruction.
...
> I agree with Bruce that only a base address register operand is enough.
> Length operand can reduce the number of executed cache maintenance
> instructions, but it is not necessary. A programmer can just use a cache
> maintenance instruction for every address with size increment smaller than
> the smallest cache line size (e.g., every 4 bytes or 8 bytes) to cover the
> address range.

That won't fly. OS developers will assume the line size to get performance,
causing compatibility issues if the line size changes. If the line size changes
despite the compatibility issues, OS developers will resort to self-modifying
code.

Actual cache line size should be difficult to stumble over -- code written
for one machine should work on any other machine, oblivious to different
cache line sizes.

It is probably reasonable to assume that cache line sizes are small powers
of two, so handling arbitrary lengths and alignments isn't so important.
The instructions could have 3 bits that specify a size by shifting 4096 to
the right. That would allow for 32, 64, 128, 256, 512, 1024, 2048, and 4096 as sizes.
The low bits of an address in a cache instruction would not matter.
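
A minimal sketch of how such a 3-bit size field could be decoded, assuming the code is simply the right-shift amount applied to 4096. The names `size_from_code` and `block_base` are hypothetical; no such field exists in any RISC-V encoding discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-bit size field: size = 4096 >> code, with code in 0..7,
 * giving eight power-of-two sizes from 4096 down to 32 bytes. */
static inline uint32_t size_from_code(unsigned code)
{
    assert(code <= 7);
    return 4096u >> code;
}

/* Start of the block that an operation on `addr` would cover; the low
 * address bits are ignored. */
static inline uint64_t block_base(uint64_t addr, unsigned code)
{
    return addr & ~((uint64_t)size_from_code(code) - 1);
}
```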

Another approach is to encode the size in the address. For this approach,
the user would mask off low address bits and then OR in a size indicator.
If the size indicator is half of the desired size, then all sizes larger than 1
can be handled. Math for loops still works fine after ORing the bits in.
Example for 256 bytes: value = ((uint64_t)addr&~0xffull)|(256/2)
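
The second approach can be sketched in C as an encode/decode pair. This is only an illustration of the arithmetic in the example above; the function names are hypothetical. Decoding relies on the indicator being the lowest set bit of the encoded value, which holds because every bit below it was masked off.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a power-of-two block size (>= 2) into the low bits of an address:
 * clear the low log2(size) bits, then OR in size/2 as the indicator. */
static inline uint64_t encode_block(uint64_t addr, uint64_t size)
{
    assert(size >= 2 && (size & (size - 1)) == 0);
    return (addr & ~(size - 1)) | (size / 2);
}

/* The indicator is the lowest set bit of the encoded value. */
static inline uint64_t decode_size(uint64_t value)
{
    assert(value != 0);
    return (value & (~value + 1)) * 2;
}

/* Clearing everything below the block size recovers the base address. */
static inline uint64_t decode_base(uint64_t value)
{
    return value & ~(decode_size(value) - 1);
}
```

Adding the block size to an encoded value yields the encoding of the next block, which is why loop arithmetic keeps working after the indicator has been ORed in.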

Bruce Hoult

Jun 12, 2017, 8:49:09 AM
to Jacob Bachmeyer, chuanhua.chang, RISC-V ISA Dev
On Mon, Jun 12, 2017 at 6:30 AM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
> chuanhua.chang wrote:
>> I agree with Bruce that only a base address register operand is enough. Length operand can reduce the number of executed cache maintenance instructions, but it is not necessary. A programmer can just use a cache maintenance instruction for every address with size increment smaller than the smallest cache line size (e.g., every 4 bytes or 8 bytes) to cover the address range.
>
> That would make prefetch hugely inefficient.

Not really. We have to do that on ARM now anyway, as there exist big.LITTLE implementations with different line sizes and no way to know whether you've been migrated since you queried the line size. The OS has to ensure that the smallest line size in the system is returned.

If the hint instructions will cause any actual memory traffic then doing two or four times more of them than strictly necessary is *way* down in the noise. Even if no memory traffic is caused, stepping by 32 when you could have stepped by 64 or 128 is not a big deal, and far better than stepping by 1 or 4.
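
To put numbers on that, here is a small sketch that counts how many per-address hints a range needs at a given stride. `cache_hint` is a stand-in that only counts calls; it is not a real instruction or API, and `hint_range` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a per-address cache hint; here it only counts invocations.
 * A real implementation would issue whatever instruction the ISA provides. */
static size_t hint_count;
static void cache_hint(uintptr_t addr) { (void)addr; hint_count++; }

/* Issue one hint per `stride` bytes across [start, start + len) and return
 * how many were issued. `stride` must be a power of two and should not
 * exceed the smallest line size in the system. */
size_t hint_range(uintptr_t start, size_t len, size_t stride)
{
    assert(stride != 0 && (stride & (stride - 1)) == 0);
    hint_count = 0;
    if (len == 0)
        return 0;
    uintptr_t addr = start & ~(uintptr_t)(stride - 1);  /* align down */
    uintptr_t last = start + (len - 1);                  /* last byte covered */
    for (;;) {
        cache_hint(addr);
        if (last - addr < stride)   /* this block already contains `last` */
            break;
        addr += stride;
    }
    return hint_count;
}
```

For a 4 KiB buffer this issues 1024 hints at a stride of 4, 128 at a stride of 32, and 64 at a stride of 64, so assuming the smallest plausible line size costs a small constant factor rather than an order of magnitude.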

p.s. I wasn't arguing for making the line size implicit, so one can't really "agree with Bruce" there. I was just pointing out that if there is a reason to not have cache control instructions, neither encoding space nor complexity for cacheless designs is one of them.

I would strongly support encoding a size somehow in either the instruction itself, or in the lower bits of the address. Only a few bits are needed for reasonable variation in line sizes. We've had caches for four decades now, and while line sizes have varied it looks to me more like they change in a cycle rather than slowly growing to infinity. Even three bits would cover the range from 4 bytes to 1024 bytes, which should be pretty future-proof.

Po-wei Huang

Jun 14, 2017, 11:45:16 PM
to Bruce Hoult, Jacob Bachmeyer, chuanhua.chang, RISC-V ISA Dev
Nice discussion!
However, I would like to branch off into another question about ISA management:
If we do want to put these cache instructions into RISC-V, where should we put them?

While these instructions could have custom encodings and architecture, a unified approach could help software programmers a lot.
Should we open up a new extension chapter?
Po-wei


chuanhua.chang

Jun 15, 2017, 1:30:12 AM
to RISC-V ISA Dev, br...@hoult.org, chuanhu...@gmail.com, jcb6...@gmail.com
For an implementation with a write-back data cache and no coherence module to maintain cache coherence, do you mean that a FENCE instruction has to write-back (flush) all dirty cache lines in the data cache to memory?

If this is true, this is a very expensive operation.

Chuanhua



On Saturday, June 10, 2017 at 10:37:38 AM UTC+8, Jacob Bachmeyer wrote:
> If I understand correctly, FENCE already provides flush semantics.
> -- Jacob

Richard Herveille

Jun 15, 2017, 6:42:20 AM
to chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
Actually, FENCE.I does that. And yes, it's an expensive instruction.

Richard 




Bruce Hoult

Jun 15, 2017, 7:18:04 AM
to Po-wei Huang, Jacob Bachmeyer, chuanhua.chang, RISC-V ISA Dev
As I said in my first message in this thread, almost all cache control instructions are only hints and don't change the meaning of the program. To simplify things for low-end processors that don't have caches (or don't implement the cache control instructions), it makes sense to disguise them as instructions that are already no-ops. This includes ADDI or ORI with an immediate of 0 and the same source and destination register, or anything with a destination of x0.
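
A sketch of what "already a no-op" means at the encoding level, restricted to the OP-IMM opcode. The field layout, the OP-IMM opcode (0x13), and the ADDI/ORI funct3 values are from the base ISA; the helper names and the decision to treat shift encodings conservatively are assumptions made for this illustration. Loads and jumps with a destination of x0 are deliberately excluded, since those do have side effects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_OP_IMM 0x13u  /* RISC-V OP-IMM major opcode */
#define FUNCT3_ADDI   0x0u
#define FUNCT3_ORI    0x6u

/* Assemble an I-type instruction word: imm[11:0] | rs1 | funct3 | rd | opcode. */
static inline uint32_t encode_itype(int32_t imm, unsigned rs1, unsigned funct3,
                                    unsigned rd, unsigned opcode)
{
    assert(rs1 < 32 && rd < 32 && funct3 < 8 && opcode < 128);
    assert(imm >= -2048 && imm <= 2047);
    return (((uint32_t)imm & 0xfffu) << 20) | (rs1 << 15) | (funct3 << 12) |
           (rd << 7) | opcode;
}

/* True if `insn` is an OP-IMM instruction that leaves architectural state
 * unchanged: either it writes x0, or it is ADDI/ORI rd, rd, 0. */
static inline bool is_semantic_noop(uint32_t insn)
{
    if ((insn & 0x7fu) != OPCODE_OP_IMM)
        return false;
    unsigned rd     = (insn >> 7) & 0x1fu;
    unsigned funct3 = (insn >> 12) & 0x7u;
    unsigned rs1    = (insn >> 15) & 0x1fu;
    uint32_t imm    = insn >> 20;
    if (funct3 == 0x1u || funct3 == 0x5u)
        return false;  /* shifts have reserved encodings; treated conservatively */
    if (rd == 0)
        return true;
    return rd == rs1 && imm == 0 &&
           (funct3 == FUNCT3_ADDI || funct3 == FUNCT3_ORI);
}
```

A cache-less core would execute any such word exactly as it does today, while a core with caches could decode a subset of them as hints.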


chuanhua.chang

Jun 15, 2017, 8:42:30 AM
to RISC-V ISA Dev, chuanhu...@gmail.com, br...@hoult.org, jcb6...@gmail.com
True. FENCE.I needs to do that. But a "FENCE w, r" would also have to do that, right?

MIPS has a "synci" instruction that takes a virtual address operand. It may be inefficient, but it does not force the entire data cache to be checked and flushed, so it does not flush much more than is needed.

Chuanhua

Richard Herveille

Jun 16, 2017, 1:04:03 AM
to chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
I don't think so. The FENCE instructions currently only guarantee that memory operations complete. So they restrict reordering of memory operations. 
There's been a whole discussion on this group to add new types of FENCE instructions that will do what you want. 

Richard 



Michael Clark

Jun 16, 2017, 5:14:53 AM
to Richard Herveille, chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
It's interesting as to whether FENCE.I has to be expensive and it very much depends on the microarchitecture.

FENCE.I doesn't necessarily mean "flush the icache"; however, some implementations may pessimistically flush the icache as a simple implementation approach. A simple implementation may also need to flush the dcache to the next cache tier if it can't cross-populate the icache from the dcache.

FENCE.I is technically an explicit fence that indicates successive instruction fetches should see previous stores. It is a type of memory fence.

To "sync icache and dcache" on x86 one just needs to execute a JMP instruction, I think since i486. JMP is technically an "implicit" FENCE.I on x86 according to the architecture manual so it obviously has to be relatively cheap, but clearly this requires some microarchitecture gymnastics.

A sophisticated implementation only needs to invalidate icache lines that are dirty in dcache and the icache could cross populate from dcache. In fact the icache may even snoop dcache writes for addresses of lines that are in icache and preemptively populate icache for modified code, with a fence just invalidating instruction prefetch queues. Architectures without an explicit instruction memory fence need to resort to this kind of thing and it's likely necessary for good JIT performance.

So FENCE.I should not be considered an icache flush, rather instruction memory FENCE w,x ; successive instruction fetches see previous stores (using the memory fence notation with 'x' for instruction fetches vs 'r' for loads).
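
From portable C, this ordering requirement is usually expressed without naming any cache at all. A minimal sketch, assuming GCC or Clang: `__builtin___clear_cache(begin, end)` is a real builtin in those compilers, and what it expands to is target-specific (possibly nothing on targets that keep instruction fetch coherent with stores). The wrapper name `publish_code` is hypothetical, and the sketch does not arrange for the buffer to be executable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy freshly generated instructions into `dst`, then ask the toolchain to
 * make those stores visible to subsequent instruction fetches. */
void publish_code(void *dst, const void *code, size_t len)
{
    assert(dst != NULL && code != NULL);
    memcpy(dst, code, len);   /* the "previous stores" */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    /* Only after this point would it be safe to jump into dst, assuming the
     * memory is mapped executable (not handled in this sketch). */
}
```

Note that the interface takes a range even though the underlying instruction may not; an implementation is free to do more work than the range requires.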

Apparently icache/dcache coherency even extends across cores on some architectures. Magic. See http://blog.onlinedisassembler.com/blog/?p=133


Andrew Waterman

Jun 16, 2017, 6:10:22 AM
to Michael Clark, Richard Herveille, chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
Yeah. FENCE.I is an ordering constraint. Nothing in RV specifies
cache flushes, because, by design, nothing in RV specifies caches.

Richard Herveille

Jun 16, 2017, 7:09:15 AM
to Michael Clark, chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
Yes, agreed. 
RISC-V only specifies the ISA, how that is implemented is up to the designer. 
In my embedded core, FENCE.I writes back all dirty data cache lines (although 'all' wouldn't be necessary) and flushes the entire instruction cache (here 'entire' wouldn't be necessary, and a flush wouldn't be necessary either). However, the design goal is a small and simple core.
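
A toy software model of that strategy, to show why it is sufficient on a non-coherent Harvard design. Everything here (the one-word lines, the direct-mapped layout, the names) is a simplification invented for illustration and does not describe any actual core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_WORDS   64
#define CACHE_LINES 8   /* direct-mapped, one word per line, for brevity */

typedef struct { bool valid, dirty; unsigned addr; uint32_t data; } line_t;

typedef struct {
    uint32_t mem[MEM_WORDS];
    line_t   dcache[CACHE_LINES];  /* write-back data cache */
    line_t   icache[CACHE_LINES];  /* instruction cache, never snoops */
} machine_t;

/* Store through the write-back D-cache; memory is updated only on eviction. */
void store_word(machine_t *m, unsigned addr, uint32_t value)
{
    assert(addr < MEM_WORDS);
    line_t *l = &m->dcache[addr % CACHE_LINES];
    if (l->valid && l->dirty && l->addr != addr)
        m->mem[l->addr] = l->data;              /* evict the old dirty line */
    *l = (line_t){ .valid = true, .dirty = true, .addr = addr, .data = value };
}

/* Instruction fetch: the I-cache refills from memory, not from the D-cache. */
uint32_t fetch_word(machine_t *m, unsigned addr)
{
    assert(addr < MEM_WORDS);
    line_t *l = &m->icache[addr % CACHE_LINES];
    if (!l->valid || l->addr != addr)
        *l = (line_t){ .valid = true, .dirty = false, .addr = addr,
                       .data = m->mem[addr] };
    return l->data;
}

/* Heavy-handed FENCE.I: write back every dirty D-cache line, then
 * invalidate the whole I-cache. */
void fence_i(machine_t *m)
{
    for (int i = 0; i < CACHE_LINES; i++) {
        line_t *d = &m->dcache[i];
        if (d->valid && d->dirty) {
            m->mem[d->addr] = d->data;
            d->dirty = false;
        }
        m->icache[i].valid = false;
    }
}
```

In this model a fetch after a store returns the stale word until `fence_i` runs, after which fetches observe the stored value.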

Richard 




Andrew Waterman

Jun 16, 2017, 1:00:45 PM
to Michael Clark, Richard Herveille, RISC-V ISA Dev, br...@hoult.org, chuanhua.chang, jcb6...@gmail.com
Yeah. It's a fine microarchitectural decision, especially for a uniprocessor that doesn't otherwise need to snoop caches.

Once you bite the bullet and have coherence for other reasons, probably to support multiple cores or fast DMA, it makes sense to implement FENCE.I differently--e.g. flush only the I$ and snoop the D$ upon I$ misses; or snoop the I$, too, and flush only the pipeline. But for the smallest and simplest implementations with Harvard caches, flushing it all is the way to go.

Guy Lemieux

Jun 16, 2017, 1:08:17 PM
to Andrew Waterman, Michael Clark, Richard Herveille, RISC-V ISA Dev, br...@hoult.org, chuanhua.chang, jcb6...@gmail.com
This is why a ranged FENCE.I makes sense -- you don't have to flush the whole icache, just the memory range that was modified.

I advocate making both FENCE and FENCE.I R-type instructions, so that rs1 and rs2 can specify a memory range. If rs1=rs2=x0, then the range corresponds to 'all memory'.

Once the FENCE instructions are R-type, we can debate the use of rd, such as a successful completion code (allowing FENCE to be interrupted and forcing software to do a resume) or an address to indicate progress if the memory range was incompletely checked for the fence, or some other useful value.
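
One possible reading of the "progress address in rd" idea, sketched as a software model. This assumes rs1 and rs2 hold the start and end of a half-open range and that rd returns the first address not yet handled; the message does not pin down either point, and the function names, the 64-byte granule, and the step budget are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_LINE 64u  /* hypothetical granule the model handles per step */

static unsigned long lines_processed;  /* instrumentation for the model only */

/* Model of a ranged fence that may stop early. Handles at most `budget`
 * granules of [start, end) and returns the first address NOT yet handled;
 * a return value equal to `end` means the whole range is done. */
uintptr_t ranged_fence_model(uintptr_t start, uintptr_t end, unsigned budget)
{
    assert(start <= end && budget > 0);
    uintptr_t addr = start;
    while (addr < end && budget-- > 0) {
        lines_processed++;
        uintptr_t next = (addr | (MODEL_LINE - 1)) + 1;  /* next granule boundary */
        addr = (next > end || next < addr) ? end : next;
    }
    return addr;
}

/* Software side: re-issue until the returned progress address reaches the
 * end of the range. The budget of 4 simulates an interruption point. */
void fence_range(uintptr_t start, uintptr_t end)
{
    while (start < end)
        start = ranged_fence_model(start, end, 4);
}
```

The resume loop is the cost of making the operation interruptible: software has to check the result and retry rather than treating the fence as a single atomic step.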

Guy



Andrew Waterman

Jun 16, 2017, 1:33:10 PM
to Guy Lemieux, Michael Clark, RISC-V ISA Dev, Richard Herveille, br...@hoult.org, chuanhua.chang, jcb6...@gmail.com
Yeah, we had that eventual option in mind when we chose the instruction encoding. The base ISA won't include this proposal, but I expect that something along the lines of your proposal will eventually be incorporated.

My guess is that it won't prove popular in performant implementations, though. Once there's enough runtime code generation that FENCE.I's performance actually matters, you're better off providing I$ coherence; it will result in better perf/area and perf/W than the other strategies. Same reason that incoherent DMA is an inefficient choice for all but the tiniest systems.


Guy Lemieux

Jul 25, 2017, 9:06:03 PM
to Andrew Waterman, Michael Clark, Richard Herveille, chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
On Fri, Jun 16, 2017 at 3:09 AM, Andrew Waterman <and...@sifive.com> wrote:
> On Fri, Jun 16, 2017 at 2:14 AM, Michael Clark <michae...@mac.com> wrote:
>> It's interesting as to whether FENCE.I has to be expensive and it very much
>> depends on the microarchitecture.
>> [...]
>> So FENCE.I should not be considered an icache flush, rather instruction
>> memory FENCE w,x ; successive instruction fetches see previous stores (using
>> the memory fence notation with 'x' for instruction fetches vs 'r' for
>> loads).
>
> Yeah. FENCE.I is an ordering constraint. Nothing in RV specifies
> cache flushes, because, by design, nothing in RV specifies caches.

>> On 15 Jun 2017, at 07:30, chuanhua.chang <chuanhu...@gmail.com> wrote:
>> For an implementation with a write-back data cache and no coherence module
>> to maintain cache coherence, do you mean that a FENCE instruction has to
>> write-back (flush) all dirty cache lines in the data cache to memory?
>>
>> If this is true, this is a very expensive operation.
>>
>> Chuanhua


Based on Andrew's response above, since nothing in RV specifies
caches, then we must interpret the spec in that context. The spec
says: "The FENCE instruction is used to order device I/O and memory
accesses as viewed by other RISC-V harts and external devices or
coprocessors."

Strictly interpreting Andrew's response suggests that non-coherent
systems must flush dcache so that external devices or coprocessors can
see the writes.

However, this is imposing a coherence operation upon a system designer
that is intending to create a "cheap" non-coherent system.

I view this as an overly strong implicit interpretation, and it needs
to be called out explicitly.

I am also a bit worried about trying to specify coherent type
behaviour to non-coherent systems.

Sincerely,
Guy

Andrew Waterman

Jul 25, 2017, 9:21:37 PM
to Guy Lemieux, Michael Clark, Richard Herveille, chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
The commentary for FENCE.I discusses this.

>
> I am also a bit worried about trying to specify coherent type
> behaviour to non-coherent systems.

There are many ways around mandating coherence to implement FENCE.I.
Some that come to mind:

- Heavy-handed cache flushes.
- Trap FENCE.I and emulate it using another mechanism.
- If appropriate for your platform, set the physical memory attributes
so that it's illegal to execute from incoherent memory.


Po-wei Huang

Aug 9, 2017, 4:32:33 AM
to Andrew Waterman, Guy Lemieux, Michael Clark, Richard Herveille, chuanhua.chang, RISC-V ISA Dev, br...@hoult.org, jcb6...@gmail.com
Following up on this issue, which could lead to fragmentation in software:

So, what’s the primary reason not to include explicit cache control instructions as an extension?

If we want to allow non-coherent DMA or I/O, we would have to include them, right?
It might be too risky to abandon non-coherent DMA completely, and it would also be chaos if each company implemented its own custom cache instructions.

So, in my opinion, it might be better to include them as an extension, despite the complexity that every additional RISC-V extension causes.
After all, in the end, we need a resolution for any issue that could lead to fragmentation.

Po-wei

