Fences between PLIC claim/complete?


Sebastian Huber

Aug 1, 2018, 3:21:33 AM
to RISC-V SW Dev
Hello,

I have the following code that works on QEMU, but not on real hardware (an FPGA):

    volatile RISCV_PLIC_hart_regs *plic_hart_regs;
    uint32_t interrupt_index;

    plic_hart_regs = cpu_self->cpu_per_cpu.plic_hart_regs;

    while ((interrupt_index = plic_hart_regs->claim_complete) != 0) {
      bsp_interrupt_handler_dispatch(
        RISCV_INTERRUPT_VECTOR_EXTERNAL(interrupt_index)
      );
      plic_hart_regs->claim_complete = interrupt_index;
    }

I end up in an endless loop. There seems to be a problem between the write to plic_hart_regs->claim_complete and the immediately following read from it.

If I insert a

__asm__ volatile ("fence i,r" : : : "memory");

or a couple of nops after the

plic_hart_regs->claim_complete = interrupt_index;

then it works. The fence is just a guess based on the Linux arch/riscv/include/asm/io.h.

Is there some documentation available which explains this behaviour? In The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 2.3-draft, Chapter 6, "RVWMO Memory Consistency Model, Version 0.1", there is this:

This chapter defines the memory model for regular main memory operations. The interaction
of the memory model with I/O memory, instruction fetches, FENCE.I, page table walks, and
SFENCE.VMA is not (yet) formalized. Some or all of the above may be formalized in a future
revision of this specification.

Sebastian Huber

Aug 1, 2018, 6:04:33 AM
to RISC-V SW Dev
Do we need something similar to the PowerPC eioio instruction here?

Sebastian Huber

Aug 1, 2018, 6:05:47 AM
to RISC-V SW Dev
Do we need something like the PowerPC eioio instruction here?

Michael Clark

Aug 1, 2018, 8:49:41 AM
to Sebastian Huber, RISC-V SW Dev

On 1/08/2018, at 10:05 PM, Sebastian Huber <seb...@gmail.com> wrote:

Do we need something like the PowerPC eioio instruction here?

Yes, EIEIO - Enforce In-order Execution of IO. The ISA designers must have had a bit of a chuckle with that one.

The FENCE instruction can be used as an IO fence as per your last email. I’m not sure of the correct fence. I personally would need to do some digging.

I would have thought that you want to order predecessor writes (o) before successor reads (i), but then I'm not a memory model expert and could easily have it back to front.

while ((i = plic->claim) != 0) { /* in */
  handle(i);
  plic->claim = i; /* out */
  asm volatile("fence o, i" : : : "memory");
}

Of course you can always use a sledgehammer:

FENCE IORW,IORW

Check with a memory model expert before you follow any of my suggestions. Weak memory models are hard even for experts and ideally this interface should be as intuitive as possible. The memory model is documented in the draft ISA manual:


I have seen expert developers write release as both fence w,rw and fence rw,w on this mailing list in the archives so it’s clearly not easy even for very experienced developers.

I have this question (ISA manual not in front of me right now). Given a release wants to order a store before all other accesses, it seems intuitive that release would be

STORE /* w */
FENCE w,rw

And likewise acquire wants to order a load after any previous reads or writes have completed.

FENCE rw,r
LOAD /* r */

I suggest studying the latest ISA manual. I believe there are mappings for the various C11 memory model operations (load-acquire,store-release,sequentially-consistent), as well as IO, however it would be nice if the operations were simple and intuitive enough such that one could derive them using simple logic based on the ordering one wants.

I don’t have the ISA manual in front of me, so I will check to see if I have derived release and acquire correctly. I am a good candidate for the “dummy test” because I’m inclined to make mistakes.

I’m going to read the ISA manual section on this tomorrow and see if it passes the intuitiveness test for an experienced developer familiar with acquire release in the C11 memory model.

It would also be nice if we could emulate memory models in the simulators. Perhaps not in QEMU on a strongly ordered host machine, but it would be possible in a simulator that had a bus model with latency for multiple outstanding overlapped operations between harts and IO devices. Memory models certainly need to be made easier for “humans”

Please read the draft ISA manual from the git repo. I’m not sure if the version on the web has all of the latest memory model stuff

Thanks,
Michael 

Sebastian Huber

Aug 2, 2018, 3:34:16 AM
to RISC-V SW Dev, seb...@gmail.com
 
Am Mittwoch, 1. August 2018 14:49:41 UTC+2 schrieb Michael Clark:

On 1/08/2018, at 10:05 PM, Sebastian Huber <seb...@gmail.com> wrote:

Do we need something like the PowerPC eioio instruction here?

Yes, EIEIO - Enforce In-order Execution of IO. The ISA designers must have had a bit of a chuckle with that one.

Yes, this one, sorry for the typo.



The FENCE instruction can be used as an IO fence as per your last email. I’m not sure of the correct fence. I personally would need to do some digging.

I would have thought that you want to order predecessor writes (o) before successor reads (i), but then I'm not a memory model expert and could easily have it back to front.

while ((i = plic->claim) != 0) { /* in */
  handle(i);
  plic->claim = i; /* out */
  asm volatile("fence o, i" : : : "memory");
}

Ok, good. I independently guessed this fence too:


It is a different fence compared to the fences used by Linux plic_chained_handle_irq() in drivers/irqchip/irq-riscv-plic.c.




Of course you can always use a sledgehammer:

FENCE IORW,IORW

Check with a memory model expert before you follow any of my suggestions. Weak memory models are hard even for experts and ideally this interface should be as intuitive as possible. The memory model is documented in the draft ISA manual:


I have seen expert developers write release as both fence w,rw and fence rw,w on this mailing list in the archives so it’s clearly not easy even for very experienced developers.

I have this question (ISA manual not in front of me right now). Given a release wants to order a store before all other accesses, it seems intuitive that release would be

STORE /* w */
FENCE w,rw

And likewise acquire wants to order a load after any previous reads or writes have completed.

FENCE rw,r
LOAD /* r */

I suggest studying the latest ISA manual. I believe there are mappings for the various C11 memory model operations (load-acquire,store-release,sequentially-consistent), as well as IO, however it would be nice if the operations were simple and intuitive enough such that one could derive them using simple logic based on the ordering one wants.

I don’t have the ISA manual in front of me, so I will check to see if I have derived release and acquire correctly. I am a good candidate for the “dummy test” because I’m inclined to make mistakes.

I’m going to read the ISA manual section on this tomorrow and see if it passes the intuitiveness test for an experienced developer familiar with acquire release in the C11 memory model.

It would also be nice if we could emulate memory models in the simulators. Perhaps not in QEMU on a strongly ordered host machine, but it would be possible in a simulator that had a bus model with latency for multiple outstanding overlapped operations between harts and IO devices. Memory models certainly need to be made easier for “humans”

Please read the draft ISA manual from the git repo. I’m not sure if the version on the web has all of the latest memory model stuff

 
For high-level synchronization I hope that the compiler is correct; I can do test runs on different targets, e.g. ARM and PowerPC. That makes the situation a bit easier compared with I/O memory accesses, which are clearly RISC-V specific.

Michael Clark

Aug 2, 2018, 7:57:50 AM
to Sebastian Huber, RISC-V SW Dev


On 2/08/2018, at 7:34 PM, Sebastian Huber <seb...@gmail.com> wrote:

 
Am Mittwoch, 1. August 2018 14:49:41 UTC+2 schrieb Michael Clark:

On 1/08/2018, at 10:05 PM, Sebastian Huber <seb...@gmail.com> wrote:

Do we need something like the PowerPC eioio instruction here?

Yes, EIEIO - Enforce In-order Execution of IO. The ISA designers must have had a bit of a chuckle with that one.

Yes, this one, sorry for the typo.



The FENCE instruction can be used as an IO fence as per your last email. I’m not sure of the correct fence. I personally would need to do some digging.

I would have thought that you want to order predecessor writes (o) before successor reads (i), but then I'm not a memory model expert and could easily have it back to front.

while ((i = plic->claim) != 0) { /* in */
  handle(i);
  plic->claim = i; /* out */
  asm volatile("fence o, i" : : : "memory");
}

Ok, good. I independently guessed this fence too:


I would suggest that someone familiar with memory models review this commit.

I am only familiar with C11 atomics. The C11 intrinsics have been created by memory model experts.

It is a different fence compared to the fences used by Linux plic_chained_handle_irq() in drivers/irqchip/irq-riscv-plic.c.

There may be some nuance that I am missing, but intuitively, in this case, one wants to order the preceding I/O write (o) to claim before the subsequent I/O read (i) from claim.

Also note that IO ordering may not be enough if the device has any post write latency. Some devices also require specified delays.

There is work to verify the Linux memory model. I am not sure if it is included in the current set of intrinsics. This is very easy to get wrong even for competent programmers.

Of course you can always use a sledgehammer:

FENCE IORW,IORW

Check with a memory model expert before you follow any of my suggestions. Weak memory models are hard even for experts and ideally this interface should be as intuitive as possible. The memory model is documented in the draft ISA manual:


I have seen expert developers write release as both fence w,rw and fence rw,w on this mailing list in the archives so it’s clearly not easy even for very experienced developers.

I subsequently checked the ISA manual and I was wrong. One has to consider the point of view of other threads for acquire and release: the ordering isn't with respect to loads and stores in the same thread. This is an area I need to study, and it is also a relatively new section in the specification.

I have this question (ISA manual not in front of me right now). Given a release wants to order a store before all other accesses, it seems intuitive that release would be

STORE /* w */
FENCE w,rw

From the ISA Manual:

    Store-Release:  fence rw,w; s{b|h|w|d}

So the correct sequence is to ask for all reads and writes to have completed before the subsequent store. It's a release, so the fence must go before the store, as it is ordering writes with respect to other threads, i.e. the preceding loads and stores must be visible before the store is seen.

This is not easy as I mentioned. One needs to refer to the spec unless one is sure as I caveated.

And likewise acquire wants to order a load after any previous reads or writes have completed.

FENCE rw,r
LOAD /* r */

From the ISA manual:

    Load-Acquire:  fence rw,rw; l{b|h|w|d}; fence r,rw

So here there is a full barrier, a load, and a fence to ensure the load has completed for successive reads and writes.

I’m glad there are memory model experts involved. This is a new area of the specification which I have not studied yet. I am getting store release and load acquire around the wrong way because I am not viewing the memory ordering with respect to other threads.

I suggest studying the latest ISA manual. I believe there are mappings for the various C11 memory model operations (load-acquire,store-release,sequentially-consistent), as well as IO, however it would be nice if the operations were simple and intuitive enough such that one could derive them using simple logic based on the ordering one wants.

I don’t have the ISA manual in front of me, so I will check to see if I have derived release and acquire correctly. I am a good candidate for the “dummy test” because I’m inclined to make mistakes.

I’m going to read the ISA manual section on this tomorrow and see if it passes the intuitiveness test for an experienced developer familiar with acquire release in the C11 memory model.

It would also be nice if we could emulate memory models in the simulators. Perhaps not in QEMU on a strongly ordered host machine, but it would be possible in a simulator that had a bus model with latency for multiple outstanding overlapped operations between harts and IO devices. Memory models certainly need to be made easier for “humans”

Please read the draft ISA manual from the git repo. I’m not sure if the version on the web has all of the latest memory model stuff

 
For high-level synchronization I hope that the compiler is correct; I can do test runs on different targets, e.g. ARM and PowerPC. That makes the situation a bit easier compared with I/O memory accesses, which are clearly RISC-V specific.

--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.

Sebastian Huber

Aug 3, 2018, 8:51:15 AM
to RISC-V SW Dev, seb...@gmail.com


Am Donnerstag, 2. August 2018 13:57:50 UTC+2 schrieb Michael Clark:


On 2/08/2018, at 7:34 PM, Sebastian Huber <seb...@gmail.com> wrote:

 
Am Mittwoch, 1. August 2018 14:49:41 UTC+2 schrieb Michael Clark:

On 1/08/2018, at 10:05 PM, Sebastian Huber <seb...@gmail.com> wrote:

Do we need something like the PowerPC eioio instruction here?

Yes, EIEIO - Enforce In-order Execution of IO. The ISA designers must have had a bit of a chuckle with that one.

Yes, this one, sorry for the typo.



The FENCE instruction can be used as an IO fence as per your last email. I’m not sure of the correct fence. I personally would need to do some digging.

I would have thought that you want to order predecessor writes (o) before successor reads (i), but then I'm not a memory model expert and could easily have it back to front.

while ((i = plic->claim) != 0) { /* in */
  handle(i);
  plic->claim = i; /* out */
  asm volatile("fence o, i" : : : "memory");
}

Ok, good. I independently guessed this fence too:


I would suggest that someone familiar with memory models review this commit.

I am only familiar with C11 atomics. The C11 intrinsics have been created by memory model experts.

It is a different fence compared to the fences used by Linux plic_chained_handle_irq() in drivers/irqchip/irq-riscv-plic.c.

There may be some nuance that I am missing, but intuitively, in this case, one wants to order the preceding I/O write (o) to claim before the subsequent I/O read (i) from claim.

Also note that IO ordering may not be enough if the device has any post write latency. Some devices also require specified delays.


In Linux you have (drivers/irqchip/irq-riscv-plic.c):

static inline u32 plic_claim(struct plic_data *data, int contextid)
{
    return readl(plic_hart_claim(data, contextid));
}

static inline void plic_complete(struct plic_data *data, int contextid, u32 claim)
{
    writel(claim, plic_hart_claim(data, contextid));
}

Which uses (arch/riscv/include/asm/io.h):

/*
 * I/O memory access primitives. Reads are ordered relative to any
 * following Normal memory access. Writes are ordered relative to any prior
 * Normal memory access.  The memory barriers here are necessary as RISC-V
 * doesn't define any ordering between the memory space and the I/O space.
 */

#define __io_br()    do {} while (0)
#define __io_ar()    __asm__ __volatile__ ("fence i,r" : : : "memory");
#define __io_bw()    __asm__ __volatile__ ("fence w,o" : : : "memory");
#define __io_aw()    do {} while (0)

#define readb(c)    ({ u8  __v; __io_br(); __v = readb_cpu(c); __io_ar(); __v; })
#define readw(c)    ({ u16 __v; __io_br(); __v = readw_cpu(c); __io_ar(); __v; })
#define readl(c)    ({ u32 __v; __io_br(); __v = readl_cpu(c); __io_ar(); __v; })

#define writeb(v,c)    ({ __io_bw(); writeb_cpu((v),(c)); __io_aw(); })
#define writew(v,c)    ({ __io_bw(); writew_cpu((v),(c)); __io_aw(); })
#define writel(v,c)    ({ __io_bw(); writel_cpu((v),(c)); __io_aw(); })

If I haven't got this completely wrong, then the two fences __io_ar() and __io_bw() only order loads/stores between I/O memory and normal memory, but not between I/O memory and I/O memory.

Michael Clark

Aug 3, 2018, 9:02:32 PM
to Sebastian Huber, RISC-V SW Dev
Interesting. Then "fence o,i" might be correct, given that in this scenario we do not need to consider the fence with respect to another thread (as for acquire or release); i.e. the ordering is local to one hart and the device.

I would really like to get my head around the new RISC-V memory model (in the working draft of the spec in its GitHub repo). Not impossible, but it would require a not insubstantial amount of study to gain a robust and durable understanding.

The worst mistake anyone can make is to be certain; the easiest way to avoid it is to always doubt. The best approach seems to be to seek guidance from someone who can be more or less certain.

Tadas Aleksonis

Sep 18, 2018, 5:01:28 PM
to RISC-V SW Dev, seb...@gmail.com
Hey guys, I know I'm chiming in here pretty late, but I've been experiencing a similar issue. I'm running Linux on a RISC-V system on an FPGA and have several external interrupts hooked up to the PLIC. However, I'm always spinning in plic_chained_handle_irq, and I think it's because the complete doesn't behave properly (for reasons I'm not sure how to confirm or deny) and the claim then reclaims the same interrupt it just serviced (to clarify: I know it's the same interrupt line, but I don't know if the behavior is proper).

This doesn't happen with interrupts such as UART or timers, just currently with some I2C modules that keep a TX_FIFO_READY interrupt line high. The one key difference between your scenario and mine is that memory fences don't change the outcome on my system.

If I should just start a new thread and link to this as an example of a similar bug, or if there's a better place to ask, let me know. Thanks for the time/attention.