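volatile RISCV_PLIC_hart_regs *plic_hart_regs;
uint32_t interrupt_index;

/* This hart's PLIC context registers */
plic_hart_regs = cpu_self->cpu_per_cpu.plic_hart_regs;

/* Reading claim_complete claims the highest-priority pending source (0 = none) */
while ((interrupt_index = plic_hart_regs->claim_complete) != 0) {
  bsp_interrupt_handler_dispatch(
    RISCV_INTERRUPT_VECTOR_EXTERNAL(interrupt_index)
  );

  /* Writing the source ID back signals completion to the PLIC */
  plic_hart_regs->claim_complete = interrupt_index;
}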
plic_hart_regs->claim_complete and the immediate read from plic_hart_regs->claim_complete.
If I add
__asm__ volatile ("fence i,r" : : : "memory");
or a couple of nops after the plic_hart_regs->claim_complete = interrupt_index; statement, then it works. The fence is just a guess based on the Linux arch/riscv/include/asm/io.h.
Is there some documentation available which explains this behaviour? The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 2.3-draft, Chapter 6, "RVWMO Memory Consistency Model, Version 0.1", says:
This chapter defines the memory model for regular main memory operations. The interaction of the memory model with I/O memory, instruction fetches, FENCE.I, page table walks, and SFENCE.VMA is not (yet) formalized. Some or all of the above may be formalized in a future revision of this specification.
On 1/08/2018, at 10:05 PM, Sebastian Huber <seb...@gmail.com> wrote:
Do we need something like the PowerPC eioio instruction here?
Yes, EIEIO - Enforce In-order Execution of IO. The ISA designers must have had a bit of a chuckle with that one.
The FENCE instruction can be used as an IO fence as per your last email. I’m not sure of the correct fence. I personally would need to do some digging.
I would have thought that you want to order predecessor I/O writes (o) before successor I/O reads (i), but then I'm not a memory model expert and could easily have it back to front.
while ((i = plic->claim) != 0) { /* in */
    handle(i);
    plic->claim = i;             /* out */
    asm volatile("fence o,i" : : : "memory");
}
Of course you can always use a sledgehammer:
FENCE IORW,IORW
Check with a memory model expert before you follow any of my suggestions. Weak memory models are hard even for experts, and ideally this interface should be as intuitive as possible. The memory model is documented in the draft ISA manual.
I have seen expert developers write release as both fence w,rw and fence rw,w on this mailing list in the archives, so it's clearly not easy even for very experienced developers.
I have this question (ISA manual not in front of me right now). Given a release wants to order a store before all other accesses, it seems intuitive that release would be
STORE /* w */
FENCE w,rw
And likewise acquire wants to order a load after any previous reads or writes have completed.
FENCE rw,r
LOAD /* r */
I suggest studying the latest ISA manual. I believe there are mappings for the various C11 memory model operations (load-acquire, store-release, sequentially-consistent), as well as IO; however, it would be nice if the operations were simple and intuitive enough that one could derive them using simple logic based on the ordering one wants.
I don't have the ISA manual in front of me, so I will check to see if I have derived release and acquire correctly. I am a good candidate for the "dummy test" because I'm inclined to make mistakes. I'm going to read the ISA manual section on this tomorrow and see if it passes the intuitiveness test for an experienced developer familiar with acquire/release in the C11 memory model.
It would also be nice if we could emulate memory models in the simulators. Perhaps not in QEMU on a strongly ordered host machine, but it would be possible in a simulator that had a bus model with latency for multiple outstanding overlapped operations between harts and IO devices. Memory models certainly need to be made easier for "humans".
Please read the draft ISA manual from the git repo. I'm not sure if the version on the web has all of the latest memory model stuff.
On Wednesday, 1 August 2018 14:49:41 UTC+2, Michael Clark wrote:
On 1/08/2018, at 10:05 PM, Sebastian Huber <seb...@gmail.com> wrote:
Do we need something like the PowerPC eioio instruction here?
Yes, EIEIO - Enforce In-order Execution of IO. The ISA designers must have had a bit of a chuckle with that one.
Yes, this one, sorry for the typo.
The FENCE instruction can be used as an IO fence as per your last email. I’m not sure of the correct fence. I personally would need to do some digging.
I would have thought that you want to order predecessor I/O writes (o) before successor I/O reads (i), but then I'm not a memory model expert and could easily have it back to front.
while ((i = plic->claim) != 0) { /* in */
    handle(i);
    plic->claim = i;             /* out */
    asm volatile("fence o,i" : : : "memory");
}
Ok, good. I independently guessed this fence too:
It is a different fence compared to the fences used by Linux plic_chained_handle_irq() in drivers/irqchip/irq-riscv-plic.c.
Of course you can always use a sledgehammer:
FENCE IORW,IORW
Check with a memory model expert before you follow any of my suggestions. Weak memory models are hard even for experts, and ideally this interface should be as intuitive as possible. The memory model is documented in the draft ISA manual.
I have seen expert developers write release as both fence w,rw and fence rw,w on this mailing list in the archives, so it's clearly not easy even for very experienced developers.
I have this question (ISA manual not in front of me right now). Given a release wants to order a store before all other accesses, it seems intuitive that release would be
STORE /* w */
FENCE w,rw
And likewise acquire wants to order a load after any previous reads or writes have completed.
FENCE rw,r
LOAD /* r */
I suggest studying the latest ISA manual. I believe there are mappings for the various C11 memory model operations (load-acquire, store-release, sequentially-consistent), as well as IO; however, it would be nice if the operations were simple and intuitive enough that one could derive them using simple logic based on the ordering one wants.
I don't have the ISA manual in front of me, so I will check to see if I have derived release and acquire correctly. I am a good candidate for the "dummy test" because I'm inclined to make mistakes. I'm going to read the ISA manual section on this tomorrow and see if it passes the intuitiveness test for an experienced developer familiar with acquire/release in the C11 memory model.
It would also be nice if we could emulate memory models in the simulators. Perhaps not in QEMU on a strongly ordered host machine, but it would be possible in a simulator that had a bus model with latency for multiple outstanding overlapped operations between harts and IO devices. Memory models certainly need to be made easier for "humans".
Please read the draft ISA manual from the git repo. I'm not sure if the version on the web has all of the latest memory model stuff.
For high level synchronization I hope that the compiler is correct. I can do test runs on different targets, e.g. ARM and PowerPC. This makes the situation a bit easier compared to I/O memory access. This is clearly RISC-V specific.
--
You received this message because you are subscribed to the Google Groups "RISC-V SW Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sw-dev+un...@groups.riscv.org.
To post to this group, send email to sw-...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/sw-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/sw-dev/7492ec21-57b9-4ec0-ac0b-c3a832e09123%40groups.riscv.org.
I would suggest someone familiar with memory models reviews this commit. I am only familiar with C11 atomics. The C11 intrinsics have been created by memory model experts.
It is a different fence compared to the fences used by Linux plic_chained_handle_irq() in drivers/irqchip/irq-riscv-plic.c.
There may be some nuance that I am missing, but intuitively, in this case, one wants to order the preceding I/O write (o) to claim before the successive I/O read (i) from claim.
Also note that I/O ordering may not be enough if the device has any post-write latency. Some devices also require specified delays.
static inline
u32 plic_claim(struct plic_data *data, int contextid)
{
return readl(plic_hart_claim(data, contextid));
}
static inline
void plic_complete(struct plic_data *data, int contextid, u32 claim)
{
writel(claim, plic_hart_claim(data, contextid));
}
/*
* I/O memory access primitives. Reads are ordered relative to any
* following Normal memory access. Writes are ordered relative to any prior
* Normal memory access. The memory barriers here are necessary as RISC-V
* doesn't define any ordering between the memory space and the I/O space.
*/
#define __io_br() do {} while (0)
#define __io_ar() __asm__ __volatile__ ("fence i,r" : : : "memory");
#define __io_bw() __asm__ __volatile__ ("fence w,o" : : : "memory");
#define __io_aw() do {} while (0)
#define readb(c) ({ u8 __v; __io_br(); __v = readb_cpu(c); __io_ar(); __v; })
#define readw(c) ({ u16 __v; __io_br(); __v = readw_cpu(c); __io_ar(); __v; })
#define readl(c) ({ u32 __v; __io_br(); __v = readl_cpu(c); __io_ar(); __v; })
#define writeb(v,c) ({ __io_bw(); writeb_cpu((v),(c)); __io_aw(); })
#define writew(v,c) ({ __io_bw(); writew_cpu((v),(c)); __io_aw(); })
#define writel(v,c) ({ __io_bw(); writel_cpu((v),(c)); __io_aw(); })