C.addi16sp ((N +3)>>2) # i.e. addi sp sp ((N +3)>>2 <<4)
jalr t0 zero Csavew - (N<<1)
where Csavew is the bit of millicode
Csavew-16 C.swsp t2 (+32)
Csavew-14 C.swsp t1 (+28)
Csavew-12 C.swsp a5 (+24)
Csavew-10 C.swsp a4 (+20)
Csavew -8 C.swsp a3 (+16)
Csavew -6 C.swsp a2 (+12)
Csavew -4 C.swsp a1 (+8)
Csavew -2 C.swsp a0 (+4)
Csavew: C.swsp ra (0)
jr t0
regards,
Liviu
jalr t0 zero Csavew - (N<<1)
addi sp sp (((N +1)>>1) << 3)
jalr t0 zero Csavew - (N<<1)
regards,
Liviu
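The arithmetic in the two stack adjustments and in the millicode entry offset can be checked with a small C sketch. It takes the quoted expressions at face value: N is treated as a count of 32-bit words, and each C.swsp is 2 bytes long. The function names are hypothetical, and the limit of 8 simply reflects the eight stores listed before Csavew.

```c
#include <assert.h>
#include <stdint.h>

/* Frame size implied by "C.addi16sp ((N+3)>>2)":
 * N words rounded up to a multiple of 16 bytes. */
uint32_t frame_bytes_align16(uint32_t n_words) {
    return ((n_words + 3u) >> 2) << 4;
}

/* Frame size implied by "addi sp sp (((N+1)>>1) << 3)":
 * N words rounded up to a multiple of 8 bytes. */
uint32_t frame_bytes_align8(uint32_t n_words) {
    return ((n_words + 1u) >> 1) << 3;
}

/* Distance in bytes of the entry point before the Csavew label.
 * Every C.swsp is a 2-byte instruction, hence N << 1. */
uint32_t entry_offset_bytes(uint32_t n) {
    assert(n <= 8u); /* the listing has 8 stores before Csavew */
    return n << 1;
}
```

For example, 5 words need 32 bytes with 16-byte rounding but only 24 bytes with 8-byte rounding.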
The problem with shadow registers is that you always run out and you still need to spill to main memory.
For an RVE implementation, which reduces the RF in half to save gates, it would be weird to double the memory now, just to implement a shadow register.
Richard
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/CAG7hfcJRmvBgfMCg2rtnAh-3xfoahRiF225Yvu9k%3DBbNjy_xDA%40mail.gmail.com.
On 15 March 2018 at 10:55:11, Richard Herveille
(richard....@roalogic.com) wrote:
> The problem with shadow registers is that you always run out and you still need to spill
> to main memory.
you run out when the interrupt nesting gets deeper than the available
register banks; in most cases, the depth is 1, rarely 2, even more
rarely 3, and so on.
spilling can be done in parallel, while starting the handler.
but this method reveals a possible latency problem: if a high priority
interrupt occurs right after a series of other interrupts, and there
are no more register banks, it must wait for a previous spill to
complete, to free a register bank, leading to a jitter on the high
priority interrupt latency.
most applications tolerate a small jitter, but for applications that
implement control loops we might need a way to disable this mechanism
and provide constant latency (even if it is slightly higher).
Cortex-M0 has such a configuration bit to prevent jitter.
> For an RVE implementation, which reduces the RF in half to save gates, it would be weird
> to double the memory now, just to implement a shadow register.
yes, this mechanism is not cheap. however, as Jacob suggested, only
the ABI caller-saved registers need to be shadowed/spilled, so, with a
lighter EABI, the extra cost may be kept to a minimum.
regards,
Liviu
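The latency argument above can be illustrated with a toy C model. It is only a sketch under loud assumptions: the struct, the function names and the cycle counts are hypothetical, a spill is charged as a fixed cost, and reloading spilled contexts on exit is ignored. The constant_latency flag stands in for the "disable this mechanism" option, where every entry pays the worst case so that there is no jitter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a core with a few shadow register banks. */
typedef struct {
    uint32_t banks;          /* total shadow register banks */
    uint32_t in_use;         /* banks holding live, unspilled contexts */
    uint32_t spill_cycles;   /* assumed cost of spilling one bank */
    bool constant_latency;   /* true: always charge the worst case */
} bank_model;

/* Take an interrupt; returns the extra entry latency in cycles. */
uint32_t bank_enter(bank_model *m) {
    bool had_free = m->in_use < m->banks;
    if (had_free) {
        m->in_use++;
    }
    /* With no free bank the oldest one is spilled and reused,
     * so in_use stays equal to banks. */
    if (m->constant_latency) {
        return m->spill_cycles;   /* same cost every time: no jitter */
    }
    return had_free ? 0u : m->spill_cycles;
}

/* Return from an interrupt and release its bank. */
void bank_exit(bank_model *m) {
    if (m->in_use > 0u) {
        m->in_use--;
    }
}
```

With two banks, the first two nested interrupts enter with no extra latency and the third waits for a spill, which is the jitter described above.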
> On Mar 15, 2018, at 16:48 , kr...@berkeley.edu wrote:
> I'll shortly be sending out an invite to a new Foundation Task Group
> we have formed to address adding fast interrupts to RISC-V.
>
> Germane to this thread, one feature of the proposal under development
> is to standardize interrupt attribute annotations so C compilers can
> generate interrupt handlers that only save registers as needed. This
> effectively changes the calling conventions just for the handlers but
> leaves the rest of the ABI unchanged.
>
> /* Not real code, just a sketch. */
> extern volatile int *DEVICE;
> extern volatile int *COUNT;
>
> void __attribute__ ((interrupt))
> foo() {
> *DEVICE = 0;
> (*COUNT)++;
> }
>
> A rough sketch of what a generated handler looks like is:
>
> # Small ISR that pokes device to clear interrupt, and increments in-memory counter.
>
> .align 3 # Has to be 8-byte aligned.
> foo:
> addi sp, sp, -16 # Create a frame on stack.
If the ABI had included a stack "red zone" with a small reservation for interrupts,
then the two "addi sp, " instructions could have been avoided in most cases.
> sw s0, 0(sp) # Save working register.
Presumably you meant to load s0 with a global pointer?
> sw x0, DEVICE, s0 # Clear interrupt flag.
> sw s1, 4(sp) # Save working register.
> la s0, COUNT # Get counter address.
> li s1, 1
> amoadd.w x0, s1, (s0) # Increment counter in memory.
> lw s1, 4(sp) # Restore registers.
> lw s0, 0(sp)
> addi sp, sp, 16 # Free stack frame.
> mret # Return from handler using saved mepc.
Tommy
>
> This change will be useful even with existing interrupt architecture,
> but TG will be looking at a new design that supports nested
> interrupts. Our initial studies show a small core can take interrupt,
> enter, execute, and exit the handler above in less than 20 cycles,
> while supporting preemption on any clock cycle (i.e., only a few cycles ~3
> to get to first instruction).
>
> Krste
On 16 March 2018 at 20:58:13, Samuel Falvo II (sam....@gmail.com) wrote:
sure.
> On Fri, Mar 16, 2018 at 1:46 AM, Liviu Ionescu wrote:
> > yes, two-tiered interrupt processing is the ideal textbook solution,
> > but few RTOSes/applications do it.
>
> Citation needed?
the venerable eCos calls them ISRs and DSRs; the µC/OS-III calls them
direct and deferred interrupts; FreeRTOS has an optional Deferred
Interrupt Handling.
any proposals that come with use cases that show they are simpler to
use, and performance estimates that show better latency or better
performance in general, are welcome.
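For readers unfamiliar with the two-tiered scheme mentioned above (ISR plus DSR, or direct plus deferred), here is a minimal C sketch of the idea. It does not reproduce the API of eCos, µC/OS-III or FreeRTOS; all names are hypothetical, and a real system would need the read-and-clear step to be atomic or done with interrupts masked.

```c
#include <stdint.h>

static volatile uint32_t pending_mask;  /* bit i set: source i needs deferred work */
static uint32_t deferred_runs[32];      /* how many times each deferred handler ran */

/* First tier: runs in interrupt context, records the event and returns. */
void isr_top(unsigned source) {
    pending_mask |= UINT32_C(1) << (source & 31u);
}

/* Second tier: runs later, outside interrupt context.
 * Returns the number of sources serviced in this pass. */
unsigned run_deferred(void) {
    unsigned handled = 0;
    /* Assumption: on real hardware this snapshot-and-clear is atomic. */
    uint32_t snapshot = pending_mask;
    pending_mask = 0;
    for (unsigned i = 0; i < 32u; i++) {
        if (snapshot & (UINT32_C(1) << i)) {
            deferred_runs[i]++;   /* placeholder for the real deferred work */
            handled++;
        }
    }
    return handled;
}

/* Accessor so callers can inspect how often a source was serviced. */
uint32_t deferred_count(unsigned source) {
    return deferred_runs[source & 31u];
}
```

Note that two interrupts from the same source before the deferred pass collapse into one deferred run, which is one of the trade-offs of this style.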
On Mar 18, 2018, at 7:59 PM, Jacob Bachmeyer <jcb6...@gmail.com> wrote:
Microcontrollers can have complex memory systems, particularly when
placed inside larger SoCs and/or when using various power control
tricks, even without considering caches, which are also reasonably
common for microcontrollers. Some microcontrollers have both L1 I &
L2 D caches, and multiple local L1 SRAMs, and shared L2 cache, and
multiple shared L2 SRAMs.
This is where we start to encounter a dividing line between what is a proper application for a microcontroller and what is a proper application for a general processor. I expect that most applications using such complex SoCs should actually be using the general profile; the microcontroller profile I envision is intended for smaller systems, or as a component in a larger system, perhaps as one of the "smart devices" you mentioned being used with application processors, probably even embedded inside one of those SoCs as a peripheral to an application processor.
On 19/03/2018, 03:29, "Jacob Bachmeyer" <jcb6...@gmail.com> wrote:
On 18 March 2018 at 04:10:32, Jacob Bachmeyer (jcb6...@gmail.com) wrote:
The trick is that hardware can spill the oldest shadow set in the
background while running the ISR. All you need are four instructions
in the ISR that do not access memory.
four? I think this number depends on the ABI, which was not yet decided.
My proposal includes an eABI that declares all registers except for link
registers (x1, x5) and return values (x10, x11) to be callee-saved.
These caller-saved registers are included in shadow sets.
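As a quick check of where the number four comes from, the caller-saved set in the proposed eABI can be written as a register bitmask. This is only an illustrative C sketch; the macro and function names are hypothetical and not part of any proposal.

```c
#include <stdint.h>

/* Caller-saved registers in the proposed eABI:
 * link registers x1 and x5, return values x10 and x11. */
#define EABI_CALLER_SAVED \
    ((UINT32_C(1) << 1) | (UINT32_C(1) << 5) | \
     (UINT32_C(1) << 10) | (UINT32_C(1) << 11))

/* Nonzero if register x<reg> would live in a shadow set. */
int is_caller_saved(unsigned reg) {
    return reg < 32u && ((EABI_CALLER_SAVED >> reg) & 1u);
}

/* Count the registers selected by a mask (clears the lowest set bit each step). */
unsigned count_regs(uint32_t mask) {
    unsigned n = 0;
    while (mask != 0u) {
        mask &= mask - 1u;
        n++;
    }
    return n;
}
```

count_regs(EABI_CALLER_SAVED) gives 4, matching the four registers per shadow set.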
For simplicity purposes, if you go with shadow registers, then the entire register file should swap.
The ARM7 and ARM9 have partial shadow registers. That allowed fast arguments in and out of routines, but is a hardware nightmare.
In an FPGA implementation that would, most likely, mean splitting the RF into multiple block RAMs or a combination of block RAMs and flip-flops. So, to keep this simple, you’ll need a full set of shadow registers. How then do you pass arguments?
Spilling shadow registers in the background means that a separate state machine needs to keep track of the shadow-register stack. What happens if the memory system can’t keep up? You’ll need to stall the CPU. This, and the actual spilling, leads to varying interrupt latencies and non-deterministic behavior.
If you don’t want the extra gates for the state machine, then SW controls the spilling and the advantages seem void.
For small micros area is a main concern. Adding shadow registers adds quite a bit of logic (maybe not in an FPGA, but definitely in an ASIC). The method you describe (as I understand it) requires additional hardware to handle the push/pop of the shadow registers into main memory.
Taking all of that into account, I am not in favour of it.
Again, SPARC had a big set of full shadow registers and they still spilled.
Richard
Again, there are more technologies than Linux and FPGAs.
For FPGAs that do provide block RAMs, the smallest size would be enough to hold 8 full sets of 32-bit shadow registers (so 8 x 32 x 32 bits).
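To put numbers on the storage being discussed, a one-line C helper is enough. The function name is hypothetical; the figures are just the products of the sizes mentioned in this thread and say nothing about any particular FPGA family.

```c
#include <stdint.h>

/* Storage in bits for 'sets' shadow sets of 'regs_per_set' registers,
 * each register 'xlen' bits wide. */
uint32_t shadow_storage_bits(uint32_t sets, uint32_t regs_per_set, uint32_t xlen) {
    return sets * regs_per_set * xlen;
}
```

Eight full RV32 sets come to shadow_storage_bits(8, 32, 32) = 8192 bits (1 KiB), while eight four-register sets as in Jacob's proposal come to shadow_storage_bits(8, 4, 32) = 1024 bits.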
So, to keep this simple, you’ll need a full set of shadow registers.
How then do you pass arguments?
For traps, you do not pass arguments; the hardware does. Shadow
registers are used only for trap handlers; normal function calls use the
existing convention with nearly all registers callee-saved.
Being pedantic … interrupts. Traps would be software exceptions and they typically do pass arguments.
Spilling shadow registers in the background means that a separate
state machine needs to keep track of the shadow-register stack. What
happens if the memory system can’t keep up? You’ll need to stall the
CPU. This, and the actual spilling, leads to varying interrupt
latencies and non-deterministic behavior.
Ideally, the register spill is performed during an otherwise unavoidable
latency period, such as the pipeline refill latency in a five-stage
pipeline.
We’re talking (tiny) microcontrollers here. I doubt most would have a 5-stage pipeline.
If you don’t want the extra gates for the state machine, then SW
controls the spilling and the advantages seem void.
They would be void and there would be no reason for shadow registers.
Software register spill is one of Liviu Ionescu's big complaints about
the general profile.
Yes, hence his proposal to reduce the number of registers that must be pushed/popped, and the suggested ‘movem’ instruction.
For small micros area is a main concern. Adding shadow registers adds
quite a bit of logic (maybe not in an FPGA, but definitely in an
ASIC). The method you describe (as I understand it), requires
additional hardware to handle the push/pop of the shadow registers
into main memory.
The additional hardware is fairly small, as it drives the MEM stage
while the ISR instructions are moving into the pipeline. The shadow
sets themselves are presumed to be "spilled" into an internal (4*XLEN)xN
SRAM first; the oldest row from this SRAM is split up into 4 XLEN-bit
pieces and spilled to the stack in the background. Note that only the
top-most shadow registers are accessible to software in my proposal.
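The row-splitting step described above can be pictured with a short C sketch. It is a software stand-in for what the proposal has hardware doing in the background, assuming XLEN = 32, a descending stack modelled as a word array, and hypothetical names throughout. The order in which the four pieces are stored is also an assumption.

```c
#include <stdint.h>

#define SHADOW_REGS 4u  /* registers per shadow row, as in the proposal */

/* Spill one (4*XLEN)-bit shadow row as four XLEN-bit stores.
 * 'stack' models memory as words and 'sp' is the current top index;
 * the stack grows toward lower indices. Returns the new top index.
 * The caller must ensure sp >= SHADOW_REGS. */
unsigned spill_row(const uint32_t row[SHADOW_REGS], uint32_t *stack, unsigned sp) {
    for (unsigned i = 0; i < SHADOW_REGS; i++) {
        stack[--sp] = row[i];  /* one XLEN-bit piece per store */
    }
    return sp;
}
```

Each call moves the stack index down by four words, which is the per-row memory traffic that has to complete before the row can be reused.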
We’re talking tiny microcontrollers (nanocontrollers, they have been called).
The additional hardware is not fairly small.
There might not be a pipeline.
Your proposal adds memory for the shadow registers. You also seem to suggest only parts of the RF are shadowed (like the older ARMs), which required breaking the RF into sections or requiring additional address decoding.
Now you’re adding an additional internal SRAM. This seems overly complex; shadow registers->internal memory->external memory. Why not simply reduce the number of registers to push/pop? For speed purposes the stack would be in internal memory.
Taking all of that into account, I am not in favour of it.
Again, SPARC had a big set of full shadow registers and they still
spilled.
Will you suggest a better idea?
Reduce the number of registers that must be saved.
Either use regular load/store (only a few registers, stack in internal memory) or provide a ‘movem’. The disadvantage of a ‘movem’ is that it’s a big instruction, meaning it requires many bits. Full flexibility requires 31 bits to specify which registers to move (x0 is always zero), which leaves no room for an opcode in a 32-bit instruction, so that’s not possible. For RVE there would only be 15 registers to specify, but that means there won’t be an RVC version of movem. You could further reduce the registers that can be moved, but at what point does it still make sense to add this instruction?
If the overhead can be reduced to 4 registers that must be saved, then I suggest simply using regular load/store instructions, for which there are RVC variants.
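The encoding cost of a mask-based ‘movem’ can be made concrete with a small C model. This is a sketch of a hypothetical instruction, not an existing RISC-V one: bit r of the mask selects xr, x0 is never stored, and the selected registers are written out in ascending order. All names are illustrative.

```c
#include <stdint.h>

/* Mask bits needed for full flexibility: one per register except x0.
 * 32 registers (RV32I) -> 31 bits, 16 registers (RVE) -> 15 bits. */
unsigned movem_mask_bits(unsigned num_regs) {
    return num_regs - 1u;
}

/* Software model of a 'movem' store: copies every register selected by
 * 'mask' into 'out' in ascending register order and returns the count. */
unsigned movem_store(const uint32_t regs[32], uint32_t mask, uint32_t *out) {
    unsigned n = 0;
    for (unsigned r = 1; r < 32u; r++) {   /* x0 is hard-wired to zero */
        if (mask & (UINT32_C(1) << r)) {
            out[n++] = regs[r];
        }
    }
    return n;
}
```

With only four registers to save, the mask has four set bits and the model performs four stores, the same work as four ordinary store instructions.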
Richard
-- Jacob
On 20/03/2018, 05:49, "Allen J. Baum" <allen...@esperantotech.com> wrote:
Taking all of that into account, I am not in favour of it.
Again, SPARC had a big set of full shadow registers and they still spilled.
Note that the application of the shadow register sets here is for fast interrupt response. SPARC used its register windows for general function calling, which is much more frequent, so you can't really use that argument to say it won't work.
Fair enough. But the argument that there are never enough is still valid.
Both ARM and Intel (in their late-lamented i960) have or had shadow register sets for fast interrupt handling.
ARM had a partial shadow register, which could be used for both.
How many shadow registers do you want to support? At some point it is not enough and you still need to spill. So just always spill, but make that fast and predictable.
It all costs area (and hence cost), which in this market segment is a driving factor, if not THE driving factor.
Richard