CSR instructions suggestion

Hugo Décharnes

Apr 17, 2017, 8:31:39 AM
to RISC-V ISA Dev
Dear all,

Please consider adding move-to-CSR and move-from-CSR instructions in place of the current read-modify-write instructions, as the former would fit better into existing pipelines, out-of-order ones in particular. The current instructions require either new register read and write capabilities at commit, or serialized execution until they are no longer speculative. This is a consequence of not renaming CSRs; renaming them would be unacceptable for such minor benefits, as would the hardware cost of the first solution and the performance cost of the second. Breaking them into two distinct instructions (move from CSR, move to CSR) would fit better into existing designs.

Best regards,
Hugo

Michael Chapman

Apr 17, 2017, 11:33:31 AM
to Hugo Décharnes, RISC-V ISA Dev

The instructions you want already exist:

1) CSRRS and CSRRC with rs1 == 0 for moving CSR to a Xreg (rd),

2) CSRRW with rd == 0 for moving Xreg to a CSR (rs1)

Check the details of the specification.

1) There is no write operation at all (different from the case where rs1
is not X0 and the register holds a zero value)

2) There is no read operation at all
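
A minimal sketch of the behaviour described above, written as a toy Python model rather than anything taken from the specification text. The `CsrFile` class, the `csr_op` helper and the 32-bit width are assumptions made for illustration; the read and write counters exist only to make the suppressed side effects visible.

```python
class CsrFile:
    """Toy CSR file that counts reads and writes so side effects are visible."""

    def __init__(self):
        self.regs = {}
        self.reads = 0
        self.writes = 0

    def read(self, addr):
        self.reads += 1
        return self.regs.get(addr, 0)

    def write(self, addr, value):
        self.writes += 1
        self.regs[addr] = value & 0xFFFFFFFF   # assumed 32-bit CSRs


def csr_op(csrs, gprs, op, rd, addr, rs1):
    """Execute CSRRW ("rw"), CSRRS ("rs") or CSRRC ("rc") on the toy state."""
    src = gprs[rs1]                  # sample rs1 first so rd == rs1 works
    if op == "rw":
        # CSRRW with rd == x0 does not read the CSR at all.
        old = csrs.read(addr) if rd != 0 else None
        csrs.write(addr, src)
    else:
        old = csrs.read(addr)        # CSRRS/CSRRC always read
        # With rs1 == x0 there is no write at all.
        if rs1 != 0:
            new = old | src if op == "rs" else old & ~src
            csrs.write(addr, new)
    if rd != 0 and old is not None:
        gprs[rd] = old               # x0 stays hard-wired to zero


MSCRATCH = 0x340
csrs, gprs = CsrFile(), [0] * 32
gprs[5] = 0xABCD
csr_op(csrs, gprs, "rw", 0, MSCRATCH, 5)   # CSRRW x0, mscratch, x5: write only
csr_op(csrs, gprs, "rs", 6, MSCRATCH, 0)   # CSRRS x6, mscratch, x0: read only
csr_op(csrs, gprs, "rs", 7, MSCRATCH, 8)   # x8 holds zero but is not x0: still a write
```

The third call is the case from the parenthetical in point 1: the mask is zero, so the CSR value does not change, but because rs1 is not x0 the write still happens and the write counter goes up.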

Hugo Décharnes

Apr 17, 2017, 11:58:28 AM
to RISC-V ISA Dev
I think you missed the point. My suggestion was to remove existing read-modify-write instructions from the ISA.

Alex Elsayed

Apr 17, 2017, 12:01:07 PM
to RISC-V ISA Dev
That cannot be done, as the base ISA has been (publicly!) frozen for months now. A completely incompatible change like that is, thus, unreasonable.

--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+unsubscribe@groups.riscv.org.
To post to this group, send email to isa...@groups.riscv.org.
Visit this group at https://groups.google.com/a/groups.riscv.org/group/isa-dev/.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/b404673c-7114-4595-a6d1-dbe7a1e8fe24%40groups.riscv.org.

Corey Richardson

Apr 17, 2017, 12:05:56 PM
to isa...@groups.riscv.org
Note that it's possible for your core to trap into M-mode for full RMW
instructions.

--
cmr
http://octayn.net/
+16038524272

Krste Asanovic

Apr 17, 2017, 12:22:10 PM
to Hugo Décharnes, RISC-V ISA Dev
Atomicity of RMW is an important property of these instructions.

Separate R, M, W instructions cannot replace this functionality.

Krste
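
One way to read the atomicity point, as a toy Python sketch that is not from the thread: if an interrupt handler updates the same CSR between a separate read and write, the handler's update is lost, whereas a single read-modify-write step leaves no such window. The bit names `BIT_A` and `BIT_B` and both helper functions are hypothetical, and the "interrupt" is modelled as a plain function-local update.

```python
BIT_A = 1 << 3   # bit the main code wants to set (position chosen arbitrarily)
BIT_B = 1 << 7   # bit set by a hypothetical interrupt handler


def split_sequence(csr, interrupt_between):
    """Read, modify and write as three separate steps."""
    tmp = csr["value"]              # move from CSR
    if interrupt_between:
        csr["value"] |= BIT_B       # handler runs here and updates the CSR
    tmp |= BIT_A                    # modify in a general-purpose register
    csr["value"] = tmp              # move to CSR: overwrites the handler's update
    return csr["value"]


def atomic_set(csr, interrupt_before):
    """Single read-modify-write step, in the style of CSRRS."""
    if interrupt_before:
        csr["value"] |= BIT_B       # handler can only run before or after
    csr["value"] |= BIT_A
    return csr["value"]


lost = split_sequence({"value": 0}, interrupt_between=True)
kept = atomic_set({"value": 0}, interrupt_before=True)
```

In this model `lost` ends up with only `BIT_A` set, while `kept` has both bits.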

Hugo Décharnes

Apr 17, 2017, 12:22:16 PM
to RISC-V ISA Dev
I don't know what you meant, but what I emphasized was the problematic atomic nature of the operations, which prevents them from fitting cleanly into existing out-of-order designs. This is from a microarchitect's point of view, not an ISA architect's.

Hugo Décharnes

Apr 17, 2017, 12:27:26 PM
to RISC-V ISA Dev
I understand your viewpoint, but from a hardware architect's perspective, it really has serious implications, particularly in the high-end processors I am familiar with.

Hugo Décharnes

Apr 17, 2017, 12:29:06 PM
to RISC-V ISA Dev
If it's not possible to replace these instructions, there's no reason to hold this debate.

Krste Asanovic

Apr 17, 2017, 12:43:41 PM
to Hugo Décharnes, RISC-V ISA Dev
The RISC-V privileged architecture (and a few other parts) is built around these instructions supporting atomic RMW, so this isn’t going to change.

You already listed a few techniques that current microarchitectures use to handle such instructions (serialization, renaming), and I’m not aware of any ISA with a privileged architecture that doesn’t need some such instructions (where new status value partly depends on old status value) to be handled. Does your current ISA support privileged modes and not need any such control/status register updates?

Krste

Matthew Hicks

Apr 17, 2017, 3:26:43 PM
to Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev
"I’m not aware of any ISA with a privileged architecture that doesn’t need some such instructions": OR1K?  I can see why these types of instructions make sense (in the multiprocessor case), but they come at the cost of increasing hardware complexity.  One of the beautiful aspects of the core of RISC-V is that it errs towards simplicity of hardware implementation.  I think this is one (albeit irreversible) case where the ISA designers got it wrong.


---Matthew Hicks

ron minnich

Apr 17, 2017, 3:43:28 PM
to Matthew Hicks, Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev
Given the kind of things CSRs do, and the infrequency with which they need to be done, I'm kind of lost on the need to make all this out-of-order stuff work. What's the benefit?

ron

Hugo Décharnes

Apr 17, 2017, 4:02:14 PM
to RISC-V ISA Dev
IIRC ARMv8 has no read-modify-write instructions for special purpose registers; only instructions for reading or writing part or all of these registers (aside from unauthorized access).

Hugo Décharnes

Apr 17, 2017, 4:12:50 PM
to RISC-V ISA Dev
Given the prohibitive logic cost of today's high-end processors (such as Kaby Lake, Zen, Artemis…), serialization should be avoided as far as possible, especially when there's an alternative.

Matthew Hicks

Apr 17, 2017, 4:21:28 PM
to ron minnich, Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev
By your own logic, it would be a better ISA design decision to use the simpler move to/from CSR instructions than the existing, more complex versions, because these are such a rare case during software execution.  Unfortunately, the gates required to implement the more complex (existing) instructions are there sucking power and taking up area during the entire execution.  To me, this is a pretty clear case where (R)educed is better than (C)omplex when it comes to ISA design.


---Matthew Hicks

ron minnich

Apr 17, 2017, 4:25:24 PM
to Matthew Hicks, Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev
I'm not claiming I know. I just don't understand your argument. As you pointed out these instructions do make sense in the multiprocessor case. Should we assume that is the rule or the exception? I've been assuming it's the rule.

kr...@berkeley.edu

Apr 17, 2017, 5:37:40 PM
to Matthew Hicks, Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev
>>>>> On Mon, 17 Apr 2017 15:26:41 -0400, Matthew Hicks <mdh...@gmail.com> said:

| "I’m not aware of any ISA with a privileged architecture that doesn’t need some
| such instructions": OR1K?

OR1K has add-with-carry which reads a bit from the status register,
then writes it.

It also has a special behavior of move-to-status-register for clearing
interrupt bits which is similar to the RISC-V read-and-clear-bit
instruction.

| I can see why these types of instructions make sense
| (in the multiprocessor case), but they come at the cost of increasing hardware
| complexity.

It really isn't much complexity given you have to handle status
registers in general, and you have to handle instructions that
atomically read a register and write a register.

|   One of the beautiful aspects of the core of RISC-V is that it errs
| towards simplicity of hardware implementation.  I think this is one (albeit
| irreversible) case where the ISA designers got it wrong.

We wanted to standardize the pattern rather than having every
operation on CSRs be potentially special-cased. You can specialize
the CSR logic to only support the operations needed for your set of
implemented CSRs, but we figure it's usually going to be easier to
just implement the one pipeline for everything.

Krste


kr...@berkeley.edu

Apr 17, 2017, 7:07:21 PM
to Hugo Décharnes, RISC-V ISA Dev

ARM v8 has bit set/clear in the special registers. It also has
arithmetic with condition codes, which also read/set status bits.

I'm guessing your real concern is that RISC-V CSR instructions can
both read the old value into a new architectural register and update
CSR with a new value, i.e., the real difference with other ISAs is
that there are two register writes per instruction (one to CSR and one
to GPR).

This could be an issue if both GPRs and CSRs were renamed into same
pool, as otherwise only a single new physical register is needed per
instruction. If you don't rename CSRs and instead serialize CSR
updates, they're no harder than handling CSRs in an out-of-order
engine for any other ISA (and similar to an in-order design).

Some pure storage CSRs (like *scratch with no side effects) could be
renamed and handled by treating as more architectural registers, with
the common case of pure swaps handled with rename table tricks, but
because we also allow these to be manipulated with side effects (i.e.,
bit set/clear) they need different handling, e.g., cracking the CSR
instruction into a read-write-GPR uop, then a read-modify-write-CSR
uop.
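
A rough sketch of the cracking idea in the paragraph above, as a toy Python model and not a description of any real core. The tuple-based uop encoding, the function names and the dictionary CSR file are all assumptions for illustration. The model ignores the rd == x0 and rs1 == x0 side-effect rules, and it assumes nothing else touches the CSR between the two uops.

```python
def update(op, old, src):
    """New CSR value for a write, set-bits or clear-bits operation."""
    return {"write": src, "set": old | src, "clear": old & ~src}[op]


def execute_atomic(gprs, csrs, op, rd, addr, rs1):
    """Reference behaviour: the whole CSR instruction in one step."""
    src, old = gprs[rs1], csrs.get(addr, 0)
    csrs[addr] = update(op, old, src)
    if rd != 0:
        gprs[rd] = old


def crack(gprs, op, rd, addr, rs1):
    """Split the instruction into a GPR-writing uop and a CSR-updating uop."""
    src = gprs[rs1]   # operand captured up front, as register renaming would provide it
    return [("csr_to_gpr", rd, addr), ("csr_rmw", op, addr, src)]


def run_uop(gprs, csrs, uop):
    if uop[0] == "csr_to_gpr":
        _, rd, addr = uop
        if rd != 0:
            gprs[rd] = csrs.get(addr, 0)
    else:
        _, op, addr, src = uop
        csrs[addr] = update(op, csrs.get(addr, 0), src)


# Compare both paths on a set-bits instruction where rd and rs1 are the same register.
g1, c1 = [0] * 32, {0x340: 0b1010}
g1[5] = 0b0110
g2, c2 = list(g1), dict(c1)
execute_atomic(g1, c1, "set", 5, 0x340, 5)
for uop in crack(g2, "set", 5, 0x340, 5):
    run_uop(g2, c2, uop)
```

Capturing the source operand before the first uop runs is what keeps the rd == rs1 case correct in this model; both paths end with the same GPR and CSR contents.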

We could restrict supported CSR operations to just those required on
each register, but this would complicate the spec, and most likely the
implementation. We chose instead to standardize on a common pattern.

It's not clear a high-performance implementation would bother with any
CSR renaming, and I'm fairly sure all current OoO processors have to
handle some serializing instructions, so I don't see how there's a
real performance/complexity hit for the RISC-V scheme.

To avoid/instill confusion, there is a large design space for
instruction serialization, with very different performance penalties -
some of the x86 ones are unfathomably long. For the RISC-V CSR case,
in most cases the penalty could only be that the instruction after a
CSR mod is held in decode until the CSR mod commits.

Krste



Matthew Hicks

Apr 17, 2017, 7:22:46 PM
to Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev
On Mon, Apr 17, 2017 at 5:37 PM, <kr...@berkeley.edu> wrote:
>>>>> On Mon, 17 Apr 2017 15:26:41 -0400, Matthew Hicks <mdh...@gmail.com> said:

| "I’m not aware of any ISA with a privileged architecture that doesn’t need some
| such instructions": OR1K?

> OR1K has add-with-carry which reads a bit from the status register,
> then writes it.

Ironically, RISC-V was smart enough to remove much of the carry-related complexity (add with carry included) that comes along with previous instruction sets.  This was a great decision as far as implementing hardware/ISA simulators is concerned.


> It also has a special behavior of move-to-status-register for clearing
> interrupt bits which is similar to the RISC-V read-and-clear-bit
> instruction.

I don't know of the instructions you speak :).  MTSPR (move to special-purpose register) does not have the behavior that you are referring to and it is the only way that I know of to directly update SPRs.  From the OR1K specification "The contents of general-purpose register rB are moved into the special register defined by contents of general-purpose register rA logically ORed with the immediate value.  
spr(rA OR Immediate) ← rB[31:0]"
 
 

| I can see why these types of instructions make sense
| (in the multiprocessor case), but they come at the cost of increasing hardware
| complexity.

> It really isn't much complexity given you have to handle status
> registers in general, and you have to handle instructions that
> atomically read a register and write a register.

I agree, but it seems to cut against the central principle of RISC-V (at least as I understand it).  And it's not really needed.
 

|   One of the beautiful aspects of the core of RISC-V is that it errs
| towards simplicity of hardware implementation.  I think this is one (albeit
| irreversible) case where the ISA designers got it wrong.

> We wanted to standardize the pattern rather than having every
> operation on CSRs be potentially special-cased.  You can specialize
> the CSR logic to only support the operations needed for your set of
> implemented CSRs, but we figure it's usually going to be easier to
> just implement the one pipeline for everything.

I don't see how the current CSR modification instructions realize this goal.  To me, the most general behavior is reading an entire register and writing an entire register.  As specified, it seems that there is context-dependent behavior depending on the specified target register.


---Matthew Hicks
 

kr...@berkeley.edu

Apr 17, 2017, 7:25:49 PM
to Matthew Hicks, ron minnich, Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev

>>>>> On Mon, 17 Apr 2017 16:21:27 -0400, Matthew Hicks <mdh...@gmail.com> said:
| By your own logic, it would be a better ISA design decision to use the more
| simple move to/from CSR instructions than the existing more complex versions,
| because these are such a rare case during software execution.  Unfortunately,
| the gates required to implement the more complex (existing) instructions are
| there sucking power and taking up area during the entire execution.

| To me,
| this is a pretty clear case where (R)educed is better than (C)omplex when it
| comes to ISA design.

There really isn't much complexity or area there, compared to a design
that only does reads or writes. You need to include a read mux and a
write broadcast either way, and using both at the same time saves
energy. The logic ALU is small and shared across all CSRs, as opposed
to being built into each CSR.

CSR swaps are used in context swaps, and save adding more GPR or
scratch registers, which would add to area and energy.
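
As a rough illustration of the shared read-modify-write path, here is a minimal Python sketch. It assumes RV32, and the function and variable names are made up for this example rather than taken from the spec. One read, one small logic step and one write serve all three operations:

```python
# Sketch only: one shared logic unit serves every CSR (RV32 assumed).
XLEN_MASK = (1 << 32) - 1

def csr_access(csrs, addr, op, src):
    """Update csrs[addr] and return the old value, which goes to rd."""
    old = csrs[addr]                 # read mux
    if op == "CSRRW":
        new = src                    # plain swap
    elif op == "CSRRS":
        new = old | src              # set the bits given in src
    elif op == "CSRRC":
        new = old & ~src             # clear the bits given in src
    else:
        raise ValueError(f"unknown CSR op: {op}")
    csrs[addr] = new & XLEN_MASK     # write broadcast
    return old

MSCRATCH = 0x340
csrs = {MSCRATCH: 0x1000}
sp = 0x8000
sp = csr_access(csrs, MSCRATCH, "CSRRW", sp)   # swap sp with mscratch in one step
```

The last line is the context-swap case: the old scratch value lands in sp and the old sp lands in the CSR without needing an extra temporary register.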

Krste


kr...@berkeley.edu

Apr 17, 2017, 8:26:01 PM
to Matthew Hicks, Krste Asanovic, Hugo Décharnes, RISC-V ISA Dev

>>>>> On Mon, 17 Apr 2017 19:22:45 -0400, Matthew Hicks <mdh...@gmail.com> said:
| On Mon, Apr 17, 2017 at 5:37 PM, <kr...@berkeley.edu> wrote:
| It also has a special behavior of move-to-status-register for clearing
| interrupt bits which is similar to the RISC-V read-and-clear-bit
| instruction.

| I don't know of the instructions you speak :).  MTSPR (move to special-purpose
| register) does not have the behavior that you are referring to and it is the
| only way that I know of to directly update SPRs.  From the OR1K specification "
| The contents of general-purpose register rB are moved into the special register
| defined by contents of general-purpose register rA logically ORed with the
| immediate value.  
| spr(rA OR Immediate) ← rB[31:0]"

" 13.3 PIC Status Register (PICSR)

The interrupt controller status register is a 32-bit special-purpose
supervisor-level register accessible with the l.mtspr/l.mfspr
instructions in supervisor mode.

PICSR is used to determine the status of each PIC interrupt input. PIC
can support level-triggered interrupts or combination of
level-triggered and edge-triggered. Most implementations today only
support level-triggered interrupts. For level-triggered
implementations bits in PICSR simply represent level of interrupt
inputs. Interrupts are cleared by taking appropriate action at the
device to negate the source of the interrupt. Writing a '1' or a '0' to
bits in the PICSR that reflect a level-triggered source must have no
effect on PICSR content. The atomic way to clear an interrupt source
which is edge-triggered is by writing a '1' to the corresponding bit
in the PICSR. This will clear the underlying latch for the
edge-triggered source. Writing a '0' to the corresponding bit in the
PICSR has no effect on the underlying latch. "

I've no idea how common edge-triggered interrupts ever were on or1k
systems but at least this was one case where the move to/from was
spec-ed to do something else (bit clear).
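
The write behavior quoted above can be modeled in a few lines of Python. This is only a sketch of the quoted paragraph: the `edge_mask` parameter (which bits are edge-triggered) is an assumption of the example, not something the or1k spec names.

```python
def picsr_write(picsr, value, edge_mask):
    """Return the PICSR contents after software writes `value`.

    edge_mask marks the edge-triggered sources (hypothetical parameter).
    """
    cleared = value & edge_mask            # only a written 1 on an edge-triggered bit clears it
    return picsr & ~cleared & 0xFFFFFFFF   # level-triggered bits are never changed by a write

pending = 0b1011   # bits 0, 1 and 3 pending
edge = 0b0011      # bits 0 and 1 are edge-triggered, bits 2 and 3 level-triggered
after = picsr_write(pending, 0b1001, edge)
```

Here the write clears only bit 0: bit 3 is level-triggered, so writing a 1 to it has no effect, and bit 1 was written as 0.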

I believe the context register also has some behavior where the user
can only write a portion of it, but the supervisor can write all of
it.

| | I can see why these types of instructions make sense
| | (in the multiprocessor case), but they come at the cost of increasing
| | hardware complexity.

The atomic CSR operations are not there for multiprocessors
specifically, but are needed even for uniprocessor systems that have
preemptive interrupts, and which pack interrupt enables with other
fields in CSRs.

| It really isn't much complexity given you have to handle status
| registers in general, and you have to handle instructions that
| atomically read a register and write a register.

| I agree, but it seems to cut against the central principle of RISC-V (at least
| as I understand it).  And it's not really needed.

So how would you atomically disable interrupts - allocate whole
different CSRs to the *ie bits?

The other reason we include them is that they don't cost much, and
speed up some common system tasks (like setting/clearing bits in CSRs,
swapping contexts, etc.).

| |   One of the beautiful aspects of the core of RISC-V is that it errs
| | towards simplicity of hardware implementation.  I think this is one
| | (albeit irreversible) case where the ISA designers got it wrong.

| We wanted to standardize the pattern rather than having every
| operation on CSRs be potentially special-cased.  You can specialize
| the CSR logic to only support the operations needed for your set of
| implemented CSRs, but we figure it's usually going to be easier to
| just implement the one pipeline for everything.

| I don't see how the current CSR modification instructions realize
| this goal.  To me, the most general behavior is reading an entire
| register and writing an entire register.

Many ISA designers end up adding some way of setting and clearing
individual bits in a CSR atomically to handle system issues. This
cannot be done using just separate plain read and write.
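
To make the hazard concrete, here is a small Python sketch of a uniprocessor with a preemptive interrupt. The `OTHER` field and the handler are hypothetical, chosen only to show a second field sharing the CSR with the interrupt enable:

```python
MIE = 1 << 3      # interrupt-enable bit (mstatus.MIE is bit 3)
OTHER = 1 << 13   # hypothetical unrelated field packed into the same CSR

def handler(csr):
    """An interrupt handler that legitimately updates another field."""
    csr["value"] |= OTHER

def disable_with_separate_moves(csr, interrupt):
    tmp = csr["value"]            # move from CSR
    interrupt(csr)                # preemptive interrupt lands between the two moves
    csr["value"] = tmp & ~MIE     # move to CSR writes back a stale copy

def disable_atomically(csr, interrupt):
    interrupt(csr)                # an interrupt can only land before the instruction...
    csr["value"] &= ~MIE          # ...because read, clear and write are one step

racy = {"value": MIE}
disable_with_separate_moves(racy, handler)   # handler's update to OTHER is lost

safe = {"value": MIE}
disable_atomically(safe, handler)            # handler's update to OTHER survives
```

With separate moves the stale write-back silently erases the handler's change; the single read-and-clear step leaves no window for that.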

| As specified, it seems that there is context-dependent behavior
| depending on the specified target register.

The means to get data in and out of a CSR is not where the
context-dependent behavior is - this part is completely orthogonal in
RISC-V. The CSR complexity is in the myriad special cases that
different CSRs inherently require (e.g., floating-point flags accrue
exceptions, privilege changes pop stacks inside *status), and this
complexity is orthogonal to, but dwarfs the complexity of getting bits
in and out.

Krste

Jacob Bachmeyer

Apr 17, 2017, 9:50:26 PM
to kr...@berkeley.edu, Hugo Décharnes, RISC-V ISA Dev
kr...@berkeley.edu wrote:
> This could be an issue if both GPRs and CSRs were renamed into same
> pool, as otherwise only a single new physical register is needed per
> instruction. If you don't rename CSRs and instead serialize CSR
> updates, they're no harder than handling CSRs in an out-of-order
> engine for any other ISA (and similar to an in-order design).
>
> Some pure storage CSRs (like *scratch with no side effects) could be
> renamed and handled by treating as more architectural registers, with
> the common case of pure swaps handled with rename table tricks, but
> because we also allow these to be manipulated with side effects (i.e.,
> bit set/clear) they need different handling, e.g., cracking the CSR
> instruction into a read-write-GPR uop, then a read-modify-write-CSR
> uop.
>

I previously proposed a definition for "side-effect-free CSRs" that
distinguishes CSRs (like *scratch, *tvec, *cause, etc.) that *can* be
put into a larger renamed register pool, from those (like *status,
*deleg, *ie, *ip, etc.) that *cannot* be put into a larger pool, because
their individual bits affect hardware in some special way.

The current draft of the privileged ISA is *very* close to
this--swapping the addresses for *ip and *tvec would place only
side-effect-free CSRs in the 0x?4? group, although I eventually
suggested reserving 0---010----- (1/16th of the CSR space;
side-effect-free CSRs cannot be read-only) for side-effect-free CSRs to
allow implementations to partially reuse existing decode logic for the
R-type funct7 field in recognizing this group.

An implementation that wants to store side-effect-free CSRs in a general
rename pool would simply need to support the required read-modify-write
instructions on both general registers and physical CSRs not in the
rename pool, but CSRRW need only produce two rename table updates,
remapping rd to the register currently holding the CSR and remapping the
CSR to rs1. CSRRS and CSRRC simply perform OR and AND-NOT,
respectively, and remap the CSR to the physical register allocated to
hold the produced value while remapping rd to the register initially
holding the CSR value.
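
A minimal Python sketch of those rename-table updates follows. The data structures (`rat`, `prf`, `free_list`) are assumptions of the example; it ignores freeing of physical registers and the reference counting that a real design would need once two names can share one physical register.

```python
def rename_csr_op(rat, prf, free_list, op, rd, csr, rs1):
    """Apply one CSR instruction to a renamed side-effect-free CSR (sketch)."""
    old_csr = rat[csr]            # physical register currently holding the CSR
    src = rat[rs1]                # physical register currently holding rs1
    if op == "CSRRW":
        rat[csr] = src            # no data moves, only the mapping changes
    elif op in ("CSRRS", "CSRRC"):
        new = free_list.pop()     # allocate a register for the modified value
        if op == "CSRRS":
            prf[new] = prf[old_csr] | prf[src]    # OR
        else:
            prf[new] = prf[old_csr] & ~prf[src]   # AND-NOT
        rat[csr] = new
    else:
        raise ValueError(f"unknown CSR op: {op}")
    rat[rd] = old_csr             # rd now names the old CSR value

rat = {"sp": 0, "t0": 2, "mscratch": 1}   # architectural name -> physical register
prf = [0x8000, 0x1000, 0x3, 0]            # physical register file contents
free_list = [3]

rename_csr_op(rat, prf, free_list, "CSRRW", "sp", "mscratch", "sp")  # pure swap
rename_csr_op(rat, prf, free_list, "CSRRS", "t0", "mscratch", "t0")  # set bits
```

The CSRRW case touches only the table, while CSRRS and CSRRC each consume one new physical register for the produced value.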

The effect of my earlier proposal would be to allow implementations to
"split" the CSR instructions, with eight values of the funct7 field
indicating that a given CSR is actually in the general register pool.
Simpler implementations can of course ignore this distinction and have
only a sparse CSR space completely disjoint from the general registers.
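
For concreteness, the 0---010----- pattern can be checked with a mask and compare. The sketch below is Python; the constant and function names are invented for the example, and the machine-mode addresses are the ones I understand the current draft to use.

```python
SEF_MASK = 0x8E0    # fixed bits of 0---010-----: bit 11 and bits 7..5
SEF_VALUE = 0x040   # bit 11 = 0, bits 7..5 = 0b010

def in_proposed_group(csr_addr):
    """True if a 12-bit CSR address matches 0---010-----."""
    return (csr_addr & SEF_MASK) == SEF_VALUE

draft_addresses = {
    "mstatus": 0x300, "mtvec": 0x305,
    "mscratch": 0x340, "mepc": 0x341, "mcause": 0x342, "mip": 0x344,
}
membership = {name: in_proposed_group(addr) for name, addr in draft_addresses.items()}

group_size = sum(in_proposed_group(a) for a in range(1 << 12))
funct7_values = {a >> 5 for a in range(1 << 12) if in_proposed_group(a)}
```

With these addresses mip falls inside the group and mtvec outside it, which is the mismatch that swapping the two would remove. The group covers 256 of the 4096 addresses (1/16th) and corresponds to eight values of the upper seven bits.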


> It's not clear a high-performance implementation would bother with any
> CSR renaming, and I'm fairly sure all current OoO processors have to
> handle some serializing instructions, so I don't see how there's a
> real performance/complexity hit for the RISC-V scheme.
>

CSR renaming could benefit trap handlers in particular, since the
side-effect-free CSRs currently defined are all related to trap
handling. (Hardware may synchronously read from or write to a
side-effect-free CSR, but hardware is *not* continuously affected by the
value held in such a register.) Writing *epc, *cause, and *tval could
be done in parallel with jumping to *tvec (all of which are
side-effect-free CSRs) during the instruction-fetch latency incurred by
the jump when a trap is taken. This also allows an implementation to
reuse the JALR logic for taking traps, since a trap is very similar to
"JALR *epc, 0(*tvec)" except that *epc is loaded with the address of the
instruction that took the trap rather than the following instruction.
Since C.JALR already requires the ability to produce multiple link
offsets (adding 2 for C.JALR instead of 4 for JALR), the additional
logic to support adding zero should be small over the baseline for RVGC.
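
The link-offset idea can be sketched in Python as a three-way selection. The names here are invented for the example, and the sketch says nothing about how the selection would actually be wired:

```python
# Link offset added to the current pc for each kind of control transfer.
LINK_OFFSET = {"JALR": 4, "C.JALR": 2, "TRAP": 0}

def transfer(pc, kind, target):
    """Return (link value, next pc) for a jump-like control transfer."""
    return pc + LINK_OFFSET[kind], target

epc, next_pc = transfer(0x1000, "TRAP", 0x80000100)   # *epc gets the trapping pc itself
ra, _ = transfer(0x1000, "JALR", 0x2000)              # ordinary link: following instruction
```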

> To avoid/instill confusion, there is a large design space for
> instruction serialization, with very different performance penalties -
> some of the x86 ones are unfathomably long. For the RISC-V CSR case,
> in most cases the penalty could only be that the instruction after a
> CSR mod is held in decode until the CSR mod commits.
>

Interrupt handling and context switch (reloading the page table base
register) are some of x86's big weak spots and RISC-V is already
expected to do much better. Including certain CSRs into the general
register pool could further improve that performance and exchanging the
CSR addresses for *ip and *tvec could reduce the hardware complexity
needed to gain that improved performance, however slightly. I still
advocate that change.


-- Jacob

Hugo Décharnes

Apr 18, 2017, 7:46:19 AM
to RISC-V ISA Dev, hdec...@outlook.fr


> I'm guessing your real concern is that RISC-V CSR instructions can
> both read the old value into a new architectural register and update
> CSR with a new value, i.e., the real difference with other ISAs is
> that there are two register writes per instruction (one to CSR and one
> to GPR).


Yep. This is what is called read-modify-write.

Hugo Décharnes

Apr 18, 2017, 8:14:52 AM
to RISC-V ISA Dev, hdec...@outlook.fr

> Some pure storage CSRs (like *scratch with no side effects) could be
> renamed and handled by treating as more architectural registers, with
> the common case of pure swaps handled with rename table tricks, but
> because we also allow these to be manipulated with side effects (i.e.,
> bit set/clear) they need different handling, e.g., cracking the CSR
> instruction into a read-write-GPR uop, then a read-modify-write-CSR
> uop.
>
> We could restrict supported CSR operations to just those required on
> each register, but this would complicate the spec, and most likely the
> implementation.  We chose instead to standardize on a common pattern.
>
> It's not clear a high-performance implementation would bother with any
> CSR renaming, and I'm fairly sure all current OoO processors have to
> handle some serializing instructions, so I don't see how there's a
> real performance/complexity hit for the RISC-V scheme

Renaming CSR would be a very bad idea, even for those that have no side effects. These registers are not referenced often enough to justify the huge hardware cost, but just enough to avoid serialization as much as possible, and I speak from personal experience.
It's far from being insurmountable; it just makes the micro-architect's job a little harder for uncommon cases that are nonetheless worth considering.

Paul A. Clayton

May 5, 2017, 4:53:15 PM
to RISC-V ISA Dev
On 4/18/17, Hugo Décharnes <hdec...@outlook.fr> wrote:
[snip]
> Renaming CSR would be a very bad idea, even for those that have no side
> effects. These registers are not referenced often enough to justify the
> huge hardware cost, but just enough to avoid serialization as much as
> possible, and I speak from personal experience.
> It's far from being insurmountable; it just makes the micro-architect's job
> a little harder for uncommon cases that are nonetheless worth considering.

I disagree. I suspect the simplistic handling of CSR-equivalents
is one significant reason why system calls and context switches
are so much more expensive than function calls.

It is also not obvious that the relatively low rate of CSR access (and
specifically modification) is not in part a consequence of the cost of
such accesses (along with the dominance of "conventional" OSes).

The cost of renaming CSRs need not be enormous (in the context
of a core that already goes to extremes in OoO execution). Most
wrong side effects would be the consequence of mispredicted
branches and would be rolled back with the correction for the
branch prediction (though memory ordering and perhaps some other
forms of speculation might also provide incorrect values which
contaminate later operations, but those forms of speculation also
tend to be supported by non-selective roll-back).

Also, it would be quite possible to provide an atomicity guarantee
for adjacent (or even proximate) CSR reads and writes. (An
implementation might choose to resume at the CSR read or stall
the interrupt until the write is completed.) The ISA could effectively
guarantee that such are atomic with respect to interrupts with
little more burden than handling variable-length instructions. Such
would also allow extending the atomicity guarantee to support more
sophisticated modifications. (This is similar to the difference between
only providing atomic swap and providing ll/sc.)

However, I also do not think a RMW is especially problematic for
renaming. E.g., one could crack the instruction into micro-ops.

Since that part of the ISA is already frozen, there is little urgency
in considering such issues. Presumably RISC-V is going to have
a procedure for deprecating and removing instructions and guarantees
to allow non-compatible changes. The instruction encoding waste
in using RMW does not seem significant and as already noted trap
and emulate could provide compatibility with little extra hardware burden
(if a particular microarchitectural family line chose not to implement
CSR RMW directly in hardware).

Hugo Décharnes

May 12, 2017, 3:18:10 PM
to RISC-V ISA Dev
Not to be disrespectful, but why do you think that most if not all high-performance processors serialize, at least partially, CSR accesses? Many have side effects requiring at best stalling subsequent instructions before they enter the out-of-order core, at worst flushing the pipeline. In any case, forward progress cannot occur until their effects have been applied.
Then, renaming CSRs would inevitably lower frequency, as renaming is one of the most timing-critical paths in modern out-of-order processors, for an insignificant performance gain, as CSRs are not accessed frequently enough. This is a very bad idea that proves, with all due respect, you have not been involved in such a project.

Jacob Bachmeyer

May 12, 2017, 7:12:23 PM
to Hugo Décharnes, RISC-V ISA Dev
No disrespect taken; I see this as a misunderstanding.

Some CSRs have side effects, but some, particularly the CSRs involved in
trap handling, do not have such side effects. Implementations can
benefit from renaming these CSRs, but not from renaming *all* CSRs. For
example, *status is likely to physically be small cells scattered across
the core. Renaming *status is insane. The satp CSR controls virtual
addressing with immediate effect. Renaming satp is insane. The
performance counters will need specialized hardware and similarly are
not candidates for renaming in the general case. On the other hand,
*scratch is normally used only by exchanging its value with a
general-purpose register and explicitly has no effects on hardware
whatsoever. Renaming the *scratch CSRs into the same pool as the main
registers is reasonable, and allows the "CSRRW rd, ?scratch, rd"
instruction that must begin a trap handler to be executed with a rename
table update. Renaming *tvec into the main register pool would enable
taking a trap to be executed as "JALR x0, *tvec", which may or may not
be useful depending on the implementation.

Similarly, *epc, *cause, and *tval must both hold a value and be written
by hardware as an occasional side-effect of normal execution. One
possible implementation (assuming four privilege levels M/H/S/U, all of
which can handle traps) is to have _five_ sets of these CSRs, one for
each privilege level to hold its values and one "running" set that
is updated with every instruction executed. In this model, *tval is
actually split into multiple columns, one of which is loaded with every
instruction decoded, one of which is loaded with every virtual address
accessed, and one of which is cleared to zero. When a trap is taken
(for example) to the supervisor, rename table updates swap sepc for the
"running" epc, scause for the "running" cause, and stval for the
appropriate "running" value for the trap's cause. The physical cells
that previously held those CSRs are thus the new "running" set.
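
A Python sketch of that swap is below. The class and method names are invented for the example, and it collapses the multiple tval columns into a single field to stay short:

```python
class TrapParamFile:
    """Five physical rows of (epc, cause, tval) behind a small rename map."""
    LEVELS = ("m", "h", "s", "u")

    def __init__(self):
        self.rows = [{"epc": 0, "cause": 0, "tval": 0} for _ in range(5)]
        names = self.LEVELS + ("running",)
        self.rename = {name: i for i, name in enumerate(names)}

    def track(self, epc, cause, tval):
        """Hardware update of the running row, done for every instruction."""
        self.rows[self.rename["running"]].update(epc=epc, cause=cause, tval=tval)

    def take_trap(self, level):
        """Swap the running row with the row of the level handling the trap."""
        r = self.rename
        r[level], r["running"] = r["running"], r[level]

    def read(self, level, field):
        return self.rows[self.rename[level]][field]

    def write(self, level, field, value):
        self.rows[self.rename[level]][field] = value

regs = TrapParamFile()
regs.write("m", "epc", 0x111)              # some earlier machine-mode value
regs.track(epc=0x2000, cause=8, tval=0)    # the instruction about to trap
regs.take_trap("s")                        # trap to the supervisor: mapping swap only
```

After the swap, sepc and scause read the values that were being tracked, mepc is untouched, and the row that used to hold the supervisor's values becomes the new running row.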

Note that this does not rename *epc, *cause, and *tval into the main
register file, but instead places them in a separate array (that need
not be physically contiguous--tval for "bad instruction" and tval for
"bad address" may be in different places, with epc somewhere else
entirely) with its own rename table and dedicated write ports for the
hardware-controlled updates.


-- Jacob

Kirk Weedman

May 12, 2017, 8:00:43 PM
to RISC-V ISA Dev, hdec...@outlook.fr
Just create a single CSR Functional unit to grab all the CSR instructions, pass all the necessary info all the way to the Commit stage where you RMW the CSR as defined by the instruction and also update the Arch. register defined by Rd, all on the same clock edge. Of course, there could be no forwarding of Rd until the Commit stage, as I think Krste was suggesting, and thus a slight hit, but otherwise it's not that complicated. This is what I just did in my new Out of Order microarchitecture that I just started targeting to the RISC-V ISA.

Stefan O'Rear

May 12, 2017, 8:53:34 PM
to Hugo Décharnes, RISC-V ISA Dev
On Fri, May 12, 2017 at 12:18 PM, Hugo Décharnes <hdec...@outlook.fr> wrote:
> Not to be disrespectful, but why do you think that most if not all
> high-performance processors serialize, at least partially, CSR accesses?

Which high-performance processors, specifically, are you referring to?
As far as I know there has not been an OOO RISC-V core at tapeout
stage.

-s

Hugo Décharnes

May 13, 2017, 6:09:47 AM
to RISC-V ISA Dev, hdec...@outlook.fr, jcb6...@gmail.com
There is no misunderstanding. As mentioned earlier, renaming logic is very often timing critical, and designers try to keep renamed registers as few as possible. Besides, renaming is only worth the cost when there is a reasonable chance that multiple instructions modifying the same register are executing. This is clearly not the case for CSRs, except scratch registers. The result is a useless increase in RAT delay and a waste of PRF registers.

> Note that this does not rename *epc, *cause, and *tval into the main
> register file, but instead places them in a separate array (that need
> not be physically contiguous--tval for "bad instruction" and tval for
> "bad address" may be in different places, with epc somewhere else
> entirely) with its own rename table and dedicated write ports for the
> hardware-controlled updates.

Adding another rename table won't reduce access delay.

Hugo Décharnes

May 13, 2017, 6:24:41 AM
to RISC-V ISA Dev, hdec...@outlook.fr
There is no point in waiting until commit to produce the GPR value; that is only a requirement for non-renamed registers. The answer may be to crack the instruction into two µops, or to execute it in two phases. Either way, it is added complexity for no performance gain.

Hugo Décharnes

May 13, 2017, 6:25:37 AM
to RISC-V ISA Dev, hdec...@outlook.fr
All non-RISC-V processors.

Jacob Bachmeyer

May 13, 2017, 8:32:20 PM
to Hugo Décharnes, RISC-V ISA Dev
Hugo Décharnes wrote:
> There is no misunderstanding. As mentioned earlier, renaming logic is
> very often timing critical, and designers try to keep renamed
> registers as few as possible. Besides, renaming is only worth the cost
> when there is a reasonable chance that multiple instructions modifying
> the same register are executing. This is clearly not the case for CSR,
> except scratch registers. A useless delay increase of the RAT, and a
> waste of PRF registers.

The *scratch and *tvec registers are exactly the CSRs that I suggest
could be renamed into the main PRF, since they are accessed equivalently
to general-purpose registers. Nor is renaming strictly needed, although
it would improve the performance of exchanging *scratch with a GPR; a
processor could implement {m,h,s,u}{scratch,tvec} as "x32" through
"x40". Admittedly, *tvec-as-quasi-GPR is less likely to be useful than
*scratch-as-quasi-GPR, but some implementations might benefit from the
ability to dispatch a supervisor trap by "executing" "JALR x0,
[scause*4](stvec)" using the same logic as for ordinary JALR.
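
As a sketch of what that dispatch computes, here is the target address in Python. It assumes RV32 and takes the exception code as a plain integer (how the interrupt bit of scause would be handled is left out); the function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF   # RV32 assumed for the sketch

def jalr_target(base, offset):
    """Ordinary JALR target: base plus offset with the low bit cleared."""
    return ((base + offset) & ~1) & MASK32

def supervisor_trap_target(stvec, cause_code):
    """Trap dispatch expressed as a JALR with offset cause_code * 4."""
    return jalr_target(stvec, 4 * cause_code)

target = supervisor_trap_target(0x80000100, 8)   # cause 8 selects the ninth 4-byte slot
```

Each cause then lands on its own 4-byte slot after the base, so the slot would normally hold a jump to the real handler.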

> Note that this does not rename *epc, *cause, and *tval into the main
> register file, but instead places them in a separate array (that need
> not be physically contiguous--tval for "bad instruction" and tval for
> "bad address" may be in different places, with epc somewhere else
> entirely) with its own rename table and dedicated write ports for the
> hardware-controlled updates.
>
>
> Adding another rename table won't reduce access delay.

It will not, but placing *epc, *cause, and *tval{insn,addr,zero} in
their own register file (not the main PRF) with dedicated write ports
for updates would permit loading those CSRs to proceed in parallel with
normal execution and permit eliminating any visible trap latency that
would otherwise come from loading values into *epc, *cause, and *tval.
This is close to speculating a trap on every instruction and requires
either that the "running" row be copied into the appropriate set when a
trap is actually taken or that the "trap parameter" registers be renamed
within their own file to permit swapping the "running" row with the
appropriate set. This is needed because the ISA specifies that the
less-privileged "trap parameter" CSRs retain their values when a
more-privileged trap is taken. (In other words, whatever value was last
written to sepc, either by the supervisor or by hardware, is visible on
a subsequent monitor trap. It is probably meaningless, but it is
preserved.)

Minimizing the visible latency of a trap (by doing as much as possible
in parallel, even if this amounts to speculating a trap on every
instruction) could be beneficial since some traps, particularly ECALL,
are fully predictable. In this way, the cost of ECALL could be brought
down to that of an unconditional jump, which would bring the cost of a
system call to only a bit more than an ordinary function call through a
dispatch procedure. I maintain that this would be good.

In summary, the ISA currently permits CSRs to be renamed as I have
described and the only concern that I have here is that the RISC-V ISA
continue to do so.


-- Jacob

Hugo Décharnes

May 14, 2017, 2:02:33 PM
to RISC-V ISA Dev, hdec...@outlook.fr, jcb6...@gmail.com
> The *scratch and *tvec registers are exactly the CSRs that I suggest
> could be renamed into the main PRF, since they are accessed equivalently
> to general-purpose registers.  Nor is renaming strictly needed, although
> it would improve the performance of exchanging *scratch with a GPR; a
> processor could implement {m,h,s,u}{scratch,tvec} as "x32" through
> "x40".  Admittedly, *tvec-as-quasi-GPR is less likely to be useful than
> *scratch-as-quasi-GPR, but some implementations might benefit from the
> ability to dispatch a supervisor trap by "executing" "JALR x0,
> [scause*4](stvec)" using the same logic as for ordinary JALR.

Renaming is not needed as it will not improve performance; just lower frequency.

> It will not, but placing *epc, *cause, and *tval{insn,addr,zero} in
> their own register file (not the main PRF) with dedicated write ports
> for updates would permit loading those CSRs to proceed in parallel with
> normal execution and permit eliminating any visible trap latency that
> would otherwise come from loading values into *epc, *cause, and *tval.
> This is close to speculating a trap on every instruction and requires
> either that the "running" row be copied into the appropriate set when a
> trap is actually taken or that the "trap parameter" registers be renamed
> within their own file to permit swapping the "running" row with the
> appropriate set.  This is needed because the ISA specifies that the
> less-privileged "trap parameter" CSRs retain their values when a
> more-privileged trap is taken.  (In other words, whatever value was last
> written to sepc, either by the supervisor or by hardware, is visible on
> a subsequent monitor trap.  It is probably meaningless, but it is
> preserved.)

Return address and trap cause are very often saved in dedicated flops but never renamed.
 
> Minimizing the visible latency of a trap (by doing as much as possible
> in parallel, even if this amounts to speculating a trap on every
> instruction) could be beneficial since some traps, particularly ECALL,
> are fully predictable.  In this way, the cost of ECALL could be brought
> down to that of an unconditional jump, which would bring the cost of a
> system call to only a bit more than an ordinary function call through a
> dispatch procedure.  I maintain that this would be good.

Context switch latency comes from the need to drain, at least partially, the pipeline; not current status saving.

This conversation suggests to me that you have not experienced the development of a high-end processor (e.g., at AMD, ARM, IBM, or Intel).

Samuel Falvo II

May 14, 2017, 2:37:36 PM
to Hugo Décharnes, RISC-V ISA Dev, Jacob Bachmeyer
On Sun, May 14, 2017 at 11:02 AM, Hugo Décharnes <hdec...@outlook.fr> wrote:
> This conversation suggests to me that you have not experienced the
> development of a high-end processor (e.g., at AMD, ARM, IBM, or Intel).

Disclaimer: I'm not a moderator for this mailing list. But, I've
watched every message you've written and, while often insightful, you
have been pretty brash as well.

However this statement is intended to be taken, please be aware that
it can easily be misconstrued as an ad hominem attack. Be conscious
and conscientious towards your peers on this mailing list. If you
have particular issues with someone, it's perhaps best to take the
matter off-list.

Thanks.

--
Samuel A. Falvo II

Jacob Bachmeyer

May 14, 2017, 11:39:02 PM
to Hugo Décharnes, RISC-V ISA Dev
Hugo Décharnes wrote:
>
> > The *scratch and *tvec registers are exactly the CSRs that I suggest
> > could be renamed into the main PRF, since they are accessed equivalently
> > to general-purpose registers. Nor is renaming strictly needed, although
> > it would improve the performance of exchanging *scratch with a GPR; a
> > processor could implement {m,h,s,u}{scratch,tvec} as "x32" through
> > "x40". Admittedly, *tvec-as-quasi-GPR is less likely to be useful than
> > *scratch-as-quasi-GPR, but some implementations might benefit from the
> > ability to dispatch a supervisor trap by "executing" "JALR x0,
> > [scause*4](stvec)" using the same logic as for ordinary JALR.
>
> Renaming is not needed as it will not improve performance; just lower
> frequency.

That will ultimately depend on implementation details. I have never
suggested that renaming be in any way mandatory, but that the ISA ensure
that it is _possible_. For now, that means keeping the CSR number as an
immediate in the instruction, which allows an implementation to "break
up" the CSR access instructions into specialized instructions for
different CSRs.

Another question that each implementation must answer for itself is
whether the increase in frequency from omitting renaming is actually
achievable--if CSR access is never the critical path with renaming, then
it does not matter. Similarly, if that higher frequency would result in
power usage that exceeds feasible dissipation, it does not matter; the
implementation gains nothing from omitting renaming. Of course, unless
using CSR renaming simplifies some other logic enough to justify the
cost of the additional rename hardware and splitting the CSR
instructions, there is no reason to rename any CSRs in the first place.

> > It will not, but placing *epc, *cause, and *tval{insn,addr,zero} in
> > their own register file (not the main PRF) with dedicated write ports
> > for updates would permit loading those CSRs to proceed in parallel with
> > normal execution and permit eliminating any visible trap latency that
> > would otherwise come from loading values into *epc, *cause, and *tval.
> > This is close to speculating a trap on every instruction and requires
> > either that the "running" row be copied into the appropriate set when a
> > trap is actually taken or that the "trap parameter" registers be renamed
> > within their own file to permit swapping the "running" row with the
> > appropriate set. This is needed because the ISA specifies that the
> > less-privileged "trap parameter" CSRs retain their values when a
> > more-privileged trap is taken. (In other words, whatever value was last
> > written to sepc, either by the supervisor or by hardware, is visible on
> > a subsequent monitor trap. It is probably meaningless, but it is
> > preserved.)
>
> Return address and trap cause are very often saved in dedicated flops
> but never renamed.

Once again, renaming is optional; some implementations may use and
benefit from it while others would not benefit and will not use it. The
hypothetical implementation (that uses a MIPS-like 5-stage pipeline) I
was suggesting writes repc1 (running epc for instruction fetch/decode
traps) and rtval{insn} (running tval{insn}) with every instruction
fetch, repc2 (running epc for breakpoint/ECALL) with every instruction
executed, and repc3 (running epc for memory access) and rtval{addr}
(running fault address) with every memory access. An out-of-order
implementation gets more interesting, since there is effectively an repc
and rtval (which starts out as rtval{insn} until the instruction is
decoded and then becomes either rtval{addr} for memory access or
rtval{zero} for others) for every in-flight instruction. (And
rtval{addr} actually has no incremental storage cost whatsoever--it is
simply the virtual address that is passed to the load-store unit.)
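
As a rough illustration of the 5-stage scheme described above, the following Python sketch models the "running" registers as plain fields that are overwritten at each pipeline event and copied into the architectural CSRs only when a trap is actually taken. The class and function names, and the single set of architectural CSRs, are assumptions made for illustration and are not taken from any real design; the cause codes in the usage note are the standard RISC-V exception codes.

```python
from dataclasses import dataclass


@dataclass
class RunningTrapRegs:
    """Hypothetical 'running' trap-parameter registers (names follow the post)."""
    repc1: int = 0       # running epc for instruction fetch/decode traps
    rtval_insn: int = 0  # running tval{insn}
    repc2: int = 0       # running epc for breakpoint/ECALL
    repc3: int = 0       # running epc for memory-access traps
    rtval_addr: int = 0  # running fault address (the LSU's virtual address)


@dataclass
class TrapCsrs:
    """Architectural *epc, *cause and *tval for one privilege level."""
    epc: int = 0
    cause: int = 0
    tval: int = 0


def on_fetch(r: RunningTrapRegs, pc: int, insn: int) -> None:
    # Written with every instruction fetch.
    r.repc1, r.rtval_insn = pc, insn


def on_execute(r: RunningTrapRegs, pc: int) -> None:
    # Written with every instruction executed.
    r.repc2 = pc


def on_mem_access(r: RunningTrapRegs, pc: int, vaddr: int) -> None:
    # Written with every memory access.
    r.repc3, r.rtval_addr = pc, vaddr


def take_trap(r: RunningTrapRegs, csrs: TrapCsrs, stage: str, cause: int) -> None:
    """Copy the running row for the trapping stage into the architectural CSRs."""
    if stage == "fetch":
        csrs.epc, csrs.tval = r.repc1, r.rtval_insn
    elif stage == "execute":
        csrs.epc, csrs.tval = r.repc2, 0          # rtval{zero}
    elif stage == "memory":
        csrs.epc, csrs.tval = r.repc3, r.rtval_addr
    else:
        raise ValueError(f"unknown stage: {stage}")
    csrs.cause = cause
```

For example, after `on_mem_access(r, 0x0FF8, 0xDEAD0000)`, calling `take_trap(r, csrs, "memory", 13)` (13 being the load page fault code) leaves `csrs.epc == 0x0FF8` and `csrs.tval == 0xDEAD0000` without any extra cycle spent computing them at trap time.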

> Minimizing the visible latency of a trap (by doing as much as
> possible in parallel, even if this amounts to speculating a trap on
> every instruction) could be beneficial since some traps, particularly
> ECALL, are fully predictable. In this way, the cost of ECALL could be
> brought down to that of an unconditional jump, which would bring the
> cost of a system call to only a bit more than an ordinary function
> call through a dispatch procedure. I maintain that this would be good.
>
>
> Context switch latency comes from the need to drain, at least
> partially, the pipeline; not current status saving.

If in-flight instructions are tagged appropriately, there is no need to
drain the pipeline to switch between S-mode and U-mode. Only writes to
satp require a pipeline flush, and that is due to the requirement that
the new satp value apply to fetching the next instruction executed. An
implementation could avoid even this flush if the supervisor cooperates
by putting its address space switching code in a global mapping. This
also means that it is possible to speculate through ECALL and possibly
other traps. I wonder what, if any, overall performance improvement
could be had by speculating a page fault on every (data) memory access.
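
To make the tagging idea concrete, here is a minimal Python sketch in which every in-flight instruction carries the privilege level it was fetched under, so legality is checked against the instruction's own tag rather than a single global mode. Everything here is hypothetical: the `InFlightOp` fields, the assumption that ECALL from U-mode is delegated to S-mode, and the `needs` field standing in for whatever privilege check the decoder would perform.

```python
from dataclasses import dataclass

U_MODE, S_MODE = 0, 1  # RISC-V privilege level encodings


@dataclass(frozen=True)
class InFlightOp:
    pc: int
    mnemonic: str
    priv: int   # privilege tag captured when the op entered the pipeline
    needs: int  # lowest privilege allowed to execute it (hypothetical field)


def tag_stream(ops, start_priv):
    """Tag a fetched sequence of (pc, mnemonic, needs) without draining.

    Assumes ECALL traps to S-mode and SRET returns to U-mode, so the tag
    applied to *following* instructions changes at those points.
    """
    priv = start_priv
    tagged = []
    for pc, mnemonic, needs in ops:
        tagged.append(InFlightOp(pc, mnemonic, priv, needs))
        if mnemonic == "ecall":
            priv = S_MODE
        elif mnemonic == "sret":
            priv = U_MODE
    return tagged


def is_legal(op: InFlightOp) -> bool:
    # Each op is checked against its own tag, not a global "current mode".
    return op.priv >= op.needs
```

With this arrangement, a window such as `addi, ecall, csrr (in handler), sret, csrr (back in user code)` can be in flight all at once: the handler's CSR read is legal under its S-mode tag, while the same CSR read after `sret` is flagged as illegal under its U-mode tag.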

Even taking an interrupt requires only inserting an "interrupt barrier"
into the decode stream; the internally generated "interrupt barrier"
would carry the repc value appropriate for resuming after the
interrupt. Instruction fetch would immediately continue from the
relevant trap vector. An exception caused by a pending instruction
would cancel the interrupt trap, but the interrupt would remain pending
and would be taken either upon entry to the exception handler (if the
interrupt is to a higher privilege level) or when the exception handler
re-enables interrupts.
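
The "interrupt barrier" behaviour can be sketched as a toy in-order retirement model. This is only an illustration under simplifying assumptions: the `Core` and `Slot` names, the fixed 4-byte instruction size, and the vector addresses in the usage note are all made up, and the model ignores privilege levels and interrupt-enable bits.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    pc: int                   # for a barrier, this is the carried repc
    is_barrier: bool = False
    faults: bool = False      # True if this instruction raises an exception


class Core:
    def __init__(self):
        self.in_flight = []
        self.interrupt_pending = False
        self.fetch_pc = 0
        self.epc = None
        self.log = []

    def decode(self, pc, faults=False):
        self.in_flight.append(Slot(pc, faults=faults))
        self.fetch_pc = pc + 4

    def raise_interrupt(self, vector):
        # Insert a barrier carrying the resume address, then redirect
        # fetch to the trap vector immediately.
        self.interrupt_pending = True
        self.in_flight.append(Slot(self.fetch_pc, is_barrier=True))
        self.fetch_pc = vector

    def retire_all(self, exception_vector):
        while self.in_flight:
            slot = self.in_flight.pop(0)
            if slot.faults:
                # An older instruction trapped: squash everything younger,
                # including the barrier. The interrupt stays pending.
                self.in_flight.clear()
                self.epc = slot.pc
                self.fetch_pc = exception_vector
                self.log.append("exception")
                return
            if slot.is_barrier:
                # The barrier reached retirement: the interrupt is taken.
                self.epc = slot.pc
                self.interrupt_pending = False
                self.log.append("interrupt")
```

If instructions at `0x100` and `0x104` are decoded and an interrupt then arrives, the barrier retires with `epc == 0x108`. If instead the instruction at `0x100` faults, the barrier is squashed, `epc` becomes `0x100`, and `interrupt_pending` remains set so the interrupt can be taken later.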

> This conversation suggests to me that you have not experienced the
> development of a high-end processor (e.g., at AMD, ARM, IBM, or Intel).

You guess correctly, and I counter with the argument that experience can
have subtle costs, specifically a susceptibility to "One True Way"
thinking. Guarding against that is the wisdom shown by the RISC-V
developers in setting up this very list and opening discussion to the entire
(English-speaking) Internet. We even have one participant who has
admitted having no formal education beyond high school, yet has made
helpful contributions to some of the discussions.


-- Jacob