Hi Ved,
Regarding 2xXLEN instructions in general,
is there a written rationale in the base standard for why only even-numbered registers can be used,
or more precisely why the LO part is stored in the even GPR R and the HI part in the odd GPR R+1?
I can think of two basic options for implementing this in hardware.
1. The simplest option is to access the GPR register file sequentially over two separate clock periods.
2. In case a 32-bit CPU has a 64-bit system bus, this could be done in a single clock period.
The register file can be split into even/odd parts, thus enabling double throughput.
At first glance the even R requirement might reduce logic complexity,
but in practice I do not think it would spare even an XLEN wide multiplexer compared to unrestricted R value.
The even R requirement makes it easy to avoid the case where R=31 and R+1=X0. Is this the reason?
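To make the even/odd split and the R=31 corner case concrete, here is a minimal C sketch of the index arithmetic. All function names are hypothetical, and wrapping R+1 modulo 32 is only my assumption of what an unrestricted encoding would do; no extension specifies it.

```c
#include <assert.h>
#include <stdbool.h>

/* True if r could serve as the base (LO) register of an even-aligned pair. */
static bool pair_is_even_aligned(unsigned r)
{
    return r < 32u && (r & 1u) == 0u;
}

/* Index of the HI register for an unrestricted base r.
 * Assumption: the 5-bit index simply wraps, so r = 31 gives 0 (x0). */
static unsigned pair_hi_index(unsigned r)
{
    return (r + 1u) & 31u;
}

/* True if the HI half of the pair would land on x0. */
static bool pair_hi_is_x0(unsigned r)
{
    return pair_hi_index(r) == 0u;
}

/* Row shared by both halves when the register file is split into an
 * even bank (x0, x2, ...) and an odd bank (x1, x3, ...). */
static unsigned pair_bank_row(unsigned r)
{
    assert(pair_is_even_aligned(r));
    return r >> 1;
}
```

With an even base, both halves sit in the same row of the two banks and the HI index can never be 0; only the unrestricted case R=31 runs into X0.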
Are there any considerations on how 2xXLEN instructions relate to the ABI?
I am just curious here, I do not have enough ABI experience to comment on it.
My interest comes from planning to add to my RV32 implementation a custom extension supporting atomic 64-bit load/store,
and I am writing a system bus protocol which would support atomic 64-bit transactions on a 32-bit data bus (locked transfer pair).
The primary purpose of these instructions would be atomic access to 64-bit timer/counter registers in peripherals,
thus avoiding the need to check whether the HI part changed after reading the LO part.
Also it would enable atomic access to 64-bit GPIO registers and similar use cases.
This could be useful for CSR instructions too.
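For reference, the HI/LO check that an atomic 64-bit load would make unnecessary usually looks like the following C sketch. The register layout `timer64_regs_t` and the function name are hypothetical and not taken from any particular peripheral.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a 64-bit timer exposed as two 32-bit registers. */
typedef struct {
    volatile uint32_t lo;
    volatile uint32_t hi;
} timer64_regs_t;

/* Read a 64-bit counter with 32-bit accesses only.
 * HI is read before and after LO; if it changed, LO may have
 * wrapped between the reads, so the sequence is retried. */
static uint64_t timer64_read(const timer64_regs_t *t)
{
    uint32_t hi, lo, hi_again;

    assert(t != NULL);
    do {
        hi = t->hi;
        lo = t->lo;
        hi_again = t->hi;
    } while (hi != hi_again);

    return ((uint64_t)hi << 32) | lo;
}
```

This costs three bus reads in the common case and a loop in the rare one, whereas a locked transfer pair would need exactly two.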
Such 64-bit load/store instructions on RV32 could reuse the existing RV64 64-bit load/store encodings, in both the base I and the C extensions.
The encoding for RV128 128-bit load/store is a bit different and thus not a perfect fit to extend RV32.
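As an illustration of reusing the RV64 encodings, the C sketch below recognizes the base LD/SD encodings (LOAD/STORE major opcode with funct3 = 011) and applies an even-register check to the data register. The `rv32_pair_*` functions are hypothetical and only express the rule I am asking about; they do not describe any ratified extension.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask selecting funct3 (bits 14:12) and the major opcode (bits 6:0). */
#define FUNCT3_OPCODE_MASK 0x0000707fu
#define MATCH_LD           0x00003003u /* LOAD  opcode 0000011, funct3 011 */
#define MATCH_SD           0x00003023u /* STORE opcode 0100011, funct3 011 */

static bool is_ld(uint32_t insn) { return (insn & FUNCT3_OPCODE_MASK) == MATCH_LD; }
static bool is_sd(uint32_t insn) { return (insn & FUNCT3_OPCODE_MASK) == MATCH_SD; }

/* Standard field positions: rd in bits 11:7, rs2 in bits 24:20. */
static unsigned insn_rd(uint32_t insn)  { return (insn >> 7) & 31u; }
static unsigned insn_rs2(uint32_t insn) { return (insn >> 20) & 31u; }

/* Hypothetical rule: on RV32 the pair load is accepted only for even rd. */
static bool rv32_pair_ld_is_legal(uint32_t insn)
{
    return is_ld(insn) && (insn_rd(insn) & 1u) == 0u;
}

/* Hypothetical rule: on RV32 the pair store is accepted only for even rs2. */
static bool rv32_pair_sd_is_legal(uint32_t insn)
{
    return is_sd(insn) && (insn_rs2(insn) & 1u) == 0u;
}
```

For example, 0x0005b503 (ld a0, 0(a1), with rd = x10) passes the check, while the same load with rd = x11 would be rejected under this assumed rule.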
Another use case I am (will be) looking into is the P extension.
See 'Processing of 64-bit Values in RV32' in Chapter 7, which states:
"Use of misaligned (odd-numbered) registers for 64-bit operands is reserved.".
An RV32IP CPU with a 64-bit load/store data bus could take advantage of 64-bit load/store instructions
to increase memory throughput.
This in general would be an advantage for any implementation focused on DSP functionality needing high data throughput.
The load-multiple/store-multiple approach in the Zc extension is rather different and focused on stack push/pop.
So what I am looking for is argumentation in the base standard that would deal with various aspects
(odd/even, X0 source/destination, atomicity, endianness, ABI, ...) of load-pair/store-pair instructions,
providing general recommendations for various extensions.
Regards,
Iztok Jeras