Proposal: Alternative RV128 (Upgrade path for Legacy 64b systems, skipping RV64 entirely)

Xan Phung

Feb 5, 2023, 12:11:37 AM

to RISC-V ISA Dev

This proposal is a business case & feasibility review for an Alternative RV128 design, and outlines a couple of design choices for which I would welcome input from Forum members.

My proposal assumes the first high volume market to adopt 128b computing will be data centres/cloud services & 128b Personal Computer CPUs. In both these markets, x86_64 has overwhelmingly the largest installed base, with trillions in sunk costs by 2030 (AWS, Azure, GCP alone > $500bil).

Examples of here & now demand for 128b compute:

* Fixed length 128b word size strings are roomy enough (16 chars) to be viable replacements for many byte array string use cases.

* Int128 data type language support is increasing (eg. Rust & Zig) & also packed bitfield structs up to 128b of total size in C.

* LISP (& related languages) using 128 bit cons cells (ie: a pair of 64 bit car/cdr's)

* Faster portable null terminated string processing (eg. musl strlen), where portable = same C code can be compiled for 32b to 128b architectures

* Software can be easily modified to use a 128b datapath for the above here & now, it is just waiting for hardware to catch up!!

* In all the above, 128b can be introduced piecemeal, most pointers are 64b, and there won't be a "big bang" replacement of entire software stacks.

* Even for full 128b user mode address sizes, the kernel may remain 64b (for device driver compat), eg. first 64b Mac OS X had 32b & 64b user mode with a 32b kernel

* Moreover, the silicon wafer cost of a 128 bit datapath is trivial compared to the enormous cost of changing entire installed systems & software.
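A minimal sketch of the "portable word-at-a-time string scan" idea from the list above. It is written in the spirit of the zero-byte test used by word-based strlen implementations, but it is not musl's actual code. The names `word_t`, `has_zero_byte` and `strnlen_words` are hypothetical, and the sketch assumes `size_t` matches the natural word size of the target, so the same source would scan 4, 8 or 16 bytes per step depending on the architecture.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Assumption: size_t is the natural word of the target (4, 8 or 16 bytes). */
typedef size_t word_t;

#define ONES  ((word_t)-1 / UCHAR_MAX)      /* 0x0101...01 */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))  /* 0x8080...80 */

/* Nonzero if and only if at least one byte of w is zero. */
int has_zero_byte(word_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Hypothetical helper: length of s, examining at most cap bytes. */
size_t strnlen_words(const char *s, size_t cap)
{
    size_t i = 0;

    /* Scan one whole word per iteration while a full word still fits. */
    while (cap - i >= sizeof(word_t)) {
        word_t w;
        memcpy(&w, s + i, sizeof w);  /* alignment-safe load */
        if (has_zero_byte(w))
            break;
        i += sizeof(word_t);
    }

    /* Finish byte by byte to locate the exact terminator (or hit cap). */
    while (i < cap && s[i] != '\0')
        i++;
    return i;
}
```

The bounded form (a `cap` argument plus `memcpy` loads) is a design choice made here so that the sketch never reads past the buffer; a production strlen would handle alignment and over-read differently.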

RV64 can be skipped entirely & RV128 should *not* be a 100% extrapolation from RV64:

* My vision for RV128 sees it as a "feature add-on" to legacy 64b systems (and there is no benefit in an RV64 intermediate transition). However, "pure" or greenfield implementations of my RV128 are also possible - the important point is not to prematurely foreclose on either choice.

* See attached PDF file for further details (register model, instruction set & encodings) of my Alternative RV128 ISA

* In it, I also outline ISA strategies for:

1. Greater energy efficiency (128b registers are more energy hungry than 64b registers).

2. Design choices for ABI interop between (legacy) 64b code and (new) 128b code. I believe the link register needs to be dropped from the RV128 ABI so that legacy 64b code can call new 64/128b code, and vice versa. (But I fully retain the fused branch-compare instructions, 32 registers/5b register fields, non destructive destination reg).

3. I also outline scenarios of a "pure" 128b only ISA, vs a "mixed" ISA where my RV128 ISA is embedded in a host ISA using a REX-like prefix (or using almost deprecated x86 opcodes like x87 FPU instructions).

I believe we need to start thinking about 128b systems from today, as the barrier is not silicon/transistor budget cost, nor is it waiting for 128b memory addressing. All it will take is that we change our own historical mindset (that 128b is far off into the future), and think of practical/incremental ways to build & use 128b datapaths right now.

RV128 Alternative Proposal.pdf

MitchAlsup

Feb 5, 2023, 4:58:05 PM

to RISC-V ISA Dev, Xan Phung

One can easily agree that we are on the cusp of some sort of migration to 128-bit systems.

But what you outline does not address some fundamental issues::

1) What does code look like when you call a function over a 64-bit boundary ??

2) What does array access look like when you access an array over a 64-bit boundary ??

3) and indirectly:: What does GOT access look like when GOT is bigger than 2^24 bytes ??

RISC-V is already saddled with unpalatable 64-bit versions of these:

1.v) You would access memory for a 64-bit absolute address of function, and then JMPI

2.v) You would access memory for a 64-bit absolute address of array and then index it.

Since the 12-bit ±offset becomes relatively smaller as addresses get larger, there is additional overhead in accessing these indirect tables (not illustrated herein).

How will RISC-V compare against other ISAs that provide direct access to 64-bit <Virtual> Address Spaces without the level of indirection RISC-V currently needs? That is, those systems that do not use instructions (or consume registers) to obtain large constants.

On Saturday, February 4, 2023 at 11:11:37 PM UTC-6 Xan Phung wrote:

> This proposal is a business case & feasibility review for an Alternative RV128 design, and outlines a couple of design choices for which I would welcome input from Forum members.

> Examples of here & now demand for 128b compute:

I see these as nickel-and-dime uses of 128-bit items::

> * Fixed length 128b word size strings are roomy enough (16 chars) to be viable replacements for many byte array string use cases.

> * Int128 data type language support is increasing (eg. Rust & Zig) & also packed bitfield structs up to 128b of total size in C.

> * LISP (& related languages) using 128 bit cons cells (ie: a pair of 64 bit car/cdr's)

> * Faster portable null terminated string processing (eg. musl strlen), where portable = same C code can be compiled for 32b to 128b architectures

The above represent maybe 1% of instructions being processed, and you are going to target a new architecture to make these better/more efficient?

> * Software can be easily modified to use a 128b datapath for the above here & now, it is just waiting for hardware to catch up!!

The world is replete with SW that cannot deal with integers not being 32-bits, too. Some SW breaks if integers are not at least this big, other SW breaks when integers are bigger. And we are 20 years after x86-64...

> * In all the above, 128b can be introduced piecemeal, most pointers are 64b, and there won't be a "big bang" replacement of entire software stacks.

Yes, this is exactly how the 16-bit architectures attempted to migrate to 32-bit architectures, and how most 32-bit architectures intended to migrate to 64-bit architectures. None of the 16-bitters survived the transition. Almost none of the 32-bitters did either... not exactly a track record to pursue.

> * Even for full 128b user mode address sizes, the kernel may remain 64b (for device driver compat), eg. first 64b Mac OS X had 32b & 64b user mode with a 32b kernel

> * Moreover, the silicon wafer cost of a 128 bit datapath is trivial compared to the enormous cost of changing entire installed systems & software.

Here we agree: the silicon design cost is "trivial" to "not very hard". The only units getting out of hand would be the multiplier(s) and the alignment multiplexers associated with LDs and STs. Almost everything else is just "poof; all this logic gets doubled".

> RV64 can be skipped entirely & RV128 should *not* be a 100% extrapolation from RV64:

> * See attached PDF file for further details (register model, instruction set & encodings) of my Alternative RV128 ISA

> * In it, I also outline ISA strategies for:

> 1. Greater energy efficiency (128b registers are more energy hungry than 64b registers).

So, we find ourselves at a point in time where we still have 20-odd years before we run out of the 64-bit address space, but are entering the realm where more and more SW wants efficient access to "a few" multi-precision calculations. For the most part, whether:: uint128_t i = j + k takes 1 cycle or 5 is not visible. This SW realm has been growing since the transition from 32-to-64-bits (and even earlier). And every time we make the machines 2× as wide, we set back the need for efficiency because a large portion of multi-precision need was for the size 2× bigger than what is natural for the architecture.
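To make the "1 cycle or 5" comparison concrete, here is a minimal sketch of what `i = j + k` on 128-bit values costs when only 64-bit operations are available. The type `u128_limbs` and the function `add_u128` are hypothetical names used for illustration. On an ISA without a carry flag, such as RV64, a compiler would typically lower this to something like add / sltu / add / add.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value held as two 64-bit limbs. */
typedef struct {
    uint64_t lo;  /* bits 0..63   */
    uint64_t hi;  /* bits 64..127 */
} u128_limbs;

/* i = j + k modulo 2^128, using only 64-bit operations. */
u128_limbs add_u128(u128_limbs j, u128_limbs k)
{
    u128_limbs i;

    i.lo = j.lo + k.lo;               /* low limb; wraps on overflow      */
    uint64_t carry = (i.lo < j.lo);   /* wrapped sum is smaller => carry  */
    i.hi = j.hi + k.hi + carry;       /* high limb plus carry-in          */
    return i;
}
```

A native 128-bit datapath would collapse these steps into a single add, which is the difference the paragraph above is weighing.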

When do you think it will be practicable to place 2^65 bytes of DRAM DIMM addressable in a single coherent system ??

John Leidel

Feb 5, 2023, 5:57:25 PM

to MitchAlsup, RISC-V ISA Dev, Xan Phung

Xan, as a potential middle ground, I would suggest reviewing some of
the work we've done with xBGAS. It provides extended addressing
semantics without trampling on the base addressing schema (as opposed
to flat 128bit addressing). It does not, however, provide support for
128bit arithmetic (but could likely be modified to do so). We've
published several papers on the hardware and software efforts,
but the seminal work was published at IPDPS:
https://ieeexplore.ieee.org/abstract/document/9460481

best
john

Xan Phung

Feb 5, 2023, 11:55:23 PM

to RISC-V ISA Dev, MitchAlsup, Xan Phung

Hi Mitch, thanks for your comments. First I will address your "big picture" questions about the cost/benefit of a 128b systems upgrade.

(I address your other specific questions, about synthesizing constants, in my embedded responses within the quoted copy of your email below.)

1. 128b data, not 128b addresses (yet):

In relation to your last question about 2^65 sized DRAM systems, my guess is that's likely to be 20+ years away (if not longer - there seems to have been a slowdown in memory density increase from 2003 onwards). The historical numbers are as follows:

8080 in 1974 = 16b address size

8086 in 1978 = 20b address size = +1b per year

80286 in 1982 = 24b address size = +1b per year

80386 in 1985 = 32b address size = +2b per year

AMD64 Opteron in 2003 = 48b virtual address space = +1b per year

current systems = approx 57b virtual address space = +0.5b per year
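The per-year figures above are rounded. A minimal sketch that recomputes them from the listed data points gives roughly 1.0, 1.0, 2.7, 0.9 and 0.45 bits per year. The year 2023 for "current systems" is an assumption (the date of this thread), and the names `addr_point`, `ADDR_HISTORY`, `bits_per_year` and `years_to_reach` are hypothetical.

```c
#include <stddef.h>

/* One data point: CPU generation, year introduced, address bits. */
typedef struct {
    const char *name;
    int year;
    int bits;
} addr_point;

/* Figures as quoted in the post. Assumption: "current systems" is dated 2023. */
const addr_point ADDR_HISTORY[] = {
    { "8080",          1974, 16 },
    { "8086",          1978, 20 },
    { "80286",         1982, 24 },
    { "80386",         1985, 32 },
    { "AMD64 Opteron", 2003, 48 },
    { "current",       2023, 57 },
};

enum { ADDR_HISTORY_LEN = sizeof ADDR_HISTORY / sizeof ADDR_HISTORY[0] };

/* Average address-bit growth per year between entry n-1 and entry n
   (valid for n in 1 .. ADDR_HISTORY_LEN-1). */
double bits_per_year(size_t n)
{
    const addr_point *prev = &ADDR_HISTORY[n - 1];
    const addr_point *cur  = &ADDR_HISTORY[n];
    return (double)(cur->bits - prev->bits) / (double)(cur->year - prev->year);
}

/* Years needed to grow from the last data point to target_bits
   at a constant rate (bits per year). */
double years_to_reach(int target_bits, double rate)
{
    const addr_point *last = &ADDR_HISTORY[ADDR_HISTORY_LEN - 1];
    return (double)(target_bits - last->bits) / rate;
}
```

Under these assumptions, `years_to_reach(65, 0.5)` is 16, so a simple linear extrapolation of the series puts a 65-bit address space in the same range as the estimate given above.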

For this reason, I don't see the need for 128b memory addresses for at least a decade. I agree my examples (128b data, but retaining 64b pointers in an X64ABI) are not as technically revolutionary as going to 128b addresses, but I think 128b data is the use case the "market" will adopt fastest. I should emphasise that moving from X64ABI to the full blown 128ABI is a software migration issue (not hardware), and my aim is to first get 128b capable hardware installed using the X64ABI - then take time to migrate software to 128ABI.

I agree from a technology viewpoint, the 8086->80286->80386->amd64 transition was a dog's breakfast. But commercially they were all successful due to the incumbent power of the x86 installed base, and that's why I think RV128 should provide an upgrade path from x86_64 (and RV64 should be bypassed).

2. Yes, much software will be stuck on 4 byte int - this is the whole business case for Alternative RV128 (vs "Original" RV128):

I also 100% agree with your comments about "world is replete with SW that cannot deal with integers not being 32-bits" - that's why Alternative RV128 has 16 registers which remain only 64b in size, as we simply don't need *all* 32 registers to be 128b. My Alternative RV128 assumes the data size model as int = 32b for the foreseeable future, pointers = 64b for next 10 yrs then 128b thereafter, and long long or long long long = 128b immediately "here & now".

[I will provide a more detailed look at the Alternative RV128b 16x64b + 16x128b register file in another forum posting very soon].

3. Marketing power of 128b is important, not just technical merit

(Silicon cost of 128b is trivial/not hard, but marketing power of 128b upgrade path is potentially huge):

I disagree though with your estimate that only 1% of processing will use the 128b datapath. Even if it is only 1%, I wouldn't underestimate the marketing power of 128b - remember, I am looking at this from a business case & market adoption point of view, not purely technical merit.

Is marketing power an illegitimate consideration? I would say *no*. Users care about future proofing. Even if they only need 128b capability for 1% initially, they want the optionality/reassurance that their system won't go obsolete in 3-5 yrs if their need for 128b has increased. The silicon cost of this optionality/reassurance is trivial/not hard, so why not provide it?

But even on the issue of technical merit, the 128b techniques I outlined are under-used, as current software can't assume the presence of 128b hardware. So it's a chicken & egg dilemma (which is why I think establishing business cases is more important than technical merit).

But converting small C structures into bitfields packed into one (or more) 128b words alone would be on the order of 5%+ of workloads, given how common small structures are. What data do I have to show most structures are small & <128b? The best proxy data I can think of is cache miss rates with increasing cache block sizes (which analyses the effect of spatial locality, ie: multiple accesses within a given sized block) - as shown below, the spatial locality effect is most pronounced for memory access within 16 byte blocks and any bigger than this the effect levels off. I admit this data is an imperfect proxy (it would also include array & scalar accesses, not just structs) but anyone with better data is welcome to contribute! Nonetheless, in the structs example, the comparison isn't whether 128b ADD is 1 cycle or 5, the comparison is the speed of register based struct manipulation vs memory based struct manipulation.

[Linked figure: miss rate vs. cache block size, from the slides "Higher associativity means more complex hardware. But a highly-associative cache will also exhibit a lower miss rate ..." (ppt download)]
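A minimal sketch of register-based struct manipulation: a small record is packed into a single 128b value and its fields are read or updated with shifts and masks instead of memory accesses. The record layout, the `*_SHIFT`/`*_BITS` constants and the `field_get`/`field_set` helpers are hypothetical, and the code assumes the compiler provides `unsigned __int128` (a GCC/Clang extension on 64-bit targets, not standard C).

```c
#include <stdint.h>

/* Assumption: unsigned __int128 is available (GCC/Clang extension). */
typedef unsigned __int128 u128;

/* Hypothetical 128-bit record layout, lowest bits first:
   id:48 | timestamp:40 | length:32 | flags:8                      */
enum {
    ID_SHIFT    = 0,   ID_BITS    = 48,
    TS_SHIFT    = 48,  TS_BITS    = 40,
    LEN_SHIFT   = 88,  LEN_BITS   = 32,
    FLAGS_SHIFT = 120, FLAGS_BITS = 8
};

/* Extract a field of `bits` width (bits < 64) starting at `shift`. */
uint64_t field_get(u128 rec, unsigned shift, unsigned bits)
{
    u128 mask = (((u128)1) << bits) - 1;
    return (uint64_t)((rec >> shift) & mask);
}

/* Return a copy of rec with the field replaced by value
   (value is truncated to the field width). */
u128 field_set(u128 rec, unsigned shift, unsigned bits, uint64_t value)
{
    u128 mask = ((((u128)1) << bits) - 1) << shift;
    return (rec & ~mask) | (((u128)value << shift) & mask);
}
```

With a 128b register file the whole record could stay in one register across a sequence of such updates; on a 64b machine the compiler has to split each operation across a register pair.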

In relation to my fixed size strings (up to 16 chars) example replacing byte array strings, the comparison again isn't the speed of a 128b ADD, it is a single load/store/SLT/MOV/CTZ of 128b vs multiple load/stores/compare/strdup/strlen of individual bytes. The former is purely 128b register based manipulation of the string, whereas the latter requires memory access & string library API function calls. How many strings are 16 chars or less? Again, I don't have good data, but using English word length as a proxy, nearly the entire English dictionary can fit inside 16 chars!:

[Linked source: "Twitter and English letter frequency count", Saeed Abdullah]
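A minimal sketch of the fixed-size string idea: a string of up to 16 chars is packed into one 128b value with its first character in the most significant byte, so that equality is a single compare and an unsigned less-than gives the same ordering as a bytewise comparison. The names `str16_pack`, `str16_cmp` and `str16_len` are hypothetical, strings are assumed to contain no embedded NUL bytes, and the code assumes the compiler provides `unsigned __int128` (a GCC/Clang extension).

```c
/* Assumption: unsigned __int128 is available (GCC/Clang extension). */
typedef unsigned __int128 u128;

/* Pack s (at most 16 chars) into *out, first char in the top byte,
   zero padded. Returns 0 on success, -1 if s is longer than 16 chars. */
int str16_pack(const char *s, u128 *out)
{
    u128 v = 0;
    unsigned n;

    for (n = 0; s[n] != '\0'; n++) {
        if (n == 16)
            return -1;  /* does not fit in one 128b word */
        v |= (u128)(unsigned char)s[n] << (8 * (15 - n));
    }
    *out = v;
    return 0;
}

/* Three-way compare: negative, zero or positive, ordered like a
   bytewise comparison of the original strings. */
int str16_cmp(u128 a, u128 b)
{
    return (a > b) - (a < b);
}

/* Length = 16 minus the number of trailing zero (padding) bytes. */
unsigned str16_len(u128 v)
{
    unsigned n = 16;

    while (n > 0 && (v & 0xFF) == 0) {
        v >>= 8;
        n--;
    }
    return n;
}
```

The length loop stands in for the CTZ mentioned above: on hardware with a 128b count-trailing-zeros instruction it would reduce to one instruction plus a divide by 8.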

On Monday, 6 February 2023 at 8:58:05 am UTC+11 MitchAlsup wrote: