In relation to your last question about 2^65 sized DRAM systems, my guess is that they are likely to be 20+ years away (if not longer: memory density growth seems to have slowed from 2003 onwards). The historical address size numbers are as follows:
8080 in 1974 = 16b address size
8086 in 1978 = 20b address size = +1b per year
80286 in 1982 = 24b address size = +1b per year
80386 in 1985 = 32b address size = approx +2.7b per year
AMD64 Opteron in 2003 = 48b virtual address space = approx +1b per year
current systems = approx 57b virtual address space = approx +0.5b per year
For this reason, I don't see the need for 128b memory addresses for at least a decade. I agree my examples (128b data, but retaining 64b pointers in an X64ABI) are not as technically revolutionary as going to 128b addresses, but I think 128b data is the use case the "market" will adopt fastest. I should emphasise that moving from X64ABI to the full blown 128ABI is a software migration issue (not a hardware one). My aim is first to get 128b capable hardware installed using the X64 ABI, then to take time to migrate software to 128ABI.
I agree that, from a technology viewpoint, the 8086->80286->80386->amd64 transition was a dog's breakfast. But commercially each step was successful thanks to the incumbent power of the x86 installed base, and that's why I think RV128 should provide an upgrade path from x86_64 (and RV64 should be bypassed).
2. Yes, much software will be stuck on 4 byte int - this is the whole business case for Alternative RV128 (vs "Original" RV128):
I also 100% agree with your comment that the "world is replete with SW that cannot deal with integers not being 32-bits". That's why Alternative RV128 has 16 registers which remain only 64b in size, as we simply don't need *all* 32 registers to be 128b. My Alternative RV128 assumes the following data size model: int = 32b for the foreseeable future, pointers = 64b for the next 10 yrs then 128b thereafter, and long long or long long long = 128b immediately "here & now".
[I will provide a more detailed look at the Alternative RV128b 16x64b + 16x128b register file in another forum posting very soon].
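To make the data size model above concrete, here is a minimal C sketch. It assumes GCC or Clang on a 64-bit target, where `__int128` is available as a compiler extension (standard C has no `long long long`). The names `xl_uint` and `mul_wide` are my own placeholders for illustration, not part of any ABI.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

/* Hypothetical name for the 128b integer type. __int128 is a GCC/Clang
   extension on 64-bit targets, not standard C. */
typedef unsigned __int128 xl_uint;

/* Assumed "X64"-style data size model: int = 32b, pointers = 64b,
   widest integer = 128b. */
static_assert(sizeof(int) * CHAR_BIT == 32, "int stays 32b");
static_assert(sizeof(void *) * CHAR_BIT == 64, "pointers stay 64b for now");
static_assert(sizeof(xl_uint) * CHAR_BIT == 128, "wide integers are 128b");

/* 64b x 64b -> 128b multiply: 128b data, but no change to pointer size. */
static xl_uint mul_wide(uint64_t a, uint64_t b) {
    return (xl_uint)a * b;
}
```

On current 64b hardware the compiler spreads `xl_uint` over a register pair; with 128b registers the same source would map to single registers without any change to pointer handling.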
3. Marketing power of 128b is important, not just technical merit
(Silicon cost of 128b is trivial/not hard, but marketing power of 128b upgrade path is potentially huge):
I disagree, though, with your estimate that only 1% of processing will use the 128b datapath. Even if that estimate is right, I wouldn't underestimate the marketing power of 128b. Remember, I am looking at this from a business case & market adoption point of view, not purely technical merit.
Is marketing power an illegitimate consideration? I would say *no*. Users care about future proofing. Even if they only need 128b capability for 1% initially, they want the optionality/reassurance that their system won't go obsolete in 3-5 yrs if their need for 128b has increased. The silicon cost of this optionality/reassurance is trivial/not hard, so why not provide it?
But even on the issue of technical merit, the 128b techniques I outlined are under-used, as current software can't assume the presence of 128b hardware. So it's a chicken & egg dilemma (which is why I think establishing business cases is more important than technical merit).
But converting small C structures into bitfields packed into one (or more) 128b words would alone be on the order of 5%+ of workloads, given how common small structures are. What data do I have to show most structures are small & <128b? The best proxy data I can think of is cache miss rates with increasing cache block sizes (which analyses the effect of spatial locality, ie: multiple accesses within a given sized block). As shown below, the spatial locality effect is most pronounced for memory accesses within 16 byte blocks, and beyond that size the effect levels off. I admit this data is an imperfect proxy (it would also include array & scalar accesses, not just structs) but anyone with better data is welcome to contribute! Nonetheless, in the structs example, the comparison isn't whether a 128b ADD takes 1 cycle or 5; the comparison is the speed of register based struct manipulation vs memory based struct manipulation.
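As a rough illustration of the struct packing idea (separate from the cache data), here is a minimal C sketch, assuming GCC or Clang for `unsigned __int128`. The `event` record, its field widths, the bit offsets, and the helper names are all hypothetical choices of mine. The point is only that once a small struct sits in one 128b word, reading or updating a field is shift/mask work on a register rather than a memory access.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Hypothetical small record: 32 + 16 + 16 + 64 = 128 bits. */
struct event {
    uint32_t id;
    uint16_t kind;
    uint16_t flags;
    uint64_t timestamp;
};

/* Bit offsets of each field inside the 128b word (arbitrary layout). */
enum { ID_SHIFT = 0, KIND_SHIFT = 32, FLAGS_SHIFT = 48, TS_SHIFT = 64 };

/* Pack the whole struct into a single 128b value. */
static u128 event_pack(struct event e) {
    return ((u128)e.id << ID_SHIFT) | ((u128)e.kind << KIND_SHIFT) |
           ((u128)e.flags << FLAGS_SHIFT) | ((u128)e.timestamp << TS_SHIFT);
}

/* Field reads are a shift plus a truncating cast. */
static uint32_t event_id(u128 w)        { return (uint32_t)(w >> ID_SHIFT); }
static uint16_t event_flags(u128 w)     { return (uint16_t)(w >> FLAGS_SHIFT); }
static uint64_t event_timestamp(u128 w) { return (uint64_t)(w >> TS_SHIFT); }

/* Field update: clear the old bits with a mask, then OR in the new value. */
static u128 event_set_flags(u128 w, uint16_t flags) {
    u128 mask = (u128)0xFFFF << FLAGS_SHIFT;
    return (w & ~mask) | ((u128)flags << FLAGS_SHIFT);
}
```

On today's 64b machines the compiler will split `u128` across two registers, so this is only a model of what a 128b register file would do in one.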
In relation to my fixed size strings (up to 16 chars) example replacing byte array strings, the comparison again isn't the speed of a 128b ADD; it is a single load/store/SLT/MOV/CTZ of 128b vs multiple load/stores/compare/strdup/strlen of individual bytes. The former is purely 128b register based manipulation of the string, whereas the latter requires memory access & string library API function calls. How many strings are 16 chars or less? Again, I don't have good data, but using English word length as a proxy, nearly the entire English dictionary can fit inside 16 chars!
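To show what I mean by register based string handling, here is a minimal C sketch, assuming GCC or Clang (`unsigned __int128` and `__builtin_ctzll` are compiler extensions). The byte order (first char in the most significant byte, zero padding at the low end) and the `str16_*` names are my own assumptions for illustration. With that layout, ordering is one unsigned compare and length comes from a count of trailing zeros.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Pack up to 16 chars into one 128b word: first char in the most
   significant byte, unused low bytes left as zero padding. */
static u128 str16_pack(const char *s) {
    size_t len = strlen(s);
    assert(len <= 16);
    u128 w = 0;
    for (size_t i = 0; i < len; i++)
        w |= (u128)(unsigned char)s[i] << (8 * (15 - i));
    return w;
}

/* Equality and ordering are single 128b compares (unsigned, so the
   result matches byte-wise strcmp ordering for these padded strings). */
static int str16_eq(u128 a, u128 b)   { return a == b; }
static int str16_less(u128 a, u128 b) { return a < b; }

/* Length via count-trailing-zeros: each zero byte at the low end is an
   unused char slot. Done on 64b halves because the builtin is 64b. */
static unsigned str16_len(u128 w) {
    uint64_t hi = (uint64_t)(w >> 64), lo = (uint64_t)w;
    if (lo) return 16 - (unsigned)__builtin_ctzll(lo) / 8;
    if (hi) return 8 - (unsigned)__builtin_ctzll(hi) / 8;
    return 0;
}
```

The packing loop here is only needed because this model starts from a byte array; in the scheme I am describing the string would already live in memory as a single 128b word.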
If I understand your list below correctly, your concerns are with the RISC-V approach to synthesizing arbitrary 64b+ constants, and with calling/accessing 64b+ absolute addresses/arrays/functions.
My Alternative RV128 proposal doesn't (substantially) change the RISC-V approaches to these problems, but can help (slightly) in the following ways:
(a) You mention limitations of 12b offsets. My Alternative RV128 (for data >=32b word size) would use 4 byte multiples, so in effect 12b offsets have a range of 2^14 bytes.
(b) For synthesising constants, by only using 6% of opcode space in the base RV128, I leave available the opcodes for the equivalent of x86_64's MOVABSQ instruction, ie: something like the following:
49 bc ca cc cc cc cc cc cc 0c movabsq $0xcccccccccccccca,%r12
The above is a 10 byte long instruction (which is why I hesitate to endorse it, as so far everything in Alternative RV128 is a fixed 4 byte instruction length). It could be an extension of Alternative RV128. Of course, the RISC-V approach would instead be to encourage instruction fusion of a sequence of LUI/ADDI to synthesize the 64b+ constant. I have also included the PACK instruction (in the base Alternative RV128 spec, not as a Bitmanip add-on) to help further with this approach.
(c) I should also emphasise that a key initial use of Alternative RV128 is with the "X64" ABI, which uses 128b for data but 64b pointers for memory, so at least Alternative RV128 won't make the above problems worse than they currently are in RV64.
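To make points (a) and (b) more concrete, here is a minimal C model of the arithmetic involved. It is a sketch under assumptions: the 4 byte offset scaling is my proposal (standard RISC-V loads/stores use unscaled byte offsets), the PACK model assumes the same semantics as the Bitmanip `pack` instruction on a 64b register, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* (a) Signed 12b immediate scaled by 4 bytes (assumed scaling).
   Range becomes -8192 .. +8188 bytes, i.e. a 2^14 byte window. */
static int32_t scaled_offset(int32_t imm12) {
    assert(imm12 >= -2048 && imm12 <= 2047);
    return imm12 * 4;
}

/* (b) Split a 32b constant into a LUI part (upper 20b) and an ADDI part
   (signed low 12b). ADDI sign-extends its immediate, so the upper part
   is rounded up by 0x800 to compensate when the low part is negative. */
static void split_lui_addi(uint32_t value, uint32_t *hi20, int32_t *lo12) {
    *hi20 = (value + 0x800u) >> 12;
    *lo12 = (int32_t)(value & 0xFFFu);
    if (*lo12 >= 0x800) *lo12 -= 0x1000;
}

/* Model of LUI followed by ADDI, keeping only the low 32b of the result. */
static uint32_t lui_addi(uint32_t hi20, int32_t lo12) {
    return (hi20 << 12) + (uint32_t)lo12;
}

/* Model of PACK on a 64b register: low 32b of rs1 goes to the low half,
   low 32b of rs2 goes to the high half. */
static uint64_t pack64(uint64_t rs1, uint64_t rs2) {
    return (rs1 & 0xFFFFFFFFu) | (rs2 << 32);
}
```

For example, the constant from the MOVABSQ line above can be modelled as two LUI/ADDI pairs (one per 32b half) followed by `pack64(0xCCCCCCCA, 0x0CCCCCCC)`. Because PACK only reads the low half of each source, any sign extension left in the upper bits by the LUI/ADDI pair does no harm.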
I hope that answers your questions but if I have misunderstood you, apologies & could you please explain/discuss further?
Best regards
Xan