48 bit Architecture

jfcg...@gmail.com

May 26, 2020, 5:05:02 PM
to Unum Computing
Hi,

What would you think of an imaginary 48-bit architecture with native support for:

- 48 bit pointers
- 48 bit integers
- 16 / 32 / 48 bit posits

that would target the embedded, mobile, desktop & server markets, all with a single architecture, designed with concurrency and efficiency in mind as first-class citizens?
Does that make sense?

Theodore Omtzigt

May 27, 2020, 8:17:35 AM
to Unum Computing
Makes sense, but is unlikely to happen unless a big player like Samsung, Apple, Intel, or Huawei makes it so. The technology lift to move hardware and software in that direction will be measured in the tens of billions.

Interestingly, RISC-V tries to cover the same spectrum, but its long double is 128 bits. This makes it currently noncompetitive with ARM, which uses a 64-bit long double alias. The size and complexity of the development environment of a 'proper' RISC-V spec is at least twice that of the ARM ecosystem, which for embedded is a key differentiator. So, right now, the RISC-V compiler stack just disables the 128-bit mode, and thus it is NOT a single architecture that covers everything from embedded to cloud server.

In general, given that power and size matter for embedded, whereas performance and capability matter for servers, it will be difficult to create a single environment that covers all. Otherwise stated, the OS and compiler stack will be optimized differently for the embedded side versus the server side, creating a bifurcation in the ABI.

John Gustafson

May 27, 2020, 9:58:54 AM
to jfcg...@gmail.com, Unum Computing
48-bit posits with es = 2 (the proposed choice for all posit sizes in the Posit Standard 4.0 Draft) would have almost 14 decimals of accuracy in the "sweet spot" (magnitudes between 1/16 and 16) and a dynamic range of about 4e-56 to 2.5e+55. Sounds pretty good for HPC codes currently using 64-bit floats.
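
A quick back-of-envelope check of those figures, using the standard posit formulas (Python; the sweet-spot bit accounting is my own working):

```python
from math import log10

# posit<48,2>: useed = 2^(2^es), maxpos = useed^(nbits - 2), minpos = 1/maxpos
nbits, es = 48, 2
useed = 2 ** (2 ** es)                 # 16
maxpos = useed ** (nbits - 2)          # 16^46 ≈ 2.45e+55
minpos = 1 / maxpos                    # ≈ 4.1e-56

# In the sweet spot the regime takes its minimum 2 bits, so the significand
# has 1 hidden bit + (nbits - 1 - 2 - es) fraction bits.
sig_bits = 1 + (nbits - 1 - 2 - es)    # 44 bits
decimals = sig_bits * log10(2)         # ≈ 13.2 decimal digits
```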

The big downside of using a word size that is not a power of 2 is that division becomes very ugly. If you know a binary size and want to know how many words that is, you need to divide by 48. There is probably no way, ever, to make this as cheap as a shift operation. In the early IBM and UNIVAC days, people learned about this cost with their 36-bit architectures, leading to IBM's wrenching decision to go to 32-bit architectures with their System/360 mainframe family in the 1960s.

Whenever dabbling in architectures where something is not an integer power of 2, make sure you will seldom need to divide! 
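
A concrete illustration of that cost: compilers turn a constant division by 48 into a reciprocal multiply plus shifts (the classic magic-number trick), but that still burns a full-width multiplier, while a power-of-2 word size needs only the shift. A minimal sketch (function names are illustrative; the magic constant is the standard reciprocal for dividing by 3, with an extra shift for the factor of 16):

```python
MAGIC = 0xAAAAAAAB  # ceil(2^33 / 3), the classic unsigned-divide-by-3 constant

def words48(bit_offset):
    # bits // 48, as a compiler would emit it: 48 = 3 * 16, so a reciprocal
    # multiply for the /3 and 4 more shift positions for the /16.
    # Valid for bit_offset < 2^32 -- still a full 32x32 multiply.
    return (bit_offset * MAGIC) >> 37

def words64(bit_offset):
    # Power-of-2 word size: a single shift, no multiplier needed.
    return bit_offset >> 6

# The multiply-and-shift agrees with true division over the 32-bit range.
assert all(words48(n) == n // 48 for n in range(100_000))
```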

John

--
You received this message because you are subscribed to the Google Groups "Unum Computing" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unum-computin...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unum-computing/bee81482-5049-4029-a2d7-1c2d42bca86d%40googlegroups.com.

James Brakefield

Jun 4, 2020, 7:38:03 PM
to Unum Computing
The subject of 24/48-bit word sizes has been discussed on comp.arch.  DRAM DIMMs are all of power-of-2 word size and not likely to change.  The problem of 24- or 48-bit data not being aligned on power-of-2 boundaries is not a deal breaker, and DRAM DIMMs have byte write enables.

There are memory savings to be had by using 24-bit RISC instructions, 24- & 48-bit floats, and a 12-bit Unicode character set.  A 12-bit posit would likely also have memory savings.  My preference is for 8-, 12-, 16-, 24-, 32-, 48- and 64-bit data to be supported at the instruction-set level and in software libraries.  For addressing 24- or 48-bit data using an index register, one needs a multiply by three (same as 1X + 2X, i.e., a shift and an add).
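
That multiply-by-three addressing can be sketched as a shift-and-add, which many ISAs expose directly as a scaled-index or shift-and-add instruction (byte-addressed memory assumed; function names are illustrative):

```python
def addr_24bit(base, i):
    # 24-bit (3-byte) elements: byte offset is 3*i = (i << 1) + i,
    # one shift plus one add instead of a general multiply.
    return base + ((i << 1) + i)

def addr_48bit(base, i):
    # 48-bit (6-byte) elements: 6*i = ((i << 1) + i) << 1.
    return base + (((i << 1) + i) << 1)
```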

For FPGA experimentation there are RISC designs: the incomplete https://opencores.org/projects/rois and https://revaldinho.github.io/opc/opc8spec.html
There is also a non-public Forth design by C. H. Ting. Several DSP chips used a 24-bit data size.

Jim Brakefield



Robert Finch

Jun 4, 2020, 11:00:57 PM
to Unum Computing

I’ve a 52-bit architecture based around 13-bit bytes (with variable-length instructions). I’ve been toying with the idea of retrofitting it to use 52-bit posits instead of regular floating point. There are about 8 bits available for an immediate posit value. The 52-bit core is about 30% smaller than a 64-bit core would be, while offering much the same functionality.

Joe Duarte

Jun 9, 2020, 7:18:06 PM
to Unum Computing
I like 40-bit better. That gives you 1 TiB of memory addressing. 40 bits would be ideal for mobile, and maybe desktop, other than the power-of-two problem. (Servers need more memory long-term; some of them are already hitting 1 TiB.) I think 40-bit integers, floats, and posits would be ideal for lots of everyday computing (and for encoding image formats and graphics). In this scenario I'd also have 20-bit number types instead of 16-bit. I'm also intrigued by logarithmic number systems.
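
For scale, the address-space arithmetic behind these pointer widths:

```python
# Bytes addressable by an n-bit pointer, expressed in TiB (1 TiB = 2^40 bytes).
TiB = 2 ** 40
addressable_tib = {bits: (2 ** bits) // TiB for bits in (40, 48, 64)}
# 40-bit -> 1 TiB, 48-bit -> 256 TiB, 64-bit -> 16777216 TiB (16 EiB)
```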

It's not optimal to have one architecture for embedded, mobile, desktop, and server. Embedded doesn't need 48 bits, or even 40. Or even 32 in many cases.

Note that the "64-bit" address space on Intel/AMD is actually 48-bit or thereabouts on actual systems.

JD

MitchAlsup

May 23, 2021, 6:45:36 PM
to Unum Computing
On Tuesday, June 9, 2020 at 6:18:06 PM UTC-5 Joe Duarte wrote:
I like 40-bit better. That gives you 1 TiB of memory addressing. 40 bits would be ideal for mobile, and maybe desktop, other than the power-of-two problem. (Servers need more memory long-term; some of them are already hitting 1 TiB.)

48 bits is already too small for servers, and for some of the larger desktops, and it is definitely too small for clusters with semi-shared main memory (remote DMA paging).

I think 40-bit integers, floats, and posits would be ideal for lots of everyday computing (and for encoding image formats and graphics). In this scenario I'd also have 20-bit number types instead of 16-bit. I'm also intrigued by logarithmic number systems.

It's not optimal to have one architecture for embedded, mobile, desktop, and server. Embedded doesn't need 48 bits, or even 40. Or even 32 in many cases.

Note that the "64-bit" address space on Intel/AMD is actually 48-bit or thereabouts on actual systems.

Yes, but anyone designing a machine today intended to target a similar market as x86 should not stop anywhere short of 64-bit virtual and 64-bit physical address spaces, along with native (i.e., clean) support for hypervisors each running several operating systems. It would also be wise to have a portion of the virtual address space configured such that one can access another (foreign) address space (map).

John Gustafson

May 23, 2021, 7:52:32 PM
to MitchAlsup, Unum Computing
There is a very simple reason why computer architectures favor storage for which the bit size is a power of 2: address calculation.

If you need to access X[i] where X is an indexed array of, say, 40-bit values, then you need to compute 40 × i plus the base address of X. Integer multiplication is slow and expensive in general. If the data size is an integer power of 2, the multiplication can be done with a left shift, one of the cheapest instructions there is.

While it occurs less often, there are times when an address must be converted back to an index. Dividing by 40 is even more painful than multiplying by 40, whereas with power-of-two sizes, it's simply a right shift.
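
Both directions of the argument in one sketch (byte-addressed memory and packed 5-byte storage for 40-bit values are my assumptions; names are illustrative):

```python
def element_addr(base, i):
    # Forward: 40-bit (5-byte) elements need offset 5*i = (i << 2) + i,
    # a shift plus an add; 64-bit elements would be just i << 3.
    return base + ((i << 2) + i)

def element_index(base, addr):
    # Backward: recovering i requires a genuine division by 5; with a
    # power-of-2 element size this would be (addr - base) >> 3.
    return (addr - base) // 5
```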

I've had to explain this to computer architects of multicore processors when they produce things like 48-core processors. Data parallelism requires distributing arrays across processors, and by providing a number of cores that is not a power of 2, they force the use of integer multiply and divide operations in parallel application code.
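
A sketch of that parallel-indexing cost, using a block distribution (the chunking scheme and names are illustrative):

```python
def owner_and_local(i, n, p):
    # Block distribution of an n-element array over p cores: mapping a
    # global index to (core, local index) needs a div and a mod when
    # the chunk size is not a power of 2.
    chunk = (n + p - 1) // p
    return i // chunk, i % chunk

def owner_and_local_pow2(i, chunk_shift):
    # With a power-of-2 chunk size, the same mapping is a shift and a mask.
    return i >> chunk_shift, i & ((1 << chunk_shift) - 1)
```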

That said, right-sizing the precision makes a lot of sense for FPGAs, custom AI chips, and other situations where saving bit width is more important than low-cost indexing into arrays.

John


Anthony Di Franco

May 23, 2021, 8:08:43 PM
to James Brakefield, Unum Computing
Not sure whether or not this is related to the architecture by Ting to which you refer, but there is a lineage of Forth processors and processor arrays that use non-power-of-two word sizes:


MitchAlsup

May 23, 2021, 9:24:13 PM
to Unum Computing
On Sunday, May 23, 2021 at 6:52:32 PM UTC-5 johngustafson wrote:
There is a very simple reason why computer architectures favor storage for which the bit size is a power of 2: address calculation.

If you need to access X[i] where X is an indexed array of, say, 40-bit values, then you need to compute 40 × i plus the base address of X. Integer multiplication is slow and expensive in general. If the data size is an integer power of 2, the multiplication can be done with a left shift, one of the cheapest instructions there is.

In the distant past, one could get away with non-powers of 2 {Burroughs, CDC, DEC PDP-6, PDP-8, PDP-9, PDP-10, Stretch, and a few others}. Invariably these were word-oriented machines instead of byte-oriented machines, in order to escape these multiplies (and occasional divides).

While it occurs less often, there are times when an address must be converted back to an index. Dividing by 40 is even more painful than multiplying by 40, whereas with power-of-two sizes, it's simply a right shift.

Word-oriented machines started to fall by the wayside beginning in 1964 with the introduction of the IBM System/360. They fell off the design curve entirely as vector machines imploded in the late 1990s and early 2000s.

I've had to explain this to computer architects of multicore processors when they produce things like 48-core processors. Data parallelism requires distributing arrays across processors, and by providing a number of cores that is not a power of 2, they force the use of integer multiply and divide operations in parallel application code.

{When one has the ability to put 8 cores on a die, the proper number of cores is 9: one redundant core to make yield livable. But I digress.}

Data parallelism requires placing DRAM DIMMs on as many DRAM channels as one can afford pins for. {Let's face it, we are not going to get to design DRAMs, nor their DIMMs; at best we can participate with an industry of DRAM suppliers and consumers.} The easiest way to afford more pins is to build DRAM controllers into all "CPU" chips with a narrow, high-clock-rate interface, leaving more pins available for the DRAM channels. These CPU chips are then interconnected into parallel multi-core systems with lots of DRAM at moderate latency cost. {I say this as the more integrated "CPU" arena is doing more and more High Bandwidth Memory, which doesn't use "pins" so much as inter-die connection points.}