Computer on a chip...

unixr...@gmail.com

May 15, 2013, 2:07:48 AM

Theoretically, I imagine it is possible to have a complete computer occupying a single die. Of course, an MCU is basically this, but take it to a bigger scale: say, a modern x86 processor with RAM and all the peripherals on the die, eliminating the other ICs on a motherboard. The advantages seem obvious to me (reduced latency, a more compact motherboard, etc.), but what are the problems with realizing this? Complexity? Reliability?

acd

May 15, 2013, 2:25:35 AM

Look at the Raspberry Pi.
Of course it is possible, but the size of connectors will determine the size of your system.
What good is a computer that you cannot connect?
You can go wireless of course, WLAN and Bluetooth.

Still, I would think that the single chip is not what matters,
because at the moment you would still need a ton of passives (inductors, capacitors) around it.

Andreas

Anton Ertl

May 15, 2013, 6:05:32 AM

unixr...@gmail.com writes:
>Theoretically, I imagine that it is possible to have a complete computer
>occupying a single die, of course an MCU is basically this, but taking it to
>a bigger scale, let's say take a modern X86 processor, and on the die give
>it RAM and all the peripherals needed to eliminate other IC's on a
>motherboard. It seems to me that advantages are obvious (reduced latency,
>more compact motherboard, etc.) but what are the problems with realizing
>this? Complexity? Reliability?

We have been on this path for a long time. Look at a motherboard from
1990, and compare it to one from 2000 and a modern one. You will see
that a lot of functions that were either absent or on expansion
cards first went onto the motherboard, and finally into the chipset.
Yes, eventually we will have all the logic in one chip; I am not sure
about the RAM.

Problems: Less flexibility, although I guess we will get full-featured
chips with various features turned off in the cheaper models, and you
can buy keys to turn the features back on instead of traditional
expansion capabilities.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Paul A. Clayton

May 15, 2013, 7:53:52 AM

On May 15, 2:25 am, acd <acd4use...@lycos.de> wrote:
> On Wednesday, May 15, 2013 08:07:48 UTC+2, unixr...@gmail.com wrote:
>
> > Theoretically, I imagine that it is possible to have a complete computer
> > occupying a single die, of course an MCU is basically this, but taking it to a
> > bigger scale, let's say take a modern X86 processor, and on the die give it RAM
> > and all the peripherals needed to eliminate other IC's on a motherboard. It
> > seems to me that advantages are obvious (reduced latency, more compact
> > motherboard, etc.) but what are the problems with realizing this? Complexity?
> > Reliability?
>
> Look at the Raspberry Pi.

Raspberry Pi has the memory on separate ICs within a
single _package_. (Yes, this is tighter integration with
the same kinds of advantages as on-chip integration, but
it is _technically_ not the same.)

Paul A. Clayton

May 15, 2013, 8:46:33 AM

[Caveat: The comments below may sound more informed than
they actually are. Although a moderate amount of knowledge
and some reasoning can derive somewhat accurate conclusions,
the complexities of technology and economics may sometimes
trip up even professionals, so consider the source--a
"hobbyist", not a professional or trained academic.]

In some cases integration can increase reliability (partially
from co-design and more unified responsibility, but more
finely controlled and automated manufacturing and reduced
transportation of sensitive components *might* also be
factors). (Also, on the one hand, thermal expansion and
contraction might be less problematic with shorter distances;
but, on the other hand, the tolerances might be even smaller
and cooling more problematic.)

One factor that discourages integration onto a single IC is
that different components would be optimally manufactured
with different manufacturing technologies.

E.g., current DRAMs use different process technology and
fewer metal layers than microprocessors. It would be
possible to manufacture DRAM in a logic-oriented process
(such is called embedded DRAM [eDRAM] and is used by IBM
in its POWER7), but such tends to be less dense and the
extra metal layers (that are not necessary for the memory
portion) add some cost to the memory and the different
process steps needed for DRAM increase the cost of the
processor (though IBM exploited its trench capacitors to
help in balancing power distribution). In the opposite
direction, implementing a processor in a DRAM-oriented
process has recently been announced for a specialized
product (I do not remember the company or product name off
hand), but such limits the number of metal layers for the
processor and the quality of the logic circuits. (For the
intended use, system performance is limited by memory
bandwidth and parallelism is abundant, so reasonably
numerous wimpy cores are not a problem.)

Some of the memory technologies that are under development
are more friendly to logic processes (and in some cases
less influenced by higher temperatures--DRAM generally
needs to be refreshed more frequently at higher
temperatures), so integration onto a single chip might be
more practical in the not-too-distant future.

Aside from different components potentially being more
optimally manufactured with different process types, it
can also be more economical to manufacture some components
in less advanced technology. In some cases, improving
energy efficiency and performance is less critical than
reducing cost or aspects of the chip require a fixed
minimum area (e.g., the size of solder balls has not been
decreasing as rapidly as the size of transistors and some
analog components have more-or-less fixed sizes), so
there can be a cost advantage to not integrating components.

It may also be desirable to separate some components which
communicate with the outside world to reduce interference.

Improvements in integration technology (and increased use)
will likely make tighter integration more common. The
increased fixed costs associated with advanced manufacturing
are also likely to increase the emphasis on mass production
and power density constraints may encourage dark silicon;
both of these factors may encourage integration of components
that are not highly utilized (whether by product [very coarse
grained time scale] or by finer-grained time scales
[assigned purpose--perhaps on the order of a year--to
current task--potentially on the order of microseconds and
smaller]).

Noob

May 15, 2013, 8:56:24 AM

Paul A. Clayton wrote:

> E.g., current DRAMs use different process technology and
> fewer metal layers than microprocessors. It would be
> possible to manufacture DRAM in a logic-oriented process
> (such is called embedded DRAM [eDRAM] and is used by IBM
> in its POWER7) [...]

I think it is worth pointing out that Intel chose to provide
128 MB of eDRAM on GT3e variants of Haswell.

http://www.anandtech.com/show/6911/intels-return-to-the-dram-business-haswell-gt3e-to-integrate-128mb-edram

Regards.

Paul A. Clayton

May 15, 2013, 12:53:48 PM

On May 15, 8:56 am, Noob <r...@127.0.0.1> wrote:
> Paul A. Clayton wrote:
> > E.g., current DRAMs use different process technology and
> > fewer metal layers than microprocessors. It would be
> > possible to manufacture DRAM in a logic-oriented process
> > (such is called embedded DRAM [eDRAM] and is used by IBM
> > in its POWER7) [...]
>
> I think it is worth pointing out that Intel chose to provide
> 128 MB of eDRAM on GT3e variants of Haswell.
>
> http://www.anandtech.com/show/6911/intels-return-to-the-dram-business...

I agree. I was tempted to mention it. On the other hand,
it is not clear how much _logic_ is included in these
Intel memory chips. (An earlier product--late 1990s?--
called 3D-RAM [at least partially from Mitsubishi] was
ordinary DRAM with simple ALUs to translate some simple
Read-Modify-Write traffic into write-with-modification.
With modern systems, such simple functionality might not
be particularly useful, but there might be some logic
that would still be useful to add to an almost entirely
storage chip. Applying simple textures/sprites through
very simple logic to keep communication within the chip
might be possible, saving some energy in some not uncommon
use cases [e.g., moving a cursor/dragging an icon??]; with
three chips perhaps separation by primary color could
avoid inter-chip communication?)

If Intel is making them to provide a custom interface
to fast memory (and fill some fab capacity), these chips
might be "just memory chips". (With multiple chips
[correct?], it might not be practical to support display
refresh directly from such memory [unless each chip would
be linked to a separate display with distinct frame
buffers]--which could reduce energy use--, not that such
would require that much extra logic.)

128MiB capacity does not seem likely to be very interesting
for processor-in-memory uses, especially if it is split
among three chips, so I am _guessing_ that there is not much
logic on these memory chips. However, such could be a
low-risk start that might bring Intel to a more interesting
use of eDRAM.

Quadibloc

May 16, 2013, 7:08:24 AM

On May 15, 12:07 am, unixrun...@gmail.com wrote:
> but what are the problems with realizing this?

It's certainly true that a modern x86 CPU can be put on a die leaving
some room over. That's why we have four-core chips.

Yield limits the maximum size of a die - a single defect tends to ruin
the whole chip.
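
A minimal sketch of that yield argument, assuming the simple Poisson defect model (yield = exp(-area x defect density)). The model choice and the defect density below are illustrative assumptions, not figures from any real process:

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

# Hypothetical defect density, picked only to make the trend visible.
DEFECTS_PER_CM2 = 0.5

for area_cm2 in (1.0, 2.0, 4.0, 8.0):
    y = poisson_yield(area_cm2, DEFECTS_PER_CM2)
    print(f"{area_cm2:4.1f} cm^2 die: {y:6.1%} defect-free")
```

Under this model the defect-free fraction falls exponentially with area, which is why doubling the die size costs more than double per good die.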

There is a difference in process technology between DRAM and
microprocessors; this has been overcome in some designs cited here,
and so IBM has some chips which use on-chip DRAM instead of
conventional cache in order to have a larger cache, but to combine the
two, compromises are necessary.

And while one could easily enough put several megabytes of memory on a
chip with a CPU, today RAM for most PCs is measured in gigabytes.
Yield is what prevents one from making enormous chips, and so a system
on a chip is limited to a small memory - so although it could run
faster, it would be a weaker system, not a more powerful one.

Cache memory already allows conventional chips to run almost as fast
as if all their large external memory ran as fast as the on-chip
cache, which is why doing away with external DRAM is a minus, not a
plus.
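
The cache claim can be sketched with the usual average-access-time formula (hit time plus miss rate times miss penalty). The latencies and miss rates below are assumed round numbers for illustration only:

```python
def average_access_ns(hit_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time: every access pays the hit time,
    and a fraction miss_rate additionally pays the trip to external DRAM."""
    return hit_ns + miss_rate * miss_penalty_ns

# Hypothetical latencies: 2 ns on-chip cache hit, 80 ns extra for external DRAM.
HIT_NS = 2.0
DRAM_PENALTY_NS = 80.0

for miss_rate in (0.01, 0.05, 0.20):
    t = average_access_ns(HIT_NS, miss_rate, DRAM_PENALTY_NS)
    print(f"miss rate {miss_rate:4.0%}: average access {t:5.1f} ns")
```

With a low miss rate the average stays close to the on-chip figure; the argument weakens for workloads whose miss rate is high.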

John Savard

Paul A. Clayton

May 16, 2013, 9:17:48 AM

On May 16, 7:08 am, Quadibloc <jsav...@ecn.ab.ca> wrote:
[snip]

> Yield limits the maximum size of a die - a single defect tends to ruin
> the whole chip.

Except that redundancy allows defects to be tolerated. E.g.,
a four-core chip with one defective core could still be sold
as a three-core chip.

Memory is perhaps the easiest component to make redundant.
One can provide redundancy at the column and row level *and*
at the array level. Having a large memory is not a major
problem in terms of yield.
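
As a rough sketch of how much core-level redundancy can help, assume each core fails independently with the same probability and that only cores can be defective (both are simplifying assumptions, and the per-core yield is a made-up number). The sellable fraction is then a binomial tail:

```python
import math

def sellable_fraction(n_cores: int, min_good: int, p_core_good: float) -> float:
    """Probability that at least min_good of n_cores are defect-free,
    assuming independent cores with identical per-core yield."""
    return sum(
        math.comb(n_cores, k) * p_core_good**k * (1.0 - p_core_good) ** (n_cores - k)
        for k in range(min_good, n_cores + 1)
    )

# Hypothetical per-core yield, for illustration only.
P_CORE_GOOD = 0.8

all_four = sellable_fraction(4, 4, P_CORE_GOOD)
three_or_more = sellable_fraction(4, 3, P_CORE_GOOD)
print(f"sellable only as four-core:         {all_four:.1%}")
print(f"sellable as three- or four-core:    {three_or_more:.1%}")
```

The same counting applies to spare rows, columns, or arrays in a memory, where the redundant units are much smaller than a core.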

> And while one could easily enough put several megabytes of memory on a
> chip with a CPU, today RAM for most PCs is measured in gigabytes.
> Yield is what prevents one from making enormous chips, and so a system
> on a chip is limited to a small memory - so although it could run
> faster, it would be a weaker system, not a more powerful one.

Distances could also be greater in a single plane of circuitry.
Some degree of true 3D integration could offer better performance than
a single plane with the same amount of circuitry. (Of course,
with at least some forms of 3D integration, it becomes more
difficult to clearly define what constitutes part of a chip and
what is part of a multi-chip module.)

> Cache memory already allows conventional chips to run almost as fast
> as if all their large external memory ran as fast as the on-chip
> cache, which is why doing away with external DRAM is a minus, not a
> plus.

Of course, caches do not work well on all workloads, and have
tag overhead (area, power, and latency). In addition, for a
cache one is more likely to trade density/capacity for speed.

Given the tiny size (128 MiB) of Intel's Haswell eDRAM memory
(and that might be using multiple chips), it seems likely to
be a while before main memory is integrated onto the processor
chip (even with good distributed computing software the cost
effectiveness of such would seem likely to be problematic for
most uses; two PiM chips might be workable, but eight or more
*seems likely* to require too complex/expensive inter-chip
interconnect).

Noob

May 16, 2013, 10:08:34 AM

Paul A. Clayton wrote:

> Given the tiny size (128 MiB) of Intel's Haswell eDRAM memory

Perhaps Intel hopes 128 MiB is enough on 1366x768 res, with some
whiz-bang (e.g. MSAA) disabled? (But then the i7-4770R doesn't make
much sense...)
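
A back-of-the-envelope sketch of the render-target footprint at that resolution. It assumes 4 bytes of color plus 4 bytes of depth/stencil per sample, no compression, and ignores textures and everything else; whether the eDRAM is actually used this way is speculation:

```python
MIB = 1024 * 1024

def render_target_bytes(width: int, height: int, samples: int = 1,
                        color_bytes: int = 4, depth_bytes: int = 4) -> int:
    """Uncompressed color + depth storage for one multisampled render target."""
    return width * height * samples * (color_bytes + depth_bytes)

for samples in (1, 4, 8):
    size = render_target_bytes(1366, 768, samples)
    print(f"1366x768, {samples}x samples: {size / MIB:6.1f} MiB of 128 MiB")
```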

NB : desktop CPUs with GT3e are soldered to the motherboard (BGA)

https://en.wikipedia.org/wiki/Haswell_%28microarchitecture%29

Regards.

Quadibloc

May 17, 2013, 7:16:17 AM

On May 16, 7:17 am, "Paul A. Clayton" <paaronclay...@gmail.com> wrote:

> Except that redundancy allows defects to be tolerated. E.g.,
> a four-core chip with one defective core could still be sold
> as a three-core chip.
>
> Memory is perhaps the easiest component to make redundant.

That is true enough. Even there, though, history records many failed
attempts at things like "wafer-scale technology".

It is true caches have limitations and overhead. I would be very
pleased to have a big chip which not only has all the memory on the
chip, but, say, a 4,096-bit-wide path between the memory and the CPU.
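
To put a number on such a path: peak bandwidth is just width times transfer rate. The sketch below compares it with one 64-bit DDR3-1600 channel; the 1 GT/s rate for the wide on-chip path is an assumed figure, not a claim about any real design:

```python
def peak_bandwidth_gb_s(width_bits: int, transfers_per_second: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) for a bus of the given width."""
    return width_bits / 8 * transfers_per_second / 1e9

# One 64-bit DDR3-1600 channel: 1.6 GT/s.
external = peak_bandwidth_gb_s(64, 1.6e9)
# Hypothetical 4,096-bit on-chip path at an assumed 1 GT/s.
on_chip = peak_bandwidth_gb_s(4096, 1.0e9)

print(f"64-bit external channel: {external:6.1f} GB/s")
print(f"4,096-bit on-chip path:  {on_chip:6.1f} GB/s")
```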

John Savard

exop...@gmail.com

May 17, 2013, 12:14:01 PM

On Wednesday, May 15, 2013 7:53:52 AM UTC-4, Paul A. Clayton wrote:
> Raspberry Pi has the memory on separate ICs within a
> single _package_. (Yes, this is tighter integration with
> the same kinds of advantages as on-chip integration, but
> it is _technically_ not the same.)

It's package on package. Two separate packages on top of each other. Not really
a single package, although I guess you could look at it that way if you purchase
the SoC and RAM already soldered together.

The memory package itself does have multiple stacked dies to achieve the
necessary density and interface width. There have even been combo chips that
stack DRAM and NAND flash.

Chris M. Thomasson

May 17, 2013, 4:24:47 PM

> wrote in message
> news:8f156863-3e27-416c...@googlegroups.com...

I have always wanted the chip makers to create their own memory and
try to directly integrate it with several multi-core processors.

Can you directly integrate, say 8GB of memory and 2 multi-core processors
on a pluggable board? The motherboard would have high-speed bays for
these boards and several processors dedicated to system work like
channeling/queuing data streams from bay-to-bay.

If you have 8 of them, then you have a system with 64GB of memory and
16 multi-core chips.

Can something like that work?

Sorry if this comes across as complete nonsense... I am not a hardware guy!

;^(

Robert Wessel

May 17, 2013, 6:27:45 PM

That's basically the "book" design IBM uses on mainframes. Current
mainframes can hold up to four books, each of which contains
processors (six multi-core chips, plus L4 cache in an MCM), memory
(all memory is on-book), and the top of the I/O hierarchy.

That is a bit bigger than you specified, with up to 30 (active) cores
and .75TB of memory on a book.

Chris M. Thomasson

May 17, 2013, 8:25:17 PM

> "Robert Wessel" wrote in message
> news:f8bdp89f53f3f8opt...@4ax.com...

I was thinking along the lines of the 8GB of memory being directly
integrated with the 2 processors, such that the main 8GB "local" memory
and its CPUs are one...

Literally a fraction of a millimeter/nanometer away. 3D stacked, perhaps.
The cards would not be that big at all. No reason main memory has to be
far away.

An 8GB, two multi-core processor plug-in board would be the size of half
a pack of gum.

Perhaps one could use fractals to efficiently stack the 8GB-16GB of
memory and local chip infrastructure, and get rid of heat...

Robert Wessel

May 18, 2013, 4:56:19 AM

On Fri, 17 May 2013 17:25:17 -0700, "Chris M. Thomasson"

I basically agree - some amount of local memory (but still globally
visible and cache coherent) that is significantly faster than the
global memory would be a significant performance boost. Most OSes are
already NUMA aware, so tweaking them to improve page placement to take
advantage would not be a big deal.

On the flip side cache accomplishes much the same thing, in a
(usually) more transparent way. This would only make sense if you
could include significantly more local RAM than you could cache, and
if that RAM were, in fact, significantly faster than external RAM. You
should certainly be able to make RAM significantly denser than cache.

Some of this has been done. For example, IBM's EC12 MCMs include 384MB
of L4 in the MCM, the original Pentium Pros did something similar, and
some versions of Haswell integrate 128MB of RAM for the GPU into the
package. Many cases of multiple CPU dies in a package exist too
(again, the aforementioned IBM MCMs, as well as several x86s).

Whether or not stacked dies or MCMs are the correct approach is an
interesting question. Stacked dies have an inherent cooling problem,
although the interconnect distances are shorter than for an MCM.

At this time, though, I don't think we could hope for more than about
1GB of in-package RAM.

>8GB and two multi-core processor plug-in board would be the size of half a
>pack of gum.

The problem with that small a package is I/Os and power. You won't
have enough pins to support external DRAM or enough pins to power the
processor (not to mention the power needed by the now on-package RAM).
You could do a narrow and slow DRAM interface (and hope that the
internal RAM will, in fact, absorb the vast majority of accesses), and
use a higher voltage on fewer power pins and then put the voltage
regulator on package too (Intel is doing this on some chips now). A
really small package will present cooling problems too, although those
are kept under control by the power limitations...
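
A rough sketch of why a higher supply voltage saves pins: current is power divided by voltage, and each pin carries only so much. The 50 W load, the 0.5 A per-pin limit, and the choice of 1 V versus 12 V are illustrative assumptions, and losses in the on-package regulator are ignored:

```python
import math

def power_pins_needed(power_w: float, supply_v: float, amps_per_pin: float) -> int:
    """Supply plus ground pins needed to deliver power_w at supply_v."""
    current_a = power_w / supply_v
    supply_pins = math.ceil(current_a / amps_per_pin)
    # Every supply pin needs a matching return (ground) pin.
    return 2 * supply_pins

POWER_W = 50.0       # hypothetical package power
AMPS_PER_PIN = 0.5   # hypothetical per-pin current limit

for volts in (1.0, 12.0):
    pins = power_pins_needed(POWER_W, volts, AMPS_PER_PIN)
    print(f"{volts:4.1f} V supply: {pins} power/ground pins")
```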

>Perhaps one could use fractals to efficiently stack the 8gb-16gb of
>memory and local chip infrastructure, and get rid of heat...

eh?

Quadibloc

May 18, 2013, 7:05:19 AM

Well, if I wanted to transmit a clock signal to every part of a chip
with the same delay wherever it is used, I would be able to solve that
problem by using a trace that looked like little versions of the
letter H on the points of larger versions...

A curve like that for the flow of liquid inside a heat sink would also
have its virtues in cooling an entire die uniformly.
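
The nested-H layout described above is usually called an H-tree. A minimal sketch that builds one recursively and checks that every endpoint sits at the same wire length from the root; the arm length and level count are arbitrary, and real clock trees also have to deal with loading and process variation, which this ignores:

```python
def h_tree_leaves(x, y, arm, levels, horizontal=True, dist=0.0):
    """Return (x, y, wire_length_from_root) for every endpoint of an H-tree.

    Each level branches both ways along one axis, then the axis alternates.
    The arm length halves after each vertical branch, so a horizontal bar
    plus the two vertical bars at its ends form one H at each scale.
    """
    if levels == 0:
        return [(x, y, dist)]
    next_arm = arm if horizontal else arm / 2
    leaves = []
    for sign in (-1, 1):
        nx = x + sign * arm if horizontal else x
        ny = y if horizontal else y + sign * arm
        leaves += h_tree_leaves(nx, ny, next_arm, levels - 1,
                                not horizontal, dist + arm)
    return leaves

leaves = h_tree_leaves(0.0, 0.0, arm=4.0, levels=6)
lengths = {d for _, _, d in leaves}
print(len(leaves), "endpoints; distinct wire lengths from root:", lengths)
```

Because both branches at every level have the same length, the path to each endpoint is identical by construction, whatever the number of levels.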

John Savard

Paul Colin de Gloucester

May 18, 2013, 10:01:57 AM

On May 14th, 2013, UnixR...@GMail.com sent:

You might enjoy reading the SoC (System-on-a-Chip) literature.

Thomas Womack

May 18, 2013, 12:28:21 PM

In article <4dfep8hn5ecppfmbl...@4ax.com>,

Robert Wessel <robert...@yahoo.com> wrote:
>At this time, though, I don't think we could hope for more than about
>1GB of in-package RAM.

We already have 2GB of on-package RAM for some cellphone apps, though
that's BGA-on-BGA mounting and no faster than contemporary PC main
memory.

http://www.hardkernel.com/renewal_2011/products/prdt_info.php?g_code=G135341370451
is one you can get off the shelf for a fairly trivial sum.

http://www.chipworks.com/blog/technologyblog/2012/07/05/sonys-ps-vita-uses-chip-on-chip-sip-3d-but-not-3d/
is about the best deployed chip-stacking; 1080 pads (on 45-micron
pitch) between the processor and the video RAM in a Sony handheld
console.

I look forward to seeing Chipworks dismantle a Haswell-R; I suspect it
will be fast-SERDES rather than massively-parallel connection from CPU
to eDRAM. It'll be interesting to know how much silicon space you can
save if your SERDES only needs to drive across a few millimetres of
tracks on a BGA substrate that you can co-design.

Tom

MitchAlsup

May 18, 2013, 2:51:03 PM

On Saturday, May 18, 2013 6:05:19 AM UTC-5, Quadibloc wrote:
> Well, if I wanted to transmit a clock signal to every part of a chip with
> the same delay wherever it is used,

I used to make a joke about just putting the entire chip inside a magnetron
and sticking a wire up everywhere you wanted a clock signal.

Mitch

Ivan Godard

May 18, 2013, 3:24:59 PM

The boundary between a signal on a wire and RF in a waveguide blurs,
even in old tech. On the B6500 circa 1970, the Gardner-Denver backplanes
that the cards fit into (roughly five flip-flops per card) were subject
to spurious ringing; frequency and hot spot varied from machine to
machine due to parts tolerance variation. Jake Vigil, the brilliant
hardware engineer who actually built the boxes, came up with the solution.

For each new machine off the production line he would go over the 3-12
rack-sized back planes with an RF sniffer, the kind you would use to
locate parasitic oscillations in ham radio. When he found a hot spot he
would take one of the ferrite cores (small rings of magnetizable
material, used in "core" memory) from the prior generation of memory,
which had exactly the right internal diameter to fit over a pin on the
card socket. Slipping the core on the pin dramatically changed the local
frequency response, and the hot spot would migrate to somewhere else on
the plane. Enough cores and the ringing stopped.

Every system Burroughs shipped had 5-10 ferrite cores on (apparently)
random pins scattered across the backplanes. They had a field
reputation as being electrically the most stable computers of that vintage.

Ivan

Noob

May 20, 2013, 7:15:26 AM

I just came across RWT's take on the subject:

http://www.realworldtech.com/intel-dram/

RWT wrote:

> To put this in perspective, Intel has compared the Haswell GT3e
> performance to the discrete Nvidia GT 650M, which is used in the 15"
> Macbook Pro with 1GB of GDDR5. The GT 650M can also be configured
> with 2GB of DDR3 and is used by many other OEMs including Lenovo,
> Asus, and HP for notebooks that are priced around $1000. Intel's
> 128MB custom eDRAM solution has similar bandwidth, but lower power
> and less board area, thus reducing the demand for GDDR5/DDR3 by
> supplanting it with Intel's custom silicon.

Regards.

mac

May 20, 2013, 1:55:29 PM

>> Well, if I wanted to transmit a clock signal to every part of a chip with
>> the same delay wherever it is used,

You don't really need the same delay. Just the same delay modulo the clock
period. The latter is usually bounded.
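
A small sketch of the "modulo the clock period" point, using integer picoseconds to avoid floating-point remainders. The 500 ps period and the delays are made-up values:

```python
def clock_phase_ps(delay_ps: int, period_ps: int) -> int:
    """Arrival phase of a periodic clock edge at a sink, in picoseconds."""
    return delay_ps % period_ps

PERIOD_PS = 500  # hypothetical 2 GHz clock

# Hypothetical wire delays from the clock source to three sinks.
delays_ps = {"near": 120, "far": 620, "farther": 1120}

for name, delay in delays_ps.items():
    print(f"{name:8s} delay {delay:5d} ps -> phase {clock_phase_ps(delay, PERIOD_PS)} ps")
```

Sinks whose delays differ by a whole number of periods see edges at the same instants, so they behave as if they were equidistant from the source.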

--
mac the naïf