Multi-core CPU Production Economics


Martin Vahi

Jun 16, 2018, 12:19:22 PM
to ParaSail Programming Language

It's known that the greater the die area, the greater the
probability that at least something on the die is defective.
That is to say, the bigger the die of an individual
chip, the lower the yield. The lower the yield, the
more expensive the chips built from the dies that work
"sufficiently well" have to be.
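
A common first-order way to picture that relationship is a
Poisson-style yield model, Y = exp(-D*A), where D is the defect
density and A is the die area. The defect density in the sketch
below is a made-up illustrative value, not a figure from any
real foundry; the point is only to show how fast the yield
falls as the die grows.

    # Minimal sketch of a Poisson yield model: Y = exp(-D * A).
    # The defect density is a hypothetical, illustrative number.
    import math

    defect_density = 0.1  # defects per cm^2 (assumed, for illustration only)

    for die_area_cm2 in (0.5, 1.0, 2.0, 4.0):
        yield_fraction = math.exp(-defect_density * die_area_cm2)
        print(f"die area {die_area_cm2:4.1f} cm^2 -> "
              f"expected yield {yield_fraction:.2%}")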

It's also known that the more complex a single CPU core
is, the more die area that single CPU core consumes.
Maybe I'm mistaken, but if I look at

https://www.amd.com/en/ryzen-pro
(archival copy: https://archive.is/FO1J7 )

then I suspect that the AMD Ryzen CPUs even include
some neural network implementation for optimizing the
single-core pipeline. A citation from the
marketing materials:

---citation--start---
Neural Net Prediction

Increased efficiency from a true AI that evaluates the
current application and predicts the next steps before
they are needed.
---citation--end---

That has to consume some die area, even if just
for storing the neural network neuron states.
Probably (I'm not sure if that's the right place) from

https://www.bunniestudios.com/blog/?page_id=1022
(archival copy: https://archive.is/Kaqu0 )

I read that the reason why Flash memory cards
are so cheap is that their manufacturing costs are
reduced by skipping the testing of the Flash dies
and by having each memory card include at least
2 dies: one is the Flash die and the other is
the controller die that keeps track of the flawed
Flash cells. As the Flash cells "burn through" during the
lifetime of the memory card, the controller tries to
reallocate the data to those Flash cells that have not yet
"burned through". The Flash cells that are flawed right
after the Flash die exits the semiconductor foundry are
handled by the controller just like any other "burned through"
cell and therefore there is no point in thoroughly testing
the Flash dies. If the testing is skipped, then the cost of
such "testing" is ZERO and the yield of the (sellable) Flash dies is
also much better than it would be if they were all required to
be perfect.
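
The controller's bookkeeping can be pictured as a simple
logical-to-physical block map that quietly skips blocks that
have failed, whether they failed in the foundry or in the field.
The sketch below is only a toy illustration of that idea, with
made-up class and block names; real flash translation layers
also do wear levelling, garbage collection and much more.

    # Toy sketch of a controller-side bad-block map: logical blocks are
    # redirected to the next known-good physical block, and blocks that
    # were already flawed when the die left the foundry are treated
    # exactly like blocks that wear out later.
    class BadBlockMapper:
        def __init__(self, physical_block_count, initially_bad=frozenset()):
            self.bad = set(initially_bad)   # foundry defects and wear-outs
            self.physical_block_count = physical_block_count
            self.mapping = {}               # logical block -> physical block

        def _next_good_block(self):
            for p in range(self.physical_block_count):
                if p not in self.bad and p not in self.mapping.values():
                    return p
            raise RuntimeError("no spare blocks left")

        def physical_for(self, logical):
            if logical not in self.mapping:
                self.mapping[logical] = self._next_good_block()
            return self.mapping[logical]

        def mark_worn_out(self, physical):
            # A block "burned through": retire it and remap its logical block
            # to a still-good physical block on the next access.
            self.bad.add(physical)
            for logical, p in list(self.mapping.items()):
                if p == physical:
                    del self.mapping[logical]

    # Usage: physical block 1 is bad from the start, block 0 wears out later.
    mapper = BadBlockMapper(physical_block_count=8, initially_bad={1})
    print(mapper.physical_for(0))   # -> 0
    mapper.mark_worn_out(0)
    print(mapper.physical_for(0))   # -> 2 (block 1 is skipped, it was never good)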

As far as I understand, the economic incentive to increase the
yield of sellable devices is the reason behind the different frequency
ranges and core counts of CPU chips.

Interestingly, the newest consumer grade
AMD CPUs (read: chips where the single CPU cores are huge)
tend to include only 8 cores at most. At the same time,
ARM CPU cores, which tend to be physically smaller and simpler,
are also sold in 64-core chips/bundles. That gives me reason
to suspect that for economic reasons the huge, general purpose CPU cores
will not be delivered in great quantities by placing them all
on a single die. There might be multi-die chips, which might
be like the RAM that is stacked on top of the Raspberry Pi SoC, or
there might be fancier ways to place multiple dies on top of each other,
as explained at

https://www.youtube.com/watch?v=Tjkfr3BzbUY

From the ParaSail perspective it means that if the
number of huge, general purpose CPU cores goes up
in LOW COST CONSUMER ELECTRONICS, not just
in some fancy, expensive military equipment, where
the high expense of the low yield dies is tolerable, then
the cores will likely be clustered together in some
tree-like structure. Maybe there will be 4 cores per die;
those 4-core dies might then be clustered together to
form a 4-layer stack of dies that has 4*4=16 cores. The
4-layer stacks might then be assembled into a CPU chip,
maybe in some cheaper cases 4 stacks per chip, which
would give 4*16=64 cores per chip. A chip with 3*3 stacks
would contain 3*3*16=144 cores.
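
The core counts above are just a product of three factors;
the tiny sketch below only recomputes them, and the die,
stack and chip parameters are the hypothetical ones from
the paragraph, not those of any real product.

    # Cores per chip = cores per die * dies per stack * stacks per chip.
    # All numbers are the hypothetical ones from the text above.
    def cores_per_chip(cores_per_die, dies_per_stack, stacks_per_chip):
        return cores_per_die * dies_per_stack * stacks_per_chip

    print(cores_per_chip(4, 4, 4))      # 4 stacks per chip   -> 64 cores
    print(cores_per_chip(4, 4, 3 * 3))  # 3*3 stacks per chip -> 144 cores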

If the CPU cores are seen as graph vertices and the
connections between the CPU cores are seen as graph edges,
and if most of the cores that run a ParaSail
program run a work stealing engine, then probably the
work stealing has to be scheduled according to the shape
of the graph, taking into account the possible
congestion of some of the routes.
The graph shape might be a CPU-architecture-specific
ParaSail compilation parameter. A C++ style
instrumentation

https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html

might be used to fine-tune the work-stealing scheduling for
a particular application.
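
This is not how the actual ParaSail runtime picks steal
victims; the following is just a minimal sketch of the kind of
topology-aware victim selection speculated about above. The
core graph and the distance-based weighting are assumptions
made up purely for illustration.

    # Minimal sketch of topology-aware victim selection for work stealing:
    # a thief prefers victims that are close to it in the core graph, so
    # that steals mostly stay inside a die/stack and only rarely cross
    # the more congestion-prone long links.
    import random
    from collections import deque

    def hop_distances(core_graph, start):
        """Breadth-first hop count from `start` to every reachable core."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for n in core_graph[c]:
                if n not in dist:
                    dist[n] = dist[c] + 1
                    queue.append(n)
        return dist

    def pick_victim(core_graph, thief, candidates):
        """Pick a steal victim, biased towards nearby cores (weight ~ 1/2^hops)."""
        dist = hop_distances(core_graph, thief)
        weights = [1.0 / (2 ** dist[c]) for c in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]

    # Hypothetical 8-core machine: two 4-core dies joined by one inter-die link.
    core_graph = {
        0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
        4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
    }
    print(pick_victim(core_graph, thief=0, candidates=[1, 2, 5, 6]))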

Another reason for considering the way the cores
are clustered is that even if the cores were made of
superconductors, id est even if literally no heat were ever
generated by the CPU, there is still the current that
is needed to charge the parasitic capacitors that the
lines inside the CPU form. That is to say, even if
the CPU did not produce any heat at all, there is a
minimum current that is needed for driving the CPU.
That minimum current depends on the geometrical
properties of the CPU-internal structures, but at the same time
the physical size of atoms sets a limit on how big those
internal structures, for example the current conducting lines,
have to be to make sure that the classical electrical
rules are even applicable. At some point the
nanotechnology rules will be more influential than the
macro-process-based classical electrical rules. For example, the
tunneling effect might start to determine how many electrons even
move from some point A to some other point B.
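
A back-of-the-envelope way to see that minimum current: every
clock cycle the switching nodes have to be charged to the supply
voltage, so the average charging current is roughly I ~ C*V*f and
the dynamic power roughly C*V^2*f. The capacitance, voltage and
frequency below are made-up illustrative values, not measurements
of any real chip.

    # Back-of-the-envelope charging current and dynamic power:
    #   I ~= C * V * f,   P ~= C * V^2 * f
    # All numbers are assumed, purely for illustration.
    total_switched_capacitance = 1e-9  # 1 nF of parasitic line capacitance (assumed)
    supply_voltage = 1.0               # volts (assumed)
    clock_frequency = 2e9              # 2 GHz (assumed)

    current = total_switched_capacitance * supply_voltage * clock_frequency
    power = total_switched_capacitance * supply_voltage ** 2 * clock_frequency
    print(f"charging current ~ {current:.1f} A, dynamic power ~ {power:.1f} W")
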
I do not think that there are many opportunities for reducing
the physical size of the internal components of CPUs.
I believe that they might be able to replace the materials
and some of the physical processes that they use, but not the
size of the components. The reason why I believe that is
that I have summarized some historic data about the
CPUs of different eras and my summary, which is a very
rough estimation, is in the form of the following table:

nOfA --- minimum CPU die feature size in number of atoms
nm   --- minimum CPU die feature size in nanometers
f    --- CPU frequency
=============================
| nOfA  | nm   | f       |
-----------------------------
| 60    | 30   | ~3.5GHz |
| 260   | 130  | ~2.3GHz |
| 400   | 200  | ~550MHz |
| 1200  | 600  |  100MHz |
| 12000 | 6000 |    3MHz |
=============================

The thing to notice about this table is that
if one wants to create a relatively reliable
chip that has an electrical line width of
at least about 250 atoms, then the
"economical CPU frequency" is about 2GHz.
That's the reason why I estimate that
the single cores of the future consumer grade
cheaper hardware will run at about 2GHz.
I take it as a cap when I think about
algorithm design. The 2GHz will be the
approximate bottleneck width of the
non-parallelizable code, at least
in those applications that need
hardware reliability, maybe some
radiation tolerance. The fewer atoms there are
per CPU die feature, the bigger is the
relative size of the "bullet hole" that a
single radiation particle creates in the CPU die.
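
The "bullet hole" remark can be made a bit more concrete: for a
fixed width of the damage track, the damaged fraction of a feature
grows as the feature shrinks. The 10 nm track width below is a
purely hypothetical number chosen for illustration, not a
physical measurement.

    # Relative size of a radiation "bullet hole" versus feature width.
    # The assumed 10 nm damage track is a hypothetical, illustrative value.
    damage_track_nm = 10.0
    for feature_nm in (6000, 600, 200, 130, 30):
        print(f"{feature_nm:5d} nm feature -> damage spans "
              f"{100.0 * damage_track_nm / feature_nm:6.2f} % of its width")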
 
For comparison, the Raspberry Pi 3 has a
CPU frequency that is less than 2GHz, and some
Russian foundry (I'll skip the link for now) also
once advertised that they have a "200nm process".
I as an Estonian think of that "200nm process" of that
Russian semiconductor foundry as being about as
low as they have any motivation to go, because that's
roughly the minimum that they can use for producing reliable
chips for the Russian military industry. Any smaller than
that and the chips might become too sensitive to radiation.
Which means that technically they can do pretty much anything
they can dream up and the only thing holding
them back is their social processes. That is to say,
I believe that the statement by Western propaganda that the
sanctions on Russia somehow limit their military industrial
complex is truly just propaganda and nothing more.
I do not know, maybe the Russian side wants the
Westerners to believe that the sanctions have an effect
while in reality the sanctions do not have any effect.
I really do not understand the statements about the sanctions.
In my 2018_06 opinion the most damaging thing for the
Russian electronics and IT industry is the repression of
free speech and the repression of businesses. The rest,
even a total lack of exports, they could handle really well,
if businesses were allowed to flourish in Russia and
if the free speech issues were solved. None of that gets
solved as long as there is a Czar in Russia, and as long
as the Russian culture praises hierarchy there will be a Czar
in Russia, even if that Czar is not Putin. At least
someone will be at the top of the hierarchy.

Russia and CPUs might be a bit of a stretch, to say the least,
but it is related to global electronics manufacturing and
economics. The military industry, including that of the adversaries,
does drive the tech industry at least to some extent.


Thank You for reading my post :-)

Martin Vahi

Jun 16, 2018, 4:26:24 PM
to ParaSail Programming Language
I suspect that in the case of safety critical systems,
real-time systems, the maximum delay will depend
on how congested the channel between a single CPU core
and the "south bridge", the input-output hub, is. Maybe
the future multi-core CPUs will have different core types, or
the CPU cores will somehow be prioritized, so that the
most timing critical tasks are allocated only to
cores that have high priority IO access. The ParaSail
compiler should then take the prioritization of the CPU cores
into account.
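
Below is a minimal sketch of what such an allocation policy
could look like. The split of cores into IO-privileged and
ordinary ones is a made-up example; it does not describe any
existing ParaSail or operating system mechanism.

    # Minimal sketch: timing-critical tasks only go to cores with
    # high-priority IO access, everything else fills the remaining cores.
    # The core classification below is a made-up example.
    IO_PRIVILEGED_CORES = {0, 1}           # assumed: fast path to the IO hub
    ORDINARY_CORES = {2, 3, 4, 5, 6, 7}

    def pick_core(task_is_timing_critical, busy_cores):
        pool = IO_PRIVILEGED_CORES if task_is_timing_critical else ORDINARY_CORES
        free = sorted(pool - busy_cores)
        if not free:
            raise RuntimeError("no suitable core is free")
        return free[0]

    print(pick_core(task_is_timing_critical=True, busy_cores={0}))     # -> 1
    print(pick_core(task_is_timing_critical=False, busy_cores=set()))  # -> 2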

It's just a wild thought that occurred to me about 10 minutes ago.
Thank You for reading :-)

Martin Vahi

Jul 8, 2018, 4:09:05 AM
to ParaSail Programming Language

"Sophie Wilson - The Future of Microprocessors"

https://www.youtube.com/watch?v=_9mzmvhwMqw

If I understand the year 2016 presentation correctly, then the summary of the
presentation might be that the limits on the CPUs of "the future" (from a 2016 perspective)
are production costs, power density and software support.

Martin Vahi

Aug 27, 2018, 7:46:10 AM
to ParaSail Programming Language

I thought that maybe it's worth mentioning that
there is a group of people who believe in a

    deterministic multi-threaded programming model

https://sampa.cs.washington.edu/new/research/dmp.html
(archival copy: https://archive.is/rXTXJ )

I reached that link from

http://www.cis.upenn.edu/~devietti/research.html

mainly because of the paper
("GPUDet: A Deterministic GPU Architecture")
http://dl.acm.org/authorize?6809238

At the time of my current post I lack an opinion about that
approach. Maybe it helps in some corner cases, but
one thought that comes to my mind is that if there is
a system of firmly connected mechanical gears, then
the moment one of the gears jams, the whole gear mechanism
jams, unless there's some "automatic gearbox" mechanism
built into the system. That is to say, I think that some
lack of determinism can be a Good Thing. Some lack of
determinism can be good also from a security point of view,
especially given how hopeless the situation with the
mainstream CPUs seems to be. Some "mainstream news"
about hardware related vulnerabilities:

http://www.cs.vu.nl/~ast/intel/
(archival copy: https://archive.is/vYff8 )

https://marc.info/?l=openbsd-tech&m=153504937925732&w=2
(archival copy: https://archive.is/B3I49 )

https://meltdownattack.com/


Thank You for reading my comment.

Tucker Taft

Aug 27, 2018, 5:17:25 PM
to ParaSail Programming Language
Dijkstra was a believer that there was no need to be deterministic if you had well-defined semantics. See his book "A Discipline of Programming", where both the fundamental "if" and "do" statements make arbitrary choices when two guards are true.  Of course it is incumbent on the programmer to show that it doesn't matter which choice is made.
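
For readers who have not met Dijkstra's guarded commands, here is a rough, executable rendering of that nondeterministic "if" (the Python version is only an illustration, not something from the book): for computing max(x, y), both guards x >= y and y >= x are true when x == y, and either branch may be taken, because both give the same answer.

    # Rough illustration of Dijkstra's nondeterministic guarded "if":
    #   if x >= y -> m := x  []  y >= x -> m := y  fi
    # When several guards are true, any one of them may be chosen; the
    # programmer's obligation is to show that the choice does not matter.
    import random

    def guarded_max(x, y):
        enabled = []
        if x >= y:
            enabled.append(x)   # first guarded command:  m := x
        if y >= x:
            enabled.append(y)   # second guarded command: m := y
        return random.choice(enabled)   # arbitrary choice among true guards

    print(guarded_max(3, 3))   # both guards are true; either choice yields 3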

Determinism is great if you have no connection to the external world, and you are doing a pure computation, as in many High-Performance Computing situations.  But if you are interacting with the external world, then clearly there are timing and sequencing issues that imply that no two executions of the program are necessarily going to be identical.  And even without such external interactions, if you can show that it doesn't matter exactly how the work-stealing scheduling is performed, for example, so long as you get equivalent results (for some definition of "equivalent"!), then there is no need to be overly deterministic.

There was a series of workshops called "Workshop on Determinism and Correctness in Parallel Programming" (WoDet) where determinism was seen as a fundamental goal of parallel programming, but I would say that over the years the workshop "morphed" a bit toward admitting that determinism was not the end-all and be-all, and that "correctness" was more important (for some definition thereof!).

-Tuck


Martin Vahi

Sep 11, 2018, 7:08:34 PM
to ParaSail Programming Language

According to the 2018_08_27 article at

(archival copy: https://archive.fo/uK7Vz )

GlobalFoundries is skipping the adoption of a 7nm process and specializing in products that can be manufactured with the equipment that it has already adopted.

As of 2018_09_12 I do not fully understand the reasons behind such a move. From the article it seems to me that they want to make lemonade out of lemons, that they are trying to hype up the mess that they have at their management level, id est I'm currently not capable of seeing any technical or economic reasons behind such a move.

In my opinion the Internet_of_Things/battery_operated_devices market always yearns for lower power consumption, and the truly high volume chip projects, like major microcontrollers, will probably prefer the most power saving chip manufacturing technology that the industry has to offer, with yield/cost taken into consideration. The foundries that use power-hungrier technologies have to make up for that deficiency with something "extra" in the other parameters of their offers, and megacorporations are definitely NOT FLEXIBLE, which means that the various smaller-but-more-flexible foundries, for example the X-FAB,


will probably outcompete GlobalFoundries in the niche market of specialized chip manufacturing. That is to say, according to my very subjective and uninformed view, GlobalFoundries is too big to competitively and cost effectively handle small, start-up-style, fabless chip companies, and by re-specializing on small clients the management of GlobalFoundries has initiated the bankruptcy process of GlobalFoundries. I'm very likely to be wrong with my current prediction, but my prediction is that at the very latest 12 years from now, in 2018+12=2030, GlobalFoundries will be "reorganized" or "sold off in pieces", unless they start mass-producing open source FPGAs or something similar that takes the start-up market (read: the parties that determine what gets produced in volume in the future) by storm.

Thank You for reading my (very uninformed) text.

Martin Vahi

Jul 29, 2021, 11:58:16 AM
to ParaSail Programming Language
Just a reference to a 2020 presentation by the initial designer of the ARM CPU instruction set, Roger/Sophie Wilson:

    ("UA 2020 Wheeler Lecture: The Future of Microprocessors")
    https://www.youtube.com/watch?v=R2SdSLCMKEA

She covers many things there, but one of the takeaways is that if components are partly just tens of atoms wide and there is the danger that they melt due to heat, then the heat sources do need to be distributed in space in some way, which introduces some speed-of-light related delays, and that sets some theoretical limits on the maximum CPU frequency. The related key phrase is "dark silicon" (meaning: on-chip switched-off circuitry, id est the transistors exist, but they would melt themselves if they were all switched on at once).
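
A crude way to picture "dark silicon": if the chip's cooling can remove only a fixed amount of power and every active core dissipates some amount of it, then only a fraction of the cores can be lit up at the same time. All the numbers in the sketch below are made up purely for illustration.

    # Crude "dark silicon" illustration: with a fixed power/cooling budget,
    # only a fraction of the cores can be active at the same time.
    # All numbers are assumed, purely for illustration.
    power_budget_watts = 100.0
    watts_per_active_core = 5.0
    total_cores = 64

    max_active = min(total_cores, int(power_budget_watts // watts_per_active_core))
    dark_fraction = 1.0 - max_active / total_cores
    print(f"{max_active} of {total_cores} cores can be active at once; "
          f"{dark_fraction:.0%} of the cores stay dark")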

Thank You for reading this comment.