I thought it might be nice if there were a
thread in this forum for gathering references
to various design patterns and
some not-famous-but-interesting academic papers.
The core idea behind starting this thread
is that, in the past, projects and ideas that
can be very relevant and useful with today's
hardware were discarded as impractical due to
the limitations of the hardware of the day. An example of
such a technology seems to be "symbolic computation", which
in its essence is the transformation of mathematical
formulae. Symbolic computation seems to have
been discarded due to its RAM and CPU-time
requirements, but modern web browsers run
JavaScript interpreters (read: consume "a lot" of CPU time), and
in the background the just-in-time compilers of
the various JavaScript implementations perform
transformations that are probably much more
elaborate than the transformations of mathematical
formulae were back in the era when I was
a child (I was born in 1981). Illustrations are
the REDUCE Computer Algebra Project
http://reduce-algebra.com/
and the various Lisps, which really do not need
any special, extra-powerful "Lisp machines" any more.
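To make "transformation of mathematical formulae" concrete, here is a minimal toy sketch of my own (not REDUCE code) of symbolic differentiation, the kind of formula rewriting a computer algebra system does:

```python
# A toy symbolic differentiator: formulae are nested tuples,
# e.g. ('*', 3, ('^', 'x', 2)) stands for 3*x^2.
def diff(e, var):
    if isinstance(e, (int, float)):          # constant
        return 0
    if isinstance(e, str):                   # variable
        return 1 if e == var else 0
    op, a, b = e
    if op == '+':
        return ('+', diff(a, var), diff(b, var))
    if op == '*':                            # product rule
        return ('+', ('*', diff(a, var), b), ('*', a, diff(b, var)))
    if op == '^' and isinstance(b, int):     # power rule, integer exponent
        return ('*', ('*', b, ('^', a, b - 1)), diff(a, var))
    raise ValueError('unsupported operator: %r' % (op,))

def evaluate(e, env):
    # Numerically evaluate a formula tree under a variable binding.
    if isinstance(e, (int, float)):
        return e
    if isinstance(e, str):
        return env[e]
    op, a, b = e
    x, y = evaluate(a, env), evaluate(b, env)
    return {'+': x + y, '*': x * y, '^': x ** y}[op]

# d/dx (3*x^2 + x) = 6*x + 1, so at x = 5 it is 31.
d = diff(('+', ('*', 3, ('^', 'x', 2)), 'x'), 'x')
print(evaluate(d, {'x': 5}))  # prints 31
```

A real system like REDUCE of course also simplifies the result (6*x + 1 instead of the raw tree), which is where the RAM and CPU-time appetite historically came from.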
Supposedly REDUCE was usable only on supercomputers,
but in 2017 it can be run on an old version
of the Raspberry Pi that has only a single 700 MHz CPU core.
I guess 700 MHz divided by ~20 MHz is about 35, plus the
time saved on inter-CPU communication. The CPU caches
of that era either did not exist or were crap anyway,
so they do not even count. As it turns out, the
Soviet military industry compensated for the few-fold
inferiority of its electronics manufacturing capability
by combining a formal-methods-based parallelizing compiler with
a specialized CPU architecture, but that technology
never reached the general public, so
that progress does not really count.
(The way I understand it, the idea behind the formal methods
was that by making sure that certain flaws can never happen,
the electronics of the CPU could be simplified by
leaving out the blocks that check for those
flaws. An example of such a flaw type is the
C-programming-language-style "wrong" memory access;
so, if I remember the video lecture correctly,
Ada was properly supported, but for C there was a
clause that whoever uses C, uses it at their own risk.
If I understand it correctly, the primary motive
for reducing the amount of electronics
was not the savings on component cost,
but the reduction of power consumption, because the more
power the computer consumed, the more elaborate the
cooling system had to be, and the cooling was, supposedly,
a problem in its own right. The parallelism effort
was just to get more performance, to compensate for
the inferior electronics manufacturing capability.
The story emerges through the whole lecture video series;
the cited video is just one of the videos of the
series. The author of the lecture series
claims that he, in person, was one of the main authors
of the computer that ran some of the Soviet Union's
nuclear ballistic missiles. As someone born in the
Soviet Union, I can certainly understand how that
technology would not reach the general public in the
cultural context of the Soviet Union, even if it were cheap,
but long live the Internet and Glasnost :-D)
https://www.youtube.com/watch?v=2qERUc4UoGw (the modern application of the Soviet-era work)
http://www.elbrus.ru/arhitektura_elbrus (archival copy:
http://archive.is/scu9P )
As a seed value for this thread, the thread of
references to thrown-away-or-undiscovered gems,
I offer the following reference:
(the Jasmine Compiler Project)
http://www.eecg.toronto.edu/~tsa/jasmine/ (archival copy:
https://archive.is/QZwQa )
I'm not totally sure whether that reference is useful,
but it seems to be a very old, bitrotting
project that caught my attention with a phrase like:
"enhancing cache and memory locality while
preserving existing parallelism in programs".
I found the material interesting in the context of
learning design patterns. The Jasmine Compiler Project
seems to be so abandoned that I couldn't even find the
source code of the Jasmine compiler, not that it matters,
because I just wanted to gather some ideas.
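One of the ideas behind a phrase like "enhancing cache and memory locality" can be illustrated with loop tiling (cache blocking). The sketch below is my own toy example, not Jasmine code; the tile size T is an arbitrary assumption, and a real compiler would derive it from the cache geometry and apply the transformation to the loop nest automatically:

```python
# Toy illustration of loop tiling (cache blocking) on matrix multiply.
# Both versions do the same arithmetic; only the memory access order differs.
def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T=4):
    # Process the matrices in T-by-T tiles so that each tile of A, B and C
    # stays hot in the cache while it is being reused.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for j in range(jj, min(jj + T, n)):
                        for k in range(kk, min(kk + T, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

# Same result either way; on large matrices the tiled order touches
# far less memory per unit of time, which is what the caches reward.
n = 8
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
B = [[float((i + j) % n) for j in range(n)] for i in range(n)]
assert matmul_naive(A, B, n) == matmul_tiled(A, B, n)
```

The "preserving existing parallelism" part of the Jasmine phrase would then mean that the tiles can still be handed out to different cores, since the transformation does not introduce new dependencies between them.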
As regards data locality, the utilization of caches and
the avoidance of possibly broken or slow network connections,
regardless of what programming language is being used,
one other relevant project, which seems to be totally dead
but which has an interesting idea to offer, is
David Keller's and Andres Amaya Garcia's
OpenTransputer Project
http://wordpress-transputeriot.rhcloud.com/
The idea that I picked up from that project is that
instead of arranging the processing elements, computers,
into a mesh network like the Parallella does, or into a
"scale-free network", which is evolution's answer
in nature, the computers/processing elements
might/should be arranged into a
Benes network,
which, according to
("The KR-Benes Network: A Control-Optimal Rearrangeable
Permutation Network" by Rajgopal Kannan)
https://arxiv.org/pdf/cs/0309006.pdf
seems to be a very old and classical thing from
the heyday of telephone network building.
I might be mistaken, but if I understand the definitions from
https://www.cs.cmu.edu/afs/cs.cmu.edu/project/phrensy/pub/papers/AroraLM94/node2.html#SECTION00011000000000000000 (archival copy:
https://archive.is/9NVlh )
correctly, then the phrase "rearrangeable network" designates
a black box that can provide parallel one-to-one connections from
all of its inputs to all of its outputs regardless of the
permutation of inputs to outputs, and the "big show"
and "loads of text" is often about the point in time at which
the information about which input needs to be connected to
which output becomes available; in the case of a
"rearrangeable network", all of the information about which
input needs to be connected to which output must be
available before any of the connections is established.
(I might be mistaken, of course; I haven't studied network theory,
I only know a little bit of graph theory and discrete math.)
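To convince myself of that "any permutation" property, I brute-forced the smallest nontrivial case. The sketch below is my own construction, assuming the standard recursive Benes wiring for N = 4: an input stage of two 2x2 switches, two size-2 subnetworks in the middle, and an output stage. It enumerates all 2^6 switch settings and checks how many of the 4! = 24 permutations are reachable:

```python
from itertools import product

def benes4(tokens, s):
    # tokens: the 4 input values; s: six switch bits
    # (s[0], s[1] = input stage, s[2], s[3] = middle, s[4], s[5] = output stage).
    def sw(a, b, bit):                        # a 2x2 crossbar: straight or crossed
        return (b, a) if bit else (a, b)

    a0, a1 = sw(tokens[0], tokens[1], s[0])   # input stage
    a2, a3 = sw(tokens[2], tokens[3], s[1])
    # Top outputs feed the upper size-2 subnetwork, bottom outputs the lower one.
    m0, m1 = sw(a0, a2, s[2])                 # upper subnetwork (one switch)
    m2, m3 = sw(a1, a3, s[3])                 # lower subnetwork (one switch)
    o0, o1 = sw(m0, m2, s[4])                 # output stage
    o2, o3 = sw(m1, m3, s[5])
    return (o0, o1, o2, o3)

reachable = {benes4((0, 1, 2, 3), bits) for bits in product((0, 1), repeat=6)}
print(len(reachable))  # 24: every permutation of 4 inputs is realizable
```

So with only 6 tiny switches the box is rearrangeably nonblocking, and the construction scales as roughly 2*log2(N) stages of N/2 switches each.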
The Benes network idea appeals to me in the sense that it
seems to be a more real-life-scalable version of a situation where
an ant can walk from one end of a circle of pearls, which
has infinitely many pearls that are all connected to their neighbors by
finite-length pieces of string, to the other end of the circle only if
there is a finite-length piece of string crossing
the circle, roughly through the center point of the circle.
The ants are like data, the pearls are like computers or CPU cores
with caches, and the finite-length strings between the pearls
are data connection links. Another nice thing about the Benes network
is that it seems to have been tested in practice for decades.
As regards the "scale-free" networks, here are
a few citations from the 2011 work of a person whom I
happen to know personally (she was my classmate
at primary school, in Kuressaare, Saaremaa):
http://publications.lib.chalmers.se/records/fulltext/143636.pdf
----citation---start----
For decades graph theory was focused on either
regular or completely random graphs. However,
neither model is suitable for describing
most real-life networks like social networks,
internet and biological networks (protein-protein interaction,
metabolic, transcription-regulatory and signaling networks).
Only recently a new graph model was introduced which
describes all of these networks - the model of
scale-free networks.
----citation---end------
----citation---start----
Scale-free networks are amazingly robust against
accidental failures - even if 80% of randomly selected
nodes fail, the remaining 20% still form a compact cluster
with a path connecting any two nodes [1]. On the other hand,
a lot of damage might be caused
if a few key hubs are knocked out.
----citation---end------
The work is about biology, but it seems to be
relevant in the context of data locality
in an arbitrarily large cluster computer, which
should also have some graceful-degradation properties,
where some computers/nodes will fail at
arbitrary moments. From a cost point of view,
if the software is written so that it can
tolerate the failure of random nodes of the
cluster computer, then the not-so-good reliability
of some Chinese products won't be a
quality issue any more. If the unreliable computers
are power-conserving, then it's OK to
have a lot of "backups" connected to the cluster.
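To play with that random-failure robustness claim, here is a small toy experiment of my own (not from the cited thesis): grow a scale-free graph by preferential attachment (the Barabasi-Albert model) and knock out 80% of the nodes at random. The parameters n = 500 and m = 2 are arbitrary assumptions:

```python
import random
from collections import deque

def barabasi_albert(n, m, rng):
    # Grow a scale-free graph: each new node attaches to m existing nodes,
    # chosen proportionally to their current degree (preferential attachment).
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}  # seed clique
    stubs = [v for e in edges for v in e]    # each node repeated once per incident edge
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:              # picking from stubs = degree-weighted choice
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs += [t, new]
    return edges

def largest_component(nodes, edges):
    # Size of the biggest connected cluster among the given nodes, via BFS.
    adj = {v: [] for v in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].append(b)
            adj[b].append(a)
    best, seen = 0, set()
    for start in nodes:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

rng = random.Random(42)
n, m = 500, 2
edges = barabasi_albert(n, m, rng)
assert largest_component(set(range(n)), edges) == n   # the grown graph is connected
survivors = set(rng.sample(range(n), n // 5))          # 80% random failures
print(largest_component(survivors, edges))             # size of the surviving cluster
```

Deleting the highest-degree nodes instead of random ones is where such a graph falls apart, which is exactly the hub-vulnerability the second citation warns about.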
There's even the option to run the same
program with the same input on multiple nodes in parallel,
as was the case with the Soviet Buran computer.
http://www.buran-energia.com/bourane-buran/bourane-consti-ordinateur-computer.php
As a matter of fact, the RethinkDB database
engine uses exactly that idea:
(a ~14 min long screencast that demonstrates
database replication, node failure, and the
recovery from that situation)
https://www.youtube.com/watch?v=cnpSi9qI02E
In the case of servers the extra-slow bottleneck is not
just the network connection, but also access to the HDD,
which makes automated, transparent mirroring very practical.
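The "same program, same input, on several nodes" idea can be sketched as simple majority voting over replicas (triple modular redundancy); the faulty_node function below is a made-up stand-in for a broken replica, not anything from Buran or RethinkDB:

```python
from collections import Counter

def majority_vote(results):
    # Pick the answer that the most replicas agree on.
    [(answer, votes)] = Counter(results).most_common(1)
    return answer

def healthy_node(x):
    return x * x            # the actual computation

def faulty_node(x):
    return -1               # a made-up stand-in for a failed replica

# Run the same input through three "nodes"; one of them has failed.
replicas = [healthy_node, healthy_node, faulty_node]
print(majority_vote([node(7) for node in replicas]))  # prints 49
```

With three replicas any single failure is masked; a database like RethinkDB additionally has to handle nodes that come back and need to catch up, which is the harder part shown in the screencast.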
I suspect that the Jasmine Compiler Project people probably
did not think that their cache optimization ideas
might be mined for ideas in 2017 :-D
Thank you for reading my post.
I hope that I'm not spamming "too much".