I thought it might be nice if there were a
thread in this forum for gathering references
to various design patterns and
some not-famous-but-interesting academic papers.
The core idea behind starting this thread
is that, in the past, projects and ideas that
can be very relevant and useful with today's
hardware were discarded as impractical due to
the limitations of the hardware of the day. An example of
such a technology seems to be "symbolic computation", which
in its essence is the transformation of mathematical
formulae. Symbolic computation seems to have
been discarded due to its RAM and CPU-time
requirements, but modern web browsers run
JavaScript interpreters (read: consume "a lot" of CPU time), and
in the background the just-in-time compilers of
the various JavaScript implementations perform
transformations that are probably much more
elaborate than the transformations of mathematical
formulae were back in the era when I was
a child (I was born in 1981). Illustrations are
the REDUCE Computer Algebra Project
http://reduce-algebra.com/
and the various Lisps, which really do not need
any special, extra-powerful "Lisp machines" any more.
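To make "transformation of mathematical formulae" concrete, here is a minimal toy sketch of my own (not REDUCE code) of symbolic differentiation, the kind of formula rewriting a computer algebra system does:

```python
# A toy symbolic differentiator: formulae are nested tuples,
# e.g. ('*', 3, ('^', 'x', 2)) stands for 3*x^2.
def diff(e, var):
    if isinstance(e, (int, float)):          # constant
        return 0
    if isinstance(e, str):                   # variable
        return 1 if e == var else 0
    op, a, b = e
    if op == '+':
        return ('+', diff(a, var), diff(b, var))
    if op == '*':                            # product rule
        return ('+', ('*', diff(a, var), b), ('*', a, diff(b, var)))
    if op == '^' and isinstance(b, int):     # power rule, integer exponent
        return ('*', ('*', b, ('^', a, b - 1)), diff(a, var))
    raise ValueError('unsupported operator: %r' % (op,))

def evaluate(e, env):
    # Numerically evaluate a formula tree under a variable binding.
    if isinstance(e, (int, float)):
        return e
    if isinstance(e, str):
        return env[e]
    op, a, b = e
    x, y = evaluate(a, env), evaluate(b, env)
    return {'+': x + y, '*': x * y, '^': x ** y}[op]

# d/dx (3*x^2 + x) = 6*x + 1, so at x = 5 it is 31.
d = diff(('+', ('*', 3, ('^', 'x', 2)), 'x'), 'x')
print(evaluate(d, {'x': 5}))  # prints 31
```

A real system like REDUCE of course also simplifies the result (6*x + 1 instead of the raw tree), which is where the RAM and CPU-time appetite historically came from.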
Supposedly REDUCE was usable only on supercomputers,
but in 2017 it can be run on an old version
of the Raspberry Pi that has only a single 700 MHz CPU core.
I guess 700 MHz divided by ~20 MHz is about 35, plus the
time saved on inter-CPU communication. The CPU caches
of that era either did not exist or were crap anyway,
so they do not even count. As it turns out, the
Soviet military industry compensated for the few-fold
inferiority of its electronics manufacturing capability
by combining a formal-methods-based parallelizing compiler with
a specialized CPU architecture, but that technology
never reached the general public, so
that progress does not really count.
(The way I understand it, the idea behind the formal methods
was that by making sure that certain flaws can never happen,
the electronics of the CPU could be simplified by
leaving out the blocks that check for those
flaws. An example of such a flaw type is the
C-programming-language-style "wrong" memory access;
so, if I remember the video lecture correctly,
Ada was properly supported, but for C there was a
clause that whoever uses C, uses it at their own risk.
If I understand it correctly, the primary motive
for reducing the amount of electronics
was not the savings on component cost,
but the reduction of power consumption, because the more
power the computer consumed, the more elaborate the
cooling system had to be, and the cooling was, supposedly,
a problem in its own right. The parallelism effort
was just to get more performance, to compensate for
the inferior electronics manufacturing capability.
The story emerges through the whole lecture video series;
the cited video is just one of the videos of the
series. The author of the lecture series
claims that he, in person, was one of the main authors
of the computer that ran some of the Soviet Union's
nuclear ballistic missiles. As someone born in the
Soviet Union, I can certainly understand how that
technology would not reach the general public in the
cultural context of the Soviet Union, even if it were cheap,
but long live the Internet and Glasnost :-D)
https://www.youtube.com/watch?v=2qERUc4UoGw (the modern application of the Soviet-era work)
http://www.elbrus.ru/arhitektura_elbrus (archival copy:
http://archive.is/scu9P )
As a seed value for this thread, the thread of
references to thrown-away-or-undiscovered gems,
I offer the following reference:
(the Jasmine Compiler Project)
http://www.eecg.toronto.edu/~tsa/jasmine/ (archival copy:
https://archive.is/QZwQa )
I'm not totally sure whether that reference is useful,
but it seems to be a very old, bitrotting
project that caught my attention with a phrase like:
"enhancing cache and memory locality while
preserving existing parallelism in programs".
I found the material interesting in the context of
learning design patterns. The Jasmine Compiler Project
seems to be so abandoned that I couldn't even find the
source code of the Jasmine compiler, not that it matters,
because I just wanted to gather some ideas.
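One of the ideas behind a phrase like "enhancing cache and memory locality" can be illustrated with loop tiling (cache blocking). The sketch below is my own toy example, not Jasmine code; the tile size T is an arbitrary assumption, and a real compiler would derive it from the cache geometry and apply the transformation to the loop nest automatically:

```python
# Toy illustration of loop tiling (cache blocking) on matrix multiply.
# Both versions do the same arithmetic; only the memory access order differs.
def matmul_naive(A, B, n):
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_tiled(A, B, n, T=4):
    # Process the matrices in T-by-T tiles so that each tile of A, B and C
    # stays hot in the cache while it is being reused.
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):
        for jj in range(0, n, T):
            for kk in range(0, n, T):
                for i in range(ii, min(ii + T, n)):
                    for j in range(jj, min(jj + T, n)):
                        for k in range(kk, min(kk + T, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C

# Same result either way; on large matrices the tiled order touches
# far less memory per unit of time, which is what the caches reward.
n = 8
A = [[float(i * n + j) for j in range(n)] for i in range(n)]
B = [[float((i + j) % n) for j in range(n)] for i in range(n)]
assert matmul_naive(A, B, n) == matmul_tiled(A, B, n)
```

The "preserving existing parallelism" part of the Jasmine phrase would then mean that the tiles can still be handed out to different cores, since the transformation does not introduce new dependencies between them.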
As regards data locality, the utilization of caches and
the avoidance of possibly broken or slow network connections,
regardless of what programming language is being used,
one other relevant project, which seems to be totally dead
but which has an interesting idea to offer, is
David Keller's and Andres Amaya Garcia's
OpenTransputer Project
http://wordpress-transputeriot.rhcloud.com/
The idea that I picked up from that project is that
instead of arranging the processing elements, computers,
into a mesh network like the Parallella does, or into a
"scale-free network", which is evolution's answer
in nature, the computers/processing elements
might/should be arranged into a
Benes network,
which, according to
("The KR-Benes Network: A Control-Optimal Rearrangeable
Permutation Network" by Rajgopal Kannan)
https://arxiv.org/pdf/cs/0309006.pdf
seems to be a very old and classical thing from
the heyday of telephone network building.
I might be mistaken, but if I understand the definitions from
https://www.cs.cmu.edu/afs/cs.cmu.edu/project/phrensy/pub/papers/AroraLM94/node2.html#SECTION00011000000000000000 (archival copy:
https://archive.is/9NVlh )
correctly, then the phrase "rearrangeable network" designates
a black box that can provide parallel one-to-one connections from
all of its inputs to all of its outputs regardless of the
permutation of inputs to outputs, and the "big show"
and "loads of text" is often about the point in time at which
the information about which input needs to be connected to
which output becomes available; in the case of a
"rearrangeable network", all of the information about which
input needs to be connected to which output must be
available before any of the connections is established.
(I might be mistaken, of course; I haven't studied network theory,
I only know a little bit of graph theory and discrete math.)
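To convince myself of that "any permutation" property, I brute-forced the smallest nontrivial case. The sketch below is my own construction, assuming the standard recursive Benes wiring for N = 4: an input stage of two 2x2 switches, two size-2 subnetworks in the middle, and an output stage. It enumerates all 2^6 switch settings and checks how many of the 4! = 24 permutations are reachable:

```python
from itertools import product

def benes4(tokens, s):
    # tokens: the 4 input values; s: six switch bits
    # (s[0], s[1] = input stage, s[2], s[3] = middle, s[4], s[5] = output stage).
    def sw(a, b, bit):                        # a 2x2 crossbar: straight or crossed
        return (b, a) if bit else (a, b)

    a0, a1 = sw(tokens[0], tokens[1], s[0])   # input stage
    a2, a3 = sw(tokens[2], tokens[3], s[1])
    # Top outputs feed the upper size-2 subnetwork, bottom outputs the lower one.
    m0, m1 = sw(a0, a2, s[2])                 # upper subnetwork (one switch)
    m2, m3 = sw(a1, a3, s[3])                 # lower subnetwork (one switch)
    o0, o1 = sw(m0, m2, s[4])                 # output stage
    o2, o3 = sw(m1, m3, s[5])
    return (o0, o1, o2, o3)

reachable = {benes4((0, 1, 2, 3), bits) for bits in product((0, 1), repeat=6)}
print(len(reachable))  # 24: every permutation of 4 inputs is realizable
```

So with only 6 tiny switches the box is rearrangeably nonblocking, and the construction scales as roughly 2*log2(N) stages of N/2 switches each.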
The Benes network idea appeals to me in the sense that it
seems to be a more real-life-scalable version of a situation where
an ant can walk from one end of a circle of pearls, which
has infinitely many pearls that are all connected to their neighbors by
finite-length pieces of string, to the other end of the circle only if
there is a finite-length piece of string crossing
the circle, roughly through the center point of the circle.
The ants are like data, the pearls are like computers or CPU cores
with caches, and the finite-length strings between the pearls
are data connection links. Another nice thing about the Benes network
is that it seems to have been tested in practice for decades.
As regards the "scale-free" networks, here are
a few citations from the 2011 work of a person whom I
happen to know personally (she was my classmate
at primary school, in Kuressaare, Saaremaa):
http://publications.lib.chalmers.se/records/fulltext/143636.pdf
----citation---start----
For decades graph theory was focused on either
regular or completely random graphs. However,
neither model is suitable for describing
most real-life networks like social networks,
internet and biological networks (protein-protein interaction,
metabolic, transcription-regulatory and signaling networks).
Only recently a new graph model was introduced which
describes all of these networks - the model of
scale-free networks.
----citation---end------
----citation---start----
Scale-free networks are amazingly robust against
accidental failures - even if 80% of randomly selected
nodes fail, the remaining 20% still form a compact cluster
with a path connecting any two nodes [1]. On the other hand,
a lot of damage might be caused
if a few key hubs are knocked out.
----citation---end------
The work is about biology, but it seems to be
relevant in the context of data locality
in an arbitrarily large cluster computer, which
should also have some graceful-degradation properties,
where some computers/nodes will fail at
arbitrary moments. From a cost point of view,
if the software is written so that it can
tolerate the failure of random nodes of the
cluster computer, then the not-so-good reliability
of some Chinese products won't be a
quality issue any more. If the unreliable computers
are power-conserving, then it's OK to
have a lot of "backups" connected to the cluster.
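To play with that random-failure robustness claim, here is a small toy experiment of my own (not from the cited thesis): grow a scale-free graph by preferential attachment (the Barabasi-Albert model) and knock out 80% of the nodes at random. The parameters n = 500 and m = 2 are arbitrary assumptions:

```python
import random
from collections import deque

def barabasi_albert(n, m, rng):
    # Grow a scale-free graph: each new node attaches to m existing nodes,
    # chosen proportionally to their current degree (preferential attachment).
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}  # seed clique
    stubs = [v for e in edges for v in e]    # each node repeated once per incident edge
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:              # picking from stubs = degree-weighted choice
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs += [t, new]
    return edges

def largest_component(nodes, edges):
    # Size of the biggest connected cluster among the given nodes, via BFS.
    adj = {v: [] for v in nodes}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].append(b)
            adj[b].append(a)
    best, seen = 0, set()
    for start in nodes:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

rng = random.Random(42)
n, m = 500, 2
edges = barabasi_albert(n, m, rng)
assert largest_component(set(range(n)), edges) == n   # the grown graph is connected
survivors = set(rng.sample(range(n), n // 5))          # 80% random failures
print(largest_component(survivors, edges))             # size of the surviving cluster
```

Deleting the highest-degree nodes instead of random ones is where such a graph falls apart, which is exactly the hub-vulnerability the second citation warns about.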
There's even the option to run the same
program with the same input on multiple nodes in parallel,
as was the case with the Soviet Buran computer.
http://www.buran-energia.com/bourane-buran/bourane-consti-ordinateur-computer.php
As a matter of fact, the RethinkDB database
engine uses exactly that idea:
(a ~14 min long screencast that demonstrates
database replication, node failure, and the
recovery from that situation)
https://www.youtube.com/watch?v=cnpSi9qI02E
In the case of servers the extra-slow bottleneck is not
just the network connection, but also access to the HDD,
which makes automated, transparent mirroring very practical.
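The "same program, same input, on several nodes" idea can be sketched as simple majority voting over replicas (triple modular redundancy); the faulty_node function below is a made-up stand-in for a broken replica, not anything from Buran or RethinkDB:

```python
from collections import Counter

def majority_vote(results):
    # Pick the answer that the most replicas agree on.
    [(answer, votes)] = Counter(results).most_common(1)
    return answer

def healthy_node(x):
    return x * x            # the actual computation

def faulty_node(x):
    return -1               # a made-up stand-in for a failed replica

# Run the same input through three "nodes"; one of them has failed.
replicas = [healthy_node, healthy_node, faulty_node]
print(majority_vote([node(7) for node in replicas]))  # prints 49
```

With three replicas any single failure is masked; a database like RethinkDB additionally has to handle nodes that come back and need to catch up, which is the harder part shown in the screencast.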
I suspect that the Jasmine Compiler Project people probably
did not think that their cache optimization ideas
might be mined for ideas in 2017 :-D
Thank you for reading my post.
I hope that I'm not spamming "too much".