
Neural Network Accelerators


robf...@gmail.com

Nov 12, 2021, 1:38:06 AM
For the 2021 version of the Thor core, neural network accelerator
instructions are included. Most of the instructions load values into arrays,
such as the input or weights arrays, or set variables such as the bias value.
The neural network performs computations when triggered by software.
The network runs asynchronously and has a status register used to
indicate completion of a computation cycle. Eight neurons all compute
at the same time resulting in a sigmoid output for each neuron.
Thor’s neural network consists of eight neurons computing using 16.16
fixed point arithmetic. Each neuron may have up to 1024 inputs.
Activation values are computed serially with a fixed point multiply and
accumulate operation.
While there are only eight neurons in a single layer, multiple layers may
be built up by reading the output levels and using them as inputs for the
next round of calculations. The input / weights array may be partitioned
so that only part of it is used in any one calculation cycle.
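A minimal C sketch of that compute loop (the widened accumulator, the
function names, and the math-library sigmoid are illustrative assumptions,
not Thor's actual implementation):

#include <stdint.h>
#include <math.h>

typedef int32_t q16;   /* 16.16 fixed point */

/* Serial multiply-accumulate over up to 1024 inputs, then a sigmoid. */
q16 neuron_eval(const q16 *in, const q16 *w, int n, q16 bias) {
    int64_t acc = (int64_t)bias << 16;    /* Q32.32 accumulator */
    for (int i = 0; i < n; i++)
        acc += (int64_t)in[i] * w[i];     /* Q16.16 * Q16.16 = Q32.32 */
    double x = ldexp((double)acc, -32);   /* back to a real value */
    double s = 1.0 / (1.0 + exp(-x));     /* sigmoid output level */
    return (q16)lrint(ldexp(s, 16));      /* repack as Q16.16 */
}

Software would read eight such outputs once the status register indicates
the computation cycle is complete.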
I was wondering if there were any processors supporting neural networks
that I could study?
I had thought the network might be better as a memory-mapped I/O device.

Stephen Fuld

Nov 12, 2021, 2:25:42 AM
There are some historical chips, and several current, either in use or
under development. I am not sure how much information you can get about
their internal architecture.

You might start with

https://en.wikipedia.org/wiki/AI_accelerator

and check out Google's Tensor Processing Unit, and for a different
approach, IBM's TrueNorth.



--
- Stephen Fuld
(e-mail address disguised to prevent spam)

JohnG

Nov 12, 2021, 6:37:08 AM
While this documents an abstract machine rather than an exact implementation, it still gives a lot of insight IMHO. https://docs.nvidia.com/deeplearning/performance/dl-performance-convolutional/index.html

Another couple of papers that look interesting on a first scan...
https://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/
https://arxiv.org/pdf/1410.0759.pdf

Another direction might be programming for Apple's M1 Neural Engine -
https://developer.apple.com/documentation/accelerate/bnns
https://developer.apple.com/documentation/metalperformanceshaders

-JohnG

Terje Mathisen

Nov 12, 2021, 7:20:32 AM
Please include Tesla's custom chips as well, both the runtime engine
which is installed in all their cars (10K 8-bit MACs, presumably with a
32-bit accumulator) and the training chip they use in Dojo, which I
believe is using a more or less standard 16-bit fp format.

Tesla had a much more severe power constraint than Google, so I believe
their in-car chips are leading in performance/watt.

Terje


--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

JimBrakefield

Nov 12, 2021, 9:57:11 AM
Dojo also supports 8-bit fp using a "block floating-point" exponent offset
https://cdn.motor1.com/pdf-files/535242876-tesla-dojo-technology.pdf
The Posit and NN literature have studies of shortened fp for NN weights.

Not sure if training uses larger fp and then inference uses shortened fp?
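A minimal C sketch of the shared-exponent idea (block size, 7-bit
mantissas, and names are assumptions for illustration, not Dojo's actual
format):

#include <math.h>
#include <stdint.h>

#define BLOCK 16

/* Quantize a block of floats to int8 mantissas plus one shared exponent;
   each element decodes as mant[i] * 2^(shared_exp - 7). */
void bfp_quantize(const float *in, int8_t *mant, int *shared_exp) {
    float maxabs = 0.0f;
    for (int i = 0; i < BLOCK; i++)
        if (fabsf(in[i]) > maxabs) maxabs = fabsf(in[i]);
    int e = 0;
    frexpf(maxabs, &e);                  /* maxabs = m * 2^e, 0.5 <= m < 1 */
    *shared_exp = e;
    for (int i = 0; i < BLOCK; i++) {
        long v = lrintf(ldexpf(in[i], 7 - e));
        if (v > 127) v = 127;            /* clamp the rounding edge case */
        if (v < -127) v = -127;
        mant[i] = (int8_t)v;
    }
}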

MitchAlsup

Nov 12, 2021, 5:52:27 PM
I think it is rather safe to say (at this point in time) the NN-accelerators
are in the 704 days of these kinds of architectures.

A bit more than barely functional, and a long way to go.........

EricP

Nov 13, 2021, 1:28:55 PM
The NNs talked about mostly in the press, and which most vendors
are trying to sell you, are Convolutional Neural Networks (CNNs),
which are basically multiple layers of sums.
You can build a fancy pattern matcher out of it but
it will never make a good decision mechanism.
It will always be a mystery why it "decided" a certain way
because it is just calculating a great whacking polynomial.
As NNs go, I have a gut feeling that is a dead end.

Real NN have feedback, called recurrent, and real neurons are spiky
which introduces signal timing and phase delays as attributes.

In particular signal phase timing adds a whole new dimension for
information storage. Feedback allows resonances to enhance or
suppress different combinations of inputs.

That is the basis for my suspicion that we will eventually find
that brains, all brains, are akin to _holograms_.

Clusters of neurons can build holographic modules and form
holographic modules of modules.



BGB

Nov 13, 2021, 1:37:53 PM
Yeah, it is a bit hit or miss even what they should be doing exactly...


Some of the popular options seem to be, in effect, doing large glorified
matrix multiplies using a truncated floating-point format (BF16, essentially
Binary32 with the low 16 bits cut off).
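That truncation really is just a shift, which is where the "cheap
conversion" appeal comes from. A minimal C sketch (round-to-nearest-even
added, as real implementations usually do; NaN handling omitted):

#include <stdint.h>
#include <string.h>

uint16_t f32_to_bf16(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    uint32_t round = 0x7FFF + ((u >> 16) & 1);  /* round to nearest even */
    return (uint16_t)((u + round) >> 16);
}

float bf16_to_f32(uint16_t h) {
    uint32_t u = (uint32_t)h << 16;   /* widening just restores zero bits */
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}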


Not sure if there is a "good" reason for BF16, or if it is more a
workaround for popular / mainstream CPU architectures lacking support
for Binary16 SIMD.

It comes off like, say, if I proposed a new 32-bit floating point format
that was effectively just Double with the low 32 bits cut off (then try
to pass it off as "better" when it is really just a way to allow cheaper
format conversions).


Both Binary16 and BF16 could make sense for dedicated low-precision SIMD
ops, and also are small enough to be implemented reasonably affordably
on an FPGA.

For general use, Binary16 (S.E5.F10) probably makes more sense, though
one could debate whether BF16 (S.E8.F7) has enough value in the
general-case to make it worthwhile (short of trying to do a TensorFlow
port or similar, I have doubts).


More debatable, but one could argue for 8x or 16x FP8 (S.E4.F3) vectors
for Neural-Net uses. Though, I suspect the gains would be small as the
relative cost of wrangling inputs would somewhat outweigh the possible
savings from such operators over operating on 16-bit elements (with
possible Packed FP8 <-> FP16 conversion operators).
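A sketch of what such a packed FP8 -> FP16 widening could look like for
one element, assuming an IEEE-like S.E4.F3 layout with exponent bias 7
(subnormals flushed to zero, Inf/NaN encodings ignored):

#include <stdint.h>

uint16_t fp8_to_fp16(uint8_t v) {
    uint16_t s = (v & 0x80) >> 7;          /* sign */
    uint16_t e = (v >> 3) & 0x0F;          /* 4-bit exponent, bias 7 */
    uint16_t f = v & 0x07;                 /* 3-bit fraction */
    if (e == 0)
        return (uint16_t)(s << 15);        /* zero / subnormal: flush */
    /* rebias 7 -> 15, widen fraction 3 -> 10 bits */
    return (uint16_t)((s << 15) | ((e + 8) << 10) | (f << 7));
}

Under these assumptions, 0x38 decodes to 1.0 and 0xB8 to -1.0.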

Intermediate options are mostly those involving FP10 (S.E5.F4) or FP12
(S.E5.F6):
3x FP10 in 32 bits (Maps to 4x FP16, X/Y/Z/{0/-/1/-1});
6x FP10 in 64 bits (Maps to 8x FP16, X/Y/Z/0,P/Q/R/0);
4x FP12 in 48 bits;
...

If one expected to do a lot of FP16, this could justify the cost of having
dedicated SIMD units for these (rather than running them internally through
a slower but higher-precision FPU).

EricP

Nov 13, 2021, 1:55:19 PM
EricP wrote:
>
> Real NN have feedback, called recurrent, and real neurons are spiky
> which introduces signal timing and phase delays as attributes.
>
> In particular signal phase timing adds a whole new dimension for
> information storage. Feedback allows resonances to enhance or
> suppress different combinations of inputs.
>
> That is the basis for my suspicion that we will eventually find
> that brains, all brains, are akin to _holograms_.
>
> Clusters of neurons can build holographic modules and form
> holographic modules of modules.

One mystery about this is why every brain doesn't immediately
collapse into a giant epileptic fit. Nature must have found a
way to detect and prevent it as the organism grows.
Natural selection would be a poor mechanism because there are
so many ways to fail and many fewer ways to succeed
that almost no brains would survive.

So there must be some mechanism that "drives" these self organizing
networks toward interconnections that do not have uncontrolled feedback.

I've said this before but I suspect _that_ is what nature discovered
at the Cambrian explosion 540 million years ago.

MitchAlsup

Nov 13, 2021, 2:30:22 PM
On Saturday, November 13, 2021 at 12:37:53 PM UTC-6, BGB wrote:
> On 11/12/2021 4:52 PM, MitchAlsup wrote:
> > I think it is rather safe to say (at this point in time) the NN-accelerators
> > are in the 704 days of these kinds of architectures.
> >
> > A bit more than barely function, and a long way to go.........
> >
> Yeah, it is a bit hit or miss even what they should be doing exactly...
>
>
> Some of the popular options seem to be in-effect doing large glorified
> matrix multiplies using a truncated floating format (BF16, essentially
> Binary32 with the low 16 bits cut off).
>
>
> Not sure if there is a "good" reason for BF16, or if it is more a
> workaround for popular / mainstream CPU architectures lacking support
> for Binary16 SIMD.
>
> It comes off like, say, if I proposed a new 32-bit floating point format
> that was effectively just Double with the low 32 bits cut off (then try
> to pass it off as "better" when it is really just a way to allow cheaper
> format conversions).
>
>
> Both Binary16 and BF16 could make sense for dedicated low-precision SIMD
> ops, and also are small enough to be implemented reasonably affordably
> on an FPGA.
>
> For general use, Binary16 (S.E5.F10) probably makes more sense, though
> one could debate whether BF16 (S.E8.F7) has enough value in the
> general-case to make it worthwhile (short of trying to do a TensorFlow
> port or similar, I have doubts).
<
A lot of the self-driving NNs can use as few as 1-bit weighting matrices
and 8-bit accumulators and achieve useful pattern recognition rates.
Indeed, much of the work here is centered on decreasing the storage
BW needed to feed the convolution engine rather than on making the
convolution engine faster or larger.
<
I am pretty sure this kind of experiment will continue for another decade.
>
>
> More debatable, but one could argue for an 8x or 16x (S.E4.F3) vectors
> for Neural-Net uses. Though, I suspect the gains would be small as the
> relative cost of wrangling inputs would somewhat outweigh the possible
> savings from such operators over operating on 16-bit elements (with
> possible Packed FP8 <-> FP16 conversion operators).
<
One big problem is how one "expresses" such a matrix multiplication,
where the size of the containers in the weighting matrix changes on a
per-weight basis.

EricP

Nov 13, 2021, 2:40:50 PM
EricP wrote:
> MitchAlsup wrote:
>> I think it is rather safe to say (at this point in time) the
>> NN-accelerators
>> are in the 704 days of these kinds of architectures.
>>
>> A bit more than barely function, and a long way to go.........
>
> The NN talked about mostly in the press and which most vendors
> are trying to sell you are Convolution Neural Networks (CNN)
> which are basically multiple layers of sums.
> You can build a fancy pattern matcher out of it but
> it will never make a good decision mechanism.
> It will always be a mystery why it "decided" a certain way
> because it is just calculating a great whacking polynomial.
> As NN go, I have a gut feeling that is a dead end.

This all reminds me of the fuzzy logic fad of the mid 1980's.
It was invented in the 1920's and the term was coined in 1965.
For some reason it became "a thing" in the tech magazines around 1985
for a while (though a quick search still finds lots of references).
Everything was going fuzzy.
Fuzzy logic sometimes was, and still is, labeled artificial intelligence too.


Ivan Godard

Nov 13, 2021, 3:25:22 PM
I thought the Cambrian invention was teeth?

BGB

Nov 13, 2021, 6:13:53 PM
Yeah.

I suspect in my case, some of the Block-Texture and Block-Audio ops
could also conceivably be applicable to NN weights, at least in very
specialized use-cases.


It is also possible (for cases involving vectors with constant weights),
that I could define encodings for, say:
PLDCM8SH Imm32, Rn //Load 4x signed FP8 into 4x Binary16 in Rn.
PLDCM8UH Imm32, Rn //Load 4x unsigned FP8 into 4x Binary16 in Rn.
PLDCH Imm32, Rn //Load 2x Binary16 into 2x Binary32

These could also make sense for vector-literals in C, but at the moment
this is a lower priority, since vector-literals in C are a fairly niche
feature.

Went and added a few possible encodings to the listing (in Op64 space),
which mostly add alternative forms to the existing "FLDCF Imm32, Rn"
encoding.


>>
>>
>> More debatable, but one could argue for an 8x or 16x (S.E4.F3) vectors
>> for Neural-Net uses. Though, I suspect the gains would be small as the
>> relative cost of wrangling inputs would somewhat outweigh the possible
>> savings from such operators over operating on 16-bit elements (with
>> possible Packed FP8 <-> FP16 conversion operators).
> <
> One big problem is how does one "express" such a matrix multiplication,
> where the size of the containers in the weighting matrix change on a per
> weight basis.

In many contexts, there is a trick: one can store a matrix in
pre-transposed form and turn the multiply into a bunch of vector
multiply-accumulate operations.

Though, as can be noted, doing a net with naive matrix multiplies does
mean one is going to be doing a whole lot of meaningless multiplies with
zero.

I guess one possible trick (if doing a large matrix multiply with a
loop), would be using an RLE scheme to skip over vectors for runs where
all of the components are zeroes (and "pre-cooking" the model to
eliminate large sections of nearly-zero values).
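A C sketch of that zero-run-skipping idea, assuming the "pre-cooked"
model stores a hypothetical per-row list of (start, length) runs of
nonzero weights:

typedef struct { int start, len; } Run;

/* Dot product that only touches the nonzero stretches of w. */
float dot_skip_zeros(const float *x, const float *w,
                     const Run *runs, int nruns) {
    float acc = 0.0f;
    for (int r = 0; r < nruns; r++) {
        int base = runs[r].start;
        for (int i = 0; i < runs[r].len; i++)
            acc += x[base + i] * w[base + i];
    }
    return acc;
}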


Though, as can be noted:
I don't really use these sort of "huge matrix multiply" style nets in
any of my own projects.

My own uses have tended to be mostly things like using genetic
algorithms to build more specialized classifiers.

These would tend to be more things like "do some math on some vectors"
followed by comparing the results against some threshold biases, and
running the result into a decision tree. The "fancier" version would
likely be having some way to transform the vector-compare result into a
bit-mask which could be fed into a "switch()" block or similar (as-is,
this can theoretically be done for 4-element vector-compare results).

There is not currently a dedicated S-Curve operation, though when
needed, variations of the Heaviside function can be built using packed
compare and packed select (*1).


In these cases, the weight and bias vectors would typically be embedded
directly into the program logic (though, granted, this sort of thing
falls slightly outside the scope of "standard C").


*1:
const __vec4sf cv_n1={-1,-1,-1,-1}, cv_p1={1,1,1,1};
__vec4sf v, bi, wv;

... calculate v ...
wv = __vec4sf_pcsel_gt(v, bi, cv_p1, cv_n1);

Or, say, pseudo-asm:
PLDCM8SH 0x38383838, cv_p1 //possible encoding
PLDCM8SH 0xB8B8B8B8, cv_n1 //possible encoding
...
PCMPGT.H bi, v
PCSELT.W cv_p1, cv_n1, wv


Which, for each vector element, does effectively:
wv[i] = (v[i] > bi[i]) ? cv_p1[i] : cv_n1[i];
Or, alternately:
wv[i] = (v[i] > bi[i]) ? 1.0 : -1.0;

...

Scott Smader

Nov 13, 2021, 6:56:04 PM
The internet says an early Cambrian innovation was hard parts, like shells and plates. The oldest toothed fossil dates to 410 Mya, but the Cambrian Era was 541 - 485.4 Mya. (Btw, more recent fossils suggest that the appearance of new body plans was more gradual than once believed, so not so "explosive.")

Arguing in favor of EricP's point, the cerebellum (which has been shown to at least be able to influence seizures) was a Cambrian "invention," according to Paul Cisek in "Resynthesizing behavior through phylogenetic refinement." (Great read!) However, Cisek also says image-forming eyes were already being used in visually guided approach and reinforcement learning by telencephalons during the Pre-Cambrian Era. To me that suggests that the solution for avoiding epileptic seizures was baked in even earlier.




Ivan Godard

Nov 13, 2021, 7:01:16 PM
Why would you evolve shells if nobody has teeth?

Scott Smader

Nov 13, 2021, 7:07:50 PM
There are some mean suckers out there!

MitchAlsup

Nov 13, 2021, 8:26:35 PM
The precursor to teeth is chitin (stuff of fingernails and claws). This is sufficient to "bite" through non-hardened skin (and is also how the first hardened skin evolved).
<
So after the evolution of chitin, somebody had to evolve a series of stuff even harder (for protection) and the arms race continues........even until today........

JimBrakefield

Nov 13, 2021, 9:04:48 PM
The Long Short Term Memory (LSTM) type of ANN has such feedback
connections and works well. Also see Recurrent Neural Network (RNN).

Stephen Fuld

Nov 14, 2021, 1:50:21 AM
On 11/13/2021 10:28 AM, EricP wrote:
> MitchAlsup wrote:
>> I think it is rather safe to say (at this point in time) the
>> NN-accelerators
>> are in the 704 days of these kinds of architectures.
>>
>> A bit more than barely function, and a long way to go.........
>
> The NN talked about mostly in the press and which most vendors
> are trying to sell you are Convolution Neural Networks (CNN)
> which are basically multiple layers of sums.
> You can build a fancy pattern matcher out of it but
> it will never make a good decision mechanism.

I disagree with the last phrase. NNs have been used for decision
making, and I expect their use will continue, as they are better at
certain kinds of decisions than other, more "conventional" algorithms.
Of course, one may quibble about how "good" is good, but that is a
different question.

BTW, the brain seems to be a sophisticated pattern matcher, and it works
pretty well at many tasks.


> It will always be a mystery why it "decided" a certain way
> because it is just calculating a great whacking polynomial.

There has been, and continues to be work on this, and some progress is
being made. But, while it certainly would be nice, I am not sure it is
necessary for them to be useful.

> As NN go, I have a gut feeling that is a dead end.

Could be. Time will tell.

>
> Real NN have feedback, called recurrent, and real neurons are spiky
> which introduces signal timing and phase delays as attributes.

Sure. You have to distinguish between two different "uses" for neural
networks. One is "science", trying to figure out how the brain works.
These are usually research projects, and their emphasis is on biological
faithfulness. The other is "engineering", trying to find a better way
of solving some useful problem. For this purpose, biological
faithfulness isn't critical. That is where most of the money is, and all
of the hype.

Ivan Godard

Nov 14, 2021, 2:02:21 AM
And shells had precursors too. Call any offensive hard surface a
"tooth", and any defensive hard surface a "shell". I assert that teeth
precede shells.

EricP

Nov 14, 2021, 9:18:51 AM
Eyes developed then too, but so did legs, antennae, all complex life forms.
And probably teeth too, which need muscles to work them.
And all that requires a complex controller, particularly eyes.

In the pre-Cambrian, the most complex life was things like jellyfish, which
are multicellular organisms and have nerves that allow them to swim,
and a few neurons specialized to detect light or dark,
but no central controller, no complex signal processing or
decision-making capability.

After the Cambrian line are arthropods and all the animals of today.
Eyes developed at this time, which requires complex signal processing.

Clearly something changed at that boundary that allowed the assembly
of complex NN to control all of these new functions.


Scott Smader

Nov 14, 2021, 11:00:13 AM
The opinions being expressed here would do well to refer to contemporary research. A "tooth" means something specific in the fossil record. The Cambrian Era was not a "line" or "boundary." (https://www.nature.com/articles/s41559-019-0821-6) The neural tube had already yielded to archencephalon which had yielded to telencephalon as the most complex neural circuitry long before the Cambrian Era started. Tens of millions of years before the Cambrian, dopamine was already generating sophisticated foraging behavior that is today displayed by "microorganisms, insects, mollusks, reptiles, fish, birds, and even human hunter–gatherers" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848052/).

Please consider reading this paper by an expert in the field: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848052/ or at least have a good look at Figure 2 from that paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848052/figure/Fig2/?report=objectonly

Then maybe we should start a separate thread?

EricP

Nov 14, 2021, 11:46:40 AM
>> Eyes developed then too, but so did legs, antenna, all complex life forms.
>> And probably teeth too, which need muscles to work them.
>> And all that requires a complex controller, particularly eyes.
>>
>> In pre-Cambrian the most complex life was things like jellyfish which
>> are multicellular organisms and have nerves that allows them to swim,
>> and a few neurons are specialized to detect light or dark,
>> but no central controller, no complex signal processing or
>> decision making capability.
>>
>> After the Cambrian line are arthropods and all the animals of today.
>> Eyes developed at this time which needs complex signal processing.
>>
>> Clearly something changed at that boundary that allowed the assembly
>> of complex NN to control all of these new functions.
>
> The opinions being expressed here would do well to refer to contemporary research. A "tooth" means something specific in the fossil record. The Cambrian Era was not a "line" or "boundary." (https://www.nature.com/articles/s41559-019-0821-6) The neural tube had already yielded to archencephalon which had yielded to telencephalon as the most complex neural circuitry long before the Cambrian Era started. Tens of millions of years before the Cambrian, dopamine was already generating sophisticated foraging behavior that is today displayed by "microorganisms, insects, mollusks, reptiles, fish, birds, and even human hunter–gatherers" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848052/).
>
> Please consider reading this paper by an expert in the field: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848052/ or at least have a good look at Figure 2 from that paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6848052/figure/Fig2/?report=objectonly

Thanks, I just found the Cisek 2019 paper online this morning.
I'll have a look at the other too.

Problem is that when I started searching on this I came across
a whole bunch of papers that are also on-topic, so it is easy to
get sidetracked. E.g., I started looking at

[open access]
On the Independent Origins of Complex Brains and Neurons, 2019
https://www.karger.com/Article/FullText/258665

The problem is finding research that deals specifically with the
development of neural interconnect and how it was able to scale out,
as opposed to, say, the evolution of different neurotransmitters.

> Then maybe we should start a separate thread?

Seems on-topic to me (certainly more so than some of the other
threads' discussions on the motivations of deities).

Understanding the origin of the wiring of biological NN (BNN)
is appropriate to discussion of NN Accelerators as we are
endeavoring to improve such simulators.

Hardware architectures like Tensorflow are likely not appropriate
for recurrent spiking NN (RSNN) and some researchers have been
exploring new hardware accelerators in this area.




MitchAlsup

Nov 14, 2021, 3:12:24 PM
It is pretty clear that NNs are "pattern matchers" where one does not
necessarily know the pattern a-priori.
<
The still open question is what kind of circuitry/algorithm is appropriate
to match the patterns one has never even dreamed up ??

Terje Mathisen

Nov 14, 2021, 3:43:05 PM
That is the pattern of "I have never seen this pattern before! I wonder
why?" which is the basis for most new research & inventions, right?

Stephen Fuld

Nov 14, 2021, 4:13:18 PM
On 11/14/2021 12:12 PM, MitchAlsup wrote:

snip

> It is pretty clear that NNs are "pattern matchers" where one does not
> necessarily know the pattern a-priori.
> <
> The still open question is what kind of circuitry/algorithm is appropriate
> to match the patterns one has never even dreamed up ??

There are algorithms that extract some pattern from whatever data is
presented to them. They generally fall into the area of "unsupervised
learning". Whether the pattern extracted is of value is, of course, a
totally different question. :-)

But these are probably not appropriate for mass implementation in
circuitry, as they don't have as widespread use as matching a previously
known pattern.

EricP

Nov 14, 2021, 4:56:27 PM
Ok I've read them quickly.
Yes there are some burrowing animals with a central nervous system
in the Ediacaran but both papers acknowledge the development of
limbs and vision across the Ediacaran–Cambrian boundary.

Burrowing doesn't strike me as complicated behavior,
perhaps just a variation on the jellyfish nerve system
and doesn't require any signal processing capability.

And jellyfish today have single neuron light sensors.
Again it doesn't require any signal processing capability to move
towards light, move away from dark, eat whatever you bump into.

I saw nothing in those to contradict the idea that something important
happened to allow the construction far more complex NN around the
Ediacaran–Cambrian boundary as an enabling technology to all the
other changes.

> Problem is that when I started searching on this I came across
> a whole bunch of papers that are also on-topic so it is easy to
> get side tracked. E.G. I started looking at
>
> [open access]
> On the Independent Origins of Complex Brains and Neurons, 2019
> https://www.karger.com/Article/FullText/258665
>
> The problem is finding research that deals specifically with the
> development neural interconnect and how it was able to scale out,
> as opposed to say the evolution of different neural transmitters.

I came across this paper later.
It is more on target for what I'm thinking of as it addresses
the change from simple nerve nets to centralized brains.

[open access]
Of Circuits and Brains: The Origin and Diversification
of Neural Architectures, 2020
https://www.frontiersin.org/articles/10.3389/fevo.2020.00082/full

In the above paper's section "How Do These Neural Circuits Evolve?"
it references the one below, which I haven't read yet but which also
looks relevant (see what I mean about getting sidetracked), as it
has a section titled "2.1. Evolution of connectivity":

[open access]
Developmental and genetic mechanisms of neural circuit evolution, 2017
https://www.sciencedirect.com/science/article/pii/S0012160617301495

But I don't see anything yet in either one that addresses how
they construct NN correctly (non-epileptic).


Thomas Koenig

Nov 14, 2021, 5:29:05 PM
Terje Mathisen <terje.m...@tmsw.no> schrieb:

> That is the pattern of "I have never seen this pattern before! I wonder
> why?" which is the basis for most new research & inventions, right?

My personal experience is that a lot of invention is due to
transfer: recognizing a pattern in a different set of circumstances
from the one where you originally saw it, and applying it.

I've read recently that there is a high correlation between the
inventiveness of people and their tendency to make bad jokes.
The latter requires taking things out of their usual context, just
as invention does.

MitchAlsup

Nov 14, 2021, 6:39:08 PM
Guilty as charged............

Scott Smader

Nov 14, 2021, 9:20:57 PM
On Sunday, November 14, 2021 at 1:56:27 PM UTC-8, EricP wrote:
Thank you for those references. I look forward to reading them.

I apologize for leading you to believe that the Cisek and Wood, et al, papers denied increasing neural complexity during the Cambrian Era or explained how epileptic synchrony was prevented. I sought simply to point out that sophisticated behavior and teeth did not "explode" at a moment in time.

At the risk of burdening you with more possibly unfruitful reading, may I recommend the work of Ramin Hasani at MIT with others from various institutions? His group have built robotic control systems based ultimately on the careful measurements they had made previously on the 302 neurons and 5000 synapses of C. elegans. I don't think they have published specifically about the epileptic problem, but their work seems like it might at least be another interesting tangent for you. They do show that the hidden state of every neuron in one of their networks is bounded, but I doubt that that directly relates to epileptic avoidance.

Liquid Time Constant Networks https://arxiv.org/abs/2006.04439
This paper describes their LTC networks in some detail and compares performance against LSTM, CT-RNN, Neural ODE and CT-GRU.

Hasani also gave a talk in March about Liquid Time Constant Networks: https://simons.berkeley.edu/talks/tbd-296
Images and sequences shown starting at about 31 minutes are quite impressive, especially considering the much smaller number of neurons required and the robustness against noise. He also mentions cascading LTCs into what he calls Neural Circuit Policies which he then shows are Dynamic Causal Models.

I have only skimmed most of his publications: http://www.raminhasani.com/publications/
which document his journey from worms to LTCNs, but perhaps it may help you sort out whether the epileptic avoidance solution is earlier or later than the C elegans brain!

Best wishes!

Stefan Monnier

Nov 14, 2021, 9:52:03 PM
> I've read recently that there is a high correlation between the
> inventiveness of people and their tendency to make bad jokes.

Sadly, it's only a correlation.
I think I'm pretty good(?) at making bad jokes, but it doesn't seem to
carry over to inventiveness.


Stefan

Ivan Godard

Nov 14, 2021, 9:56:50 PM
On 11/14/2021 1:55 PM, EricP wrote:
> EricP wrote:

<snip>

> But I don't see anything yet in either one that addresses how
> they construct NN correctly (non-epileptic).

A feedback system can go into oscillation (i.e. epilepsy) and get eaten.
Or it can simply damp down and get eaten. Over evolutionary time it will
hunt to the chaotic boundary; that's us.


Terje Mathisen

Nov 15, 2021, 8:29:01 AM
I thought that was pretty well established?

Punning is a well-known form of this; some people supposedly hate puns,
but I tend to love them, the more groan-inducing the better.

EricP

Nov 15, 2021, 11:18:50 AM
The artificial convolution NNs are basically fancy curve-fit algorithms
that adjust a polynomial with tens or hundreds of thousands of terms
to some number of inputs after millions of examples.

Biological NNs perform associative learning after just a few examples
with just a few neurons.

Both are suitable for sorting fish but only one
can fit inside and control a fruit fly.


EricP

Nov 15, 2021, 11:18:50 AM
Not at all, they were both interesting. And I have seen others
questioning when the Cambrian Explosion started and what caused it.

> At the risk of burdening you with more possibly unfruitful reading, may I recommend the work of Ramin Hasani at MIT with others from various institutions? His group have built robotic control systems based ultimately on the careful measurements they had made previously on the 302 neurons and 5000 synapses of C. elegans. I don't think they have published specifically about the epileptic problem, but their work seems like it might at least be another interesting tangent for you. They do show that the hidden state of every neuron in one of their networks is bounded, but I doubt that that directly relates to epileptic avoidance.

Yes, this is the track I'm thinking of.
He uses the word "liquid" to mean time-varying.

Designing Worm-inspired Neural Networks for
Interpretable Robotic Control, 2019
https://publik.tuwien.ac.at/files/publik_287624.pdf

"In this paper, we design novel liquid time-constant recurrent neural
networks for robotic control, inspired by the brain of the nematode,
C. elegans. In the worm’s nervous system, neurons communicate through
nonlinear time-varying synaptic links established amongst them by their
particular wiring structure. This property enables neurons to express
liquid time-constants dynamics and therefore allows the network to
originate complex behaviors with a small number of neurons."
...
"We evaluate their performance in controlling mobile and arm robots"
...
"The C. elegans nematode, with a rather simple nervous system composed
of 302 neurons and 8000 synapses, exhibits remarkable controllability
in it’s surroundings; it expresses behaviors such as processing complex
chemical input stimulations, sleeping, realizing adaptive behavior,
performing mechano-sensation, and controlling 96 muscles.
How does C. elegans perform so much with so little?"

> Liquid Time Constant Networks https://arxiv.org/abs/2006.04439
> This paper describes their LTC networks in some detail and compares performance against LSTM, CT-RNN, Neural ODE and CT-GRU.

This sounds like what I was trying to speculate earlier, that information
in the NN is encoded not only in the location of connections and
in their weights, but also in the phase delay of the signal arrivals.

An analogy would be an asynchronous logic circuit with feedback pathways
where propagation delay on the interconnect wire encodes part of the
signal processing logic.

I see they say the networks are stable and bounded but it's not clear
to me yet why they are. I've searched for terms like "metastability".

> Hasani also gave this talk in March about: Liquid Time Constant Networks https://simons.berkeley.edu/talks/tbd-296
> Images and sequences shown starting at about 31 minutes are quite impressive, especially considering the much smaller number of neurons required and the robustness against noise. He also mentions cascading LTCs into what he calls Neural Circuit Policies which he then shows are Dynamic Causal Models.
>
> I have only skimmed most of his publications: http://www.raminhasani.com/publications/
> which document his journey from worms to LTCNs, but perhaps it may help you sort out whether the epileptic avoidance solution is earlier or later than the C elegans brain!
>
> Best wishes!

Thanks for the pointers.


Stephen Fuld

Nov 15, 2021, 11:48:47 AM
On 11/15/2021 8:18 AM, EricP wrote:

snip

> This sounds like what I was trying to speculate earlier, that information
> in the NN is encoded not only in the location of connections and
> in their weights, but also in the phase delay of the signal arrivals.

The weights in Artificial Neural Networks are a stand-in for the timing
of real neurons. A weighted amount of firing doesn't really exist in real
neurons.

Real neurons are, of course, analogue. When the pre-synaptic neuron
fires, it sends a relatively fixed amount of a neurotransmitter into the
synapse. This causes the receptors in the post synaptic neuron to open
a channel that allows ions into the cell. The cell has ion pumps that
continually try to maintain the potential across the cell membrane, so
over time, the effect of the depolarization from the synapse is
dissipated. But if enough synaptic receptors open before the ion pumps
can cope with the effects, the whole neuron depolarizes (i.e. "fires").

But the amount of depolarization from any one synapse is relatively
fixed. It is the number and timing of the firings of the presynaptic
neurons, not their "weight" that determines when the post synaptic
neuron fires. And, of course there is no "clock". ANNs use weights and
defined update times to simulate this.

Scott Smader

Nov 15, 2021, 1:38:09 PM
On Monday, November 15, 2021 at 8:18:50 AM UTC-8, EricP wrote:

> Thanks for the pointers.
Well, quoting Bobcat Goldthwait, "Thank you for encouraging my behavior."

> This sounds like what I was trying to speculate earlier, that information
> in the NN is encoded not only in the location of connections and
> in their weights, but also in the phase delay of the signal arrivals.
>
> An analogy would be an asynchronous logic circuit with feedback pathways
> where propagation delay on the interconnect wire encodes part of the
> signal processing logic.

That is very much in harmony with the thinking in this paper, which documents the use of varying myelination of axons to produce precise timing in birdsong:
Local axonal conduction delays underlie precise timing of a neural sequence
https://www.biorxiv.org/content/10.1101/864231v1

This 2009 paper directly states your proposition: "[T]he visual detection threshold fluctuates over time along with the phase of ongoing EEG activity. The results support the notion that ongoing oscillations shape our perception, possibly by providing a temporal reference frame for neural codes that rely on precise spike timing."
The Phase of Ongoing EEG Oscillations Predicts Visual Perception
https://www.jneurosci.org/content/29/24/7869

And the criticality of phase-related information is also suggested by this:
Intracranial recordings reveal ubiquitous in-phase and in-antiphase functional connectivity between homologous brain regions in humans
https://www.biorxiv.org/content/10.1101/2020.06.19.162065v2

Possibly related, this paper claims that brain signaling is divided into frequency bands:
Causal evidence of network communication in whole-brain dynamics through a multiplexed neural code
https://doi.org/10.1101/2020.06.09.142695
I don't believe the paper addresses this, but multiple bands could be used simultaneously for individual phase-synchronization signals in separated functional networks.

In line with Ivan's insightful comment about evolution optimizing control systems to the edge of chaos, it also makes sense that given enough time, evolution would find an (approximate) implementation of almost every possible signal processing technique.

And maybe even back-propagation, too, as speculated in this very recent paper about some simulations they did:
Burst-dependent synaptic plasticity can coordinate learning in hierarchical circuits
https://www.nature.com/articles/s41593-021-00857-x
I'm too cheap to buy the article, but this Wired article describes it:
https://www.wired.com/story/neuron-bursts-can-mimic-a-famous-ai-learning-strategy/
Numenta forum has a discussion about it: https://discourse.numenta.org/t/burst-as-a-local-learning-rule-in-apical-but-not-basal-dendrites/9093

Fun stuff!

EricP

Nov 15, 2021, 1:39:01 PM
I don't think this is correct, or maybe we are thinking of
different types of neurons.

It is my understanding that after a neuron fires the action potential
is constant down the axon (no-fire or fire), but there are different
numbers of synaptic vesicles that release neurotransmitters and receptors
to receive them and this controls the strength of individual connections.
The number of vesicles and/or receptors are adjusted over time
to increase or decrease the individual connection weights.

Adjusting the number of vesicles or receptors, and thereby the weights of
connections, is thought to be one of the mechanisms for long-term memory,
but also, I was told, to be tied to addiction to some drugs like cocaine
and possibly to how LSD can cause flashbacks.

https://en.wikipedia.org/wiki/Chemical_synapse#Synaptic_strength

https://en.wikipedia.org/wiki/Synaptic_plasticity

https://en.wikipedia.org/wiki/Long-term_potentiation

Also in real neurons some connections are excitatory and others inhibitory
but that can be modeled by using signed weights.

One thing that the classic artificial "sum of multiplied weights" neuron
can't model is an XOR gate - it can only do AND and OR.




Scott Smader

Nov 15, 2021, 2:13:00 PM
It's not just the number of pre-synaptic neurons; there are typically multiple synapses between connected neurons, and varying numbers of vesicles on synapses.
Pretty video from Sebastian Seung's lab in 2013:
How to map neurons in 3D
https://www.youtube.com/watch?v=_iKrE2A2Vx4
The red and green neurons can be seen contacting at two separated areas.

> One thing that the classic artificial "sum of multiplied weights" neuron
> can't model is an XOR gate - it can only do AND and OR.

Um, that was true for the original Perceptron, but XOR is a long-solved problem for networks. If you've got AND and NOT, or OR and NOT, with multiple levels, you can do anything!
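For concreteness, a two-level threshold network computing XOR with
hand-picked weights (illustrative only):

static int step(double x) { return x > 0.0; }   /* threshold unit */

int xor_net(int a, int b) {
    int h_or  = step(a + b - 0.5);    /* hidden unit: a OR b  */
    int h_and = step(a + b - 1.5);    /* hidden unit: a AND b */
    return step(h_or - h_and - 0.5);  /* OR AND NOT AND == XOR */
}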

Stephen Fuld

Nov 16, 2021, 6:02:39 PM
You are right, of course. I obviously had a malfunction in my neural
network. :-( I apologize.

snip

> One thing that the classic artificial "sum of multiplied weights" neuron
> can't model is an XOR gate - it can only do AND and OR.

That was a failure of the original perceptron. Minsky and Papert showed
this (for any number of layers), and that was what led to the "dark
winter" of NN research. The realization that non-linear activation
functions could get around this problem is what led to their "renaissance".

MitchAlsup

Nov 16, 2021, 6:23:59 PM
This leads to a short story about microcode.........
<
Basically, microcode is a ROM built from a PLA--a PLA is simply 2 NOT planes
back to back. There are a lot of things a PLA cannot do easily, but the addition
of a row of XOR gates between the NOR-planes significantly increases the kinds
of things a PLA can compute (sequence...)
<
But, computer NNs are built around × and +, they could just as easily be built
around × and ± ; if the weights (and/or coefficients) were signed.
<
It is just arithmetic..........
<
Getting nice sigmoid functions is not that hard with look-up tables.........
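A minimal C sketch of the look-up-table approach (table size and input
range are arbitrary choices for illustration):

#include <math.h>

#define SIG_N     256
#define SIG_RANGE 8.0              /* table covers x in [-8, +8) */

static float sig_tab[SIG_N];

void sig_init(void) {              /* fill the table once */
    for (int i = 0; i < SIG_N; i++) {
        double x = -SIG_RANGE + (2.0 * SIG_RANGE * i) / SIG_N;
        sig_tab[i] = (float)(1.0 / (1.0 + exp(-x)));
    }
}

float sigmoid_lut(float x) {
    if (x <= -SIG_RANGE) return 0.0f;   /* saturate the tails */
    if (x >=  SIG_RANGE) return 1.0f;
    int i = (int)((x + SIG_RANGE) * (SIG_N / (2.0 * SIG_RANGE)));
    return sig_tab[i];
}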

EricP

Nov 17, 2021, 10:08:10 AM
MitchAlsup wrote:
> <
> This leads to a short story about microcode.........
> <
> Basically, microcode is a ROM built from a PLA--a PLA is simply 2 NOT planes

NOR planes of course.

> back to back. There are a lot of things a PLA cannot do easily, but the addition
> of a row of XOR gates between the NOR-planes significantly increases the kinds
> of things a PLA can compute (sequence...)

The planes are both dynamic logic because CMOS doesn't allow static
wired-OR and I saw some mention that there are some design fiddly bits
ensuring the second NOR plane doesn't discharge the first plane too soon.

Hmmm... XOR's between NOR planes... interesting, I never thought of that.

> <
> But, computer NNs are built around × and +, they could just as easily be built
> around × and ± ; if the weights (and/or coefficients) were signed.
> <
> It is just arithmetic..........

That is how it is usually represented in the equations:
as the sum of a series of synapse states multiplied by their weights,
and an unsigned compare against a trigger value.

The sum and trigger have enough bits to not overflow.
For 1024 8-bit integer synapse weights the parallel adder looks
like it requires 2048 adders varying in size from 8 to 8+10 bits
producing an 18-bit total. Does that sound correct?

Note that the synapse state is 0 or 1 so the multiply is unnecessary.
I would consider replacing the above neuron mechanism with a PHI or MUX
function to select between 0 or a signed weight for each synapse based on
the 0 or 1 state, the sum done with signed arithmetic without overflow,
and a signed compare to the trigger level.


Thomas Koenig

Nov 17, 2021, 12:37:45 PM
MitchAlsup <Mitch...@aol.com> schrieb:

> Basically, microcode is a ROM built from a PLA--a PLA is simply
> 2 NOT planes back to back. There are a lot of things a PLA cannot
> do easily, but the addition of a row of XOR gates between the
> NOR-planes significantly increases the kinds of things a PLA can
> compute (sequence...)

Sound interesting.

Do you have a reference for that, by any chance?

MitchAlsup

Nov 17, 2021, 2:44:32 PM
Probably not:: we did use this trick on the 68000, 68010 and 68020 microcode
stores.
<
Use cases: Say you have a term that is asserted by 98% of all microcode
"instructions", you can save power by only computing it on the 2% that
don't need it and then use XOR to flip the polarity.
<
Another use case: Say you have a calculation and an available function
unit and, as long as "blah" does not happen, you can use it, so you setup
microcode to assume you can do it, and then use the XORs to cancel
it on those "special" occasions when someone else consumes that function
unit.

JimBrakefield

Nov 17, 2021, 3:38:43 PM
On Wednesday, November 17, 2021 at 11:37:45 AM UTC-6, Thomas Koenig wrote:
Wikipedia has a writeup on "CPLD"; most of them have the XOR of a product term with other sum-of-product terms.
Digikey has them in stock along with data sheets
"Embedded - CPLDs (Complex Programmable Logic Devices)"

I've used the XC9536X series.
It has a schematic of the "Macrocell"
Totally obsoleted by FPGAs which are bigger, faster and lower power.

BGB

Nov 17, 2021, 3:47:09 PM
On 11/15/2021 10:15 AM, EricP wrote:
> MitchAlsup wrote:
>> On Sunday, November 14, 2021 at 10:46:40 AM UTC-6, EricP wrote:
>>>
>>> Understanding the origin of the wiring of biological NN (BNN) is
>>> appropriate to discussion of NN Accelerators as we are endeavoring to
>>> improve such simulators.
>> <
>> It is pretty clear that NNs are "pattern matchers" where one does not
>> necessarily know the pattern a-priori.
>> <
>> The still open question is what kind of circuitry/algorithm is
>> appropriate to match the patterns one has never even dreamed up ??
>
> The artificial convolution NN are basically fancy curve fit algorithms
> that adjust a polynomial with tens or hundreds of thousands of terms
> to some number of inputs after millions of examples.
>

A lot of what people are doing with NNs could be done with
auto-correlation and FIR filters.

Though, in the overly loose definitions often used, one could also try
to classify auto-correlation and FIR filters as a type of NN.



> Biological NN perform associative learning after just a few examples
> with just a few neurons.
>
> Both are suitable for sorting fish but only one
> can fit inside and control a fruit fly.
>

There is one reason I like genetic algorithms and genetic programming
for some tasks:
While the training process is itself fairly slow, and the results are
rarely much better than something one could come up with oneself
(usually in a fraction of the time and effort), one can at least use it to
generate results that are fairly cheap to run within the constraints of
the target machine (unlike CNNs or "Deep Learning" models).


So, one can in theory set up a GP evolver to be able to use a simple set
of vector-arithmetic operators and a simplified register machine
(generally simpler register models seem to work out better, and are
simpler to implement, than a GP evolver which works in terms of ASTs).

The weighting algorithm can also impose a penalty for the number of
(Non-NOP) operations used, favoring smaller and simpler solutions (and
causing non-effective operations to tend to mutate into NOPs; which can
be removed when generating the final output).


Say, for example, for each "program":
Has between 64 and 1024 instruction words to work with;
Usually this is a fixed parameter in the tests.
Has 32 or 64 bits per instruction word;
Has 16 or 32 registers;
May or may not have control-flow operations (depending on the task);
...

An example GP-ISA design might have:
64-bit instruction words;
16 or 32 registers, encoded in a padded form (ECC style, *);
Opcode bits may or may not also have a padded encoding;
Most invalid operations are treated as NOP;
There is a way to encode things like vector loads, ...;
Most operators are 3R form, eg: "OP Rs, Rt, Rd"
...

*: Multiple encodings may map to the same logical register, and using
ECC bits makes the register ID more resistant to random bit-flips
(caused by the mutator).

So, a register may be encoded as:
4 bits (abcd): Register Number, Gray-Coded
3 bits: Parity (a^b^c, b^c^d, c^d^a)
R0: 000-0000
R1: 011-0001
R2: 100-0011
R3: 111-0010
R4: 001-0110
...

Similarly, could do a 5 bit register in 8 bits, eg:
{ a^b^c, b^c^d, c^d^e, e, a, b, c, d }

The ECC (~ Hamming(7,4)) may try to "correct" the register on decode.
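A C sketch of what that correcting decode could look like, using the
parity equations above (the ppp-dddd bit layout is an assumption from
the notation; gray-to-binary conversion is omitted):

#include <stdint.h>

static unsigned bit(unsigned v, int i) { return (v >> i) & 1; }

/* Decode a 7-bit encoding (parities in bits 6..4, gray nibble abcd in
   bits 3..0), correcting any single-bit error in the data bits. */
uint8_t reg_decode(uint8_t enc) {
    unsigned a = bit(enc, 3), b = bit(enc, 2), c = bit(enc, 1), d = bit(enc, 0);
    unsigned s0 = bit(enc, 6) ^ (a ^ b ^ c);   /* syndrome: received parity */
    unsigned s1 = bit(enc, 5) ^ (b ^ c ^ d);   /* XOR recomputed parity     */
    unsigned s2 = bit(enc, 4) ^ (c ^ d ^ a);
    switch ((s0 << 2) | (s1 << 1) | s2) {      /* each error is unique:     */
    case 5: enc ^= 1 << 3; break;              /* a -> 101 */
    case 6: enc ^= 1 << 2; break;              /* b -> 110 */
    case 7: enc ^= 1 << 1; break;              /* c -> 111 */
    case 3: enc ^= 1 << 0; break;              /* d -> 011 */
    default: break;     /* 0 = clean; 100/010/001 = a parity bit was hit */
    }
    return enc & 0x0F;  /* gray-coded register number */
}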


Opcode may encode an 8 or 10-bit opcode number in 16 bits.

Encodings which fall into the "unrecoverable" or "disallowed" parts of
the encoding space are interpreted as NOPs.


Vector immediate values may be encoded as 48 bits, such as four 12-bit
floating-point values (S.E5.F6), which may also be stored in gray-coded
form. There might also be 2x 24-bit (truncated gray-coded single),
or 1x 48-bit (truncated gray-coded double).

It may also make sense to have integer operators available (depends on
the task).

...


The GP evolver basically consists of:
Test data, which is fed into the program in some form;
Say, the test data is presented as input registers.
An interpreter, which runs each GP program;
Output is one or more registers.
A heuristic to rank its performance;
...

For breeding the top-performing programs:
Pick instruction words randomly from each parent;
Randomly flip bits in each child produced.

The initial state would fill the programs with "random garbage" though,
using NOP encodings for the operators.


If one allows for control-flow, the interpreter will automatically
terminate after a certain number of instructions, and impose a fairly
severe penalty value.

Result (after a certain number of runs) would be dumped out in an ASM
style notation ("disassembled" from the internal format).


Not really developed any of this into a cohesive library or tool, partly
as it tends to be fairly ad-hoc, and I am not sure if anyone besides
myself would find something like this all that useful. These sorts of
small-scale tests were usually done via "copy-pasting something together".

...


Actually, thinking of it, it isn't exactly that huge of a stretch that
someone could also run such a GP evolver on an FPGA (as opposed to
running it on the CPU on a PC). It is possible that an FPGA could be
significantly faster at this task (if one had a good way to move results
and data between the FPGA and a PC).


...

EricP

Nov 17, 2021, 4:00:55 PM
Google Scholar finds

https://www.biorxiv.org/content/10.1101/2020.03.30.015511v2.full
Thanks.

With respect to phase shifting of signals I was thinking it might be
viewed as a dynamic mechanism.
Viewed statically neuron A triggers if B AND (C OR D) inputs are present.
But viewed dynamically signal C arrives early, D arrives later,
so they can be seen as causing a phase shift in A's output,
C advances A's output, D retards A's output.

Networks of such dynamic circuits connected together with feedback
reminded me of a hologram where the associative memory is distributed
across the whole of the circuit. That would be a big advantage as it
means that no single neuron is responsible for an individual memory.
Also the storage capacity is much higher than just the number
of synapses implies. Also no back propagation is required.

https://en.wikipedia.org/wiki/Holographic_associative_memory

I finally found the links I was looking for and it seems I am not
the first to note a similarity between neural networks and holograms
as it was proposed by Pribram in 1969. Apparently these are called
"Holographic Recurrent Networks" or "Holographic Reduced Representations"

Holographic Recurrent Networks, Plate, 1992
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.6991&rep=rep1&type=pdf

Holographic Reduced Representations, Plate, 1995
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.4546&rep=rep1&type=pdf

Plugging those titles into Google Scholar leads to
great many papers on holographic neural memory like

Encoding Structure in Holographic Reduced Representations, 2013
https://www.researchgate.net/profile/Douglas-Mewhort/publication/233836706_Encoding_Structure_in_Holographic_Reduced_Representations/links/57adb76608ae0932c976b72e/Encoding-Structure-in-Holographic-Reduced-Representations.pdf

Dynamically Structured Holographic Memory 2014
https://psyarxiv.com/pw93e/download/?format=pdf

Towards holographic brain memory based on randomization and
Walsh-Hadamard transformation 2016
https://www.researchgate.net/profile/Shlomi-Dolev/publication/289487007_Holographic_Brain_Memory_and_Computation/links/5d23fd2e92851cf4407280d0/Holographic-Brain-Memory-and-Computation.pdf

My gut tells me that somehow all of the above ties together
and the path forward to really useful artificial NN's is in
systems that combine all of the above ideas.
Which is why I thought convolution NN look like a dead end.




Yoga Man

Nov 17, 2021, 8:32:18 PM
On Wednesday, November 17, 2021 at 1:00:55 PM UTC-8, EricP wrote:
<snip>
> With respect to phase shifting of signals I was thinking it might be
> viewed as a dynamic mechanism.
> Viewed statically neuron A triggers if B AND (C OR D) inputs are present.
> But viewed dynamically signal C arrives early, D arrives later,
> so they can be seen as causing a phase shift in A's output,
> C advances A's output, D retards A's output.
>

Subutai Ahmad at Numenta has proposed that sub-threshold dendritic pre-charge can prime a neuron to fire enough sooner than competitive neurons in a voting circuit to allow it to inhibit their firing. It's certainly possible that there are asynchronous races between portions of neural circuits as they compete for their own axonal discharge behavior (fire/not fire/burst/partial depolarization) to get a reward from the system for that behavior (eg, nourishment that allows new synapses to be made; maybe some other hygienic activity by astrocytes). But if the system is phase-locked (with some controllable variability) to a reference, then isn't it easier to pick winners and losers? (Presumably, neurons that don't meet the next-cycle deadline don't get rewarded and gradually disconnect from that circuit.)

And it's possible that networks of asynchronous circuits might fire synchronously at chaotically stable frequencies without being influenced to do so, but that seems pretty unlikely to me, especially given the ubiquity of alpha, theta, etc. waves.
I agree that the original CNNs are inferior to however networks are configured in human brains, and I mean no disrespect to your gut, but there are other ways to survive noise and component failure in an associative memory. One such is sparse data representation, also described by Numenta. SDRs have other neat characteristics, like the ability to store multiple entries in one group of neurons while retaining the ability to access them individually, and the correspondence between the sparsity of neurons required for a representation in an SDR and the fact that only about 2% of neurons are active at any given moment. Numenta offer a suite of tools to explore SDRs.

Subutai Ahmad has also shown how a single neuron can learn hundreds of contexts or sequence of its inputs. The Numenta group has done some really innovative and useful work.

One thing I don't like about Numenta's approach is their assumption that intelligence is neocortical. That seems kinda chauvinist coming from the species with the most blown-out neocortex, and a whole lot of animals survived and are surviving without neocortices. Besides, humans had bigger brains before 3,000 years ago, so were people smarter then?
https://www.frontiersin.org/articles/10.3389/fevo.2021.742639/full

And now I have some learning to do about the biological basis for holographic memory. I had dismissed it years ago, as reports of hippocampal localization of memory, place and grid cells, visual cortex specificity - esp. layer 1, etc., grew more frequent. I had not realized it's actively being pursued.

Thank you.

Scott Smader

Nov 17, 2021, 8:35:09 PM
Oops. I was logged in with my other Google ID. I'm usually Yoga Man in other contexts. I'll try to keep it Scott Smader here.

EricP

Nov 18, 2021, 12:30:01 PM
EricP wrote:
>
> The sum and trigger have enough bit to not overflow.
> For 1024 8-bit integer synapse weights the parallel adder looks
> like it requires 2048 adders varying in size from 8 to 8+10 bits
> producing an 18 bit total. Does that sound correct?

It's only 1023 adders.

In this example each neuron must sum 1024 signed weights,
implemented as a tree of adders for operand pairs,
probably pipelined because of the number of layers.

512 9-bit adders
256 10-bit adders
...
2 17-bit adders
1 18-bit adder

I was curious whether there was a better way to do this than
a tree of adders. A bit of poking about finds these are called
"multi-operand adders" and there has been a fair amount of
research on these since NN became a thing.

There are a few approaches that have different power, area, delay
properties, e.g. Array tree adder, Wallace tree adder,
Balanced delay tree adder and Overturned-stairs tree adder.

But I can't find anything significantly better than a tree of adders.


MitchAlsup

Nov 18, 2021, 12:40:59 PM
Tree of carry save adders ?? !!

Thomas Koenig

Nov 18, 2021, 2:38:56 PM
EricP <ThatWould...@thevillage.com> schrieb:
> EricP wrote:
>>
>> The sum and trigger have enough bit to not overflow.
>> For 1024 8-bit integer synapse weights the parallel adder looks
>> like it requires 2048 adders varying in size from 8 to 8+10 bits
>> producing an 18 bit total. Does that sound correct?
>
> Its only 1023 adders.
>
> In this example each neuron must sum 1024 signed weights,
> implemented as a tree of adders for operand pairs,
> probaby pipelined because of the number of layers.
>
> 512 9-bit adders
> 256 10-bit adders
> ...
> 2 17-bit adders
> 1 18-bit adder
>
> I was curious whether there was a better way to do this than
> a tree of adders. A bit of poking about finds these are called
> "multi-operand adders" and there has been a fair amount of
> research on these since NN became a thing.

You can do it as a carry-save adder until there are only two 17-bit
numbers left. Think Wallace (or Dadda) tree.

Using 4:2 compressors will reduce the number of bits by a factor of
(roughly) two for each level, without having to worry about carry
propagation until the last two numbers are added.
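A bit-level C model of the idea: each 3:2 step is XOR for the sums and a
shifted majority for the carries, so nothing propagates until the single
final add (the one real carry chain):

#include <stdint.h>

int32_t sum_carry_save(const int32_t *v, int n) {
    uint32_t s = 0, c = 0;
    for (int i = 0; i < n; i++) {
        uint32_t x = (uint32_t)v[i];
        uint32_t t = s ^ c ^ x;                  /* 3:2 sum bits   */
        c = ((s & c) | (s & x) | (c & x)) << 1;  /* 3:2 carry bits */
        s = t;
    }
    return (int32_t)(s + c);   /* the only carry-propagating add */
}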

Terje Mathisen

Nov 18, 2021, 2:49:58 PM
Indeed!

That should be the first option for any structure like this,
particularly when/if you only need to extract the final result quite rarely.

No need to suffer the final carry propagation delay more than once, right?

MitchAlsup

Nov 18, 2021, 3:37:12 PM
The rule of thumb for large sums is that you use one (1, uno, a single) carry chain
and as many 3-2 counters or 4-2 compressors as you need to reduce the sums
and carries to 2. All accumulations can be done in carry save--in fact the entire
integer unit of the NEC S/X (forgot model) was carry save so that adds would
always be 1 cycle. The only time the sums and carries were integerized was
when they were used in address arithmetic.
<
This was also how the Goldschmidt FDIV algorithm worked on the 360/91.

EricP

Nov 18, 2021, 5:01:06 PM
Right. My brain fart there. Apologies.
I read that on FPGAs, for adders of less than 32 bits, a Ripple Carry Adder
(RCA) is faster than a Carry Save Adder due to routing delays,
and then interpreted everything as an RCA tree.
Had I looked a little further I would have seen they were
actually making a case for using 4:2 compressors.



MitchAlsup

Nov 18, 2021, 5:58:18 PM
Every person who wants to call themselves a computer architect should
understand the "math" behind the delay of an adder:: whether it be a
ripple carry adder (4+bits), a carry-propagate adder (5+ln4(bits)), a
carry select adder (6+ln8(bits)), and also understand that there are
special adders to better deal with the delay characteristics of multiplier
trees (Baugh-Wooley) and adders designed for "other purposes" like
Kogge-Stone adders.
<
Apparently we have a NG filled with people who don't really want to be
known as computer architects..............sigh......................

robf...@gmail.com

Nov 21, 2021, 5:33:26 PM
For Thor the neuron is based on a weighted sum, using a multiplier and adder.
A clocked circuit is used and values are multiplied and added in sequence;
the operation iterates through a table using a counter. This has the advantage
that the values used may be partitioned, allowing the same neuron to emulate
more than one neuron. I am not sure, but a cascaded tree of 1024 adders with
multipliers in an FPGA is bound to use a lot of resources and be slow. A faster
clock can be used for a sequenced adder. I am curious what the difference in
propagation delay between input and output would be for a sequenced approach
versus a direct approach. I expect the adder tree approach to be many times
faster, but it is possible to fit many more neurons operating in parallel with
a sequenced approach because it uses fewer resources.
1 Thor Neuron uses: 2132 LUTs, 9 DSP blocks, 220 FF’s and 1.5 BRAMs.
Thor currently has 8 neurons operating in parallel, but it may be possible to increase
the count to 16.
Software is going to be relied on to map a virtual neural net to the available neurons.

JimBrakefield

Nov 21, 2021, 7:27:54 PM
Barrel processors would seem to be a natural for problems with NN timing on FPGAs?
Decent Wikipedia page on barrel processors in general (with several examples).
Charles LaForest has a summary of his thesis work: http://fpgacpu.ca/octavo/
He utilizes the DSP blocks at their maximum frequency.
Another barrel processor: https://opencores.org/projects/avr_hp; probably others,
and other vector and multiple-issue designs.

Ivan Godard

Nov 21, 2021, 8:04:30 PM
CDC6600 Peripheral Processor.

MitchAlsup

Nov 21, 2021, 8:44:18 PM
Qualcomm micro-CPU.

JimBrakefield

Nov 21, 2021, 8:45:03 PM
On Sunday, November 21, 2021 at 7:04:30 PM UTC-6, Ivan Godard wrote:
|> CDC6600 Peripheral Processor.
Have some experience, could read/write five 12-bit "bytes" to/from 6600 main memory at a time.
Perhaps the best (with documentation) example of a barrel processor.

JCB humor:
Lincoln said God must have loved the poor, he made so many of them.
Cray liked the CDC160 cause he made a lot of them (10 on each 6600).

The ten "160" uP took one of the 16 frames on a 6600. The CPU took eight?
The in-order slow 6400 CPU took one frame. Various combinations were sold.

MitchAlsup

Nov 21, 2021, 9:19:47 PM
In a barrel processor there is one execution pipeline and n× fetch-writeback
pipeline-stages.
<
When a PP read main memory, it had to do so using 5 LD instructions in a row,
each LD getting 12 bits. Should such a LD not be fetched, that slot of the R-W
pyramid would simply be skipped. So if a PP knew it was reading an address
(18 bits) it could use 2 LDs.
<
The PPs were not as complicated as a DG-NOVA.

MitchAlsup

Nov 21, 2021, 9:22:29 PM
Also note: although not claimed as a barrel processor::

reading the B6700 processor patents circa 1970-1985 one could
easily imagine that the B6700 was a 3-deep barrel; especially those
at the circuit design and pipeline design levels.

JimBrakefield

Nov 21, 2021, 9:39:05 PM
Uncle: Deep NN is too big and moving too fast for me to keep up.
https://xilinx.github.io/finn/about looks like my entry point into hacking FPGA NNs

Sean O'Connor

Dec 12, 2021, 8:26:48 PM
The fast Walsh-Hadamard transform needs only add and subtract operations. It is easy to turn into a neural network. I have a booklet for $25 US. Side-channel browser attacks are making typing very slow. Lol.
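For reference, the in-place fast Walsh-Hadamard transform is only a few
lines of adds and subtracts (unnormalized variant; n must be a power of
two):

void fwht(int *a, int n) {
    for (int h = 1; h < n; h <<= 1)
        for (int i = 0; i < n; i += h << 1)
            for (int j = i; j < i + h; j++) {
                int x = a[j], y = a[j + h];
                a[j]     = x + y;   /* butterfly: sum  */
                a[j + h] = x - y;   /* and difference  */
            }
}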