I've had both on-forum and off-forum encouragement to expand on the
"smart memory" concept I referred to on another thread, which promises
to increase the short-precision speed of PCs by somewhere around 3
orders of magnitude, allowing many applications that now seem hopelessly
out of reach to run at real-time speeds.
I'll explain the basic concept here. I have lots of proprietary
enhancements to make this run a lot faster/better/cheaper, but they are
nonessential to the basic concept, and so I will withhold these
unless/until someone steps forward to fund this, as the funding decision
can be made in the absence of these enhancements.
The core concept starts with an ALU that is a 16-bit logarithmic vector
scatter/gather - multiply/divide - add/subtract ALU. The data paths are
each through a multiplexer to get to one of, say, 8 or 16 selected
blocks of memory. This would be a bit like a CDC-205 vector ALU without
the scalar processor, so that (hopefully rare) scalar operations would
have to be handled as one-word vectors, and it would compute with
logarithms. Obviously, multiple ALUs could work together just like in
the 205, and even share the same "fudge factor" memory, so very little
space would be needed for the second ALU.
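The data flow above can be sketched in software. This is a toy model, not the hardware: the bank count, sizes, and function names are all my own illustrative choices. It shows an ALU gathering operands from selected memory blocks through a multiplexer, doing a log-domain multiply (which is just addition), and scattering the results back.

```python
# Toy model of a gather -> log-multiply -> scatter vector operation over
# independently addressable memory banks. All names and sizes are
# illustrative; a real unit would do this in hardware.

NUM_BANKS = 8
BANK_SIZE = 16

# Each bank is a small independent memory; the ALU reaches any of them
# through a multiplexer.
banks = [[0.0] * BANK_SIZE for _ in range(NUM_BANKS)]

def gather(indices):
    """Fetch one operand per (bank, offset) pair through the mux."""
    return [banks[b][off] for (b, off) in indices]

def scatter(indices, values):
    """Write results back through the mux."""
    for (b, off), v in zip(indices, values):
        banks[b][off] = v

def vector_log_multiply(src_a, src_b, dst):
    """Multiply two log-domain vectors: in the log domain, multiply = add."""
    a = gather(src_a)
    b = gather(src_b)
    scatter(dst, [x + y for x, y in zip(a, b)])

# Example: log2(2) * log-domain log2(4) -> log2(8)
banks[0][0] = 1.0   # log2(2)
banks[1][0] = 2.0   # log2(4)
vector_log_multiply([(0, 0)], [(1, 0)], [(2, 0)])
```

A scalar operation is just the degenerate one-element case of the same call, as described above.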
For those of you who are not into logarithms, you multiply and divide by
adding or subtracting the logarithms of your numbers. Addition and
subtraction (I know, they tell you in school that this is impossible) are
accomplished by first taking the ratio of the arguments (by subtracting
the logarithms), looking up a "fudge factor" in a table, then
multiplying (by adding the logarithms) the larger of the numbers by the
fudge factor. That this requires a table limits its precision, but the
table can be VERY compressed (~4KB) with some clever analysis. Also,
there are some little-known multiple-precision methods that can be
employed. Negative numbers are stored as flagged positives; the sign flag
changes which table you look up, along with some of the logic that
manipulates the flags.
Logarithms work more like short FP than like integers.
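The fudge-factor addition described above is the classic Gaussian-logarithm trick, and it can be sketched in a few lines. The table scale and cutoff below are illustrative choices of mine, not the compressed ~4KB scheme mentioned above; this only handles the same-sign (addition) table.

```python
import math

# Log-domain addition via a "fudge factor" table (Gaussian logarithms).
# Numbers are held as log2 values. Table quantization here is illustrative.

SCALE = 256              # table entries per unit of log2 (my choice)
MAX_DIFF = 16 * SCALE    # past this ratio, the smaller term is negligible

# fudge[d] ~= log2(1 + 2^(-d/SCALE)), indexed by the (scaled) log ratio.
fudge_add = [math.log2(1.0 + 2.0 ** (-d / SCALE)) for d in range(MAX_DIFF)]

def log_add(la, lb):
    """Given la = log2(a), lb = log2(b) with a, b > 0, return log2(a + b)."""
    hi, lo = (la, lb) if la >= lb else (lb, la)
    d = int((hi - lo) * SCALE)    # ratio of the arguments, as a log
    if d >= MAX_DIFF:
        return hi                 # smaller term is below table precision
    return hi + fudge_add[d]      # "multiply" the larger by the fudge factor
```

Subtraction works the same way with a second table built from log2(1 - 2^(-d/SCALE)), which is where the sign flags select which table to consult.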
The BIG reason why logarithms are required is to avoid multipliers and
shift matrices that are NOT compatible with memory fab.
Since an ALU becomes just a small block of ROM with some really trivial
logic attached, it takes very little space/power and can be FABed with
standard memory fab processing.
Next, instead of using one large memory, subdivide a large memory into
hundreds/thousands of small blocks of independently accessible memory.
These would then be clustered around each of the ALUs, so that each ALU
could access some small subset of the entire memory. The memories would
each be switchable between the two nearest ALUs. If
an ALU goes to access a memory that is busy servicing the other ALU, it
simply waits for the memory to become available.
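That wait-if-busy behavior between two neighboring ALUs can be modeled with an ordinary lock. Again a toy sketch with hypothetical names, not the hardware arbitration itself:

```python
import threading

# Toy model of a memory bank switchable between its two nearest ALUs:
# whichever ALU claims the bank proceeds; the other simply blocks until
# the bank becomes available again.

class SharedBank:
    def __init__(self, size):
        self.data = [0] * size
        self.lock = threading.Lock()  # models the bank's "busy" state

    def access(self, alu_id, offset, value=None):
        """Read (value is None) or write one word; wait if the bank is busy."""
        with self.lock:   # an ALU waits here while the other ALU holds it
            if value is None:
                return self.data[offset]
            self.data[offset] = value

bank = SharedBank(16)
bank.access(alu_id=0, offset=3, value=42)   # ALU 0 writes a word
```

In hardware the "lock" is just the bank's busy line; the point is only that contention costs a stall, not a failure.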
There has been some discussion of SIMD vs. MIMD. I think MIMD wins.
The external connection would be hooked up to edge memories that are
otherwise hooked to one ALU, to simulate a standard memory SIMM.
Proprietary methods would make almost everything redundant, so that
anything could fail and it could just be reconfigured to work around the
failed portion. The FIRST task on startup would be to run a diagnostic
to test everything out and configure failed portions into the ether.
Hence, a significant fraction could be bad and the unit would work quite
well. A reasonable number of defects is NO PROBLEM, so as long as the
process isn't out of control, the yield should be 100% regardless of the
size of these processors. THIS is what makes them cheaper than other
memory - because you don't have to throw any of them away at the factory.
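The power-on diagnostic amounts to probing every block with test patterns and building the working configuration only from blocks that pass. A minimal sketch, with the patterns and fault model invented for illustration:

```python
# Sketch of the startup diagnostic: probe each memory block with test
# patterns and configure any block that fails out of the working set.

class Block:
    def __init__(self, faulty=False):
        self.words = [0] * 16
        self.faulty = faulty

    def write(self, i, v):
        # Illustrative fault model: a stuck-at-1 bit in a faulty block.
        self.words[i] = (v | 1) if self.faulty else v

    def read(self, i):
        return self.words[i]

def power_on_diagnostic(blocks):
    """Return indices of blocks that pass; the rest are configured out."""
    good = []
    for idx, blk in enumerate(blocks):
        ok = True
        for pattern in (0x0000, 0xFFFF, 0xAAAA, 0x5555):
            blk.write(0, pattern)
            if blk.read(0) != pattern:
                ok = False
                break
        if ok:
            good.append(idx)
    return good
```

The same sweep would cover the ALUs and interconnect; anything that fails is simply left out of the configuration map.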
Externally, this would look like a VERY large memory with an embedded
MMU, and would be EXTREMELY reliable since parts can even fail in
service, and be reconfigured on reset. It would come up as the smaller
memory that its I/O pins are hooked to, and morph to MMU control once a
"key" is deposited into a particular memory location.
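The morph-on-key behavior is a simple state machine: act like a plain memory until the magic value lands in the designated location, then switch the pins over to MMU control. A sketch, with the key value and address entirely hypothetical:

```python
# Sketch of the boot handshake: the device comes up as an ordinary memory,
# then morphs to MMU/command mode once a "key" is written to a designated
# location. Key value and address below are placeholders, not a real spec.

MAGIC_KEY = 0xC0DE
KEY_ADDR = 0x0FFF

class SmartMemory:
    def __init__(self, size=4096):
        self.mem = [0] * size
        self.mmu_mode = False   # comes up looking like a standard memory

    def write(self, addr, value):
        self.mem[addr] = value
        if addr == KEY_ADDR and value == MAGIC_KEY:
            self.mmu_mode = True   # morph: I/O now carries MMU commands

    def read(self, addr):
        return self.mem[addr]

dev = SmartMemory()
dev.write(0x0010, 7)            # plain-memory behavior before the key
dev.write(KEY_ADDR, MAGIC_KEY)  # deposit the key; device morphs
```

Until the key arrives, software that expects a standard SIMM sees exactly that, which is what makes the part a drop-in.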
There are a zillion details to make this really practical, worked out at
a considerable cost, that I'd prefer not to post to avoid helping my
future competitors. However, there should be enough here to see that it
**WILL** work, and with memory densities being what they now are, a LOT
of these ALUs can be put into a single memory. If I've missed any
details needed to verify the practicality of this, please ask.
Yes, I know, there are a LOT of things that this can't do. However,
there are some BIG things that it could do - like simulate YOU, neuron
by neuron, synapse by synapse, in real time. Not with the first units
off the assembly line, but with a modest amount of development. I worked
the numbers out in the closing presentation at the first IJCNN.
For those of you who are interested in the "live forever machine", this
appears to be the critical missing computational piece.
Now, you can probably see why I think that it is so important to get
logarithmic representation into the IEEE spec!
OK, does anyone have enough lunch money set aside to do this and
transform the computer industry? This will be a VERY exciting project.
And besides, your (immortal) life may depend on it 8-)