What I am saying is that I don't think the atoms and links can be connected to make a neural network straightforwardly. Of course, one could create atoms representing the coefficients of the CNN model, connect them with weighted links, and then write a function that takes such a hypergraph and tunes the weights. But wouldn't that be very inefficient? Wouldn't you want to just represent a feature vector in Atomese, run the CNN on it (through an external library, perhaps), and get results back in Atomese that the other algorithms can pick up? But then again, I have very little idea what I am talking about, so I may be way off.
Guys,
The big problem with using GPUs for OpenCog is that most OpenCog
cognitive algorithms are better suited to MIMD parallelism than
to SIMD parallelism.
To put it simply, GPUs are SIMD-parallel, which means they are suited
to cases where one needs to repetitively do the same thing over and
over to multiple data items ... Neural-net algorithms tend to be like
this. In OpenCog, ECAN is also like this (as it's basically a
special variant of an attractor neural net). But the other OpenCog
algorithms are generally not like this. They are tractably
parallelizable, but only on a MIMD-parallel substrate...
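To make the SIMD/MIMD distinction concrete, here is a toy sketch (mine, not OpenCog code): a uniform elementwise update that a GPU can run in lockstep across all lanes, versus a per-atom computation whose code path depends on the atom's type, so lockstep threads immediately diverge.

```python
import numpy as np

# SIMD-friendly: the identical multiply-add is applied to every
# element, so a GPU can process all lanes in lockstep.
weights = np.arange(8, dtype=float)
activations = weights * 0.5 + 1.0

# MIMD-style: each "atom" needs a different operation depending on
# its type, so lockstep threads following this code diverge at once.
atoms = [("Concept", 2.0), ("Predicate", 3.0), ("Concept", 5.0)]
results = []
for kind, value in atoms:
    if kind == "Concept":
        results.append(value * value)    # one code path...
    else:
        results.append(value + 100.0)    # ...a completely different one
```

The atom types and operations here are made up purely for illustration; the point is only the shape of the control flow.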
Another issue is RAM access -- for OpenCog (or any system centered on
manipulation of large graphs) the biggest cost in terms of processing
time is RAM access for small, hard to predict RAM read/writes .... So
if the bulk of RAM is not on the GPU then all the savings realized by
the GPU will be eaten by GPU-CPU messaging
What you really want for OpenCog is a MIMD-parallel chip, with a lot
of RAM, and special interconnects between the processors' caches.....
This would let you put OpenCog on embedded devices in a useful way,
and also build OpenCog-tailored supercomputers.... These would be
customized for OpenCog in the same sense that the current crop of
"deep learning chips" are customized for hierarchical NNs.
Mandeep Bhatia and I have sketched some ideas about an "OpenCog chip"
along these lines but have been too busy with other stuff to refine
these ideas into a detailed design that can be given to an FPGA
programmer for prototyping... it will happen eventually ;)
For the present, GPUs could be used for certain special purposes
within OpenCog -- e.g.
-- ECAN importance spreading across an Atomspace whose structure does
not frequently change
-- maybe, with a lot of work, some sort of limited (but potentially
still very useful) pattern matching against an Atomspace whose
structure does not frequently change
These could be quite valuable but wouldn't constitute "porting the
whole OpenCog to GPU"
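The reason importance spreading over a fixed-structure Atomspace is a plausible GPU target is that it reduces to a repeated matrix-vector product over the (static) adjacency structure. Here is a toy sketch of that reduction, with a made-up spreading rule; it is not the actual ECAN update, just an illustration of the data-parallel shape.

```python
import numpy as np

# Toy adjacency matrix of a fixed 4-atom graph. As long as the graph
# structure doesn't change, one spreading step is just a matrix-vector
# product -- exactly the uniform, data-parallel work GPUs excel at.
adjacency = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

sti = np.array([10.0, 0.0, 0.0, 0.0])   # initial short-term importance
spread_rate = 0.5
degree = adjacency.sum(axis=1)          # neighbor counts (all 2 here)

for _ in range(3):                      # three spreading iterations
    outgoing = sti * spread_rate        # each atom gives away half...
    sti = sti - outgoing + adjacency @ (outgoing / degree)  # ...split evenly
```

By construction, this toy rule conserves total importance; the real ECAN dynamics differ, but the matrix-vector structure is the same.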
Hi Andi,

Ben has a good answer; to emphasize it, let me add this: think of the atomspace as a collection of trees. The atoms are the nodes in the trees. Any one atom can appear in many trees, so the whole thing is in fact tangled into a big mat, like a rhizome: https://www.google.com/search?q=rhizome&tbm=isch
The pattern matcher starts at one atom and walks the rhizome, exploring nearest neighbors, until the entire neighborhood is explored (and a match is found, or some other (local) computation is performed).

The problem is that the atoms are scattered randomly through RAM, so the nearest-neighbor walk visits random locations in RAM. I'm guessing that a lot of cache misses happen too: if you have, say, a CPU cache that is 8-way, 4-associative, then you could hold maybe 32 atoms in the cache, but the chance that the 33rd atom happens to already sit in one of the existing cache lines is just about zero, so the graph walk will have a 99.9% cache-miss rate. (Most graphs that get searched have more than 32 atoms in them.)
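A tiny simulation makes the thrashing visible (simulated cache, made-up numbers): walk a 16-atom chain whose atoms live at scattered addresses, and count how often two consecutive accesses land on a different cache line, compared with a locality-aware packing.

```python
# Toy model of why a rhizome walk thrashes the cache: atoms are
# visited in graph order, but live at scattered addresses.

LINE_SIZE = 8          # atoms per simulated cache line

def line_changes(addresses):
    """Count consecutive accesses that jump to a different line."""
    lines = [a // LINE_SIZE for a in addresses]
    return sum(1 for a, b in zip(lines, lines[1:]) if a != b)

# Walk a 16-atom chain. "scattered" mimics malloc order unrelated to
# graph structure; "packed" mimics locality-aware allocation.
scattered = [(i * 7) % 16 for i in range(16)]   # a permutation of 0..15
packed = list(range(16))

# -> scattered crosses lines on 13 of 15 steps, packed on only 1
```

Real caches add associativity and eviction on top of this, but the scattered-versus-packed gap is the essence of the problem.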
Hmm, I have an idea -- I guess the atomspace *could* keep track of individual connected components (create a bag of trees, where the trees in a bag are connected by one or more shared atoms) -- any given search is guaranteed to stay in just one bag, so maybe one could download the entire bag to the GPU before starting a search. That could work if the bags are small enough to fit in GPU RAM.
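The "bag of trees" bookkeeping is essentially connected components, which union-find handles incrementally as links are added. A minimal sketch of the idea (illustrative names, not the actual Atomspace API):

```python
# Group atoms into connected components ("bags") with union-find, so
# that a search could ship exactly one bag to the GPU.

def find(parent, x):
    """Follow parent pointers to the root, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def components(n_atoms, links):
    """Return the connected components of n_atoms atoms under links."""
    parent = list(range(n_atoms))
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        parent[ra] = rb                 # merge the two bags
    bags = {}
    for atom in range(n_atoms):
        bags.setdefault(find(parent, atom), []).append(atom)
    return list(bags.values())

# Two disjoint clusters: a search starting in one bag can never
# wander into the other.
bags = components(6, [(0, 1), (1, 2), (3, 4), (4, 5)])
```

In a live atomspace the merges would happen as links are created; splitting bags on atom deletion is the harder half of the bookkeeping.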
Maybe allocation could be changed to improve cache locality: allocate atoms so that connected atoms are more likely to land on the same cache line. But that becomes a hard, fiddly computer-science problem...
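One simple heuristic in this direction (a rough sketch, in the spirit of Cuthill-McKee reordering, not a proposal for the actual allocator): renumber atoms in breadth-first order, so that neighbors in the graph get nearby addresses and therefore often share a cache line.

```python
from collections import deque

def bfs_order(adjacency, start):
    """Return atoms in breadth-first visit order from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        atom = queue.popleft()
        order.append(atom)
        for nbr in adjacency[atom]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

# A small tree whose node ids are scrambled relative to its structure.
adjacency = {7: [3, 9], 3: [7, 1], 9: [7, 5], 1: [3], 5: [9]}

# Assign fresh, consecutive addresses in BFS order: connected atoms
# end up close together in memory.
new_address = {atom: i for i, atom in enumerate(bfs_order(adjacency, 7))}
```

The fiddly part the email alludes to is keeping such an ordering good as the graph mutates; a one-shot reordering decays as atoms are added and removed.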