The perfect architecture for OpenCog?

Murilo Saraiva de Queiroz

Jun 9, 2017, 1:29:47 PM

to opencog

"This non-von-Neumann approach allows one big map that can be accessed by many processors at the same time, each using its own local scratch-pad memory while simultaneously performing scatter-and-gather operations across global memory."

Graph analytic processors do not exist today, but they theoretically differ from CPUs and GPUs in key ways. First of all, they are optimized for processing sparse graph primitives. Because the items they process are sparsely located in global memory, they also involve a new memory architecture that can access randomly placed memory locations at ultra-high speeds (up to terabytes per second).

DARPA Funds Development of New Type of Processor
World's 1st Non-Von-Neumann
http://www.eetimes.com/document.asp?doc_id=1331871&

--

Murilo Saraiva de Queiroz, MSc

Hardware Engineer at NVIDIA

Linas Vepstas

Jun 9, 2017, 2:01:07 PM

to opencog

Oh gosh, yes, they would be. I've been struggling for days to write code that has to matrix-multiply a vector times a large sparse matrix --- of the 15 trillion possible entries in the 5M x 300K matrix, only 15 million are non-zero -- so about 1 in 2^20 are non-zero. I've flip-flopped back and forth several times on how to do this, but for me, currently, I guess I will have to keep intermediate values in a cache that is a hash table or a btree ... which is insane -- just to multiply two floating-point numbers -- which takes a CPU cycle on modern CPUs, I have to do one hash-table or btree access to get the value of one of the two floats! For every multiply!

The hard part of this for me so far is to make sure that 100% of the hash/tree accesses do not miss, which I can do. What a mess.

My "matrix" can be thought of as a graph adjacency matrix. The graph has 5.3 million nodes in it. Each node has anywhere from 10 to 300K edges attached to it, in a scale-free Zipfian way, and I have to visit every edge exactly once to do the matrix multiply. And I have to do many, many matrix multiplies.
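As a rough illustration of the operation described above (not the code discussed in this thread), a vector-times-sparse-matrix product can be sketched over a compressed sparse row (CSR) layout, which visits each stored edge exactly once. The helper names `to_csr` and `vec_times_sparse` and the tiny example graph are assumptions made purely for illustration.

```python
from collections import defaultdict


def to_csr(edges, n_rows):
    """Pack (row, col, weight) triples into CSR arrays.

    Returns (row_ptr, col_idx, values); the entries of row r live at
    positions row_ptr[r] .. row_ptr[r + 1] - 1 of col_idx and values.
    """
    by_row = defaultdict(list)
    for r, c, w in edges:
        by_row[r].append((c, w))
    row_ptr, col_idx, values = [0], [], []
    for r in range(n_rows):
        for c, w in sorted(by_row[r]):
            col_idx.append(c)
            values.append(w)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def vec_times_sparse(x, row_ptr, col_idx, values, n_cols):
    """Compute y = x * A (row vector times matrix), visiting each stored entry once."""
    y = [0.0] * n_cols
    for r in range(len(row_ptr) - 1):
        xr = x[r]
        for k in range(row_ptr[r], row_ptr[r + 1]):
            # Scattered write: the target column is read from the edge itself.
            y[col_idx[k]] += xr * values[k]
    return y


# Hypothetical 3 x 4 adjacency matrix with 4 non-zero entries.
edges = [(0, 1, 2.0), (0, 3, 1.0), (1, 0, 4.0), (2, 3, 0.5)]
row_ptr, col_idx, values = to_csr(edges, n_rows=3)
y = vec_times_sparse([1.0, 2.0, 4.0], row_ptr, col_idx, values, n_cols=4)
```

In this layout the column index sits next to each value, so the inner loop needs no hash-table or btree lookup; the remaining cost is the scattered access into `y`, which is the access pattern a graph-oriented memory system would presumably target.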

So yes, a native graph architecture would be .. awesome.

--linas

Dmitry Ponyatov

Jun 10, 2017, 2:02:28 AM

to opencog

> Graph analytic processors do not exist today, but they theoretically differ from CPUs and GPUs in key ways. First of all, they are optimized for processing sparse graph primitives. Because the items they process are sparsely located in global memory, they also involve a new memory architecture that can access randomly placed memory locations at ultra-high speeds (up to terabytes per second).

Very interesting topic. In short, this is not a key property of the CPU as a whole, but a limit of the memory addressing architecture, the MMU, and the primary CPU cache.

Most modern CPUs use a paging MMU with 4K granularity (in the case of Intel), which can limit the efficiency of graph processing; the MMU, bus, and cache hardware would have to be optimized for heavy use of page memory mapping.
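To make the page-granularity point a bit more concrete, here is a small sketch that counts how many distinct pages a set of array reads touches. The 4096-byte page and 4-byte float sizes are assumptions, `pages_touched` is a hypothetical helper, and the stride of 2^20 elements is chosen only to echo the sparsity figure mentioned earlier in the thread.

```python
PAGE_SIZE = 4096   # bytes; assumed 4 KiB page
ELEM_SIZE = 4      # bytes; assumed 32-bit float


def pages_touched(indices, elem_size=ELEM_SIZE, page_size=PAGE_SIZE):
    """Return the number of distinct pages hit when reading the given array indices."""
    return len({(i * elem_size) // page_size for i in indices})


n = 1024
dense = range(n)                        # 1024 consecutive floats
sparse = range(0, n * 2**20, 2**20)     # 1024 floats, one every 2^20 elements

dense_pages = pages_touched(dense)      # all reads fall on a single page
sparse_pages = pages_touched(sparse)    # every read falls on its own page
```

Under these assumptions the same number of reads goes from one page to 1024 pages, so each sparse access may need its own address translation, which is the kind of pressure on the MMU and cache that the post points at.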

Dmitry Ponyatov

Jun 10, 2017, 2:06:02 AM

to opencog, linasv...@gmail.com

> Oh gosh, yes, they would be. I've been struggling to write code for days, that has to matrix-multiply a vector times a large sparse matrix

Why don't you use one of the wide range of sparse matrix libraries?
