How well does ArangoDB cope with the super node problem for graphs?


F21

May 12, 2013, 2:16:19 AM
to aran...@googlegroups.com
I am planning to build an activity feed using ArangoDB as a data store.
 
A super node is essentially a node with a very large number of relationships.
 
This was a problem with Neo4j (although they have introduced optimizations and other tricks to deal with it). I have not used Neo4j enough to know whether this is still a problem or has been resolved.
 
Essentially, this resulted in a design that looks like so (image not preserved):
 
 
 
This model does introduce some problems. For example, I would like to aggregate statuses (events) across multiple users. With this model, I would still need to fetch all the status nodes connected to some number of users and then perform the aggregation. I would prefer a model that has super nodes (users with lots of event nodes connected to them), as it is much simpler to query.
 
Does ArangoDB have any issues with super nodes (from my understanding a full set of data is held in memory, but I could be wrong)?
 
 

Frank Celler

May 12, 2013, 2:38:43 PM
to aran...@googlegroups.com
Basically, the memory usage is (sizeof-nodes + sizeof-edges + number-relations). For instance, if you have two nodes, a fixed edge document type, and n edges of this type between the two nodes, then the memory usage is linear in n.
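
As a rough illustration of the formula above, here is a minimal Python sketch. The function name `estimate_memory` and all byte sizes are made-up assumptions for illustration, not ArangoDB figures; "number-relations" is read here as one fixed-size index entry per edge.

```python
def estimate_memory(num_nodes, node_size, num_edges, edge_size, relation_overhead):
    """Toy estimate: sizeof-nodes + sizeof-edges + number-relations.

    All sizes are in bytes and purely hypothetical.
    """
    sizeof_nodes = num_nodes * node_size
    sizeof_edges = num_edges * edge_size
    # Assumption: each relation costs one fixed-size index entry.
    relations = num_edges * relation_overhead
    return sizeof_nodes + sizeof_edges + relations


# Two nodes, a fixed edge document size, and n edges between them:
# the estimate grows by the same amount for every additional edge.
for n in (1000, 2000, 3000):
    print(n, estimate_memory(2, 100, n, 50, 16))
```

With the node count fixed, each extra edge adds the same constant (edge size plus per-relation overhead), which is what "linear in n" means here.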

Does this answer your question? We have no super nodes per se.

Cheers
  Frank

F21

May 12, 2013, 7:20:57 PM
to aran...@googlegroups.com
Hi Frank,
 
Thanks for your answer :) How about from a computational perspective?
 
If I have a node, with lots of relationships to event nodes, and I want to get all event nodes and process them:
 
       --- has_event ---> event_node1
Frank  --- has_event ---> event_node2
                ......
       --- has_event ---> event_nodeX
 
How does this scale computationally? Does computation for getting those event nodes scale linearly? Or is the computational cost flat?

Frank Celler

May 13, 2013, 7:25:34 AM
to aran...@googlegroups.com
The relationships are stored in a multi-value hash map where the key is the out-node. So finding all out (or in, or any) edges of a node takes time linear in the number of those edges.
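
To make the idea concrete, here is a minimal Python sketch of a multi-value hash map keyed by the out-node. It is only an illustration under stated assumptions: the class `EdgeIndex` and its methods are hypothetical names, and this is not ArangoDB's actual implementation.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy multi-value hash map: out-node key -> list of (label, target)."""

    def __init__(self):
        self._out = defaultdict(list)

    def add_edge(self, from_node, to_node, label):
        # Appending to the bucket of the out-node is O(1) on average.
        self._out[from_node].append((label, to_node))

    def out_edges(self, node):
        # Locating the bucket is one hash lookup; copying its k edges
        # is O(k), i.e. linear in the number of edges of that node only.
        return list(self._out.get(node, ()))


index = EdgeIndex()
for i in range(1, 4):
    index.add_edge("Frank", f"event_node{i}", "has_event")
index.add_edge("Todd", "event_node9", "has_event")

frank_events = index.out_edges("Frank")
print(frank_events)
```

In this sketch the cost of fetching a node's edges depends on how many edges that node has, not on the total number of edges in the graph, so a node with many `has_event` edges is more expensive to enumerate only in proportion to its own degree.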

Todd Leo

Jan 31, 2017, 5:51:13 AM
to ArangoDB
Hi Frank,

Does this imply that ArangoDB handles the super-node problem well when traversing through super nodes?
