Scalability / Memory Usage

gr...@stanford.edu

Jul 21, 2016, 1:17:25 PM
to The Junto Label Propagation Toolkit Open Discussion
Hi,

Firstly, thanks for your work on this great toolkit! I am trying to get a rough idea of how much memory Junto will need, in order to gauge the feasibility of running it on large graphs.

Do you have an idea of the memory overhead per node/edge? Or whether it tends to grow as a function of the number of edges or nodes?

For example, is it feasible to run a graph with 50 million edges (and ~5 million nodes) in the non-Hadoop implementation? In .csv edge format this occupies roughly 3 GB, but I'm not sure how Junto represents it in memory. I'm just trying to get a sense of the order of magnitude of memory I should expect to need.
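To make that concrete, here is the kind of back-of-envelope calculation I have in mind. The per-edge and per-node constants are pure guesses on my part (JVM hash-map entries with boxed keys tend to cost tens of bytes each), not anything taken from Junto:

// Pure back-of-envelope, not based on Junto internals; the two
// per-item constants below are assumptions, not measured numbers.
object MemoryEstimate extends App {
  val nodes = 5L * 1000 * 1000
  val edges = 50L * 1000 * 1000
  val bytesPerEdge = 64L   // assumed cost of one map entry per edge
  val bytesPerNode = 200L  // assumed vertex object plus label-score maps
  val totalGiB = (edges * bytesPerEdge + nodes * bytesPerNode).toDouble / (1L << 30)
  println(f"rough estimate: $totalGiB%.1f GiB")
}

With those guesses this comes out around 4 GiB, but the real constants are exactly what I'm asking about.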

Thanks in advance.

Partha Talukdar

Jul 21, 2016, 4:17:14 PM
to junto...@googlegroups.com
On Thu, Jul 21, 2016 at 10:47 PM, <gr...@stanford.edu> wrote:

Do you have an idea of the memory overhead per node/edge? Or whether it tends to grow as a function of the number of edges or nodes?

Junto isn't really optimized for storage; a lot more could be done there. Trove hash maps are used in most places, see src/main/scala/upenn/junto/graph/Vertex.scala for an example. Memory usage will clearly grow with both the number of nodes and the number of edges.
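Very roughly, each vertex carries per-neighbor and per-label maps along these lines. This is a simplified sketch, not the actual class (see Vertex.scala for the real fields), and it assumes Trove 3 package names:

import gnu.trove.map.hash.TObjectDoubleHashMap

// Simplified sketch of per-vertex state; NOT the real Junto class.
class VertexSketch(val name: String) {
  // neighbor name -> edge weight: one entry per outgoing edge
  val neighbors = new TObjectDoubleHashMap[String]()
  // label -> current score: grows with the number of labels
  val estimatedLabels = new TObjectDoubleHashMap[String]()
}

Trove's primitive-valued maps avoid boxing the double weights and scores, though each String key and map entry still carries JVM object overhead, so per-edge cost is well above the 8 bytes of the weight itself.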

For example, is it feasible to run a graph with 50 million edges (and ~5 million nodes) in the non-Hadoop implementation? In .csv edge format this occupies roughly 3 GB, but I'm not sure how Junto represents it in memory. I'm just trying to get a sense of the order of magnitude of memory I should expect to need.

I guess it depends on how much RAM you have. The Hadoop implementation is recommended for graphs that large.

hth,
Partha 