What's the max graph size for TinkerGraph?


Pei Daqi

Jul 25, 2018, 4:03:02 PM
to Gremlin-users

I have several graphs and the biggest is about 2GB on disk. I hope to have them entirely in memory for max reading efficiency, and therefore I'm using TinkerGraph and Gryo as the backend.
However, the system freezes when trying to load even the smallest file (only 200MB, with 700K nodes and 14M edges), either through Java code or Gremlin.

I wonder what is the max graph size that works with TinkerGraph? In any case, it feels strange it cannot even handle a 200MB graph...

Stephen Mallette

Jul 25, 2018, 4:13:51 PM
to Gremlin-users
TinkerGraph is limited by memory. How are you loading the data now?




Pei Daqi

Jul 25, 2018, 4:21:16 PM
to Gremlin-users

I have 16GB of memory and I feel a 200MB graph file should fit even considering the overhead.
The graph file without edges is 180MB and loaded just fine.

Stephen Mallette

Jul 25, 2018, 4:24:18 PM
to Gremlin-users
That's fine, I was just answering your question as to what the max size of TinkerGraph was and the answer is that it should only be limited by memory.

Again, how are you loading the data now?

Pei Daqi

Jul 25, 2018, 4:28:35 PM
to Gremlin-users

Not sure if I understand - what do you mean by how I'm loading the data now?

Like I said, I've persisted the graph data using TinkerGraph and Gryo:

graph.io(IoCore.gryo()).writeGraph("my_graph.gryo")

And loading it by:

graph.io(IoCore.gryo()).readGraph("my_graph.gryo")

simply hangs indefinitely. The heap memory usage stabilizes after a while, which I assume means the system is done loading the file, but somehow it's still frozen.

Robert Dale

Jul 25, 2018, 4:37:18 PM
to gremli...@googlegroups.com
File size != mem size.

I created a sample graph with 4M vertices and 2M edges. It took 4GB in memory and generated a 300MB Kryo file. However, it takes 6GB to load it back in.

If your heap has 'stabilized' maybe it's just hit max mem and now it's churning.  Print your GC stats to verify.
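
One way to check for GC churn is to start the JVM with -verbose:gc, or to poll the standard java.lang.management beans from inside the process. The following is a minimal sketch of the second option. The GcStats class and its method names are hypothetical helpers for illustration and are not part of TinkerPop.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not part of TinkerPop: summarizes heap usage and GC activity.
class GcStats {

    // Total milliseconds spent in all collectors since the JVM started.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report a time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // One line for the heap, then one line per collector.
    static String report() {
        final long mb = 1024L * 1024L;
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() / mb, heap.getCommitted() / mb, heap.getMax() / mb));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
        System.out.println("total GC time so far: " + totalGcMillis() + "ms");
    }
}
```

If report() is printed every few seconds from a background thread while readGraph runs, a heap that sits near its maximum while the collection counts and times keep climbing would suggest the JVM is spending its time collecting rather than loading.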

Robert Dale


Stephen Mallette

Jul 25, 2018, 4:47:35 PM
to Gremlin-users
There's more than one way to load data into your graph, so I needed to know exactly what you were doing. Anyway, you're using graph.io(). With that approach you're using a GryoReader underneath, which is a simple single-threaded loader that uses a vertex cache. For TinkerGraph that cache is a bit of a waste because the graph is already holding the vertices in memory. The cache doesn't release or evict vertices at any point, so you have to throw a lot of -Xmx at the thing to make it work. When you also consider the points Robert Dale mentioned, you can see why things might seize up.
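
As a rough sanity check on the -Xmx point, the sketch below compares an estimated load footprint against the heap the JVM was actually given. The HeapEstimate class is hypothetical, and the expansion factor of about 20 is only an assumption taken from the single sample reported earlier in this thread (about 6GB to load a 300MB file). The real ratio depends on the graph's shape and properties. Note also that the default maximum heap is usually only a fraction of physical memory, so 16GB of RAM does not mean 16GB of heap unless -Xmx says so.

```java
// Hypothetical back-of-envelope check, not part of TinkerPop.
// The expansion factor is an assumption from one reported sample, not a TinkerGraph constant.
class HeapEstimate {
    static final long MB = 1024L * 1024L;

    // Estimated heap needed to load a file, given an observed heap-to-file ratio.
    static long estimateHeapBytes(long fileBytes, double expansionFactor) {
        return Math.round(fileBytes * expansionFactor);
    }

    // True when the estimate fits within the given maximum heap size.
    static boolean fitsInHeap(long fileBytes, double expansionFactor, long maxHeapBytes) {
        return estimateHeapBytes(fileBytes, expansionFactor) <= maxHeapBytes;
    }

    public static void main(String[] args) {
        long fileBytes = 200 * MB;                        // the 200MB file from the question
        double factor = 6000.0 / 300.0;                   // assumed ratio: ~6GB heap per 300MB file
        long maxHeap = Runtime.getRuntime().maxMemory();  // reflects -Xmx or the JVM default

        System.out.println("estimated heap needed: " + estimateHeapBytes(fileBytes, factor) / MB + "MB");
        System.out.println("max heap available:    " + maxHeap / MB + "MB");
        System.out.println("likely to fit:         " + fitsInHeap(fileBytes, factor, maxHeap));
    }
}
```

If the estimate is close to or above the maximum heap, raising -Xmx before retrying the load is the first thing to try.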

For larger datasets and TinkerGraph, I'd prefer a custom loader (i.e. just a Gremlin script to run in the Gremlin Console). Unfortunately it's not really safe to do parallel writes to TinkerGraph, as it isn't proven completely thread-safe for that (though I think parallel reads are OK).

I'm curious to see what happens when this merges:


as it opens up io() as a first-class citizen of the Gremlin language, and perhaps we'll see graph providers get legit bulk loaders behind that step, or at least make use of Hadoop Input/OutputFormat with CloneVertexProgram. For TinkerGraph, I'm not sure what we'll do... maybe there could be a more TinkerGraph-specific GryoReader that dropped the caching, transaction checking, etc. That might be good.


Pei Daqi

Jul 25, 2018, 4:50:07 PM
to Gremlin-users

It works after changing the IO backend from Gryo to GraphSON.

But still, it surprises me how slow and memory inefficient TinkerGraph is. The generated JSON file is 800MB and memory consumption is about 5GB, roughly 6 times the size of the plain text!
Loading is also extremely slow, much slower than my old binary file of C++ structs.


Did anyone actually use this in a production environment? At the end of the day, I hope to use it in combination with Janus to index a 500GB triplets file, and that will not work at this level of efficiency.

Stephen Mallette

Jul 25, 2018, 5:28:31 PM
to Gremlin-users
> But still, it surprises me how slow and memory inefficient TinkerGraph is

I miss the old days....

> Did anyone actually use this in a production environment?

TinkerGraph is used in production. You might look at the ShiftLeft fork of TinkerGraph, as it gains major memory improvements in exchange for requiring a schema:



