How to load a big graph dataset?

Hongjiang Zhang

Apr 25, 2022, 2:22:18 AM
to Gremlin-users
I want to load a large graph (the text file is ~1GB), which can be downloaded from https://snap.stanford.edu/data/soc-LiveJournal1.html. I followed the data-loading section of the Getting Started tutorial (https://tinkerpop.apache.org/docs/3.6.0/tutorials/getting-started/) and only changed the filename to my soc-LiveJournal.txt.

Unfortunately, after the load has been running for a while, the Gremlin Java process ends up doing full GCs continuously.

I'm looking for a way to load big datasets (1GB to 100GB) efficiently. I also tried HadoopGraph, but found that it does not support addVertex. Any suggestions?

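graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class)
g = traversal().withEmbedded(graph)

// Return the 'user' vertex with this userId, creating it if it does not exist yet.
getOrCreate = { id ->
  g.V().has('user', 'userId', id).
    fold().
    coalesce(unfold(),
             addV('user').property('userId', id)).next()
}

// Each non-comment line is a tab-separated "from<TAB>to" pair.
// (I replaced 'wiki-Vote.txt' with my soc-LiveJournal.txt.)
new File('wiki-Vote.txt').eachLine {
  if (!it.startsWith("#")) {
    (fromVertex, toVertex) = it.split('\t').collect(getOrCreate)
    g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
  }
}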

Stark Arya

May 1, 2022, 7:46:20 PM
to Gremlin-users
How much memory do you have? Also, pay attention to the note at the end of the Loading Data section of the tutorial:
To load larger data sets you should read about the CloneVertexProgram, which provides a generalized method for loading graphs of virtually any size and consider the native bulk loading features of the underlying graph database that you’ve chosen.
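
On the memory question: TinkerGraph keeps the whole graph on the JVM heap, so it may help to know how many vertices and edges the file contains before choosing a heap size or a loader. Below is a minimal sketch in plain Java (JDK only, no TinkerPop calls) that streams an edge list in the format the script above assumes, with tab-separated ID pairs and `#` comment lines, and counts edges and distinct vertex IDs. The class name `EdgeListStats` and its `count` method are hypothetical helpers made up for this illustration. It also assumes the IDs are integers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of TinkerPop.
final class EdgeListStats {

    // Streams a tab-separated edge list and returns
    // { number of distinct vertex IDs, number of edges }.
    static long[] count(BufferedReader reader) throws IOException {
        // Note: this set itself lives on the heap (one entry per distinct ID).
        Set<Long> vertexIds = new HashSet<>();
        long edges = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and header comments
            }
            String[] parts = line.split("\t");
            if (parts.length < 2) {
                continue; // skip lines that are not "from<TAB>to"
            }
            // Assumption: IDs are integers that fit in a long.
            vertexIds.add(Long.parseLong(parts[0].trim()));
            vertexIds.add(Long.parseLong(parts[1].trim()));
            edges++;
        }
        return new long[] { vertexIds.size(), edges };
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the edge list, e.g. soc-LiveJournal.txt
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            long[] stats = count(reader);
            System.out.println("vertices=" + stats[0] + " edges=" + stats[1]);
        }
    }
}
```

With Java 11 or later, saving this as EdgeListStats.java and running `java EdgeListStats.java soc-LiveJournal.txt` should print the two counts. The file is read line by line, so only the set of IDs is held in memory, not the edges.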