How to load a big graph dataset?

Hongjiang Zhang

Apr 25, 2022, 2:22:18 AM

to Gremlin-users

I want to load a big graph (the text file is ~1GB), which can be downloaded from https://snap.stanford.edu/data/soc-LiveJournal1.html. I followed the data-loading section of the tutorial at https://tinkerpop.apache.org/docs/3.6.0/tutorials/getting-started/, changing only the filename to my soc-LiveJournal.txt.

Unfortunately, after the load has been running for a while, the Gremlin Java process gets stuck doing back-to-back full GCs.

I'm looking for a way to load big datasets (1GB~100GB) efficiently. I also tried to use HadoopGraph to load it, but found that HadoopGraph does not support addVertex. Are there any suggestions?

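graph = TinkerGraph.open()
graph.createIndex('userId', Vertex.class)

g = traversal().withEmbedded(graph)

// Return the 'user' vertex with the given userId, creating it if it does not exist yet.
getOrCreate = { id ->
  g.V().has('user', 'userId', id).fold().coalesce(unfold(), addV('user').property('userId', id)).next()
}

// One edge per line, "<fromId>\t<toId>"; lines starting with '#' are comments.
// (In my run the filename is soc-LiveJournal.txt instead of wiki-Vote.txt.)
new File('wiki-Vote.txt').eachLine {
  if (!it.startsWith("#")) {
    (fromVertex, toVertex) = it.split('\t').collect(getOrCreate)
    g.addE('votesFor').from(fromVertex).to(toVertex).iterate()
  }
}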

Stark Arya

May 1, 2022, 7:46:20 PM

to Gremlin-users

How much memory does your machine have? Also, pay attention to this note at the end of the tutorial's Loading Data section:

To load larger data sets you should read about the CloneVertexProgram, which provides a generalized method for loading graphs of virtually any size and consider the native bulk loading features of the underlying graph database that you’ve chosen.
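
For reference, a minimal Gremlin Console sketch of running CloneVertexProgram, modeled on the example in the TinkerPop reference documentation. It makes several assumptions: the Hadoop and Spark plugins are activated in the console; conf/hadoop/hadoop-gryo.properties (the sample file shipped with the console) has been edited so that gremlin.hadoop.inputLocation points at your data; and the data is already in a vertex-centric format that a HadoopGraph reader understands. A raw edge list like the SNAP file would have to be converted first. The GraphSON writer below is only an illustration; to load into a real database you would use the output format your graph provider supplies.

```groovy
// Assumes the Hadoop and Spark plugins are active:
//   :plugin use tinkerpop.hadoop
//   :plugin use tinkerpop.spark

// Open a HadoopGraph described by a properties file (input location, reader, writer).
graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')

// Illustrative choice of writer: write the cloned graph out as GraphSON.
// Replace with the output format of your target graph database, if it provides one.
graph.configuration().setProperty('gremlin.hadoop.graphWriter',
    'org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONOutputFormat')

// Run the clone as a Spark job: vertices are streamed from the reader to the writer,
// so the graph never has to fit into the console's heap.
graph.compute(SparkGraphComputer).program(CloneVertexProgram.build().create()).submit().get()

// Inspect the result in the configured output location.
hdfs.ls('output')
```

This avoids addVertex entirely: HadoopGraph is read through an input format and written through an output format rather than mutated vertex by vertex.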
