Bulk loading into Titan from GraphSON

Jonathan Haddad

Jan 25, 2013, 1:59:49 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
I'm trying to move a graph from Neo4j into Titan using GraphSON.  The GraphSON file is about 500MB, with 300K nodes and 3 million edges.  I have yet to see the load complete; it has taken over an hour so far.

Any tips for improving load time?  Is everything done in the context of a single transaction?

Does loading GraphSON drop the existing graph?

If it doesn't drop the graph, would it help to split the JSON file into multiple files?

Jon

Matthias Broecheler

Jan 25, 2013, 2:19:24 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
Hey Jonathan,

If you are loading from a single machine, then the standard batch-loading settings should be enabled.
Are you loading all of this through one transaction? Do you have enough memory for that?

What do you mean by "drop the graph"?

Cheers,
Matthias

--
Matthias Broecheler
http://www.matthiasb.com

Marko A. Rodriguez

Jan 25, 2013, 2:25:31 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
Hey,

Here are a few things to note:

1. Going to GraphSON is not so efficient. Best to go to SequenceFile.
2. Faunus+Neo4j is going to be slower than Titan as Neo4j is a single machine.
   - but not as slow as you have it there.

Given the size of your data (small), just do in Gremlin:

g = Neo4jGraph(..)
h = TitanGraph(..)
g.V.sideEffect{
  copyDataTo(h);
}
Where you implement copyDataTo() as needed.

Stephen will have more comments on Faunus/Rexster as he wrote that code.

HTH,
Marko.

Stephen Mallette

Jan 25, 2013, 2:26:16 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
For what you are trying to accomplish it sounds like simply doing a
Gremlin script for the conversion would be best. Obviously you will
want to tweak to batch the transactions (maybe wrap in BatchGraph or
something), but:

gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> target = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> g.V("name","marko")
==>v[1]
gremlin> target.createKeyIndex("name", Vertex.class)
==>null
gremlin> g.V.sideEffect{ElementHelper.copyProperties(it, target.addVertex())}
==>v[3]
==>v[2]
==>v[1]
==>v[6]
==>v[5]
==>v[4]
gremlin> g.E.sideEffect{ElementHelper.copyProperties(it,
           target.addEdge(target.V('name', it.outV.name.next()).next(),
                          target.V('name', it.inV.name.next()).next(), it.label))}
==>e[10][4-created->5]
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]
==>e[11][4-created->3]
==>e[12][6-created->3]

Stephen

Jonathan Haddad

Jan 25, 2013, 2:40:05 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
Here I'm just starting the Gremlin shell and trying to load the graph I dumped from Neo4j.
 
root@precise64:/usr/local/titan# bin/gremlin.sh 
gremlin> g = TitanFactory.open('titan.properties')
13/01/25 18:45:59 INFO impl.ConnectionPoolMBeanManager: Registering mbean: com.netflix.MonitoredResources:type=ASTYANAX,name=TitanConnectionPool,ServiceType=connectionpool
==>titangraph[cassandra:127.0.0.1]
loadGraphML(    loadGraphSON(
gremlin> g.loadGraphSON('/vagrant/geplatform/graph2.json')

Jonathan Haddad

Jan 25, 2013, 2:51:45 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
OK - I'll try this in my dev environment and see how it goes.  Thank you!

Jonathan Haddad

Jan 25, 2013, 5:00:38 PM
to aureliu...@googlegroups.com, Eric Scrivner, Blake Eggleston, Mike Khristo
I didn't realize the Gremlin shell was using minimal RAM.  After looking through gremlin.sh, I realized I could just set

export JAVA_OPTIONS="-Xms1024m -Xmx4096m"

With that, it loaded my 500MB JSON dump in about 5 minutes.  Much better!
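
For anyone repeating this, here is a minimal sketch of a wrapper that sets the heap options and then starts the shell. The heap sizes are the ones quoted above. The GREMLIN_SH variable and its default path are assumptions for illustration (the path is taken from the prompt shown earlier in the thread), so adjust both to your install and available RAM.

```shell
#!/bin/sh
# Heap sizes from the post above: 1 GB initial, 4 GB maximum.
# Tune these to the memory actually available on the machine.
export JAVA_OPTIONS="-Xms1024m -Xmx4096m"

# Assumed install location; override by setting GREMLIN_SH beforehand.
GREMLIN_SH="${GREMLIN_SH:-/usr/local/titan/bin/gremlin.sh}"

if [ -x "$GREMLIN_SH" ]; then
    # gremlin.sh reads JAVA_OPTIONS from the environment when it starts the JVM.
    "$GREMLIN_SH"
else
    echo "gremlin.sh not found at $GREMLIN_SH (JAVA_OPTIONS is set, nothing launched)" >&2
fi
```

The variable has to be exported before gremlin.sh starts; changing it inside an already running shell has no effect on that JVM.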

Jon



Abhilash Sharma

Apr 14, 2017, 7:36:44 AM
to Aurelius, Er...@grapheffect.com, bdegg...@gmail.com, mi...@grapheffect.com, jonatha...@gmail.com
I just migrated to Titan 1.0.0, and as far as I know it doesn't have g.loadGraphSON. How can I load a GraphSON file in Titan 1.0.0?

I have tried the g.io(IoCore.graphson()).readGraph() method, but it throws a null pointer exception when trying to load GraphOfGods.json.