migrating 100M nodes/vertices graph from 0.4.4 to 0.9M2


Edi Bice

Jun 18, 2015, 10:59:43 AM
to aureliu...@googlegroups.com
I'm trying to migrate our current 100M vertices/edges graph from Titan 0.4.4 to Titan 0.9M2.

I used Faunus to export to GraphSON as that seemed the only choice.

It now turns out that is not an option either: Titan 0.9M2 uses TinkerPop 3 (TP3), which changed its GraphSON format, so it is not compatible with the TinkerPop 2 (TP2) GraphSON that Faunus exports.


My guess is the only two real options are:

1. Use sed to replace _id with id, _outV with outV, etc. on an 80 GB file
2. Use ScriptInputFormat with a custom script that can read TP2 GraphSON

I fear option 1 will take forever and may still produce invalid TP3 GraphSON, so I'm going with option 2. Before I do, does anyone already have such a script?

Daniel Kuppitz

Jun 18, 2015, 2:21:19 PM
to aureliu...@googlegroups.com
> 1. sed replace _id with id, _outV with outV etc etc on a 80Gb file

sed is fast; I wouldn't expect it to take too long. I've preprocessed files of a few GB in the past and it took only a few minutes on an SSD. However, I can't promise that this will solve your problem.

> 2. use ScriptInputFormat and a custom script which can read TP2 GraphSON

Expect this to take much longer, since ScriptInputFormat is the slowest input format you can get. On the other hand, it's far more reliable.

Cheers,
Daniel



Edi Bice

Jun 18, 2015, 4:06:33 PM
to aureliu...@googlegroups.com
Thanks Daniel.

I started down the sed path and am now realizing that the underscores are only part of a bigger problem. Apparently vertex and edge properties, other than _id, moved into a "properties" key on each element, and there are other such changes. Certainly not a job for sed.

So I'm thinking I'll put the script logic into a standalone converter, much like the sed script I was trying to put together, to get both speed and reliability. I'm looking to reuse code from LegacyGraphSONReader and GraphSONSerializers.
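
As a rough illustration of the restructuring described above, here is a minimal Python sketch of a per-line converter. It assumes one JSON element per line, renames the reserved TP2 keys (`_id`, `_type`, `_label`, `_outV`, `_inV`) by dropping the underscore, and nests every other key under "properties". The function names and the sample edge are hypothetical, and the real TP3 GraphSON layout has further details that this sketch does not attempt to reproduce; code such as LegacyGraphSONReader is the better guide to the actual formats.

```python
import json

# Reserved TinkerPop 2 GraphSON keys and the underscore-free names
# used in this sketch (the target names are an assumption).
RESERVED_KEYS = {
    "_id": "id",
    "_type": "type",
    "_label": "label",
    "_outV": "outV",
    "_inV": "inV",
}


def convert_element(element):
    """Rename reserved keys and nest all remaining keys under "properties"."""
    converted = {}
    properties = {}
    for key, value in element.items():
        if key in RESERVED_KEYS:
            converted[RESERVED_KEYS[key]] = value
        else:
            properties[key] = value
    converted["properties"] = properties
    return converted


def convert_line(line):
    """Convert one JSON-encoded element (one per input line)."""
    return json.dumps(convert_element(json.loads(line)), sort_keys=True)


# Hypothetical TP2-style edge with two ordinary properties.
tp2_edge = '{"_id": 7, "_label": "knows", "_outV": 1, "_inV": 2, "ref_id": "a1", "weight": 0.5}'
print(convert_line(tp2_edge))
```

Because each line is parsed as JSON rather than pattern-matched as text, a property whose name merely ends in `_id` (like `ref_id` here) is left untouched, which a blanket sed substitution would not guarantee.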

Still, it will run on a single core on one machine when it COULD take advantage of the whole bulk framework and be deployed as a cluster job. I'll see what those changes might look like.
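
One way to picture the parallel version, short of a full cluster job: since the export has one element per line, the file can be cut into byte ranges that end on line boundaries, and each range converted independently. The sketch below is an assumption about how that could look in plain Python, not how the bulk framework computes its input splits; `line_aligned_splits` and `read_split` are hypothetical helper names.

```python
import os


def line_aligned_splits(path, num_splits):
    """Return (start, end) byte ranges that cover the file and end after a newline."""
    size = os.path.getsize(path)
    if size == 0:
        return []
    target = max(1, size // num_splits)
    splits = []
    start = 0
    with open(path, "rb") as f:
        while start < size:
            # Jump roughly one split ahead, then extend to the end of that line.
            f.seek(min(start + target, size))
            f.readline()
            end = min(f.tell(), size)
            splits.append((start, end))
            start = end
    return splits


def read_split(path, start, end):
    """Yield the complete lines contained in one byte range."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            yield f.readline().decode("utf-8")
```

Each `(start, end)` pair could then be handed to a separate worker process or machine, which reads only its own range with `read_split` and writes its converted lines to its own output file. The number of ranges returned may differ slightly from `num_splits` because every range is extended to the next line break.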

Ted Wilmes

Jul 7, 2015, 3:57:53 PM
to aureliu...@googlegroups.com
Hi Edi,
We're about to tackle this same sort of export/import. I was wondering whether you've made any further progress or learned any new lessons.

Thanks,
Ted

Edi Bice

Jul 9, 2015, 10:50:59 AM
to aureliu...@googlegroups.com
Yes, both progress made and lessons learned. See more here:

https://github.com/apache/incubator-tinkerpop/pull/85

Ted Wilmes

Jul 9, 2015, 12:49:52 PM
to aureliu...@googlegroups.com
Great, thanks Edi.