Titan 0.9 TP3 bulk loading fails to load Titan 0.4.4 GraphSON as it expects to find the vertex id specified via property key 'id' when it is actually specified via '_id'


Edi Bice

Jun 18, 2015, 12:22:39 PM
to gremli...@googlegroups.com
I'm trying to load a Titan 0.4.4 graph, exported to GraphSON via Faunus 0.4.4, into Titan 0.9M2 (which uses TinkerPop 3.0.0-M9-incubating).

The first line of the file looks like:

{"location":"Bali - Indonesia","geo_enabled":true,"statuses_count":1332,"lang":"id","url":"http:\/\/www.facebook.com\/dinarsinarayu","utc_offset":25200,"time_zone":"Jakarta","protected":false,"favourites_count":3,"verified":false,"description":"simple girl! kpopers! mention for followback!","friends_count":133,"name":"Dinar Sinar Ayu","doc_id":"83095039","created_at":"Sat Oct 17 09:17:40 +0000 2009","screen_name":"fishynee","tid":83095039,"followers_count":175,"listed_count":1,"_id":294896116}

I copied and modified load-grateful-dead.properties to make load-twitter-prod.properties.

gremlin> graph = GraphFactory.open('conf/hadoop-graph/load-twitter-prod.properties')
gremlin> r = graph.compute(SparkGraphComputer).program(BulkLoaderVertexProgram.build().titan('conf/titan-cassandra-es.properties').create()).submit().get()

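It seems the prescribed bulk loading method fails on the 0.4.4 GraphSON because the TinkerPop 3 reader expects the vertex id under the property key 'id', whereas the Faunus export stores it under '_id', as seen in the GraphSON snippet above. The lookup for 'id' returns null, which leads to the NullPointerException in the stack trace below; the source excerpts after the trace show where the 'id' key comes from.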

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 22, sv-devtitan03.smartview.local): java.lang.NullPointerException
        at java.util.Objects.requireNonNull(Objects.java:203)
        at java.util.Optional.<init>(Optional.java:96)
        at java.util.Optional.of(Optional.java:108)
        at org.apache.tinkerpop.gremlin.structure.util.ElementHelper.getIdValue(ElementHelper.java:134)
        at org.apache.tinkerpop.gremlin.structure.util.star.StarGraph.addVertex(StarGraph.java:82)
        at org.apache.tinkerpop.gremlin.structure.util.star.StarGraphGraphSONSerializer.readStarGraphVertex(StarGraphGraphSONSerializer.java:210)
        at org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONReader.readVertex(GraphSONReader.java:167)
        at org.apache.tinkerpop.gremlin.hadoop.structure.io.graphson.GraphSONRecordReader.nextKeyValue(GraphSONRecordReader.java:63)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:145)

public Vertex readVertex(final InputStream inputStream,
                         final Function<Attachable<Vertex>, Vertex> vertexAttachMethod,
                         final Function<Attachable<Edge>, Edge> edgeAttachMethod,
                         final Direction attachEdgesOfThisDirection) throws IOException {
    final Map<String, Object> vertexData = mapper.readValue(inputStream, mapTypeReference);
    final StarGraph starGraph = StarGraphGraphSONSerializer.readStarGraphVertex(vertexData);

public static StarGraph readStarGraphVertex(final Map<String, Object> vertexData) throws IOException {
    final StarGraph starGraph = StarGraph.open();
    starGraph.addVertex(T.id, vertexData.get(GraphSONTokens.ID), T.label, vertexData.get(GraphSONTokens.LABEL));

public final class GraphSONTokens {

    private GraphSONTokens() {}

    public static final String CLASS = "@class";
    public static final String ID = "id";
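
For anyone who wants to see the failure in isolation, here is a minimal JDK-only sketch of what I think is happening. It is not TinkerPop code: the class IdKeyDemo and its methods are hypothetical names of mine, and I am assuming, based only on the stack trace above, that the id lookup ends in Optional.of, which rejects null. A map that carries '_id' but no 'id' should hit the same NullPointerException, while a map with 'id' should pass.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-alone demo; not part of TinkerPop or Titan.
public class IdKeyDemo {

    // Mirrors the value of GraphSONTokens.ID quoted in the excerpt above.
    public static final String ID = "id";

    // Assumed pattern, inferred from the stack trace: the looked-up id
    // is wrapped with Optional.of, which throws on null.
    public static Optional<Object> idOf(final Map<String, Object> vertexData) {
        return Optional.of(vertexData.get(ID));
    }

    // Returns true if the lookup fails with a NullPointerException.
    public static boolean failsWithNpe(final Map<String, Object> vertexData) {
        try {
            idOf(vertexData);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(final String[] args) {
        // Shape of a Faunus 0.4.4 line: the vertex id sits under "_id".
        final Map<String, Object> faunusVertex = new HashMap<>();
        faunusVertex.put("_id", 294896116);
        faunusVertex.put("screen_name", "fishynee");

        // Same data, but with the id under the key the reader looks up.
        final Map<String, Object> withIdKey = new HashMap<>();
        withIdKey.put("id", 294896116);
        withIdKey.put("screen_name", "fishynee");

        System.out.println("'_id' only fails: " + failsWithNpe(faunusVertex));
        System.out.println("'id' present fails: " + failsWithNpe(withIdKey));
    }
}
```

Note that this only illustrates the missing key. I am not claiming that renaming '_id' to 'id' would by itself make the file loadable.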



Daniel Kuppitz

Jul 23, 2015, 8:32:00 AM
to Gremlin-users, edi....@gmail.com
Hi Edi,

As you already noticed, the GraphSON formats are not compatible. I guess the easiest way is to export your data using ScriptOutputFormat and import it using ScriptInputFormat. However, I can see the next problem coming: BulkLoaderVertexProgram, as it currently stands, doesn't seem to work properly.

Cheers,
Daniel