TinkerPop2 Data Migration - Invalid vertex provided: null Exception

David Kleyla

Oct 29, 2015, 11:03:38 AM
to Gremlin-users
I've been working on an upgrade from Titan 0.4.4 to 1.0 with TinkerPop 3.0 and I've run into an exception while doing the data migration.

I've followed the instructions found here to export the 2.4 graph into GraphSON.
http://tinkerpop.incubator.apache.org/docs/3.0.0-incubating/#_tinkerpop2_data_migration

 A 7.2G file was generated successfully, but on attempting to import it to the new 3.0 graph using the following commands: 
g = TitanFactory.open('conf/titan-cassandra-es.properties')
r = LegacyGraphSONReader.build().create()
r.readGraph(new FileInputStream('/data/devGraph.json'), g)

data begins to be loaded into the new graph but the following Exception is thrown a few hours into the process.

 javax.script.ScriptException: java.io.IOException: java.lang.IllegalArgumentException: Invalid vertex provided: null


This is an export/import from a dev environment, so I wouldn't be surprised if there were a data issue, but I'm hoping someone can tell me what to look for in the GraphSON file that would indicate a null vertex, so that I can remove the bad data with sed before retrying the import. I assume it would be more than just a pair of empty braces, since the error identifies an invalid vertex, but I'm not sure what I would be sed-ing for. As I said, it's a 7.2G file and the JSON appears to be formatted correctly. The entire file was generated without a newline; is this an issue as well?

Any help would be greatly appreciated. 

Thanks 

David Kleyla

Nov 2, 2015, 9:45:12 AM
to Gremlin-users
Just following up on this question and hoping someone can help me here; my upgrade process is stalled until I can find this issue. I don't have any empty curly braces in the GraphSON, and I've removed all {"_id":[0-9]*,"_type":"vertex"} entries in the hope that they were what was being treated as a null vertex, but the issue remains.

David Kleyla

Nov 2, 2015, 1:48:51 PM
to Gremlin-users
I've also removed all the null properties on edges from the file, inspired by this old issue - https://github.com/tinkerpop/blueprints/issues/400 in hopes that it was still affecting imports, but the same error is still thrown.  

Stephen Mallette

Nov 3, 2015, 2:55:12 PM
to Gremlin-users
My first guess is that you have some bad edge data.  "bad" in the sense that either the incoming or outgoing vertex cannot be found while creating an edge.  The legacy loader first loads all vertices followed by all edges.  So if the edge loading portion can't find a vertex (which would show as "null") then you'd end up with an error during the load.  So, I think you would want to find edges that have an invalid vertex id somewhere.  Not 100% sure that's the answer, but perhaps something new you can investigate.
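
One way to look for such edges is sketched below in plain Java. This is an illustration under assumptions, not part of TinkerPop: it assumes every vertex is written as "_id":<number>,"_type":"vertex" and every edge carries numeric "_outV" and "_inV" fields next to each other, and the class and method names (DanglingEdgeScan, findDanglingEdges) are made up for the example. It also works on an in-memory string, so a multi-gigabyte single-line export would need a streaming JSON parser instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DanglingEdgeScan {
    // Assumed vertex layout in the export: ..."_id":123,"_type":"vertex"...
    private static final Pattern VERTEX =
            Pattern.compile("\"_id\":(\\d+),\"_type\":\"vertex\"");
    // Assumed edge layout in the export: ..."_outV":1,"_inV":2...
    private static final Pattern EDGE =
            Pattern.compile("\"_outV\":(\\d+),\"_inV\":(\\d+)");

    /** Returns "outV->inV" for every edge that references an id with no vertex entry. */
    static List<String> findDanglingEdges(CharSequence graphson) {
        // Pass 1: collect the ids of all vertices present in the export.
        Set<Long> vertexIds = new HashSet<>();
        Matcher v = VERTEX.matcher(graphson);
        while (v.find()) {
            vertexIds.add(Long.parseLong(v.group(1)));
        }
        // Pass 2: report edges whose out or in vertex id was never seen.
        List<String> dangling = new ArrayList<>();
        Matcher e = EDGE.matcher(graphson);
        while (e.find()) {
            long out = Long.parseLong(e.group(1));
            long in = Long.parseLong(e.group(2));
            if (!vertexIds.contains(out) || !vertexIds.contains(in)) {
                dangling.add(out + "->" + in);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        // Tiny hand-written sample: vertex 3 is referenced by an edge but never defined.
        String sample = "{\"vertices\":[{\"name\":\"a\",\"_id\":1,\"_type\":\"vertex\"},"
                + "{\"name\":\"b\",\"_id\":2,\"_type\":\"vertex\"}],"
                + "\"edges\":[{\"_id\":10,\"_type\":\"edge\",\"_outV\":1,\"_inV\":2,\"_label\":\"knows\"},"
                + "{\"_id\":11,\"_type\":\"edge\",\"_outV\":1,\"_inV\":3,\"_label\":\"knows\"}]}";
        System.out.println(findDanglingEdges(sample)); // prints [1->3]
    }
}
```

Because the match is purely textual, a property value that happens to contain the same text would be miscounted; treat the result as a list of candidates to inspect rather than a definitive answer.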

--
You received this message because you are subscribed to the Google Groups "Gremlin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gremlin-user...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gremlin-users/7d0b8df9-52a7-4770-84e6-f670a18892a2%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

David Kleyla

Nov 4, 2015, 9:49:46 AM
to Gremlin-users
Thanks for the reply! I've split the GraphSON into smaller chunks and manually added the opening and closing syntax to each file in order to isolate the error. I've now loaded all but one file from the export using the LegacyGraphSONReader; however, judging by some test queries, the data doesn't appear to have loaded properly. When I run "vertex = g.V(1234).next()" in the Gremlin Console on a vertex id I know should have been imported, I get
org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException

The backing Cassandra nodes have data in them, but none of my queries return data. My assumption is that the data conversion from 2.4 to 3 didn't occur properly if I can't even retrieve a vertex by its internal ID. I'm wondering if I should have explored this method https://groups.google.com/forum/#!searchin/aureliusgraphs/legacy-graphson-script-input.groovy/aureliusgraphs/YnrlOiVM27M/cwZoDzI3AgAJ instead, using the "legacy-graphson-script-input.groovy" script for the upgrade, but I've had issues with it as well (the Spark process ran overnight without importing any data into the new DB). Should that method execute the same mutations as the LegacyGraphSONReader?

Stephen Mallette

Nov 4, 2015, 9:57:31 AM
to Gremlin-users
How do you know a particular id should be in there?  Titan doesn't respect ids - new ones would be generated on the load.  So if you expected "1234" to be present because that was the id in your old Titan graph it is likely a new id in the Titan 1.0 graph.  Better to look up by index to validate if your data is there or not.  Or perhaps you did that already and still have the problem?  

David Kleyla

Nov 4, 2015, 10:22:59 AM
to Gremlin-users
Ahh ok, I should have known those IDs would be regenerated. Unfortunately index queries are failing as well. Maybe I've defined them improperly? I have a composite index on our entity IDs, created when I defined the schema before any data was loaded.

TitanManagement m = graph.openManagement();
...
PropertyKey yappId = m.makePropertyKey(YappVertex.SY_YAPP_ID).dataType(String.class).cardinality(Cardinality.SINGLE).make();
VertexLabel yappVertex = m.makeVertexLabel(YappVertex.VERTEX_TYPE).partition().make();
m.buildIndex("yappIdIndex", Vertex.class).addKey(yappId).indexOnly(yappVertex).unique().buildCompositeIndex();
...
m.commit();


Running this query should return a yapp that was imported from one of my split GraphSON files:
 
tr = g.traversal();
v = tr.V().has('sy_yapp_id', '14134705042663177').next()

Unfortunately this is thrown instead.

Could not find a suitable index to answer graph query and graph scans are disabled: [(sy_yapp_id = 14134705042663177)]:VERTEX

I verified that the index exists by doing the following - 

TitanManagement m = g.openManagement();
TitanGraphIndex tgi = m.getGraphIndex("yappIdIndex");
PropertyKey yappId = m.getPropertyKey(YappVertex.SY_YAPP_ID);
logger.info("getBackingIndex: " + tgi.getBackingIndex());
logger.info("getFieldKeys: " + tgi.getFieldKeys());
logger.info("getIndexedElement: " + tgi.getIndexedElement());
logger.info("getIndexStatus:" + tgi.getIndexStatus(yappId));

With the response being - 

[2015-11-04 10:19:10,694] [main] [INFO ] [Main] - getBackingIndex: internalindex
[2015-11-04 10:19:10,903] [main] [INFO ] [Main] - getFieldKeys: [Lcom.thinkaurelius.titan.core.PropertyKey;@2e179f3e
[2015-11-04 10:19:10,904] [main] [INFO ] [Main] - getIndexedElement: interface com.thinkaurelius.titan.core.TitanVertex
[2015-11-04 10:19:10,904] [main] [INFO ] [Main] - getIndexStatus:ENABLED

So it seems as though everything should work properly... but it doesn't.  Did I define something improperly?

Daniel Kuppitz

Nov 4, 2015, 10:49:14 AM
to gremli...@googlegroups.com
Since you specified .indexOnly(yappVertex),

tr.V().has('sy_yapp_id', '14134705042663177')

is not a valid query to use the index. It should be:

tr.V().has(YappVertex.VERTEX_TYPE, 'sy_yapp_id', '14134705042663177')

Cheers,
Daniel


David Kleyla

Nov 4, 2015, 11:07:02 AM
to Gremlin-users
This also returns

org.apache.tinkerpop.gremlin.process.traversal.util.FastNoSuchElementException

Daniel Kuppitz

Nov 4, 2015, 11:10:59 AM
to gremli...@googlegroups.com
Yea, obviously a vertex with this ID property has never been created. I just wanted to point out that your query won't use the index.

Cheers,
Daniel


Stephen Mallette

Nov 4, 2015, 11:12:29 AM
to Gremlin-users
is there anything in that graph at all after you load?  what does g.V().hasNext() give you?

David Kleyla

Nov 4, 2015, 11:19:16 AM
to Gremlin-users
It appears empty even though Cassandra has roughly 4g of data across all nodes.

gremlin> g.V().hasNext()
No signature of method: com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.V() is applicable for argument types: () values: []
Possible solutions: tx(), io(org.apache.tinkerpop.gremlin.structure.io.Io$Builder), is(java.lang.Object), any(), any(groovy.lang.Closure), use([Ljava.lang.Object;)
Display stack trace? [yN] y
groovy.lang.MissingMethodException: No signature of method: com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.V() is applicable for argument types: () values: []
Possible solutions: tx(), io(org.apache.tinkerpop.gremlin.structure.io.Io$Builder), is(java.lang.Object), any(), any(groovy.lang.Closure), use([Ljava.lang.Object;)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:56)
 at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:46)
 at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110)
 at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:114)
 at groovysh_evaluate.run(groovysh_evaluate:3)
 at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
 at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:69)
 at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:185)
 at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:119)
 at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:94)
 at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
 at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
 at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
 at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:123)
 at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:58)
 at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
 at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
 at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
 at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
 at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:82)
 at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
 at org.apache.tinkerpop.gremlin.console.Console.<init>(Console.groovy:144)
 at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
 at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:303)

Stephen Mallette

Nov 4, 2015, 11:24:10 AM
to Gremlin-users
that error doesn't imply "empty".  your syntax is bad.  "g" in your case is a "Graph" instance.  you need a TraversalSource to issue a traversal over V.

g.traversal().V().hasNext()

of course, we don't recommend that style so your Graph instance is better instantiated as:

graph = TitanFactory.open(...)
g = graph.traversal()
g.V().hasNext()

What does that return?



David Kleyla

Nov 4, 2015, 11:33:04 AM
to Gremlin-users
My mistake, 

Could not find a suitable index to answer graph query and graph scans are disabled: [()]:VERTEX

I tried adding the "query.force-index=true" property to my titan-cassandra-es.properties file but it still returned the above.

David Kleyla

Nov 4, 2015, 11:36:30 AM
to Gremlin-users
Ignore that. I meant to try false, obviously. With "query.force-index=false", g.V().hasNext() returned true.


Stephen Mallette

Nov 4, 2015, 11:47:44 AM
to Gremlin-users
If the result was "true" you have some data in your graph so that's a decent sign.  You seem to have an indexing problem somewhere as an additional problem to loading all of your data.  I think you should sort out the indexing issues first on a much smaller dataset as the first step.  If your schema is messed up from the outset that's not going to give you a good start and you won't be able to easily validate your data load anyway.

David Kleyla

Nov 5, 2015, 9:57:23 AM
to Gremlin-users
Ok, I'll circle back to the schema/index creation. I think this may be a problem: Titan 0.4.4 didn't support labels on vertices, so I included a property, 'vertex_type', on all vertices. Would my method of defining the index with ".indexOnly(yappVertex)" make these vertices unretrievable after the data import? If that's the case, I assume removing that constraint would return vertices of other, undesired labels with that property, and I'd have to add another filter to the query, e.g. tr.V().has('vertex_type', 'yapp').has('sy_yapp_id', '14134705042663177').next()? That would be a shame going forward with new data and vertex labels, because I wouldn't be able to use the .hasLabel() filter, which I'd assume is more performant than the chained "has" statements. Is this the case? If I wrap them in a "match" block, would that improve performance? These are all assumptions at this point.

Thanks 


David Kleyla

Nov 16, 2015, 9:52:03 AM
to Gremlin-users
In case anyone else encounters this error on import: it was caused by a vertex that was referenced on an imported edge but did not exist in the GraphSON. The vertex id on the edge is still listed as normal, i.e. "_outV":34142380,"_inV":34145256," but the vertex itself is not in the vertex collection. This makes it impossible to clean the GraphSON by search and replace alone; you have to iterate through every edge and verify the existence of each adjacent vertex.

I ended up decompiling the LegacyGraphSONReader and adding some logging and wrapping the Edge creation logic in a conditional for null adjacent Vertices.  If the Vertex doesn't exist in the list, I wouldn't want the edges created anyway.  Obviously this wouldn't be an issue if those Edges were deleted when the Vertices were during normal use, but this is a dev environment and sometimes things don't function as they should ;) 
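
The guard described above can be sketched roughly as follows. This is not the LegacyGraphSONReader source: the class GuardedEdgeLoad, the loadEdges method, and the use of plain strings as stand-ins for loaded vertices are all assumptions made to keep the example self-contained. It only shows the shape of the check: look up both endpoints, log and skip the edge if either is null, otherwise create it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GuardedEdgeLoad {
    /**
     * Creates an edge only when both endpoints were loaded; returns how many edges were skipped.
     * vertexCache maps old vertex ids to loaded vertices (strings stand in for real vertices).
     * Each entry of edges is {outV id, inV id}.
     */
    static int loadEdges(Map<Long, String> vertexCache, long[][] edges, List<String> created) {
        int skipped = 0;
        for (long[] edge : edges) {
            String out = vertexCache.get(edge[0]);
            String in = vertexCache.get(edge[1]);
            if (out == null || in == null) {
                // A missing endpoint is what surfaces as "Invalid vertex provided: null".
                System.err.println("Skipping edge " + edge[0] + " -> " + edge[1]
                        + ": adjacent vertex missing");
                skipped++;
                continue;
            }
            created.add(out + "->" + in); // stand-in for the real edge creation call
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<Long, String> cache = new HashMap<>();
        cache.put(34142380L, "vA"); // 34145256 is deliberately absent
        long[][] edges = { {34142380L, 34142380L}, {34142380L, 34145256L} };
        List<String> created = new ArrayList<>();
        int skipped = loadEdges(cache, edges, created);
        System.out.println(created + " skipped=" + skipped); // prints [vA->vA] skipped=1
    }
}
```

Logging each skipped edge, rather than dropping it silently, leaves a record of exactly which relationships did not make it into the new graph.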