Hi,
I've started trying to do OLAP on our production graph cluster and am running into a problem with really simple queries (like g.V(1)). The problem always looks like this (the id is always the same):
(0 + 80) / 3957]org.apache.spark.SparkException: Job aborted due to stage failure: Task 50 in stage 0.0 failed 4 times, most recent failure: Lost task 50.3 in stage 0.0 (TID 229, 4.19.20.147): java.lang.IllegalStateException: Could not find type for id: 32269
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)
at com.thinkaurelius.titan.hadoop.formats.util.input.current.TitanHadoopSetupImpl.getTypeInspector(TitanHadoopSetupImpl.java:50)
at com.thinkaurelius.titan.hadoop.formats.util.TitanVertexDeserializer.<init>(TitanVertexDeserializer.java:39)
at com.thinkaurelius.titan.hadoop.formats.util.GiraphInputFormat.lambda$setConf$0(GiraphInputFormat.java:48)
at com.thinkaurelius.titan.hadoop.formats.util.GiraphInputFormat$RefCountedCloseable.acquire(GiraphInputFormat.java:70)
at com.thinkaurelius.titan.hadoop.formats.util.GiraphRecordReader.<init>(GiraphRecordReader.java:33)
at com.thinkaurelius.titan.hadoop.formats.util.GiraphInputFormat.createRecordReader(GiraphInputFormat.java:38)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
. . ..
I opened up an OLTP shell and looked around:
gremlin> g.V(32269)
Could not find type for id: 32269
Display stack trace? [yN] y
java.lang.IllegalStateException: Could not find type for id: 32269
at com.google.common.base.Preconditions.checkState(Preconditions.java:197)
at com.thinkaurelius.titan.graphdb.types.vertices.TitanSchemaVertex.toString(TitanSchemaVertex.java:152)
at
. . .
gremlin> g.V(32269).valueMap()
==>[:]
gremlin> g.V(32269).label()
==>vertex
Well, this vertex didn't look like something that was helping me, and after googling related errors it looks like data corruption (in fact, data corruption problems are why I want to do OLAP). I decided to try removing the vertex, and have now removed it:
gremlin> g.V(32269).drop()
gremlin> g.tx().commit()
==>null
gremlin> g.V(32269)
gremlin>
But alas, the graph computer still encounters the same error. After reading more of the Titan source code, it looks like this is a schema element, so deleting it was more dangerous than I had thought at the time, though I haven't seen any collateral damage. Perhaps I removed something else; schema and data are perhaps not in the same space.
Now I am wondering how to proceed. One idea is to swallow the exception so that I can succeed at reading in the pieces of the schema that work:
diff --git a/titan-hadoop-parent/titan-hadoop-core/src/main/java/com/thinkaurelius/titan/hadoop/formats/util/input/current/TitanHadoopSetupImpl.jav
index 05dda3e..2087f65 100644
--- a/titan-hadoop-parent/titan-hadoop-core/src/main/java/com/thinkaurelius/titan/hadoop/formats/util/input/current/TitanHadoopSetupImpl.java
+++ b/titan-hadoop-parent/titan-hadoop-core/src/main/java/com/thinkaurelius/titan/hadoop/formats/util/input/current/TitanHadoopSetupImpl.java
@@ -47,8 +47,12 @@ public class TitanHadoopSetupImpl extends TitanHadoopSetupCommon {
assert k instanceof TitanSchemaVertex;
TitanSchemaVertex s = (TitanSchemaVertex)k;
if (sc.hasName()) {
- Preconditions.checkNotNull(name);
+ try {
+ Preconditions.checkNotNull(name);
+ } catch (Exception ex) {
+ // ??
+ }
}
TypeDefinitionMap dm = s.getDefinition();
Preconditions.checkNotNull(dm);
But I'm not sure how safe this would be.
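To make the idea concrete outside of Titan, here is a minimal, self-contained sketch of the variant I have in mind: catch only the IllegalStateException, remember which ids were skipped, and log them rather than leaving the catch block empty. Everything in it (TolerantTypeLoader, requireName, load, and the map of id to name) is hypothetical and only stands in for the real loading code; none of it is Titan's API, and I am assuming the broken type can be modelled as an id with no name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the type-loading loop; none of these names are Titan APIs.
class TolerantTypeLoader {

    /** Outcome of a load: the usable types plus the ids that had to be skipped. */
    static final class Result {
        final Map<Long, String> resolved = new LinkedHashMap<>();
        final List<Long> skipped = new ArrayList<>();
    }

    // Stand-in for the per-type precondition check that currently aborts the whole job.
    static String requireName(long id, String name) {
        if (name == null) {
            throw new IllegalStateException("Could not find type for id: " + id);
        }
        return name;
    }

    // Loads every type that passes the check; records and logs the ones that do not.
    static Result load(Map<Long, String> rawTypes) {
        Result result = new Result();
        for (Map.Entry<Long, String> e : rawTypes.entrySet()) {
            try {
                result.resolved.put(e.getKey(), requireName(e.getKey(), e.getValue()));
            } catch (IllegalStateException ex) {
                // Keep a record of the bad id instead of dropping it silently.
                result.skipped.add(e.getKey());
                System.err.println("Skipping schema type: " + ex.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, String> raw = new LinkedHashMap<>();
        raw.put(1L, "person");
        raw.put(32269L, null); // the id from the error above, modelled as a nameless type
        raw.put(2L, "knows");
        Result r = load(raw);
        System.out.println("resolved=" + r.resolved + " skipped=" + r.skipped);
    }
}
```

The open question is the same as with the patch: any vertex or edge that still refers to a skipped type would have nothing to resolve against later, so at least the list of skipped ids tells me where to look.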
Thoughts?
Thanks for any ideas,
David