Unknown compressor type with SparkGraphComputer

Ajay Srivastava

Oct 4, 2017, 7:11:30 AM
to JanusGraph users
Hi,

I am executing a Gremlin query using SparkGraphComputer and get the following exception:

gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured dev-3/192.101.167.171:8182

gremlin> :> olapgraph.traversal().withComputer(org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer).V().count()

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: Unknown compressor type for id: 2
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer$CompressionType.getFromId(StringSerializer.java:273)
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:104)
at org.janusgraph.graphdb.database.serialize.attribute.StringSerializer.read(StringSerializer.java:38)
at org.janusgraph.graphdb.database.serialize.StandardSerializer.readObjectInternal(StandardSerializer.java:236)
at org.janusgraph.graphdb.database.serialize.StandardSerializer.readObject(StandardSerializer.java:224)
at org.janusgraph.graphdb.database.EdgeSerializer.readPropertyValue(EdgeSerializer.java:203)
at org.janusgraph.graphdb.database.EdgeSerializer.readPropertyValue(EdgeSerializer.java:193)
at org.janusgraph.graphdb.database.EdgeSerializer.parseRelation(EdgeSerializer.java:129)
at org.janusgraph.hadoop.formats.util.JanusGraphVertexDeserializer.readHadoopVertex(JanusGraphVertexDeserializer.java:100)
at org.janusgraph.hadoop.formats.util.GiraphRecordReader.nextKeyValue(GiraphRecordReader.java:60)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:168)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$4.advance(IteratorUtils.java:298)
at org.apache.tinkerpop.gremlin.util.iterator.IteratorUtils$4.hasNext(IteratorUtils.java:269)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Here is my olapgraph configuration:

# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.janusgraph.hadoop.formats.hbase.HBaseInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat

gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output

#
# JanusGraph HBase InputFormat configuration
#
janusgraphmr.ioformat.conf.storage.backend=hbase
janusgraphmr.ioformat.conf.storage.hostname=dev-1
janusgraphmr.ioformat.conf.storage.hbase.table=tryJanus

#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer

Is my configuration correct?
Have I missed any property here?
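
One way to sanity-check the configuration above is to list the settings the input format would receive. The sketch below is a hypothetical helper, not part of JanusGraph or TinkerPop. It assumes, as the key names suggest, that entries under the janusgraphmr.ioformat.conf. prefix are handed to the JanusGraph input format with the prefix removed, so the remaining storage.* values can be compared by eye with the properties file used for the OLTP graph. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class OlapConfigCheck {
    // Prefix taken from the configuration posted in this thread.
    static final String PREFIX = "janusgraphmr.ioformat.conf.";

    // Returns only the prefixed entries, with the prefix stripped, sorted by key.
    static Map<String, String> storageSettings(Properties props) {
        Map<String, String> out = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                out.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // A few lines copied from the posted olapgraph configuration.
        String posted = String.join("\n",
            "janusgraphmr.ioformat.conf.storage.backend=hbase",
            "janusgraphmr.ioformat.conf.storage.hostname=dev-1",
            "janusgraphmr.ioformat.conf.storage.hbase.table=tryJanus",
            "spark.master=local[4]");
        Properties props = new Properties();
        props.load(new StringReader(posted));
        // Print each effective setting; spark.* keys are filtered out.
        storageSettings(props).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

If the backend, hostname, or table printed here differ from the OLTP graph's settings, the Spark job may be reading different data than the embedded queries do. Whether that explains this exception is not established in the thread.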

Regards,
Ajay

Ajay Srivastava

Oct 5, 2017, 7:17:07 AM
to JanusGraph users
Compression in all the column families is set to gz, which is the second entry (id = 1) in the enum of both HBase and JanusGraph, so there should not be any problem running this query.
This query and others work well in embedded mode and with Gremlin's OLTP graph. Is StringSerializer called only when SparkGraphComputer is used?
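
To make the failure mode in the top stack frame concrete, here is a minimal sketch of an id-keyed enum lookup. It is a simplified stand-in, not JanusGraph's source: the constant names and ids are assumptions that follow the description above (gz as id 1), and only the exception message is copied from the trace.

```java
final class CompressorLookupSketch {
    // Hypothetical compression enum keyed by a numeric id read from serialized data.
    enum CompressionType {
        NO_COMPRESSION(0), GZIP(1);

        private final int id;

        CompressionType(int id) {
            this.id = id;
        }

        // Maps an id back to a constant; any id without a constant is rejected.
        static CompressionType getFromId(int id) {
            for (CompressionType t : values()) {
                if (t.id == id) {
                    return t;
                }
            }
            throw new IllegalArgumentException("Unknown compressor type for id: " + id);
        }
    }

    // Returns the constant name, or the exception message when the id is unknown.
    static String describe(int id) {
        try {
            return CompressionType.getFromId(id).name();
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (int id = 0; id <= 2; id++) {
            System.out.println(id + " -> " + describe(id));
        }
    }
}
```

In this model, ids 0 and 1 resolve and id 2 produces the message seen in the trace. The trace shows the lookup happening inside StringSerializer.read while a property value is being parsed, which suggests the id comes from the serialized value itself. HBase column family compression is applied transparently at the storage layer, so the gz setting may not be what this lookup reads. That reading is a hypothesis and is not confirmed anywhere in the thread.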

I have also run a major compaction twice in HBase, but the problem remains.

Is this a bug?
Is anyone already running SparkGraphComputer with HBase as the backend without any problem?


Regards,
Ajay

--
You received this message because you are subscribed to the Google Groups "JanusGraph users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to janusgraph-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/janusgraph-users/89F8CB46-D78F-4DB5-ABAF-48F576E98F60%40guavus.com.
For more options, visit https://groups.google.com/d/optout.

marc.de...@gmail.com

Oct 5, 2017, 7:51:58 AM
to JanusGraph users
Hi Ajay,

I have never tried this myself. Could you post the remote.yaml and gremlin-server.yaml files too (if they deviate from the ones in the distribution)? That way others can more easily spot serialization config problems.

Cheers, Marc

On Wednesday, October 4, 2017 at 13:11:30 UTC+2, Ajay Srivastava wrote:

Ajay Srivastava

Oct 5, 2017, 7:59:35 AM
to marc.de...@gmail.com, JanusGraph users
Hi Marc,

gremlin-server.yaml:

host: dev-3
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  graph: conf/gremlin-server/socket-janusgraph-hbase-server.properties,
  olapgraph: conf/hadoop-graph/read-hbase.properties }
plugins:
  - janusgraph.imports
scriptEngines: {
  gremlin-groovy: {
    imports: [java.lang.Math],
    staticImports: [java.lang.Math.PI],
    scripts: [scripts/empty-sample.groovy]}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: {ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
maxAccumulationBufferComponents: 1024
resultIterationBatchSize: 64
writeBufferLowWaterMark: 32768
writeBufferHighWaterMark: 65536
ssl: {
  enabled: false}

remote.yaml:

hosts: [dev-3]
port: 8182
serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}


Regards,
Ajay


marc.de...@gmail.com

Oct 5, 2017, 9:37:42 AM
to JanusGraph users
Hi Ajay,

I have no idea what is happening here, but since you use the Gremlin Console, you could try remote-objects.yaml as an alternative.

Anyone else?

Marc

On Thursday, October 5, 2017 at 13:59:35 UTC+2, Ajay Srivastava wrote: