Hi team,
I am using Titan 1.0.0 to read a graph from Cassandra with SparkGraphComputer.
My script is:
graph = GraphFactory.open('conf/hadoop-graph/read-cassandra.properties')
g = graph.traversal(computer(SparkGraphComputer))
g.V().has("name","root").both().count()
The vertex "root" has 200000 edges.
Running this, I get the exception below. How do I set the maximum length (frame size) of a Thrift message? Is there any reference documentation for this?
==>hadoopgraph[cassandrainputformat->gryooutputformat]
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
==>1456989198485
[Stage 0:===============================>
23:14:37 ERROR org.apache.spark.executor.Executor - Exception in task 292.0 in stage 0.0 (TID 292)
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (18440949) larger than max length (15728640)!
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:402)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:408)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:331)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:177)
Configuration file:
#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.thinkaurelius.titan.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.hadoop.inputLocationRequired=false
#
# Titan Cassandra InputFormat configuration
#
titanmr.ioformat.conf.storage.backend=cassandrathrift
titanmr.ioformat.conf.storage.hostname=10.25.152.154
titanmr.ioformat.conf.storage.port=9160
#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.memory=1g
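
A note on the numbers in the exception: the limit of 15728640 bytes is exactly 15 * 1024 * 1024, i.e. 15 MB, and the rejected frame of 18440949 bytes is roughly 17.6 MB. The fragment below is only a sketch of what raising that limit might look like in this same properties file. It assumes that the Cassandra Hadoop input format picks up the `cassandra.thrift.framed.size_mb` key from the Hadoop configuration built from this file, and that the server-side `thrift_framed_transport_size_in_mb` setting in cassandra.yaml would have to be raised to at least the same value. Neither assumption has been verified against this setup, and the value 32 is an arbitrary choice that merely leaves headroom above 17.6 MB.

```properties
#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
# ASSUMPTION: client-side Thrift frame limit in MB (the 15 MB in the error is 15 * 1024 * 1024 bytes).
# The failing frame was 18440949 bytes (about 17.6 MB), so anything below 18 would still fail.
cassandra.thrift.framed.size_mb=32
```

If the key is honored, the "max length" reported in the exception should change from 15728640 to 33554432; if the number stays the same, the setting is not reaching the record reader.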