Config to set Cassandra Thrift max length


Abirami Senthil

Mar 3, 2016, 6:50:45 AM
to Aurelius
Hi team,

I am using Titan 1.0.0 to read a graph from Cassandra with SparkGraphComputer.

My script is:

graph = GraphFactory.open('conf/hadoop-graph/read-cassandra.properties')

g = graph.traversal(computer(SparkGraphComputer))
g.V().has("name","root").both().count()


The vertex "root" has 200000 edges.

I am getting the exception below. How do I set the maximum length of a Thrift message? Do you have any reference?


==>hadoopgraph[cassandrainputformat->gryooutputformat]
==>graphtraversalsource[hadoopgraph[cassandrainputformat->gryooutputformat], sparkgraphcomputer]
==>1456989198485
[Stage 0:===============================>      23:14:37 ERROR org.apache.spark.executor.Executor  - Exception in task 292.0 in stage 0.0 (TID 292)
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (18440949) larger than max length (15728640)!
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:402)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:408)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:331)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:177)
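
The two numbers in the exception are byte counts. As a rough sketch, assuming the limit is configured in whole megabytes of 1024 × 1024 bytes (the reported max length, 15728640, is exactly 15 of those), the following works out the smallest setting that would admit the rejected frame. The class and method names are made up for illustration and are not part of Titan or Cassandra.

```java
class FrameSizeCheck {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Smallest whole number of MB that can hold a frame of the given size
    // (integer ceiling division).
    static long minFrameSizeMb(long frameBytes) {
        return (frameBytes + BYTES_PER_MB - 1) / BYTES_PER_MB;
    }

    public static void main(String[] args) {
        long maxLength = 15_728_640L;  // "max length" from the exception
        long frameSize = 18_440_949L;  // "Frame size" from the exception

        System.out.println("current limit:  " + maxLength / BYTES_PER_MB + " MB");
        System.out.println("minimum needed: " + minFrameSizeMb(frameSize) + " MB");
    }
}
```

For this particular frame the minimum comes out at 18 MB. A vertex with more edges would produce a larger frame, so some headroom above that is probably sensible.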


Configuration file:

#
# Hadoop Graph Configuration
#
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=com.thinkaurelius.titan.hadoop.formats.cassandra.CassandraInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.memoryOutputFormat=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat

gremlin.hadoop.deriveMemory=false
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.hadoop.inputLocationRequired=false

#
# Titan Cassandra InputFormat configuration
#
titanmr.ioformat.conf.storage.backend=cassandrathrift
titanmr.ioformat.conf.storage.hostname=10.25.152.154
titanmr.ioformat.conf.storage.port=9160


#
# Apache Cassandra InputFormat configuration
#
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

#
# SparkGraphComputer Configuration
#
spark.master=local[4]
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.executor.memory=1g


Matt Aldridge

Mar 3, 2016, 4:52:08 PM
to Aurelius
I haven't tried it with Titan 1.0 yet, but with titan-hadoop and Titan 0.5 I set the cassandra.thrift.framed.size_mb property to increase the frame size.

Jean-Baptiste Musso

Mar 3, 2016, 6:24:12 PM
to aureliu...@googlegroups.com
Abirami,

You might want to have a look at this thread and the answer by Jason Plurad:

https://groups.google.com/d/msg/aureliusgraphs/LEiO42jt9Ao/HPCy0eJC_a8J

This was for Titan v0.5.x, but I suppose it still applies to v1.0.0.
Basically, when using Hadoop (OLAP), you need to tweak:

titan.hadoop.input.conf.storage.cassandra.thrift.frame-size=20
titan.hadoop.output.conf.storage.cassandra.thrift.frame-size=20

Notice how this differs from:

cassandra.thrift.framed.size_mb

Jean-Baptiste

Abirami Senthil

Mar 4, 2016, 4:24:39 AM
to Aurelius
Thanks Jean-Baptiste.

The name is slightly different in Titan 1.0.0. I used:

storage.cassandra.frame-size-mb = 200

and it worked.

However, I couldn't find the Thrift setting for the OLAP Hadoop graph. What needs to be added to the property name for the Hadoop graph to pick up the frame size setting?
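
One possible reading, not confirmed anywhere in this thread: the configuration file in the first post hands Titan storage options to the input format by prefixing them with `titanmr.ioformat.conf.` (for example `titanmr.ioformat.conf.storage.backend`). By that pattern, the `storage.cassandra.frame-size-mb` option might be spelled with the same prefix. The other candidate is the `cassandra.thrift.framed.size_mb` key that Matt reported using with Titan 0.5. The sketch below only assembles those two candidate property lines. The class and method names are hypothetical, and whether `CassandraInputFormat` in Titan 1.0.0 honors either key is an untested assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OlapFrameSizeKeys {
    // Prefix used in read-cassandra.properties for Titan storage options.
    static final String TITAN_MR_PREFIX = "titanmr.ioformat.conf.";

    // Builds the candidate overrides in a stable (insertion) order.
    static Map<String, String> candidateOverrides(int frameSizeMb) {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumption: the key that worked for the regular graph, carried
        // through the same prefix as the other storage.* options.
        props.put(TITAN_MR_PREFIX + "storage.cassandra.frame-size-mb",
                  String.valueOf(frameSizeMb));
        // Candidate from Matt's reply (reported for Titan 0.5, untested on 1.0).
        props.put("cassandra.thrift.framed.size_mb", String.valueOf(frameSizeMb));
        return props;
    }

    public static void main(String[] args) {
        // Print the candidates as properties-file lines.
        candidateOverrides(200).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The printed lines could be added to `conf/hadoop-graph/read-cassandra.properties` one at a time to see whether the frame size exception goes away.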

John

Mar 10, 2016, 11:10:55 AM
to Aurelius
Hello Abirami,

Did you ever get past the OLAP Thrift frame size error? If so, can you please post the Thrift-related config options from the properties file?

Thanks.