cassandrathrift frame-size configuration not taking effect


Chao Wang

Oct 31, 2014, 1:31:41 PM
to aureliu...@googlegroups.com
Added storage.cassandra.thrift.frame-size=100 to titan-cassandra-es.properties
Then used the following to reindex:
TitanIndexRepair.cassandraRepair("conf/titan-cassandra-es.properties", "byDocumentId", "", "org.apache.cassandra.dht.Murmur3Partitioner")

Got the exception below:
13:18:37 WARN  org.apache.hadoop.mapred.LocalJobRunner  - job_local1757555819_0001
java.lang.Exception: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (19784570) larger than max length (15728640)!
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (19784570) larger than max length (15728640)!


I tried this on both 0.5.0 and 0.5.1 and got the same exception. Am I doing anything wrong?

Chao Wang

Oct 31, 2014, 5:06:26 PM
to aureliu...@googlegroups.com
I also changed the Cassandra server configuration for the max frame size, but I'm still getting the same exception.

Dan LaRocque

Oct 31, 2014, 9:03:48 PM
to aureliu...@googlegroups.com
Hi Chao,

Please share your titan-cassandra-es.properties (as used for the repair) and the exception stacktrace.

thanks,
Dan
--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/aureliusgraphs/4dc3818b-7e55-4a05-b5d0-49c7deb637bd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jean-Baptiste Musso

Mar 11, 2015, 7:50:41 AM
to aureliu...@googlegroups.com
Greetings,

Using Titan v0.5.3-hadoop2, we're having the same issue when starting a reindexing process using the Cassandra helper.

Here's the .properties file:

storage.backend=cassandrathrift
storage.hostname=127.0.0.1
storage.cassandra.keyspace=dcbraindev
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.25

# Automatically start an ES node in Titan's JVM...
#index.search.backend=elasticsearch
#index.search.directory=../db/es
#index.search.elasticsearch.client-only=false
#index.search.elasticsearch.local-mode=true

## ... or connect to an already-running ES process on localhost
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
index.search.elasticsearch.client-only = true
schema.default=blueprints
storage.cassandra.thrift.frame-size=512

And here's the relevant part of the log, including the stack trace: https://gist.github.com/jbmusso/fdfd51f459c7053ba738

java.lang.Exception: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (509336132) larger than max length (15728640)!
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (509336132) larger than max length (15728640)!

I would happily give more information if required.

Thanks,

Jean-Baptiste

Jean-Baptiste Musso

Mar 11, 2015, 9:33:11 AM
to aureliu...@googlegroups.com
On Wed, Mar 11, 2015 at 12:50 PM, Jean-Baptiste Musso <jbm...@gmail.com> wrote:
>
> Using Titan v0.5.3-hadoop2, we're having the same issue when starting a reindexing process using the Cassandra helper.

Just checked - we're also experiencing this issue with v0.5.4-hadoop2.

Jean-Baptiste

David

Mar 13, 2015, 12:11:05 PM
to aureliu...@googlegroups.com
For what it is worth, I am receiving the same error performing a g.V().count() using titan-0.5.4-hadoop2, Hadoop 2.2.0.0-2041, on a graph of 10 million vertices.
Any luck in figuring this out?


gremlin> g=HadoopFactory.open('./my.properties')
==>titangraph[hadoop:titancassandrainputformat->noopoutputformat]
gremlin> g.V().count()
.......

java.lang.Exception: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (21415423) larger than max length (15728640)!
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (21415423) larger than max length (15728640)!
    at org.apache.hadoop.mapreduce.lib.chain.Chain.joinAllThreads(Chain.java:526)
    at org.apache.hadoop.mapreduce.lib.chain.ChainMapper.run(ChainMapper.java:169)
......


The renaming of the properties from Faunus to Titan-Hadoop is not completely documented as far as I can tell, except
by a post from Dan here: https://groups.google.com/forum/#!topic/aureliusgraphs/4ya9Dwa_Tkc

(Some of the example properties in the conf/hadoop directory for 0.5.4 show old properties, such as
#cassandra.thrift.framed.size_mb=49 (commented out), but I think that has changed to storage.cassandra.thrift.frame-size, if I managed
to weave my way back through all the namespaces in CassandraThriftStoreManager.java, AbstractCassandraStoreManager.java,
and GraphDatabaseConfiguration.java. It also isn't clear what all the settings for the .output.formats can be now, etc.)


# input graph parameters
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat
titan.hadoop.input.conf.storage.backend=cassandrathrift
titan.hadoop.input.conf.storage.hostname=x.x.x.x
titan.hadoop.input.conf.storage.port=9160
titan.hadoop.input.conf.storage.cassandra.keyspace=titan

storage.cassandra.thrift.frame-size=100
storage.cassandra.thrift.max_message_size_mb=100
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner

# output graph parameters
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.noop.NoOpOutputFormat
titan.hadoop.jobdir.overwrite=true
titan.hadoop.output.infer-schema=true
# controls size of transaction
mapred.max.split.size=5242880
# mapred.reduce.tasks=10
mapred.job.reuse.jvm.num.tasks=-1
titan.hadoop.sideeffect.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

Stephen Mallette

Mar 13, 2015, 12:25:03 PM
to aureliu...@googlegroups.com
It should help if you increase:

storage.cassandra.thrift.frame-size

well beyond the size in the error message. So for "Frame size (21415423) larger than max length (15728640)!", bump that setting to something like 64 (size in MB).
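For reference, here is the byte-to-MB arithmetic behind that suggestion (a quick sketch, assuming the setting is interpreted in megabytes as the error message implies):

```python
import math

# The Thrift exception reports sizes in bytes, while Titan's
# storage.cassandra.thrift.frame-size is in megabytes.
# 15728640 bytes is the 15 MB default limit:
assert 15 * 1024 * 1024 == 15728640

# For "Frame size (21415423) larger than max length (15728640)!",
# the smallest MB setting that would fit the frame is:
needed_mb = math.ceil(21415423 / (1024 * 1024))
assert needed_mb == 21  # rounding up to something like 64 leaves headroom
```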


David

Mar 14, 2015, 10:37:50 AM
to aureliu...@googlegroups.com
Thanks Stephen.

The sizes I posted previously were definitely too small, but setting:

storage.cassandra.thrift.frame-size=2000
storage.cassandra.thrift.max_message_size_mb=2048

or to any other size hasn't solved the problem for me. Hopefully I spelled the properties correctly.

Still not sure what the problem is.

The "larger than max length (15728640)" error continues to appear, as if the default 15 MB
setting were still in effect.

09:28:15 INFO  org.apache.hadoop.mapreduce.Job  - Task Id : attempt_1426281278753_0013_m_000116_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (18426391) larger than max length (15728640)!

Jason Plurad

Mar 15, 2015, 9:59:20 PM
to aureliu...@googlegroups.com
Just figured it out... you need to prefix the property so it gets passed along.


So if you're using TitanFactory to do OLTP with Titan/Cassandra, use:

# The Thrift frame size in megabytes
storage.cassandra.thrift.frame-size=20


But if you're using HadoopFactory to do an OLAP job with Titan-Hadoop, use:


# input graph parameters
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat
# The Thrift frame size in megabytes
titan.hadoop.input.conf.storage.cassandra.thrift.frame-size=20

# output data (graph or statistic) parameters
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraOutputFormat
# The Thrift frame size in megabytes
titan.hadoop.output.conf.storage.cassandra.thrift.frame-size=20


I haven't tried to do reindexing, but maybe the answer there is along the same lines.
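To make the prefixing rule concrete, here is a tiny sketch of the idea using plain Python dictionaries (illustrative only, not Titan's actual implementation): keys under titan.hadoop.input.conf. get forwarded to the input storage backend with the prefix stripped, while bare storage.* keys in the same file never reach it.

```python
def scoped(conf, prefix):
    # Keep only keys under `prefix`, with the prefix removed.
    return {k[len(prefix):]: v for k, v in conf.items() if k.startswith(prefix)}

job = {
    "titan.hadoop.input.format":
        "com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat",
    "titan.hadoop.input.conf.storage.cassandra.thrift.frame-size": "20",
    # A bare key like this is invisible to the Hadoop input side:
    "storage.cassandra.thrift.frame-size": "20",
}

backend_conf = scoped(job, "titan.hadoop.input.conf.")
assert backend_conf == {"storage.cassandra.thrift.frame-size": "20"}
```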


Hope this helps,
Jason




David

Mar 16, 2015, 5:48:06 PM
to aureliu...@googlegroups.com
Here is a config file that finally worked for me for a simple count of edges, using Titan 0.5.4 rebuilt for Hadoop 2.6.


# input graph parameters
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat
titan.hadoop.input.conf.storage.backend=cassandrathrift
titan.hadoop.input.conf.storage.hostname=<ip.address.here>
titan.hadoop.input.conf.storage.port=9160
titan.hadoop.input.conf.storage.cassandra.keyspace=titan
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner


# this seems to get picked up... see org.apache.cassandra.hadoop.ConfigHelper
cassandra.thrift.framed.size_mb=30


# output data (graph or statistic) parameters
titan.hadoop.sideeffect.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.graphson.GraphSONOutputFormat



Prior to starting the Gremlin shell (running from Gremlin and not a Groovy script at the moment), source the environment:

# hdp version
export HDP_VERSION="2.2.0.0-2041"
# location of hadoop home and conf
export HADOOP_PREFIX="/usr/hdp/$HDP_VERSION/hadoop"
# include the native libs and the hdp.version
export TITAN_JAVA_OPTS="-Djava.library.path=/usr/hdp/$HDP_VERSION/hadoop/lib/native -Dhdp.version=$HDP_VERSION"

./gremlin.sh
g=HadoopFactory.open('./configfile.properties')
g.E().count()

Tsao Ranger

Mar 17, 2015, 5:37:18 AM
to aureliu...@googlegroups.com
Sometimes this happens because of the Thrift library version. Check which Thrift library version you have.

On Saturday, November 1, 2014 at 1:31:41 AM UTC+8, Chao Wang wrote:

Daniel Mau

Jun 5, 2015, 8:45:05 AM
to aureliu...@googlegroups.com
Hi all-

Please help!
We are also in the process of reindexing a few properties, and have faced the same issue with frame size.

We are using:
hadoop-1.2.1
We have tried using both titan-0.5.4-hadoop1 and titan-0.5.1-hadoop1

Our config file (titan-cassandra-es.properties) has these relevant properties:
storage.backend=cassandrathrift

# We have all three variants below to try to cover our bases
storage.cassandra.thrift.frame_size_mb=400

storage.cassandra.thrift.frame-size=400

cassandra.thrift.framed.size_mb=400
cassandra.thrift.message.max_size_mb=401



We are still getting the same exception as Chao above:

java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: Frame size (20769719) larger than max length (15728640)!
	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:400)
	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:406)
	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:329)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:102)
	at com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraRecordReader.getProgress(TitanCassandraRecordReader.java:78)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:513)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:538)
	at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.thrift.transport.TTransportException: Frame size (20769719) larger than max length (15728640)!
	at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:137)
	at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:802)
	at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:786)
	at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:362)
	... 17 more


Please help! Thanks in advance,
-Daniel

Daniel Mau

Jun 5, 2015, 8:51:16 AM
to aureliu...@googlegroups.com
Sorry, forgot to mention: we're using Cassandra 2.0.9.
-Daniel

Daniel Kuppitz

Jun 5, 2015, 9:23:01 AM
to aureliu...@googlegroups.com
Hi Daniel,

The error message still shows the default Thrift frame size (15 MB), so your configuration must be wrong. I guess you overlooked Jason's answer:

So if you're using TitanFactory to do OLTP with Titan/Cassandra, use:
# The Thrift frame size in megabytes
storage.cassandra.thrift.frame-size=20


But if you're using HadoopFactory to do an OLAP job with Titan-Hadoop, use:

# input graph parameters
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat
# The Thrift frame size in megabytes
titan.hadoop.input.conf.storage.cassandra.thrift.frame-size=20
# output data (graph or statistic) parameters
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraOutputFormat
# The Thrift frame size in megabytes
titan.hadoop.output.conf.storage.cassandra.thrift.frame-size=20

Cheers,
Daniel



Daniel Mau

Jun 5, 2015, 9:27:17 AM
to aureliu...@googlegroups.com
Thanks Daniel, I'll try that.  Since the reindexing process uses TitanFactory rather than HadoopFactory, I overlooked those options.

-Daniel

Daniel Mau

Jun 5, 2015, 3:56:21 PM
to aureliu...@googlegroups.com
Hi Daniel-

No luck... We added those in as well, but still get the same default frame size...  
We even looked at the source code and it seems to expect "storage.cassandra.thrift.frame-size" but somehow it just doesn't seem to respect our parameters.

Did Chao ever get his issue solved?

-Daniel

Dan LaRocque

Jun 5, 2015, 6:08:47 PM
to aureliu...@googlegroups.com
Hi,
 
This is a reindexing-specific configuration limitation.  The `TitanIndexRepair.{cassandra,hbase}Repair` methods take a Titan configuration file.  They don't take a Hadoop Configuration file (though they honor site defaults).  There's no way, as the API stands, to customize arbitrary Hadoop Configuration settings.  That includes cassandra.thrift.framed.size_mb (which affects the ColumnFamilyRecordReader code in the trace).  If you put cassandra.thrift.framed.size_mb into your Titan configuration file, `TitanIndexRepair.cassandraRepair` will effectively end up treating it as an unrecognized Titan option that has no effect.  CFRR will never see it.
 
 
As mentioned in the issue, there are workarounds for this problem that involve constructing your own Hadoop Configuration object when running reindex jobs, though they aren't pretty, and they require reading TitanIndexRepair to figure out which config keys are needed to support a MapReduce reindex job.
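Dan's distinction can be sketched with plain dictionaries (illustrative only; the real objects are a Titan configuration and a Hadoop Configuration): the two are separate namespaces, so a Hadoop-side key placed in the Titan file never reaches ColumnFamilyRecordReader, and the workaround amounts to an explicit override on the Hadoop side.

```python
# Illustrative sketch, not actual Titan/Hadoop API calls.
titan_props = {
    "storage.backend": "cassandrathrift",
    # Unrecognized by Titan, so the repair job silently ignores it:
    "cassandra.thrift.framed.size_mb": "64",
}
hadoop_conf = {"cassandra.thrift.framed.size_mb": "15"}  # site default

# Nothing copies Hadoop-side keys out of the Titan file automatically,
# so the record reader still sees the 15 MB default:
assert hadoop_conf["cassandra.thrift.framed.size_mb"] == "15"

# The workaround Dan describes is to set the key on your own Hadoop
# Configuration object before running the reindex job:
hadoop_conf["cassandra.thrift.framed.size_mb"] = "64"
assert hadoop_conf["cassandra.thrift.framed.size_mb"] == "64"
```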
 
thanks,
Dan

Daniel Mau

Jun 10, 2015, 7:45:09 AM
to aureliu...@googlegroups.com
Thanks so much Dan, this helped us move on to different approaches rather than remain stuck.
-Daniel