Hi There,
My issues are similar to most but not sure if they are the same root cause. The issue is:
write timeouts using LOCAL_QUORUM.
Running spark 1.3.0, with Cassandra 2.1.2. We have included the latest maven release (1.3.0-m1) into my jar and spark-submit the job.
Spark and Cassandra are co-located on each machine, and I have a total of 8 machines in Ec2. We are talking about 8 machines with 30GB of ram, although my Cassandra JVMs max heap is only 8GB.
bin/spark-submit --conf spark.cassandra.output.throughput_mb_per_sec=5 --conf spark.cassandra.connection.timeout_ms=20000 --conf spark.cassandra.read.timeout_ms=20000 --class MyClass --deploy-mode cluster --executor-memory 4G --driver-memory 4G --master spark://
spark-datacrunch-ingest-master01.prod.vungle.com:7077 s3n://key:secret@s3_bucket/MY.jar
The Driver and Application start fine, and I can see in the spark logs that things are connecting and everything looks good.
That's when spark lies to me! Here's the stack trace.
15/06/10 00:07:01 ERROR writer.QueryExecutor: Failed to execute: com.datastax.spark.connector.writer.RichBoundStatement@78d67619
com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)
at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
at com.datastax.driver.core.Responses$Error.asException(Responses.java:99)
at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:140)
at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:293)
at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:467)
at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:734)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.handler.timeout.IdleStateHandler.messageReceived(IdleStateHandler.java:294)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_QUORUM (2 replica were required but only 1 acknowledged the write)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:58)
at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:38)
at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:168)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
... 21 more
In OpsCenter, I can see that my write requests are about 7/sec, and my Write request latency is about 1700ms/op. I have even increased my
write_request_timeout_in_ms
in the cassandra.yaml. I've tried ridiculous timeouts and all it does is delay the inevitable error message.
Memory and CPU are relatively low. Please..anyone..have a heart and throw me a bone!
Thanks all,
Tony