We are iterating over a very large table. When we run the job in the spark shell it completes within about 10 minutes. When do the same using the notebook several tasks are stalled (marked as running but don't finish within 2 hours).
Our setup is the as follows:
20 nodes running dse cassandra 2.0. We also have mesos 0.22 and apache spark 1.2.1. To connect to cassandra we are using datastax' connector v. 1.2.1. The notebook's version is 0.5
To run the shell job we assembled a single jar with all the connector's dependencies and put it on the shell's class path. Then we created an rdd representing our table and used its count method. As I said before everything completed within about 10 minutes.
Next, in order to test our notebook setup, we specified the connector as a dependency, imported all the necessary classes and started the notebook. When we counted the rows here there was a small number of tasks that were shown in mesos as running but
they never finished. The error we've found in individual node's logs is pasted below. However this was never reported (meaning the task status was "running"). Now, when we went to the problematic node and killed each process manually everything completed (meaning
the master restarted the tasks somewhere and they finished quickly). We tried this several times as the same thing always happens.
Also it is worth saying that we can run some other simple tasks (like take(10)) so we know the problem is not with connecting to the DB or anything.
Help appreciated!
Marcin
15/06/11 10:19:05 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x3b19e9c3, /10.11.105.101:42226 :> /10.11.105.137:9042] EXCEPTION: com.datastax.driver.core.ConnectionException: [/10.11.105.137:9042] Connection has been closed)
java.util.concurrent.RejectedExecutionException: Task com.google.common.util.concurrent.ListenableFutureTask@6024131 rejected from java.util.concurrent.ThreadPoolExecutor@7e05fb0b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 20]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:49)
at com.datastax.driver.core.Cluster$Manager.onSuspected(Cluster.java:1553)
at com.datastax.driver.core.Cluster$Manager.signalConnectionFailure(Cluster.java:1907)
at com.datastax.driver.core.Connection.defunct(Connection.java:300)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:770)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
at org.jboss.netty.handler.timeout.IdleStateHandler.channelIdle(IdleStateHandler.java:392)
at org.jboss.netty.handler.timeout.IdleStateHandler$1.run(IdleStateHandler.java:382)
at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/11 10:19:05 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x5bd942d4, /10.11.105.101:33290 :> /10.11.105.101:9042] EXCEPTION: com.datastax.driver.core.ConnectionException: [/10.11.105.101:9042] Connection has been closed)
java.util.concurrent.RejectedExecutionException: Task com.google.common.util.concurrent.ListenableFutureTask@77eb6b08 rejected from java.util.concurrent.ThreadPoolExecutor@7e05fb0b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 20]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:49)
at com.datastax.driver.core.Cluster$Manager.onSuspected(Cluster.java:1553)
at com.datastax.driver.core.Cluster$Manager.signalConnectionFailure(Cluster.java:1907)
at com.datastax.driver.core.Connection.defunct(Connection.java:300)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:770)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
at org.jboss.netty.handler.timeout.IdleStateHandler.channelIdle(IdleStateHandler.java:392)
at org.jboss.netty.handler.timeout.IdleStateHandler$1.run(IdleStateHandler.java:382)
at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/11 10:19:05 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x312a7952, /10.11.105.101:54999 :> /10.11.105.103:9042] EXCEPTION: com.datastax.driver.core.ConnectionException: [/10.11.105.103:9042] Connection has been closed)
java.util.concurrent.RejectedExecutionException: Task com.google.common.util.concurrent.ListenableFutureTask@258259ab rejected from java.util.concurrent.ThreadPoolExecutor@7e05fb0b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 20]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:49)
at com.datastax.driver.core.Cluster$Manager.onSuspected(Cluster.java:1553)
at com.datastax.driver.core.Cluster$Manager.signalConnectionFailure(Cluster.java:1907)
at com.datastax.driver.core.Connection.defunct(Connection.java:300)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:770)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
at org.jboss.netty.handler.timeout.IdleStateHandler.channelIdle(IdleStateHandler.java:392)
at org.jboss.netty.handler.timeout.IdleStateHandler$1.run(IdleStateHandler.java:382)
at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/11 10:19:05 WARN DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x04cd364f, /10.11.105.101:57599 :> /10.11.105.98:9042] EXCEPTION: com.datastax.driver.core.ConnectionException: [/10.11.105.98:9042] Connection has been closed)
java.util.concurrent.RejectedExecutionException: Task com.google.common.util.concurrent.ListenableFutureTask@c017ba rejected from java.util.concurrent.ThreadPoolExecutor@7e05fb0b[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 20]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:480)
at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:49)
at com.datastax.driver.core.Cluster$Manager.onSuspected(Cluster.java:1553)
at com.datastax.driver.core.Cluster$Manager.signalConnectionFailure(Cluster.java:1907)
at com.datastax.driver.core.Connection.defunct(Connection.java:300)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:770)
at org.jboss.netty.handler.timeout.IdleStateAwareChannelUpstreamHandler.handleUpstream(IdleStateAwareChannelUpstreamHandler.java:36)
at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:60)
at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
at org.jboss.netty.handler.timeout.IdleStateHandler.channelIdle(IdleStateHandler.java:392)
at org.jboss.netty.handler.timeout.IdleStateHandler$1.run(IdleStateHandler.java:382)
at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:40)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/06/11 10:24:06 WARN AkkaUtils: Error sending message in 1 attempts
java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:187)
at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:407)
15/06/11 10:29:03 INFO LocalNodeFirstLoadBalancingPolicy: Suspected host 10.11.105.99 (analytics1)
15/06/11 10:36:16 ERROR Session: Error creating pool to /10.11.105.99:9042
com.datastax.driver.core.ConnectionException: [/10.11.105.99:9042] Unexpected error during transport initialization (com.datastax.driver.core.OperationTimedOutException: [/10.11.105.99:9042] Operation timed out)
at com.datastax.driver.core.Connection.initializeTransport(Connection.java:186)
at com.datastax.driver.core.Connection.<init>(Connection.java:116)
at com.datastax.driver.core.PooledConnection.<init>(PooledConnection.java:32)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:586)
at com.datastax.driver.core.DynamicConnectionPool.<init>(DynamicConnectionPool.java:74)
at com.datastax.driver.core.HostConnectionPool.newInstance(HostConnectionPool.java:33)
at com.datastax.driver.core.SessionManager.replacePool(SessionManager.java:271)
at com.datastax.driver.core.SessionManager.access$400(SessionManager.java:40)
at com.datastax.driver.core.SessionManager$3.call(SessionManager.java:308)
at com.datastax.driver.core.SessionManager$3.call(SessionManager.java:300)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.datastax.driver.core.OperationTimedOutException: [/10.11.105.99:9042] Operation timed out
at com.datastax.driver.core.Connection$Future.onTimeout(Connection.java:917)
at com.datastax.driver.core.Connection$ResponseHandler$1.run(Connection.java:981)
at org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
at org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
... 1 more
15/06/11 10:55:22 ERROR Session: Error creating pool to /10.11.105.143:9042
com.datastax.driver.core.TransportException: [/10.11.105.143:9042] Cannot connect
at com.datastax.driver.core.Connection.<init>(Connection.java:109)
at com.datastax.driver.core.PooledConnection.<init>(PooledConnection.java:32)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:586)
at com.datastax.driver.core.DynamicConnectionPool.<init>(DynamicConnectionPool.java:74)
at com.datastax.driver.core.HostConnectionPool.newInstance(HostConnectionPool.java:33)
at com.datastax.driver.core.SessionManager.replacePool(SessionManager.java:271)
at com.datastax.driver.core.SessionManager.access$400(SessionManager.java:40)
at com.datastax.driver.core.SessionManager$3.call(SessionManager.java:308)
at com.datastax.driver.core.SessionManager$3.call(SessionManager.java:300)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.jboss.netty.channel.ConnectTimeoutException: connection timed out: /10.11.105.143:9042
at org.jboss.netty.channel.socket.nio.NioClientBoss.processConnectTimeout(NioClientBoss.java:137)
at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:83)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
... 3 more
15/06/11 10:51:26 WARN Connection: Forcing termination of Connection[/10.11.105.135:9042-2, inFlight=0, closed=true]. This should not happen and is likely a bug, please report.
15/06/11 10:50:54 WARN Connection: Forcing termination of Connection[/10.11.105.135:9042-2, inFlight=0, closed=true]. This should not happen and is likely a bug, please report.
15/06/11 10:48:04 WARN Connection: Forcing termination of Connection[/10.11.105.135:9042-2, inFlight=0, closed=true]. This should not happen and is likely a bug, please report.
15/06/11 10:47:28 WARN Connection: Forcing termination of Connection[/10.11.105.135:9042-2, inFlight=0, closed=true]. This should not happen and is likely a bug, please report.
15/06/11 10:45:46 WARN Connection: Forcing termination of Connection[/10.11.105.135:9042-2, inFlight=0, closed=true]. This should not happen and is likely a bug, please report.
15/06/11 10:45:18 WARN Connection: Forcing termination of Connection[/10.11.105.135:9042-2, inFlight=0, closed=true]. This should not happen and is likely a bug, please report.
This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto) by others is strictly prohibited.
If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.