The pio version says 0.9.6. I tried with the UR template master branch, but I got the same error messages.

2016-05-26 22:08 GMT+02:00 Pat Ferrel <p...@occamsmachete.com>:

what does `pio version` say?
On May 26, 2016, at 10:04 AM, adam....@gmail.com wrote:
Hey Pat,
Something strange happens in stage 14, and I don't know the reason. We use UR template 3.0 with the eventWindow property:

"eventWindow": {
  "duration": "28 days",
  "removeDuplicates": true,
  "compressProperties": true
}
Everything worked until now. Training now fails in stage 14 with the errors below; I've attached the Spark UI details for that stage.
Maybe we should simply increase the timeout? Where should we set it?
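In case it helps while the thread gets an authoritative answer: one way to raise the timeout is to pass Spark properties through to `pio train` (anything after `--` goes straight to spark-submit). `spark.network.timeout` is what produces the "exceeds timeout 120000 ms" default, and `spark.executor.heartbeatInterval` must stay well below it. The values below are a sketch, not recommended settings; executor heartbeat timeouts are often a symptom of GC pressure, so raising executor memory may matter more than the timeout itself.

```
# Hedged example: property names are standard Spark config keys;
# the concrete values (300s, 30s, 4g) are illustrative guesses.
pio train -- \
  --conf spark.network.timeout=300s \
  --conf spark.executor.heartbeatInterval=30s \
  --executor-memory 4g
```

The same `--conf` keys can alternatively be set cluster-wide in `spark-defaults.conf` if you don't want to repeat them on every train run.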
[Stage 14:===================> (1 + 2) / 3][WARN] [HeartbeatReceiver] Removing executor 0 with no recent heartbeats: 131961 ms exceeds timeout 120000 ms
[ERROR] [TaskSchedulerImpl] Lost executor 0 on sparkmaster.profession.hu: Executor heartbeat timed out after 131961 ms
[WARN] [TaskSetManager] Lost task 0.0 in stage 14.0 (TID 82, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 131961 ms
[WARN] [TaskSetManager] Lost task 2.0 in stage 14.0 (TID 84, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 131961 ms
[WARN] [TransportChannelHandler] Exception in connection from sparkmaster.profession.hu/172.31.3.141:63250
[ERROR] [TaskSchedulerImpl] Lost executor 0 on sparkmaster.profession.hu: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 2.1 in stage 14.0 (TID 85, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 2.2 in stage 14.0 (TID 87, sparkslave01.profession.hu): FetchFailed(BlockManagerId(0, sparkmaster.profession.hu, 61901), shuffleId=5, mapId=1, reduceId=2, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to sparkmaster.profession.hu/172.31.3.141:61901
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:122)
at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:127)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to sparkmaster.profession.hu/172.31.3.141:61901
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: java.net.ConnectException: Connection refused: sparkmaster.profession.hu/172.31.3.141:61901
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
)
[WARN] [TaskSetManager] Lost task 0.1 in stage 14.0 (TID 86, sparkslave01.profession.hu): FetchFailed(BlockManagerId(0, sparkmaster.profession.hu, 61901), shuffleId=5, mapId=1, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to sparkmaster.profession.hu/172.31.3.141:61901
    (stack trace identical to the previous FetchFailedException, truncated)
Regards,
Adam Krajcs
<PredictionIO Training org.template.RecommendationEngine - Details for Stage 14 (Attempt 0).htm>