I have an event store with 15,731,525 events. I tried to run the training with the eventWindow setting:
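For reference, the eventWindow block in engine.json has roughly this shape (the values below are illustrative, not my actual settings):

  "eventWindow": {
    "duration": "365 days",          // illustrative duration, not my real value
    "removeDuplicates": false,
    "compressProperties": false
  }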
After an hour of training, one of the executors was lost and stage 8 failed. With the eventWindow.duration property enabled, will the older events be deleted from HBase, or does it create a "temporary table" for the events within the time window?
If it only creates a temp table, building that table takes too long for this to be usable. I've attached an HTML snapshot of the Spark UI view; maybe it helps.
[Stage 8:> (0 + 3) / 3][WARN] [HeartbeatReceiver] Removing executor 1 with no recent heartbeats: 163511 ms exceeds timeout 120000 ms
[WARN] [TaskSetManager] Lost task 2.0 in stage 8.0 (TID 23, sparkslave01.profession.hu): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 163511 ms
[WARN] [TaskSetManager] Lost task 0.0 in stage 8.0 (TID 21, sparkslave01.profession.hu): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 163511 ms
[ERROR] [TaskSchedulerImpl] Lost executor 1 on sparkslave01.profession.hu: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 0.1 in stage 8.0 (TID 24, sparkslave01.profession.hu): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 8:> (0 + 3) / 3][WARN] [HeartbeatReceiver] Removing executor 0 with no recent heartbeats: 176955 ms exceeds timeout 120000 ms
[WARN] [TaskSetManager] Lost task 0.2 in stage 8.0 (TID 26, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 176955 ms
[WARN] [TaskSetManager] Lost task 1.0 in stage 8.0 (TID 22, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 176955 ms
[WARN] [TaskSetManager] Lost task 2.1 in stage 8.0 (TID 25, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 176955 ms
[ERROR] [TaskSchedulerImpl] Lost executor 0 on sparkmaster.profession.hu: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 1.1 in stage 8.0 (TID 28, sparkmaster.profession.hu): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 2.2 in stage 8.0 (TID 27, sparkslave01.profession.hu): FetchFailed(null, shuffleId=3, mapId=-1, reduceId=2, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 3
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:538)
at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:121)
at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:127)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
)
[WARN] [TaskSetManager] Lost task 0.3 in stage 8.0 (TID 29, sparkslave01.profession.hu): FetchFailed(null, shuffleId=3, mapId=-1, reduceId=0, message=
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle 3
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:542)
at org.apache.spark.MapOutputTracker$$anonfun$org$apache$spark$MapOutputTracker$$convertMapStatuses$2.apply(MapOutputTracker.scala:538)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.MapOutputTracker$.org$apache$spark$MapOutputTracker$$convertMapStatuses(MapOutputTracker.scala:538)
at org.apache.spark.MapOutputTracker.getMapSizesByExecutorId(MapOutputTracker.scala:155)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:47)
at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:121)
at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:127)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)