Train not working

Digambar Bhat

Jun 8, 2016, 3:51:55 AM
to actionml-user
I am using the UR template. It was working when I had only one event, "played". Now I have three events: played, download, liked.

I have data for all of these events and have imported it into the event server, but when I run pio train it fails.
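
In case the setup matters, the relevant part of my engine.json looks roughly like this (appName, indexName and typeName are placeholders from my setup; the eventNames lists are what I changed when adding the new events):

{
  "datasource": {
    "params": {
      "appName": "my_app",
      "eventNames": ["played", "download", "liked"]
    }
  },
  "algorithms": [{
    "name": "ur",
    "params": {
      "appName": "my_app",
      "indexName": "urindex",
      "typeName": "items",
      "eventNames": ["played", "download", "liked"]
    }
  }]
}

If I read the docs right, the first event in the list is treated as the primary event.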

Please find the log below.
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver...@10.0.0.9:53231]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.template.DataSource@10131289
[INFO] [Engine$] Preparator: org.template.Preparator@11e17893
[INFO] [Engine$] AlgorithmList: List(org.template.URAlgorithm@ac4915e)
[INFO] [Engine$] Data sanity check is on.
[Stage 4:>                                                          (0 + 2) / 2][WARN] [TaskSetManager] Lost task 1.0 in stage 4.0 (TID 9, ip-10-0-0-9): java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.twitter.chill.ObjectSerializer.cachedRead(ObjectSerializer.scala:38)
        at com.twitter.chill.ObjectSerializer.read(ObjectSerializer.scala:41)
        at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:651)
        at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:605)
        at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
        at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
        at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
        at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:169)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:201)
        at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:198)
        at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:122)
        at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:127)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

[WARN] [TaskSetManager] Lost task 0.0 in stage 4.0 (TID 8, ip-10-0-0-9): FetchFailed(BlockManagerId(0, ip-10-0-0-9, 47745), shuffleId=0, mapId=1, reduceId=0, message=
org.apache.spark.shuffle.FetchFailedException: Error in opening FileSegmentManagedBuffer{file=/tmp/spark-a01777ba-3b4a-44c6-aeff-44f30a9d4f3d/executor-f085563e-fe3b-4716-9938-fe726ff3d317/blockmgr-69cb01c0-2fc5-4194-8b2f-ec8e799ebe57/15/shuffle_0_1_0.data, offset=0, length=81610299}
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:307)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
        at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
        at org.apache.spark.rdd.SubtractedRDD.integrate$1(SubtractedRDD.scala:122)
        at org.apache.spark.rdd.SubtractedRDD.compute(SubtractedRDD.scala:127)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error in opening FileSegmentManagedBuffer{file=/tmp/spark-a01777ba-3b4a-44c6-aeff-44f30a9d4f3d/executor-f085563e-fe3b-4716-9938-fe726ff3d317/blockmgr-69cb01c0-2fc5-4194-8b2f-ec8e799ebe57/15/shuffle_0_1_0.data, offset=0, length=81610299}
        at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:113)
        at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:304)
        ... 24 more
Caused by: java.io.FileNotFoundException: /tmp/spark-a01777ba-3b4a-44c6-aeff-44f30a9d4f3d/executor-f085563e-fe3b-4716-9938-fe726ff3d317/blockmgr-69cb01c0-2fc5-4194-8b2f-ec8e799ebe57/15/shuffle_0_1_0.data (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.apache.spark.network.buffer.FileSegmentManagedBuffer.createInputStream(FileSegmentManagedBuffer.java:98)
        ... 25 more

)
[WARN] [TransportChannelHandler] Exception in connection from ip-10-0-0-9/10.0.0.9:57654
[ERROR] [TaskSchedulerImpl] Lost executor 0 on ip-10-0-0-9: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 1.1 in stage 4.0 (TID 10, ip-10-0-0-9): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 0.0 in stage 1.1 (TID 11, ip-10-0-0-9): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 1.0 in stage 1.1 (TID 12, ip-10-0-0-9): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 1.0 in stage 3.1 (TID 14, ip-10-0-0-9): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[WARN] [TaskSetManager] Lost task 0.0 in stage 3.1 (TID 13, ip-10-0-0-9): ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
[Stage 4:>                                                          (0 + 2) / 2]

Pat Ferrel

Jun 8, 2016, 8:44:26 PM
to Digambar Bhat, actionml-user
This is a typical problem; it indicates that Spark needs more memory.

Read this: http://www.actionml.com/docs/ur_advanced_tuning. The first discussion there covers the OOM error.
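
For example, everything after -- on the pio command line is passed through to spark-submit, so you can raise driver and executor memory like this (the 4g figures are only a starting point; size them to your data and hardware):

pio train -- --driver-memory 4g --executor-memory 4g

The tuning doc above goes through which of these settings matter most for the UR.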
