Universal Recommender - pio train ([ERROR] [Executor] Exception in task 0.0 in stage 6.0 (TID 11))

Jarrad Salmon

May 25, 2016, 9:04:00 PM
to actionml-user
Hi,

I am having an issue training the Universal Recommender. I have run through the integration test and it passed fine.
I also have the same data set training correctly with the Similar Product template.

Here is the error I get:

[INFO] [Console$] Using existing engine manifest JSON at /home/PredictionTemplates/RecommendedListings/manifest.json
[INFO] [Runner$] Submission command: /home/PredictionIO/vendors/spark-1.6.0/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --jars file:/home/PredictionTemplates/RecommendedListings/target/scala-2.10/template-scala-parallel-universal-recommendation_2.10-0.3.0.jar,file:/home/PredictionTemplates/RecommendedListings/target/scala-2.10/template-scala-parallel-universal-recommendation-assembly-0.3.0-deps.jar --files file:/home/PredictionIO/conf/log4j.properties,file:/home/PredictionIO/vendors/hbase-1.1.2/conf/hbase-site.xml --driver-class-path /home/PredictionIO/conf:/home/PredictionIO/lib/postgresql-9.4-1204.jdbc41.jar:/home/PredictionIO/lib/mysql-connector-java-5.1.37.jar:/home/PredictionIO/vendors/hbase-1.1.2/conf file:/home/PredictionIO/assembly/pio-assembly-0.9.7-aml.jar --engine-id zbWDvq5CRx0TzTsPdUcieAwMP1cDMw4x --engine-version f2165152d2aa92c1f2efdfcfc8eb72b114a434af --engine-variant file:/home/PredictionTemplates/RecommendedListings/engine.json --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_HBASE_HOME=/home/PredictionIO/vendors/hbase-1.1.2,PIO_HOME=/home/PredictionIO,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/root/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/home/PredictionIO/vendors/elasticsearch-1.7.3,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/home/PredictionIO/conf,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(DesignerWardrobeLocal,List(purchase, view, watch, offer),None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[WARN] [ThreadLocalRandom] Failed to generate a seed from SecureRandom within 3 seconds. Not enough entrophy?
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver...@43.229.61.219:35025]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.template.DataSource@3711c71c
[INFO] [Engine$] Preparator: org.template.Preparator@30508066
[INFO] [Engine$] AlgorithmList: List(org.template.URAlgorithm@2f09e6b2)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.
[Stage 6:>                                                          (0 + 6) / 6][ERROR] [Executor] Exception in task 0.0 in stage 6.0 (TID 11)
[ERROR] [Executor] Exception in task 5.0 in stage 6.0 (TID 16)
[ERROR] [Executor] Exception in task 4.0 in stage 6.0 (TID 15)
[ERROR] [SparkUncaughtExceptionHandler] Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
[ERROR] [SparkUncaughtExceptionHandler] Uncaught exception in thread Thread[Executor task launch worker-3,5,main]
[ERROR] [SparkUncaughtExceptionHandler] Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
[WARN] [TaskSetManager] Lost task 5.0 in stage 6.0 (TID 16, localhost): java.lang.OutOfMemoryError: Java heap space
        at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
        at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
        at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
        at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:239)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

[ERROR] [TaskSetManager] Task 5 in stage 6.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 6.0 failed 1 times, most recent failure: Lost task 5.0 in stage 6.0 (TID 16, localhost): java.lang.OutOfMemoryError: Java heap space
        at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
        at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
        at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
        at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:239)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
        at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
        at org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark$.apply(IndexedDatasetSpark.scala:73)
        at org.template.Preparator$$anonfun$1.apply(Preparator.scala:45)
        at org.template.Preparator$$anonfun$1.apply(Preparator.scala:41)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.template.Preparator.prepare(Preparator.scala:41)
        at org.template.Preparator.prepare(Preparator.scala:27)
        at io.prediction.controller.PPreparator.prepareBase(PPreparator.scala:34)
        at io.prediction.controller.Engine$.train(Engine.scala:668)
        at io.prediction.controller.Engine.train(Engine.scala:174)
        at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:65)
        at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:247)
        at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
        at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
        at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
        at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
        at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:239)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)



This is my engine.json:

{
  "comment": "This config file uses default settings for all but the required values, see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "RL for DW",
      "appName": "DesignerWardrobeLocal",
      "eventNames": ["purchase", "view", "watch", "offer"]
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "spark.executor.memory": "4g",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "DesignerWardrobeLocal",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "eventNames": ["purchase", "view", "watch", "offer"]
      }
    }
  ]
}
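
One detail the stack trace above hints at: the OutOfMemoryError is thrown in Output.<init> via KryoSerializer.newKryoOutput, i.e. while Kryo allocates its initial output buffer. With "spark.kryoserializer.buffer" set to "300m", each serializer instance allocates a ~300 MB array up front, so six concurrent local tasks can exhaust a small heap by themselves. If that is a factor here (not verified against this data set), a more conventional sketch keeps the initial buffer small and caps growth with the separate max setting (illustrative values):

    "spark.kryoserializer.buffer": "64k",
    "spark.kryoserializer.buffer.max": "300m",

Spark grows the buffer on demand up to the max, so large objects still serialize without every task paying the full 300 MB allocation up front.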






Pat Ferrel

May 25, 2016, 9:19:09 PM
to Jarrad Salmon, actionml-user
"Caused by: java.lang.OutOfMemoryError: Java heap space"

You need to allocate more memory for the driver and the executor; they need roughly the same amount. Try increasing from the defaults, e.g. 4g each:

pio train -- --driver-memory 4g --executor-memory 4g

and keep increasing until the job completes.

Do not increase beyond the physical memory of your machine: the command line above uses 8g of memory in total, and remember that the other services need memory too.
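
For reference, everything after the bare "--" in "pio train" is handed straight to spark-submit, so these are ordinary spark-submit flags and other Spark options can be passed the same way. A minimal sketch with illustrative sizes:

pio train -- --driver-memory 6g --executor-memory 6g

Note that "spark.executor.memory" can also live in engine.json's sparkConf block (as in the config above), but the driver size generally has to go on the command line, since sparkConf entries are applied only after the driver JVM has already started.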



Jarrad Salmon

May 25, 2016, 9:32:47 PM
to actionml-user, p...@occamsmachete.com
Great! It looks like I have managed to get it working by tweaking these values.

This was only a very small data set (1,000 users, 1,000 listings, 2,000 events), so I am wondering what my requirements will be on my production data. Is there any documentation on memory requirements versus data set size?

Also, is it possible to increase the memory via a swapfile, or does PIO need actual RAM?

Thanks