[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(next_buy))
[INFO] [Engine] Extracting preparator params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Preparator params: (,PreparatorParams(1800000))
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://spark...@192.168.1.79:49704]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: NEXT_BUY.DataSource@14fded9d
[INFO] [Engine$] Preparator: NEXT_BUY.Preparator@34c70b5e
[INFO] [Engine$] AlgorithmList: List(NEXT_BUY.CooccurrenceAlgorithm@6198e9b5)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] NEXT_BUY.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] NEXT_BUY.PreparedData does not support data sanity check. Skipping check.
[INFO] [Engine$] NEXT_BUY.CooccurrenceModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=26acd352-99a2-4ee9-9c5d-b287054c0651
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "NEXT_BUY.ViewedThenBoughtProductEngine",
  "datasource": {
    "params" : {
      "appName": "next_buy"
    }
  },
  "algorithms": [
    {
      "name": "cooccurrence",
      "params": {
        "n": 20
      }
    }
  ],
  "preparator": {
    "params" : {
      "sessionTimeout": 1800000
    }
  }
}
Hi Pat, thanks for the answer.
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(handmade,List(purchase, view, add-to-cart),Some(EventWindow(Some(24 days),true,true))))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver...@192.168.1.71:53951]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.template.DataSource@6a6c7f42
[INFO] [Engine$] Preparator: org.template.Preparator@32f32623
[INFO] [Engine$] AlgorithmList: List(org.template.URAlgorithm@79476a4e)
[INFO] [Engine$] Data sanity check is on.
[Stage 0:> (0 + 4) / 4][ERROR] [Executor] Exception in task 1.0 in stage 0.0 (TID 1)
[ERROR] [SparkUncaughtExceptionHandler] Uncaught exception in thread Thread[Executor task launch worker-1,5,main]
[WARN] [TaskSetManager] Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:265)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[ERROR] [TaskSetManager] Task 1 in stage 0.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:265)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:264)
at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:126)
at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:551)
at org.apache.spark.rdd.RDD$$anonfun$sortBy$1.apply(RDD.scala:552)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.sortBy(RDD.scala:549)
at io.prediction.core.SelfCleaningDataSource$class.cleanPEvents(SelfCleaningDataSource.scala:225)
at org.template.DataSource.cleanPEvents(DataSource.scala:48)
at io.prediction.core.SelfCleaningDataSource$class.cleanPersistedPEvents(SelfCleaningDataSource.scala:147)
at org.template.DataSource.cleanPersistedPEvents(DataSource.scala:48)
at org.template.DataSource.readTraining(DataSource.scala:62)
at org.template.DataSource.readTraining(DataSource.scala:48)
at io.prediction.controller.PDataSource.readTrainingBase(PDataSource.scala:37)
at io.prediction.controller.Engine$.train(Engine.scala:641)
at io.prediction.controller.Engine.train(Engine.scala:174)
at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:65)
at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:247)
at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:265)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerJobEnd(0,1470163472286,JobFailed(org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.lang.OutOfMemoryError: Java heap space
at com.esotericsoftware.kryo.io.Output.<init>(Output.java:35)
at org.apache.spark.serializer.KryoSerializer.newKryoOutput(KryoSerializer.scala:80)
at org.apache.spark.serializer.KryoSerializerInstance.output$lzycompute(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.output(KryoSerializer.scala:289)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:293)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:265)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:))
[Stage 0:> (0 + 3) / 4][ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, localhost, 53958),rdd_0_3,StorageLevel(false, true, false, true, 1),118744,0,0))
[WARN] [TaskSetManager] Lost task 3.0 in stage 0.0 (TID 3, localhost): TaskKilled (killed intentionally)
[ERROR] [LiveListenerBus] SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(0,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3d5743e0,null)
[INFO] [App$] Name | ID | Access Key | Allowed Event(s)
[INFO] [App$] handmade | 4 | Jq3TctwIRnQcfx10xPSHLGW3mXCeRD9OC0VahM9QhqUbJ1htH2cqURVLJwlhm5j4 | (all)
{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "sample-handmade-data.txt",
      "appName": "handmade",
      "eventNames": ["purchase", "view", "add-to-cart"],
      "eventWindow": {
        "duration": "24 days",
        "removeDuplicates":true,
        "compressProperties":true
      }
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "spark.executor.memory": "4g",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "handmade",
        "indexName": "urindex",
        "typeName": "items",
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "eventNames": ["purchase", "view", "add-to-cart"],
        "availableDateName": "available",
        "expireDateName": "expires",
        "dateName": "date",
        "num": 4
      }
    }
  ]
}
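For what it's worth, the `OutOfMemoryError` in the trace is raised in `com.esotericsoftware.kryo.io.Output.<init>`, which is where Kryo allocates its initial buffer, and this engine.json sets `spark.kryoserializer.buffer` to `300m`. The progress line `(0 + 4) / 4` suggests four tasks were running at once. Below is a rough back-of-the-envelope sketch in Python of what those buffers alone would need. The helper names `parse_size` and `kryo_buffer_demand` are made up for illustration, and the one-buffer-per-running-task model is an assumption, not something the log confirms.

```python
import re

# Binary multipliers for the size suffixes used in the config ("300m", "4g").
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '300m' or '4g' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmg])", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def kryo_buffer_demand(buffer_size, concurrent_tasks):
    """Bytes needed if each running task allocates one initial Kryo buffer.

    Assumption: one buffer per concurrently running task. This is a
    simplified model for estimation only.
    """
    return parse_size(buffer_size) * concurrent_tasks


# Values taken from the engine.json and the "(0 + 4) / 4" progress line.
demand = kryo_buffer_demand("300m", 4)
print(demand // 1024 ** 2, "MiB")  # prints "1200 MiB"
```

If that estimate is in the right ballpark, a smaller `spark.kryoserializer.buffer` or a larger heap for the JVM that actually runs the tasks would be the things to try. Since the log shows the tasks on `localhost`, `spark.executor.memory` may not be the setting that governs that heap.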
There is still something wrong:
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(handmade,List(purchase, view, add-to-cart),Some(EventWindow(Some(24 days),true,true))))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver...@192.168.1.71:54296]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.template.DataSource@6a6c7f42
[INFO] [Engine$] Preparator: org.template.Preparator@32f32623
[INFO] [Engine$] AlgorithmList: List(org.template.URAlgorithm@79476a4e)
[INFO] [Engine$] Data sanity check is on.
Exception in thread "main" java.sql.SQLException: No suitable driver
at java.sql.DriverManager.getDriver(DriverManager.java:315)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
at io.prediction.data.storage.jdbc.JDBCPEvents.write(JDBCPEvents.scala:158)
at io.prediction.data.storage.PEvents$class.write(PEvents.scala:170)
at io.prediction.data.storage.jdbc.JDBCPEvents.write(JDBCPEvents.scala:29)
at io.prediction.core.SelfCleaningDataSource$class.wipePEvents(SelfCleaningDataSource.scala:171)
at org.template.DataSource.wipePEvents(DataSource.scala:48)
at io.prediction.core.SelfCleaningDataSource$class.cleanPersistedPEvents(SelfCleaningDataSource.scala:154)
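
A side note on the `java.sql.SQLException: No suitable driver` above: `java.sql.DriverManager.getDriver` throws it when no registered JDBC driver accepts the connection URL, which usually means the driver jar is not visible to the JVM doing the write. The following is only a heuristic sketch in Python for checking whether a jar that looks like the right driver is among a set of jar names. The function names, the example URL and the jar file names are all hypothetical and are not taken from this setup.

```python
def jdbc_subprotocol(url):
    """Return the subprotocol of a JDBC URL, e.g. 'postgresql'."""
    parts = url.split(":", 2)
    if len(parts) < 3 or parts[0] != "jdbc" or not parts[1]:
        raise ValueError(f"not a JDBC URL: {url!r}")
    return parts[1]


def find_driver_jars(url, jar_names):
    """List jar names that mention the URL's subprotocol.

    Heuristic only: driver jars are commonly named after the database
    (for example 'postgresql-<version>.jar'), but nothing guarantees it.
    """
    subprotocol = jdbc_subprotocol(url)
    return [
        name for name in jar_names
        if name.lower().endswith(".jar") and subprotocol in name.lower()
    ]


# Hypothetical values for illustration.
example_url = "jdbc:postgresql://localhost/pio"
example_jars = ["postgresql-9.4.jar", "mahout-spark.jar", "notes.txt"]
print(find_driver_jars(example_url, example_jars))
```

An empty result would hint that the driver jar still needs to be put on the classpath of the Spark driver, for instance through spark-submit's `--driver-class-path` option.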