[ERROR] [ActorSystemImpl] Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]


Firman Gautama

Jun 1, 2015, 5:57:42 AM
to predicti...@googlegroups.com

Hello All,

I would like to report an error that occurs when I try to train on around 1.4 million items of data with the recommendation module.

driver memory = 4g
executor memory = 12g (tried with 16g too)

Here is the verbose output:


[INFO] [Console$] Using existing engine manifest JSON at /home/firman/30days/manifest.json
[INFO] [Runner$] Submission command: /pio/PredictionIO-0.9.3/vendors/spark-1.3.1-bin-hadoop2.6/bin/spark-submit --master spark://nn01.staging.us-tmp.xxxxxxxxxx.net:7077 --driver-memory 4G --executor-memory 16G --conf spark.akka.frameSize=1024 --class io.prediction.workflow.CreateWorkflow --jars file:/home/firman/30days/target/scala-2.10/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/home/firman/30days/target/scala-2.10/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar --files file:/pio/PredictionIO-0.9.3/conf/log4j.properties,file:/etc/hadoop/conf/core-site.xml,file:/etc/hbase/conf/hbase-site.xml --driver-class-path /pio/PredictionIO-0.9.3/conf:/etc/hadoop/conf:/etc/hbase/conf file:/pio/PredictionIO-0.9.3/lib/pio-assembly-0.9.3.jar --engine-id WtB0kwNl9oPmR4HSN6HyQdTJydjXHfJE --engine-version c8ab3be2e4e998f4c7a675dd3e7babbf05a84504 --engine-variant file:/home/firman/30days/engine.json --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/firman/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hbase,PIO_HOME=/pio/PredictionIO-0.9.3,PIO_FS_ENGINESDIR=/home/firman/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/home/firman/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/PredictionIO-0.9.3/vendors/elasticsearch-1.4.4,PIO_FS_TMPDIR=/home/firman/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/PredictionIO-0.9.3/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(30days,None))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@nn01.staging.us-tmp.xxxxxxxxxx.net:49886]
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: xxxxxxxxxx.data30days.DataSource@136bc6a7
[INFO] [Engine$] Preparator: xxxxxxxxxx.data30days.Preparator@6388062b
[INFO] [Engine$] AlgorithmList: List(xxxxxxxxxx.data30days.ALSAlgorithm@6528571)
[INFO] [Engine$] Data santiy check is on.
[INFO] [Engine$] xxxxxxxxxx.data30days.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] xxxxxxxxxx.data30days.PreparedData does not support data sanity check. Skipping check.
[Stage 16:>                                                        (0 + 0) / 32][WARN] [TaskSetManager] Stage 16 contains a task of very large size (77479 KB). The maximum recommended task size is 100 KB.
[Stage 16:>                                                       (0 + 32) / 32][ERROR] [ActorSystemImpl] Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]


Regards,
Firman
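
For reference: in PredictionIO 0.9.x, memory settings like these are typically forwarded to spark-submit, e.g. "pio train -- --driver-memory 4G --executor-memory 12G" (everything after "--" is passed through, as the "Submission command" log line above shows). One quick way to confirm what the running job actually received is a sketch like the following, assuming a SparkContext "sc" is in scope (for example inside an algorithm's train method):

// Hedged sketch: print the memory settings the running job sees.
println(sc.getConf.get("spark.driver.memory", "<unset>"))
println(sc.getConf.get("spark.executor.memory", "<unset>"))
// Max heap of the current JVM (the driver, when printed from the driver).
println(s"driver max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")

Note that spark.driver.memory only takes effect when set before the driver JVM starts, i.e. via the spark-submit flag, not from code at runtime.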


 

Donald Szeto

Jun 1, 2015, 1:33:07 PM
to predicti...@googlegroups.com, firman....@gmail.com
Hi Firman,

Do you see other detailed messages inside "pio.log" in the directory where you launched the "pio train" command?

Regards,
Donald



Firman Gautama

Jun 1, 2015, 6:41:18 PM
to predicti...@googlegroups.com, firman....@gmail.com
Hi Donald,

I tried increasing the executor memory, but that didn't solve the problem.
Then I increased the driver memory, and that seems to have fixed it.

"pio train" now works.

Below is the verbose dump from the previous run's "pio.log".

-----
2015-06-01 09:04:14,387 INFO  io.prediction.tools.console.Console$ [main] - Using existing engine manifest JSON at /home/firman/30days/manifest.json
2015-06-01 09:04:16,618 INFO  io.prediction.tools.Runner$ [main] - Submission command: /pio/PredictionIO-0.9.3/vendors/spark-1.3.1-bin-hadoop2.6/bin/spark-submit --driver-memory 4G --executor-memory 2G --class io.prediction.workflow.CreateWorkflow --jars file:/home/firman/30days/target/scala-2.10/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/home/firman/30days/target/scala-2.10/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar --files file:/pio/PredictionIO-0.9.3/conf/log4j.properties,file:/etc/hadoop/conf/core-site.xml,file:/etc/hbase/conf/hbase-site.xml --driver-class-path /pio/PredictionIO-0.9.3/conf:/etc/hadoop/conf:/etc/hbase/conf file:/pio/PredictionIO-0.9.3/lib/pio-assembly-0.9.3.jar --engine-id WtB0kwNl9oPmR4HSN6HyQdTJydjXHfJE --engine-version c8ab3be2e4e998f4c7a675dd3e7babbf05a84504 --engine-variant file:/home/firman/30days/engine.json --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/firman/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hbase,PIO_HOME=/pio/PredictionIO-0.9.3,PIO_FS_ENGINESDIR=/home/firman/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/home/firman/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/PredictionIO-0.9.3/vendors/elasticsearch-1.4.4,PIO_FS_TMPDIR=/home/firman/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/PredictionIO-0.9.3/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
2015-06-01 09:04:20,424 INFO  io.prediction.controller.Engine [main] - Extracting datasource params...
2015-06-01 09:04:20,539 INFO  io.prediction.workflow.WorkflowUtils$ [main] - No 'name' is found. Default empty String will be used.
2015-06-01 09:04:20,561 INFO  io.prediction.controller.Engine [main] - Datasource params: (,DataSourceParams(30days,None))
2015-06-01 09:04:20,562 INFO  io.prediction.controller.Engine [main] - Extracting preparator params...
2015-06-01 09:04:20,564 INFO  io.prediction.controller.Engine [main] - Preparator params: (,Empty)
2015-06-01 09:04:20,920 INFO  io.prediction.controller.Engine [main] - Extracting serving params...
2015-06-01 09:04:20,920 INFO  io.prediction.controller.Engine [main] - Serving params: (,Empty)
2015-06-01 09:04:23,367 INFO  Remoting [sparkDriver-akka.actor.default-dispatcher-3] - Starting remoting
2015-06-01 09:04:23,639 INFO  Remoting [sparkDriver-akka.actor.default-dispatcher-3] - Remoting started; listening on addresses :[akka.tcp://sparkDriver@nn01.staging.us-tmp.xxxxxxxxxx.net:22898]
2015-06-01 09:04:24,897 INFO  io.prediction.controller.Engine$ [main] - EngineWorkflow.train
2015-06-01 09:04:24,898 INFO  io.prediction.controller.Engine$ [main] - DataSource: xxxxxxxxxx.data30days.DataSource@29c134e1
2015-06-01 09:04:24,899 INFO  io.prediction.controller.Engine$ [main] - Preparator: xxxxxxxxxx.data30days.Preparator@86369c6
2015-06-01 09:04:24,900 INFO  io.prediction.controller.Engine$ [main] - AlgorithmList: List(xxxxxxxxxx.data30days.ALSAlgorithm@5a3770d2)
2015-06-01 09:04:24,901 INFO  io.prediction.controller.Engine$ [main] - Data santiy check is on.
2015-06-01 09:04:27,394 INFO  io.prediction.controller.Engine$ [main] - xxxxxxxxxx.data30days.TrainingData does not support data sanity check. Skipping check.
2015-06-01 09:04:27,395 INFO  io.prediction.controller.Engine$ [main] - xxxxxxxxxx.data30days.PreparedData does not support data sanity check. Skipping check.
2015-06-01 09:08:35,547 WARN  org.elasticsearch.transport [elasticsearch[Mark Todd][transport_client_worker][T#1]{New I/O worker #1}] - [Mark Todd] Received response for a request that has timed out, sent [20760ms] ago, timed out [3499ms] ago, action [cluster:monitor/nodes/info], node [[#transport#-1][nn01.staging.us-tmp][inet[localhost/127.0.0.1:9300]]], id [47]
2015-06-01 09:18:17,497 ERROR org.apache.spark.executor.Executor [Executor task launch worker-2] - Exception in task 2.0 in stage 7.0 (TID 15)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,503 ERROR org.apache.spark.executor.Executor [Executor task launch worker-5] - Exception in task 5.0 in stage 7.0 (TID 18)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:328)
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:326)
at scala.collection.immutable.HashMap.updated(HashMap.scala:54)
at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:516)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
2015-06-01 09:18:17,503 ERROR org.apache.spark.executor.Executor [Executor task launch worker-9] - Exception in task 9.0 in stage 7.0 (TID 22)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,502 ERROR org.apache.spark.executor.Executor [Executor task launch worker-10] - Exception in task 10.0 in stage 7.0 (TID 23)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,538 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-10] - Uncaught exception in thread Thread[Executor task launch worker-10,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,502 ERROR org.apache.spark.executor.Executor [Executor task launch worker-0] - Exception in task 0.0 in stage 7.0 (TID 13)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,539 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-0] - Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,497 ERROR org.apache.spark.executor.Executor [Executor task launch worker-6] - Exception in task 6.0 in stage 7.0 (TID 19)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:328)
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:326)
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:326)
at scala.collection.immutable.HashMap.updated(HashMap.scala:54)
at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:516)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
2015-06-01 09:18:17,538 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-5] - Uncaught exception in thread Thread[Executor task launch worker-5,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:328)
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:326)
at scala.collection.immutable.HashMap.updated(HashMap.scala:54)
at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:516)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
2015-06-01 09:18:17,532 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-9] - Uncaught exception in thread Thread[Executor task launch worker-9,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,525 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-2] - Uncaught exception in thread Thread[Executor task launch worker-2,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,524 ERROR org.apache.spark.executor.Executor [Executor task launch worker-4] - Exception in task 4.0 in stage 7.0 (TID 17)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,561 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-4] - Uncaught exception in thread Thread[Executor task launch worker-4,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,562 ERROR akka.actor.ActorSystemImpl [sparkDriver-akka.actor.default-dispatcher-14] - exception on LARS’ timer thread
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,522 ERROR org.apache.spark.executor.Executor [Executor task launch worker-8] - Exception in task 8.0 in stage 7.0 (TID 21)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.ObjectInputStream$HandleTable$HandleList.<init>(ObjectInputStream.java:3480)
at java.io.ObjectInputStream$HandleTable.markDependency(ObjectInputStream.java:3305)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:516)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
2015-06-01 09:18:17,565 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-8] - Uncaught exception in thread Thread[Executor task launch worker-8,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.ObjectInputStream$HandleTable$HandleList.<init>(ObjectInputStream.java:3480)
at java.io.ObjectInputStream$HandleTable.markDependency(ObjectInputStream.java:3305)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:516)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
2015-06-01 09:18:17,517 ERROR org.apache.spark.executor.Executor [Executor task launch worker-7] - Exception in task 7.0 in stage 7.0 (TID 20)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,564 ERROR akka.actor.ActorSystemImpl [sparkDriver-akka.actor.default-dispatcher-14] - Uncaught fatal error from thread [sparkDriver-scheduler-1] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,546 WARN  org.apache.spark.scheduler.TaskSetManager [task-result-getter-1] - Lost task 2.0 in stage 7.0 (TID 15, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded

2015-06-01 09:18:17,543 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-6] - Uncaught exception in thread Thread[Executor task launch worker-6,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:328)
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:326)
at scala.collection.immutable.HashMap$HashTrieMap.updated0(HashMap.scala:326)
at scala.collection.immutable.HashMap.updated(HashMap.scala:54)
at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:516)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
2015-06-01 09:18:17,567 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler [Executor task launch worker-7] - Uncaught exception in thread Thread[Executor task launch worker-7,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded
2015-06-01 09:18:17,573 ERROR org.apache.spark.scheduler.TaskSetManager [task-result-getter-1] - Task 2 in stage 7.0 failed 1 times; aborting job
2015-06-01 09:18:17,579 WARN  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation [main-EventThread] - This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2015-06-01 09:18:17,581 WARN  org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation [main-EventThread] - This client just lost it's session with ZooKeeper, closing it. It will be recreated next time someone needs it
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:401)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:319)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

-----

2015-06-01 09:38:44,729 INFO  io.prediction.tools.console.Console$ [main] - Using existing engine manifest JSON at /home/firman/30days/manifest.json
2015-06-01 09:38:47,194 INFO  io.prediction.tools.Runner$ [main] - Submission command: /pio/PredictionIO-0.9.3/vendors/spark-1.3.1-bin-hadoop2.6/bin/spark-submit --master spark://nn01.staging.us-tmp.xxxxxxxxxx.net:7077 --driver-memory 4G --executor-memory 16G --conf spark.akka.frameSize=1024 --class io.prediction.workflow.CreateWorkflow --jars file:/home/firman/30days/target/scala-2.10/template-scala-parallel-recommendation-assembly-0.1-SNAPSHOT-deps.jar,file:/home/firman/30days/target/scala-2.10/template-scala-parallel-recommendation_2.10-0.1-SNAPSHOT.jar --files file:/pio/PredictionIO-0.9.3/conf/log4j.properties,file:/etc/hadoop/conf/core-site.xml,file:/etc/hbase/conf/hbase-site.xml --driver-class-path /pio/PredictionIO-0.9.3/conf:/etc/hadoop/conf:/etc/hbase/conf file:/pio/PredictionIO-0.9.3/lib/pio-assembly-0.9.3.jar --engine-id WtB0kwNl9oPmR4HSN6HyQdTJydjXHfJE --engine-version c8ab3be2e4e998f4c7a675dd3e7babbf05a84504 --engine-variant file:/home/firman/30days/engine.json --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/firman/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hbase,PIO_HOME=/pio/PredictionIO-0.9.3,PIO_FS_ENGINESDIR=/home/firman/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/home/firman/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/pio/PredictionIO-0.9.3/vendors/elasticsearch-1.4.4,PIO_FS_TMPDIR=/home/firman/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/pio/PredictionIO-0.9.3/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
2015-06-01 09:38:51,093 INFO  io.prediction.controller.Engine [main] - Extracting datasource params...
2015-06-01 09:38:51,209 INFO  io.prediction.workflow.WorkflowUtils$ [main] - No 'name' is found. Default empty String will be used.
2015-06-01 09:38:51,231 INFO  io.prediction.controller.Engine [main] - Datasource params: (,DataSourceParams(30days,None))
2015-06-01 09:38:51,232 INFO  io.prediction.controller.Engine [main] - Extracting preparator params...
2015-06-01 09:38:51,234 INFO  io.prediction.controller.Engine [main] - Preparator params: (,Empty)
2015-06-01 09:38:51,591 INFO  io.prediction.controller.Engine [main] - Extracting serving params...
2015-06-01 09:38:51,592 INFO  io.prediction.controller.Engine [main] - Serving params: (,Empty)
2015-06-01 09:38:53,994 INFO  Remoting [sparkDriver-akka.actor.default-dispatcher-3] - Starting remoting
2015-06-01 09:38:54,251 INFO  Remoting [sparkDriver-akka.actor.default-dispatcher-3] - Remoting started; listening on addresses :[akka.tcp://sparkDriver@nn01.staging.us-tmp.xxxxxxxxxx.net:49886]
2015-06-01 09:38:55,656 INFO  io.prediction.controller.Engine$ [main] - EngineWorkflow.train
2015-06-01 09:38:55,657 INFO  io.prediction.controller.Engine$ [main] - DataSource: xxxxxxxxxx.data30days.DataSource@136bc6a7
2015-06-01 09:38:55,658 INFO  io.prediction.controller.Engine$ [main] - Preparator: xxxxxxxxxx.data30days.Preparator@6388062b
2015-06-01 09:38:55,658 INFO  io.prediction.controller.Engine$ [main] - AlgorithmList: List(xxxxxxxxxx.data30days.ALSAlgorithm@6528571)
2015-06-01 09:38:55,659 INFO  io.prediction.controller.Engine$ [main] - Data santiy check is on.
2015-06-01 09:38:58,706 INFO  io.prediction.controller.Engine$ [main] - xxxxxxxxxx.data30days.TrainingData does not support data sanity check. Skipping check.
2015-06-01 09:38:58,707 INFO  io.prediction.controller.Engine$ [main] - xxxxxxxxxx.data30days.PreparedData does not support data sanity check. Skipping check.
2015-06-01 09:41:40,502 WARN  org.apache.spark.scheduler.TaskSetManager [sparkDriver-akka.actor.default-dispatcher-4] - Stage 16 contains a task of very large size (77479 KB). The maximum recommended task size is 100 KB.
2015-06-01 09:44:22,309 ERROR akka.actor.ActorSystemImpl [sparkDriver-akka.actor.default-dispatcher-3] - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkDriver]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:82)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$launchTasks$1.apply(CoarseGrainedSchedulerBackend.scala:183)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$launchTasks$1.apply(CoarseGrainedSchedulerBackend.scala:181)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.launchTasks(CoarseGrainedSchedulerBackend.scala:181)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.makeOffers(CoarseGrainedSchedulerBackend.scala:167)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedSchedulerBackend.scala:131)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverActor.aroundReceive(CoarseGrainedSchedulerBackend.scala:74)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
at akka.actor.ActorCell.invoke(ActorCell.scala:487)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
at akka.dispatch.Mailbox.run(Mailbox.scala:220)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Regards,
Firman
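
A note on the failure mode: the driver-side "java.lang.OutOfMemoryError: Java heap space" above is thrown inside JavaSerializerInstance.serialize while launching tasks, which lines up with the earlier warning that Stage 16 carries tasks of 77479 KB against a recommended 100 KB. The driver serializes every task closure, so a large object captured in a closure gets copied into all 32 tasks at once; raising --driver-memory relieves the symptom. The usual structural fix in Spark is to ship such a structure to each executor once as a broadcast variable. A hedged, self-contained sketch with placeholder names — not the template's actual code — follows:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("broadcast-sketch"))
    // Stand-in for a large lookup structure that would otherwise be
    // captured in every task closure and re-serialized per task.
    val bigMap: Map[Int, String] = (1 to 1000000).map(i => i -> ("item-" + i)).toMap
    val bigMapBc = sc.broadcast(bigMap) // serialized once, fetched per executor
    val ids = sc.parallelize(1 to 100, numSlices = 32)
    // Tasks reference only the small broadcast handle, keeping closures tiny.
    val names = ids.map(i => bigMapBc.value.getOrElse(i, "unknown"))
    names.take(5).foreach(println)
    sc.stop()
  }
}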

Donald Szeto

Jun 1, 2015, 8:36:45 PM
to predicti...@googlegroups.com, firman....@gmail.com
Interesting. Are you using a stock template without modification?


Firman Gautama

Jun 1, 2015, 11:27:31 PM
to Donald Szeto, predicti...@googlegroups.com
Hi Donald,

We are using the stock template.

The only thing we changed is:
- The "buy" value in DataSource.scala: we increased it from 4.0 to 10.0, and the value for "buy" events when importing to the event server is > 10.

Regards,
Firman
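
For reference, the stock recommendation template's DataSource.scala maps event types to implicit rating values; the change described above roughly corresponds to the sketch below. This is a paraphrase with illustrative names, not the template's code verbatim:

// Hedged sketch of the event-to-rating mapping; names are illustrative.
case class ImportedEvent(event: String, rating: Option[Double])

def ratingOf(e: ImportedEvent): Double = e.event match {
  case "rate" => e.rating.getOrElse(0.0) // explicit rating events
  case "buy"  => 10.0                    // stock template assigns 4.0 here
  case other  => sys.error("Unexpected event " + other)
}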

