Re: java.util.NoSuchElementException: head of empty list when running train


Pat Ferrel

Jun 18, 2018, 11:25:09 AM
to us...@predictionio.apache.org, Anuj Kumar, actionml-user
This sounds like some missing required config in engine.json. Can you share the file?


From: Anuj Kumar <anuj....@timesinternet.in>
Reply: us...@predictionio.apache.org <us...@predictionio.apache.org>
Date: June 18, 2018 at 5:05:22 AM
To: us...@predictionio.apache.org <us...@predictionio.apache.org>
Subject:  java.util.NoSuchElementException: head of empty list when running train

Getting this while running "pio train". Please help 

Exception in thread "main" java.util.NoSuchElementException: head of empty list
    at scala.collection.immutable.Nil$.head(List.scala:420)
    at scala.collection.immutable.Nil$.head(List.scala:417)
    at org.apache.mahout.math.cf.SimilarityAnalysis$.crossOccurrenceDownsampled(SimilarityAnalysis.scala:177)
    at com.actionml.URAlgorithm.calcAll(URAlgorithm.scala:343)
    at com.actionml.URAlgorithm.train(URAlgorithm.scala:295)
    at com.actionml.URAlgorithm.train(URAlgorithm.scala:180)
    at org.apache.predictionio.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:49)
    at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
    at org.apache.predictionio.controller.Engine$$anonfun$18.apply(Engine.scala:690)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.immutable.List.map(List.scala:285)
    at org.apache.predictionio.controller.Engine$.train(Engine.scala:690)
    at org.apache.predictionio.controller.Engine.train(Engine.scala:176)
    at org.apache.predictionio.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:67)
    at org.apache.predictionio.workflow.CreateWorkflow$.main(CreateWorkflow.scala:251)
    at org.apache.predictionio.workflow.CreateWorkflow.main(CreateWorkflow.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)



--
Best,
Anuj Kumar

Anuj Kumar

Jun 19, 2018, 1:11:15 AM
to p...@occamsmachete.com, us...@predictionio.apache.org, action...@googlegroups.com
Sure, here it is. 

{
  "comment":" This config file uses default settings for all but the required values see README.md for docs",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.actionml.RecommendationEngine",
  "datasource": {
    "params" : {
      "name": "sample-handmad",
      "appName": "np",
      "eventNames": ["read", "search", "view", "category-pref"],
      "minEventsPerUser": 1,
      "eventWindow": {
        "duration": "300 days",
        "removeDuplicates": true,
        "compressProperties": true
      }
    }
  },
  "sparkConf": {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "spark.executor.memory": "4g",
    "spark.executor.cores": "2",
    "spark.task.cpus": "2",
    "spark.default.parallelism": "16",
    "es.index.auto.create": "true"
  },
  "algorithms": [
    {
      "comment": "simplest setup where all values are default, popularity based backfill, must add eventsNames",
      "name": "ur",
      "params": {
        "appName": "np",
        "indexName": "np",
        "typeName": "items",
        "blacklistEvents": [],
        "comment": "must have data for the first event or the model will not build, other events are optional",
        "indicators": [
          {
            "name": "read"
          },{
            "name": "search",
            "maxCorrelatorsPerItem": 5
          },{
            "name": "category-pref",
            "maxCorrelatorsPerItem": 50
          },{
            "name": "view",
            "maxCorrelatorsPerItem": 50
          }
        ],
        "expireDateName": "itemExpiry",
        "dateName": "date",
        "num": 5
      }
    }
  ]
}
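[Editor's note: the inline comment above says the model will not build without data for the first event. As a side check (my own suggestion, not from the thread), you can query the event server for a "read" event before training. This sketch assumes the default PredictionIO event server port (7070) and an ACCESS_KEY environment variable holding your app's access key; adjust both for your installation.]

```shell
# Hypothetical sanity check: does the app have at least one "read" event?
# Assumes the default event server on localhost:7070 and $ACCESS_KEY set.
curl -s "http://localhost:7070/events.json?accessKey=${ACCESS_KEY:-changeme}&event=read&limit=1" \
  || echo "event server not reachable on localhost:7070"
```

An empty result here would explain the "head of empty list" thrown from crossOccurrenceDownsampled.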


Anuj Kumar

Jun 19, 2018, 4:24:02 AM
to p...@occamsmachete.com, us...@predictionio.apache.org, action...@googlegroups.com
Tried with the basic engine.json from the UR site examples. That seems to work, but I got stuck at "pio deploy", which throws the following error:

[ERROR] [OneForOneStrategy] Failed to invert: [B@35c7052



Before that, "pio train" succeeded but logged the following error. I suspect this is why "pio deploy" is not working. Please help.

[ERROR] [HDFSModels] File /models/pio_modelAWQXIr4APcDlNQi8DwVj could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1726)
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2565)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:829)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:510)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:850)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:793)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1840)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2489)


Pat Ferrel

Jun 19, 2018, 11:46:45 AM
to Anuj Kumar, us...@predictionio.apache.org, action...@googlegroups.com
Can you show me where on the AML site it says to store models in HDFS? It should not say that. I think that may be from the PIO site, so you should ignore it.

Can you share your pio-env? You need to go through the whole workflow, from pio build to pio train to pio deploy, using a template from the same directory and with the same engine.json and pio-env. I suspect something is wrong in pio-env.

Anuj Kumar

Jun 19, 2018, 1:30:52 PM
to p...@occamsmachete.com, us...@predictionio.apache.org, action...@googlegroups.com
Hi Pat,
          I read it at the link below.


Here is the pio-env.sh:

SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.6
POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-42.0.0.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.41.jar
HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
HBASE_CONF_DIR=/usr/local/hbase/conf
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/usr/local/els
PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=pio
PIO_STORAGE_SOURCES_HDFS_TYPE=hdfs
PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://localhost:9000/models
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=/usr/local/hbase


Thanks, 
Anuj Kumar


Pat Ferrel

Jun 19, 2018, 3:15:47 PM
to Anuj Kumar, us...@predictionio.apache.org, action...@googlegroups.com
Yes, those instructions tell you to run HDFS in pseudo-cluster mode. What do you see in the HDFS GUI on localhost:50070?
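[Editor's note: a quick command-line way to answer this question, added as a suggestion; it assumes the hadoop binaries are on your PATH.]

```shell
# "could only be replicated to 0 nodes ... 0 datanode(s) running" means no
# DataNode has registered with the NameNode. This report should list at
# least one live DataNode when HDFS is healthy.
hdfs dfsadmin -report 2>/dev/null | grep -i "live datanodes" \
  || echo "hdfs CLI not available or HDFS not running"
```

If no DataNode is live, restart HDFS (e.g. with the start-dfs.sh script from your Hadoop distribution) before retrying pio train.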

Those setup instructions create a pseudo-clustered Spark and HDFS/HBase. This runs on a single machine but, as the page says, is configured so you can easily expand to a cluster by changing the config to point to remote HDFS or Spark clusters.

One fix, if you don’t want to run those services in pseudo-cluster mode, is:

1) Remove any mention of PGSQL or JDBC; we are not using them. Those settings are not on the page you linked to and are not used.
2) On a single machine you can put the dummy/empty model file in LOCALFS, so change the lines
    PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=HDFS
    PIO_STORAGE_SOURCES_HDFS_PATH=hdfs://localhost:9000/models
to
    PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
    PIO_STORAGE_SOURCES_LOCALFS_PATH=/path/to/models
substituting a directory where you want to save models.
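[Editor's note: put together, the relevant pio-env.sh fragment would look like this. A sketch: the LOCALFS_TYPE/LOCALFS_PATH names follow PredictionIO's stock pio-env.sh template, and the path is an example, not from the thread.]

```shell
# Store the model locally instead of in HDFS (single-machine setup).
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS

# LOCALFS source definition; the path is an example, pick your own directory.
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$HOME/.pio_store/models
```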

Running them in pseudo-cluster mode gives you GUIs to see job progress and browse HDFS for files, among other things. We recommend it to help debug problems when you get to large amounts of data and begin running out of resources.
--
You received this message because you are subscribed to the Google Groups "actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionml-use...@googlegroups.com.
To post to this group, send email to action...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAN5v0zfsuiGHsqgVdtAgc0t8%3DopRTGg6WE7KPEhhkjfrPvWVeg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.