Universal Recommender on PredictionIO 0.12.0 - Spark-2.1.0 - How To


nuno.go...@g.xarevision.pt

Oct 18, 2017, 11:38:40 AM
to actionml-user
Hi all,

Pre-requisites:
 - Scala 2.11.8
 - Spark-2.1.0
 - ElasticSearch-1.7.5
 - Git
 - mvn (maven)
 - sbt
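
A quick way to confirm the command-line tools from this list are on the PATH before starting. This is only a sketch: `missing_tools` is a hypothetical helper name, and Scala, Spark and ElasticSearch are not checked because their install locations vary.

```shell
#!/bin/sh
# Print every tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Build tools named in the prerequisites list above.
missing=$(missing_tools git mvn sbt)
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "All build tools found."
fi
```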


Steps:
 1) Clone PredictionIO; cd to the folder and run: ./make-distribution.sh -Dscala.version=2.11.8 -Dspark.version=2.1.0 -Delasticsearch.version=1.7.5
 2) Clone Mahout (https://github.com/apache/mahout), because there is no Mahout build for Scala 2.11 on Maven. Luckily for us, someone added one as a POM profile; cd to the folder and run: mvn -Pscala-2.11 -Pspark-2.1 clean install -DskipTests
 3) Clone Universal Recommender; cd to the folder; create a lib folder; copy the Mahout jars created in step 2 to the UR lib folder.
 4) In the same folder: mkdir -p src/main/scala/com/actionml/; cp src/main/scala/* src/main/scala/com/actionml # Move the files to the STANDARD Java structure.
 5) Open build.sbt and paste

import scalariform.formatter.preferences._
import com.typesafe.sbt.SbtScalariform
import com.typesafe.sbt.SbtScalariform.ScalariformKeys

name := "universal-recommender"

version := "0.6.0"

scalaVersion := "2.11.8"

//autoScalaLibrary := false

organization := "com.actionml"

val mahoutVersion = "0.13.1"

val pioVersion = "0.12.0-incubating"

val elasticsearch1Version = "1.7.5"

//val elasticsearch5Version = "5.1.2"

// file() takes a filesystem path, not a file:// URI
unmanagedJars in Compile ++=
  (file("/ur_folder/lib") * "*.jar").classpath

libraryDependencies ++= Seq(
  "org.apache.predictionio" % "apache-predictionio-core_2.11" % "0.12.0-incubating" % "provided" from "file:///ur_folder/lib/pio-assembly-0.12.0-incubating.jar",
  "org.apache.predictionio" % "apache-predictionio-data-elasticsearch1_2.11" % "0.11.0-incubating" % "provided",
  "org.apache.spark" %% "spark-core" % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided",
  "org.xerial.snappy" % "snappy-java" % "1.1.1.7",
  // Mahout's Spark libs
  "it.unimi.dsi" % "fastutil" % "7.0.12",
  "org.apache.commons" % "commons-math3" % "3.2",
  "com.tdunning" % "t-digest" % "3.1",
  "org.apache.mahout" % "mahout-math" % mahoutVersion from "file:///ur_folder/lib/libs/mahout-math-0.13.1-SNAPSHOT.jar",
  "org.apache.mahout" % "mahout-math-scala_2.11" % mahoutVersion from "file:///ur_folder/lib/mahout-math-scala_2.11-0.13.1-SNAPSHOT.jar",
  "org.apache.mahout" % "mahout-spark_2.11" % mahoutVersion
    exclude("org.apache.spark", "spark-core_2.11") from "file:///ur_folder/lib/mahout-spark_2.11-0.13.1-SNAPSHOT-spark_2.1.jar",
  "org.apache.mahout"  % "mahout-math" % mahoutVersion from "file:///ur_folder/lib/mahout-math-0.13.1-SNAPSHOT.jar",
  "org.apache.mahout"  % "mahout-hdfs" % mahoutVersion
    exclude("com.thoughtworks.xstream", "xstream")
    exclude("org.apache.hadoop", "hadoop-client") from "file:///ur_folder/lib/mahout-hdfs-0.13.1-SNAPSHOT.jar",
  //"org.apache.hbase"        % "hbase-client"   % "0.98.5-hadoop2" % "provided",
  //  exclude("org.apache.zookeeper", "zookeeper"),
  // other external libs
  "com.thoughtworks.xstream" % "xstream" % "1.4.4"
    exclude("xmlpull", "xmlpull"),
  // possible build for es5 
  //"org.elasticsearch"       %% "elasticsearch-spark-13" % elasticsearch5Version % "provided",
  "org.elasticsearch" % "elasticsearch" % "1.7.5",
  "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.4.1"
    exclude("org.apache.spark", "spark-core_2.11"),
  "org.json4s" %% "json4s-native" % "3.2.10",
  "org.apache.lucene" % "lucene-core" % "4.10.4" )

resolvers += Resolver.mavenLocal

SbtScalariform.scalariformSettings

libraryDependencies ~= { _ map {
  case m if m.organization == "org.elasticsearch" =>
    m.exclude("org.elasticsearch.hadoop", "org.elasticsearch.hadoop")
  case m => m
}}


ScalariformKeys.preferences := ScalariformKeys.preferences.value
  .setPreference(AlignSingleLineCaseStatements, true)
  .setPreference(DoubleIndentClassDeclaration, true)
  .setPreference(DanglingCloseParenthesis, Prevent)
  .setPreference(MultilineScaladocCommentsStartOnFirstLine, true)

assemblyMergeStrategy in assembly := {
 case PathList("apache", "lucene-core", "util") => MergeStrategy.concat
 case PathList("META-INF", xs @ _*) => MergeStrategy.discard
 case x => MergeStrategy.first
}
  

6) Update project/plugins.sbt to:

resolvers += Resolver.typesafeRepo("releases")

addSbtPlugin("org.scalariform" % "sbt-scalariform" % "1.8.1")

addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0")

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

7) Open src/main/scala/com/actionml/EsClient and change line 85 to:

private lazy val client = {
    val settings = org.elasticsearch.common.settings.ImmutableSettings.settingsBuilder().put("cluster.name", "elasticsearch").build()

    new org.elasticsearch.client.transport.TransportClient(settings).addTransportAddress(new org.elasticsearch.common.transport.InetSocketTransportAddress("localhost", 9300))
  }

// PredictionIO does NOT support ELASTIC

8) Now you can run UR.
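
Steps 3 and 4 can be sketched as two small shell functions. This is only an illustration under assumptions: `copy_mahout_jars` and `restructure_sources` are hypothetical helper names, the checkout paths in the usage lines are placeholders, and the jar search assumes Maven's usual per-module target/ output folders.

```shell
#!/bin/sh
# Step 3: copy the Mahout jars built in step 2 into the UR lib folder.
# $1 = Mahout checkout, $2 = UR checkout.
copy_mahout_jars() {
  mkdir -p "$2/lib"
  find "$1" -path '*/target/*' -name 'mahout-*.jar' -exec cp {} "$2/lib/" \;
}

# Step 4: copy the source files into the standard package layout.
# $1 = UR checkout. Only regular files are copied, so an existing
# src/main/scala/com directory is left alone.
restructure_sources() {
  mkdir -p "$1/src/main/scala/com/actionml"
  for f in "$1"/src/main/scala/*; do
    if [ -f "$f" ]; then
      cp "$f" "$1/src/main/scala/com/actionml/"
    fi
  done
}

# Example usage (placeholder paths):
# copy_mahout_jars "$HOME/mahout" "$HOME/universal-recommender"
# restructure_sources "$HOME/universal-recommender"
```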


Hope this is useful. If you run into any trouble, just ask me :)



Pat Ferrel

Oct 18, 2017, 11:58:40 AM
to nuno.go...@g.xarevision.pt, actionml-user
Thanks, much appreciated. A PR would be welcome; make it against the develop branch. Note that this hardcodes the ES URI. There is a better way to get this from the PIO config, which allows clustered ES.
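
As a rough illustration of what a clustered ES setup in the PIO config can look like, the sketch below pairs up comma-separated host and port lists. It assumes the storage source is named ELASTICSEARCH, as in the pio-env.sh template, so that the variables are PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS and PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS. `es_endpoints` is a hypothetical helper, not part of PIO or the UR, and this is not the approach the UR itself takes.

```shell
#!/bin/sh
# Join comma-separated host and port lists into host:port pairs, one per line.
# If the port list is shorter than the host list, the last port is reused.
es_endpoints() {
  hosts=$1
  ports=$2
  while [ -n "$hosts" ]; do
    host=${hosts%%,*}
    port=${ports%%,*}
    printf '%s:%s\n' "$host" "$port"
    case "$hosts" in
      *,*) hosts=${hosts#*,} ;;
      *)   hosts= ;;
    esac
    case "$ports" in
      *,*) ports=${ports#*,} ;;
      *)   ;;
    esac
  done
}

# Fall back to the single local node used in the post above when unset.
es_endpoints "${PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS:-localhost}" \
             "${PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS:-9300}"
```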

Our plans are to leave pio-0.11.0 support with the UR v0.6.x line and not update it for PIO 0.12.0.

The next UR will be v0.7.0, which will come with pio-0.12.0 support AND ES 5.x. This requires a minor code change because ES drops the TransportClient and does everything through the REST client. This should be testable by the end of this week. I guess the question for users is: do you need to upgrade to 0.12.0 WITHOUT upgrading the service dependencies? If you are going to upgrade service dependencies, why not use the faster ES 5.x?

The UR v0.7.0 will require:
 - Scala 2.11.8
 - Spark-2.2.x (the latest stable)
 - HDFS up to 2.8.1 (the latest stable)
 - ElasticSearch-5.x (6.x soon, which is in RC1 stage now)
 - Git
 - Maven, since Scala 2.11 will require a custom build of Mahout, which is undergoing a release now for Scala and Spark upgrades
 - sbt



--
You received this message because you are subscribed to the Google Groups "actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionml-use...@googlegroups.com.
To post to this group, send email to action...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/19ec0c4d-3661-415f-8ef5-022e3f8cdb97%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Pat Ferrel

Oct 19, 2017, 10:46:43 AM
to bernhard...@maxodus.net, actionml-user
Not when it is ready to test. Please wait for an announcement.


On Oct 19, 2017, at 4:27 AM, bernhard...@maxodus.net wrote:

Hi Pat, 

Perfect, so this change will be in the 0.7.0-SNAPSHOT branch?

Cheers, 

Bernhard

nuno.go...@g.xarevision.pt

Oct 20, 2017, 4:49:54 AM
to actionml-user
Hi,
 
By the end of today, I will open a PR with a non-intrusive way to deal with the different dependencies.


I hope this is useful anyway :)

 

 

mahshid khatami

Nov 29, 2017, 3:10:17 AM
to actionml-user
Thank you very much for your help. I did what you said and I get this error (attached to the post).
I checked the directory and found that I do not have the jar file mahout-math-scala_2.11-0.13.1-SNAPSHOT.jar; instead I have mahout-math-scala_2.10-0.13.1-SNAPSHOT.jar.

What should I do?
Thank you very much again :)
mahout.png

Pat Ferrel

Nov 29, 2017, 1:44:57 PM
to mahshid khatami, actionml-user
You should be following the instructions in the UR 0.7.0-SNAPSHOT README.md. It says to build a version of Mahout 0.13.0 from an ActionML repo. So your step #2 is wrong. There is also a script to create a special local repo of the code so it will run with the UR build in the branch.

Sorry for the confusion, but Mahout was not building correctly for Scala 2.11 in the Apache repo. That has now been fixed, but the instructions above are safest to use until the release of Mahout some time next week. At that point lots of docs will be updated, including Mahout, the ActionML site, the UR readme, etc.
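
To spot the mismatch described in the question (a _2.10 jar where a _2.11 one is expected), a small check over the jar file names can help. This is a sketch only: `wrong_scala_jars` is a hypothetical helper, and it inspects file names rather than jar contents.

```shell
#!/bin/sh
# List Mahout jars whose file name carries a Scala suffix other than the
# expected one. $1 = directory to scan, $2 = expected Scala version (e.g. 2.11).
wrong_scala_jars() {
  for jar in "$1"/mahout-*_*.jar; do
    [ -e "$jar" ] || continue
    name=$(basename "$jar")
    case "$name" in
      *_"$2"-*) ;;                               # suffix matches, nothing to report
      *_[0-9].[0-9]*-*) printf '%s\n' "$name" ;; # some other Scala suffix
    esac
  done
}

# Example usage (placeholder path):
# wrong_scala_jars /ur_folder/lib 2.11
```

An empty result means every suffixed Mahout jar in the folder targets the expected Scala version.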


