Hello, thank you for your response.
It seems it was compiled incorrectly. I tried again by running
sbt -Dspark.version=1.4.0, then dist, then unzipping the output:
spark-notebook-0.6.0-scala-2.10.4-spark-1.4.0-hadoop-1.0.4/
(I don't have Scala or Hadoop installed on the system.)
Now it seems to be working, although it throws the following error. I tried println("hi") successfully; the next step is trying a local classpath with some jars.
[DEBUG] [06/17/2015 15:35:12.809] [main] [EventStream(akka://Remote)] Default Loggers started
java.lang.ExceptionInInitializerError
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at notebook.kernel.Repl.liftedTree1$1(Repl.scala:206)
at notebook.kernel.Repl.evaluate(Repl.scala:203)
at notebook.client.ReplCalculator.notebook$client$ReplCalculator$$eval$1(ReplCalculator.scala:384)
at notebook.client.ReplCalculator$$anonfun$preStartLogic$2.apply(ReplCalculator.scala:399)
at notebook.client.ReplCalculator$$anonfun$preStartLogic$2.apply(ReplCalculator.scala:397)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at notebook.client.ReplCalculator.preStartLogic(ReplCalculator.scala:397)
at notebook.client.ReplCalculator.preStart(ReplCalculator.scala:404)
at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
at notebook.client.ReplCalculator.aroundPreStart(ReplCalculator.scala:20)
at akka.actor.ActorCell.create(ActorCell.scala:580)
at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: requirement failed: size=0 and step=0, but both must be positive
at scala.collection.Iterator$GroupedIterator.<init>(Iterator.scala:866)
at scala.collection.Iterator$class.grouped(Iterator.scala:1000)
at scala.collection.AbstractIterator.grouped(Iterator.scala:1157)
at scala.collection.IterableLike$class.grouped(IterableLike.scala:158)
at scala.collection.AbstractIterable.grouped(Iterable.scala:54)
at notebook.front.widgets.package$.table(package.scala:176)
at notebook.front.widgets.package$.layout(package.scala:170)
at notebook.front.LowPriorityRenderers$arrayAsTable$.render(Renderer.scala:69)
at notebook.front.Widget$.fromRenderer(Widget.scala:32)
at $line5.$rendered$$iwC$$iwC.<init>(<console>:7)
at $line5.$rendered$$iwC.<init>(<console>:9)
at $line5.$rendered$.<init>(<console>:11)
at $line5.$rendered$.<clinit>(<console>)
... 24 more
Thank you very much, Andy. Allow me to proceed through email, since I don't have access to IMs.
My current environment has the following setup:
I have created a folder with a couple of jars I would like to use with Spark.
$ pwd
/tmp/repository/spark_repo
$ ls -l | head -n 4
total 1056
drwxr-xr-x 3 cmor cmorgrp 4096 Jun 16 13:10 classes
-rw-r--r-- 1 cmor cmorgrp 902746 Jun 17 12:08 noarch-aster-jdbc-driver.jar
-rw-r--r-- 1 cmor cmorgrp 85072 Jun 16 15:59 spark-csv_2.11-1.1.0.jar
In this case, the spark-csv jar is used as follows:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("cars.csv")
Currently, I'm trying to set this path as "customLocalRepo": "/tmp/repository/spark_repo/", with no success:
java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
at scala.sys.package$.error(package.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
Can you please advise on how we should proceed in order to load these jars (or, if needed for better behavior, packages) and use these modules in Spark Notebook?
Thank you,
Saif
No problemo!
There are several ways to do that, but my preference is to use a local Maven repository, for instance; I'll explain that below.
The first way to include a jar is to use the :cp command, which (if you use the latest version) updates the notebook classpath and also adds the jar to the spark.jars conf, so Spark will ship it to the workers as usual.
The other way is to use a local repository where the dependencies have been installed beforehand.
To install it using maven:
mvn install:install-file -Dfile=spark-csv_2.11-1.1.0.jar -DgroupId=org.apache.spark -DartifactId=spark-csv_2.11 -Dversion=1.1.0 -Dpackaging=jar
This will install it into the local repository at /home/<you>/.m2/repository.
Since customLocalRepo uses Maven-style resolution on Scala 2.11, you can point it at the .m2 local path, so you'll have:
customLocalRepo: "/home/<user>/.m2/repository",
customDeps: [
"org.apache.spark % spark-csv_2.11 % 1.1.0"
]
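Putting the two settings together, the notebook metadata would look roughly like this (a sketch using the repository path and coordinates from the example above; substitute your own user name):

```json
{
  "customLocalRepo": "/home/<user>/.m2/repository",
  "customDeps": [
    "org.apache.spark % spark-csv_2.11 % 1.1.0"
  ]
}
```

The coordinates must match exactly what was passed to mvn install:install-file, since that is where the artifact was registered.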
If you use scala-2.10, it’ll be a different story ;-).
HTH ^^
andy
Thank you very much again, Andy!
I would leave the Maven approach for later; for now, I would like to focus on the :cp approach, which is the one I've been meddling with.
The problem I was having with the manual classpath-jar approach is that sbt package does not bundle the transitive dependencies; for example, spark-csv relies on org.apache.commons.csv.*.
By simply running sbt compile package and then loading the jar, I would get the following error:
:cp /opt/cmor/repository/spark_repo/spark-csv_2.11-1.1.0.jar
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("cars.csv")
>>> java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
So, I would prefer a clean and beautiful way to deal with dependencies, without (for now) falling back to full repositories.
The workaround I found was to manually run sbt assemblyPackageDependency to bundle the dependencies defined in build.sbt into a separate .jar file, and then use :cp again, this time successfully.
Do you think there could be a happier way? Will the Maven approach solve these kinds of issues?
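For reference, the assemblyPackageDependency task comes from the sbt-assembly plugin; a minimal setup might look like this (the plugin version is illustrative, not taken from this thread):

```scala
// project/plugins.sbt — enables the assembly and assemblyPackageDependency tasks
// (version number is an assumption; use whatever matches your sbt version)
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
```

With the plugin in place, sbt assemblyPackageDependency emits a single jar containing only the project's dependencies (typically named <project>-assembly-<version>-deps.jar under target/), which can then be loaded with :cp alongside the project jar itself.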
Thank you!
Okay, I see.
Actually, when using Scala 2.10 you can use your local Ivy repo, where you can publishLocal your project, and it'll resolve all the downloaded deps from there.
So yes, if you need deps, using a repo will definitely be easier.
My 2c of course 😆
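As a sketch of that flow, assuming the default Ivy local layout (the ~/.ivy2/local path is my assumption, not something stated in this thread):

```shell
# Publish your project (and have its resolved deps available) to the local Ivy repo
sbt publishLocal

# Then point the notebook's customLocalRepo at it, e.g. in the notebook metadata:
#   "customLocalRepo": "/home/<user>/.ivy2/local"
# and list your project's coordinates in customDeps as usual.
```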
Great!
I am curious now, since you repeatedly mention switching between Scala 2.10 and 2.11. I might have missed an important point here: I don't have Scala explicitly installed in my environment and am simply using Spark 1.4.0. Do you mean that Spark Notebook bundles both Scala 2.10 and 2.11, with the possibility of using either (specified in sbt)?
Thank you!
And that would be it for now :-)
Never mind me; that was the silly question of the day. It is the Scala version my Spark build uses.
Saif
From: Ellafi, Saif A.
Sent: Wednesday, June 17, 2015 6:59 PM
To: 'andy petrella'; spark-not...@googlegroups.com
Cc: Liu, Weicheng
Subject: RE: Please help running spark notebook