Please help running spark notebook


Saif.A...@wellsfargo.com

Jun 17, 2015, 3:26:21 PM
to spark-not...@googlegroups.com
Hello,
 
I am trying to execute the spark-notebook server, and it launches successfully, but as soon as I create a new Scala Spark Hadoop Notebook, I get the following errors:
 
java.lang.NoSuchMethodError: scala.tools.nsc.Global.classPath()Lscala/tools/nsc/util/ClassFileLookup;
...
[error] a.a.OneForOneStrategy - Futures timed out after [10 seconds]
akka.actor.ActorInitializationException: exception during creation
...
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
 
I suspect this has something to do with the classpath?
Importing scala.tools.nsc.Global in spark-shell works fine.
 
Thank you
Saif
 

andy petrella

Jun 17, 2015, 4:35:52 PM
to Saif.A...@wellsfargo.com, spark-not...@googlegroups.com
Hello,

That does indeed sound pretty weird oO.

Can you tell me which versions you're using and how you got/built it, so I can try on my side too?

Sorry about that :-/

Cheers
andy


Saif.A...@wellsfargo.com

Jun 17, 2015, 4:39:26 PM
to andy.p...@gmail.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

Hello, thank you for your response.

 

It seems it was wrongly compiled; I tried again by running

sbt -D"spark.version"="1.4.0", then dist, then unzipping the output:

spark-notebook-0.6.0-scala-2.10.4-spark-1.4.0-hadoop-1.0.4/

(I don't have Scala or Hadoop installed on the system.)
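Spelled out, the build steps look roughly like this (the zip path and launcher name are assumptions based on a standard Play dist layout, not confirmed here, so adjust as needed):

sbt -Dspark.version=1.4.0 dist
unzip target/universal/spark-notebook-0.6.0-scala-2.10.4-spark-1.4.0-hadoop-1.0.4.zip
cd spark-notebook-0.6.0-scala-2.10.4-spark-1.4.0-hadoop-1.0.4
./bin/spark-notebook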

 

Now it seems to be working, although it throws the following error; I ran println("hi") successfully. The next step is trying a local classpath with some jars.

 

[DEBUG] [06/17/2015 15:35:12.809] [main] [EventStream(akka://Remote)] Default Loggers started
java.lang.ExceptionInInitializerError
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at notebook.kernel.Repl.liftedTree1$1(Repl.scala:206)
        at notebook.kernel.Repl.evaluate(Repl.scala:203)
        at notebook.client.ReplCalculator.notebook$client$ReplCalculator$$eval$1(ReplCalculator.scala:384)
        at notebook.client.ReplCalculator$$anonfun$preStartLogic$2.apply(ReplCalculator.scala:399)
        at notebook.client.ReplCalculator$$anonfun$preStartLogic$2.apply(ReplCalculator.scala:397)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at notebook.client.ReplCalculator.preStartLogic(ReplCalculator.scala:397)
        at notebook.client.ReplCalculator.preStart(ReplCalculator.scala:404)
        at akka.actor.Actor$class.aroundPreStart(Actor.scala:470)
        at notebook.client.ReplCalculator.aroundPreStart(ReplCalculator.scala:20)
        at akka.actor.ActorCell.create(ActorCell.scala:580)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.IllegalArgumentException: requirement failed: size=0 and step=0, but both must be positive
        at scala.collection.Iterator$GroupedIterator.<init>(Iterator.scala:866)
        at scala.collection.Iterator$class.grouped(Iterator.scala:1000)
        at scala.collection.AbstractIterator.grouped(Iterator.scala:1157)
        at scala.collection.IterableLike$class.grouped(IterableLike.scala:158)
        at scala.collection.AbstractIterable.grouped(Iterable.scala:54)
        at notebook.front.widgets.package$.table(package.scala:176)
        at notebook.front.widgets.package$.layout(package.scala:170)
        at notebook.front.LowPriorityRenderers$arrayAsTable$.render(Renderer.scala:69)
        at notebook.front.Widget$.fromRenderer(Widget.scala:32)
        at $line5.$rendered$$iwC$$iwC.<init>(<console>:7)
        at $line5.$rendered$$iwC.<init>(<console>:9)
        at $line5.$rendered$.<init>(<console>:11)
        at $line5.$rendered$.<clinit>(<console>)
        ... 24 more

andy petrella

Jun 17, 2015, 4:53:06 PM
to Saif.A...@wellsfargo.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com
Oh, that's fine then, cool.

Yeah, I saw that exception; I'll take a look at it, but AFAIK it doesn't hurt anything... I mean, it only hurts the eyes in the console (which is enough pain for me, though ^^).

For a local classpath, you should make heavy use of the custom metadata (or application.conf), or of the `:cp` or `:dp` contexts.

Come on gitter (https://gitter.im/andypetrella/spark-notebook) if you want some additional support; it's probably faster than via email :-D

cheers and have fun

andy



Saif.A...@wellsfargo.com

Jun 17, 2015, 5:01:52 PM
to andy.p...@gmail.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

Thank you very much Andy, allow me to proceed through email since I don’t have access to IMs.

 

My current environment has the following setup:

I have created a folder with a couple of jars I would like to use with Spark.

$ pwd
/tmp/repository/spark_repo
$ ls -l | head -n 4
total 1056
drwxr-xr-x 3 cmor cmorgrp   4096 Jun 16 13:10 classes
-rw-r--r-- 1 cmor cmorgrp 902746 Jun 17 12:08 noarch-aster-jdbc-driver.jar
-rw-r--r-- 1 cmor cmorgrp  85072 Jun 16 15:59 spark-csv_2.11-1.1.0.jar

 

In this case, the spark-csv jar is used as follows:

 

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("cars.csv")

 

Currently, I'm trying to set this path as "customLocalRepo" : "/tmp/repository/spark_repo/", with no success:

 

java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:216)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:229)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)

 

 

Can you please advise on how we should proceed to load jars (or packages, if that behaves better) so we can use these modules in the Spark Notebook?

Thank you,

Saif

andy petrella

Jun 17, 2015, 5:23:25 PM
to Saif.A...@wellsfargo.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

no problemo!

There are several ways to do that, but my preference is to use a local Maven repo; I'll explain that second.

The first way to include a jar is to use the :cp context, which (if you use the latest version) updates the notebook classpath but also adds the jar to the spark.jars conf, so Spark will ship it to the workers as usual.

The other way is to use a local repository in which the deps have been installed beforehand.

To install it using maven:

mvn install:install-file -Dfile=spark-csv_2.11-1.1.0.jar -DgroupId=org.apache.spark -DartifactId=spark-csv_2.11 -Dversion=1.1.0 -Dpackaging=jar

This will install it in the local /home/<you>/.m2/repository.

Since customLocalRepo uses the Maven layout in the scala-2.11 build, you can add the .m2 repo as the local path, so you'd have:

customLocalRepo: "/home/<user>/.m2/repository",
customDeps: [
  "org.apache.spark % spark-csv_2.11 % 1.1.0"
]
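Note that the jar's own dependencies are not pulled in this way (the pom generated by install-file declares none), so if spark-csv needs, say, org.apache.commons:commons-csv, a hedged sketch is to install it the same way and list it explicitly in customDeps; the commons-csv version below is an assumption, check spark-csv's pom:

mvn install:install-file -Dfile=commons-csv-1.1.jar -DgroupId=org.apache.commons -DartifactId=commons-csv -Dversion=1.1 -Dpackaging=jar

customDeps: [
  "org.apache.spark % spark-csv_2.11 % 1.1.0",
  "org.apache.commons % commons-csv % 1.1"
]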

If you use scala-2.10, it’ll be a different story ;-).

HTH ^^

andy


Saif.A...@wellsfargo.com

Jun 17, 2015, 5:47:55 PM
to andy.p...@gmail.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

Thank you very much again, Andy!

 

I would leave the Maven approach for later; for now, I would like to focus on the :cp approach, which is the one I've been fiddling with.

 

The problem I was having with the manual classpath-jar approach is that sbt does not bundle the package's dependencies; for example, spark-csv relies on org.apache.commons.csv.*

By simply running sbt compile package and then loading the jar, I would get the following error:

 

:cp /opt/cmor/repository/spark_repo/spark-csv_2.11-1.1.0.jar

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("cars.csv")

>>> java.lang.NoClassDefFoundError: org/apache/commons/csv/CSVFormat
 

So I would prefer a clean and beautiful way to deal with dependencies, without (for now) resorting to full-blown repositories.

 

The workaround I found was to run sbt assemblyPackageDependency to package the dependencies defined in build.sbt into a separate .jar file, and then use :cp again; this time it worked successfully.
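For concreteness, a minimal sketch of that workaround (the plugin version and jar names are assumptions, not taken verbatim from this setup):

// project/plugins.sbt: sbt-assembly provides the assemblyPackageDependency task
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

// then, from the library's checkout:
//   sbt package                    -> spark-csv_2.11-1.1.0.jar
//   sbt assemblyPackageDependency  -> spark-csv-assembly-1.1.0-deps.jar (commons-csv et al.)
// and in a notebook cell, add both jars:
//   :cp /tmp/repository/spark_repo/spark-csv_2.11-1.1.0.jar
//   :cp /tmp/repository/spark_repo/spark-csv-assembly-1.1.0-deps.jar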

 

Do you think there could be a happier way? Will the Maven approach solve these kinds of issues?

 

Thank you!

andy petrella

Jun 17, 2015, 5:54:46 PM
to Saif.A...@wellsfargo.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

Okay I see
Actually, when using Scala 2.10 you can use your local Ivy repo: you can publishLocal your project there, and the notebook will pick up all the downloaded deps from it.

So yes, if you need deps, using a repo will definitely be easier.
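A minimal sketch of what that could look like, assuming the spark-csv sources are checked out locally (paths and coordinates are illustrative, not confirmed here):

sbt publishLocal    # publishes the artifacts to the local Ivy repo, ~/.ivy2/local

"customLocalRepo": "/home/<user>/.ivy2/local",
"customDeps": [
  "com.databricks % spark-csv_2.10 % 1.1.0"
]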

My2c of course 😆

Saif.A...@wellsfargo.com

Jun 17, 2015, 5:59:28 PM
to andy.p...@gmail.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

Great!

 

I am curious now, since you repeatedly mention the switch between Scala 2.10 and 2.11. I might have missed an important point here: I don't have Scala explicitly installed in my environment and am simply using Spark 1.4.0. Do you mean that Spark-Notebook bundles both Scala 2.10 and 2.11, with the option to use either (specified in sbt)?

 

Thank you!

And that would be it for now :-)

Saif.A...@wellsfargo.com

Jun 17, 2015, 6:15:43 PM
to andy.p...@gmail.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com

Never mind me. That last one was the silly question of the day. It is the Scala version my Spark build uses.

 

Saif

 


andy petrella

Jun 18, 2015, 4:05:29 AM
to Saif.A...@wellsfargo.com, spark-not...@googlegroups.com, Weiche...@wellsfargo.com
Yup, no problemo. Actually, I'll probably introduce a feature in spark-notebook where the server uses its own Scala version and each notebook can declare whichever version it wants to use... but I'm not there yet :-D