Ooops, exception in the cell: java.lang.NoSuchMethodException: org.apache.spark.io.SnappyCompressionCodec


Henrik Behrens

Jan 21, 2016, 4:39:14 AM
to spark-notebook-user
Hi,

I'm using Spark Notebook with the default configuration (local mode).

When I display a DataFrame that has been loaded from a non-temporary Hive table, the following exception is thrown and the DataFrame is not rendered as a table:

java.lang.ExceptionInInitializerError
    at sun.misc.Unsafe.ensureClassInitialized(Native Method)
    at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
    at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:142)
    at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1088)
    at java.lang.reflect.Field.getFieldAccessor(Field.java:1069)
    at java.lang.reflect.Field.get(Field.java:393)
    at notebook.kernel.Repl.getModule$1(Repl.scala:203)
    at notebook.kernel.Repl.iws$1(Repl.scala:212)
    at notebook.kernel.Repl.liftedTree1$1(Repl.scala:219)
    at notebook.kernel.Repl.evaluate(Repl.scala:199)
    at notebook.client.ReplCalculator$$anonfun$15$$anon$1$$anonfun$29.apply(ReplCalculator.scala:378)
    at notebook.client.ReplCalculator$$anonfun$15$$anon$1$$anonfun$29.apply(ReplCalculator.scala:375)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoSuchMethodException: org.apache.spark.io.SnappyCompressionCodec.<init>(org.apache.spark.SparkConf)
    at java.lang.Class.getConstructor0(Class.java:3082)
    at java.lang.Class.getConstructor(Class.java:1825)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:71)
    at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:65)
    at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1326)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:108)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:113)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:113)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:113)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:113)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
    at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
    at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
    at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.toJSON(DataFrame.scala:1724)
    at notebook.front.widgets.DataFrameView$class.notebook$front$widgets$DataFrameView$$json(DataFrame.scala:40)
    at notebook.front.widgets.DataFrameWidget.notebook$front$widgets$DataFrameView$$json$lzycompute(DataFrame.scala:64)
    at notebook.front.widgets.DataFrameWidget.notebook$front$widgets$DataFrameView$$json(DataFrame.scala:64)
    at notebook.front.widgets.DataFrameView$class.$init$(DataFrame.scala:41)
    at notebook.front.widgets.DataFrameWidget.<init>(DataFrame.scala:69)
    at notebook.front.ExtraLowPriorityRenderers$dataFrameAsTable$.render(renderer.scala:13)
    at notebook.front.ExtraLowPriorityRenderers$dataFrameAsTable$.render(renderer.scala:12)
    at notebook.front.Widget$.fromRenderer(Widget.scala:32)
    at $line17.$rendered$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$.<init>(<console>:49)
    at $line17.$rendered$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$.<clinit>(<console>)
    ... 20 more
[debug] application - Termination of op calculator

This happens for all Hive tables (Parquet or JSON format) written using saveAsTable() and read using sqlContext.sql("select * from Person_AddressType").
It does not happen for DataFrames coming from a JDBC source.
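Roughly what I am doing, as a sketch (df stands for whatever DataFrame is being saved):

```scala
// write a permanent (non-temporary) Hive table from an existing DataFrame df
df.write.format("parquet").saveAsTable("Person_AddressType")

// read it back; rendering the resulting DataFrame in a cell throws the exception above
sqlContext.sql("select * from Person_AddressType")
```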

In order to use Hive Tables, I added a hive-site.xml to the conf folder and set export HADOOP_CONF_DIR=./conf.
I do not use Hadoop.

Workaround: use an action to read the DataFrame into memory, then display the table afterwards:
 
val x = sqlContext.sql("select * from Person_AddressType").cache
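// force evaluation with an action; afterwards the cached DataFrame renders as a table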
val c = x.count
x



andy petrella

Jan 21, 2016, 6:32:05 AM
to Henrik Behrens, spark-notebook-user
Hello Henrik,

Thanks for sharing this, the workaround is pretty interesting. Why would cache work differently... maybe due to some code gen? dunno

The thing is that it might be related to the gzip mess; doesn't this issue look the same to you? https://github.com/andypetrella/spark-notebook/issues/380

LMK
andy


--
andy

andy petrella

Jan 21, 2016, 6:35:24 AM
to Henrik Behrens, spark-notebook-user
Damn... sorry Henrik, just figured out the last comment was from you :-)
--
andy

Henrik Behrens

Jan 21, 2016, 6:40:33 AM
to spark-notebook-user, henrik....@shs-viveon.com
Maybe it has nothing to do with gzip, because I am having the problem using Parquet and JSON formats, and in issue 380 someone reported it using CSV.

Henrik

andy petrella

Jan 21, 2016, 6:45:36 AM
to Henrik Behrens, spark-notebook-user
Indeed, but the snappy codec constructor is not found... which is weird because it's in the spark-core lib :-S

--
andy

andy petrella

Jan 21, 2016, 6:47:38 AM
to Henrik Behrens, spark-notebook-user
Also, which versions are you using? Scala, Spark, Hadoop, and notebook (with Hive, I guess, and Parquet enabled)

--
andy

Henrik Behrens

Jan 21, 2016, 8:48:17 AM
to spark-notebook-user, henrik....@shs-viveon.com
Version is spark-notebook-0.6.2-scala-2.11.7-spark-1.6.0-hadoop-2.7.1-with-hive-with-parquet on CentOS 6.

Henrik Behrens

Jan 21, 2016, 8:55:24 AM
to spark-notebook-user, henrik....@shs-viveon.com
The snappy codec constructor can be called from the notebook successfully.
Can you reproduce the problem using permanent Hive tables?
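For reference, here is roughly what I ran in a cell to check the constructor (a sketch; sc is the notebook's SparkContext):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.io.SnappyCompressionCodec

// look up the constructor that the stack trace claims is missing and invoke it
val ctor = classOf[SnappyCompressionCodec].getConstructor(classOf[SparkConf])
val codec = ctor.newInstance(sc.getConf) // succeeds when run directly in a cell
```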

Dean Wampler

Jan 21, 2016, 9:07:22 AM
to Henrik Behrens, spark-notebook-user
I also see this exception when trying to use Parquet in Spark Notebook 0.6.2, Scala 2.11.7, Spark 1.5.2, with Hadoop 2.2.0, Hive, and Parquet.

dean




--
Dean Wampler, Ph.D.


andy petrella

Jan 21, 2016, 9:37:35 AM
to Dean Wampler, Henrik Behrens, spark-notebook-user
Yeah, I know, that's why I'm trying to gather more info; it's a really strange one, actually, since, as Henrik pointed out, we can instantiate the codec.
It looks more like a bytecode clash: the SparkConf classes might not be the same. Probably something to do with either multiple deps on the classpath or different classpaths.
Weirdo
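If someone wants to check, something like this (just a rough diagnostic sketch) run in a failing notebook should show where the classes involved come from:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.io.SnappyCompressionCodec

// print which jar and which classloader each class comes from; if SparkConf is
// present more than once on the classpath, the reflective constructor lookup
// inside CompressionCodec.createCodec could be resolving against the wrong one
for (c <- Seq(classOf[SparkConf], classOf[SnappyCompressionCodec])) {
  println(s"${c.getName} -> ${c.getProtectionDomain.getCodeSource.getLocation} (loader: ${c.getClassLoader})")
}
```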


--
andy

Dean Wampler

Jan 21, 2016, 9:50:22 AM
to andy petrella, Henrik Behrens, spark-notebook-user
For what it's worth, I just tried my same notebook using the Hadoop 2.7.1 version, everything else the same, and I get the same exception.

andy petrella

Jan 21, 2016, 9:54:43 AM
to Dean Wampler, Henrik Behrens, spark-notebook-user
Thanks, that might help.
I'll try one or two things too. The thing is that the codec class is in spark-core, and the constructor taking a SparkConf as parameter has been there for a while already :-S. So I'm puzzled.

Might be worth trying with Scala 2.10, actually (?)

--
andy

Dean Wampler

Jan 21, 2016, 10:16:29 AM
to andy petrella, Henrik Behrens, spark-notebook-user
I just tried using a 2.10 notebook and the exception isn't thrown.

Spark Notebook 0.6.2, Scala 2.10.4, Spark 1.5.2, with Hadoop 2.7.1, Hive, and Parquet.

andy petrella

Jan 21, 2016, 10:41:30 AM
to Dean Wampler, Henrik Behrens, spark-notebook-user
Okay, I think this has to do with the implicit injection of the default widget for the DataFrame (why the heck, I do not know... yet).

But "funny" thing:
if you try
```scala
val df = ??? // whatever here that is throwing an exception
df // boom
```
You can execute it several times; you'll keep getting the exception.

Now, change this cell into
```scala
val df = ??? // whatever here that is throwing an exception
//df
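// construct the widget explicitly instead of relying on the implicit renderer
// (25 is presumably the row limit and "consoleDir" the render mode)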
new DataFrameWidget(df, 25, "consoleDir")
```
Execute it, and it works.
Now change it back to the original:
```scala
val df = ??? // whatever here that is throwing an exception
df // WORKS !!!!
```
And it's gonna work, every time... pretty weird!

So there is a workaround that works, though it's not great; I'll try to nail down the real issue...
Meanwhile, I hope this will be good enough :-S.

So, the issue https://github.com/andypetrella/spark-notebook/issues/380 is actually the same crap, and I guess it's not related to the format (CSV or whatever) but to the way the data is loaded.

Note: I'll paste this in the ticket, for the record!

LMKWYT,
andy


--
andy