I get this error when I run my job on Spark on AWS EMR: "User class threw exception: java.lang.NoClassDefFoundError: com/google/protobuf/CodedInputStream$"
Release label: emr-6.0.0
Hadoop distribution: Amazon 3.2.1
Applications: Spark 2.4.4, Hive 3.1.2
My configuration runs fine locally:
build.sbt:
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
...
// ScalaPB with SparkSQL
"com.thesamet.scalapb" %% "sparksql-scalapb" % "0.9.0"
)
// Hadoop contains an old protobuf runtime that is not binary compatible
// with 3.0.0. We shade ours to prevent runtime issues.
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.google.protobuf.**" -> "shadeproto.@1").inAll,
ShadeRule.rename("scala.collection.compat.**" -> "shadecompat.@1").inAll
)
PB.targets in Compile := Seq(
scalapb.gen() -> (sourceManaged in Compile).value
)
assembly.sbt:
build.properties:
sbt.version = 1.3.13
scalapb.sbt:
addSbtPlugin("com.thesamet" % "sbt-protoc" % "0.99.27")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.9.6"
The protobuf dependency that sbt resolves:
sbt: com.google.protobuf:protobuf-java:3.8.0:jar
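For context, the ScalaPB-generated case classes are what call into com.google.protobuf at runtime. A minimal sketch of the kind of code that exercises that path (MyEvent is a hypothetical message generated from a .proto file, not the actual job code):

import com.example.protos.MyEvent  // hypothetical ScalaPB-generated class

object ProtoRoundTrip {
  def main(args: Array[String]): Unit = {
    val event = MyEvent(id = "42")
    // Serializing and parsing go through com.google.protobuf.CodedInputStream,
    // the class reported missing on EMR (renamed to shadeproto.* in the assembly jar
    // by the shade rule above).
    val bytes: Array[Byte] = event.toByteArray
    val parsed = MyEvent.parseFrom(bytes)
    println(parsed)
  }
}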
I tried copying protobuf-java-3.8.0.jar to the /home/hadoop/extrajars folder and referencing it in the Spark default config:
- sudo vim /etc/spark/conf/spark-defaults.conf
spark.driver.extraClassPath :/home/hadoop/extrajars/*
spark.executor.extraClassPath :/home/hadoop/extrajars/*
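To check whether these entries are actually picked up by the driver, one option is to read the settings back from the SparkConf in spark-shell (a debugging sketch; getOption returns None if spark-defaults.conf was not applied):

// Run from spark-shell on the master node; sc is the SparkContext it provides.
println(sc.getConf.getOption("spark.driver.extraClassPath"))
println(sc.getConf.getOption("spark.executor.extraClassPath"))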
I also reference it in the spark-submit command:
--driver-library-path /home/hadoop/extrajars/postgresql-42.2.11.jar,/home/hadoop/extrajars/spark-sql-kafka-0-10_2.12-2.4.4.jar,/home/hadoop/extrajars/spark-streaming-kafka-0-10_2.12-2.4.4.jar,/home/hadoop/extrajars/jdbc-4.50.3.jar,/home/hadoop/extrajars/config-1.4.0.jar,/home/hadoop/extrajars/scala-logging_2.13-3.9.2.jar,/home/hadoop/extrajars/sparksql-scalapb_2.12-0.9.0.jar,/home/hadoop/extrajars/frameless-dataset_2.12-0.8.0.jar,/home/hadoop/extrajars/frameless-core_2.12-0.8.0.jar,/home/hadoop/extrajars/scalapb-runtime_sjs0.6_2.12-0.9.6.jar,/home/hadoop/extrajars/lenses_sjs0.6_2.12-0.9.6.jar,/home/hadoop/extrajars/protobuf-java-3.8.0.jar \
--driver-class-path /home/hadoop/extrajars/postgresql-42.2.11.jar,/home/hadoop/extrajars/spark-sql-kafka-0-10_2.12-2.4.4.jar,/home/hadoop/extrajars/spark-streaming-kafka-0-10_2.12-2.4.4.jar,/home/hadoop/extrajars/jdbc-4.50.3.jar,/home/hadoop/extrajars/config-1.4.0.jar,/home/hadoop/extrajars/scala-logging_2.13-3.9.2.jar,/home/hadoop/extrajars/sparksql-scalapb_2.12-0.9.0.jar,/home/hadoop/extrajars/frameless-dataset_2.12-0.8.0.jar,/home/hadoop/extrajars/frameless-core_2.12-0.8.0.jar,/home/hadoop/extrajars/scalapb-runtime_sjs0.6_2.12-0.9.6.jar,/home/hadoop/extrajars/lenses_sjs0.6_2.12-0.9.6.jar,/home/hadoop/extrajars/protobuf-java-3.8.0.jar \
--jars /home/hadoop/extrajars/postgresql-42.2.11.jar,/home/hadoop/extrajars/spark-sql-kafka-0-10_2.12-2.4.4.jar,/home/hadoop/extrajars/spark-streaming-kafka-0-10_2.12-2.4.4.jar,/home/hadoop/extrajars/jdbc-4.50.3.jar,/home/hadoop/extrajars/config-1.4.0.jar,/home/hadoop/extrajars/scala-logging_2.13-3.9.2.jar,/home/hadoop/extrajars/sparksql-scalapb_2.12-0.9.0.jar,/home/hadoop/extrajars/frameless-dataset_2.12-0.8.0.jar,/home/hadoop/extrajars/frameless-core_2.12-0.8.0.jar,/home/hadoop/extrajars/scalapb-runtime_sjs0.6_2.12-0.9.6.jar,/home/hadoop/extrajars/lenses_sjs0.6_2.12-0.9.6.jar,/home/hadoop/extrajars/protobuf-java-3.8.0.jar \
The protobuf version that ships with Spark 2.4.4 on AWS EMR is protobuf-java-2.5.0.jar.
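Since both protobuf-java 2.5.0 (from the cluster) and 3.8.0 (from the extra jars) can end up on the classpath, a quick way to see which jar actually provides the class at runtime is a reflection check like this (a debugging sketch, run inside the job or from spark-shell):

// Prints the jar that com.google.protobuf.CodedInputStream is loaded from, if any.
val loadedFrom = try {
  val cls = Class.forName("com.google.protobuf.CodedInputStream")
  Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
} catch {
  case _: ClassNotFoundException => None
}
println(s"CodedInputStream loaded from: $loadedFrom")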
Any advice is much appreciated.