java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig

Cameron Larson

May 5, 2017, 11:48:34 AM
to genie
I am running into an error when I add an EMR (v5.4.0) cluster to the Genie 3.0.5 demo and run a Spark Submit 2.0.1 command via ./run_spark_submit_job.py emr 2.0.1

I am also running into this from a custom-built Genie app Docker image. In that instance I used a Python file.

I am running Genie on a local development box that has access to my EMR. 

Attached is the job output. I have googled around and it seems to be a problem with YARN. Is this accurate? Am I misconfigured somewhere?

Thank you!
Cam

stderr:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/05/05 15:18:59 INFO SparkContext: Running Spark version 2.0.1
17/05/05 15:18:59 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/05 15:18:59 INFO SecurityManager: Changing view acls to: root
17/05/05 15:18:59 INFO SecurityManager: Changing modify acls to: root
17/05/05 15:18:59 INFO SecurityManager: Changing view acls groups to:
17/05/05 15:18:59 INFO SecurityManager: Changing modify acls groups to:
17/05/05 15:18:59 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
17/05/05 15:19:00 INFO Utils: Successfully started service 'sparkDriver' on port 39889.
17/05/05 15:19:00 INFO SparkEnv: Registering MapOutputTracker
17/05/05 15:19:00 INFO SparkEnv: Registering BlockManagerMaster
17/05/05 15:19:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f2acbb60-e81e-4907-b02e-b8fff244170d
17/05/05 15:19:00 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
17/05/05 15:19:01 INFO SparkEnv: Registering OutputCommitCoordinator
17/05/05 15:19:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/05/05 15:19:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://172.20.0.5:4040
17/05/05 15:19:01 INFO SparkContext: Added JAR file:/tmp/genie/jobs/1d1f6a85-31a6-11e7-8ff4-0242ac140006/genie/applications/spark201/dependencies/spark-2.0.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.0.1.jar at spark://172.20.0.5:39889/jars/spark-examples_2.11-2.0.1.jar with timestamp 1493997541368
Exception in thread "main" java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:149)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:497)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2275)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
17/05/05 15:19:02 INFO DiskBlockManager: Shutdown hook called
17/05/05 15:19:02 INFO ShutdownHookManager: Shutdown hook called
17/05/05 15:19:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-8d142231-47b0-4483-9dde-8f6315daee66/userFiles-805cbaee-df26-4710-bde9-48e2908be081
17/05/05 15:19:02 INFO ShutdownHookManager: Deleting directory /tmp/spark-8d142231-47b0-4483-9dde-8f6315daee66
Attachment: genie_job_emr_fail.tar.gz

Tom Gianos

May 8, 2017, 12:21:59 PM
to genie
Hey Cam,

It's tough to say without seeing the containers and everything you're building against. Are you using the OSS Apache Hadoop client provided within the original demo, or have you swapped it out for an EMR-specific one? I've recently been able to run the demo with the latest snapshot versions and things seemed to work fine for Spark, so I'm wondering if your binaries changed somehow. I found this with some quick googling; not sure if it will help: https://issues.apache.org/jira/browse/SPARK-15343
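
If it is that issue: the reports there say Spark 2.x stopped bundling the Jersey 1 client jars that Hadoop's YARN timeline client tries to load, and the usual workaround is to disable the timeline service for the submission. I haven't tried this against EMR, so treat the line below as a sketch (the app jar path is just the one from your stderr):

```shell
# Sketch only: turn off the YARN timeline service for this submission so
# YarnClientImpl never tries to instantiate the com.sun.jersey client
# (workaround discussed in SPARK-15343).
spark-submit \
  --master yarn \
  --conf spark.hadoop.yarn.timeline-service.enabled=false \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.0.1.jar
```

The alternative is putting the Jersey 1 jars back on the driver classpath, but flipping the conf is less invasive if you don't actually use the timeline server.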

Tom

Cameron Larson

May 8, 2017, 1:32:08 PM
to genie
Steps taken:
  1. Followed the Netflix Genie 3.0.5 demo guide at https://netflix.github.io/genie/docs/3.0.5/demo/ (local Ubuntu 16.04)
  2. Modified docker-compose.yml to forward port 8080 to 8090
  3. Created an EMR cluster (version 5.4.0) with this configuration:
       Release label: emr-5.4.0
       Hadoop distribution: Amazon 2.7.3
       Applications: Hive 2.1.1, Pig 0.16.0, Hue 3.11.0, Sqoop 1.4.6, Presto 0.166, Spark 2.1.0
  4. Copied core-site.xml, mapred-site.xml, yarn-site.xml to S3.
  5. Modified the addresses from internal to public in the 3 copied files in S3.
  6. Created a new cluster in the Netflix Genie 3.0.5 demo instance for the EMR cluster
  7. Added the existing commands to the EMR cluster
  8. Executed ./run_spark_submit_job.py emr 2.0.1 from the Genie demo client container
  9. Received the error about Jersey :(
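
For reference, step 2 amounts to a one-line change in the demo's docker-compose.yml; something like the excerpt below (the service name here is from memory and may differ in the actual demo file):

```yaml
# Hypothetical excerpt of the demo docker-compose.yml change from step 2;
# maps host port 8090 to Genie's UI/API on container port 8080.
services:
  genie:
    ports:
      - "8090:8080"
```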

Could this be due to a Spark version mismatch? The Genie demo is running 2.0.1 and EMR is running 2.1.0.


In answer to your question "Are you using the OSS Apache Hadoop client provided within the original demo": yes, I'm using everything unchanged except for the exposed port number and the new cluster I added. How did you set it up and get it to run (assuming against an EMR cluster)?

Thanks,
Cam

Tom Gianos

May 30, 2017, 2:19:27 PM
to genie
Hey Cam,

Sorry about the super delayed response. This got lost in the shuffle of my inbox.

So a couple of things you could do to further debug:
1. Change how Genie starts up to set the property `genie.jobs.cleanup.deleteDependencies` to false, so the entire dependency tree for your apps isn't deleted after the job finishes. Then you can grep for this ClientConfig class within the dependencies directory to see whether it truly exists or not. If it does exist, it's probably a classpath definition error in either the Spark or Hadoop config. (You can see these env variables, like `HADOOP_DEPENDENCIES_DIR`, in genie.env inside the genie directory of your job output.)
2. For Hadoop on EMR, we would go to a master node of the EMR cluster we were launching against and zip up the Hadoop directory to serve as our Hadoop application, rather than using the Apache one, since Amazon includes different jars in their distros and you may want to match them to your AMI. You could then create a new Hadoop application from this zip and link the Spark app to it.
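
Once the dependencies are retained (step 1), a quick way to check whether any jar actually ships that class is to scan the archives. This is just a sketch; the directory you point it at will depend on your job id and layout:

```python
import os
import zipfile

# The class the stack trace says is missing (Jersey 1 client config).
MISSING_CLASS = "com/sun/jersey/api/client/config/ClientConfig.class"

def find_class_in_jars(root):
    """Yield paths of jars under `root` that contain MISSING_CLASS."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(".jar"):
                continue
            jar_path = os.path.join(dirpath, name)
            try:
                with zipfile.ZipFile(jar_path) as jar:
                    if MISSING_CLASS in jar.namelist():
                        yield jar_path
            except zipfile.BadZipFile:
                pass  # skip corrupt or non-zip files

if __name__ == "__main__":
    # Point this at the retained dependencies directory of the failed job,
    # e.g. /tmp/genie/jobs/<job-id>/genie/applications/spark201/dependencies
    for hit in find_class_in_jars("dependencies"):
        print("found in:", hit)
```

If nothing turns up under the Spark or Hadoop dependencies, the class genuinely isn't on the box; if it does turn up, it's a classpath wiring problem as described in point 1.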

I've asked some folks on our team who have run Spark on EMR via Genie to take a look at this as well, to see if they have any additional input. I'll let you know if they have thoughts.

Thanks,
Tom