Problems reading and writing to Avro from SparkSQL on CDAP 3.4.2 Distributed against CDH 5.5.4


Manoj Seshan

Jul 4, 2016, 5:35:56 PM
to CDAP User
Hi CDAP team ... your help would be greatly appreciated.

A simple SparkSQL program hits runtime exceptions reading from and writing to Avro datasets,
with the exceptions listed below. I have tried various explicit dependency inclusions
and scope options in the POM. Simple source code & POM attached, stripped down from examples/SparkPageRank.

a) On read [SRC: DataFrame df = sqlContext.read().format("com.databricks.spark.avro").load("people.avro");]

Diagnostics: User class threw exception: java.lang.NoClassDefFoundError: org/apache/avro/mapred/FsInput

b) On write [SRC: df.write().options(avroParams).format("com.databricks.spark.avro").save("df-avro-test-dir");]

User class threw exception: java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location: org/apache/avro/SchemaBuilder$NamespacedBuilder.completeSchema(Lorg/apache/avro/Schema;)Lorg/apache/avro/Schema; @2: invokevirtual
Reason: Type 'org/apache/avro/Schema' (current frame, stack[1]) is not assignable to 'org/apache/avro/JsonProperties'
Current Frame:
bci: @2
flags: { }
locals: { 'org/apache/avro/SchemaBuilder$NamespacedBuilder', 'org/apache/avro/Schema' }
stack: { 'org/apache/avro/SchemaBuilder$NamespacedBuilder', 'org/apache/avro/Schema' }
Bytecode:
0000000: 2a2b b600 0857 2a2b b600 0957 2ab6 0006
0000010: 2bb8 000a 2bb0
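
For completeness, here is a self-contained sketch of the two failing calls, reconstructed from the SRC lines above as a plain Spark driver program. The class name, app name, and the avroParams option values are placeholders; the actual program, which runs inside CDAP, is in the attached SparkTestProgram.java.

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class AvroReadWriteSketch {
  public static void main(String[] args) {
    JavaSparkContext jsc = new JavaSparkContext(new SparkConf().setAppName("avro-rw-sketch"));
    SQLContext sqlContext = new SQLContext(jsc.sc());

    // a) Read an Avro file through the spark-avro connector (throws the NoClassDefFoundError above).
    DataFrame df = sqlContext.read()
        .format("com.databricks.spark.avro")
        .load("people.avro");

    // b) Write the DataFrame back out as Avro (throws the VerifyError above).
    Map<String, String> avroParams = new HashMap<String, String>();
    avroParams.put("recordName", "Person");       // placeholder write options
    avroParams.put("recordNamespace", "example");
    df.write()
        .options(avroParams)
        .format("com.databricks.spark.avro")
        .save("df-avro-test-dir");

    jsc.stop();
  }
}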

pom.xml
package-info.java
SparkTestApp.java
SparkTestProgram.java

Manoj Seshan

Jul 4, 2016, 7:33:54 PM
to CDAP User
Updating this post after more research. Despite explicitly including avro-mapred.jar and avro.jar (version 1.8.1) as dependencies, and packaging them in the application artifact jar, CDAP seems to be:
a) Always adding the avro 1.6.2 jar early in the classpaths built for the Spark containers
b) Not including avro-mapred.jar, a compile-time and runtime dependency of the SparkSQL Avro code
c) Not supplying the jars explicitly included in the application artifact jar to the YARN Spark containers

See attached. Your assistance in resolving this is highly appreciated.

Thanks.
Avro-Jar-Probs.png

Terence Yim

Jul 4, 2016, 9:47:20 PM
to Manoj Seshan, CDAP User
Hi,

CDAP doesn't provide Avro classes at runtime. You always need to have the right version included in your application jar.
If you create the project with the Maven archetype, it is as simple as adding Avro as a direct dependency of your project.

Terence

Sent from my iPhone

Manoj Seshan

Jul 5, 2016, 5:11:56 AM
to CDAP User, manoj....@thomsonreuters.com
Hi Terence, thanks for your reply.

Please note:
1) We are including the precise versions of the avro and avro-mapred jars needed, via the Maven POM (I have attached the POM file again)
2) We are using the CDAP Maven archetype
3) The CDAP application (jar) artifact does contain precisely the Avro jars (and their dependencies) that the application needs (I have attached the jar content listing and the Manifest file within)

However:
4) The list of jars on the CLASSPATH for the Spark action, which is spark-submitted by the CDAP framework, does not include the jars that were packaged in our application jar artifact (I have attached the system.java.path listing, split per line for readability, from the Spark Driver container)
5) In addition to the CDH 5.5.4 jars (automatically provided by the CDH cluster), CDAP is including a number of twill, avro, and other jars on the Spark Driver container classpath that are not packaged with the small CDAP application we built
6) There are exceptions on the SparkSQL methods reading from and writing to Avro (I have attached the source code again)

Your assistance in digging deeper and helping resolve this issue as soon as possible is much appreciated.

Regards
pom.xml
jar_listing.txt
MANIFEST.MF
classpath.txt
SparkTestProgram.java

Terence Yim

Jul 5, 2016, 1:38:10 PM
to Manoj Seshan, CDAP User
Hi Manoj,

Sorry that I missed the attached pom file. From the pom file and the jar file content, it seems correct to me. I’ve tried to reproduce the problem on both Spark 1.5.2 and Spark 1.6.2 (w/ Hadoop 2.6) and both ran successfully. I’ve attached the project and the instructions for you to try it out.

Regarding the classpath, CDAP has its own ClassLoader for program execution, which doesn’t rely on the system classpath, so it’s normal that you don’t see the application jars there.

Terence

SparkTest.tgz

Manoj Seshan

Jul 6, 2016, 9:36:36 AM
to CDAP User, manoj....@thomsonreuters.com
Thanks Terence - I used your POM, and the code worked. That's a lot of dependencies that needed to be included or excluded. Would you be able to point us to the tools you use to figure out the classpath that the CDAP class loader is using?

Terence Yim

Jul 7, 2016, 1:00:32 AM
to Manoj Seshan, CDAP User
Hi Manoj,

In short, only classes in the cdap-api artifact, Hadoop classes (org.apache.hadoop.*, except HBase) and Spark classes (those inside the spark-assembly jar, only for Spark programs) are available to the user program, and nothing else. The class loading model in CDAP is hierarchical instead of a flat classpath in order to support class isolation (e.g. a program can use a different Guava version than the one used by the underlying platform).
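
As a quick way to see what the program ClassLoader is actually serving, here is a minimal diagnostic sketch (plain Java, not a CDAP API; the class name is a placeholder) that you can drop into the Spark program to log which jar and ClassLoader a given class came from, e.g. for the Avro classes involved in the exceptions in this thread:

import java.net.URL;
import java.security.CodeSource;

public final class ClassOriginProbe {

  // Returns a human-readable description of where a class was loaded from.
  public static String originOf(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    URL location = (source == null) ? null : source.getLocation();
    return clazz.getName() + " loaded by " + clazz.getClassLoader() + " from " + location;
  }

  public static void main(String[] args) throws Exception {
    // Example: check the two Avro classes from the exceptions reported above.
    System.out.println(originOf(Class.forName("org.apache.avro.Schema")));
    System.out.println(originOf(Class.forName("org.apache.avro.mapred.FsInput")));
  }
}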

Here is more information about the class loading model.


Terence


virgilio pierini

Apr 9, 2020, 7:51:16 AM
to CDAP User
Hi all
I've encountered the same error Manoj reported when opening this thread, but in a different scenario.
I'm trying to develop pipelines with the visual editor and load additional plugins into the execution environment.
But even with the dependencies listed in Terence's pom.xml I still have the issue.

I filed this: https://groups.google.com/forum/#!topic/cdap-user/4ACA4IvDdSU and it would be highly appreciated if you could read it.

My regards
Virgilio

