Problem with parquet-cascading


james

Jan 1, 2014, 12:05:12 PM
to cascadi...@googlegroups.com
Hi,

I'm trying to run this code:

Main.java:
public static void main(String[] args) {
...
..
Properties properties = new Properties();
AppProps.setApplicationJarClass(properties, Main.class);
HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

Scheme sourceScheme = new queries.ParquetTupleScheme(new Fields("a", "b", "c"));
Tap inTap = new Hfs(sourceScheme, inPath);
...
...
...
}


And I'm getting this error:
java.lang.NoClassDefFoundError: cascading/scheme/Scheme

Here is what I've tried so far:


1)
When I replace this:
Scheme sourceScheme = new ParquetTupleScheme(new Fields("a", "b", "c"));
with this:
Scheme sourceScheme = null;
the error goes away.

2)
When I create a class that extends Scheme&lt;JobConf, RecordReader, OutputCollector, Object[], Object[]&gt;, like ParquetTupleScheme does,
the error goes away.

3)
When I try to check whether this is a parquet-cascading-specific error with:
Object a = new PigCombiner();
the error goes away.

I'm using:
cascading         2.5.1
parquet-cascading 1.3.0
hadoop-core       1.2.1

What am I doing wrong?
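One way to debug a NoClassDefFoundError like this is to ask the JVM at runtime where (or whether) it can load the class. A minimal diagnostic sketch; the WhereIs class name is mine, and on the cluster you would pass cascading.scheme.Scheme as the argument (the default here is a JDK class only so the sketch runs anywhere):

```java
// Diagnostic sketch: report where a class is loaded from, or that it is
// missing entirely. On the cluster, run it inside your job jar with
// "cascading.scheme.Scheme" as the argument.
public class WhereIs {
    static String describe(String name) {
        try {
            Class<?> c = Class.forName(name);
            Object src = c.getProtectionDomain().getCodeSource();
            // Bootstrap-loaded classes (JDK internals) have no code source.
            return name + " loaded from: "
                    + (src == null ? "bootstrap classpath" : src);
        } catch (ClassNotFoundException e) {
            return name + " is NOT on the classpath";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.util.Properties";
        System.out.println(describe(name));
    }
}
```

If this prints a jar you don't expect, or "NOT on the classpath" inside the task JVM, the problem is how the job jar is assembled or ordered rather than your code.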

Andre Kelpe

Jan 7, 2014, 4:39:50 AM
to cascadi...@googlegroups.com
Hi,

how are you building your project? Which exact dependencies are you using?

- André


--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/30119812-263f-409e-951c-88d6966fbb00%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

Soren Macbeth

Mar 30, 2014, 9:52:47 PM
to cascadi...@googlegroups.com
I'm hitting this same issue trying to use parquet-cascading 1.3.2 with Cascalog 2.1.0 and cascading-hadoop2-mr1 2.5.3 on CDH 4.6.0.

I can see the cascading.scheme.Scheme class file in my uberjar. I can import cascading.scheme.Scheme in a REPL on the cluster. I can run other Cascalog queries fine as long as I don't use parquet-cascading.

The only thing I can think might possibly be happening is that parquet-cascading is using a different classloader, or something equally bizarre?

Any help appreciated!

Soren Macbeth

Mar 31, 2014, 12:08:41 AM
to cascadi...@googlegroups.com
It turns out this was caused by CDH 4.6.0 having an older version of Parquet on the classpath.
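For conflicts like this, it can help to enumerate every copy of a class the classloader can see: more than one URL means two jars are fighting over it. A small sketch (the FindCopies class name is mine; the default resource is a JDK class so the sketch runs anywhere, and on the cluster you would pass parquet/cascading/ParquetTupleScheme.class):

```java
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Sketch: list every copy of a class resource visible on the classpath.
// Multiple URLs indicate a jar conflict like the CDH one described above.
public class FindCopies {
    static List<URL> copiesOf(String resourcePath) throws Exception {
        return Collections.list(
                FindCopies.class.getClassLoader().getResources(resourcePath));
    }

    public static void main(String[] args) throws Exception {
        // e.g. "parquet/cascading/ParquetTupleScheme.class" on the cluster
        String resource = args.length > 0 ? args[0] : "java/lang/Object.class";
        for (URL url : copiesOf(resource)) {
            System.out.println(url);
        }
    }
}
```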

Deepak Subhramanian

Jun 9, 2014, 7:14:30 AM
to cascadi...@googlegroups.com

I am getting a similar error, parquet.cascading.ParquetTupleScheme method not found, while trying to use Scalding with Parquet. I am using CDH 4.5. Is there a way to override the CDH jars? My code works in hdfs mode in IntelliJ, but when I run on the cluster I get the error.


Exception in thread "main" java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at com.twitter.scalding.Job$.apply(Job.scala:49)
	at com.twitter.scalding.Tool.getJob(Tool.scala:51)
	at com.twitter.scalding.Tool.run(Tool.scala:71)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at JobRunner$.main(JobRunner.scala:28)
	at RawLogsJobRunner$delayedInit$body.apply(RawLogsJobRunner.scala:21)
	at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
	at scala.App$$anonfun$main$1.apply(App.scala:71)
	at scala.App$$anonfun$main$1.apply(App.scala:71)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
	at scala.App$class.main(App.scala:71)

	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NoSuchMethodError: parquet.cascading.ParquetTupleScheme.<init>(Lcascading/tuple/Fields;Lcascading/tuple/Fields;Ljava/lang/String;)V

Andre Kelpe

Jun 10, 2014, 4:52:25 AM
to cascadi...@googlegroups.com
Try setting mapreduce.job.user.classpath.first=true to put your jars first on the classpath.
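If it helps, the flag can also be set programmatically: Cascading should copy the Properties handed to HadoopFlowConnector into the job configuration, so a sketch along these lines (the helper name is mine, and whether a given Hadoop build honors the flag may vary):

```java
import java.util.Properties;

// Sketch: set the classpath-ordering flag in the same Properties object
// that is later passed to new HadoopFlowConnector(properties).
public class ClasspathFirst {
    static Properties withUserClasspathFirst(Properties props) {
        // Ask Hadoop to put the user's jars ahead of the distro's jars
        // when building the task classpath.
        props.setProperty("mapreduce.job.user.classpath.first", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties properties = withUserClasspathFirst(new Properties());
        System.out.println(properties.getProperty("mapreduce.job.user.classpath.first"));
        // These properties would then go to new HadoopFlowConnector(properties).
    }
}
```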

- André


Deepak Subhramanian

Jun 12, 2014, 4:57:29 AM
to cascadi...@googlegroups.com
Thanks, Andre. I tried that Hadoop setting, but for some reason it is not working.

Thanks, Deepak

Andre Kelpe

Jun 12, 2014, 5:44:58 AM
to cascadi...@googlegroups.com
You will have to talk to your Hadoop vendor then. Sorry about that.

- André




Antonios Chalkiopoulos

Jun 12, 2014, 10:01:07 AM
to cascadi...@googlegroups.com
Andre, you are right: parquet-cascading 1.5.0 seems to be working OK in both local and HDFS mode.

This is a vendor issue, and we will try to get it resolved with Cloudera.

In case someone visits this thread because they can't get parquet-cascading to work in HDFS mode, our quick fix until the vendor chips in is:

echo "Fixing CDH lib/parquet"
echo "----------------------"
cd /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/parquet
for artifact in parquet-cascading parquet-column parquet-common parquet-encoding \
                parquet-format parquet-generator parquet-hadoop parquet-hadoop-bundle \
                parquet-jackson parquet-pig parquet-pig-bundle parquet-hive \
                parquet-thrift parquet-avro
do
  wget http://central.maven.org/maven2/com/twitter/$artifact/1.5.0/$artifact-1.5.0.jar
done

mkdir BACKUP
mv -f *-cdh4.5.0.jar BACKUP/

echo "Fixing CDH lib/hadoop"
echo "---------------------"
cd /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/hadoop/
mkdir BACKUP
mv original-parquet-* BACKUP/
mv parquet-* BACKUP/
for artifact in parquet-cascading parquet-column parquet-common parquet-encoding \
                parquet-format parquet-generator parquet-hadoop parquet-hadoop-bundle \
                parquet-jackson parquet-pig parquet-pig-bundle parquet-hive \
                parquet-thrift parquet-avro
do
  wget http://central.maven.org/maven2/com/twitter/$artifact/1.5.0/$artifact-1.5.0.jar
done

After getting the proper libraries in place, all we need to do is:

$ hadoop jar uber-jar.jar com.twitter.scalding.Tool com.foo.MyJob --hdfs 
 
and it reads and writes Parquet files in HDFS nicely :)

- Antonios

Andre Kelpe

Jun 12, 2014, 10:28:25 AM
to cascadi...@googlegroups.com
Thanks for sharing. I wonder why vendors keep adding jars by default. I would prefer distros to ship with fewer (outdated) jars by default, but the opposite seems to be the case...

- André




Deepak Subhramanian

Jul 2, 2014, 11:09:26 AM
to cascadi...@googlegroups.com
We got it working by manually replacing the old CDH Parquet jars with the 1.5 Parquet jars.

We also tried using the Hadoop parameter (export HADOOP_USER_CLASSPATH_FIRST=true) to override the vendor jars. When we try to override the CDH jars with the latest jars that way, we get a different error; it is using DeprecatedParquetInputFormat. Could it be because the input data was created using an old version of the Parquet SerDe for Hive?

2014-07-02 15:49:49,128 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:180)
	at java.io.DataInputStream.readUTF(DataInputStream.java:592)
	at java.io.DataInputStream.readUTF(DataInputStream.java:547)
	at parquet.hadoop.ParquetInputSplit.readFields(ParquetInputSplit.java:177)
	at parquet.hadoop.mapred.DeprecatedParquetInputFormat$ParquetInputSplitWrapper.readFields(DeprecatedParquetInputFormat.java:196)
	at cascading.tap.hadoop.io.MultiInputSplit.readFields(MultiInputSplit.java:151)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:73)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:44)
	at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:356)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:388)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
2014-07-02 15:49:49,132 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task