Cascading on Hadoop 0.17

Kirill Shabunov

Jun 26, 2008, 12:40:53 PM
to cascading-user
Thanks for your hard work, Chris!

Here is a new question for you: I tried to run my Cascading program on
Hadoop 0.17.0 and it failed:
2008-06-26 07:52:36,647 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.AbstractMethodError: cascading.flow.FlowMapper.map(Ljava/lang/Object;Ljava/lang/Object;Lorg/apache/hadoop/mapred/OutputCollector;Lorg/apache/hadoop/mapred/Reporter;)V
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

Everything works fine on the old version. I googled and found other
people got the same "java.lang.AbstractMethodError" with the new
Hadoop:
http://article.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/5996
http://www.nabble.com/svn-nutch-with-hadoop-0.17-td17441505.html

Are you planning to make Cascading work on the new Hadoop version?

Chris K Wensel

Jun 26, 2008, 1:06:29 PM
to cascadi...@googlegroups.com
I haven't used Hadoop 0.17 yet, so I haven't upgraded the trunk to it
yet. I am looking for a large dataset I can process in EC2 to sanity
check Hadoop and Cascading before releases. So far I've done this with
customer proprietary processes, but it makes sense to have something
public other people can run. I've heard mixed things on 0.17, so am
kinda waiting for .1 or .2.

That said, I was recently emailed by someone who said it works fine;
they made a couple of quick changes in the trunk source to make it
compile. It should be trivial to do a build on your own. It would be
great to hear how Hadoop 0.17 works for you.

Otherwise I can patch something in trunk and run the standard unit
tests.

cheers,
chris

--
Chris K Wensel
ch...@wensel.net
http://chris.wensel.net/
http://www.cascading.org/


montag

Jun 26, 2008, 1:16:11 PM
to cascading-user
Hi Kirill,

If you're in a rush and willing to dive into some source code, the
changes necessary to fix the AbstractMethodError are minimal. Changes
mainly need to be made to classes in the cascading.flow,
cascading.operation, cascading.operation.expression,
cascading.operation.xml, cascading.tap, cascading.tap.hadoop, and
cascading.util packages. Almost all of these changes are pretty
quick. Hadoop 0.17 seems to have made certain interfaces even more generic.
Most of the quick fixes really boil down to changing
WritableComparable and Writable parameter types to Object. If you use an IDE
like Eclipse, just compile the Cascading source against the Hadoop 0.17
jars and follow the errors.

Again, this is a quick fix and nowhere near the rigorous way Chris
is probably writing and testing his code, but it worked for me. I
haven't seen any problems yet.
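
To see why the map(...) signature in the stack trace takes plain Object
arguments, here is a minimal sketch using stand-in interfaces rather than
the real Hadoop ones. It assumes the 0.17 change amounts to dropping the
Writable bounds from the generic type parameters. Writable, Collector,
BoundedMapper and UnboundedMapper are hypothetical names made up for
illustration only.

```java
import java.lang.reflect.Method;

class ErasureDemo {
    // Hypothetical stand-ins; these are NOT the real Hadoop types.
    interface Writable {}
    interface Collector<K, V> { void collect(K key, V value); }

    // Assumed pre-0.17 shape: bounded type parameters, so the compiled
    // method signature uses the bound (Writable) after erasure.
    interface BoundedMapper<K extends Writable, V extends Writable> {
        void map(K key, V value, Collector<K, V> out);
    }

    // Assumed 0.17 shape: unbounded type parameters, so the compiled
    // method signature uses Object after erasure.
    interface UnboundedMapper<K, V> {
        void map(K key, V value, Collector<K, V> out);
    }

    // Returns the simple name of the erased type of map()'s first parameter.
    static String erasedFirstParam(Class<?> type) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("map")) {
                return m.getParameterTypes()[0].getSimpleName();
            }
        }
        throw new IllegalStateException("no map method on " + type);
    }

    public static void main(String[] args) {
        System.out.println(erasedFirstParam(BoundedMapper.class));
        System.out.println(erasedFirstParam(UnboundedMapper.class));
    }
}
```

This prints Writable and then Object. A class compiled against the bounded
interface only provides the Writable-typed method, so a runtime that calls
the Object-typed one finds nothing to dispatch to, which would be
consistent with the AbstractMethodError above.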

Cheers,
Mike

Chris K Wensel

Jun 26, 2008, 1:50:02 PM
to cascadi...@googlegroups.com
Thanks Mike

Kirill Shabunov

Jun 27, 2008, 9:27:32 AM
to cascading-user
Mike, Chris, thanks a lot!

I followed your advice and modified the Cascading source myself. :-) It
seems to work for me. Apart from changing WritableComparable and
Writable objects to Object, a few other changes are needed because some
methods are now deprecated.

--Kirill

Kirill Shabunov

Jul 1, 2008, 7:31:03 AM
to cascading-user
The bad news is that the aforementioned changes make Cascading incompatible
with older versions of Hadoop. Even if you keep the deprecated methods and
add alternative methods for the calls coming from Hadoop, instead of
changing the old ones, there are still calls from Cascading into Hadoop. For
example, cascading.scheme.TextLine.sink(...) calls
org.apache.hadoop.mapred.OutputCollector.collect(k, v). If you build
the project against the new Hadoop jar, it will try to call
collect(Object, Object), which does not exist in the old Hadoop.
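
The call-site side of this can be sketched with stand-in interfaces too.
This is only an illustration under the assumption that the old collector
had Writable-bounded type parameters and the new one has none.
OldCollector, NewCollector and Writable are hypothetical names, not the
real Hadoop classes, and the reflective lookup only approximates what the
JVM does when it links a compiled call.

```java
class CallSiteDemo {
    // Hypothetical stand-in for Hadoop's Writable marker interface.
    interface Writable {}

    // Assumed old shape: bounded parameters erase to collect(Writable, Writable).
    interface OldCollector<K extends Writable, V extends Writable> {
        void collect(K key, V value);
    }

    // Assumed 0.17 shape: unbounded parameters erase to collect(Object, Object).
    interface NewCollector<K, V> {
        void collect(K key, V value);
    }

    // Looks for a public method with exactly the erased parameter types
    // that a caller compiled against the new jar would reference.
    static boolean hasCollectObjectObject(Class<?> collectorType) {
        try {
            collectorType.getMethod("collect", Object.class, Object.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("old: " + hasCollectObjectObject(OldCollector.class));
        System.out.println("new: " + hasCollectObjectObject(NewCollector.class));
    }
}
```

This prints "old: false" and "new: true". The erased signature is fixed in
the caller's bytecode at compile time, so a single jar cannot satisfy both
Hadoop versions without being built once per version.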

--Kirill

Chris K Wensel

Jul 1, 2008, 11:28:44 AM
to cascadi...@googlegroups.com
So, should the next Cascading release be built against 0.17.x?

Those interested please respond with a +1 or -1.

timrobertson100

Jul 1, 2008, 5:29:41 PM
to cascading-user
+1

Chris K Wensel

Jul 7, 2008, 4:56:36 PM
to cascadi...@googlegroups.com
For those interested, I have a branch in svn that compiles and tests
well against Hadoop 0.17:

http://code.google.com/p/cascading/source/browse/branches/hadoop_0.17/cascading/

ckw

tim robertson

Jul 7, 2008, 5:12:50 PM
to cascadi...@googlegroups.com
Thank you very much Chris.

Michael Kramer

Jul 7, 2008, 5:16:09 PM
to cascadi...@googlegroups.com
Thanks Chris! I'm checking it out now.

-Mike

esvee

Jul 11, 2008, 6:04:49 AM
to cascading-user
+1

Chris K Wensel

Jul 14, 2008, 6:38:39 PM
to cascadi...@googlegroups.com
FYI

svn trunk has been updated to support Hadoop 0.17. Trunk also includes
all the changes I've previously announced, unlike the Hadoop 0.17
branch.

http://code.google.com/p/cascading/source/browse/trunk/cascading/CHANGES.txt

Please test and let me know what you think, if you can.

I'll likely make a release tomorrow morning of the whole stack, time
permitting. But feedback before then is quite welcome.

cheers,
ckw