http://www.alphaworks.ibm.com/tech/mapreducetools
pointed me to the MapReduce cheat sheet in Eclipse, which
was almost helpful. I thought the plugin would let me find
the Hadoop dev area on the local node, but this appears not
to be the case. Do we need to download the Apache Hadoop
release for the single node? If so, will we need a separate
one for the IBM cluster?
Is there some abbreviated information out there on what the
plugin is doing, so I can understand what else I need to do
to develop and run a job without it?
dave bayer
Ok, so you do need to download the hadoop dev environment.
Now I have gone through the steps in the cheat sheet, except
that I cannot see the Run The MapReduce Application step due
to 'not having completed the prior steps' - which I had to do
manually because the plugin had some problems. Is there a way
to get around this?
dave bayer
I suppose I should give the error too:
The action could not be performed because plugin
'com.ibm.hipods.mapreduce' could not be located.
Google is of no help for this error.
dave bayer
i spent a chunk of time messing with this too. in general, i found
that the cheat sheets aren't worth your time.
overall, there are two plugins out there: the hadoop one we are given
and the IBM map-reduce one. they are very similar, but you will want
the hadoop one. with it installed, switch into the map-reduce
perspective, open a map-reduce view, and create a couple of servers
(one for the cloud and one for your local machine).
from what i gather, the hadoop plugin provides some nice tools so you
can manage your hadoop jobs. you can browse your server's HDFS
(hadoop distributed file system) from the package explorer. you can
run your hadoop jobs directly from eclipse on your servers. it also
provides templates for common mapreduce classes (mapper, reducer,
driver) and allows you to make map-reduce projects.
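to give a rough idea of what those three templates are for, here is a
plain-java sketch of the mapper / reducer / driver roles using word
count. this is an assumption-laden illustration: it uses no hadoop
classes at all, and the class and method names (WordCountSketch, map,
reduce, run) are made up. the real templates extend hadoop's own
classes, so treat this as the shape of the thing, not what the plugin
generates.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the three roles.
public class WordCountSketch {

    // "Mapper" role: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reducer" role: combine all values collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // "Driver" role: run the map step, group values by key
    // (a stand-in for the shuffle), then run the reduce step.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {dog=1, fox=1, lazy=1, quick=1, the=2}
        System.out.println(run(Arrays.asList("the quick fox", "the lazy dog")));
    }
}
```

in a real job the grouping in run() is done for you by the framework
across machines; the driver only wires the mapper and reducer together
and says where the input and output live.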
if you are working from home, you will want to download and extract a
version of hadoop (i'm using 0.16.3 with no issues so far), so that you
can make mapreduce projects (the plugin will want you to point it at
the install directory of hadoop).
there's another gotcha, at least for me. eclipse 3.3.1.1 does not work
well with the map-reduce plugin (either the apache hadoop one or the
IBM one), in that it does not allow you to tunnel a connection when
setting up a MR server. using eclipse 3.2 seems to work fine with
everything.
finally, regarding the cheat sheets - they aren't that great. or at
least i didn't like them. they were somewhat informative, but no more
than any website. the hadoop map-reduce tutorial, for one, is nice:
http://hadoop.apache.org/core/docs/r0.16.3/mapred_tutorial.html
the only nice thing about those cheat sheets is that they can generate
templates for you for Map, Reduce, and Driver classes. however, just
go to File->New->Whatever and you can get the ones provided by the
hadoop plugin.
the error you are getting is because the cheat sheets were made to work
with the IBM plugin. if you really want to use that plugin, then i can
provide it for you. but don't bother. i downloaded, jarred, and used
the thing, and it wasn't worth the time. plus there are other issues
with it: it doesn't let you edit MR servers, has some weird behavior
when connecting to the local VM HDFS, and a couple of other minor things.
once i realized i didn't need the cheatsheet templates, i switched back
to the regular hadoop plugin. so my recommendation is to skip the IBM
plugin. if you really want it, email me.
just a disclaimer - i'm learning all of this too, so sorry if any of it
is wrong. i'm about a half step ahead, and will try to save you all
the headaches i've already gone through with this.
barret
Steven