Looks like it was a case of adding:
os.path.join(hadoop, 'share', 'hadoop', 'contrib', name),
to findjar() in dumbo/util.py
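
For anyone hitting the same thing, here is a rough sketch of what that change amounts to. It is not the actual dumbo source: find_jar_sketch, its other candidate directory, and the file name pattern are made up for illustration. Only the share/hadoop/contrib entry comes from the fix above.

```python
import os
import re


def find_jar_sketch(hadoop, name):
    """Return the first matching jar under the candidate directories, or None.

    Hypothetical stand-in for dumbo's findjar(); not the real implementation.
    """
    # Candidate directories (assumed for illustration). The second entry is
    # the extra location added for the Debian package layout.
    candidates = [
        os.path.join(hadoop, 'contrib', name),
        os.path.join(hadoop, 'share', 'hadoop', 'contrib', name),
    ]
    # Assumed naming scheme, e.g. hadoop-streaming-1.0.1.jar
    pattern = re.compile(r'hadoop.*' + re.escape(name) + r'.*\.jar$')
    for directory in candidates:
        if not os.path.isdir(directory):
            continue
        for filename in sorted(os.listdir(directory)):
            if pattern.match(filename):
                return os.path.join(directory, filename)
    return None
```

With -hadoop /usr, find_jar_sketch('/usr', 'streaming') would also look in /usr/share/hadoop/contrib/streaming, which is the directory the added line points at.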
Now on to the next problem:
12/03/20 17:11:14 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
-inputformat : class not found : org.apache.hadoop.streaming.AutoInputFormat
Streaming Job Failed!
That sounds like an unpatched streaming jar file, judging from a quick
look at this forum.
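
One way to check that guess is to look inside the streaming jar for the missing class. A minimal sketch, relying on the fact that a jar is an ordinary zip archive; jar_has_class is a helper name made up here, and the jar path in the comment is a placeholder, not a confirmed location.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar contains a .class entry for class_name."""
    # A fully qualified class name maps to a path inside the archive,
    # e.g. org.apache.hadoop.streaming.AutoInputFormat
    #   -> org/apache/hadoop/streaming/AutoInputFormat.class
    entry = class_name.replace('.', '/') + '.class'
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call (placeholder path, adjust to wherever the jar lives):
# jar_has_class('/path/to/hadoop-streaming.jar',
#               'org.apache.hadoop.streaming.AutoInputFormat')
```

If this returns False for the streaming jar, the class really is absent from that build.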
Cheers,
Piers Harding.
On Mar 20, 2:03 pm, Piers Harding <pi...@ompka.net> wrote:
> Hi -
>
> I'm getting the "ERROR: Streaming jar not found" problem that (from
> looking at previous posts) should go away if I include the -hadoop
> option. This works for dumbo ls /user/hduser/gutenberg -hadoop /usr,
> but not for
> dumbo start ipcount.py -hadoop /usr -input /tmp/accesslogs/* -output ipcounts
>
> I'm using the Debian package from
> http://mirrors.ibiblio.org/apache//hadoop/common/stable/hadoop_1.0.1-...,