Here is a basic case that uses only Hadoop streaming on EMR.
This call, which reads its input from S3, crashes:
hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar \
    -input s3://bom-test/test/querylist \
    -output test \
    -inputformat org.apache.hadoop.streaming.AutoInputFormat \
    -mapper '/bin/cat' \
    -reducer '/bin/cat'
2012-02-23 14:22:38,351 INFO org.apache.hadoop.mapred.TaskInProgress (IPC Server handler 36 on 9001): Error from attempt_201202231342_0001_m_000000_0: java.lang.IllegalArgumentException: This file system object (hdfs://10.228.77.162:9000) does not support access to the request path 's3://bom-test/test/querylist' You possibly called FileSystem.get(conf) when you should have called FileSystem.get(uri, conf) to obtain a file system supporting your path.
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:372)
    at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:106)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:162)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:417)
    at org.apache.hadoop.streaming.AutoInputFormat.getRecordReader(AutoInputFormat.java:56)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:199)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:423)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:377)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
This call works:
hadoop jar /home/hadoop/contrib/streaming/hadoop-streaming.jar \
    -input /test/querylist \
    -output test \
    -inputformat org.apache.hadoop.streaming.AutoInputFormat \
    -mapper '/bin/cat' \
    -reducer '/bin/cat'
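Reading the trace, AutoInputFormat.getRecordReader seems to open the split through the default file system (HDFS here) rather than the one that owns the s3:// path, which is what the exception message hints at. Here is a minimal sketch of that kind of check in plain Java, using only java.net.URI and none of the real Hadoop classes. The class CheckPathSketch and the method supports are hypothetical names of mine, and the real FileSystem.checkPath may differ in its details:

```java
import java.net.URI;

public class CheckPathSketch {
    // Simplified model (an assumption, not Hadoop's code): a file system
    // bound to fsUri only accepts paths with no scheme, or with the same
    // scheme and a matching (or absent) authority.
    static boolean supports(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return true; // scheme-less path: resolved against this file system
        }
        if (!scheme.equalsIgnoreCase(fsUri.getScheme())) {
            return false; // e.g. an s3:// path handed to an hdfs:// file system
        }
        String authority = path.getAuthority();
        return authority == null || authority.equalsIgnoreCase(fsUri.getAuthority());
    }

    public static void main(String[] args) {
        // Default file system URI taken from the error message above.
        URI defaultFs = URI.create("hdfs://10.228.77.162:9000");

        // Input of the crashing call: different scheme, so it is rejected.
        System.out.println(supports(defaultFs, URI.create("s3://bom-test/test/querylist"))); // prints false

        // Input of the working call: no scheme, so it is accepted.
        System.out.println(supports(defaultFs, URI.create("/test/querylist"))); // prints true
    }
}
```

If this reading is right, obtaining the file system from the path itself (FileSystem.get(uri, conf), as the message suggests) would presumably avoid the crash. I have not checked that against the streaming source.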
I can file a bug on this. Which list do you recommend?
-Gilles