Maybe I found the answer to my question, but now I'm running into more problems. This is the streaming command I use (broken into lines for readability):
hadoop \
  jar /usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/hadoop-streaming-1.1.2.jar \
  -libjars /usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar \
  -input /tmp/in \
  -output /tmp/out \
  -inputformat com.mongodb.hadoop.mapred.MongoInputFormat \
  -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat \
  -jobconf mongo.input.uri=mongodb://127.0.0.1:27017/visionion.import?readPreference=primary \
  -jobconf mongo.output.uri=mongodb://127.0.0.1:27017/visionion.hadoopfacts \
  -jobconf stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver \
  -io mongodb \
  -mapper /Users/me/aggregation/hadoop/mapper.js \
  -reducer /Users/me/aggregation/hadoop/reducer.js \
  -jobconf mongo.input.query={_id:{\\$date:1365030000000}}
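Side note: the log below warns that -jobconf is deprecated in favor of -D, so I also tried rewriting the job in that form. This is only a sketch (written to a file and syntax-checked, since it needs a live Hadoop/MongoDB setup to actually run; I single-quote the query so the $ in $date survives without the double backslash — whether that is equivalent is my assumption):

```shell
# Same job with -D instead of the deprecated -jobconf. Generic options
# (-libjars, -D) must come before the streaming-specific options.
# Written to a throwaway file and only syntax-checked here.
cat > /tmp/run-streaming.sh <<'EOF'
hadoop \
  jar /usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/hadoop-streaming-1.1.2.jar \
  -libjars /usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar \
  -D mongo.input.uri='mongodb://127.0.0.1:27017/visionion.import?readPreference=primary' \
  -D mongo.output.uri='mongodb://127.0.0.1:27017/visionion.hadoopfacts' \
  -D stream.io.identifier.resolver.class=com.mongodb.hadoop.streaming.io.MongoIdentifierResolver \
  -D mongo.input.query='{_id:{$date:1365030000000}}' \
  -input /tmp/in \
  -output /tmp/out \
  -inputformat com.mongodb.hadoop.mapred.MongoInputFormat \
  -outputformat com.mongodb.hadoop.mapred.MongoOutputFormat \
  -io mongodb \
  -mapper /Users/me/aggregation/hadoop/mapper.js \
  -reducer /Users/me/aggregation/hadoop/reducer.js
EOF
sh -n /tmp/run-streaming.sh && echo "syntax OK"
```

This at least makes the first deprecation warning go away; it does not change the failure described below.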
But I'm getting this:
13/11/20 18:53:45 WARN streaming.StreamJob: -jobconf option is deprecated, please use -D instead.
13/11/20 18:53:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/20 18:53:45 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/20 18:53:46 INFO mapred.MongoInputFormat: Using com.mongodb.hadoop.splitter.StandaloneMongoSplitter@102f729e to calculate splits. (old mapreduce API)
13/11/20 18:53:46 INFO splitter.StandaloneMongoSplitter: Running splitvector to check splits against mongodb://127.0.0.1:27017/visionion.import?readPreference=primary
13/11/20 18:53:51 INFO filecache.TrackerDistributedCacheManager: Creating mongo-hadoop-streaming-assembly-1.1.0.jar in /tmp/hadoop-tl/mapred/local/archive/4825921399136019333_-2111303317_1891993332/file/usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar-work--1602227508575918435 with rwxr-xr-x
13/11/20 18:53:51 INFO filecache.TrackerDistributedCacheManager: Extracting /tmp/hadoop-tl/mapred/local/archive/4825921399136019333_-2111303317_1891993332/file/usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar-work--1602227508575918435/mongo-hadoop-streaming-assembly-1.1.0.jar to /tmp/hadoop-tl/mapred/local/archive/4825921399136019333_-2111303317_1891993332/file/usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar-work--1602227508575918435
13/11/20 18:53:51 INFO filecache.TrackerDistributedCacheManager: Cached file:///usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar as /tmp/hadoop-tl/mapred/local/archive/4825921399136019333_-2111303317_1891993332/file/usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar
13/11/20 18:53:51 INFO filecache.TrackerDistributedCacheManager: Cached file:///usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar as /tmp/hadoop-tl/mapred/local/archive/4825921399136019333_-2111303317_1891993332/file/usr/local/Cellar/hadoop/1.1.2/libexec/contrib/streaming/mongo-hadoop-streaming-assembly-1.1.0.jar
13/11/20 18:53:51 WARN mapred.LocalJobRunner: LocalJobRunner does not support symlinking into current working dir.
13/11/20 18:53:51 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-tl/mapred/local]
13/11/20 18:53:51 INFO streaming.StreamJob: Running job: job_local_0001
13/11/20 18:53:51 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/11/20 18:53:51 INFO mapred.Task: Using ResourceCalculatorPlugin : null
13/11/20 18:53:51 INFO mapred.MapTask: numReduceTasks: 1
13/11/20 18:53:51 INFO mapred.MapTask: io.sort.mb = 100
13/11/20 18:53:51 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/20 18:53:51 INFO mapred.MapTask: record buffer = 262144/327680
13/11/20 18:53:51 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/me/aggregation/hadoop/mapper.js]
java.io.IOException: Cannot run program "/Users/me/aggregation/hadoop/mapper.js": error=13, Permission denied
at java.lang.ProcessBuilder.processException(ProcessBuilder.java:478)
...
13/11/20 18:53:51 ERROR streaming.PipeMapRed: configuration exception
java.io.IOException: Cannot run program "/Users/me/aggregation/hadoop/mapper.js": error=13, Permission denied
...
13/11/20 18:53:51 WARN mapred.LocalJobRunner: job_local_0001
java.lang.RuntimeException: Error in configuring object
...
...
...
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...Caused by: java.io.IOException: Cannot run program "/Users/me/aggregation/hadoop/mapper.js": error=13, Permission denied
at java.lang.ProcessBuilder.processException(ProcessBuilder.java:478)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:457)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
... 19 more
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
at java.lang.ProcessImpl.start(ProcessImpl.java:91)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:452)
... 20 more
13/11/20 18:53:52 INFO streaming.StreamJob: map 0% reduce 0%
13/11/20 18:53:52 INFO streaming.StreamJob: Job running in-process (local Hadoop)
13/11/20 18:53:52 ERROR streaming.StreamJob: Job not successful. Error: NA
13/11/20 18:53:52 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
I already changed the permissions of mapper.js to 755 just to be sure, but to no avail. Can somebody please shed some light on where I should look for the error?
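For completeness, this is the kind of sanity check I ran, using a throwaway copy at /tmp/demo-mapper.js rather than my real mapper. error=13 is EACCES from the exec() call, so as far as I understand it the execute bit, a shebang line as the very first line, and search permission on every parent directory all matter (the node shebang is my assumption about what streaming needs to run a .js file directly):

```shell
# Throwaway stand-in for mapper.js to illustrate the exec() preconditions.
cat > /tmp/demo-mapper.js <<'EOF'
#!/usr/bin/env node
// trivial identity mapper: copy stdin to stdout unchanged
process.stdin.pipe(process.stdout);
EOF

# Without the execute bit, exec() fails with EACCES (error=13), as in the log.
chmod 644 /tmp/demo-mapper.js
test -x /tmp/demo-mapper.js || echo "644: not executable"

# With 755 the file itself is fine...
chmod 755 /tmp/demo-mapper.js
test -x /tmp/demo-mapper.js && echo "755: executable"

# ...so the remaining suspects are the first line (must be a shebang)
# and the permissions of the parent directories.
head -n 1 /tmp/demo-mapper.js
```

Both checks pass for my real mapper.js, which is why I am stuck.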
Regards,
Thomas