mongo-hadoop and hadoop-0.23.1


Mark Lewandowski

Mar 9, 2012, 3:36:26 PM
to mongod...@googlegroups.com
I'm currently trying to get mongo-hadoop working with hadoop-0.23.1 and streaming.  From the little documentation that exists on the web, I'm pretty certain that this is possible.

After installing hadoop and writing a quick test MR job, I tried running it with mongo-hadoop. The hadoop job reports failure (output pasted below), but when I look in mongo, the correct output is sitting in a new collection.
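
For reference, I'm checking the output collection with a quick pymongo snippet along these lines (pymongo 2.x API; the database and collection names match the job command below):

from pymongo import Connection

db = Connection('127.0.0.1')['path_production']
for doc in db['mr_usercount'].find():
    print doc   # shows the expected result, e.g. {'_id': 'user', 'count': 27}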

Any ideas?

Here's the hadoop output:

╰─➤  $HADOOP_COMMON_HOME/bin/hadoop jar /home/mark/workspace/mongo-hadoop/streaming/target/mongo-hadoop-streaming-assembly-1.0.0-rc1-SNAPSHOT.jar -mapper pymapper.py -reducer pyreducer.py -inputURI mongodb://127.0.0.1/path_production.users -outputURI mongodb://127.0.0.1/path_production.mr_usercount -file pymapper.py -file pyreducer.py
12/03/09 12:28:42 INFO streaming.MongoStreamJob: Running
12/03/09 12:28:42 INFO streaming.MongoStreamJob: Init
12/03/09 12:28:42 INFO streaming.MongoStreamJob: Process Args
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Setup Options'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: PreProcess Args
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Parse Options
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: '-mapper'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: 'pymapper.py'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: '-reducer'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: 'pyreducer.py'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: '-inputURI'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/path_production.users'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: '-outputURI'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/path_production.mr_usercount'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: '-file'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: 'pymapper.py'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: '-file'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Arg: 'pyreducer.py'
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Add InputSpecs
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Setup output_
12/03/09 12:28:42 INFO streaming.StreamJobPatch: Post Process Args
12/03/09 12:28:42 INFO streaming.MongoStreamJob: Args processed.
12/03/09 12:28:43 INFO io.MongoIdentifierResolver: Resolving: bson
12/03/09 12:28:43 INFO io.MongoIdentifierResolver: Resolving: bson
12/03/09 12:28:43 INFO io.MongoIdentifierResolver: Resolving: bson
12/03/09 12:28:43 INFO io.MongoIdentifierResolver: Resolving: bson
12/03/09 12:28:43 INFO streaming.MongoStreamJob: Input Format: com.mongodb.hadoop.mapred.MongoInputFormat@d0721b0
12/03/09 12:28:43 INFO streaming.MongoStreamJob: Output Format: com.mongodb.hadoop.mapred.MongoOutputFormat@4f34b07e
12/03/09 12:28:43 INFO streaming.MongoStreamJob: Key Class: class com.mongodb.hadoop.io.BSONWritable
12/03/09 12:28:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/09 12:28:43 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/03/09 12:28:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/03/09 12:28:43 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/03/09 12:28:43 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/03/09 12:28:43 INFO util.MongoSplitter:  Calculate Splits Code ... Use Shards? false, Use Chunks? true; Collection Sharded? false
12/03/09 12:28:43 INFO util.MongoSplitter: Creation of Input Splits is enabled.
12/03/09 12:28:43 INFO util.MongoSplitter: Using Unsharded Split mode (Calculating multiple splits though)
12/03/09 12:28:43 INFO util.MongoSplitter: Calculating unsharded input splits on namespace 'path_production.users' with Split Key '{ "_id" : 1}' and a split size of '8'mb per
12/03/09 12:28:43 WARN util.MongoSplitter: WARNING: No Input Splits were calculated by the split code. Proceeding with a *single* split. Data may be too small, try lowering 'mongo.input.split_size' if this is undesirable.
12/03/09 12:28:43 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/path_production.users', query: '{ "$query" : { }}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/03/09 12:28:43 INFO mapreduce.JobSubmitter: number of splits:1
12/03/09 12:28:43 WARN mapred.LocalDistributedCacheManager: LocalJobRunner does not support symlinking into current working dir.
12/03/09 12:28:43 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
12/03/09 12:28:43 INFO mapred.LocalJobRunner: OutputCommitter set in config null
12/03/09 12:28:43 INFO mapreduce.Job: Running job: job_local_0001
12/03/09 12:28:43 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
12/03/09 12:28:44 INFO mapred.LocalJobRunner: Waiting for map tasks
12/03/09 12:28:44 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000000_0
12/03/09 12:28:44 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@19381960
12/03/09 12:28:44 INFO mapred.MapTask: numReduceTasks: 1
12/03/09 12:28:44 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/03/09 12:28:44 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/03/09 12:28:44 INFO mapred.MapTask: soft limit at 83886080
12/03/09 12:28:44 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/03/09 12:28:44 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/03/09 12:28:44 INFO streaming.PipeMapRed: PipeMapRed exec [/home/mark/workspace/tmp/mongo-hadoop/./pymapper.py]
12/03/09 12:28:44 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/03/09 12:28:44 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/03/09 12:28:44 INFO mapreduce.Job: Job job_local_0001 running in uber mode : false
12/03/09 12:28:44 INFO mapreduce.Job:  map 0% reduce 0%
12/03/09 12:28:45 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/03/09 12:28:45 INFO streaming.PipeMapRed: Records R/W=27/1
12/03/09 12:28:45 INFO streaming.PipeMapRed: MRErrorThread done
12/03/09 12:28:45 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/03/09 12:28:45 INFO streaming.PipeMapRed: mapRedFinished
12/03/09 12:28:45 INFO mapred.LocalJobRunner:
12/03/09 12:28:45 INFO mapred.MapTask: Starting flush of map output
12/03/09 12:28:45 INFO mapred.MapTask: Spilling map output
12/03/09 12:28:45 INFO mapred.MapTask: bufstart = 0; bufend = 1323; bufvoid = 104857600
12/03/09 12:28:45 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214292(104857168); length = 105/6553600
12/03/09 12:28:45 INFO mapred.MapTask: Finished spill 0
12/03/09 12:28:45 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/03/09 12:28:45 INFO mapred.LocalJobRunner: Records R/W=27/1
12/03/09 12:28:45 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/03/09 12:28:45 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000000_0
12/03/09 12:28:45 INFO mapred.LocalJobRunner: Map task executor complete.
12/03/09 12:28:45 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@8497904
12/03/09 12:28:45 INFO mapred.Merger: Merging 1 sorted segments
12/03/09 12:28:45 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1358 bytes
12/03/09 12:28:45 INFO mapred.LocalJobRunner:
12/03/09 12:28:45 INFO streaming.PipeMapRed: PipeMapRed exec [/home/mark/workspace/tmp/mongo-hadoop/./pyreducer.py]
12/03/09 12:28:45 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/03/09 12:28:45 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/03/09 12:28:45 INFO streaming.PipeMapRed: MRErrorThread done
12/03/09 12:28:45 INFO streaming.PipeMapRed: Records R/W=27/1
12/03/09 12:28:45 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/03/09 12:28:45 INFO streaming.PipeMapRed: mapRedFinished
12/03/09 12:28:45 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/03/09 12:28:45 INFO mapred.LocalJobRunner: Records R/W=27/1 > reduce
12/03/09 12:28:45 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/03/09 12:28:45 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File file:/tmp/_temporary/0 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1249)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1289)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:540)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1249)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1289)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:262)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:302)
    at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:208)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:455)
12/03/09 12:28:46 INFO mapreduce.Job:  map 100% reduce 100%
12/03/09 12:28:46 INFO mapreduce.Job: Job job_local_0001 failed with state FAILED due to: NA
12/03/09 12:28:46 INFO mapreduce.Job: Counters: 27
    File System Counters
        FILE: Number of bytes read=902093
        FILE: Number of bytes written=1105556
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=27
        Map output records=27
        Map output bytes=1323
        Map output materialized bytes=1383
        Input split bytes=128
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=0
        Reduce input records=27
        Reduce output records=1
        Spilled Records=54
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=88
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=351928320
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
12/03/09 12:28:46 ERROR streaming.StreamJob: Job not Successful!
MongoDB Streaming Command Failed!

-Mark

Brendan W. McAdams

Mar 9, 2012, 4:08:18 PM
to mongod...@googlegroups.com
Hadoop streaming seems to insist on working against a file on the filesystem, and we work around that… I'm wondering why it's tossing an exception here.

Are you running this in local mode? pseudo-distributed?


Mark Lewandowski

Mar 9, 2012, 4:12:59 PM
to mongod...@googlegroups.com
This is running in local mode, trying to get an idea of what the mongo-hadoop package is capable of before I develop against it in production.

Brendan W. McAdams

Mar 9, 2012, 4:26:53 PM
to mongod...@googlegroups.com
Local mode should definitely work without issues. Can you send me your mapper/reducer so I can take a look?


Mark Lewandowski

Mar 9, 2012, 4:38:36 PM
to mongod...@googlegroups.com
pymapper.py
-----------------------------------------------------------

#!/usr/bin/env python

import sys
sys.path.append(".")

from pymongo_hadoop import BSONMapper

def mapper(documents):
    for doc in documents:
        yield {'_id': 'user', 'count': 1}

BSONMapper(mapper)
print >> sys.stderr, "Done Mapping!!!"


pyreducer.py
-----------------------------------------------------------

#!/usr/bin/env python

import sys
sys.path.append('.')

from pymongo_hadoop import BSONReducer

def reducer(key, values):
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'count': _count}

BSONReducer(reducer)
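
(For anyone who wants to sanity-check the logic outside Hadoop: a throwaway harness along these lines drives the generators directly, bypassing BSONMapper/BSONReducer and the BSON framing entirely.)

#!/usr/bin/env python
# Hand-rolled test harness (not part of mongo-hadoop): feed the mapper
# some stand-in documents and reduce the result by hand.

def mapper(documents):
    for doc in documents:
        yield {'_id': 'user', 'count': 1}

def reducer(key, values):
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'count': _count}

docs = [{'name': 'user%d' % i} for i in range(27)]  # stand-in for the 27 input records
print reducer('user', list(mapper(docs)))           # {'_id': 'user', 'count': 27}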

Brendan W. McAdams

Mar 14, 2012, 3:24:39 PM
to mongodb-user
Mark,

I have been unable to reproduce this issue despite several
configurations and test beds. Are you still seeing problems?


Mark Lewandowski

Mar 19, 2012, 2:23:58 PM
to mongod...@googlegroups.com
Brendan,

The same issue is still occurring, but it does not seem to stop hadoop from finishing correctly.  I've just learned to ignore this for the time being, since I'm still in an experimentation phase.  When I begin rolling this out to a production environment I'll probably revisit this issue.

Thanks for looking into this.

-Mark

robee

Apr 27, 2012, 6:39:00 AM
to mongod...@googlegroups.com
I have the same problem.

Here are the mapper and reducer, run in local mode on a single-node hadoop 0.23.1 install on OS X Lion.

mapper :
#!/usr/bin/env python

import sys
sys.path.append(".")

from pymongo_hadoop import BSONMapper

def mapper(documents):
    """docstring for mapper"""
    for doc in documents:
        yield {'_id': doc['user']['time_zone'], 'count': 1}

BSONMapper(mapper)
print >> sys.stderr, "Done Mapping."

reducer:
#!/usr/bin/env python

import sys
sys.path.append(".")

from pymongo_hadoop import BSONReducer

def reducer(key, values):
    """docstring for reducer"""
    print >> sys.stderr, "Processing Timezon %s" % key
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key, 'count': _count}

BSONReducer(reducer)

Has anyone solved this problem?

thanks.

Brendan W. McAdams

Apr 27, 2012, 10:46:32 AM
to mongod...@googlegroups.com
What exactly is the problem you are seeing when you run this?

robee

Apr 27, 2012, 10:50:53 AM
to mongod...@googlegroups.com
Here's the output from hadoop

hadoop jar mongo-hadoop-streaming-assembly-1.0.0.jar -mapper twit_map.py -reducer twit_reduce.py -inputURI mongodb://127.0.0.1/test.live -outputURI mongodb://127.0.0.1/test.twit_reduction -file twit_map.py -file twit_reduce.py

12/04/27 17:21:17 INFO streaming.MongoStreamJob: Running
12/04/27 17:21:17 INFO streaming.MongoStreamJob: Init
12/04/27 17:21:17 INFO streaming.MongoStreamJob: Process Args
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Setup Options'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: PreProcess Args
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Parse Options
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: '-mapper'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: 'twit_map.py'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: '-reducer'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: 'twit_reduce.py'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: '-inputURI'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/test.live'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: '-outputURI'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: 'mongodb://127.0.0.1/test.twit_reduction'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: '-file'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: 'twit_map.py'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: '-file'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Arg: 'twit_reduce.py'
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Add InputSpecs
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Setup output_
12/04/27 17:21:17 INFO streaming.StreamJobPatch: Post Process Args
12/04/27 17:21:17 INFO streaming.MongoStreamJob: Args processed.
2012-04-27 17:21:17.614 java[21100:1903] Unable to load realm info from SCDynamicStore
2012-04-27 17:21:17.752 java[21100:1903] Unable to load realm info from SCDynamicStore
12/04/27 17:21:18 INFO io.MongoIdentifierResolver: Resolving: bson
12/04/27 17:21:18 INFO io.MongoIdentifierResolver: Resolving: bson
12/04/27 17:21:18 INFO io.MongoIdentifierResolver: Resolving: bson
12/04/27 17:21:18 INFO io.MongoIdentifierResolver: Resolving: bson
packageJobJar: [twit_map.py, twit_reduce.py] [] /var/folders/wz/vmr658f56dn5ly79j7vzxy2c0000gq/T/streamjob668124738975197009.jar tmpDir=null
12/04/27 17:21:18 INFO streaming.MongoStreamJob: Input Format: com.mongodb.hadoop.mapred.MongoInputFormat@a50a649
12/04/27 17:21:18 INFO streaming.MongoStreamJob: Output Format: com.mongodb.hadoop.mapred.MongoOutputFormat@34d507e9
12/04/27 17:21:18 INFO streaming.MongoStreamJob: Key Class: class com.mongodb.hadoop.io.BSONWritable
12/04/27 17:21:18 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
12/04/27 17:21:18 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/04/27 17:21:18 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
12/04/27 17:21:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/04/27 17:21:18 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/04/27 17:21:18 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/04/27 17:21:18 INFO util.MongoSplitter:  Calculate Splits Code ... Use Shards? false, Use Chunks? true; Collection Sharded? false
12/04/27 17:21:18 INFO util.MongoSplitter: Creation of Input Splits is enabled.
12/04/27 17:21:18 INFO util.MongoSplitter: Using Unsharded Split mode (Calculating multiple splits though)
12/04/27 17:21:18 INFO util.MongoSplitter: Calculating unsharded input splits on namespace 'test.live' with Split Key '{ "_id" : 1}' and a split size of '8'mb per
12/04/27 17:21:18 INFO util.MongoSplitter: Calculated 5 splits.
12/04/27 17:21:18 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/test.live', query: '{ "$query" : { } , "$max" : { "_id" : { "$oid" : "4f981eb458407f60e02ec0ab"}}}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/04/27 17:21:18 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/test.live', query: '{ "$query" : { } , "$min" : { "_id" : { "$oid" : "4f981eb458407f60e02ec0ab"}} , "$max" : { "_id" : { "$oid" : "4f9916fa6ada108fa2771c7f"}}}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/04/27 17:21:18 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/test.live', query: '{ "$query" : { } , "$min" : { "_id" : { "$oid" : "4f9916fa6ada108fa2771c7f"}} , "$max" : { "_id" : { "$oid" : "4f9917486ada108fa27723e2"}}}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/04/27 17:21:18 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/test.live', query: '{ "$query" : { } , "$min" : { "_id" : { "$oid" : "4f9917486ada108fa27723e2"}} , "$max" : { "_id" : { "$oid" : "4f9917b16ada108fa2772b45"}}}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/04/27 17:21:18 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/test.live', query: '{ "$query" : { } , "$min" : { "_id" : { "$oid" : "4f9917b16ada108fa2772b45"}} , "$max" : { "_id" : { "$oid" : "4f9918596ada108fa27732a8"}}}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/04/27 17:21:18 INFO input.MongoInputSplit: Creating a new MongoInputSplit for MongoURI 'mongodb://127.0.0.1/test.live', query: '{ "$query" : { } , "$min" : { "_id" : { "$oid" : "4f9918596ada108fa27732a8"}}}', fieldSpec: '{ }', sort: '{ }', limit: 0, skip: 0 .
12/04/27 17:21:18 INFO mapreduce.JobSubmitter: number of splits:6
12/04/27 17:21:19 WARN mapred.LocalDistributedCacheManager: LocalJobRunner does not support symlinking into current working dir.
12/04/27 17:21:19 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
12/04/27 17:21:19 INFO mapred.LocalJobRunner: OutputCommitter set in config null
12/04/27 17:21:19 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
12/04/27 17:21:19 INFO mapreduce.Job: Running job: job_local_0001
12/04/27 17:21:19 INFO mapred.LocalJobRunner: Waiting for map tasks
12/04/27 17:21:19 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000000_0
12/04/27 17:21:19 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:19 INFO mapred.MapTask: numReduceTasks: 1
12/04/27 17:21:19 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/04/27 17:21:19 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/04/27 17:21:19 INFO mapred.MapTask: soft limit at 83886080
12/04/27 17:21:19 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/04/27 17:21:19 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/04/27 17:21:19 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_map.py]
12/04/27 17:21:19 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/04/27 17:21:20 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:20 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:20 INFO mapreduce.Job: Job job_local_0001 running in uber mode : false
12/04/27 17:21:20 INFO mapreduce.Job:  map 0% reduce 0%
12/04/27 17:21:20 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:21 INFO streaming.PipeMapRed: Records R/W=518/1
12/04/27 17:21:22 INFO streaming.PipeMapRed: R/W/S=1000/728/0 in:500=1000/2 [rec/s] out:364=728/2 [rec/s]
12/04/27 17:21:23 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/04/27 17:21:23 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:23 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:23 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:23 INFO mapred.LocalJobRunner: 
12/04/27 17:21:23 INFO mapred.MapTask: Starting flush of map output
12/04/27 17:21:23 INFO mapred.MapTask: Spilling map output
12/04/27 17:21:23 INFO mapred.MapTask: bufstart = 0; bufend = 109200; bufvoid = 104857600
12/04/27 17:21:23 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26206840(104827360); length = 7557/6553600
12/04/27 17:21:24 INFO mapred.MapTask: Finished spill 0
12/04/27 17:21:24 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/04/27 17:21:24 INFO mapred.LocalJobRunner: Records R/W=518/1
12/04/27 17:21:24 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/04/27 17:21:24 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000000_0
12/04/27 17:21:24 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000001_0
12/04/27 17:21:24 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:24 INFO mapred.MapTask: numReduceTasks: 1
12/04/27 17:21:24 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/04/27 17:21:24 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/04/27 17:21:24 INFO mapred.MapTask: soft limit at 83886080
12/04/27 17:21:24 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/04/27 17:21:24 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/04/27 17:21:24 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_map.py]
12/04/27 17:21:24 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:24 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:25 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:25 INFO mapreduce.Job:  map 100% reduce 0%
12/04/27 17:21:26 INFO streaming.PipeMapRed: Records R/W=516/1
12/04/27 17:21:26 INFO streaming.PipeMapRed: R/W/S=1000/801/0 in:1000=1000/1 [rec/s] out:801=801/1 [rec/s]
12/04/27 17:21:27 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/04/27 17:21:27 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:27 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:27 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:27 INFO mapred.LocalJobRunner: 
12/04/27 17:21:27 INFO mapred.MapTask: Starting flush of map output
12/04/27 17:21:27 INFO mapred.MapTask: Spilling map output
12/04/27 17:21:27 INFO mapred.MapTask: bufstart = 0; bufend = 105925; bufvoid = 104857600
12/04/27 17:21:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26206836(104827344); length = 7561/6553600
12/04/27 17:21:27 INFO mapred.MapTask: Finished spill 0
12/04/27 17:21:27 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/04/27 17:21:27 INFO mapred.LocalJobRunner: Records R/W=516/1
12/04/27 17:21:27 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/04/27 17:21:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000001_0
12/04/27 17:21:27 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000002_0
12/04/27 17:21:27 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:27 INFO mapred.MapTask: numReduceTasks: 1
12/04/27 17:21:28 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/04/27 17:21:28 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/04/27 17:21:28 INFO mapred.MapTask: soft limit at 83886080
12/04/27 17:21:28 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/04/27 17:21:28 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/04/27 17:21:28 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_map.py]
12/04/27 17:21:28 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:28 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:28 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:29 INFO streaming.PipeMapRed: Records R/W=557/1
12/04/27 17:21:29 INFO streaming.PipeMapRed: R/W/S=1000/505/0 in:1000=1000/1 [rec/s] out:505=505/1 [rec/s]
12/04/27 17:21:30 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/04/27 17:21:30 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:30 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:30 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:30 INFO mapred.LocalJobRunner: 
12/04/27 17:21:30 INFO mapred.MapTask: Starting flush of map output
12/04/27 17:21:30 INFO mapred.MapTask: Spilling map output
12/04/27 17:21:30 INFO mapred.MapTask: bufstart = 0; bufend = 103041; bufvoid = 104857600
12/04/27 17:21:30 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26206836(104827344); length = 7561/6553600
12/04/27 17:21:30 INFO mapred.MapTask: Finished spill 0
12/04/27 17:21:30 INFO mapred.Task: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
12/04/27 17:21:30 INFO mapred.LocalJobRunner: Records R/W=557/1
12/04/27 17:21:30 INFO mapred.Task: Task 'attempt_local_0001_m_000002_0' done.
12/04/27 17:21:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000002_0
12/04/27 17:21:30 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000003_0
12/04/27 17:21:30 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:30 INFO mapred.MapTask: numReduceTasks: 1
12/04/27 17:21:31 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/04/27 17:21:31 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/04/27 17:21:31 INFO mapred.MapTask: soft limit at 83886080
12/04/27 17:21:31 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/04/27 17:21:31 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/04/27 17:21:31 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_map.py]
12/04/27 17:21:31 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:31 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:31 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:32 INFO streaming.PipeMapRed: Records R/W=527/1
12/04/27 17:21:32 INFO streaming.PipeMapRed: R/W/S=1000/499/0 in:1000=1000/1 [rec/s] out:499=499/1 [rec/s]
12/04/27 17:21:33 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/04/27 17:21:33 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:33 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:33 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:33 INFO mapred.LocalJobRunner: 
12/04/27 17:21:33 INFO mapred.MapTask: Starting flush of map output
12/04/27 17:21:33 INFO mapred.MapTask: Spilling map output
12/04/27 17:21:33 INFO mapred.MapTask: bufstart = 0; bufend = 102427; bufvoid = 104857600
12/04/27 17:21:33 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26206836(104827344); length = 7561/6553600
12/04/27 17:21:33 INFO mapred.MapTask: Finished spill 0
12/04/27 17:21:33 INFO mapred.Task: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
12/04/27 17:21:33 INFO mapred.LocalJobRunner: Records R/W=527/1
12/04/27 17:21:33 INFO mapred.Task: Task 'attempt_local_0001_m_000003_0' done.
12/04/27 17:21:33 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000003_0
12/04/27 17:21:33 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000004_0
12/04/27 17:21:33 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:33 INFO mapred.MapTask: numReduceTasks: 1
12/04/27 17:21:33 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/04/27 17:21:33 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/04/27 17:21:33 INFO mapred.MapTask: soft limit at 83886080
12/04/27 17:21:33 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/04/27 17:21:33 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/04/27 17:21:33 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_map.py]
12/04/27 17:21:34 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:34 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:34 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:34 INFO streaming.PipeMapRed: Records R/W=559/1
12/04/27 17:21:35 INFO streaming.PipeMapRed: R/W/S=1000/515/0 in:1000=1000/1 [rec/s] out:515=515/1 [rec/s]
12/04/27 17:21:36 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/04/27 17:21:36 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:36 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:36 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:36 INFO mapred.LocalJobRunner: 
12/04/27 17:21:36 INFO mapred.MapTask: Starting flush of map output
12/04/27 17:21:36 INFO mapred.MapTask: Spilling map output
12/04/27 17:21:36 INFO mapred.MapTask: bufstart = 0; bufend = 99915; bufvoid = 104857600
12/04/27 17:21:36 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26206836(104827344); length = 7561/6553600
12/04/27 17:21:36 INFO mapreduce.Job:  map 66% reduce 0%
12/04/27 17:21:36 INFO mapred.MapTask: Finished spill 0
12/04/27 17:21:36 INFO mapred.Task: Task:attempt_local_0001_m_000004_0 is done. And is in the process of commiting
12/04/27 17:21:36 INFO mapred.LocalJobRunner: Records R/W=559/1
12/04/27 17:21:36 INFO mapred.Task: Task 'attempt_local_0001_m_000004_0' done.
12/04/27 17:21:36 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000004_0
12/04/27 17:21:36 INFO mapred.LocalJobRunner: Starting task: attempt_local_0001_m_000005_0
12/04/27 17:21:36 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:36 INFO mapred.MapTask: numReduceTasks: 1
12/04/27 17:21:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
12/04/27 17:21:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
12/04/27 17:21:36 INFO mapred.MapTask: soft limit at 83886080
12/04/27 17:21:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
12/04/27 17:21:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
12/04/27 17:21:36 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_map.py]
12/04/27 17:21:36 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:36 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:37 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:37 INFO mapreduce.Job:  map 100% reduce 0%
12/04/27 17:21:37 INFO streaming.PipeMapRed: Records R/W=552/1
12/04/27 17:21:38 INFO streaming.PipeMapRed: R/W/S=1000/504/0 in:1000=1000/1 [rec/s] out:504=504/1 [rec/s]
12/04/27 17:21:38 INFO input.MongoRecordReader: Cursor exhausted.
Done Mapping.
12/04/27 17:21:38 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:38 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:38 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:38 INFO mapred.LocalJobRunner: 
12/04/27 17:21:38 INFO mapred.MapTask: Starting flush of map output
12/04/27 17:21:38 INFO mapred.MapTask: Spilling map output
12/04/27 17:21:38 INFO mapred.MapTask: bufstart = 0; bufend = 90089; bufvoid = 104857600
12/04/27 17:21:38 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26207708(104830832); length = 6689/6553600
12/04/27 17:21:39 INFO mapred.MapTask: Finished spill 0
12/04/27 17:21:39 INFO mapred.Task: Task:attempt_local_0001_m_000005_0 is done. And is in the process of commiting
12/04/27 17:21:39 INFO mapred.LocalJobRunner: Records R/W=552/1
12/04/27 17:21:39 INFO mapred.Task: Task 'attempt_local_0001_m_000005_0' done.
12/04/27 17:21:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local_0001_m_000005_0
12/04/27 17:21:39 INFO mapred.LocalJobRunner: Map task executor complete.
12/04/27 17:21:39 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
12/04/27 17:21:39 INFO mapred.Merger: Merging 6 sorted segments
12/04/27 17:21:39 INFO mapred.Merger: Down to the last merge-pass, with 6 segments left of total size: 632791 bytes
12/04/27 17:21:39 INFO mapred.LocalJobRunner: 
12/04/27 17:21:39 INFO streaming.PipeMapRed: PipeMapRed exec [/Users/tkid/Projects/mongo-hadoop/target/./twit_reduce.py]
12/04/27 17:21:39 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:39 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:39 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
12/04/27 17:21:39 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
Processing Timezon None
Processing Timezon Abu Dhabi
Processing Timezon Adelaide
Processing Timezon Alaska
Processing Timezon Almaty
Processing Timezon Amsterdam
Processing Timezon Arizona
Processing Timezon Astana
Processing Timezon Athens
Processing Timezon Atlantic Time (Canada)
Processing Timezon Auckland
Processing Timezon Azores
Processing Timezon Baghdad
Processing Timezon Bangkok
Processing Timezon Beijing
Processing Timezon Belgrade
Processing Timezon Berlin
Processing Timezon Bern
Processing Timezon Bogota
Processing Timezon Brasilia
Processing Timezon Brisbane
Processing Timezon Brussels
Processing Timezon Bucharest
Processing Timezon Budapest
Processing Timezon Buenos Aires
Processing Timezon Cairo
Processing Timezon Canberra
Processing Timezon Cape Verde Is.
Processing Timezon Caracas
Processing Timezon Casablanca
Processing Timezon Central America
Processing Timezon Central Time (US & Canada)
Processing Timezon Chennai
Processing Timezon Chihuahua
Processing Timezon Copenhagen
Processing Timezon Dhaka
Processing Timezon Dublin
Processing Timezon Eastern Time (US & Canada)
Processing Timezon Edinburgh
Processing Timezon Ekaterinburg
Processing Timezon Fiji
Processing Timezon Georgetown
Processing Timezon Greenland
Processing Timezon Guadalajara
Processing Timezon Guam
Processing Timezon Hanoi
Processing Timezon Harare
Processing Timezon Hawaii
Processing Timezon Helsinki
Processing Timezon Hong Kong
Processing Timezon Indiana (East)
Processing Timezon International Date Line West
Processing Timezon Irkutsk
Processing Timezon Islamabad
Processing Timezon Istanbul
Processing Timezon Jakarta
Processing Timezon Jerusalem
Processing Timezon Kabul
Processing Timezon Karachi
Processing Timezon Kathmandu
Processing Timezon Kuala Lumpur
Processing Timezon Kuwait
Processing Timezon Kyiv
Processing Timezon La Paz
Processing Timezon Lima
Processing Timezon Lisbon
Processing Timezon Ljubljana
Processing Timezon London
Processing Timezon Madrid
Processing Timezon Mazatlan
Processing Timezon Melbourne
Processing Timezon Mexico City
Processing Timezon Mid-Atlantic
Processing Timezon Minsk
Processing Timezon Monterrey
Processing Timezon Moscow
Processing Timezon Mountain Time (US & Canada)
Processing Timezon Mumbai
Processing Timezon Muscat
Processing Timezon Nairobi
Processing Timezon New Caledonia
Processing Timezon New Delhi
Processing Timezon Novosibirsk
Processing Timezon Nuku'alofa
Processing Timezon Osaka
Processing Timezon Pacific Time (US & Canada)
12/04/27 17:21:40 INFO streaming.PipeMapRed: R/W/S=10000/0/0 in:10000=10000/1 [rec/s] out:0=0/1 [rec/s]
Processing Timezon Paris
Processing Timezon Perth
Processing Timezon Prague
Processing Timezon Pretoria
Processing Timezon Quito
Processing Timezon Riga
Processing Timezon Riyadh
Processing Timezon Rome
Processing Timezon Santiago
Processing Timezon Sapporo
Processing Timezon Sarajevo
Processing Timezon Seoul
Processing Timezon Singapore
Processing Timezon St. Petersburg
Processing Timezon Stockholm
Processing Timezon Sydney
Processing Timezon Taipei
Processing Timezon Tallinn
Processing Timezon Tashkent
Processing Timezon Tehran
Processing Timezon Tijuana
Processing Timezon Tokyo
Processing Timezon Ulaan Bataar
Processing Timezon Vienna
Processing Timezon Warsaw
Processing Timezon Wellington
Processing Timezon West Central Africa
Processing Timezon Yakutsk
Processing Timezon Zagreb
12/04/27 17:21:40 INFO streaming.PipeMapRed: MRErrorThread done
12/04/27 17:21:40 INFO streaming.PipeMapRed: Records R/W=11127/1
12/04/27 17:21:40 INFO io.BSONWritable: No Length Header available.java.io.EOFException
12/04/27 17:21:40 INFO streaming.PipeMapRed: mapRedFinished
12/04/27 17:21:40 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/04/27 17:21:40 INFO mapred.LocalJobRunner: Records R/W=11127/1 > reduce
12/04/27 17:21:40 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/04/27 17:21:40 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File file:/tmp/_temporary/0 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:315)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1249)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1289)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:540)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1249)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1289)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:262)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:302)
    at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:208)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:455)
12/04/27 17:21:41 INFO mapreduce.Job:  map 100% reduce 100%
12/04/27 17:21:41 INFO mapreduce.Job: Job job_local_0001 failed with state FAILED due to: NA
12/04/27 17:21:41 INFO mapreduce.Job: Counters: 29
    File System Counters
        FILE: Number of bytes read=669175
        FILE: Number of bytes written=3766514
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=0
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=0
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Map-Reduce Framework
        Map input records=11127
        Map output records=11127
        Map output bytes=610597
        Map output materialized bytes=632887
        Input split bytes=1306
        Combine input records=0
        Combine output records=0
        Reduce input groups=115
        Reduce shuffle bytes=0
        Reduce input records=11127
        Reduce output records=115
        Spilled Records=22254
        Shuffled Maps =0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=2262
        Total committed heap usage (bytes)=1720918016
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
12/04/27 17:21:41 ERROR streaming.StreamJob: Job not Successful!
MongoDB Streaming Command Failed!


-- 
robee

Brendan W. McAdams

Apr 27, 2012, 10:58:51 AM
to mongod...@googlegroups.com
The previous poster indicated that despite this error, the data was successfully written to his mongodb output collection. Can you verify this?

We are looking into the filesystem error separately, but since we do not use the filesystem at any time, we believe it to be a false error and the job to have succeeded.
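
A quick way to verify from Python (pymongo 2.x API; database and collection names taken from your command, adjust as needed):

from pymongo import Connection

coll = Connection('127.0.0.1')['test']['twit_reduction']
print coll.count()                    # should line up with "Reduce output records=115"
for doc in coll.find().sort('count', -1).limit(5):
    print doc                         # top time zones by count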

robee

Apr 27, 2012, 11:00:26 AM
to mongod...@googlegroups.com
Oh right, the data was successfully written to the mongodb output collection.

-- 
robee

Tyler Brock

May 10, 2012, 8:54:07 AM
to mongodb-user
I'm having the same problem... any updates on this issue?

-Tyler


Tyler Brock

May 10, 2012, 9:15:00 AM
to mongodb-user
So the issue is hadoop 0.23.1; if you downgrade to 0.23.0 it works just fine.

You can use this homebrew formula if you want:
https://github.com/TylerBrock/evil-formulas/blob/master/0.23.0/hadoop.rb

brew uninstall hadoop, then brew install
https://raw.github.com/TylerBrock/evil-formulas/master/0.23.0/hadoop.rb

Cheers,

Tyler

Jesse Sanford

Aug 23, 2012, 3:07:52 PM
to mongod...@googlegroups.com
I am seeing something similar when using hadoop 0.20.2 from CDH3:

2012-08-23 14:35:41,708 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
2012-08-23 14:35:42,163 INFO com.mongodb.hadoop.io.BSONWritable: No Length Header available.java.io.EOFException
2012-08-23 14:35:42,163 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2012-08-23 14:35:42,163 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2012-08-23 14:35:42,184 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-08-23 14:35:42,202 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
	at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:479)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)