mongo.exe hangs in mapReduce() indefinitely


Guy Pitelko

May 9, 2013, 5:46:40 AM
to mongod...@googlegroups.com
Every time I use mongo.exe to perform a mapReduce the shell hangs in the db.nyCol.mapReduce() function even after the mapReduce operation has ended.

I can see in the mongo logs and the resulting table that the map-reduce has ended, but the shell remains stuck in the mapReduce() call. This happens every time for a specific mapReduce operation.

Any ideas why?

Asya Kamsky

May 12, 2013, 1:25:47 AM
to mongod...@googlegroups.com
If it only happens for a specific mapreduce operation but not for others, maybe you can tell us how this one differs from the others that don't hang?

Guy Pitelko

May 12, 2013, 1:40:13 AM
to mongod...@googlegroups.com
The only thing special is that it takes a long time. Pretty much every MR that takes more than an hour will leave the mongo.exe hanging in the mapReduce() call even after the server has completed the op.

Asya Kamsky

May 13, 2013, 1:47:54 PM
to mongod...@googlegroups.com
This is on Windows?

And mongo shell is running in a cmd window?  Are you sure it's not something weird like clicking in the window (if you do a select/mark for select then the terminal will freeze until you hit enter)...

If it's not something like that, can you check in process monitor and see what mongo.exe process is doing if anything?

Asya

Guy Pitelko

May 13, 2013, 1:54:50 PM
to mongod...@googlegroups.com
mongod is running on Windows Server 2012 (Azure VM), and the mongo.exe client is running on Windows 7.
mongo.exe is running via a cmd window (using a batch file to pass arguments). It's not cmd selection/clicking, we've checked that. Looking at the process monitor doesn't reveal anything abnormal.
We're also using Database Master 5 to query mongodb and it too gets stuck this way sometimes on long operations (like mapReduce or building an index). In this case I can't really tell whose fault it is, but this makes me believe that the problem might be on the server side.

Asya Kamsky

May 13, 2013, 2:25:45 PM
to mongod...@googlegroups.com
I know logging can be voluminous when set higher than default, but I would say you might want to set server logging to more verbose (1 or 2) and see what's happening on the server when the client hangs like this.   Also output of db.currentOp() would be helpful if you can get it.
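
For reference, a minimal mongo shell sketch of those two checks, assuming a 2.4-era server where log verbosity can be raised at runtime through the logLevel server parameter. The helper name longRunningOps and the one-hour threshold are made up for illustration, and the shell-only part is guarded so the helper can also be tried outside the shell.

```javascript
// Hypothetical helper: from db.currentOp() output, keep the active
// operations that have been running for at least minSecs seconds.
function longRunningOps(currentOpResult, minSecs) {
  return (currentOpResult.inprog || []).filter(function (op) {
    return op.active === true && op.secs_running >= minSecs;
  });
}

// This part only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
  // Raise server log verbosity at runtime (0 is the default level).
  db.adminCommand({ setParameter: 1, logLevel: 2 });

  // Print every operation that has been running for an hour or more.
  longRunningOps(db.currentOp(), 3600).forEach(function (op) {
    printjson(op);
  });
}
```

If nothing is printed while the client is still stuck, the server most likely considers the operation finished and the problem is somewhere between the server and the client.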

Guy Pitelko

May 13, 2013, 4:46:24 PM
to mongod...@googlegroups.com
OK, so I've disconnected all clients from the db, reset the server with verbose logging (vv=true), and ran a mapReduce.
The same problem occurred again. The mapReduce operation has ended, as can be seen both in the logs and in the resulting table, but the mongo.exe running the mapReduce has not returned from the call to mapReduce.
db.currentOp() returns an empty array after completion.

Attached:
- a log file for this run.
- the mapReduce script (never reaching the "End.." print)
- the execution batch.

Both the client and the server mongo instances are 2.4.3.
logfile.zip
batch and script.zip

Guy Pitelko

May 13, 2013, 6:23:36 PM
to mongod...@googlegroups.com
One more thing, this is the output of the mongo shell:

MongoDB shell version: 2.4.3
Mon May 13 23:08:18.714 versionArrayTest passed
connecting to:=======================
Mon May 13 23:08:18.765 creating new connection to:================
Mon May 13 23:08:18.884 BackgroundJob starting: ConnectBG
Mon May 13 23:08:19.043 connected connection!
Start ......................................
[.............................................................. Two hours of silence ..................................................................]
Tue May 14 01:08:29.799 Socket recv() errno:10060 A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ===============
Tue May 14 01:08:29.799 SocketException: remote:============== error: 9001
socket exception [1] server [================]
Tue May 14 01:08:29.799 DBClientCursor::init call() failed
Tue May 14 01:08:29.800 JavaScript execution failed: Error: error doing query: failed at src/mongo/shell/query.js:L78
failed to load: script.js
Tue May 14 01:08:29.803 freeing 1 uncollected class mongo::DBClientWithCommands
objects
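
A quick bit of arithmetic on the two timestamps in this output may be useful. The sketch below (plain JavaScript, runnable under Node.js) assumes the shell's socket relies on the Windows TCP keep-alive defaults: a first probe after 2 hours of idle time, then 10 probes 1 second apart on Vista and later. That is an assumption about this setup, not something the log confirms, and the constant names are invented for the example.

```javascript
// Timestamps copied from the shell output above; the year comes from the post date.
// Date.UTC is used only to keep daylight-saving rules out of the subtraction.
const connectedAt = Date.UTC(2013, 4, 13, 23, 8, 19, 43); // Mon May 13 23:08:19.043
const failedAt = Date.UTC(2013, 4, 14, 1, 8, 29, 799);    // Tue May 14 01:08:29.799

// Assumed Windows keep-alive defaults (not taken from the thread).
const KEEPALIVE_IDLE_MS = 2 * 60 * 60 * 1000;
const KEEPALIVE_PROBES = 10;
const KEEPALIVE_PROBE_INTERVAL_MS = 1000;

const elapsedMs = failedAt - connectedAt;
const expectedMs = KEEPALIVE_IDLE_MS + KEEPALIVE_PROBES * KEEPALIVE_PROBE_INTERVAL_MS;

console.log(elapsedMs);              // 7210756
console.log(elapsedMs - expectedMs); // 756
```

If those assumptions hold, the socket error arrives almost exactly when an unanswered keep-alive sequence would give up. That would fit a connection that was silently dropped somewhere between the client and the server while the mapReduce was still running.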

Asya Kamsky

May 22, 2013, 1:32:30 PM
to mongod...@googlegroups.com
I haven't been able to reproduce this at all. I even added long sleeps inside my map and reduce functions, which are like yours, and I'm running this on Windows, but I just can't get it to hang. Eventually, when it's finished, the shell gets control back... My shell and mongod are on the same machine - is it possible that your TCP connection from mongod to the shell is somehow getting ... into a bad state? I'm a bit lost as to how we can proceed without being able to reproduce this.

Guy Pitelko

May 23, 2013, 11:19:34 AM
to mongod...@googlegroups.com
I ran the mapReduce jobs directly on the server and they indeed don't hang.
It seems to hang only when running remotely.
I'm not experiencing any other TCP problems except this.

I just recalled something that might help:
Our mongo server is a hosted Azure VM. 
We've had hanging problems in the C# driver, and I used these resources to solve them:

Specifically, setting these values when using the mongo C# driver:
            ServicePointManager.SetTcpKeepAlive(true, 30 * 1000, 30 * 1000);
            MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromSeconds(50);

            MongoDefaults.MaxConnectionPoolSize = 500;
            MongoDefaults.ConnectTimeout = TimeSpan.FromSeconds(60 * 5);
            MongoDefaults.SocketTimeout = TimeSpan.FromSeconds(60 * 5);

It's possible that the mongo.exe shell suffers from the same problem regarding TCP keep-alive timeouts, and doesn't handle such a scenario well.

It might also be worthwhile to change the defaults in the C# driver so that other apps (like DatabaseMaster) won't have the same problem (assuming this is the problem).

Can you reproduce this scenario?