mongodump sometimes fails when run from a cron script


Tal Liron

Jun 9, 2011, 5:03:49 PM
to mongod...@googlegroups.com
We're having problems backing up our MongoDB replica set using a script run from cron.

The script looks something like this:

#!/bin/bash
mongodump --host rs/db1,db2 --db mydb --collection mycollection1 --out /backup/
mongodump --host rs/db1,db2 --db mydb --collection mycollection2 --out /backup/
mongodump --host rs/db1,db2 --db mydb --collection mycollection3 --out /backup/
mongodump --host rs/db1,db2 --db mydb --collection mycollection4 --out /backup/
mongodump --host rs/db1,db2 --db mydb --collection mycollection5 --out /backup/
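A variation on the script above that may make the failures easier to pin down: loop over the collections and log each mongodump exit status, so a run that stops part-way leaves a record of which collection it was on and how that call ended. This is only a sketch; `dump_collections` is a hypothetical helper, and the host, database, and collection names are the placeholders from the script.

```bash
#!/bin/bash
# Hypothetical helper: dump each named collection separately and log the
# exit status of every mongodump call. Returns 1 if any dump failed.
dump_collections() {
    local host=$1 db=$2 out=$3
    shift 3
    local coll status failed=0
    for coll in "$@"; do
        mongodump --host "$host" --db "$db" --collection "$coll" --out "$out"
        status=$?
        # In bash, a status above 128 means the process was killed by signal (status - 128).
        echo "$(date '+%F %T') $db.$coll exit=$status" >&2
        if [ "$status" -ne 0 ]; then
            failed=1
        fi
    done
    return "$failed"
}

# Example invocation (placeholders from the script above):
# dump_collections rs/db1,db2 mydb /backup/ mycollection1 mycollection2 mycollection3
```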

Works fine when run from a terminal, but seems to stop in the middle *sometimes* when run from cron (on Ubuntu server). The strange thing is that it stops at different collections each time it runs.

I'm not entirely sure, but I have a feeling that this tends to happen when mongodump enters "showing progress mode" - when the dumping takes more than a few seconds, mongodump starts showing a percentage marker updated every second until it gets to 100%. I might be completely wrong, though.

Things we've tried:

1) Running with "script -c" to simulate a PTY (which cron doesn't have) - no effect
2) Using -vvvv in mongodump: the last output we see is "connected to rs/db1,db2", and after that the script dies
3) Connecting only to one db, not the whole rs - no effect
4) There is nothing relevant in /var/log/syslog
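Since the suspicion is a difference between the cron environment and a terminal, one quick check is to log what the script actually sees in each case. Cron normally starts jobs with no controlling terminal and a minimal PATH, so a sketch like the following (a hypothetical `describe_env` function) could be called at the top of the backup script and its output compared between a cron run and a terminal run.

```bash
#!/bin/bash
# Hypothetical diagnostic: describe the environment the script is running in,
# so a cron run can be compared with an interactive run.
describe_env() {
    local fd
    for fd in 0 1 2; do
        # [ -t FD ] is true when that file descriptor is attached to a terminal.
        if [ -t "$fd" ]; then
            echo "fd $fd: terminal"
        else
            echo "fd $fd: not a terminal"
        fi
    done
    echo "SHELL=${SHELL:-unset}"
    echo "PATH=${PATH:-unset}"
    echo "mongodump=$(command -v mongodump || echo 'not found')"
}

# Example (log path is a placeholder):
# describe_env >> /var/log/mongo-backup-env.log
```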

Help would be very much appreciated. We've spent a lot of time trying to debug this and find various workarounds. We simply cannot launch our product without reliable backups!

Eliot Horowitz

Jun 9, 2011, 9:10:39 PM
to mongod...@googlegroups.com
What does the Cron entry look like?  Are you piping stderr?
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.

Tal Liron

Jun 9, 2011, 11:11:12 PM
to mongodb-user
I am piping stderr and teeing it to a file, but not seeing any errors.
When it gets to the death point, the output simply stops.

(I added this piping only later, when I found out about these errors
and tried to debug. The piping is not part of the problem.)

There is nothing particularly interesting about the cron entry; it
simply runs the bash script as root. It does not change the
environment in any way or cause redirects.
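One caveat with piping into tee: a pipeline's exit status is that of its last command, so `mongodump ... 2>&1 | tee file` reports tee's status, and a failed dump can look successful. A minimal sketch that keeps the log but returns the command's own status, using bash's `pipefail` option; `run_logged` and the log path are hypothetical.

```bash
#!/bin/bash
# Hypothetical wrapper: run a command, append its stdout+stderr to a log file
# via tee, and return the command's own exit status rather than tee's.
run_logged() {
    local log=$1
    shift
    # The subshell keeps pipefail from leaking into the rest of the script.
    (
        set -o pipefail
        "$@" 2>&1 | tee -a "$log"
    )
}

# Example:
# run_logged /var/log/mongo-backup.log mongodump --db mydb --out /backup/
```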

This StackOverflow post may be related:

http://stackoverflow.com/questions/4885581/mongodump-only-dumps-few-collections-when-run-from-script-compete-database-if-ru

The poster claims that it worked OK on CentOS, so it must be something
particular to the Debian/Ubuntu crond setup.

I still suspect a PTY issue. Consider gpg's --no-tty switch, which
is designed specifically for running it in such terminal-less
environments.


Eliot Horowitz

Jun 11, 2011, 9:16:30 AM
to mongod...@googlegroups.com
Can you try a different shell?

Cutter Brown

Jun 29, 2011, 12:36:09 PM
to mongodb-user
I'm experiencing the same issue. Was there any resolution to this?
I've straced mongodump and it appears to be waiting on the network for
something. This is the end of the strace when it hangs:

write(1, "DATABASE: admin\t to \t/mnt/tmp/mo"..., 48DATABASE: admin to /mnt/tmp/mongobackup/admin
) = 48
stat("/mnt/tmp/mongobackup/admin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/mnt/tmp/mongobackup/admin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
sendto(4, "9\0\0\0\337[]c\377\377\377\377\324\7\0\0\24\0\0\0admin.system"..., 57, MSG_NOSIGNAL, NULL, 0) = 57
recvfrom(4,
*hangs*

This is the normal output:

write(1, "DATABASE: admin\t to \t/mnt/tmp/mo"..., 48DATABASE: admin to /mnt/tmp/mongobackup/admin
) = 48
stat("/mnt/tmp/mongobackup/admin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/mnt/tmp/mongobackup/admin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
sendto(4, "9\0\0\0\374\212u|\377\377\377\377\324\7\0\0\24\0\0\0admin.system"..., 57, MSG_NOSIGNAL, NULL, 0) = 57
recvfrom(4, "$\0\0\0", 4, MSG_NOSIGNAL, NULL, NULL) = 4
recvfrom(4, "od\246\n\374\212u|\1\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 32, MSG_NOSIGNAL, NULL, NULL) = 32
close(4) = 0
open("", O_WRONLY|O_CREAT|O_TRUNC, 0666) = -1 ENOENT (No such file or directory)
munmap(0x7facf428c000, 4206592) = 0
exit_group(0)

Thanks!

Cutter


Tal Liron

Jun 29, 2011, 2:25:58 PM
to mongod...@googlegroups.com
Not only is there no resolution, but I've wasted countless hours on this.

Conclusions:

1. mongodump is essentially broken. It's also inaccurately documented:
somewhere between the wiki and the man page and help I saw one version
claiming that I could connect to a replica set via the --host argument,
but that's wrong. mongodump does not support reading from a replica set,
despite the fact that it accepts a replica set argument. It will read
from one node (the first) and one node only, and if that node is in a
bad state ("rollback") it can hang for minutes. So, I created a special
little script to find the current secondary node and dump from there,
but encountered the same issues: mongodump just dies with no error.

2. I tried mongoexport instead. And then encountered this bug:

https://jira.mongodb.org/browse/SERVER-2157


So, mongoexport is broken, too.


3. Eliot gives a good suggestion in the bug description "I would
recommend writing your own json exportor." So, that's the direction
we're going now. MongoDB's CLI tools are just not ready for production.
(Version 1.8.2.) To others I would say: at this point, deploying MongoDB
will likely involve a lot of homegrown solutions.

4. This story has been repeating itself throughout our MongoDB replica
set deployment experience. We have nodes stuck in "rollback" mode for
minutes or dying on us suddenly (we wrote a special script to test if
MongoDB is up and restart it if it's not). 10gen would do well to stop
the race to add more features, buckle down and stabilize their product.
(Not to mention get their documentation up to date and in better quality.)
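Items 1 and 4 mention two homegrown scripts that are not shown in the thread: one to find the current secondary before dumping, and one to check that MongoDB is up and restart it if not. A rough sketch of how each could be done with the mongo shell's `--eval` option, using `db.isMaster().secondary` and the `ping` command; the function names, hosts, and restart command are assumptions, not the actual scripts described above.

```bash
#!/bin/bash
# Hypothetical sketches of the two helper scripts mentioned above.
# Host names and the restart command are placeholders to adapt.

# Print the first listed host that reports itself as a replica-set secondary.
find_secondary() {
    local host answer
    for host in "$@"; do
        # --quiet suppresses the shell banner so only the printed value remains.
        answer=$(mongo --quiet --host "$host" --eval 'print(db.isMaster().secondary)' 2>/dev/null)
        if [ "$answer" = "true" ]; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Succeed if mongod on the given host answers the ping command with ok = 1.
mongod_alive() {
    [ "$(mongo --quiet --host "$1" --eval 'print(db.runCommand({ping: 1}).ok)' 2>/dev/null)" = "1" ]
}

# If mongod does not answer, log it and run the restart command given as
# the remaining arguments.
check_and_restart() {
    local host=$1
    shift
    if mongod_alive "$host"; then
        return 0
    fi
    echo "$(date '+%F %T') mongod on $host not responding; restarting" >&2
    "$@"
}

# Examples (the service name is an assumption about how mongod is installed):
# secondary=$(find_secondary db1 db2) && mongodump --host "$secondary" --db mydb --out /backup/
# check_and_restart localhost service mongodb restart
```

Per the thread, picking a secondary this way did not by itself stop mongodump from dying, so it is best read as a way to keep load and rollback states away from the dump, not as a fix.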

Cutter Brown

Jun 29, 2011, 5:08:37 PM
to mongodb-user
Thanks Tal, I'm currently working around it using 'timeout':

http://bashcurescancer.com/timeout-new-coreutils-command.html
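A minimal sketch of that workaround with coreutils `timeout`, which stops the command (with SIGTERM by default) after the given duration and exits with status 124 when it had to do so. The wrapper name `run_with_limit` and the 600-second limit are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical wrapper around coreutils timeout(1): give a command a time limit
# and report clearly when the limit, rather than the command itself, ended it.
run_with_limit() {
    local limit=$1 status
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        # timeout exits with 124 when it had to stop the command.
        echo "timed out after $limit: $*" >&2
    fi
    return "$status"
}

# Example: give each dump 10 minutes, then move on (or retry) instead of hanging.
# run_with_limit 600 mongodump --host db2 --db mydb --collection mycollection1 --out /backup/
```

Otherwise the command's own exit status is passed through unchanged, so a genuine mongodump error is still distinguishable from a hang.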



Eliot Horowitz

Jun 29, 2011, 5:16:48 PM
to mongod...@googlegroups.com
Can you send the output and try with -v?
Also, can you connect to the database and run show dbs?

Eliot Horowitz

Jun 29, 2011, 5:18:09 PM
to mongod...@googlegroups.com
Sorry you've been having issues, but there are a lot of people using
mongodump in the same way you are.
There is something going on in your case that is odd, especially since it
works from a shell.
Did you ever try a different shell?

Re replica set nodes dying, is there any log message?
If not, is there anything in /var/log/messages killing processes?
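For the last question, a quick way to look for kernel kills of mongod (for example by the OOM killer) is to grep the kernel log. A sketch; the default path follows the mention of /var/log/messages, though on Ubuntu kernel messages usually end up in /var/log/syslog or /var/log/kern.log, and the exact wording of OOM messages varies between kernel versions, so the patterns are assumptions.

```bash
#!/bin/bash
# Hypothetical check: print kernel log lines that mention the OOM killer or a
# killed process, restricted to lines that mention mongo/mongod.
find_mongod_kills() {
    local log=${1:-/var/log/messages}
    grep -iE 'out of memory|killed process' "$log" | grep -i mongo
}

# Example:
# find_mongod_kills /var/log/syslog
```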

Tal Liron

Jun 29, 2011, 5:28:10 PM
to mongod...@googlegroups.com

Sorry for not updating you earlier --


I did not try a different shell, because I found out that the issue was not just in cron, but also running the script plainly (my theory about tty was only partially true; even after I got around that problem mongodump would still fail on occasion).


It could definitely be an issue specific to our setup. In particular, we found that dumping a map-reduce-produced collection is especially prone to making mongodump fail. So I explicitly dump the "regular" collections, calling mongodump once for each of them (it would be nice if we could specify a list of collections on the command line!). Still, some of these "regular" collections cause mongodump to fail with no error output.
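Rather than hard-coding the list of regular collections, the script could ask the server for them and skip the ones that should not be dumped individually. A sketch using the mongo shell's `db.getCollectionNames()`; the helper name is hypothetical, and skipping names that start with `tmp.mr.` assumes that is how temporary map-reduce collections are named on the deployment in question.

```bash
#!/bin/bash
# Hypothetical helper: list a database's collections through the mongo shell,
# skipping system.* collections and temporary map-reduce output (tmp.mr.*).
list_dump_targets() {
    local db=$1
    mongo --quiet --eval 'db.getCollectionNames().forEach(function (c) { print(c); })' "$db" |
        grep -v -e '^system\.' -e '^tmp\.mr\.'
}

# Example: one mongodump call per remaining collection.
# list_dump_targets mydb | while read -r c; do
#     mongodump --db mydb --collection "$c" --out /backup/
# done
```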


Perhaps another approach to increase stability could be to have a dedicated read-only node on the replica set, and do all mongodumps from it after taking it out of the replica set, rejoining it when we're done. But again this is quite elaborate (and possibly expensive) to set up, and would require quite a bit of custom scripting.


I'd love to help you debug this more, but quite honestly we spent far too much time on this approach.


Your suggestion to roll our own export tool is not a bad one: I'm working on a really nice multi-threaded tool similar to mongoexport, which I will make publicly available soon.

Cutter Brown

Jun 29, 2011, 5:32:50 PM
to mongodb-user


On Jun 29, 2:16 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Can you send the output and try with -v?

Sure, it appears to be hanging in a couple of different places:

# ./backup_mongo.sh
snapshotting the db and creating archive
MongoDB shell version: 1.8.1
connecting to: admin
> db.runCommand({fsync:1,lock:1});
{
"info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
"ok" : 1
}
> bye
> bye
Wed Jun 29 21:25:28 creating new connection to:127.0.0.1
Wed Jun 29 21:25:28 BackgroundJob starting:
connected to: 127.0.0.1
all dbs
*hangs*

# ./backup_mongo.sh
snapshotting the db and creating archive
MongoDB shell version: 1.8.1
connecting to: admin
> db.runCommand({fsync:1,lock:1});
{
"info" : "now locked against writes, use db.$cmd.sys.unlock.findOne() to unlock",
"ok" : 1
}
> bye
> bye
Wed Jun 29 21:29:20 creating new connection to:127.0.0.1
Wed Jun 29 21:29:20 BackgroundJob starting:
connected to: 127.0.0.1
all dbs
DATABASE: anchor_production to /mnt/tmp/mongobackup/anchor_production
anchor_production.listings to /mnt/tmp/mongobackup/anchor_production/listings.bson
425 objects
anchor_production.system.indexes to /mnt/tmp/mongobackup/anchor_production/system.indexes.bson
1 objects
DATABASE: lagunitas_production to /mnt/tmp/mongobackup/lagunitas_production
lagunitas_production.notifications to /mnt/tmp/mongobackup/lagunitas_production/notifications.bson
764 objects
lagunitas_production.system.indexes to /mnt/tmp/mongobackup/lagunitas_production/system.indexes.bson
2 objects
lagunitas_production.activities to /mnt/tmp/mongobackup/lagunitas_production/activities.bson
7824 objects
DATABASE: admin to /mnt/tmp/mongobackup/admin
*hangs*

It hangs about 1 out of every 10 to 15 times. Every other time is
fine.

> Also, can you connect to the database and run show dbs?

# echo "show dbs" | mongo admin
MongoDB shell version: 1.8.1
connecting to: admin
> show dbs
admin (empty)
anchor_production 0.203125GB
lagunitas_production 0.203125GB
local 0.203125GB
> bye

Eliot Horowitz

Jun 29, 2011, 5:34:54 PM
to mongod...@googlegroups.com
When it hangs, can you run db.currentOp() against the db?
Are you doing this on a primary or secondary?

Cutter Brown

Jun 29, 2011, 5:47:57 PM
to mongodb-user


On Jun 29, 2:34 pm, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> When it hangs, can you run db.currentOp() against the db?
> Are you doing this on a primary or secondary?

It's a slave:

# echo "db.currentOp() " | mongo admin
MongoDB shell version: 1.8.1
connecting to: admin
> db.currentOp()
{
"inprog" : [
{
"opid" : 10329,
"active" : true,
"lockType" : "write",
"waitingForLock" : true,
"secs_running" : 1928,
"op" : "none",
"ns" : "local.sources",
"query" : {

},
"client" : "(NONE)",
"desc" : "replslave"
},
{
"opid" : 17844,
"active" : true,
"lockType" : "read",
"waitingForLock" : true,
"secs_running" : 7,
"op" : "query",
"ns" : "?",
"query" : {
"listDatabases" : 1
},
"client" : "127.0.0.1:33103",
"desc" : "conn"
}
],
"fsyncLock" : 1,
"info" : "use db.$cmd.sys.unlock.findOne() to terminate the fsync write/snapshot lock"
}

Eliot Horowitz

Jun 29, 2011, 5:50:17 PM
to mongod...@googlegroups.com
Sorry - hopefully last question - is this a replica set or master/slave?

Cutter Brown

Jun 29, 2011, 6:07:08 PM
to mongodb-user
master/slave (this is the slave)

Eliot Horowitz

Jun 29, 2011, 6:30:46 PM
to mongod...@googlegroups.com
Can you try doing the mongodump without fsync + lock?
I think there might be an issue with that causing a deadlock.

Cutter Brown

Jun 29, 2011, 6:38:25 PM
to mongodb-user
Yeah, I can't reproduce it without the fsync + lock.

Cutter Brown

Jun 29, 2011, 11:15:34 PM
to mongodb-user
I'm assuming that, since it's documented everywhere to use fsync + lock
before mongodump, I should continue with my workaround (timeout)
for now, correct?

Thanks!

cb


Eliot Horowitz

Jun 29, 2011, 11:18:08 PM
to mongod...@googlegroups.com
fsync + lock is only really needed if you're copying the data files.
If you're using mongodump, it's safe to do without it.

You aren't guaranteed a fully self-consistent data set, though.
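Following that point, the lock only matters for file-level copies. If that route is taken, a sketch that guarantees the unlock runs even when the copy fails could use a shell trap. The paths and function names are hypothetical; the lock and unlock calls are the ones shown earlier in this thread (`db.runCommand({fsync:1,lock:1})` and `db.$cmd.sys.unlock.findOne()`). Given the deadlock seen above on a slave, this is best kept away from concurrent mongodump runs.

```bash
#!/bin/bash
# Hypothetical file-level snapshot: fsync + lock, copy the data directory, and
# always release the lock afterwards. Plain mongodump runs do not need this.

unlock_mongod() {
    # Single quotes stop the shell from expanding $cmd.
    mongo --quiet --eval 'printjson(db.$cmd.sys.unlock.findOne())' admin
}

snapshot_datafiles() {
    local src=$1 dest=$2
    (
        # Flush to disk and block writes; give up if the lock could not be taken.
        mongo --quiet --eval 'printjson(db.runCommand({fsync: 1, lock: 1}))' admin || exit 1
        # Run the unlock when this subshell exits, whether or not cp succeeds.
        trap unlock_mongod EXIT
        cp -a "$src" "$dest"
    )
}

# Example (paths are placeholders):
# snapshot_datafiles /var/lib/mongodb "/mnt/backup/mongodb-$(date +%F)"
```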

Tal Liron

Jul 1, 2011, 12:26:53 AM
to mongod...@googlegroups.com

I've released my "alternative backup" tool as part of the Savory Framework. The list announcement was here:


http://groups.google.com/group/mongodb-user/browse_thread/thread/2e1f3ad485543c49


A description of the backup service specifically:


http://threecrickets.com/savory/about/service/backup/

Tal Liron

Jul 7, 2011, 1:29:44 PM
to mongod...@googlegroups.com
One more update to this issue:


It seems that when running mongodump through a bash script you may want
to add a "wait" command, like so:


#!/bin/bash

# do things here
mongodump
wait
# do things here


Seems to have fixed things on one of our deployments, but I can't be
sure because we did a LOT of things there. I can't confirm this
entirely, and I'm also a bit skeptical that this is the issue. Why would
mongodump be running as a background process? It can explain, though,
why in some cases mongodump was cut off abruptly with no error code.
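For what it's worth on the `wait` question: in bash, `wait` with no arguments only waits for jobs that the same shell started in the background (with `&`), and returns at once if there are none. A plain `mongodump` line runs in the foreground and has already exited by the time the next line runs, so `wait` would only change things if something in the script backgrounds the dump. Where that is intended, waiting on the specific PID also recovers the job's exit status. A sketch with a hypothetical helper name:

```bash
#!/bin/bash
# Hypothetical example: start a command in the background, then wait on its PID
# so the script both blocks until it finishes and gets its real exit status.
run_in_background_and_wait() {
    local pid
    "$@" &
    pid=$!
    # wait PID returns the exit status of that specific background job.
    wait "$pid"
}

# Example:
# run_in_background_and_wait mongodump --db mydb --out /backup/
```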


-Tal

Eliot Horowitz

Jul 11, 2011, 1:37:06 AM
to mongod...@googlegroups.com
That doesn't make too much sense...

Does it work from the shell 100% of the time?

Christopher Cote

Jul 11, 2011, 11:23:43 AM
to mongodb-user
It is working now. I think it had something to do with the replica set
setup, or... some long tmp collection names that were extremely
difficult to get rid of.

There were actually several issues with the deployment.

Either way it's working great now.