Data lost silently after stopping and starting mongodb

chanon

Jan 21, 2011, 2:40:37 AM
to mongodb-user
After stopping mongodb, which had been running for about a month, and
then starting it again, we found that recent data had disappeared. It
was as if someone had rolled the data back to maybe a week or a month
ago (we're not sure).

The database is small since we are still in development, and we
automatically back up once a day using mongodump.

We are running mongodb 1.6.3 on Ubuntu.

Our mongodb process is managed using supervisord which uses TERM
signals by default.

On starting the server (using supervisord) it started normally,
without saying a repair is needed.

This is the log showing the part where it was stopped and started
again:

Thu Jan 20 10:41:49 [initandlisten] connection accepted from ......... #470
Thu Jan 20 11:12:39 [conn470] end connection .........
Thu Jan 20 12:32:40 [conn468] end connection .........
Fri Jan 21 03:54:19 [initandlisten] connection accepted from ......... #471
Fri Jan 21 05:21:48 MongoDB starting : pid=3620 port=xxxx dbpath=xxxx
Fri Jan 21 05:21:48 db version v1.6.3, pdfile version 4.5
Fri Jan 21 05:21:48 git version: 278bd2ac2f2efbee556f32c13c1b6803224d1c01
Fri Jan 21 05:21:48 sys info: Linux xxxxxxx #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Fri Jan 21 05:21:48 [initandlisten] waiting for connections on port xxxx
Fri Jan 21 05:21:48 [websvr] web admin interface listening on port xxxx
Fri Jan 21 05:22:01 [initandlisten] connection accepted from ............... #1

What I'd like to know is whether there is any known reason this might
have happened. We are still somewhat new to mongodb, so we may be doing
something wrong. From what I've read though, stopping mongodb using a
TERM signal should not cause data loss, especially when it doesn't ask
for a repair on startup.

We have restored using our daily backup and lost maybe about half a
day of work in it and it is development data so it wasn't critical
data. However, this made us a bit wary because if we go live and
something like this happens to user data, then it would be big
trouble.

So I would like to know more about best practices. I saw someone talk
about running with --master .. how does that help (if there are no
slaves)?

I guess next time before stopping mongodb we should do a mongodump and
fsync lock + copy backup - but just thinking that I have to do that
makes me realize how I've lost confidence in mongodb's durability.

Eliot Horowitz

Jan 21, 2011, 2:42:47 AM
to mongod...@googlegroups.com
Do you have the logs from when it was shut down?
What options are you running with?

TERM should be fine...

chanon

Jan 21, 2011, 3:10:59 AM
to mongodb-user
The log during the shutdown is posted in my original email.

mongod --config /path/to/mongo.conf

mongo.conf contents:
dbpath = /path/to/dbfolder
auth = true
port = xxxxx

So it is a very basic config.

A more detailed explanation of the data loss to give you a better
idea:

This db is used for static data for our social game. For example, our
social game has weapons, so we have a mongo collection for our weapon
data called Weapon. Inside the Weapon collection are multiple
documents each describing the attributes of a weapon. Yesterday we
were adding additional weapons to the database. Today we had to stop
the mongo server. When it was started again everything seemed fine,
until the developer who is adding the weapon data said that the
weapons he added yesterday were gone. The weapons that were inputted
before yesterday were present though.

Our last daily mongodump backup (which was yesterday noon) did have
the weapons he added up until the time of the backup.

So what happened is that some documents that had been added went
missing from the collection after the restart. And this happened
silently. The number of documents in this collection was pretty small,
no more than 100 items.

So of course our fear is what would happen if user data went missing
like this. And the scariest part is that it was silent - if we
hadn't noticed, we wouldn't have known.

Eliot Horowitz

Jan 21, 2011, 8:05:46 AM
to mongod...@googlegroups.com
I see.

So, here are 2 lines from the log:

Fri Jan 21 03:54:19 [initandlisten] connection accepted from ......... #471
Fri Jan 21 05:21:48 MongoDB starting : pid=3620 port=xxxx dbpath=xxxx

That's interesting for 3 reasons.
- That is not a clean shutdown. There is no shutdown message at all.
  You should see something like:
  Fri Jan 21 08:04:22 got kill or ctrl c or hup signal 15 (Terminated), will terminate after current cmd ends
  and then a variety of shutdown messages.
  Try just starting a local instance and then ctrl-c or SIGTERM it.
- There is a ~90 minute gap between the last message and the next
  startup message. Did it take that long for the machine to come back
  up? Or is something else going on?
- There should have been a lock file left over. Are you sure there
  isn't a script removing it?
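
The first two checks above (a startup with no shutdown message before it, and the time gap) can be automated when reading through a longer log. The following is only a rough sketch in Python, not a mongod tool: the function name `find_unclean_restarts` is made up for illustration, the matched strings are taken from the log lines quoted in this thread, and the year is an assumption because the log timestamps do not include one.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the log excerpt, e.g. "Fri Jan 21 05:21:48".
STAMP = re.compile(r"^[A-Z][a-z]{2} ([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) ")
STARTUP = "MongoDB starting"      # startup line as shown in the posted log
SHUTDOWN = "got kill or ctrl c"   # start of the shutdown message quoted above


def find_unclean_restarts(lines, year=2011):
    """Return (startup_time, gap_since_previous_line) for every startup that
    has no shutdown message between it and the previous startup."""
    suspicious = []
    last_time = None
    saw_shutdown = False
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip wrapped continuation lines such as "#471"
        # The log omits the year, so the caller supplies it (an assumption).
        when = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if STARTUP in line:
            if last_time is not None and not saw_shutdown:
                suspicious.append((when, when - last_time))
            saw_shutdown = False
        elif SHUTDOWN in line:
            saw_shutdown = True
        last_time = when
    return suspicious


sample = [
    "Fri Jan 21 03:54:19 [initandlisten] connection accepted from ......... #471",
    "Fri Jan 21 05:21:48 MongoDB starting : pid=3620 port=xxxx dbpath=xxxx",
]
for when, gap in find_unclean_restarts(sample):
    print(when, gap)  # prints: 2011-01-21 05:21:48 1:27:29
```

A startup that is the very first timestamped line in the file is not flagged, since there is nothing before it to compare against.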

chanon

Jan 21, 2011, 10:34:06 AM
to mongodb-user
Hi Eliot,

OK, so it looks like maybe we aren't shutting it down properly.
Another possibility is that maybe supervisord stopped redirecting the
logging before it sent the TERM signal.

(There isn't a delete lock file script.)

Big thanks for the help! I'll go back and see what supervisord is
doing.
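
One way to see what a process manager's stop sequence does to a server is to reproduce it by hand: send TERM, then wait a bounded time for the process to exit on its own. supervisord, for instance, escalates to SIGKILL if the process has not exited within `stopwaitsecs` (10 seconds by default), and a KILL gives the process no chance to shut down cleanly. Whether that happened here is not established in this thread. The Python sketch below assumes a POSIX system; `stop_with_term` and the stand-in child command are hypothetical names for illustration, not part of mongod or supervisord.

```python
import signal
import subprocess
import sys


def stop_with_term(proc, grace_seconds=10.0):
    """Send SIGTERM and wait up to grace_seconds for the process to exit.

    Returns the exit code if it exited in time, or None if it is still
    running (the point at which a process manager might send SIGKILL).
    """
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in child process; swap in the real server command to try it out.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    code = stop_with_term(child, grace_seconds=5.0)
    if code is None:
        print("still running after the grace period")
    else:
        print(f"exited with code {code}")
```

If the function returns None for the real server, the grace period is too short for a clean shutdown and is worth raising before trusting the stop command.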

Eliot Horowitz

Jan 21, 2011, 10:36:34 AM
to mongod...@googlegroups.com
Ok.
Either way, with durability in 1.8 mongo will handle these cases better.

chanon

Jan 21, 2011, 10:50:48 AM
to mongodb-user
I've now switched to using mongodb's native logging (logpath,
logappend options).
It looks like supervisord simply stopped redirecting the log output
prematurely, which means I don't have the actual logs from the
shutdown in this case, so it may still have been a case of actual
data loss.

Will see if it happens again, but I'll be a bit more cautious with
stopping mongodb now.

Yes, I'm very much looking forward to 1.8 .. when do you think it will
be ready?
