Stopped DB, now fails on start. Repair crashes with 'exception: bad system.namespaces object'


Mike

Jan 7, 2011, 10:37:38 AM
to mongodb-user
Hi folks,

I stopped my web app and mongod with a TERM signal to prepare for an
upgrade. Mongod no longer starts, and fails with the following error:

Caught Assertion in runQuery ns:imeveryone.messages
massert:assertion db/pdfile.h:223

Repairing the database crashes with the following error:

[initandlisten] exception in initAndListen std::exception: bad
system.namespaces object { $err: "assertion db/pdfile.h:241" },
terminating

The issue seems similar to http://jira.mongodb.org/browse/SERVER-2010
which is already raised but currently not solved.

Using 64-bit OS X 10.6.5, though I can get the error on Ubuntu too.
mongodb/bin/mongod --version
db version v1.6.5, pdfile version 4.5
Fri Jan 7 15:35:29 git version:
0eb017e9b2828155a67c5612183337b89e12e291

Any thoughts on how to recover the data?

Mike

Eliot Horowitz

Jan 7, 2011, 11:14:10 AM
to mongod...@googlegroups.com
Are you sure you did a TERM and not KILL?

Is there more of a stack trace in the log?


Mike MacCana

Jan 7, 2011, 11:58:29 AM
to mongod...@googlegroups.com
It's indeed possible I did a KILL, if mongod didn't respond to the TERM. I can't find any recent mongod logs (there's a /var/log/mongod, but the one file in there hasn't been updated for some time). 

Is there any way I can swap the broken namespaces on the failing db with the working ones from my most recent backup?

Mike

Eliot Horowitz

Jan 8, 2011, 9:40:22 AM
to mongod...@googlegroups.com
Maybe, if you never changed collections/indexes, but a kill -9 can leave things unrecoverable.

Did you try 1.7.4 repair?  
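
One cautious way to try a repair with a different mongod build is to run it against a copy of the damaged data directory, so the original files stay available for further attempts. A minimal Python sketch, assuming only the `--repair` and `--dbpath` flags that appear in this thread; the function names and all paths are hypothetical:

```python
import shutil
import subprocess
from pathlib import Path


def copy_dbpath(dbpath, scratch):
    """Copy the whole data directory so the damaged original is never modified."""
    src, dst = Path(dbpath), Path(scratch)
    if dst.exists():
        raise FileExistsError(f"refusing to overwrite {dst}")
    shutil.copytree(src, dst)
    return dst


def repair_command(mongod, dbpath):
    """Build the repair invocation using only flags seen in this thread."""
    return [str(mongod), "--repair", "--dbpath", str(dbpath)]


def repair_copy(mongod, dbpath, scratch):
    """Run the repair against a copy and return mongod's exit status."""
    work = copy_dbpath(dbpath, scratch)
    return subprocess.run(repair_command(mongod, work)).returncode
```

For example, `repair_copy("mongodb/bin/mongod", "db-damaged", "db-work")` would leave `db-damaged` as it was, whatever the outcome of the repair.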

Eliot Horowitz

Jan 8, 2011, 9:42:53 AM
to mongod...@googlegroups.com
No slave, right?  Did you have --master on at all?


Mike MacCana

Jan 8, 2011, 11:12:11 AM
to mongod...@googlegroups.com
Thanks for the suggestion. 1.7.4 gives the same error (imeveryone.system.namespaces Assertion). Full results below.

$ mongodb/bin/mongod --repair --dbpath db-ohshit/
Sat Jan  8 16:10:23 MongoDB starting : pid=36930 port=27017 dbpath=db-ohshit/ 64-bit 

** NOTE: This is a development version (1.7.4) of MongoDB.
**       Not recommended for production.

Sat Jan  8 16:10:23 db version v1.7.4, pdfile version 4.5
Sat Jan  8 16:10:23 git version: fecf14def3bd51b91e0af55d61e9de2b9f918dcf
Sat Jan  8 16:10:23 sys info: Darwin erh2.10gen.cc 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386 BOOST_LIB_VERSION=1_40
Sat Jan  8 16:10:23 [initandlisten] ****
Sat Jan  8 16:10:23 [initandlisten] ****
Sat Jan  8 16:10:23 [initandlisten] need to upgrade database imeveryone with pdfile version 4.5, new version: 4.5
Sat Jan  8 16:10:23 [initandlisten]      starting upgrade
Sat Jan  8 16:10:23 [initandlisten]  imeveryone repairDatabase imeveryone
Sat Jan  8 16:10:23 [initandlisten]  imeveryone.system.namespaces Assertion failure magic == 0x41424344 db/pdfile.h 252
0x1000755f2 0x10008493e 0x1002e0f22 0x1002d12fd 0x1002d3b4b 0x1002ec269 0x1001847f6 0x1002ef4b4 0x1002efe04 0x1002f6f84 0x1002f90d1 0x1002f9fba 0x100170ca8 0x1002a9e98 0x1002b52af 0x1002b65ff 0x1000c420e 0x1000aeb14 0x10027c1a4 0x10027e4ce 
 0   mongod                              0x00000001000755f2 _ZN5mongo12sayDbContextEPKc + 178
 1   mongod                              0x000000010008493e _ZN5mongo8assertedEPKcS1_j + 286
 2   mongod                              0x00000001002e0f22 _ZN5mongo11DataFileMgr9getExtentERKNS_7DiskLocE + 114
 3   mongod                              0x00000001002d12fd _ZN5mongo11DataFileMgr7findAllEPKcRKNS_7DiskLocE + 77
 4   mongod                              0x00000001002d3b4b _ZN5mongo13findTableScanEPKcRKNS_7BSONObjERKNS_7DiskLocE + 139
 5   mongod                              0x00000001002ec269 _ZNK5mongo9QueryPlan9newCursorERKNS_7DiskLocEi + 1209
 6   mongod                              0x00000001001847f6 _ZN5mongo11UserQueryOp5_initEv + 342
 7   mongod                              0x00000001002ef4b4 _ZN5mongo12QueryPlanSet6Runner6initOpERNS_7QueryOpE + 324
 8   mongod                              0x00000001002efe04 _ZN5mongo12QueryPlanSet6Runner3runEv + 660
 9   mongod                              0x00000001002f6f84 _ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE + 868
 10  mongod                              0x00000001002f90d1 _ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE + 113
 11  mongod                              0x00000001002f9fba _ZN5mongo16MultiPlanScanner5runOpERNS_7QueryOpE + 26
 12  mongod                              0x0000000100170ca8 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 3256
 13  mongod                              0x00000001002a9e98 _ZN5mongo13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 568
 14  mongod                              0x00000001002b52af _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 5055
 15  mongod                              0x00000001002b65ff _ZN5mongo14DBDirectClient4callERNS_7MessageES2_b + 111
 16  mongod                              0x00000001000c420e _ZN5mongo14DBClientCursor4initEv + 206
 17  mongod                              0x00000001000aeb14 _ZN5mongo12DBClientBase5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii + 740
 18  mongod                              0x000000010027c1a4 _ZN5mongo6Cloner2goEPKcRSsRKSsbbbb + 820
 19  mongod                              0x000000010027e4ce _ZN5mongo9cloneFromEPKcRSsRKSsbbbb + 62
Sat Jan  8 16:10:23 [initandlisten] assertion 0 assertion db/pdfile.h:252 ns:imeveryone.system.namespaces query:{}
Sat Jan  8 16:10:23 [initandlisten] Assertion: 10290:bad system.namespaces object { $err: "assertion db/pdfile.h:252" }
0x100083acb 0x10027c525 0x10027e4ce 0x1002ccacb 0x100419456 0x10041ab2a 0x10041ceac 0x10041d764 0x10041f141 0x100001414 0x4 
 0   mongod                              0x0000000100083acb _ZN5mongo11msgassertedEiPKc + 315
 1   mongod                              0x000000010027c525 _ZN5mongo6Cloner2goEPKcRSsRKSsbbbb + 1717
 2   mongod                              0x000000010027e4ce _ZN5mongo9cloneFromEPKcRSsRKSsbbbb + 62
 3   mongod                              0x00000001002ccacb _ZN5mongo14repairDatabaseESsRSsbb + 987
 4   mongod                              0x0000000100419456 _ZN5mongo11doDBUpgradeERKSsSsPNS_14DataFileHeaderE + 118
 5   mongod                              0x000000010041ab2a _ZN5mongo30repairDatabasesAndCheckVersionEv + 1066
 6   mongod                              0x000000010041ceac _ZN5mongo14_initAndListenEiPKc + 1084
 7   mongod                              0x000000010041d764 _ZN5mongo13initAndListenEiPKc + 36
 8   mongod                              0x000000010041f141 main + 4577
 9   mongod                              0x0000000100001414 start + 52
 10  ???                                 0x0000000000000004 0x0 + 4
Sat Jan  8 16:10:23 [initandlisten] exception in initAndListen std::exception: bad system.namespaces object { $err: "assertion db/pdfile.h:252" }, terminating
Sat Jan  8 16:10:23 dbexit: 
Sat Jan  8 16:10:23 [initandlisten] shutdown: going to close listening sockets...
Sat Jan  8 16:10:23 [initandlisten] shutdown: going to flush oplog...
Sat Jan  8 16:10:23 [initandlisten] shutdown: going to close sockets...
Sat Jan  8 16:10:23 [initandlisten] shutdown: waiting for fs preallocator...
Sat Jan  8 16:10:23 [initandlisten] shutdown: closing all files...
Sat Jan  8 16:10:23 closeAllFiles() finished
Sat Jan  8 16:10:23 [initandlisten] shutdown: removing fs lock...
Sat Jan  8 16:10:23 dbexit: really exiting now

Mike
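
For readers following the trace: the first assertion (`magic == 0x41424344`, db/pdfile.h line 252) is raised from `DataFileMgr::getExtent` while scanning imeveryone.system.namespaces, meaning the location it was asked to read did not carry the expected marker value. A rough Python sketch of that kind of check, assuming the marker is stored as a little-endian unsigned 32-bit integer at a caller-supplied offset; the on-disk layout and the helper names are assumptions, not taken from the MongoDB source:

```python
import struct

EXTENT_MAGIC = 0x41424344  # value taken from the assertion message above


def magic_at(buf, offset):
    """Read a little-endian unsigned 32-bit integer at `offset` (assumed layout)."""
    (value,) = struct.unpack_from("<I", buf, offset)
    return value


def has_magic(buf, offset):
    """True if the four bytes at `offset` hold the expected marker."""
    return magic_at(buf, offset) == EXTENT_MAGIC
```

Under that assumption the marker appears on disk as the bytes `DCBA`, so a region that was zeroed or overwritten with other data would fail the check.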

Eliot Horowitz

Jan 8, 2011, 11:19:34 AM
to mongod...@googlegroups.com
And no slave or --master?
What backups do you have?
Do you have the last log from when it was running?

Mike MacCana

Jan 8, 2011, 11:23:11 AM
to mongod...@googlegroups.com
I have backups; that's what the site's running on ATM. The DB was running fine till I shut it down to take another one, which is when the corruption seems to have happened. No slave, as it's a single VM and that's what the backups were for till 1.8 comes out.

Alas I can't find any logs sent to syslog, and I don't have the app output anymore.

Eliot Horowitz

Jan 8, 2011, 11:25:21 AM
to mongod...@googlegroups.com
Where do the regular stdout logs go?
How were you starting mongod?

Mike MacCana

Jan 8, 2011, 11:34:06 AM
to mongod...@googlegroups.com
stdout went to a screen session. The buffer's gone past the point where it was stopped, but the output on start wasn't any different to the current output (that's how I ended up at the bug below).

Mongod was started by a call to Python's subprocess.Popen.
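
Since the thread turns on whether mongod received a TERM or a KILL, here is a minimal Python sketch of stopping a `subprocess.Popen` child with SIGTERM only and waiting for it to exit. The function name and the timeout value are hypothetical, and this is not the code that was actually used to run mongod here:

```python
import signal
import subprocess


def stop_gracefully(proc, timeout=60.0):
    """Send SIGTERM and wait; never escalate to SIGKILL automatically."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Still shutting down: report that and let the operator decide,
        # rather than following up with kill -9.
        return None
```

A return value of `None` means the process was still running after the timeout; the caller can wait longer instead of killing it.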

Eliot Horowitz

Jan 8, 2011, 11:36:19 AM
to mongod...@googlegroups.com
Was more looking for how it went down, whether it was a clean kill or a kill -9, etc.

On a kill -9 there are no guarantees about what you can or cannot recover.

How old was the last backup?

You can try using the .ns from the last backup with the new data files.
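
A minimal Python sketch of that swap, done in a scratch copy so both the damaged files and the backup stay untouched. It assumes the namespace file is named `<database>.ns` inside the data directory; the function name and directory arguments are hypothetical:

```python
import shutil
from pathlib import Path


def swap_ns(db_name, damaged_dir, backup_dir, work_dir):
    """Build work_dir from the damaged data files plus the backup's .ns file."""
    damaged, backup, work = Path(damaged_dir), Path(backup_dir), Path(work_dir)
    ns_name = f"{db_name}.ns"  # assumed naming: <database>.ns
    backup_ns = backup / ns_name
    if not backup_ns.is_file():
        raise FileNotFoundError(backup_ns)
    # copytree refuses to write into an existing directory, so an earlier
    # attempt is never silently overwritten.
    shutil.copytree(damaged, work)
    shutil.copy2(backup_ns, work / ns_name)
    return work / ns_name
```

The resulting `work_dir` can then be passed to `mongod --repair --dbpath`. As noted above, this can only help if collections and indexes did not change between the backup and the crash.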

Mike MacCana

Jan 9, 2011, 3:33:54 PM
to mongod...@googlegroups.com
Thanks for the suggestion (and your help in general). I tried using the old .ns file; --repair still crashes with bad system.namespaces object. Last backup was 19th Dec, and I have a lot of user-generated content from around Christmastime that's been lost. I'm currently re-getting some of this from the Google cache (or as much as has been indexed), but I'd love it if there were any other options to fix the existing file. So, is there anything left to try?

Thanks,

Mike

Eliot Horowitz

Jan 9, 2011, 3:36:26 PM
to mongod...@googlegroups.com
There might be. How many objects were in it?
Do you have any logs at all?
Can you upload the last backup?