MongoDB crash, wouldn't restart: disk error?

jamieorc

Feb 27, 2015, 3:17:54 PM
to mongod...@googlegroups.com
I'm trying to troubleshoot today's failure of a MongoDB 2.6.7 server (it logged "SEVERE: Invalid access at address: 0x7fb1eac500b8"). Was it MongoDB? Was it a disk failure? Something else? Any feedback appreciated.

The setup:

* MongoDB 2.6.7
* DB data on an encrypted SSD (LUKS)
* ext4 filesystem
* this disk has been in use for many months
* MongoDB was upgraded from 2.4.12 to 2.6.7 within the past week

Recovered by:

* attaching a new disk, then encrypting, formatting, and mounting it
* copying the data to the new disk
* remounting the new disk at MongoDB's expected mount point
* starting MongoDB

Some relevant MongoDB and syslog info:

From MongoDB:

2015-02-27T09:25:32.129+0000 [DataFileSync] msync errno:5 Input/output error
2015-02-27T09:25:32.129+0000 [DataFileSync] error syncing data to disk, probably a disk error
2015-02-27T09:25:32.129+0000 [DataFileSync]  shutting down immediately to avoid corruption
2015-02-27T09:25:32.338+0000 [DataFileSync] Fatal Assertion 17346
2015-02-27T09:25:32.969+0000 [conn69896] SEVERE: Invalid access at address: 0x7fb1eac500b8
2015-02-27T09:25:38.392+0000 [DataFileSync] 0x11fd1b1 0x119efa9 0x1181add 0x11a1d72 0x11a8739 0x11a30dc 0x11a38c3 0x777a95 0x1184c92 0x1241b49 0x7fc8307d7e9a 0x7fc82faeb2ed 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11fd1b1]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x119efa9]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x1181add]
 /usr/bin/mongod(_ZN5mongo21dataSyncFailedHandlerEv+0x132) [0x11a1d72]
 /usr/bin/mongod(_ZN5mongo14PosixFlushable5flushEv+0x439) [0x11a8739]
 /usr/bin/mongod(_ZN5mongo9MongoFile9_flushAllEb+0x1ac) [0x11a30dc]
 /usr/bin/mongod(_ZN5mongo9MongoFile8flushAllEb+0x23) [0x11a38c3]
 /usr/bin/mongod(_ZN5mongo12DataFileSync3runEv+0x125) [0x777a95]
 /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x1184c92]
 /usr/bin/mongod() [0x1241b49]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fc8307d7e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc82faeb2ed]
2015-02-27T09:25:38.392+0000 [DataFileSync] 

***aborting after fassert() failure


2015-02-27T09:25:38.397+0000 [DataFileSync] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x11fd1b1 0x11fc58e 0x7fc82fa2d150 0x7fc82fa2d0d5 0x7fc82fa3083b 0x1181b4a 0x11a1d72 0x11a8739 0x11a30dc 0x11a38c3 0x777a95 0x1184c92 0x1241b49 0x7fc8307d7e9a 0x7fc82faeb2ed 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11fd1b1]
 /usr/bin/mongod() [0x11fc58e]
 /lib/x86_64-linux-gnu/libc.so.6(+0x36150) [0x7fc82fa2d150]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7fc82fa2d0d5]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7fc82fa3083b]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x1181b4a]
 /usr/bin/mongod(_ZN5mongo21dataSyncFailedHandlerEv+0x132) [0x11a1d72]
 /usr/bin/mongod(_ZN5mongo14PosixFlushable5flushEv+0x439) [0x11a8739]
 /usr/bin/mongod(_ZN5mongo9MongoFile9_flushAllEb+0x1ac) [0x11a30dc]
 /usr/bin/mongod(_ZN5mongo9MongoFile8flushAllEb+0x23) [0x11a38c3]
 /usr/bin/mongod(_ZN5mongo12DataFileSync3runEv+0x125) [0x777a95]
 /usr/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0xd2) [0x1184c92]
 /usr/bin/mongod() [0x1241b49]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7fc8307d7e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fc82faeb2ed]
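
The "msync errno:5" in the first log line can be decoded with Python's standard library. This is a minimal sketch; the helper name `describe_errno` is made up for illustration and is not part of MongoDB. On Linux, errno 5 is EIO, which suggests the kernel reported the write failure to mongod rather than mongod finding a problem in its own data.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return the symbolic name and the OS message for an errno number."""
    # errno.errorcode maps numbers to names such as 'EIO'.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for that number.
    return f"{name} ({os.strerror(code)})"


# errno 5 is the value logged by [DataFileSync] above.
print(describe_errno(5))
```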

Syslog info:

Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642787] end_request: I/O error, dev xvdb, sector 553226992
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642834] end_request: I/O error, dev xvdb, sector 553227088
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642848] end_request: I/O error, dev xvdb, sector 553227056
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642863] end_request: I/O error, dev xvdb, sector 310773216
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642881] end_request: I/O error, dev xvdb, sector 310773224
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642893] end_request: I/O error, dev xvdb, sector 310773232
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642935] Aborting journal on device dm-0-8.
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642973] end_request: I/O error, dev xvdb, sector 192942336
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642989] Buffer I/O error on device dm-0, logical block 24117280
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642997] lost page write due to I/O error on dm-0
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643011] end_request: I/O error, dev xvdb, sector 168371200
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643031] Buffer I/O error on device dm-0, logical block 21045888
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643042] Buffer I/O error on device dm-0, logical block 21045889
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643050] Buffer I/O error on device dm-0, logical block 21045890
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643058] Buffer I/O error on device dm-0, logical block 21045891
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643065] Buffer I/O error on device dm-0, logical block 21045892
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643072] Buffer I/O error on device dm-0, logical block 21045893
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643080] Buffer I/O error on device dm-0, logical block 21045894
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643088] Buffer I/O error on device dm-0, logical block 21045895
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643096] Buffer I/O error on device dm-0, logical block 21045896
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643103] Buffer I/O error on device dm-0, logical block 21045897
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643111] Buffer I/O error on device dm-0, logical block 21045898
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643121] EXT4-fs warning (device dm-0): ext4_end_bio:250: I/O error -5 writing to inode 204 (offset 1990721536 size 49152 starting block 21045888)
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643130] end_request: I/O error, dev xvdb, sector 168371288

(more of the same)
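
The kernel lines above can be summarized per device with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the sample is four lines copied from the syslog excerpt, and the filesystem block size is assumed to be 4096 bytes (8 sectors of 512 bytes), which is the usual ext4 default but is not confirmed anywhere in the logs.

```python
import re
from collections import Counter

# Four lines copied from the syslog excerpt above.
SAMPLE = """\
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642973] end_request: I/O error, dev xvdb, sector 192942336
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.642989] Buffer I/O error on device dm-0, logical block 24117280
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643011] end_request: I/O error, dev xvdb, sector 168371200
Feb 27 09:25:26 iad-mongodb-big kernel: [13119487.643031] Buffer I/O error on device dm-0, logical block 21045888
"""

END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BUFFER_ERR = re.compile(r"Buffer I/O error on device ([\w-]+), logical block (\d+)")


def count_errors(log_text: str) -> Counter:
    """Count I/O error lines per device name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = END_REQUEST.search(line) or BUFFER_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def sector_offset(sector: int, block: int, sectors_per_block: int = 8) -> int:
    """Distance in sectors between a raw-device sector and a mapped block.

    sectors_per_block=8 assumes 4096-byte blocks and 512-byte sectors.
    """
    return sector - block * sectors_per_block


print(count_errors(SAMPLE))
print(sector_offset(192942336, 24117280))
print(sector_offset(168371200, 21045888))
```

Under those assumptions both pairs give the same offset of 4096 sectors (2 MiB), so the dm-0 errors line up with the failing sectors on xvdb. A fixed offset like that is consistent with a LUKS header sitting in front of the encrypted payload, though that is an inference from the numbers rather than something the logs state.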

Thanks,

Jamie

jamieorc

Feb 27, 2015, 3:42:13 PM
to mongod...@googlegroups.com
Rackspace replied. It was their problem. Would have been nice to get a notice at the time:

"This morning at approximately 3:30 am CST multiple SATA and SSD storage nodes lost connection to Cloud Servers due to a network change made during backbone maintenance CHG0052454. Backbone engineers were able to restore the connectivity at approximately 5:00 am CST. Cloud Block Storage engineers are in the process of assessing the full customer impact and targeted customer notifications will follow. At this time we are encouraging any customers with CBS volumes in IAD to please check the status of their volumes as they could potentially be in read-only mode due to the service interruption. If your volume is in read-only mode, please unmount the volume, run a filesystem check (linux: fsck, windows: chkdsk), and then remount the volume in read-write mode."
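
Checking for the read-only state mentioned in that notice can be scripted by reading the mount table. The sketch below parses text in the format of Linux's /proc/mounts; the function name, the device name /dev/mapper/mongo-data, and the mount point /var/lib/mongodb are hypothetical placeholders, not values from this server.

```python
def read_only_mounts(mounts_text: str) -> list:
    """Return mount points whose option list contains 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3]
        # Options are comma-separated; match 'ro' exactly, not as a substring.
        if "ro" in options.split(","):
            result.append(mount_point)
    return result


# Hypothetical sample in /proc/mounts format.
SAMPLE = """\
/dev/xvda1 / ext4 rw,relatime,data=ordered 0 0
/dev/mapper/mongo-data /var/lib/mongodb ext4 ro,relatime,data=ordered 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a live Linux host the same function could be fed the contents of /proc/mounts instead of the sample string.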