wiredTiger "Invariant failure" hard stop -- BWAP BOOM BLAM

EastGhostCom

Jun 27, 2015, 3:04:24 AM6/27/15
to mongod...@googlegroups.com
An index used by wiredTiger appears to be partly wrecked.

MongoDB 3.0.1 precompiled for Debian 7.1,
and the same on MongoDB 3.0.4 precompiled for Debian 7.1



Error from log:

2015-06-27T02:50:53.982-0400 E STORAGE  [conn10] WiredTiger (-31802) [1435387853:982291][4356:0x7fef12be6700], file:DntstsOrg/index/48-3746921816743884250.wt, session.open_cursor: DntstsOrg/index/48-3746921816743884250.wt read error: failed to read 4096 bytes at offset 32768: WT_ERROR: non-specific WiredTiger error

2015-06-27T02:50:53.982-0400 I -        [conn10] Invariant failure: ret resulted in status UnknownError -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 78

2015-06-27T02:50:53.999-0400 I CONTROL  [conn10]

followed by a backtrace and a hard crash of the primary



If I stop Mongo and empty the index folder, the error message is slightly different, but the end result (a hard crash) is the same.  If I remove the 48-3746... file, the end result is the same.


Is there any way to force wiredTiger / Mongo to either ignore the error or recreate the needed indexes from the data files?

This happened with the indexes separated from the data and stored on an SSD. There is no indication yet that the SSD was implicated, but it is curious.

EastGhostCom

Jun 27, 2015, 11:59:42 AM6/27/15
to mongod...@googlegroups.com

This appears to be somehow related to https://jira.mongodb.org/browse/SERVER-18316


or it could be that this bug still lingers: https://jira.mongodb.org/browse/SERVER-17451


1. I recovered by making BBB the new PRIMARY by raising the priority of BBB above that of XXX (on PRIMARY XXX: cfg = rs.config(); cfg.members[1].priority = 1000; rs.reconfig(cfg)); see the shell sketch after this list.

2. then stopping XXX (FORMER PRIMARY)

3. then deleting the entire data + indexes folders from XXX

4. then restarting XXX and allowing it to resynch all data from BBB

5. then stopping XXX and moving all the indexes (we store indexes on SSDs, which requires moving the collection/index folder and then soft linking to it)

6. then restarting XXX

7. then making XXX the new primary by reversing (1) above.
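
For reference, here is a rough mongo shell sketch of steps (1) and (7). The member array positions (members[1] for BBB, members[0] for XXX) and the priority values are assumptions carried over from the one-liner in step 1, so check them against your own rs.config() output before running anything.

// Step 1: on the current PRIMARY (XXX), promote BBB by giving it the highest priority.
cfg = rs.config()
cfg.members[1].priority = 1000   // assumed to be BBB
rs.reconfig(cfg)

// ...steps 2-6: stop XXX, wipe its dbpath, restart it, let it resync, relocate the indexes...

// Step 7: once XXX has caught up, reverse the change so XXX becomes PRIMARY again.
cfg = rs.config()
cfg.members[0].priority = 1000   // assumed to be XXX
cfg.members[1].priority = 1      // BBB back to the default priority
rs.reconfig(cfg)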


Seems convoluted as heck, but it was pretty fast and painless.  Mongo has generally been so excellent for us -- reliable, fast, everything -- so it's quite jarring to be reminded that it can be just as fragile as anything.

EastGhostCom

Jun 27, 2015, 12:01:53 PM6/27/15
to mongod...@googlegroups.com
Seems to me there needs to be a way to selectively wipe particular directories / databases from the server, and NOT require a total destroy and resynch of all data. 

Alexander Gorrod

Jul 9, 2015, 1:30:02 AM7/9/15
to mongod...@googlegroups.com
Hi,

I'm sorry that you ran into problems with MongoDB. I'd say that the best way to recover a replica set member that has data corruption is to re-sync the node from scratch, using a process such as the one you described above.

An alternative is to use the MongoDB repair command: http://docs.mongodb.org/manual/reference/command/repairDatabase/
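
For example, from the mongo shell (assuming the affected database is the DntstsOrg database named in your log, and that the mongod is able to start):

use DntstsOrg
db.repairDatabase()
// or, equivalently, via runCommand:
db.runCommand({ repairDatabase: 1 })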

I would never recommend that you manually remove files from the database directory - it is almost guaranteed to lead to corruption.