mongos crash

Skipy

May 16, 2011, 8:15:12 AM

to mongodb-user

Hello

We're running MongoDB 1.8.1 in our production environment, with 8
shards, 1 config server, 1 mongos, and no replication. After we wrote
some data to all the shards (there were a few chunks per shard), the
mongos process crashed with the following errors:

Received signal 11
Backtrace: 0x52e235 0x2aaaab6c6040 0x2aaaaaf8e58c 0x621403 0x69afab 0x576ba6 0x5774b6 0x575630 0x575886 0x583914 0x586464 0x66c41a 0x6362c9 0x66432c 0x6761c7 0x57ea3c 0x69ec30 0x2aaaaacd33ba 0x2aaaab778fcd
bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
/lib/libc.so.6[0x2aaaab6c6040]
/usr/lib/libstdc++.so.6(_ZNSs6assignERKSs+0x1c)[0x2aaaaaf8e58c]
bin/mongos(_ZN5mongo5Shard5resetERKSs+0x93)[0x621403]
bin/mongos[0x69afab]
bin/mongos(_ZN5boost6detail8function17function_invoker4IPFbRN5mongo12DBClientBaseERKSsbiEbS5_S7_biE6invokeERNS1_15function_bufferES5_S7_bi+0x16)[0x576ba6]
bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1c6)[0x5774b6]
bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2d0)[0x575630]
bin/mongos(_ZN5mongo15ShardConnectionC1ERKSsS2_+0x76)[0x575886]
bin/mongos(_ZN5mongo15ClusteredCursor5queryERKSsiNS_7BSONObjEi+0x124)[0x583914]
bin/mongos(_ZN5mongo27SerialServerClusteredCursor4moreEv+0x134)[0x586464]
bin/mongos(_ZN5mongo19ShardedClientCursor13sendNextBatchERNS_7RequestEi+0x8a)[0x66c41a]
bin/mongos(_ZN5mongo13ShardStrategy7queryOpERNS_7RequestE+0xe39)[0x6362c9]
bin/mongos(_ZN5mongo7Request7processEi+0x29c)[0x66432c]
bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x77)[0x6761c7]
bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x34c)[0x57ea3c]
bin/mongos(thread_proxy+0x80)[0x69ec30]
/lib/libpthread.so.0[0x2aaaaacd33ba]
/lib/libc.so.6(clone+0x6d)[0x2aaaab778fcd]
===
Fri May 13 19:05:17 ERROR: couldn't unset sharding : std::bad_alloc
Fri May 13 19:05:17 ERROR: couldn't unset sharding : std::bad_alloc
Fri May 13 19:05:17 ERROR: couldn't unset sharding : std::bad_alloc

The mongos crashed at 19:05:17, right after the errors above.
However, mongos.log also contains a few earlier errors similar to the
following:

Fri May 13 09:53:40 [WriteBackListener] ~ScopedDBConnection: _conn != null
Fri May 13 09:53:40 [WriteBackListener] ERROR: error processing writeback: 10429 setShardVersion failed host[someip.compute-1.amazonaws.com:27022] { oldVersion: Timestamp 1633000|1, ns: "ubervu.mentions", newVersion: Timestamp 19000|1, globalVersion: Timestamp 19000|0, errmsg: "you already have a newer version of collection 'ubervu.mentions'", ok: 0.0 }
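
To make the version mismatch in that setShardVersion error easier to see, here is a minimal sketch that pulls the three version fields out of the log line and compares them. It assumes the two numbers in `Timestamp a|b` can be compared as an ordered pair (first number, then second), which matches the "you already have a newer version" message but is not confirmed anywhere in this thread. The names `VERSION_RE` and `parse_versions` are made up for illustration.

```python
import re

# The response document copied from the setShardVersion error in mongos.log.
LOG = (
    '{ oldVersion: Timestamp 1633000|1, ns: "ubervu.mentions", '
    'newVersion: Timestamp 19000|1, globalVersion: Timestamp 19000|0, '
    'errmsg: "you already have a newer version of collection '
    "'ubervu.mentions'\", ok: 0.0 }"
)

# Captures the field name and both numbers of each "<name>Version: Timestamp a|b".
VERSION_RE = re.compile(r"(\w+Version): Timestamp (\d+)\|(\d+)")


def parse_versions(text):
    """Map each version field to an (a, b) tuple of integers."""
    return {name: (int(a), int(b)) for name, a, b in VERSION_RE.findall(text)}


versions = parse_versions(LOG)

# Assumption: versions order like tuples, so (19000, 1) < (1633000, 1).
requested_is_older = versions["newVersion"] < versions["oldVersion"]

for name, value in versions.items():
    print(name, value)
print("requested version is older than the one already held:", requested_is_older)
```

Under that assumption, the version mongos tried to set (newVersion) is lower than the one the connection already had (oldVersion), which is what the errmsg complains about.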

Any ideas why this might have happened?

Thank you,
Mihnea @ uberVU

Greg Studer

May 16, 2011, 11:36:56 AM

to mongod...@googlegroups.com

Could be a memory issue - how much memory / swap space do you have on
the machine (I assume you're running on EC2), and are you running a
64-bit instance? Also, is this error reproducible, or have you only
seen it once?

Skipy

May 17, 2011, 4:45:14 AM

to mongodb-user

We're running on an Amazon EC2 m2.xlarge instance, which is 64-bit
and has 17.1 GB of memory.
We've only seen the error once, and so far we have not been able to
reproduce it.

Greg Studer

May 17, 2011, 2:58:38 PM

to mongod...@googlegroups.com

Tracked this down: it's caused by a thread synchronization issue, and
I'm trying to figure out the best fix. There shouldn't be any danger to
data, aside from the annoyance of having to restart the mongos process.

https://jira.mongodb.org/browse/SERVER-3114

Skipy

May 18, 2011, 8:39:41 AM

to mongodb-user

Glad to hear you're tracking it down. We'll watch the JIRA issue and
update as soon as it's fixed.

Got another of these errors in our production environment. As far as I
can tell they're similar:

Received signal 11
Backtrace: 0x52e235 0x2aaaab6c6040 0x6213b3 0x61539e 0x69ac16 0x576ba6 0x5774b6 0x575630 0x575886 0x583914 0x586464 0x66c41a 0x6362c9 0x66432c 0x6761c7 0x57ea3c 0x69ec30 0x2aaaaacd33ba 0x2aaaab778fcd
bin/mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x52e235]
/lib/libc.so.6[0x2aaaab6c6040]
bin/mongos(_ZN5mongo5Shard5resetERKSs+0x43)[0x6213b3]
bin/mongos(_ZN5mongo15setShardVersionERNS_12DBClientBaseERKSsNS_17ShardChunkVersionEbRNS_7BSONObjE+0x44e)[0x61539e]
bin/mongos[0x69ac16]
bin/mongos(_ZN5boost6detail8function17function_invoker4IPFbRN5mongo12DBClientBaseERKSsbiEbS5_S7_biE6invokeERNS1_15function_bufferES5_S7_bi+0x16)[0x576ba6]
bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1c6)[0x5774b6]
bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2d0)[0x575630]
bin/mongos(_ZN5mongo15ShardConnectionC1ERKSsS2_+0x76)[0x575886]
bin/mongos(_ZN5mongo15ClusteredCursor5queryERKSsiNS_7BSONObjEi+0x124)[0x583914]
bin/mongos(_ZN5mongo27SerialServerClusteredCursor4moreEv+0x134)[0x586464]
bin/mongos(_ZN5mongo19ShardedClientCursor13sendNextBatchERNS_7RequestEi+0x8a)[0x66c41a]
bin/mongos(_ZN5mongo13ShardStrategy7queryOpERNS_7RequestE+0xe39)[0x6362c9]
bin/mongos(_ZN5mongo7Request7processEi+0x29c)[0x66432c]
bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x77)[0x6761c7]
bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x34c)[0x57ea3c]
bin/mongos(thread_proxy+0x80)[0x69ec30]
/lib/libpthread.so.0[0x2aaaaacd33ba]
/lib/libc.so.6(clone+0x6d)[0x2aaaab778fcd]
===

Greg Studer

May 18, 2011, 10:23:30 AM

to mongod...@googlegroups.com

Yeah that's the same issue.
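
For anyone comparing their own crash against these, one rough way to check whether two backtraces point at the same problem is to compare the mangled symbol names in their frames, ignoring offsets and addresses. This is only a sketch: the frames are an abbreviated selection copied from the two traces in this thread, and `FRAME_RE` and `symbols` are names made up for illustration.

```python
import re

# Matches frames such as "bin/mongos(_ZN5mongo5Shard5resetERKSs+0x93)[0x621403]"
# as well as symbol-less frames such as "bin/mongos[0x69afab]".
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+0x[0-9a-f]+\))?"
    r"\[0x[0-9a-f]+\]$"
)


def symbols(trace):
    """Return the mangled symbol names found in a backtrace, in order."""
    found = []
    for line in trace.strip().splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("symbol"):
            found.append(match.group("symbol"))
    return found


# A few frames from the first crash (May 13).
FIRST = """
/usr/lib/libstdc++.so.6(_ZNSs6assignERKSs+0x1c)[0x2aaaaaf8e58c]
bin/mongos(_ZN5mongo5Shard5resetERKSs+0x93)[0x621403]
bin/mongos[0x69afab]
bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1c6)[0x5774b6]
bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2d0)[0x575630]
"""

# The corresponding frames from the second crash.
SECOND = """
bin/mongos(_ZN5mongo5Shard5resetERKSs+0x43)[0x6213b3]
bin/mongos(_ZN5mongo15setShardVersionERNS_12DBClientBaseERKSsNS_17ShardChunkVersionEbRNS_7BSONObjE+0x44e)[0x61539e]
bin/mongos[0x69ac16]
bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1c6)[0x5774b6]
bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2d0)[0x575630]
"""

second_symbols = set(symbols(SECOND))
shared = [name for name in symbols(FIRST) if name in second_symbols]
for name in shared:
    print(name)
```

Both traces go through `ClientConnections::checkVersions` and end up in `Shard::reset`, at different offsets. Matching symbols are a hint, not proof, that the root cause is the same.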
