Hi there!
I'm using mongodb 2.0.7 with 2 shards, each one is configured as a replica set with 3 instances.
I needed to reconfigure shard name identification, so I removed first shard(wait for the data draining, moved primaries, so on), changed its name and added it with new name.
Connection to the cluster is secured with keyFile and password.
I was just about to remove the second shard to do the same changes with it, when I experienced following exceptions in mongos:
Sun Sep 30 15:23:34 [conn271] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_
init @ mongodb-sh1/<...> attempt: 0
Sun Sep 30 15:23:34 [mongosMain] connection accepted from
77.88.18.6:54797 #447
Sun Sep 30 15:23:34 [conn447] authenticate: { authenticate: 1, user: "rating", nonce: "7642e5d689939a9c", key: "2ee0b88e7572c5fac957830fa727075d" }
Sun Sep 30 15:23:36 [conn230] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Sun Sep 30 15:23:40 [conn380] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity ParallelCursor::_init @ mongodb-sh1/<...< attempt: 0
Sun Sep 30 15:23:41 [mongosMain] connection accepted from
77.88.18.6:54904 #448
Sun Sep 30 15:23:41 [conn448] authenticate: { authenticate: 1, user: "rating", nonce: "83c1b541f74be683", key: "510ff1afd1d2cf9d41128135fdeccd43" }
Sun Sep 30 15:23:41 [conn448] getaddrinfo("
host1f.load.net") failed: Name or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn448] getaddrinfo("
host1f.load.net") failed: Name or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn448] getaddrinfo("
host1f.load.net") failed: Name or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn448] getaddrinfo("
host1f.load.net") failed: Name or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn327] getaddrinfo("
host1f.load.net") failed: Name or service not known
Sun Sep 30 15:23:41 [conn327] warning: could not get last error from a shard
host1g.load.net :: caused by :: socket exception
Received signal 6
Backtrace: 0x528c54 0x7ff4a8b77af0 0x7ff4a8b77a75 0x7ff4a8b7b5c0 0x7ff4a8b70941 0x730908 0x712fa5 0x700860 0x72ad97 0x73c8d1 0x5aad20 0x7ff4a9ef89ca 0x7ff4a8c2a70d
mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54]
/lib/libc.so.6(+0x33af0)[0x7ff4a8b77af0]
/lib/libc.so.6(gsignal+0x35)[0x7ff4a8b77a75]
/lib/libc.so.6(abort+0x180)[0x7ff4a8b7b5c0]
/lib/libc.so.6(__assert_fail+0xf1)[0x7ff4a8b70941]
mongos(_ZN5mongo10ClientInfo12getLastErrorERKNS_7BSONObjERNS_14BSONObjBuilderEb+0x3898)[0x730908]
mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONObjBuilderEi+0x805)[0x712fa5]
mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x4d0)[0x700860]
mongos(_ZN5mongo7Request7processEi+0x157)[0x72ad97]
mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71)[0x73c8d1]
mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x260)[0x5aad20]
/lib/libpthread.so.0(+0x69ca)[0x7ff4a9ef89ca]
/lib/libc.so.6(clone+0x6d)[0x7ff4a8c2a70d]
===
CursorCache at shutdown - sharded: 1 passthrough: 2206
After I had restarted mongos, exceptions still occurred
***** SERVER RESTARTED *****
Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] Socket recv() timeout
95.108.235.171:27017Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] SocketException: remote:
95.108.235.171:27017 error: 9001 socket exception [3] server [
95.108.235.171:27017]
Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] trying reconnect to
host2f.load.net:27017Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] reconnect
host2f.load.net:27017 ok
Mon Oct 1 14:25:42 [conn6] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:26:11 [conn148] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_
init @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:26:18 [conn41] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: vertis-rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:26:49 [conn150] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ClusteredCursor::
_checkCursor @ rset02/<...> attempt: 0
Mon Oct 1 14:26:49 [conn150] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity_attr ClusteredCursor::
_checkCursor @ rset02/<...> attempt: 1
Mon Oct 1 14:26:51 [WriteBackListener-host2f:27017] ChunkManager: time to load chunks for musicgroup.entity: 4ms sequenceNumber: 12 version: 416|1
Mon Oct 1 14:26:51 [conn150] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity_attr ParallelCursor::_
init @ <...> attempt: 2
Mon Oct 1 14:27:14 [conn150] SyncClusterConnection connecting to [conf1g:27018]
Mon Oct 1 14:27:14 [conn150] SyncClusterConnection connecting to [conf1d:27018]
Mon Oct 1 14:27:14 [conn150] SyncClusterConnection connecting to [conf1f:27018]
Mon Oct 1 14:27:16 [conn142] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:27:19 [conn86] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:27:24 [conn93] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_i
nit @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:27:46 [ReplicaSetMonitorWatcher] Socket recv() timeout
192.168.1.101:27017Mon Oct 1 14:27:46 [ReplicaSetMonitorWatcher] SocketException: remote:
192.168.1.101:27017 error: 9001 socket exception [3] server [
192.168.1.101:27017]
Mon Oct 1 14:27:46 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed
Mon Oct 1 14:27:50 [conn149] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_
init @ mongodb-sh1/<...> attempt: 0
Mon Oct 1 14:27:56 [ReplicaSetMonitorWatcher] trying reconnect to
host2d.load.net:27017Mon Oct 1 14:27:56 [ReplicaSetMonitorWatcher] reconnect
host2d.load.net:27017 ok
At this moment I realized, that my loglevel is 0. So I changed it to 5 to see, what's happening.
So, I performed db.runCommand("{setParameter:1,logLevel:5}")
And right after this command, this exception disappeared. Even sharded queries are processed properly:
Mon Oct 1 14:28:20 [conn90] Request::process ns: musicgroup.entity msg id:2954917 attempt: 0
Mon Oct 1 14:28:20 [conn90] shard query: musicgroup.entity { query: { _id: { $in: <...IDS...> } }, $readPreference: { mode: "primary" } }
Mon Oct 1 14:28:20 [conn90] shard query servers: 2
server:mongodb-sh1/<...> _extra:{} _orderObject:{}
server:rset02/<...> _extra:{} _orderObject:{}
Mongos is still OK, but it seems to be a bug here.
Does anybody have some ideas?
Thanks!