Hi there! I'm using mongodb 2.0.7 with 2 shards, each one is configured as a replica set with 3 instances. I needed to reconfigure shard name identification, so I removed first shard(wait for the data draining, moved primaries, so on), changed its name and added it with new name.
Connection to the cluster is secured with keyFile and password. I was just about to remove the second shard to do the same changes with it, when I experienced following exceptions in mongos:
Sun Sep 30 15:23:34 [conn271] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_ init @ mongodb-sh1/<...> attempt: 0 Sun Sep 30 15:23:34 [mongosMain] connection accepted from 77.88.18.6:54797 #447 Sun Sep 30 15:23:34 [conn447] authenticate: { authenticate: 1, user: "rating", nonce: "7642e5d689939a9c", key: "2ee0b88e7572c5fac957830fa727075d" } Sun Sep 30 15:23:36 [conn230] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0 Sun Sep 30 15:23:40 [conn380] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity ParallelCursor::_init @ mongodb-sh1/<...< attempt: 0 Sun Sep 30 15:23:41 [mongosMain] connection accepted from 77.88.18.6:54904 #448 Sun Sep 30 15:23:41 [conn448] authenticate: { authenticate: 1, user: "rating", nonce: "83c1b541f74be683", key: "510ff1afd1d2cf9d41128135fdeccd43" } Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name or service not known Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name or service not known Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name or service not known Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name or service not known Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize cursor across all shards because : socket exception @ mongodb-sh1/<...> Sun Sep 30 15:23:41 [conn327] getaddrinfo("host1f.load.net") failed: Name or service not known Sun Sep 30 15:23:41 [conn327] warning: could not get last error from a shard host1g.load.net :: caused by :: socket exception Received signal 6 Backtrace: 0x528c54 0x7ff4a8b77af0 0x7ff4a8b77a75 0x7ff4a8b7b5c0 0x7ff4a8b70941 0x730908 0x712fa5 0x700860 0x72ad97 0x73c8d1 0x5aad20 0x7ff4a9ef89ca 0x7ff4a8c2a70d mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54] /lib/libc.so.6(+0x33af0)[0x7ff4a8b77af0] /lib/libc.so.6(gsignal+0x35)[0x7ff4a8b77a75] /lib/libc.so.6(abort+0x180)[0x7ff4a8b7b5c0] /lib/libc.so.6(__assert_fail+0xf1)[0x7ff4a8b70941] mongos(_ZN5mongo10ClientInfo12getLastErrorERKNS_7BSONObjERNS_14BSONObjBuild erEb+0x3898)[0x730908] mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONOb jBuilderEi+0x805)[0x712fa5] mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x4d0)[0x700860] mongos(_ZN5mongo7Request7processEi+0x157)[0x72ad97] mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21Abstract MessagingPortEPNS_9LastErrorE+0x71)[0x73c8d1] mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x260)[0x5aad20] /lib/libpthread.so.0(+0x69ca)[0x7ff4a9ef89ca] /lib/libc.so.6(clone+0x6d)[0x7ff4a8c2a70d] === CursorCache at shutdown - sharded: 1 passthrough: 2206
After I had restarted mongos, exceptions still occurred
***** SERVER RESTARTED *****
Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] Socket recv() timeout 95.108.235.171:27017 Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] SocketException: remote: 95.108.235.171:27017 error: 9001 socket exception [3] server [95.108.235.171:27017] Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] trying reconnect to host2f.load.net:27017 Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] reconnect host2f.load.net:27017 ok Mon Oct 1 14:25:42 [conn6] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:26:11 [conn148] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_ init @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:26:18 [conn41] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: vertis-rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:26:49 [conn150] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ClusteredCursor:: _checkCursor @ rset02/<...> attempt: 0
Mon Oct 1 14:26:49 [conn150] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity_attr ClusteredCursor:: _checkCursor @ rset02/<...> attempt: 1 Mon Oct 1 14:26:51 [WriteBackListener-host2f:27017] ChunkManager: time to load chunks for musicgroup.entity: 4ms sequenceNumber: 12 version: 416|1 Mon Oct 1 14:26:51 [conn150] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity_attr ParallelCursor::_ init @ <...> attempt: 2 Mon Oct 1 14:27:14 [conn150] SyncClusterConnection connecting to [conf1g:27018] Mon Oct 1 14:27:14 [conn150] SyncClusterConnection connecting to [conf1d:27018] Mon Oct 1 14:27:14 [conn150] SyncClusterConnection connecting to [conf1f:27018] Mon Oct 1 14:27:16 [conn142] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:27:19 [conn86] ns: rating.entity could not initialize cursor across all shards because : stale config detected for ns: rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:27:24 [conn93] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_i nit @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:27:46 [ReplicaSetMonitorWatcher] Socket recv() timeout 192.168.1.101:27017 Mon Oct 1 14:27:46 [ReplicaSetMonitorWatcher] SocketException: remote: 192.168.1.101:27017 error: 9001 socket exception [3] server [192.168.1.101:27017] Mon Oct 1 14:27:46 [ReplicaSetMonitorWatcher] DBClientCursor::init call() failed Mon Oct 1 14:27:50 [conn149] ns: musicgroup.entity could not initialize cursor across all shards because : stale config detected for ns: musicgroup.entity ParallelCursor::_ init @ mongodb-sh1/<...> attempt: 0 Mon Oct 1 14:27:56 [ReplicaSetMonitorWatcher] trying reconnect to host2d.load.net:27017 Mon Oct 1 14:27:56 [ReplicaSetMonitorWatcher] reconnect host2d.load.net:27017 ok
At this moment I realized, that my loglevel is 0. So I changed it to 5 to see, what's happening. So, I performed db.runCommand("{setParameter:1,logLevel:5}") And right after this command, this exception disappeared. Even sharded queries are processed properly:
> Hi there!
> I'm using mongodb 2.0.7 with 2 shards, each one is configured as a replica > set with 3 instances. > I needed to reconfigure shard name identification, so I removed first > shard(wait for the data draining, moved primaries, so on), changed its name > and added it with new name.
> Connection to the cluster is secured with keyFile and password.
> I was just about to remove the second shard to do the same changes with > it, when I experienced following exceptions in mongos:
> Sun Sep 30 15:23:34 [conn271] ns: musicgroup.entity could not initialize > cursor across all shards because : stale config detected for ns: > musicgroup.entity ParallelCursor::_
> init @ mongodb-sh1/<...> attempt: 0
> Sun Sep 30 15:23:34 [mongosMain] connection accepted from 77.88.18.6:54797#447
> Sun Sep 30 15:23:34 [conn447] authenticate: { authenticate: 1, user: > "rating", nonce: "7642e5d689939a9c", key: > "2ee0b88e7572c5fac957830fa727075d" }
> Sun Sep 30 15:23:36 [conn230] ns: rating.entity could not initialize > cursor across all shards because : stale config detected for ns: > rating.entity ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
> Sun Sep 30 15:23:40 [conn380] ns: rating.entity could not initialize > cursor across all shards because : stale config detected for ns: > rating.entity ParallelCursor::_init @ mongodb-sh1/<...< attempt: 0
> Sun Sep 30 15:23:41 [mongosMain] connection accepted from 77.88.18.6:54904#448
> Sun Sep 30 15:23:41 [conn448] authenticate: { authenticate: 1, user: > "rating", nonce: "83c1b541f74be683", key: > "510ff1afd1d2cf9d41128135fdeccd43" }
> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name > or service not known
> Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize > cursor across all shards because : socket exception @ mongodb-sh1/<...>
> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name > or service not known
> Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize > cursor across all shards because : socket exception @ mongodb-sh1/<...>
> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name > or service not known
> Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize > cursor across all shards because : socket exception @ mongodb-sh1/<...>
> Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name > or service not known
> Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize > cursor across all shards because : socket exception @ mongodb-sh1/<...>
> Sun Sep 30 15:23:41 [conn327] getaddrinfo("host1f.load.net") failed: Name > or service not known
> Sun Sep 30 15:23:41 [conn327] warning: could not get last error from a > shard host1g.load.net :: caused by :: socket exception
> Received signal 6
> Backtrace: 0x528c54 0x7ff4a8b77af0 0x7ff4a8b77a75 0x7ff4a8b7b5c0 > 0x7ff4a8b70941 0x730908 0x712fa5 0x700860 0x72ad97 0x73c8d1 0x5aad20 > 0x7ff4a9ef89ca 0x7ff4a8c2a70d > mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54]
> /lib/libc.so.6(+0x33af0)[0x7ff4a8b77af0]
> /lib/libc.so.6(gsignal+0x35)[0x7ff4a8b77a75]
> /lib/libc.so.6(abort+0x180)[0x7ff4a8b7b5c0]
> /lib/libc.so.6(__assert_fail+0xf1)[0x7ff4a8b70941]
> After I had restarted mongos, exceptions still occurred
> ***** SERVER RESTARTED *****
> Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] Socket recv() timeout
> 95.108.235.171:27017
> Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] SocketException: remote: > 95.108.235.171:27017 error: 9001 socket exception [3] server [
> 95.108.235.171:27017] > Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] DBClientCursor::init call() > failed
> Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] trying reconnect to > host2f.load.net:27017
> Mon Oct 1 14:25:02 [ReplicaSetMonitorWatcher] reconnect > host2f.load.net:27017 ok
> Mon Oct 1 14:25:42 [conn6] ns: rating.entity could not initialize cursor > across all shards because : stale config detected for ns: rating.entity > ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
> Mon Oct 1 14:26:11 [conn148] ns: musicgroup.entity could not initialize > cursor across all shards because : stale config detected for ns: > musicgroup.entity ParallelCursor::_
> init @ mongodb-sh1/<...> attempt: 0
> Mon Oct 1 14:26:18 [conn41] ns: rating.entity could not initialize cursor > across all shards because : stale config detected for ns: > vertis-rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> > attempt: 0
> Mon Oct 1 14:26:49 [conn150] ns: musicgroup.entity could not initialize > cursor across all shards because : stale config detected for ns: > musicgroup.entity ClusteredCursor::
> _checkCursor @ rset02/<...> attempt: 0
> Mon Oct 1 14:26:49 [conn150] ns: musicgroup.entity could not initialize > cursor across all shards because : stale config detected for ns: > musicgroup.entity_attr ClusteredCursor::
> _checkCursor @ rset02/<...> attempt: 1
> Mon Oct 1 14:26:51 [WriteBackListener-host2f:27017] ChunkManager: time to > load chunks for musicgroup.entity: 4ms sequenceNumber: 12 version: 416|1
> Mon Oct 1 14:26:51 [conn150] ns: musicgroup.entity could not initialize > cursor across all shards because : stale config detected for ns: > musicgroup.entity_attr ParallelCursor::_
There was an exception thrown in the log indicating a failure to resolve host1d.load.net - just wanted to use that as a starting point.
Wed Oct 3 18:28:32 [conn273] creating new connection to:
host1d.load.net:27017
Wed Oct 3 18:28:32 [conn273] getaddrinfo("host1d.load.net") failed: Name or service not known
Wed Oct 3 18:28:32 [conn273] warning: could not get last error from a shard host1d.load.net:27017 :: caused by :: socket exception
Wed Oct 3 18:28:32 [conn393] Request::process ns: chords.entity msg id:169529 attempt: 0
Wed Oct 3 18:28:32 [conn393] CursorCache::get id: 4966267460045049647
well, found a similar problem with stale config (https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/7Y9R...)
So, the only question now is why mongos crashed twice.
I'm pretty sure, that all mongod instances were up when mongod fails with exception.
And every time it was new unresolved host(1f and 1g the first time, 1d the second).
These instances are located in different physical machines.
четверг, 4 октября 2012 г., 18:46:29 UTC+4 пользователь Brian McNamara написал:
> There was an exception thrown in the log indicating a failure to resolve > host1d.load.net - just wanted to use that as a starting point.
> Wed Oct 3 18:28:32 [conn273] creating new connection to:
> host1d.load.net:27017
> Wed Oct 3 18:28:32 [conn273] getaddrinfo("host1d.load.net") failed: Name > or service not known
> Wed Oct 3 18:28:32 [conn273] warning: could not get last error from a > shard host1d.load.net:27017 :: caused by :: socket exception
> Wed Oct 3 18:28:32 [conn393] Request::process ns: chords.entity msg > id:169529 attempt: 0
> Wed Oct 3 18:28:32 [conn393] CursorCache::get id: 4966267460045049647
> Regards,
> Brian
> On Thursday, October 4, 2012 7:15:17 AM UTC-4, Anton Volokhov wrote:
>> Obviously, yes. It's resolvable, pingable and sshable all the time.
>> среда, 3 октября 2012 г., 20:02:16 UTC+4 пользователь Brian McNamara >> написал:
>>> Hi Anton,
>>> It looks like there's an issue communicating with one of the nodes (
>>> host1d.load.net:27017). Can you resolve the hostname?
It is possible that an old configuration across the config servers caused the mongos to experience a problem with routing requests. If the issue comes up in the future, it might be advisable to take the route that was taken in http://stackoverflow.com/questions/8563600/mongos-stale-config-error.
On Thursday, October 4, 2012 11:02:00 AM UTC-4, Anton Volokhov wrote:
> well, found a similar problem with stale config (
> https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/7Y9R... > )
> So, the only question now is why mongos crashed twice.
> I'm pretty sure, that all mongod instances were up when mongod fails with > exception.
> And every time it was new unresolved host(1f and 1g the first time, 1d the > second).
> These instances are located in different physical machines.
> четверг, 4 октября 2012 г., 18:46:29 UTC+4 пользователь Brian McNamara > написал:
>> Hi Anton,
>> There was an exception thrown in the log indicating a failure to resolve >> host1d.load.net - just wanted to use that as a starting point.
>> Wed Oct 3 18:28:32 [conn273] creating new connection to:
>> host1d.load.net:27017
>> Wed Oct 3 18:28:32 [conn273] getaddrinfo("host1d.load.net") failed: >> Name or service not known
>> Wed Oct 3 18:28:32 [conn273] warning: could not get last error from a >> shard host1d.load.net:27017 :: caused by :: socket exception
>> Wed Oct 3 18:28:32 [conn393] Request::process ns: chords.entity msg >> id:169529 attempt: 0
>> Wed Oct 3 18:28:32 [conn393] CursorCache::get id: 4966267460045049647
>> Regards,
>> Brian
>> On Thursday, October 4, 2012 7:15:17 AM UTC-4, Anton Volokhov wrote:
>>> Obviously, yes. It's resolvable, pingable and sshable all the time.
>>> среда, 3 октября 2012 г., 20:02:16 UTC+4 пользователь Brian McNamara >>> написал:
>>>> Hi Anton,
>>>> It looks like there's an issue communicating with one of the nodes (
>>>> host1d.load.net:27017). Can you resolve the hostname?