Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
mongos fails with "Name or service not known"
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  7 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Anton Volokhov  
View profile  
 More options Oct 3 2012, 5:53 am
From: Anton Volokhov <a.v.volok...@gmail.com>
Date: Wed, 3 Oct 2012 02:53:56 -0700 (PDT)
Local: Wed, Oct 3 2012 5:53 am
Subject: mongos fails with "Name or service not known"

Hi there!
I'm using mongodb 2.0.7 with 2 shards, each one is configured as a replica
set with 3 instances.
I needed to reconfigure shard name identification, so I removed first
shard(wait for the data draining, moved primaries, so on), changed its name
and added it with new name.

Connection to the cluster is secured with keyFile and password.
I was just about to remove the second shard to do the same changes with it,
when I experienced following exceptions in mongos:

Sun Sep 30 15:23:34 [conn271] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity ParallelCursor::_
init @ mongodb-sh1/<...> attempt: 0
Sun Sep 30 15:23:34 [mongosMain] connection accepted from 77.88.18.6:54797
#447
Sun Sep 30 15:23:34 [conn447]  authenticate: { authenticate: 1, user:
"rating", nonce: "7642e5d689939a9c", key:
"2ee0b88e7572c5fac957830fa727075d" }
Sun Sep 30 15:23:36 [conn230] ns: rating.entity could not initialize cursor
across all shards because : stale config detected for ns: rating.entity
ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Sun Sep 30 15:23:40 [conn380] ns: rating.entity could not initialize cursor
across all shards because : stale config detected for ns: rating.entity
ParallelCursor::_init @ mongodb-sh1/<...< attempt: 0
Sun Sep 30 15:23:41 [mongosMain] connection accepted from 77.88.18.6:54904
#448
Sun Sep 30 15:23:41 [conn448]  authenticate: { authenticate: 1, user:
"rating", nonce: "83c1b541f74be683", key:
"510ff1afd1d2cf9d41128135fdeccd43" }
Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name
or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize
cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name
or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize
cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name
or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize
cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn448] getaddrinfo("host1f.load.net") failed: Name
or service not known
Sun Sep 30 15:23:41 [conn448] DBException in process: could not initialize
cursor across all shards because : socket exception @ mongodb-sh1/<...>
Sun Sep 30 15:23:41 [conn327] getaddrinfo("host1f.load.net") failed: Name
or service not known
Sun Sep 30 15:23:41 [conn327] warning: could not get last error from a
shard host1g.load.net :: caused by :: socket exception
Received signal 6
Backtrace: 0x528c54 0x7ff4a8b77af0 0x7ff4a8b77a75 0x7ff4a8b7b5c0
0x7ff4a8b70941 0x730908 0x712fa5 0x700860 0x72ad97 0x73c8d1 0x5aad20
0x7ff4a9ef89ca 0x7ff4a8c2a70d
mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54]
/lib/libc.so.6(+0x33af0)[0x7ff4a8b77af0]
/lib/libc.so.6(gsignal+0x35)[0x7ff4a8b77a75]
/lib/libc.so.6(abort+0x180)[0x7ff4a8b7b5c0]
/lib/libc.so.6(__assert_fail+0xf1)[0x7ff4a8b70941]
mongos(_ZN5mongo10ClientInfo12getLastErrorERKNS_7BSONObjERNS_14BSONObjBuild erEb+0x3898)[0x730908]
mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONOb jBuilderEi+0x805)[0x712fa5]
mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x4d0)[0x700860]
mongos(_ZN5mongo7Request7processEi+0x157)[0x72ad97]
mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21Abstract MessagingPortEPNS_9LastErrorE+0x71)[0x73c8d1]
mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x260)[0x5aad20]
/lib/libpthread.so.0(+0x69ca)[0x7ff4a9ef89ca]
/lib/libc.so.6(clone+0x6d)[0x7ff4a8c2a70d]
===
 CursorCache at shutdown -  sharded: 1 passthrough: 2206

After I had restarted mongos, exceptions still occurred

***** SERVER RESTARTED *****

Mon Oct  1 14:25:02 [ReplicaSetMonitorWatcher] Socket recv() timeout  
95.108.235.171:27017
Mon Oct  1 14:25:02 [ReplicaSetMonitorWatcher] SocketException: remote:
95.108.235.171:27017 error: 9001 socket exception [3] server
[95.108.235.171:27017]
Mon Oct  1 14:25:02 [ReplicaSetMonitorWatcher] DBClientCursor::init call()
failed
Mon Oct  1 14:25:02 [ReplicaSetMonitorWatcher] trying reconnect to
host2f.load.net:27017
Mon Oct  1 14:25:02 [ReplicaSetMonitorWatcher] reconnect
host2f.load.net:27017 ok
Mon Oct  1 14:25:42 [conn6] ns: rating.entity could not initialize cursor
across all shards because : stale config detected for ns: rating.entity
ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct  1 14:26:11 [conn148] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity ParallelCursor::_
init @ mongodb-sh1/<...> attempt: 0
Mon Oct  1 14:26:18 [conn41] ns: rating.entity could not initialize cursor
across all shards because : stale config detected for ns:
vertis-rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...>
attempt: 0
Mon Oct  1 14:26:49 [conn150] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity ClusteredCursor::
_checkCursor @ rset02/<...> attempt: 0

Mon Oct  1 14:26:49 [conn150] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity_attr ClusteredCursor::
_checkCursor @ rset02/<...> attempt: 1
Mon Oct  1 14:26:51 [WriteBackListener-host2f:27017] ChunkManager: time to
load chunks for musicgroup.entity: 4ms sequenceNumber: 12 version: 416|1
Mon Oct  1 14:26:51 [conn150] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity_attr ParallelCursor::_
init @ <...> attempt: 2
Mon Oct  1 14:27:14 [conn150] SyncClusterConnection connecting to
[conf1g:27018]
Mon Oct  1 14:27:14 [conn150] SyncClusterConnection connecting to
[conf1d:27018]
Mon Oct  1 14:27:14 [conn150] SyncClusterConnection connecting to
[conf1f:27018]
Mon Oct  1 14:27:16 [conn142] ns: rating.entity could not initialize cursor
across all shards because : stale config detected for ns:
rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct  1 14:27:19 [conn86] ns: rating.entity could not initialize cursor
across all shards because : stale config detected for ns:
rating.entity_attr ParallelCursor::_init @ mongodb-sh1/<...> attempt: 0
Mon Oct  1 14:27:24 [conn93] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity ParallelCursor::_i
nit @ mongodb-sh1/<...> attempt: 0
Mon Oct  1 14:27:46 [ReplicaSetMonitorWatcher] Socket recv() timeout  
192.168.1.101:27017
Mon Oct  1 14:27:46 [ReplicaSetMonitorWatcher] SocketException: remote:
192.168.1.101:27017 error: 9001 socket exception [3] server
[192.168.1.101:27017]
Mon Oct  1 14:27:46 [ReplicaSetMonitorWatcher] DBClientCursor::init call()
failed
Mon Oct  1 14:27:50 [conn149] ns: musicgroup.entity could not initialize
cursor across all shards because : stale config detected for ns:
musicgroup.entity ParallelCursor::_
init @ mongodb-sh1/<...> attempt: 0
Mon Oct  1 14:27:56 [ReplicaSetMonitorWatcher] trying reconnect to
host2d.load.net:27017
Mon Oct  1 14:27:56 [ReplicaSetMonitorWatcher] reconnect
host2d.load.net:27017 ok

At this moment I realized, that my loglevel is 0. So I changed it to 5 to
see, what's happening.
So, I performed db.runCommand("{setParameter:1,logLevel:5}")
And right after this command, this exception disappeared. Even sharded
queries are processed properly:

Mon Oct  1 14:28:20 [conn90] Request::process ns: musicgroup.entity msg
id:2954917 attempt: 0
Mon Oct  1 14:28:20 [conn90] shard query: musicgroup.entity  { query: {
_id: { $in: <...IDS...> } }, $readPreference: { mode: "primary" } }
Mon Oct  1 14:28:20 [conn90]  shard query servers: 2
       server:mongodb-sh1/<...> _extra:{} _orderObject:{}
       server:rset02/<...> _extra:{} _orderObject:{}

Mongos is still OK, but it seems to be a bug here.
Does anybody have some ideas?
Thanks!


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Anton Volokhov  
View profile  
 More options Oct 3 2012, 10:54 am
From: Anton Volokhov <a.v.volok...@gmail.com>
Date: Wed, 3 Oct 2012 07:54:23 -0700 (PDT)
Local: Wed, Oct 3 2012 10:54 am
Subject: Re: mongos fails with "Name or service not known"

Expirienced one more mongos crush, with logLevel 5 now.

Wed Oct  3 18:28:32 [conn406] Request::process ns: chords.entity msg
id:169528 attempt: 0
Wed Oct  3 18:28:32 [conn406] CursorCache::get id: 1058686756268338652
Wed Oct  3 18:28:32 [conn302]  shard query servers: 2
       server:mongodb-sh1/<...> _extra:{} _orderObject:{}
       server:rset02/<...> _extra:{} _orderObject:{}

Wed Oct  3 18:28:32 [conn302] _check : mongodb-sh1/<...>
Wed Oct  3 18:28:32 [conn273] ReplicaSetMonitor::_checkConnection:
host1d.load.net:27017 { setName: "mongodb-sh1", ismaster: false, secondary:
true, hosts: [ "host1d.load.net:27017", "host1f.load.net:27017",
"host1g.load.net:27017" ], primary: "host1f.load.net:27017", me:
"host1d.load.net:27017", maxBsonObjectSize: 16777216, ok: 1.0 }
Wed Oct  3 18:28:32 [conn448] Request::process ns: admin.$cmd msg id:207996
attempt: 0
Wed Oct  3 18:28:32 [conn448] single query: admin.$cmd  { ismaster: 1 }  
ntoreturn: -1 options : 0
Wed Oct  3 18:28:32 [conn448] Request::process ns: admin.$cmd msg id:207997
attempt: 0
Wed Oct  3 18:28:32 [conn448] single query: admin.$cmd  { ismaster: 1 }  
ntoreturn: -1 options : 0
Wed Oct  3 18:28:32 [conn273] ReplicaSetMonitor::_checkConnection:
host1f.load.net:27017 { setName: "mongodb-sh1", ismaster: true, secondary:
false, hosts: [ "host1f.load.net:27017", "host1d.load.net:27017",
"host1g.load.net:27017" ], primary: "host1f.load.net:27017", me:
"host1f.load.net:27017", maxBsonObjectSize: 16777216, ok: 1.0 }
Wed Oct  3 18:28:32 [conn273] Request::process ns: rating.$cmd msg
id:169526 attempt: 0
Wed Oct  3 18:28:32 [conn273] single query: rating.$cmd  { getlasterror: 1,
w: 2, wtimeout: 0 }  ntoreturn: -1 options : 0
Wed Oct  3 18:28:32 [conn273] creating new connection
to:host1d.load.net:27017
Wed Oct  3 18:28:32 [conn273] getaddrinfo("host1d.load.net") failed: Name
or service not known
Wed Oct  3 18:28:32 [conn273] warning: could not get last error from a
shard host1d.load.net:27017 :: caused by :: socket exception
Wed Oct  3 18:28:32 [conn393] Request::process ns: chords.entity msg
id:169529 attempt: 0
Wed Oct  3 18:28:32 [conn393] CursorCache::get id: 4966267460045049647
Wed Oct  3 18:28:32 [conn446] Request::process ns: admin.$cmd msg id:183399
attempt: 0
Wed Oct  3 18:28:32 [conn446] single query: admin.$cmd  { ismaster: 1 }  
ntoreturn: -1 options : 0
Received signal 6
Backtrace: 0x528c54 0x7fed6cc0faf0 0x7fed6cc0fa75 0x7fed6cc135c0
0x7fed6cc08941 0x730908 0x712fa5 0x700860 0x72ad97 0x73c8d1 0x5aad20
0x7fed6df909ca 0x7fed6ccc270d
Wed Oct  3 18:28:32 [conn446] Request::process ns: admin.$cmd msg id:183400
attempt: 0
Wed Oct  3 18:28:32 [conn446] single query: admin.$cmd  { ismaster: 1 }  
ntoreturn: -1 options : 0
mongos(_ZN5mongo17printStackAndExitEi+0x64)[0x528c54]
/lib/libc.so.6(+0x33af0)[0x7fed6cc0faf0]
/lib/libc.so.6(gsignal+0x35)[0x7fed6cc0fa75]
/lib/libc.so.6(abort+0x180)[0x7fed6cc135c0]
/lib/libc.so.6(__assert_fail+0xf1)[0x7fed6cc08941]
mongos(_ZN5mongo10ClientInfo12getLastErrorERKNS_7BSONObjERNS_14BSONObjBuild erEb+0x3898)[0x730908]
mongos(_ZN5mongo7Command20runAgainstRegisteredEPKcRNS_7BSONObjERNS_14BSONOb jBuilderEi+0x805)[0x712fa5]
mongos(_ZN5mongo14SingleStrategy7queryOpERNS_7RequestE+0x4d0)[0x700860]
mongos(_ZN5mongo7Request7processEi+0x157)[0x72ad97]
mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21Abstract MessagingPortEPNS_9LastErrorE+0x71)[0x73c8d1]
mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x260)[0x5aad20]
/lib/libpthread.so.0(+0x69ca)[0x7fed6df909ca]
/lib/libc.so.6(clone+0x6d)[0x7fed6ccc270d]
===
 CursorCache at shutdown -  sharded: 2 passthrough: 1900

среда, 3 октября 2012 г., 13:53:56 UTC+4 пользователь Anton Volokhov
написал:

...

read more »


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brian McNamara  
View profile  
 More options Oct 3 2012, 12:02 pm
From: Brian McNamara <br...@mc-namara.com>
Date: Wed, 3 Oct 2012 09:02:16 -0700 (PDT)
Local: Wed, Oct 3 2012 12:02 pm
Subject: Re: mongos fails with "Name or service not known"

Hi Anton,

It looks like there's an issue communicating with one of the nodes (host1d.load.net:27017).  Can you resolve the hostname?

Regards,
Brian


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Anton Volokhov  
View profile   Translate to Translated (View Original)
 More options Oct 4 2012, 7:15 am
From: Anton Volokhov <a.v.volok...@gmail.com>
Date: Thu, 4 Oct 2012 04:15:17 -0700 (PDT)
Local: Thurs, Oct 4 2012 7:15 am
Subject: Re: mongos fails with "Name or service not known"

Obviously, yes. It's resolvable, pingable and sshable all the time.

среда, 3 октября 2012 г., 20:02:16 UTC+4 пользователь Brian McNamara
написал:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Brian McNamara  
View profile  
 More options Oct 4 2012, 10:46 am
From: Brian McNamara <br...@mc-namara.com>
Date: Thu, 4 Oct 2012 07:46:29 -0700 (PDT)
Local: Thurs, Oct 4 2012 10:46 am
Subject: Re: mongos fails with "Name or service not known"

Hi Anton,

There was an exception thrown in the log indicating a failure to resolve
host1d.load.net - just wanted to use that as a starting point.

Wed Oct  3 18:28:32 [conn273] creating new connection to:
host1d.load.net:27017
Wed Oct  3 18:28:32 [conn273] getaddrinfo("host1d.load.net") failed: Name
or service not known
Wed Oct  3 18:28:32 [conn273] warning: could not get last error from a
shard host1d.load.net:27017 :: caused by :: socket exception
Wed Oct  3 18:28:32 [conn393] Request::process ns: chords.entity msg
id:169529 attempt: 0
Wed Oct  3 18:28:32 [conn393] CursorCache::get id: 4966267460045049647

Regards,
Brian


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Anton Volokhov  
View profile  
 More options Oct 4 2012, 11:02 am
From: Anton Volokhov <a.v.volok...@gmail.com>
Date: Thu, 4 Oct 2012 08:02:00 -0700 (PDT)
Subject: Re: mongos fails with "Name or service not known"

well, found a similar problem with stale config
(https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/7Y9R...)
So, the only question now is why mongos crashed twice.
I'm pretty sure, that all mongod instances were up when mongod fails with
exception.
And every time it was new unresolved host(1f and 1g the first time, 1d the
second).
These instances are located in different physical machines.

четверг, 4 октября 2012 г., 18:46:29 UTC+4 пользователь Brian McNamara
написал:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Sam Helman  
View profile  
 More options Oct 8 2012, 3:41 pm
From: Sam Helman <sam.hel...@10gen.com>
Date: Mon, 8 Oct 2012 12:41:30 -0700 (PDT)
Local: Mon, Oct 8 2012 3:41 pm
Subject: Re: mongos fails with "Name or service not known"

It is possible that an old configuration across the config servers caused
the mongos to experience a problem with routing requests.  If the issue
comes up in the future, it might be advisable to take the route that was
taken
in http://stackoverflow.com/questions/8563600/mongos-stale-config-error.


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »