Date: Mon, 8 Oct 2012 12:41:30 -0700 (PDT)
From: Sam Helman
To: mongodb-user@googlegroups.com
Subject: Re: mongos fails with "Name or service not known"

It is possible that stale configuration data on the config servers caused
the mongos to have trouble routing requests. If the issue comes up again,
it may be worth taking the approach described in
http://stackoverflow.com/questions/8563600/mongos-stale-config-error.

On Thursday, October 4, 2012 11:02:00 AM UTC-4, Anton Volokhov wrote:
>
> Well, I found a similar problem with stale config
> (https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/7Y9Rg-MCYZk).
> So the only question now is why mongos crashed twice.
> I'm pretty sure that all mongod instances were up when mongod failed with
> the exception.
> And every time it was a new unresolved host (1f and 1g the first time, 1d
> the second).
> These instances are located on different physical machines.
>
> On Thursday, October 4, 2012 18:46:29 UTC+4, Brian McNamara wrote:
>>
>> Hi Anton,
>>
>> There was an exception thrown in the log indicating a failure to resolve
>> host1d.load.net - just wanted to use that as a starting point.
>>
>> Wed Oct  3 18:28:32 [conn273] creating new connection to:host1d.load.net:27017
>> Wed Oct  3 18:28:32 [conn273] getaddrinfo("host1d.load.net") failed: Name or service not known
>> Wed Oct  3 18:28:32 [conn273] warning: could not get last error from a shard host1d.load.net:27017 :: caused by :: socket exception
>> Wed Oct  3 18:28:32 [conn393] Request::process ns: chords.entity msg id:169529 attempt: 0
>> Wed Oct  3 18:28:32 [conn393] CursorCache::get id: 4966267460045049647
>>
>> Regards,
>> Brian
>>
>> On Thursday, October 4, 2012 7:15:17 AM UTC-4, Anton Volokhov wrote:
>>>
>>> Obviously, yes. It's resolvable, pingable and sshable all the time.
>>>
>>> On Wednesday, October 3, 2012 20:02:16 UTC+4, Brian McNamara wrote:
>>>>
>>>> Hi Anton,
>>>>
>>>> It looks like there's an issue communicating with one of the nodes
>>>> (host1d.load.net:27017). Can you resolve the hostname?
>>>>
>>>> Regards,
>>>> Brian
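
The log in this thread shows getaddrinfo() failing for a host that resolves fine when checked by hand, so the failure may be intermittent. One way to catch it is to poll name resolution from the machine that runs mongos and record any failures. The following is only a minimal sketch in Python: the function name check_hosts and its resolver parameter are hypothetical (not part of MongoDB or any tool mentioned in the thread), and the default port 27017 is taken from the log lines quoted in the thread.

```python
import socket


def check_hosts(hosts, port=27017, resolver=socket.getaddrinfo):
    """Try to resolve each host name.

    Returns a dict mapping host -> None when resolution succeeded,
    or host -> error text when the resolver raised socket.gaierror.
    The resolver argument is a hypothetical hook so the check can be
    exercised without real DNS lookups.
    """
    results = {}
    for host in hosts:
        try:
            resolver(host, port)
            results[host] = None
        except socket.gaierror as exc:
            # This is the failure class that surfaces in the mongos log
            # as: getaddrinfo("...") failed: Name or service not known
            results[host] = str(exc)
    return results
```

Calling check_hosts(["host1d.load.net"]) on a schedule (for example from cron) and logging any non-None values with a timestamp would show whether resolution failures line up with the times mongos reported the socket exception.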