Local Node Name Resolution

scott

Jul 26, 2016, 12:27:21 PM

to Isilon Technical User Group

A few weeks ago I noticed that logging in via ssh to nodes on one of our Isilon clusters was taking a very long time (~20 seconds). I believe I've isolated the problem to the local name resolver running on 127.0.0.1.

On each node, /etc/resolv.conf looks like this:

[edited for the public internet]

search company.org
nameserver      127.0.0.1
nameserver      11.22.33.44
nameserver      11.22.55.66

If I attempt an nslookup specifying the nameserver, 127.0.0.1 fails while the others work:

# nslookup myhost 127.0.0.1
;; connection timed out; no servers could be reached

but the following works:

# nslookup myhost 11.22.33.44
Server: 11.22.33.44
Address: 11.22.33.44#53

Name: myhost.company.org
Address: 44.55.66.77
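The per-nameserver comparison above can be scripted so every resolv.conf entry is tested the same way. Below is a minimal Python sketch, assuming Python 3 is available (on the node or on an admin host with the same resolv.conf) and IPv4 nameservers only. It is not an Isilon tool, and the function names are hypothetical. It hand-builds one A-record query per RFC 1035 and sends it with a short timeout, so a dead local resolver shows up as "no answer" after the timeout instead of stalling silently the way the ssh login did.

```python
import random
import socket
import struct
import time


def nameservers_from_resolv_conf(text):
    """Return nameserver addresses in the order resolv.conf lists them."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def build_query(name, qtype=1):
    """Build a minimal DNS query packet (RFC 1035) for `name`; qtype 1 = A."""
    txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR = 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return txid, header + qname + struct.pack("!HH", qtype, 1)  # class IN


def probe(server, name, timeout=2.0):
    """Send one UDP query to server:53.

    Returns (elapsed_seconds, rcode), or (elapsed_seconds, None) if no
    reply arrived before the timeout or the send/receive failed.
    """
    txid, packet = build_query(name)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # IPv4 only
    sock.settimeout(timeout)
    start = time.monotonic()
    try:
        sock.sendto(packet, (server, 53))
        while True:
            data, _ = sock.recvfrom(512)
            # Ignore stray packets whose transaction ID does not match
            if len(data) >= 4 and struct.unpack("!H", data[:2])[0] == txid:
                return time.monotonic() - start, data[3] & 0x0F  # RCODE bits
    except OSError:  # includes socket.timeout and ICMP "connection refused"
        return time.monotonic() - start, None
    finally:
        sock.close()


def report(servers, name, timeout=2.0):
    """Print latency and result for each server, in resolv.conf order."""
    for server in servers:
        elapsed, rcode = probe(server, name, timeout)
        status = "no answer" if rcode is None else f"rcode={rcode}"
        print(f"{server:<16} {elapsed:6.2f}s  {status}")
```

On a node, something like `report(nameservers_from_resolv_conf(open("/etc/resolv.conf").read()), "myhost.company.org")` should print one line per nameserver; rcode 0 means NOERROR and 3 means NXDOMAIN. Probing each server independently, rather than letting the system resolver fall through the list, is what makes a slow first entry stand out.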

So, in summary, lookups against 127.0.0.1 are failing only on this cluster, and only for some names. If I remove 127.0.0.1 from resolv.conf, the slow login goes away, but the cluster recreates the entry within a few seconds. If I turn off DNS lookups in ssh, the problem also goes away.

Is there a way to refresh the local DNS lookup program?

Thanks

Dan Pritts

Jul 26, 2016, 1:11:34 PM

to isilon-u...@googlegroups.com

Running isi net dns flush will flush the local DNS cache, which might or might not help.

You can also do isi services dnscache (enable|disable).

I don't know what Isilon component does the DNS caching. Maybe it's /usr/sbin/isi_dnsiq_d, but that could be only for the SmartConnect zones.

It isn't running standard BIND.

Sounds like you might be headed toward opening a ticket. Good luck.

danno

--
Dan Pritts

ICPSR Computing & Network Services

University of Michigan

erik.j...@gmail.com

Jul 26, 2016, 1:23:27 PM

to isilon-u...@googlegroups.com

You could try restarting the local DNS cache daemon processes.

The process here is called "isi_cbind_d", and you may notice that some lookups work on some nodes but not others. The Isilon-recommended tool for DNS lookups is "dig", as it provides more information than "nslookup".

If you look at isi networks dnscache statistics, you may notice that the stats are all 0, which could indicate that cbind is "stuck / hung" on some nodes.

--

Erik Weiman

Sent from my iPhone 6s

saurabh chaudhary

Jul 27, 2016, 1:35:06 AM

to Isilon Technical User Group

Hi,

The IP 127.0.0.1 that you are referring to is the loopback IP.

Every device, irrespective of vendor, uses 127.0.0.1 as its fixed loopback IP.

_____________________________________________________________________

lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        inet6 ::1 prefixlen 128 zone 1
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5 zone 1
        inet 127.0.0.1 netmask 0xff000000 zone 1

-------------------------------------------------------------------------------------------------------------------------

This loopback IP is used for internal hardware communication and also for local replication.

The file /etc/resolv.conf is similar to the hosts file that we have on servers and desktops [C:\Windows\System32\drivers\etc].

The Isilon OS automatically creates a host entry for the localhost [node] for internal communication within the hardware and cluster.

This is also why you are not getting a Host (A) record from AD DNS on your network through nslookup.

+++++++++++++++++++++++++++++

It seems your problem is not with the IP 127.0.0.1. To me it looks like a routing issue: the hop count is high, and because of this you may be facing the latency issue.

Please run a traceroute to your Isilon node IP from your desktop/server and check the hop count at each subnet level.

erik.j...@gmail.com

Jul 27, 2016, 11:53:35 AM

to isilon-u...@googlegroups.com

In OneFS 7.0 and later, Isilon runs an internal DNS caching server on the loopback address, 127.0.0.1.

--

Erik Weiman

Chris Pepper

Jul 27, 2016, 12:02:43 PM

to isilon-u...@googlegroups.com

Unfortunately, we have had to disable it several times because it returned stale data rather than properly refreshing. But it's currently enabled on all four of our clusters (v7.1.0.5 through v7.2.0.3), so it did get better.

Chris

Dan Pritts

Jul 27, 2016, 6:01:06 PM

to isilon-u...@googlegroups.com

Can you provide any insight as to why this is (apparently) an Isilon/EMC-developed nameserver, rather than just a standard nameserver like BIND, Unbound, or djbdns? I'm curious what the need was for extra/nonstandard functionality.

danno

Erik Weiman

Jul 27, 2016, 6:04:34 PM

to isilon-u...@googlegroups.com

Well, it is named "isi_cbind_d", so I'd have to think it is BIND-based, but I've not looked at the source code.

My speculation would be that this is a custom server, to prevent smaller companies that do not have a DNS infrastructure from being tempted to use the cluster as a full DNS server.

--

Erik Weiman

Sent from my iPhone 6+

Dan Pritts

Jul 27, 2016, 6:09:14 PM

to isilon-u...@googlegroups.com

Based on googling some phrases in the "strings" output, I don't think it's BIND.

"cbind" seems to mean "cluster bind", but that doesn't bring up anything either.

NBD, I guess. Just curious.

scott

Aug 1, 2016, 5:08:03 PM

to Isilon Technical User Group

Hopefully it's not terrible form to reply to my own post. I believe I solved this one and want to record the solution.

I tried the recommendations (isi_for_array "killall isi_cbind_d") and a few things suggested by support (restarting sshd and lsass).

What seemed to work was correcting the network connectivity to all nodes and waiting a couple of days.

-Scott

In my 8-node cluster, 2 of the nodes were misconfigured at our switch and were not on the right VLAN. They had link but could not ping the outside world. After correcting that
