nginx start fails if upstream server is not resolved


RJoshi

May 19, 2016, 5:05:02 PM5/19/16
to openresty-en
Hello,
  Nginx ensures at least one server per upstream is available at startup time. This is a good feature, but the problem is that an issue with a single upstream host causes Nginx to not come up.
  Is there any way to ignore the DNS failure and continue starting?

 If not, can I add a dummy local DNS server as the last entry in the resolver configuration, and start it during startup? If the other DNS servers cannot resolve the host, this dummy server would return 127.0.0.1 with a TTL of 1 second.

Thx

Lord Nynex

unread,
May 19, 2016, 9:11:38 PM5/19/16
to openre...@googlegroups.com
Hello,

DNS lookup failures halt nginx at startup because it keeps an internal cache of resolved addresses to avoid adding DNS lookup latency to requests at run time. It is not possible to explicitly ignore these failures.

On a more positive note, you have some options. At startup, DNS lookups use system-level (glibc, IIRC) functions such as getaddrinfo. The bright side is that these functions are aware of your system resolver and, further, of /etc/hosts. If the hosts you are putting into the upstream block are NOT CNAMEs, you could create /etc/hosts entries for them and call it a day. This is the simplest approach, but pretty inelegant IMHO.
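As a sketch of that approach (the hostnames and addresses below are placeholders, not from the original setup), you pin the upstream names so startup resolution always succeeds:

```conf
# /etc/hosts -- static entries consulted by getaddrinfo before DNS
10.0.0.5    backend1.internal.example
10.0.0.6    backend2.internal.example
```

with a matching upstream block:

```nginx
upstream backend {
    server backend1.internal.example:8080;
    server backend2.internal.example:8080;
}
```

The obvious downside is that the /etc/hosts entries must be kept in sync with reality by hand (or by configuration management), which is exactly why it feels inelegant.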

Another possibility is to do what you've described: run a local caching recursive DNS server (maybe dnsmasq, or a full installation of BIND or PowerDNS). In that case, you can use nginx's 'resolver' directive to direct DNS queries to localhost (or to a central DNS server within your cluster).
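The nginx side of that would look roughly like this (a minimal sketch; the timings are arbitrary):

```nginx
# Send runtime DNS queries to the local caching resolver
resolver 127.0.0.1 valid=10s ipv6=off;
```

One caveat: the 'resolver' directive only governs names resolved at run time (e.g. variables used in proxy_pass). The literal hostnames in upstream blocks are still resolved at startup through the system resolver, so the local DNS server would also need to be listed in /etc/resolv.conf for it to affect startup behavior.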

There are some more modern options as well that are unrelated to nginx. For example, Consul has the ability to render templates based on configuration changes. In this case, a DNS change would indicate to Consul that it should render a new config file for the upstream block. This is not without its issues in your situation, though: if your setup is such that an A record or CNAME can disappear entirely, there will be a convergence window during which nginx cannot resolve the missing hostname before Consul renders and reloads. In that case, nginx will simply fail the request and (depending on your logging) write an entry to the error log.

One last option is using OpenResty's balancer_by_lua directive to implement your own load-balancing logic. One possibility is to use a cluster-central service like Consul for service registration, and have your Lua code select an upstream based on Consul service names. This is the most dynamic and scalable solution, but it requires code and a non-trivial amount of testing.
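A minimal sketch of that shape (names and addresses are hypothetical; a real setup would populate the peer list from a registry such as Consul rather than hardcoding it):

```nginx
upstream dynamic_backend {
    server 0.0.0.1;   # placeholder; balancer_by_lua picks the real peer

    balancer_by_lua_block {
        local balancer = require "ngx.balancer"
        -- Hardcoded for illustration; normally refreshed from a registry
        local peers = { { "10.0.0.5", 8080 }, { "10.0.0.6", 8080 } }
        local peer = peers[math.random(#peers)]
        local ok, err = balancer.set_current_peer(peer[1], peer[2])
        if not ok then
            ngx.log(ngx.ERR, "failed to set the current peer: ", err)
            return ngx.exit(500)
        end
    }
}
```

Because the peer is chosen per request in Lua, nginx never needs to resolve the real backend hostnames at startup, which sidesteps the original problem entirely.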

As an aside, I recommend you also attempt to solve the problem externally. A server, or cluster of servers, that depends on hostnames presents a serious issue when those names suddenly disappear. Using DNS to indicate availability or register a service is simply not a sensible or scalable design.

I hope this helps

-Brandon


in...@ecsystems.nl

May 20, 2016, 3:53:31 AM5/20/16
to openresty-en
The Lua upstream module allows you to mark a peer up or down: simply configure them as down, and after nginx has started, bring them up.
AFAIK a down peer is not checked against DNS.
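A sketch of that idea, assuming the lua-upstream-nginx-module bundled with OpenResty, and assuming (as described above) that a down peer skips startup resolution. The upstream name, peer id, and admin location are hypothetical:

```nginx
# In nginx.conf, the unresolvable peer is listed as "down" so it carries
# no traffic at startup:
#   upstream backend { server backend.internal.example:8080 down; }

location = /peers/up {
    content_by_lua_block {
        local upstream = require "ngx.upstream"
        -- set_peer_down(upstream_name, is_backup, peer_id, down_value):
        -- down_value = false re-enables the peer
        local ok, err = upstream.set_peer_down("backend", false, 0, false)
        if not ok then
            ngx.say("failed to bring peer up: ", err)
            return
        end
        ngx.say("peer 0 of 'backend' is up")
    }
}
```

Such a location should obviously be access-controlled (e.g. allow 127.0.0.1 only) since it mutates the live upstream state.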

RJoshi

May 20, 2016, 10:15:50 AM5/20/16
to openresty-en
Thanks, Brandon, for the detailed answer.

We are using an ELB CNAME, so putting entries into /etc/hosts will not work.

Do you see any issues/gotchas with running a temporary dummy DNS server during start/stop? Is there any way to remove this dummy resolver from nginx's in-memory resolver entries once the system is up, to avoid any potential issues?

 Consul seems like a good option, but I want to avoid a dependency on any external components.

I am using balancer_by_lua for services added after OpenResty has already started. I have logic that reads all the preconfigured upstream instances and routes to them; anything added later is routed using balancer_by_lua.

Do you know of any performance difference between balancer_by_lua and nginx's built-in upstream routing?

I do agree with you that we should be solving this externally. The problem is that in an AWS environment, with each team managing their own instances/servers, we don't have direct control over their environments, and any one of them being down causes OpenResty to not come up.

Again, thanks for the detailed explanation.

RJoshi

May 20, 2016, 10:26:50 AM5/20/16
to openresty-en
Most of our upstreams are configured with either ELB or F5 addresses, which means we have only one entry, and marking it down still prevents nginx from starting.

Luis Gasca

May 20, 2016, 11:34:07 AM5/20/16
to openre...@googlegroups.com
Maybe a local dnsmasq server, with nginx's "resolver" directive pointing to 127.0.0.1, can help you. dnsmasq has several options that may fit your use case (local-ttl, neg-ttl).
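A minimal dnsmasq.conf along those lines might look like this (the forwarder address is a placeholder; check your dnsmasq version's man page for the exact semantics of each option):

```conf
# Listen only on loopback for nginx's resolver queries
listen-address=127.0.0.1
no-resolv              # ignore /etc/resolv.conf; use only the server= lines
server=10.0.0.2        # forward queries to the upstream (e.g. VPC) DNS server
cache-size=1000        # cache answers locally
neg-ttl=1              # cache negative (NXDOMAIN) answers for only 1 second
local-ttl=1            # TTL on answers dnsmasq itself is authoritative for
```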

Nick Muerdter

May 20, 2016, 7:00:58 PM5/20/16
to openre...@googlegroups.com
You may also find this nginx module useful: https://github.com/GUI/nginx-upstream-dynamic-servers It primarily offers dynamic resolution of server hostnames, but it also allows nginx to start if a hostname cannot be resolved (with that upstream server being marked as down).

(Disclaimer: I created this module a while ago but haven't been actively using it. However, it has received some contributions and fixes recently, and I think it should work with current versions of nginx, although I haven't explicitly tested it with OpenResty.)
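Per the module's README, usage is roughly as follows (the hostname is a placeholder; a resolver must be configured for re-resolution to work):

```nginx
http {
    resolver 8.8.8.8;

    upstream app_backend {
        # "resolve" is the parameter this module adds: the name is
        # re-resolved according to its TTL, and nginx can still start if
        # resolution fails (the server is marked down until it resolves).
        server app.example.com resolve;
    }
}
```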
 
Nick

Abhishek Manocha

Jul 13, 2017, 8:30:37 AM7/13/17
to openresty-en
@Lord Nynex  @RJoshi

Did you get anywhere with the dnsmasq solution? Or did you do something else?

I have the same issue: the same setup with nginx and an ELB, and the DNS names are not resolved.

The nginx version is:

nginx version: nginx/1.11.5

And I use proxy_pass + resolver.


My conf is like -

server {
    listen 80;
    server_name batpod.nearbuytools.com;
    resolver 10.x.x.2 valid=1s;
    set $upstream_endpoint ***.nbtools.com;

    location / {
        proxy_set_header HOST $upstream_endpoint;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout 300;
        proxy_pass http://***.nbtools.com/;
    }

    error_log /var/log/nginx/***.nearbuytools.com-error.log;
    access_log /var/log/nginx/***.nearbuytools.com-access.log main;
}


The way I understand the dnsmasq solution: dnsmasq will keep resolving the upstream DNS names (which, as per my understanding, are internal AWS DNS names) by forwarding them to the AWS DNS server (10.x.x.2), so it always returns the current IP. And in the resolver directive I would now give localhost, if I am right?
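If that understanding is right, the nginx side would look roughly like this sketch (hostname is a placeholder). One thing to note: the resolver directive only takes effect when proxy_pass contains a variable, so $upstream_endpoint has to appear in proxy_pass itself; a literal hostname there is resolved once at startup through the system resolver instead:

```nginx
# dnsmasq listens on 127.0.0.1 and forwards to the VPC resolver (10.x.x.2)
resolver 127.0.0.1 valid=1s;

set $upstream_endpoint app.example.internal;

location / {
    # The variable forces per-request resolution through "resolver"
    proxy_pass http://$upstream_endpoint;
}
```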

Can you point me to an example / gist?