After enabling the health check module, the upstream shows DOWN and load balancing stops working


Android Run

Oct 19, 2017, 12:56:43 AM
to openresty
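Hi everyone, I recently started using OpenResty. When I enable the health check module, the status page shows every backend server as DOWN, and load balancing stops working.

The status page shows the following: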


Nginx Worker PID: 10375
Upstream test
    Primary Peers
        192.168.2.93:8079 DOWN
    Backup Peers

The error log shows:

2017/10/19 12:12:19 [error] 10375#10375: *9012 no live upstreams while connecting to upstream, client: 192.168.2.93, server: localhost, request: "GET /favicon.ico HTTP/1.1", upstream: "http://test/favicon.ico", host: "192.168.20.85", referrer: "http://192.168.20.85/status


OS version: CentOS 6.9 x86_64
OpenResty: nginx version: openresty/1.11.2.5


The nginx.conf configuration is as follows:

#user  nobody;
worker_processes  1;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;
    lua_package_path "/usr/local/openresty/lualib/resty/upstream/healthcheck.lua";
    lua_shared_dict healthcheck 5m;
    lua_socket_log_errors off;

    upstream test {
        server 192.168.2.93:8079;
    }

    init_worker_by_lua_block {
        local hc = require "resty.upstream.healthcheck"

        local ok, err = hc.spawn_checker{
            shm = "healthcheck",  -- defined by "lua_shared_dict"
            upstream = "test", -- defined by "upstream"
            type = "http",

            http_req = "GET /status HTTP/1.0\r\nHost: test\r\n\r\n",
                    -- raw HTTP request for checking

            interval = 2000,  -- run the check cycle every 2 sec
            timeout = 1000,   -- 1 sec is the timeout for network operations
            fall = 3,  -- # of successive failures before turning a peer down
            rise = 2,  -- # of successive successes before turning a peer up
            valid_statuses = {200, 302},  -- a list of valid HTTP status codes
            concurrency = 10,  -- concurrency level for test requests
        }
        if not ok then
            ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
            return
        end
    }

    server {
        listen       80;
        server_name  localhost;

        location / {
            proxy_pass http://test;
        }

        location = /status {
            access_log off;
            allow 127.0.0.1;
            allow 192.168.2.93;
            deny all;

            default_type text/plain;
            content_by_lua_block {
                local hc = require "resty.upstream.healthcheck"
                ngx.say("Nginx Worker PID: ", ngx.worker.pid())
                ngx.print(hc.status_page())
            }
        }

        #charset koi8-r;

        #access_log  logs/host.access.log  main;

        #location / {
        #    root   html;
        #    index  index.html index.htm;
        #}

        #error_page  404              /404.html;

        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        # proxy the PHP scripts to Apache listening on 127.0.0.1:80
        #
        #location ~ \.php$ {
        #    proxy_pass   http://127.0.0.1;
        #}

        # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
        #
        #location ~ \.php$ {
        #    root           html;
        #    fastcgi_pass   127.0.0.1:9000;
        #    fastcgi_index  index.php;
        #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
        #    include        fastcgi_params;
        #}

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        #
        #location ~ /\.ht {
        #    deny  all;
        #}
    }


    # another virtual host using mix of IP-, name-, and port-based configuration
    #
    #server {
    #    listen       8000;
    #    listen       somename:8080;
    #    server_name  somename  alias  another.alias;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}


    # HTTPS server
    #
    #server {
    #    listen       443 ssl;
    #    server_name  localhost;

    #    ssl_certificate      cert.pem;
    #    ssl_certificate_key  cert.key;

    #    ssl_session_cache    shared:SSL:1m;
    #    ssl_session_timeout  5m;

    #    ssl_ciphers  HIGH:!aNULL:!MD5;
    #    ssl_prefer_server_ciphers  on;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}

}



Kwanhur Huang

Oct 19, 2017, 1:31:19 AM
to open...@googlegroups.com
hello,

upstream test {
        server 192.168.2.93:8079;
    }
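Have you confirmed that this upstream service is working properly? Does it have any logs you can check?

You can also adjust the error_log level to see whether anything abnormal is logged: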
error_log  logs/error.log  notice;

Android Run

Oct 19, 2017, 1:56:35 AM
to openresty
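Thanks for the reply,

       The upstream address is fine. If I remove the init_worker_by_lua_block section, the backend can be accessed normally through the proxy.

With the log level set to notice, the log shows: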


2017/10/19 13:11:04 [notice] 10570#10570: signal process started
2017/10/19 13:11:05 [notice] 20020#20020: signal 1 (SIGHUP) received, reconfiguring
2017/10/19 13:11:05 [notice] 20020#20020: reconfiguring
2017/10/19 13:11:05 [notice] 20020#20020: using the "epoll" event method
2017/10/19 13:11:05 [notice] 20020#20020: start worker processes
2017/10/19 13:11:05 [notice] 20020#20020: start worker process 10571
2017/10/19 13:11:05 [notice] 10568#10568: gracefully shutting down
2017/10/19 13:11:06 [notice] 10568#10568: exiting
2017/10/19 13:11:06 [notice] 10568#10568: exit
2017/10/19 13:11:06 [notice] 20020#20020: signal 17 (SIGCHLD) received
2017/10/19 13:11:06 [notice] 20020#20020: worker process 10568 exited with code 0
2017/10/19 13:11:06 [notice] 20020#20020: signal 29 (SIGIO) received
2017/10/19 13:11:12 [error] 10571#10571: *11502 no live upstreams while connecting to upstream, client: 192.168.2.93, server: localhost, request: "GET / HTTP/1.1", upstream: "http://test/", host: "192.168.20.85"
2017/10/19 13:11:12 [error] 10571#10571: *11502 no live upstreams while connecting to upstream, client: 192.168.2.93, server: localhost, request: "GET /favicon.ico HTTP/1.1", upstream: "http://test/favicon.ico", host: "192.168.20.85", referrer: "http://192.168.20.85/"


On Thursday, October 19, 2017 at 1:31:19 PM UTC+8, Kwanhur Huang wrote:

tokers

Oct 19, 2017, 2:00:38 AM
to openresty
Hi!
Have you tried sending the heartbeat request yourself,

http_req = "GET /status HTTP/1.0\r\nHost: test\r\n\r\n",

to see whether the backend answers it?

Android Run

Oct 19, 2017, 2:06:58 AM
to openresty
Thanks for the reply,

Yes, it is configured, as follows:

http_req = "GET /status HTTP/1.0\r\nHost: test\r\n\r\n",


On Thursday, October 19, 2017 at 2:00:38 PM UTC+8, tokers wrote:

Zhang Chao

Oct 19, 2017, 2:43:09 AM
to open...@googlegroups.com
Hi!
What I mean is: have you manually tested your backend with this request?
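
One way to run such a manual test is sketched below, assuming Python 3 is available on the OpenResty host. It sends the same raw request as the http_req option and reads back only the status line. The names HTTP_REQ, parse_status, and probe are hypothetical helpers written for this illustration; they are not part of OpenResty or of the healthcheck library.

```python
import re
import socket
from typing import Optional

# Same raw request as the http_req option quoted in the thread.
HTTP_REQ = b"GET /status HTTP/1.0\r\nHost: test\r\n\r\n"


def parse_status(status_line: str) -> Optional[int]:
    """Extract the numeric code from a line such as 'HTTP/1.1 200 OK'."""
    match = re.match(r"HTTP/\d+\.\d+\s+(\d+)", status_line)
    return int(match.group(1)) if match else None


def probe(host: str, port: int, request: bytes = HTTP_REQ,
          timeout: float = 1.0) -> Optional[int]:
    """Send the raw request and return the response status code.

    Returns None when the first response line is not an HTTP status line.
    Connection errors and timeouts are raised to the caller.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        with sock.makefile("rb") as reader:
            first_line = reader.readline()  # only the status line is needed
    return parse_status(first_line.decode("latin-1"))
```

Calling `probe("192.168.2.93", 8079)` from the OpenResty host sends the heartbeat request to the backend listed in the upstream block. If the returned code is not 200 or 302, the checker configured in the thread (valid_statuses = {200, 302}) would treat every check as a failure.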

Android Run

Oct 19, 2017, 3:44:19 AM
to openresty
Hello!

 http_req = "GET /status HTTP/1.0\r\nHost: test\r\n\r\n",
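What does this setting mean? Does it mean that the upstream backend server must serve a /status page that returns 200 or 302?
If the backend has no /status page, is it OK to change the path to /?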

On Thursday, October 19, 2017 at 2:43:09 PM UTC+8, tokers wrote:

Kwanhur Huang

Oct 19, 2017, 4:03:07 AM
to open...@googlegroups.com
hello,

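> If the backend has no /status page, is it OK to change the path to /?

Yes, that works. The main thing is to confirm by hand that the upstream's health check endpoint responds normally.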
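
To illustrate why a missing /status page takes the whole upstream down, here is a simplified model of the fall/rise counting described in the configuration comments (fall = 3, rise = 2, valid_statuses = {200, 302}). It is a sketch under those stated settings, not the healthcheck library's actual implementation; PeerState and record_check are hypothetical names, and the 404 response is an assumption, since the thread does not show what the backend actually returned.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    down: bool = False
    fails: int = 0  # successive failed checks
    oks: int = 0    # successive passed checks


def record_check(peer, status, valid_statuses=(200, 302), fall=3, rise=2):
    """Update a peer after one check and return whether it is now down.

    A status of None stands for a connection error or timeout.
    """
    if status in valid_statuses:
        peer.oks += 1
        peer.fails = 0
        if peer.down and peer.oks >= rise:
            peer.down = False  # enough successive successes: back up
    else:
        peer.fails += 1
        peer.oks = 0
        if not peer.down and peer.fails >= fall:
            peer.down = True   # enough successive failures: marked down
    return peer.down


peer = PeerState()
# Assumed: the backend answers GET /status with 404.
print([record_check(peer, 404) for _ in range(3)])
# After pointing http_req at a path that returns 200:
print([record_check(peer, 200) for _ in range(2)])
```

In this model the only peer is marked down on the third failed check. Because the upstream in the thread has a single server, nginx then has nothing left to proxy to, which matches the "no live upstreams" errors in the log. Two successful checks bring the peer back up.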

Android Run

Oct 19, 2017, 4:07:56 AM
to openresty
Hello!

    OK, the problem is solved. Thank you all very much.


On Thursday, October 19, 2017 at 4:03:07 PM UTC+8, Kwanhur Huang wrote: