blpop always times out after exactly 60s, even though no timeout limit is set in redis.conf or when creating the connection with new. Any ideas?

小冶

Jun 16, 2015, 11:42:31 PM

to open...@googlegroups.com

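I had never noticed this problem until I added logging recently. Inside a spawned coroutine, a big while loop uses blpop to receive Redis messages in blocking mode.

No timeout limit is specified either in redis.conf or when creating the connection with new: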

 daemonize yes                     
 pidfile /var/run/redis.pid        
 timeout 0                         
 tcp-keepalive 0                   
 save 900 1                        
 save 300 10                       
 save 60 10000                     
 stop-writes-on-bgsave-error yes   
 dbfilename dump.rdb               
                                   
 dir DBSTORE                       
 logfile DBLOG                     
 port PORT                         
 bind 127.0.0.1

local redis = require "resty.redis"                    
                                                       
ip = "127.0.0.1"                                       
port = g_env.db_port                                   
                                                       
db_cache = {}                                          
setmetatable(db_cache, {__mode="kv"})                  
                                                       
get = function()                                       
    local co = coroutine.running()                     
    local db = db_cache[co]                            
    if db then                                         
        if db:ping() ~= "PONG" then                    
            db = nil                                   
            print("[DB] connection closed, remove it") 
        end                                            
    end                                                
                                                       
    if not db then                                     
        db = redis:new()                               
        db:set_timeout(0)                              
        local ok, err = db:connect(ip, port)           
        if not ok then                                 
            print("[DB] connect fail",ip,port,err)     
            return                                     
        end                                                                      
        db_cache[co] = db                              
    end                                                
    return db                                          
end

Where it is used:

recv = function( key, timeout )                                       
    local db = redis.get()                                            
    if not db then return false, "can't open db" end                  
                                                                      
    --timeout = timeout or 10                                           
    print("chan.recv for", key, timeout, os.daytime())              
    local msg, err = db:blpop( key, timeout or 0 )                    
    if err then                                                       
        printf("[DB] err:%s when read channel; %s", err, os.daytime())
        return false, err                                             
    end                                                               
                                                                      
    if db_null(msg) then                                              
        return true, nil                                              
    else                                                              
        print("chan.recv get", unpack(msg))                           
        return true, msg[2]                                           
    end                                                               
end

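If timeout is set to an ordinary value such as 5, 10, or 20, blpop does time out and return after that many seconds, but with no err. On the next iteration of the while loop, the ping test on that db also succeeds, and the connection continues to be used.

But if timeout is nil, i.e. 0, I expect blpop to block until data arrives. In practice it returns after a fixed 60s every time with err set to timeout, and the next db:ping call fails because the connection has been closed, so I have to create a new one with new.

Is this abnormal behavior?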

小冶

Jun 16, 2015, 11:45:43 PM

to open...@googlegroups.com

Connecting with redis-cli and running blpop alist 0 works fine; it blocks until data arrives.

On Wednesday, June 17, 2015 at 11:42:31 AM UTC+8, 小冶 wrote:

Yichun Zhang (agentzh)

Jun 17, 2015, 2:21:27 AM

to openresty

Hello!

2015-06-17 11:42 GMT+08:00 小冶:
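> If timeout is set to an ordinary value such as 5, 10, or 20, blpop does time out and return after that many seconds, but with no err. On the next iteration of the while loop, the ping test on that db also succeeds, and the connection continues to be used.
> But if timeout is nil, i.e. 0, I expect blpop to block until data arrives. In practice it returns after a fixed 60s every time with err set to timeout, and the next db:ping call fails because the connection has been closed, so I have to create a new one with new.
> Is this abnormal behavior?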

This is the expected behavior. The documentation for the settimeout method does not say that 0 or nil means no timeout:

https://github.com/openresty/lua-nginx-module#tcpsocksettimeout

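In fact, in the current implementation, 0 or nil means falling back to the values of the lua_socket_connect_timeout, lua_socket_send_timeout, and lua_socket_read_timeout configuration directives, and the default value of each of these directives is 60s.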
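
A minimal sketch of one way to act on this, assuming `db` is an already connected lua-resty-redis object: give BLPOP a finite timeout in seconds, set the socket timeout (in milliseconds for `set_timeout`) somewhat higher so that Redis replies before the cosocket read times out, and loop on the empty reply. The names `recv_blocking` and `block_secs` and the 5-second margin are hypothetical and not from the thread.

```lua
-- Sketch only: recv_blocking and block_secs are hypothetical names.
-- db is assumed to be a connected lua-resty-redis object.
local function recv_blocking(db, key, block_secs)
    block_secs = block_secs or 30

    -- set_timeout() takes milliseconds; BLPOP takes seconds.
    -- Keep the socket timeout above the BLPOP timeout (arbitrary 5s margin)
    -- so the server-side timeout fires before the socket read timeout.
    db:set_timeout((block_secs + 5) * 1000)

    while true do
        local res, err = db:blpop(key, block_secs)
        if not res then
            return nil, err      -- a real error, e.g. a socket timeout
        end
        if type(res) == "table" then
            return res[2]        -- reply is { key, value }
        end
        -- Any other reply is the null reply: BLPOP timed out on the
        -- server side without data, so just wait again.
    end
end
```

With this shape the loop never relies on 0 meaning "no timeout"; each round either yields a value, reports an error, or quietly retries.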

Regards,
-agentzh

小冶

Jun 17, 2015, 2:38:51 AM

to open...@googlegroups.com

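Thanks for the reply.

I understand now that passing 0 does not mean no timeout. But a timeout is just a timeout; why does the connection get closed as well? That forces me to create a new one with new the next time.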

On Wednesday, June 17, 2015 at 2:21:27 PM UTC+8, agentzh wrote:

Yichun Zhang (agentzh)

Jun 17, 2015, 2:49:23 AM

to openresty

Hello!

2015-06-17 14:38 GMT+08:00 小冶:
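> I understand now that passing 0 does not mean no timeout. But a timeout is just a timeout; why does the connection get closed as well? That forces me to create a new one with new the next time.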
>

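The connection is closed only when a cosocket write timeout or connect timeout occurs, because in those two cases the state of the connection becomes indeterminate, so continuing to use it is unsafe. When a fatal error does occur, you do not need to call new again either; just call the connect() method on the current object to reconnect.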
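
The reconnect advice can be sketched roughly as follows, assuming `db` is a lua-resty-redis object that may have lost its connection. The helper name `ensure_connected` is hypothetical; only `ping()` and `connect()` come from the thread.

```lua
-- Sketch only: ensure_connected is a hypothetical helper name.
-- db is assumed to be a lua-resty-redis object created earlier.
local function ensure_connected(db, ip, port)
    -- Same liveness check as in the original get() function.
    if db:ping() == "PONG" then
        return db
    end

    -- Reuse the existing object: call connect() again
    -- instead of creating a fresh one with redis:new().
    local ok, err = db:connect(ip, port)
    if not ok then
        return nil, err
    end
    return db
end
```

In the original get() function this would replace dropping the cached object on a failed ping, so the per-coroutine cache keeps pointing at the same object.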

Regards,
-agentzh
