Data goes missing from mlcache when lua-resty-mlcache is updated from an OpenResty timer

tianxia...@gmail.com

Nov 19, 2022, 10:27:30 PM

to openresty

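In the init_worker_by_lua_block phase of OpenResty I use ngx.timer.at to sync data. After restarting OpenResty twice, I found that part of the data cached in mlcache was missing. Why does this happen? The steps to reproduce are below: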

1. OpenResty version

[root@localhost logs]# openresty -V

nginx version: openresty/1.19.9.1

...

2. mlcache version

...

local _M = {
    _VERSION = "2.5.0",
    _AUTHOR = "Thibault Charbonnier",
    _LICENSE = "MIT",
    _URL = "https://github.com/thibaultcha/lua-resty-mlcache",
}

local mt = { __index = _M }

...

3. nginx configuration file

user root;
worker_processes 1;

#error_log logs/error.log;
#error_log logs/error.log notice;
#error_log logs/error.log info;

#pid logs/nginx.pid;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    #tcp_nopush on;

    #keepalive_timeout 0;
    keepalive_timeout 65;

    #gzip on;

    lua_shared_dict shm_main_cache 500m;
    lua_shared_dict shm_miss_cache 50m;
    lua_shared_dict shm_lock_cache 50m;
    lua_shared_dict shm_iphm_cache 50m;

    lua_package_path "/opt/lua/?.lua;;";
    lua_package_cpath "/opt/lua/?.so;;";

    init_by_lua_block {
        _c = require("test_mlcache")
        _c.init()
    }

    init_worker_by_lua_block {
        _c.test()
    }

    server {
        listen 80;
        server_name localhost;

        location =/peek {
            content_by_lua_block {
                _c.peek()
            }
        }

        location =/access {
            content_by_lua_block {
                _c.access()
            }
        }
    }
}

4. Code to reproduce

local mlcache = require("resty.mlcache")
local cjson = require("cjson")
local table_nkeys = require("table.nkeys")

local _M = {
    _VERSION = 1.0
}

local default_settings = {
    ttl = nil,
    neg_ttl = nil
}

function _M.init()
    local cache, err = mlcache.new("my_cache", "shm_main_cache", {
        lru_size = 6, -- size of the L1 (Lua VM) cache
        ttl = 0, -- 0: hits never expire
        neg_ttl = 0, -- 0: misses never expire
        shm_miss = "shm_miss_cache",
        shm_locks = "shm_lock_cache",
        ipc_shm = "shm_iphm_cache"
    })
    if not cache then
        ngx.log(ngx.ERR, "Initialize mlcache error: ", err)
        return
    end

    -- seed the "dns" key with an empty table
    cache:set("dns", default_settings, {})
    _G.cache = cache
end

local callback = function(_, cores)
    local id = ngx.worker.pid()
    ngx.log(ngx.ERR, id)
    for i = 1, 5000 do
        cores:update() -- poll invalidation events published by other workers
        ngx.sleep(0.002)
        local value, err, hit_level = cores:get("dns")
        -- check the result before indexing value, which is nil on failure
        if not value then
            ngx.log(ngx.ERR, "Get mlcache error: ", err or "no value")
            return
        end
        value[tostring(i)] = "value"
        cores:set("dns", default_settings, value)
    end
end

local timer = function()
    local ok, err = ngx.timer.at(
        0,
        callback,
        cache
    )

    if not ok then
        ngx.log(ngx.ERR, err)
    end
end

function _M.test()
    timer()
end

function _M.peek()
    local ttl, err, value = cache:peek("dns")
    ngx.say("result: ", cjson.encode(value))
    ngx.say("pid: ", ngx.worker.pid())
    ngx.say("type: ", type(value))
    ngx.say("total: ", table_nkeys(value))
end

function _M.access()
    local dns_later, err, hit_level = cache:get("dns")
    ngx.say("result: ", cjson.encode(dns_later))
    ngx.say("pid: ", ngx.worker.pid())
    ngx.say("type: ", type(dns_later))
    ngx.say("hit_level: ", hit_level)
    ngx.say("total: ", table_nkeys(dns_later))
end

return _M

5. Steps to reproduce

Step 1: Start OpenResty with the configuration above. This triggers the init_worker_by_lua hook, which starts a background timer.

$ openresty

Step 2: As soon as OpenResty has started, reload it right away. This triggers the init_worker_by_lua hook again and starts a new background timer, while the old timer is still running.

$ openresty -s reload

Step 3: Request "http://localhost/access" to check the final result. The result returned by mlcache is wrong: part of the data is missing.

$ curl -X POST http://localhost/access

...

pid: 1834966

type: table

hit_level: 1

total: 4687 # The correct value is 5000

Step 4: Request "http://localhost/peek" to check the final result in mlcache's L2. Part of the data is missing there as well.

...

pid: 1834966

type: table

total: 4687 # The correct value is 5000

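So I am quite puzzled. By the usual reasoning, L2 is implemented with lua_shared_dict, so its operations should be atomic, and the contents of L1 are pulled from L2. Even though the old timer does not stop when OpenResty is reloaded, it should not affect the new process, and the result should be eventually consistent. I don't know what went wrong here, and I would appreciate help tracking it down.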

Junlong li

Nov 20, 2022, 3:44:03 AM

to openresty

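Because two processes are writing to the shared dict at the same time, the later write overwrites the earlier one. Each process also reads the value back from the shared dict, so what it gets is not necessarily what it wrote itself, and the result ends up incomplete.

For example, say the first process has already written 3000 entries when the second process starts and writes 1 entry. When the next process then reads the value, it gets just that 1 entry and writes on top of it, so the count continues from 2. The same thing keeps happening afterwards, so the final result is fairly random.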
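
The lost update described above can be reproduced without nginx. The following is a minimal sketch in plain Lua, not the actual mlcache or shared dict implementation: two coroutines stand in for the old and new worker processes, a plain table stands in for the shared dict, and the names `simulate`, `spawn`, `store`, and `head_start` are hypothetical. It assumes that the reload resets the "dns" key (as `_M.init()` would when it runs again), and it uses one fixed interleaving chosen for illustration, whereas real processes interleave unpredictably.

```lua
-- Shallow copy: each "process" works on its own snapshot of the value,
-- just as a value read from a shared dict is a private copy.
local function copy(t)
    local c = {}
    for k, v in pairs(t) do c[k] = v end
    return c
end

local function count(t)
    local n = 0
    for _ in pairs(t) do n = n + 1 end
    return n
end

-- A writer performs get -> modify -> set on the single "dns" key.
-- Each get and each set is atomic on its own, but the other writer
-- can run between them (modelled by coroutine.yield).
local function spawn(store, n)
    return coroutine.create(function()
        for i = 1, n do
            local value = copy(store.dns)   -- like cache:get("dns")
            coroutine.yield()               -- the other process may run here
            value[tostring(i)] = "value"
            store.dns = value               -- like cache:set("dns", ..., value)
            coroutine.yield()               -- like ngx.sleep between iterations
        end
    end)
end

-- Resume a writer once; returns false if it has already finished.
local function step(co)
    if coroutine.status(co) == "dead" then
        return false
    end
    assert(coroutine.resume(co))
    return true
end

-- n: keys each writer adds; head_start: iterations the old writer
-- completes before the reload (assumed to be less than n).
local function simulate(n, head_start)
    local store = { dns = {} }

    local old = spawn(store, n)
    for _ = 1, 2 * head_start do            -- two resumes per iteration
        step(old)
    end

    store.dns = {}                          -- assumption: reload resets the key
    local new = spawn(store, n)

    local active = true
    while active do
        local new_ran = step(new)
        local old_ran = step(old)           -- old writes last, so it wins
        active = new_ran or old_ran
    end
    return count(store.dns)
end

print(simulate(5000, 300))  --> 4700
```

In this schedule every key the new writer adds is overwritten by the old writer's stale snapshot, and the old writer never goes back to the keys it wrote before the reset, so those 300 keys are missing at the end. With a different interleaving the number of missing keys would differ, which is consistent with the result looking random in practice.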

田晓勇

Nov 20, 2022, 5:21:06 AM

to open...@googlegroups.com

😅 I see, so that's it. I had it wrong.

On Sun, Nov 20, 2022 at 16:44, Junlong li <zhuizhu...@gmail.com> wrote:

--
--
This email is from the "openresty" list, dedicated to technical discussion!
Subscribe: send a blank email to openresty...@googlegroups.com
Post: send an email to open...@googlegroups.com
Unsubscribe: send an email to openresty+...@googlegroups.com
Archive: http://groups.google.com/group/openresty
Website: http://openresty.org/
Repository: https://github.com/agentzh/ngx_openresty
Tutorial: http://openresty.org/download/agentzh-nginx-tutorials-zhcn.html
---
You received this message because you are subscribed to the "openresty" group on Google Groups.
To unsubscribe from this group and stop receiving emails from it, send an email to openresty+...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/openresty/8923d803-c973-4dba-81fa-910e93280afan%40googlegroups.com.

Nick Xiao

Nov 28, 2022, 12:33:17 AM

to open...@googlegroups.com

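An abnormal exit does seem to be a problem. Take a look at the comments section of this post:

编程语言和技术栈杂谈 (Notes on programming languages and tech stacks)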

zhuanlan.zhihu.com

Jinhua Luo

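Author

The features are all there, but the implementation behind them is unsafe, because of the shared memory.

This lock is unsafe: the locking process is not atomic, and no process is responsible for tracking its state. If a process crashes after taking the lock, the locked state persists forever, and you can guess what happens next. That case is actually the better one, since it is fairly clean. With operations on the red-black tree, things get more subtle. For example, inserting an element requires updating 3 pointers; if the process exits after updating only one of them, there is no telling what strange things will happen when other processes later operate on that red-black tree. (Just as C lets you operate directly on a memory region regardless of language-level constraints: if that region happens to be the stack or an array, things go badly for any other code that keeps operating on it.)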

2019-02-11

On Nov 19, 2022, at 13:46, tianxia...@gmail.com <tianxia...@gmail.com> wrote:

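In the init_by_lua_block phase of OpenResty I use ngx.timer.at to sync data. After restarting OpenResty twice, I found that part of the data cached in mlcache was missing. Why does this happen? The steps to reproduce are below:

So I am quite puzzled. By the usual reasoning, the result should be eventually consistent. I don't know what went wrong, and I would appreciate help tracking it down.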
