Deadlock when calling the Lua 5.3.2 library from C

lw li
Aug 26, 2019, 5:17:56 AM
to Lua Chinese(Lua中文用户组)
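The runtime environment is:
1. The Lua library is used to run Lua scripts in a multi-threaded environment
2. At the same time, multiple threads call system to run other tasks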

As a result, a deadlock occurs intermittently. Part of the stack trace is shown below:

Thread 182 (Thread 0x7f7b3b79e700 (LWP 14196)):
#0  0x00007f7b7ab03d1c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f7b7aa76728 in _L_lock_6097 () from /lib64/libc.so.6
#2  0x00007f7b7aa7647f in __GI__IO_list_lock () from /lib64/libc.so.6
#3  0x00007f7b7aabd087 in fork () from /lib64/libc.so.6
#4  0x000000000044a302 in systemEx (cmdstring=0x7f7a7c01def8 "/usr/local/ms-eagle-wstools/bin/wssnmp -f /usr/local/ms-eagle-rcc/data/wsms_snmp_switch_bond_error.in -o /usr/local/ms-eagle-rcc/data/wsms_snmp_switch_bond_error.out", timeout=600) at ../src/util/ToolsUtil.cpp:135
#5  0x0000000000412f20 in BatchJobExecutor::runCmd (this=<optimized out>, cmd="/usr/local/ms-eagle-wstools/bin/wssnmp -f /usr/local/ms-eagle-rcc/data/wsms_snmp_switch_bond_error.in -o /usr/local/ms-eagle-rcc/data/wsms_snmp_switch_bond_error.out", iTimeout=997844736, pLogger=0xffffffffffffffff) at ../src/BatchJobExecutor.cpp:146
#6  0x0000000000413f1e in BatchJobExecutor::executeTask (this=0x11f1660, spJobInfo=..., spExecutingTask=...) at ../src/BatchJobExecutor.cpp:69
#7  0x0000000000420970 in JobExecutorWrapper::executeBatchTask (this=0x1146800, spJobInfo=..., spExecutingTask=...) at ../src/JobExecutorWrapper.cpp:198
#8  0x0000000000422319 in JobExecutorWrapper::batchTaskCallbackWrapper (data=<optimized out>, user_data=<optimized out>) at ../src/JobExecutorWrapper.cpp:127
#9  0x00007f7b7b86fe7c in g_thread_pool_thread_proxy () from /lib64/libglib-2.0.so.0
#10 0x00007f7b7b86f4f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#11 0x00007f7b7adccdd5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007f7b7aaf602d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f7ae26ec700 (LWP 14396)):
#0  0x00007f7b7ab03d1c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f7b7aa76640 in _L_lock_4452 () from /lib64/libc.so.6
#2  0x00007f7b7aa75a08 in _IO_flush_all_lockp () from /lib64/libc.so.6
#3  0x000000000046ba63 in io_popen ()
#4  0x000000000045c1dc in luaD_precall ()
#5  0x00000000004665a5 in luaV_execute ()
#6  0x000000000045c59c in luaD_call ()
#7  0x000000000045b75b in luaD_rawrunprotected ()
#8  0x000000000045b7e3 in luaD_pcall ()
#9  0x00000000004595f2 in lua_pcallk ()
#10 0x0000000000429252 in LuaHandler::parseConfigure (this=0x7f7ae26ebac0, filename=0x7f7adc09f8c8 "/usr/local/ms-eagle-rcc/conf/wsms/dns_local_domain", spTaskType=...) at ../src/LuaHandler.cpp:85
#11 0x00000000004393dc in TaskConfigure::executeScript (this=<optimized out>, script="/usr/local/ms-eagle-rcc/script/wsms/parseConfigureDns_dns_local_domain.lua", configFilename="/usr/local/ms-eagle-rcc/conf/wsms/dns_local_domain", pLogger=<optimized out>, spTaskType=...) at ../src/TaskConfigure.cpp:105
#12 0x000000000043989b in TaskConfigure::parseConfigure (this=0x11188c0, spTaskType=...) at ../src/TaskConfigure.cpp:75
#13 0x0000000000439e7c in TaskConfigure::parseConfigure (this=0x11188c0, pSchedulerConfig=0x7f7ae26ebca0) at ../src/TaskConfigure.cpp:43
#14 0x0000000000437844 in SchedulerHandler::updateConf (this=0x1118670, schedulerInfo=...) at ../src/SchedulerHandler.cpp:568
#15 0x0000000000437b8e in SchedulerHandler::run (this=0x7f7a50001a00) at ../src/SchedulerHandler.cpp:148
#16 0x00000000004cb9a0 in Poco::ThreadImpl::runnableEntry(void*) ()
#17 0x00007f7b7adccdd5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f7b7aaf602d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f7ae46f0700 (LWP 14392)):
#0  0x00007f7b7ab03d1c in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007f7b7aa7653d in _L_lock_121 () from /lib64/libc.so.6
#2  0x00007f7b7aa74103 in __GI__IO_un_link () from /lib64/libc.so.6
#3  0x00007f7b7aa6611d in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
#4  0x000000000046b2db in io_fclose ()
#5  0x000000000045c1dc in luaD_precall ()
#6  0x00000000004665a5 in luaV_execute ()
#7  0x000000000045c59c in luaD_call ()
#8  0x000000000045b75b in luaD_rawrunprotected ()
#9  0x000000000045b7e3 in luaD_pcall ()
#10 0x00000000004595f2 in lua_pcallk ()
#11 0x0000000000429638 in LuaHandler::dataAssemble_ex (this=0x7f7ae46efd40, pDataList=0x7f7a7c03c030, pResult=0x7f7ae46efd00, mapTaskRunResult=std::map with 0 elements, mapUpdateData=std::map with 0 elements, listSendPath=empty std::list, mapTaskHighsPot=std::map with 0 elements) at ../src/LuaHandler.cpp:387
#12 0x0000000000418dc6 in DataAssembler::dataAssemble (this=0x117b770, key="wsms_dns_hijacking_map", pDataList=0x7f7a7c03c030) at ../src/DataAssembler.cpp:323
#13 0x000000000041961b in DataAssembler::executeDataAssemble (this=0x7f7b7adc09e0 <list_all_lock>, strKey=<error reading variable: Cannot access memory at address 0x80>, pDataList=0x7f7ae46f0700) at ../src/DataAssembler.cpp:227
#14 0x00000000004198a1 in DataAssembler::dataAssembleCallBack (data=<optimized out>, user_data=<optimized out>) at ../src/DataAssembler.cpp:239
#15 0x00007f7b7b86fe7c in g_thread_pool_thread_proxy () from /lib64/libglib-2.0.so.0
#16 0x00007f7b7b86f4f5 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#17 0x00007f7b7adccdd5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007f7b7aaf602d in clone () from /lib64/libc.so.6


I eventually traced the problem to Lua's internal implementation of io_popen, which calls the l_popen macro:
#define l_popen(L,c,m) (fflush(NULL), popen(c,m))
It is this fflush(NULL) that causes the deadlock (see this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=906468).
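
For context on what the fflush(NULL) in that macro is for: it pushes every buffered output stream to the OS before the child process starts. The sketch below is only an illustration, not code from Lua or from my program; the helper name bytes_seen_by_child and the temporary file path are made up. It writes a short string through a buffered FILE*, starts `cat` on the same file with popen, and counts how many bytes the child could read.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of Lua): write `text` to `path` through a
   buffered FILE*, optionally call fflush(NULL) the way l_popen does, then
   run `cat path` with popen and return the number of bytes the child
   produced. Returns -1 on error. */
long bytes_seen_by_child(const char *path, const char *text, int flush_first)
{
    assert(path != NULL && text != NULL);

    FILE *out = fopen(path, "w");   /* regular files are fully buffered by default */
    if (out == NULL)
        return -1;
    fputs(text, out);               /* short text stays in the stdio buffer */

    if (flush_first)
        fflush(NULL);               /* flush all open output streams */

    char cmd[512];
    snprintf(cmd, sizeof cmd, "cat '%s'", path);
    FILE *child = popen(cmd, "r");
    if (child == NULL) {
        fclose(out);
        remove(path);
        return -1;
    }

    long n = 0;
    while (fgetc(child) != EOF)     /* count what the child was able to read */
        n++;
    pclose(child);

    fclose(out);                    /* the data reaches the file here at the latest */
    remove(path);
    return n;
}
```

Calling bytes_seen_by_child("/tmp/l_popen_demo.txt", "hello", 0) should report that the child saw nothing, while passing 1 as the last argument should report the full 5 bytes. In Lua terms, a script that writes to a file with f:write and then reads it back through io.popen without calling f:flush first relies on this flush.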

My fix is simply to remove the fflush(NULL). I would like to know whether doing so carries any risk.
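
A possible middle ground, sketched here as an assumption rather than a verified fix: as far as I can tell, liolib.c in Lua 5.3 only defines l_popen and l_pclose when l_popen is not already defined, so both macros can be supplied from the build configuration (for example luaconf.h) instead of editing liolib.c. The variant below replaces fflush(NULL) with flushes of stdout and stderr only, which keeps the script's console output ordered relative to the child's. I have not confirmed that this avoids the deadlock on the affected glibc, and it no longer flushes files the script itself has open. The function first_line_of is a hypothetical stand-in for a caller such as io_popen.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: same shape as the liolib.c macros, but flushing only the two
   standard output streams instead of every open stream. Both macros are
   defined together because liolib.c guards them with a single #if. */
#define l_popen(L,c,m)   ((void)(L), fflush(stdout), fflush(stderr), popen(c,m))
#define l_pclose(L,file) ((void)(L), pclose(file))

/* Hypothetical stand-in for a caller such as io_popen: run `cmd`, copy the
   first line of its output (without the newline) into `buf`, and return the
   status reported by pclose, or -1 if the command could not be started. */
int first_line_of(const char *cmd, char *buf, int cap)
{
    assert(cmd != NULL && buf != NULL && cap > 0);

    FILE *p = l_popen(NULL, cmd, "r");
    if (p == NULL)
        return -1;

    if (fgets(buf, cap, p) == NULL)
        buf[0] = '\0';                  /* the command produced no output */
    buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline, if any */

    return l_pclose(NULL, p);
}
```

With this variant, a script that needs a child process to see data it has written to its own files would have to call f:flush() explicitly before io.popen.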