> Can you update the commit message to match our usual format? And below
> the first line include a description of exactly how the fix works. If
> this is a workaround that should be removed in the future, please also
> add a COMPAT comment in the code linking to the issue it is working
> You moved some things around, but as far as I can tell the actual
> change is to switch from 'return timeout' to 'timer.add_task(timeout,
Yes it is, but I did not want to have this timeout handling in two places, so
I coalesced it into one single function.
> From what you wrote on the lua-http issue report I understand this is
> to prevent rescheduling the timer (which would be immediately
> executed) but create a new one instead (which will be executed on the
> next "tick").
> However...... I don't see that this is actually the case. In Prosody
> `return timeout` from a timer callback already creates a new timer. It
> doesn't reschedule the current one. So I feel like I'm missing
> something. Any ideas? Also, what network backend are you experiencing
> the issue with? Each has slightly different handling of timers.
I'm using the epoll backend and here it does reschedule the current one.
net/server_epoll.lua line 133 inserts a new timer using the variable "elapsed"
which is invariant in the whole timer loop --> returning 0.0 from the timer
sets the new timer's timestamp to "elapsed".
That timer gets peeked later in the loop and thus the if in line 124 does not
trigger (peek > elapsed), which makes the loop execute the rescheduled timer
again without ever leaving the timer loop at all.
Adding a new timer (with zero timeout) through the timer api will not use the
"frozen" elapsed variable, but a new call to monotonic() to calculate the
timeout, which will be biggern than "elapsed" and will be executed only after
the next epoll() call.
net/server_select.lua does not have this behaviour as far as I can tell.
> All this said, I'm not opposed to merging the patch (with the changes
> mentioned above) because I'm fairly confident nobody but you is using
> net.cqueues right now and it's a critical production issue for you.
> But I'd still like to understand the problem and solution to make sure
> we're not missing something (e.g. a deeper bug in Prosody's timer
Sure, maybe a better fix would be to fix net/server_epoll.lua to not allow
creating endless loops through timers at all.
Initially I didn't want to patch server_epoll because I was not sure what
assumptions the prosody codebase makes about the network backend module.
But given that server_select does not exhibit the same behaviour, patching
server_epoll would bring the epoll backend back in line with the select based
That makes my cqueues patch obsolete and I propose the following change of
line 132 instead:
-local next_time = elapsed+ret;
+local next_time = monotonic()+ret;