Bug in mod_smacks on closing connection

28 views
Skip to first unread message

Damian Minkov

unread,
Dec 9, 2022, 7:46:41 PM12/9/22
to prosody-dev
We(the Jitsi Team) have noticed that many sessions in prosody are unbind (the hook pre-resource-unbind) with an error "Hibernating too long". We are using mod_smacks that comes with prosody 0.12.1.

While debugging we found out that the old mod_smack module from the community modules does not have this problem.

The simplest scenario is when we open the prejoin page of jitsi-meet, that establishes an xmpp connection over websocket and we close the browser tab. 

Without loading mod_smacks or when you load the old community module you can see in the logs "Dec 10 00:17:30 c2s55d7399259e0    info    Client disconnected: connection closed" when you close the tab. With the smacks module from latest prosody that log is not printed.
Have you seen this? Do you want more information or some debug info to track it down?

Thank you
damencho

Damian Minkov

unread,
Dec 10, 2022, 12:23:59 AM12/10/22
to prosody-dev
I think this fixes it:

Index: plugins/mod_smacks.lua
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/plugins/mod_smacks.lua b/plugins/mod_smacks.lua
--- a/plugins/mod_smacks.lua    (revision d4daf40caa518a89e51222e6e1d9c230db02839a)
+++ b/plugins/mod_smacks.lua    (date 1670649730425)
@@ -493,13 +493,13 @@
         sessionmanager.destroy_session(session, "Hibernating too long");
         sessions_expired(1);
     end);
+    -- we need the queue in advance because on conn:close the session will be destroyed and queue cleaned
+    local queue = session.outgoing_stanza_queue;
     if session.conn then
         local conn = session.conn;
-        c2s_sessions[conn] = nil;
-        session.conn = nil;
         conn:close();
     end
-    module:fire_event("smacks-hibernation-start", { origin = session; queue = session.outgoing_stanza_queue:table() });
+    module:fire_event("smacks-hibernation-start", { origin = session; queue = queue:table() });
     return true; -- Postpone destruction for now
 end);
  

Clearing the shared sessions table and the conn in session is done in listener.ondisconnect in mod_c2s if session is found.
But if we first clear the shared table and then call conn:close() the session is not found in the cache to be destroyed. 

Damian Minkov

unread,
Dec 30, 2022, 3:25:43 AM12/30/22
to prosody-dev
Hi again,

And what about and this fix, do you think this can be applied?

Thank you
damencho

Damian Minkov

unread,
Feb 17, 2023, 8:58:04 AM2/17/23
to prosody-dev
We have been running this patch for more than a month without noticing any major issues ... one thing I noticed today is this error, have seen it only once:
Feb 17 13:44:22 c2s5619c63eb520 error   Traceback[c2s]: /usr/lib/prosody/modules/mod_smacks.lua:608: attempt to index local 'queue' (a nil value)
        stack traceback:
        /usr/lib/prosody/modules/mod_smacks.lua:608: in function </usr/lib/prosody/modules/mod_smacks.lua:530>
        (...tail calls...)
        /usr/share/lua/5.2/prosody/util/events.lua:81: in function </usr/share/lua/5.2/prosody/util/events.lua:77>
        (...tail calls...)
        /usr/share/lua/5.2/prosody/core/stanza_router.lua:143: in function 'core_process_stanza'
        /usr/lib/prosody/modules/mod_c2s.lua:326: in function 'func'
        /usr/share/lua/5.2/prosody/util/async.lua:144: in function </usr/share/lua/5.2/prosody/util/async.lua:142>

Matthew Wild

unread,
Feb 18, 2023, 5:59:10 AM2/18/23
to proso...@googlegroups.com
Hi Damian,

On Fri, 17 Feb 2023 at 13:58, Damian Minkov <dame...@damencho.com> wrote:
>
> We have been running this patch for more than a month without noticing any major issues ... one thing I noticed today is this error, have seen it only once:

Sorry for the delayed response on this thread. Past few months have been busy.

It sounds like the original issue is easy to reproduce for you. Any
chance you can try it with the latest 0.12 code (e.g. the
'prosody-0.12' package from our repo)? We fixed a bug recently related
to smacks hibernation, and I'm not sure whether this is related or
not.

Skimming your patch, I'm not clear on exactly what the root cause is.
A hibernating session wouldn't actually be destroyed (just
session.conn removed), so I would expect things like the outgoing
stanza queue to remain.

Actually, I'm also not entirely clear on what the issue you're
experiencing is. The "hibernating too long" message is not necessarily
an error - it's expected behaviour if the client disconnects uncleanly
and does not return before the configured hibernation timeout, which
can be very common. In the case of a clean disconnect, the hibernation
code should not kick in at all. In the case of hibernation, the
session should not be destroyed until after the timeout period. Debug
logs during the disconnect of a client would help identify which of
these is happening.

Regards,
Matthew

Damian Minkov

unread,
Feb 18, 2023, 8:50:56 AM2/18/23
to proso...@googlegroups.com
Hi,


On Sat, Feb 18, 2023, 04:59 Matthew Wild <mwi...@gmail.com> wrote:
Hi Damian,

On Fri, 17 Feb 2023 at 13:58, Damian Minkov <dame...@damencho.com> wrote:
>
> We have been running this patch for more than a month without noticing any major issues ... one thing I noticed today is this error, have seen it only once:

Sorry for the delayed response on this thread. Past few months have been busy.

It sounds like the original issue is easy to reproduce for you. Any
chance you can try it with the latest 0.12 code (e.g. the
'prosody-0.12' package from our repo)? We fixed a bug recently related
to smacks hibernation, and I'm not sure whether this is related or
not.

I will check it out next week and let you know. This is different than 0.12.2?


Skimming your patch, I'm not clear on exactly what the root cause is.
A hibernating session wouldn't actually be destroyed (just
session.conn removed), so I would expect things like the outgoing
stanza queue to remain.

Actually, I'm also not entirely clear on what the issue you're
experiencing is. The "hibernating too long" message is not necessarily
an error - it's expected behaviour if the client disconnects uncleanly

and does not return before the configured hibernation timeout, which
can be very common. In the case of a clean disconnect, the hibernation
code should not kick in at all. In the case of hibernation, the
session should not be destroyed until after the timeout period. Debug
logs during the disconnect of a client would help identify which of
these is happening.

Yep, we saw the problem as we added some logs to the unbind event (https://github.com/jitsi/infra-configuration/blob/ae1b579bce46cfbe686ec2cfc4b44b6e2b0309d2/ansible/roles/prosody/files/mod_muc_webhooks.lua#L706). To detect those sessions that actually lost connection. And then we noticed that we got the print for hibernation for all sessions even for the cleanly disconnected, the print is of course a minute after the clean disconnect (this is our setting for hibernating a connection duration).

The order of calls is wrong and on clean disconnect, sessions are not cleared correctly and all go into hibernation. 

Add the unbind message and do a clean connect and disconnect via the websocket and you will able to repro.

I hope this makes more sense.

Thanks
damencho


 

Regards,
Matthew

--
You received this message because you are subscribed to the Google Groups "prosody-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prosody-dev...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prosody-dev/CAJt9-x7t2nMjYBopCTRAG%3DiVMt2RZqsr_Pm6MhBnO-ZrhRb_rQ%40mail.gmail.com.

Damian Minkov

unread,
Feb 18, 2023, 9:01:41 AM2/18/23
to prosody-dev
On Saturday, February 18, 2023 at 7:50:56 AM UTC-6 Damian Minkov wrote:
Hi,


On Sat, Feb 18, 2023, 04:59 Matthew Wild <mwi...@gmail.com> wrote:
Hi Damian,

On Fri, 17 Feb 2023 at 13:58, Damian Minkov <dame...@damencho.com> wrote:
>
> We have been running this patch for more than a month without noticing any major issues ... one thing I noticed today is this error, have seen it only once:

Sorry for the delayed response on this thread. Past few months have been busy.

It sounds like the original issue is easy to reproduce for you. Any
chance you can try it with the latest 0.12 code (e.g. the
'prosody-0.12' package from our repo)? We fixed a bug recently related
to smacks hibernation, and I'm not sure whether this is related or
not.

I will check it out next week and let you know. This is different than 0.12.2?
I see now the various session destruction commits in the 0.12 branch after 0.12.2.
I will test it first thing on Monday and will let you know.

Thanks

Damian Minkov

unread,
Feb 18, 2023, 12:16:55 PM2/18/23
to proso...@googlegroups.com
Hi again Matt,

I was able to repro the same with 0.12 nightly (ii  prosody-0.12  1nightly194-1~focal).
I added the following code to a module that loads under the main virtual host used with websockets and smacks module. With config `smacks_hibernation_time = 60;`:
local function handle_pre_resource_unbind(event)
local error = event.error
local participant_jid = event.session.full_jid
if error then
if participant_jid then
module:log("warn", "Participant %s disconnected abnormally error %s", participant_jid, error)
end
end
end

module:hook("pre-resource-unbind", handle_pre_resource_unbind, 10);
I'm attaching the 3 debug logs I captured, with 0.12.2, with nightly and with nightly after applying the patch that is mentioned in the previous emails.

We have been running with the patch for more than a month now on meet.jit.si.

Thank you
damencho


0.12-nightly.log
0.12-nigtly-patched.log
0.12.2.log

Matthew Wild

unread,
Feb 20, 2023, 1:18:07 PM2/20/23
to proso...@googlegroups.com
Hi Damian,

On Sat, 18 Feb 2023 at 17:16, Damian Minkov <dame...@damencho.com> wrote:
> I was able to repro the same with 0.12 nightly (ii prosody-0.12 1nightly194-1~focal).

Thanks a lot for the testing, and the logs. It was a bit of a rabbit
hole, but I eventually found the root cause and I've committed a fix
to 0.12 and trunk: https://hg.prosody.im/trunk/rev/0598d822614f

The fix will be in the next prosody-0.12 nightly build (due in 6 hours
or so) or you can try manually downloading and applying the patch from
the above link if you want to try it sooner.

Regards,
Matthew

Damian Minkov

unread,
Feb 20, 2023, 7:22:27 PM2/20/23
to proso...@googlegroups.com
Hi again,

I just tested the nightly195 with the fix and I confirm it is fixed now.

Thank you
damencho


--
You received this message because you are subscribed to the Google Groups "prosody-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prosody-dev...@googlegroups.com.

Matthew Wild

unread,
Feb 21, 2023, 5:31:24 AM2/21/23
to proso...@googlegroups.com
On Tue, 21 Feb 2023 at 00:22, 'Damian Minkov' via prosody-dev
<proso...@googlegroups.com> wrote:
> I just tested the nightly195 with the fix and I confirm it is fixed now.

Excellent! Thanks for confirming, the fix will be in 0.12.3.

Regards,
Matthew
Reply all
Reply to author
Forward
0 new messages