Hello,
I run 2 separate prosody 0.11.8 servers with ~1000 users for a few years. Both have decent CPUs ( Xeon E-2336 @ 4.8ghz ). The clients all use a fork of Conversations.
The last 6 months an issue has been occurring randomly every few weeks where Prosody CPU usage hits 100% and connections timeout. This can last for several minutes to over an hour, then eventually the problem resolves itself.
Looking through the logs, I tracked the issue down to mod_mam archive queries. This occurs on both groups and one-on-one conversations. Strangely it seems to mostly happen cross-server ( i.e: user A from
example.com querying group B on
example2.com ). The queries only take a few seconds each, but are followed one after another by more queries which is blocking the server.
Here's a snippet from the most recent outage (with usernames redacted):
Apr 12 20:03:17 c2s5644a83f4b00 debug Received[c2s]: <iq type='set' from='XYZ/Chat.STv5' id='jSiYtL7_fhQ1'>
Apr 12 20:03:17 example.com:mam debug Archive query, id nuat69qfv1 with FOOBAR from 1674081740 until 1680912929)
Apr 12 20:03:21 example.com:mam debug Archive query nuat69qfv1 completed
Apr 12 20:03:21 c2s5644a83f4b00 debug Sending[c2s]: <iq id='jSiYtL7_fhQ1' to='XYZ/Chat.STv5' type='result'>
Apr 12 20:03:21 c2s5644a83f4b00 debug Calling delayed_ack_function directly (still waiting for ack)
Apr 12 20:03:21 c2s5644a83f4b00 debug Firing event 'smacks-ack-delayed', queue = 146
Apr 12 20:03:21 c2s5644a83f4b00 debug Handled 68 incoming stanzas
Apr 12 20:03:21 c2s5644a83f4b00 debug Received[c2s]: <iq type='set' from='XYZ/Chat.STv5' id='XmZG3HZMm_wo'>
Apr 12 20:03:21 example.com:mam debug Archive query, id clmgs2ho0n with FOOBAR from 1680912929 until 1680912927)
Apr 12 20:03:24 example.com:mam debug Archive query clmgs2ho0n completed
Apr 12 20:03:24 c2s5644a83f4b00 debug Sending[c2s]: <iq id='XmZG3HZMm_wo' to='XYZ/Chat.STv5' type='result'>
Apr 12 20:03:24 c2s5644a83f4b00 debug Calling delayed_ack_function directly (still waiting for ack)
Apr 12 20:03:24 c2s5644a83f4b00 debug Firing event 'smacks-ack-delayed', queue = 147
Apr 12 20:03:24 c2s5644a83f4b00 debug Handled 69 incoming stanzas
Apr 12 20:03:24 c2s5644a83f4b00 debug Received[c2s]: <iq type='set' from='XYZ/Chat.STv5' id='aJOndG1knKJn'>
Apr 12 20:03:24 example.com:mam debug Archive query, id lm9d3creil with ABC from 1675094398 until 1680912929)
Apr 12 20:03:27 example.com:mam debug Archive query lm9d3creil completed
I've tried a number of changes which had no effect:
- Tuning the gc to use threshold = 200
- archive_expires_after = "1w"
- max_archive_query_results = 20
- restarting the server
Does anyone know what's causing this or how it can be fixed? Is this related to
https://issues.prosody.im/1523 ? Will upgrading to 0.12.3 resolve this ? I'd appreciate any help I can get. Besides this problem I'm really happy with prosody overall.