MongoDB Slow Read Errors

glasses685

Mar 8, 2025, 5:01:04 PM
to sipxcom-users
I'm running an older version (17.10) with around 1K users and three servers, all virtual machines on VMware. For years I've seen the ALARM_MONGODB_SLOW_READ warning at least a few times a day, and it has never caused an issue. Lately the warnings have become more frequent and the delays longer, and once a slow read seemed to cause sipXproxy to stop on my primary server, so I had to go in and restart it.

The slow reads don't seem tied to any one collection either: most commonly it's imdb.entity, sometimes node.registrar, sometimes node.subscription.
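To check whether the alarms really are spread across those three names, a rough tally along these lines could help. This is only a sketch: it assumes each alarm is written as a single log line containing both the text ALARM_MONGODB_SLOW_READ and the affected name, which may not match the actual log layout, and the function name `tally_slow_reads` is made up for illustration.

```python
import sys
from collections import Counter

# Names taken from the post above; extend the tuple if other ones show up.
NAMESPACES = ("imdb.entity", "node.registrar", "node.subscription")
ALARM = "ALARM_MONGODB_SLOW_READ"


def tally_slow_reads(lines):
    """Count slow-read alarm lines per name (assumes one alarm per line)."""
    counts = Counter()
    for line in lines:
        if ALARM not in line:
            continue  # not a slow-read alarm
        for ns in NAMESPACES:
            if ns in line:
                counts[ns] += 1
                break
        else:
            # Alarm line that mentions none of the known names.
            counts["other"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python tally.py /path/to/alarm-or-server.log
    with open(sys.argv[1], errors="replace") as fh:
        for name, n in tally_slow_reads(fh).most_common():
            print(f"{n:6d}  {name}")
```

If one name dominates the output, that points toward a specific service; an even spread points more toward shared causes such as storage or memory.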

Just wondering if anyone has any troubleshooting tips? 

I'm wondering whether it's disk latency on the VMware side, something with my MongoDB, lack of memory, network latency between servers, etc. I do have 16GB RAM installed so I'd think that would be enough, but I could always add more. 

Matt Keys

Mar 13, 2025, 7:16:07 AM
to sipxcom-users
Use the sipcodes script and look for phones spamming the server with SUBSCRIBE or REGISTER requests. If you're using an iSCSI or NFS storage backend on ESXi, you could also have contention between VMs for the storage. Finally, once you've checked those two things, you can safely bump the DB read/write timeout up to 500 ms. This will not prevent slow reads/writes, but it should squelch the warning.


Regards,
Matt

glasses685

Mar 13, 2025, 1:00:28 PM
to sipxcom-users
Thanks for the suggestion Matt!

I think I found the issue. The sipXproxy log was filling up with constant warnings along the lines of "sipXproxy: Dropping message with comma in contact-user - \nRegister". I checked, and it looks like one of my users had entered a junk username (with a comma in it) as their softphone login, and the softphone had apparently been spamming the server with requests for a while. Once I put a stop to that, the 500ms+ slow read warnings mostly stopped. I do still get 100-200ms warnings occasionally, but I've always gotten those, so I assume that's just normal storage latency in our environment.
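For anyone wanting to check their own proxy log for the same flood, a minimal sketch like the following reports how much of the log a given warning accounts for. It assumes the log is plain text with one message per line; the helper name `count_matching` is hypothetical, and the search string is just the fragment quoted above.

```python
import sys


def count_matching(lines, needle="Dropping message with comma in contact-user"):
    """Return (hits, total): lines containing `needle`, and all lines seen."""
    hits = 0
    total = 0
    for line in lines:
        total += 1
        if needle in line:
            hits += 1
    return hits, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_warning.py /path/to/sipXproxy.log
    with open(sys.argv[1], errors="replace") as fh:
        hits, total = count_matching(fh)
    share = (100.0 * hits / total) if total else 0.0
    print(f"{hits} of {total} lines ({share:.1f}%) contain the warning")
```

A high share suggests a single misbehaving client is generating most of the traffic, which is worth ruling out before raising any timeouts.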