RTP Engine stops responding to HTTP requests and control commands over WebSocket


Jonas Swiatek

Nov 17, 2023, 6:37:44 AM
to rtpengine
Hi!

I've got mr12.0.1 running in Docker on Debian Bookworm, having recently upgraded from mr9.5.6 on Debian Stretch (also in Docker).

I'm running into an issue where RTP Engine simply stops responding to HTTP requests (/ping), and also doesn't respond to any ng commands sent over WebSocket.

I don't use the UDP interface for control, so I don't know whether it is also unresponsive. As far as I can tell, RTP Engine is still processing RTP for already established sessions.

No errors are logged; it simply stops printing the usual lines:

INFO: [http] HTTP GET from 172.29.37.7:9682: '/ping'
INFO: [control] Received command 'ping' from 172.29.37.7:61970
INFO: [control] Replying to 'ping' from 172.29.37.7:61970 (elapsed time 0.000001 sec)

These lines simply stop being emitted, and all the surrounding systems show that new HTTP connections fail to establish and that existing WebSocket connections no longer receive replies to commands sent.

Any tips on how I can debug this issue further, or any idea what could be the root cause?

My rtpengine.conf file is as follows:
[rtpengine]
foreground=true
interface=LOCAL_IP!PUBLIC_IP
listen-ng=LOCAL_IP:2223
listen-http=LOCAL_IP:8080
port-min=30000
port-max=40000
offer-timeout=600
timeout=180
log-stderr=true

Donat Zenichev

Nov 19, 2023, 12:37:29 PM
to rtpengine
Hi Jonas,

Is it exclusively related to HTTP?
Is any other interaction with rtpengine possible (for example, via the NG protocol or the rtpengine module)?

Is the HTTP port still being listened on (along with the other ports rtpengine usually uses)? You can check with the `netstat` command.
Also make sure you don't have any core dumps: rtpengine may simply have crashed while packets continue to be processed by the kernel.
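
If `netstat` is not installed in a minimal container image, roughly the same listening-port check can be done by reading `/proc/net/tcp` directly. The following is a minimal sketch, not part of rtpengine: the function names are made up for illustration, it covers IPv4 only (`/proc/net/tcp6` has a different address format), it assumes a little-endian host, and it must run inside the container's network namespace. As far as I know, Linux reports the number of connections waiting to be accepted in the rx_queue column for listening sockets, so a value that keeps growing would hint that nothing is accepting connections any more.

```python
import socket
import struct

TCP_LISTEN = "0A"  # hex state code the kernel uses for LISTEN in /proc/net/tcp


def parse_listeners(proc_net_tcp_text):
    """Return (ip, port, rx_queue) for every listening IPv4 TCP socket."""
    listeners = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[3] != TCP_LISTEN:
            continue
        hex_ip, hex_port = fields[1].split(":")
        # The address is a 32-bit value in host byte order (assumed little-endian).
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        # Column 5 is "tx_queue:rx_queue", both in hex.
        rx_queue = int(fields[4].split(":")[1], 16)
        listeners.append((ip, int(hex_port, 16), rx_queue))
    return listeners


def read_listeners(path="/proc/net/tcp"):
    """Parse the live table; only works on Linux."""
    with open(path) as fh:
        return parse_listeners(fh.read())
```

Calling `read_listeners()` inside the container should list the `listen-http` port (8080 in the configuration above) if rtpengine still has it open.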

BR,
Donat

On Friday, November 17, 2023 at 12:37:44 UTC+1, jonas....@gmail.com wrote:

Jonas Swiatek

Nov 20, 2023, 6:51:42 AM
to rtpengine
Hi Donat,

As far as I can tell it's exclusively related to HTTP, but I can't be 100% sure right now.

I'm running RTP Engine as a group of Docker containers behind an HTTP load balancer.
When I need RTP Engine to do something, I establish a WebSocket connection, one per call, kept alive for the duration of the call, which is used to perform control actions for that specific call.
The logic behind this is that when scaling up or down, the load balancer knows whether there are active calls on an instance and can properly drain them without me having to figure that out on my own. The load balancer also knows the connection count to each instance, so it can load the cluster evenly. The load balancer itself regularly sends an HTTP GET request to /ping on the HTTP port to verify that RTP Engine is still running.

So for that reason I never actually use UDP to control RTP Engine.

I'm 99% sure it didn't crash, since it keeps printing "final packet stats" messages after it stops responding to HTTP/WebSocket requests. But then again I don't know if those would be printed from the kernel module.
The calls are being transcoded though, which as far as I know, means they're not offloaded to kernel level forwarding(?).

I'm a little at a loss as to how to debug this, so I'm very open to suggestions. To my extremely untrained eye it looks like maybe a deadlock in the code that handles HTTP requests in RTP Engine. But again, I base that on absolutely nothing.

I might be able to rig the docker entrypoint to dump netstat into stdout before being shut down by the load balancer. That way I can at least get that information.
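
A watchdog in the entrypoint could also record how the /ping check fails, since "nothing is listening" and "the connection is accepted by the kernel but never answered" point to different causes. The sketch below is only an illustration under assumptions: `probe_http` and its result labels are hypothetical names, not anything rtpengine or the load balancer provides.

```python
import socket


def probe_http(host, port, path="/ping", timeout=3.0):
    """Classify how an HTTP endpoint fails, not just whether it fails."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"  # no socket is listening on that port
    except (socket.timeout, TimeoutError):
        return "connect-timeout"  # handshake never completed
    except OSError as exc:
        return f"error: {exc}"
    with sock:
        sock.settimeout(timeout)
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        try:
            sock.sendall(request.encode("ascii"))
            data = sock.recv(4096)
        except (socket.timeout, TimeoutError):
            # TCP connected, but the application never answered.
            return "accepted-no-reply"
        except OSError as exc:
            return f"error: {exc}"
    if not data:
        return "closed-without-reply"
    # Return the HTTP status line, e.g. "HTTP/1.1 200 OK".
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

A result of "accepted-no-reply" or "connect-timeout" while the port is still listed as listening would be consistent with a stuck HTTP handler, whereas "refused" would suggest the listener is gone.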

Donat Zenichev

Nov 22, 2023, 10:30:43 AM
to rtpengine
For now I can hardly say that it's related to locks; to investigate further I need to see the full log from the time of the issue.
I quickly checked the WebSocket sources, and plenty of log messages should be printed before the lines you mentioned.

So if you raise the log level to 7 for at least `--log-level-core`, `--log-level-internals`, `--log-level-http` and `--log-level-control`,
we should see exactly where the processing gets stuck.

Also, can you show the content of your HTTP request? (excluding IPs)

On Monday, November 20, 2023 at 12:51:42 UTC+1, jonas....@gmail.com wrote:

Jonas Swiatek

Nov 22, 2023, 10:47:42 AM
to rtpengine
I'll try deploying an instance running with that log level, but I fear it might log a bit too much for production use. This issue isn't something that I can actively provoke - it usually happens after 6-7 days of running in production. But I'll see if I can find a neat way to get all that log output channeled somewhere.
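
One way to keep level 7 logging affordable in production is to hold only the most recent lines in memory and write them out when the health check fails. This is a minimal sketch of that idea, assuming a wrapper process reads rtpengine's stderr (the configuration above sets log-stderr=true); the `LogRing` class and its default capacity are hypothetical.

```python
import collections
import sys


class LogRing:
    """Keep only the newest `capacity` log lines in memory."""

    def __init__(self, capacity=50000):
        # A deque with maxlen silently drops the oldest entry when full.
        self._lines = collections.deque(maxlen=capacity)

    def feed(self, line):
        """Store one log line, without its trailing newline."""
        self._lines.append(line.rstrip("\n"))

    def dump(self, out=sys.stderr):
        """Write the buffered lines to `out`, clear the buffer, return the count."""
        for line in self._lines:
            print(line, file=out)
        count = len(self._lines)
        self._lines.clear()
        return count
```

The wrapper would call `feed()` for every line it reads and `dump()` once, right after the /ping check stops succeeding, so only the window leading up to the hang reaches the log store.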

As for the HTTP Requests, they're only GET /ping for the load balancer, and then the websockets which send offer/answer and ping requests to RTP Engine as well. I'll see if I can get those logged, but I believe that setting log level 7 will print the contents of the requests into the logs as well.

Richard Fuchs

Nov 27, 2023, 9:27:29 AM
to rtpe...@googlegroups.com

Once you have rtpengine in that state, I would suggest attaching gdb
to the process and then posting the output of `thread apply all bt`.

This should reveal if there are any deadlocks involved. Make sure you
have debug packages installed if needed before doing that.
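
Because the hang only shows up after days in production, the backtrace capture could be scripted so it runs before the container is replaced. A minimal sketch, assuming gdb is installed in the container and the caller is allowed to ptrace the process (in Docker this may require the SYS_PTRACE capability); the helper names are hypothetical.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb call that prints every thread's backtrace."""
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


def capture_backtraces(pid, timeout=120):
    """Run gdb against the given process ID and return its standard output."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

Several threads blocked in lock-acquisition frames in the resulting output would be the kind of pattern that indicates a deadlock.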

Cheers
