On 04/04/2024 14.53, Sergiu Pojoga wrote:
> OK so now we are getting to the truly interesting part. I didn't
> mention it intentionally until now, not because I meant to waste your
> time but to avoid jumping to conclusions like "well... clearly it's
> because of that".
And in today's episode of "things that didn't age well" ... 🙃
> Yes, there is another process sharing the same UDP port ranges. It's
> Asterisk. Essentially, the two nodes run co-located with both
> RTPengine and Asterisk (don't ask me why). Same port ranges
> 10000-20000. I did try to separate the RTP ranges into like
> 10000-14000 for Asterisk and 14001-20000 for Rtpengine: it didn't
> solve this problem.
>
> The thing is - it runs in this setup for weeks at a time, until it
> starts acting up like that. So I'm not convinced the co-location
> itself is the root problem. Rtpengine, as well as Asterisk I guess,
> must have a mechanism of selecting an available UDP port from the
> Linux kernel, they don't just randomly choose something that may be in
> use by the OS. Or am I wrong?
Well, no, but yes.
You can request a random free port from the kernel, but then it really
is random (or rather, not predictable), and it won't be within a given
port range (other than the kernel's own ephemeral port range). Even
worse, the usual RTP+RTCP combination requires the RTP port to be
even-numbered and the RTCP port to be the RTP port + 1. So the ports
definitely can't be random.
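To illustrate the "any free port" case, here is a minimal Python sketch (assuming Linux; the helper name `kernel_assigned_port` is made up for this example). Binding to port 0 lets the kernel choose a port from its ephemeral range (on Linux, `net.ipv4.ip_local_port_range`). There is no way to ask for an even port, a port inside 10000-20000, or a port whose +1 neighbour is free for RTCP.

```python
import socket


def kernel_assigned_port(addr: str = "127.0.0.1") -> int:
    """Bind a UDP socket to port 0 and report which port the kernel picked."""
    # Port 0 means "any free port": the kernel chooses it from its own
    # ephemeral range, not from an application-defined RTP range.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.bind((addr, 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    port = kernel_assigned_port()
    # Nothing guarantees the port is even, or that port + 1 is free for RTCP.
    print(f"kernel picked {port}, even: {port % 2 == 0}")
```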
The only other option is to request one specific port, and for that
purpose each process maintains its own free-list of the ports it
considers available. Now when a process requests a port that is already
in use by another process, one of two things can happen, depending on
which socket options are in use. Either the request succeeds (e.g. when
both sockets have SO_REUSEADDR set), and you end up with two processes
having the same port open. Obviously that is to be avoided. Or the
request fails, and the process has to try again with a different port.
Depending on how the code is written, the process might retry only a
limited number of times before giving up.
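The free-list-plus-retry approach can be sketched roughly as follows. This is a hypothetical Python illustration, not rtpengine's actual code: `make_free_list`, `try_bind` and `allocate_pair` are invented names, and `max_attempts` stands in for "a limited number of times". Each attempt takes an even port from the list, binds RTP on it and RTCP on port + 1. On EADDRINUSE it releases whatever it got, puts the port back at the end of the list and moves on.

```python
import errno
import socket
from collections import deque


def make_free_list(port_min: int, port_max: int) -> deque:
    """Hypothetical per-process free list: even RTP ports whose RTCP port (+1) also fits."""
    first = port_min + (port_min % 2)  # round up to an even port
    return deque(range(first, port_max, 2))


def try_bind(addr: str, port: int) -> socket.socket:
    """Request one specific UDP port; raises OSError (e.g. EADDRINUSE) on failure."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.bind((addr, port))
    except OSError:
        s.close()
        raise
    return s


def allocate_pair(free_ports: deque, max_attempts: int = 10, addr: str = "127.0.0.1"):
    """Take ports from the free list until an RTP/RTCP pair binds, or give up.

    Returns (port, rtp_sock, rtcp_sock), or None after max_attempts failures.
    """
    for _ in range(max_attempts):
        if not free_ports:
            break
        port = free_ports.popleft()
        try:
            rtp = try_bind(addr, port)
        except OSError as e:
            if e.errno != errno.EADDRINUSE:
                raise
            free_ports.append(port)  # held by someone else; retry it later
            continue
        try:
            rtcp = try_bind(addr, port + 1)
        except OSError as e:
            rtp.close()  # release the RTP half of the failed pair
            if e.errno != errno.EADDRINUSE:
                raise
            free_ports.append(port)
            continue
        return port, rtp, rtcp
    return None  # gave up: every attempt collided with another socket
```

If another process holds many ports in the same range, most attempts fail and the allocation can give up even though free ports exist elsewhere in the range.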
So a shared port range is not a good idea at all. I don't know about
Asterisk, but rtpengine expects the configured port range to be
available for its exclusive use, and it makes no attempt to check
whether a port is in use by something else before requesting it from
the kernel.
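Here is a small Python sketch of the "request succeeds" case, assuming Linux semantics (the function name is hypothetical). Without any options, a second bind to a port that is already taken fails, typically with EADDRINUSE. If both sockets set SO_REUSEADDR, Linux allows both unicast UDP sockets onto the same address and port, so the kernel will not act as a safety net.

```python
import socket


def second_bind_succeeds(reuse: bool, addr: str = "127.0.0.1") -> bool:
    """Bind two UDP sockets to the same port; report whether the kernel allowed it.

    With reuse=True both sockets set SO_REUSEADDR before binding, which on
    Linux lets two unicast UDP sockets share one address and port.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if reuse:
            first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            second.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        first.bind((addr, 0))
        port = first.getsockname()[1]
        try:
            second.bind((addr, port))
        except OSError:
            return False  # typically EADDRINUSE: the caller must pick another port
        return True  # both sockets now hold the same port
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("without SO_REUSEADDR:", second_bind_succeeds(reuse=False))
    print("with SO_REUSEADDR:   ", second_bind_succeeds(reuse=True))
```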
If you still see errors with exclusive port ranges, then you need to
look at what is happening when it's happening: increase the log level,
check the interface stats, and attach strace if necessary.
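One more thing worth checking when it happens is which sockets currently sit inside the RTP range. Below is a rough Python sketch that assumes Linux's `/proc/net/udp` and `/proc/net/udp6` tables (`ss -uanp` shows the same information along with process names). The helper name `bound_udp_ports` is made up. It lists the port, owning uid and socket inode for every UDP socket bound within a given range.

```python
from pathlib import Path


def bound_udp_ports(port_min: int, port_max: int,
                    tables=("/proc/net/udp", "/proc/net/udp6")):
    """List (port, uid, inode) for UDP sockets bound inside [port_min, port_max].

    Reads Linux's /proc/net/udp{,6}; the local port is the hex number after
    the colon in the local_address column.
    """
    found = []
    for table in tables:
        path = Path(table)
        if not path.exists():
            continue  # e.g. IPv6 disabled, or not running on Linux
        for line in path.read_text().splitlines()[1:]:  # skip the header row
            fields = line.split()
            port = int(fields[1].rsplit(":", 1)[1], 16)
            if port_min <= port <= port_max:
                # fields[7] is the owning uid, fields[9] the socket inode
                found.append((port, int(fields[7]), int(fields[9])))
    return sorted(found)


if __name__ == "__main__":
    for port, uid, inode in bound_udp_ports(10000, 20000):
        print(port, uid, inode)
```

To map an inode back to a process, look for a `socket:[<inode>]` link under `/proc/<pid>/fd/`. Ports in rtpengine's range that belong to some other process point to the overlap problem described above.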
Cheers