Error allocating media ports


Sergiu Pojoga

Apr 4, 2024, 10:16:51 AM
to Sipwise rtpengine
Hi,

Could use some help chasing down a problem that has been plaguing our systems for quite a while now.

Two identical rtpengine nodes, bare metal, Ubuntu 20.04. 
Receiving commands from several SIP proxies with various purposes. 
With redis backend.
It happens sporadically: once in a while these errors start appearing in the logs.
What's interesting is that they appear on both nodes at around the same time.

Apr  3 09:09:34 sbc2 rtpengine[1692545]: INFO: [59c4d833681dd8225069e3a44a1b0a20]: [control] Received command 'offer' from 10.22.0.222:50326
Apr  3 09:09:34 sbc2 rtpengine[1692545]: NOTICE: [59c4d833681dd8225069e3a44a1b0a20]: [core] Creating new call
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failure while trying to bind a port to the socket
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failure while trying to bind a port to the socket
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on interface 1.1.1.1 for media relay (last error: Success)
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on all locals of logical 'pub'
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Error allocating media ports
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on interface 1.1.1.1 for media relay (last error: Success)
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on all locals of logical 'pub'
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Destroying call
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Error allocating media ports
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Destroying call
Apr  3 09:09:34 sbc2 rtpengine[1692545]: INFO: [59c4d833681dd8225069e3a44a1b0a20]: [core] Final packet stats:
Apr  3 09:09:34 sbc2 rtpengine[1692545]: WARNING: [59c4d833681dd8225069e3a44a1b0a20]: [control] Protocol error in packet from 10.22.0.222:50326: Ran out of ports [d8:supportsl10:load limite3:sdp230:v=0
Apr  3 09:09:34 sbc2 rtpengine[1692545]: WARNING: [59c4d833681dd8225069e3a44a1b0a20]: ... emuxe4:SDESl28:only-AES_CM_128_HMAC_SHA1_803:pade7:call-id59:59c4d833681dd8225069e3a44a1b0a2013:received-froml3:IP411:10.22.0.103e8:from-tag10:as649c0f627:command5:offere]

The RTP range is set to 10000-20000 on each node. 
Each node has a pub and a priv interface. It always runs out of ports on the pub interface only.
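As a quick sanity check, here's the capacity math for that range, assuming one even/odd RTP+RTCP port pair per media stream (an assumption for illustration; actual usage depends on call topology):

```python
# Capacity of a 10000-20000 RTP port range, assuming one even/odd
# RTP+RTCP port pair per media stream (illustration only).
port_min, port_max = 10000, 20000

total_ports = port_max - port_min + 1   # 10001 ports in the range
max_pairs = total_ports // 2            # ~5000 RTP+RTCP pairs

print(total_ports, max_pairs)  # 10001 5000
```

So at the session counts we see, the range should be nowhere near exhausted.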

# rtpengine -v
Version: 11.5.2.0+0~mr11.5.2.0 git-mr11.5-55ff3bdb

What we checked:
- made sure all offers are properly ended by sending a delete from the SIP proxy
 Total managed sessions                          :2922
 Total rejected sessions                         :0
 Total timed-out sessions via TIMEOUT            :7
 Total timed-out sessions via SILENT_TIMEOUT     :0
 Total timed-out sessions via FINAL_TIMEOUT      :0
 Total timed-out sessions via OFFER_TIMEOUT      :0
 Total regular terminated sessions               :2915

- when it happens, rtpengine reports only some 100+ active sessions, a realistic load (via rtpengine-ctl list sessions all)
- netstat also reports under 1K UDP ports in use
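For cross-checking netstat's number, here's a sketch that counts bound UDP ports in the range straight from /proc/net/udp (Linux-only; IPv4 sockets only, and it can't tell which process owns each socket):

```python
# Count IPv4 UDP sockets bound inside the RTP range by parsing
# /proc/net/udp directly (Linux only). A cross-check for netstat;
# does not distinguish which process owns each socket.
def count_udp_ports_in_range(port_min=10000, port_max=20000,
                             proc_path="/proc/net/udp"):
    count = 0
    with open(proc_path) as f:
        next(f)                        # skip the header line
        for line in f:
            local = line.split()[1]    # e.g. "00000000:2710"
            port = int(local.split(":")[1], 16)  # port is hex
            if port_min <= port <= port_max:
                count += 1
    return count

if __name__ == "__main__":
    print(count_udp_ports_in_range())
```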

What else can I check or do about it? Much obliged.

Sergiu Pojoga

Apr 4, 2024, 10:36:50 AM
to Sipwise rtpengine
To add a few things:

- reduced delete delay to 5 seconds from the default 30 (via the SIP proxy's rtpengine_delete("delete-delay=5")) to speed up clean-ups
- kernelized
- no limits in terms of load of any kind, full config below

log-level-core = 6
log-level-spandsp = 6
log-level-ffmpeg = 6
log-level-transcoding = 6
log-level-codec = 6
log-level-rtcp = 6
log-level-ice = 6
log-level-crypto = 6
log-level-srtp = 7
log-level-internals = -1
log-level-http = 6
log-level-control = 6
log-level-dtx = 6
table = 0
max-sessions = -1
timeout = 60
silent-timeout = 3600
final-timeout = 0
offer-timeout = 3600
delete-delay = 30
redis-expires = 86400
tos = 184
control-tos = 184
graphite-interval = 0
redis-num-threads = 4
homer-protocol = 2
homer-id = 2003
no-fallback = 1
port-min = 10000
port-max = 20000
redis-db = 5
redis-write-db = -1
no-redis-required = 1
num-threads = 8
xmlrpc-format = 0
log_format = 0
redis_allowed_errors = -1
redis_disable_time = 10
redis_cmd_timeout = 0
redis_connect_timeout = 1000
max-cpu = 0.0
max-load = 0.00
max-bw = 0

interface[0] = pub\1.1.1.1
interface[1] = priv\10.22.0.20
b2b_url = (null)
redis-auth = (null)
redis-write-auth = (null)
recording-dir = /var/spool/rtpengine
recording-method = proc
recording-format = raw
iptables-chain = (null)
listen-ng = 0.0.0.0:2222
listen-ng = [::]:2222
listen-cli = 127.0.0.1:2224

Richard Fuchs

Apr 4, 2024, 10:51:21 AM
to rtpe...@googlegroups.com
On 04/04/2024 10.16, Sergiu Pojoga wrote:
> The RTP range is set to 10000-20000 on each node.
> Pub/Priv interfaces each. Always runs out of ports on Pub iface only.

What does `rtpengine-ctl list interfaces` say?

The number of used ports is also exported via Prometheus or JSON, you
can add that to your monitoring.

Depending on how you start rtpengine, there is also the possibility of
the FD resource limit kicking in (`ulimit -n` etc).
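The same check can be sketched from code with Python's resource module (it reports the limits of whatever process runs it, so to check rtpengine itself you'd still use prlimit on its PID):

```python
# Read the file-descriptor limit of the current process -- the same
# RLIMIT_NOFILE that `ulimit -n` and `prlimit` report. Every bound
# media port costs one FD, so a low soft limit can make port binds
# fail long before the configured RTP range is exhausted.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"NOFILE soft={soft} hard={hard}")
```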

Cheers

Sergiu Pojoga

Apr 4, 2024, 11:18:54 AM
to Sipwise rtpengine
Thanks for the suggestions. Good commands to know, indeed.

Interface 'pub' address '1.1.1.1' (IPv4)
 Port range: 10000 - 20000
 Ports used:   266 / 10001 (  2.7%)
Interface 'priv' address '10.22.0.10' (IPv4)
 Port range: 10000 - 20000
 Ports used:   101 / 10001 (  1.0%)

Rtpengine is handled by SystemD
[Service]
EnvironmentFile=/etc/default/rtpengine-daemon
RuntimeDirectory=rtpengine
PIDFile=/run/rtpengine/rtpengine-daemon.pid
User=rtpengine
Group=rtpengine
LimitNOFILE=150000

rtpengine@sbc1:/tmp$ prlimit -p $(</run/rtpengine/rtpengine-daemon.pid)
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                 unlimited unlimited bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space   8388608   8388608 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0
NOFILE     max number of open files              150000    150000 files
NPROC      max number of processes               128032    128032 processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority                     0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals         128032    128032 signals
STACK      max stack size                       8388608 unlimited bytes

root@sbc1:/tmp# runuser -u rtpengine -- bash
rtpengine@sbc1:/tmp$ ulimit -Hn
1048576
rtpengine@sbc1:/tmp$ ulimit -Sn
1024

Anything abnormal?

Richard Fuchs

Apr 4, 2024, 12:56:32 PM
to rtpe...@googlegroups.com
On 04/04/2024 11.18, Sergiu Pojoga wrote:
> Anything abnormal?

No, that all looks good. I assume you've already checked that your
address 1.1.1.1 is actually bound continuously and that no other
services are infringing on that port range?

Enable debug logging at least for the core subsystem, and also for
internals if you can live with the added I/O, and see if the log reveals
anything.
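In the config file format from the earlier post, that would look something like this (assumption: 7 is the debug level, one step above the 6 used for most subsystems in the posted config):

```
log-level-core = 7
log-level-internals = 7
```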

Last resort is to attach an strace to rtpengine and try to catch the
failure in action.

Cheers


Sergiu Pojoga

Apr 4, 2024, 2:53:32 PM
to Sipwise rtpengine
OK, so now we're getting to the truly interesting part. I intentionally didn't mention this until now; not because I meant to waste your time, but to avoid jumping to conclusions like "well... clearly it's because of that".

Yes, there is another process sharing the same UDP port range. It's Asterisk. Essentially, the two nodes run co-located with both rtpengine and Asterisk (don't ask me why), with the same 10000-20000 port range. I did try to split the RTP ranges, e.g. 10000-14000 for Asterisk and 14001-20000 for rtpengine, but it didn't solve the problem.

The thing is, it runs in this setup for weeks at a time before it starts acting up like that. So I'm not convinced the co-location itself is the root problem. Rtpengine, and I assume Asterisk too, must have a mechanism for selecting an available UDP port from the Linux kernel; they don't just randomly choose something that may already be in use by the OS. Or am I wrong?

In your experience, is such a setup strictly prohibited/not recommended and bound to experience issues like this?

Thanks.

Richard Fuchs

Apr 4, 2024, 3:44:55 PM
to rtpe...@googlegroups.com
On 04/04/2024 14.53, Sergiu Pojoga wrote:
> OK so now we are getting to the truly interesting part. I didn't
> mention it intentionally until now, not because I meant to waste your
> time but to avoid jumping into conclusions like "well... clearly it's
> because of that".
And in today's episode of "things that didn't age well" ... 🙃
> Yes, there is another process sharing the same UDP port ranges. It's
> Asterisk. Essentially, the two nodes run co-located with both
> RTPengine and Asterisk (don't ask me why). Same port ranges
> 10000-20000. I did try to separate the RTP ranges into like
> 10000-14000 for Asterisk and 14001-20000 for Rtpengine: it didn't
> solve this problem.
>
> The thing is - it runs in this setup for weeks at a time, until it
> starts acting up like that. So I'm not convinced the co-location
> itself is the root problem. Rtpengine, as well as Asterisk I guess,
> must have a mechanism of selecting an available UDP port from the
> Linux kernel, they don't just randomly choose something that may be in
> use by the OS. Or am I wrong?

Well, no, but yes.

You can request a random free port from the kernel, but then it really
is random (or rather, not predictable), and won't be within a given port
range (except the port range defined by the kernel). Even worse, the
usual RTP+RTCP combination requires the RTP port to be even-numbered,
and RTCP to be RTP +1. So the ports definitely can't be random.

The only other option is to request one specific port, and for that
purpose each process maintains its own free-list of which ports are
available. Now when the process requests a port that is in use by
another process, one of two things can happen, depending on which
options are in use. The request succeeds, and you end up with two
processes having the same port open. Obviously that is to be avoided.
The other thing that can happen is that the request fails, and the
process has to try again with a different port. Depending on how the
code is written, the process might try this only a certain number of
times before giving up.
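To make the two strategies concrete, here's a sketch in Python (pure illustration of the mechanism described above, not rtpengine's actual code; the host and port numbers are made up):

```python
# Illustration of the two port-selection strategies (a sketch,
# not rtpengine's actual code).
import errno
import socket

# Strategy 1: let the kernel pick (bind to port 0). The port is
# free, but it comes from the kernel's ephemeral range -- useless
# for a fixed RTP range and for the "RTP even, RTCP = RTP + 1"
# pairing rule.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))
print("kernel-assigned port:", s.getsockname()[1])
s.close()

# Strategy 2: request specific ports and retry on EADDRINUSE,
# walking even/odd pairs the way an RTP relay must.
def bind_rtp_pair(host, port_min, port_max, max_tries=100):
    """Bind an even RTP port plus the odd RTCP port above it."""
    port = port_min if port_min % 2 == 0 else port_min + 1
    tries = 0
    while port + 1 <= port_max and tries < max_tries:
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind((host, port))        # even port for RTP
            rtcp.bind((host, port + 1))   # odd port for RTCP
            return rtp, rtcp
        except OSError as e:
            rtp.close()
            rtcp.close()
            if e.errno != errno.EADDRINUSE:
                raise
            port += 2                     # try the next even/odd pair
            tries += 1
    raise RuntimeError("ran out of ports")

rtp, rtcp = bind_rtp_pair("127.0.0.1", 10000, 20000)
print("bound pair:", rtp.getsockname()[1], rtcp.getsockname()[1])
rtp.close()
rtcp.close()
```

If another process is sitting on ports inside the range, strategy 2 burns through retries, which is exactly how "Failed to get 2 consecutive ports" can happen with plenty of ports nominally free.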

So a shared port range is not a good idea at all. I don't know about
Asterisk, but rtpengine expects to have the configured port range
available for its own exclusive use, and it makes no provision for
checking whether a port is in use by something else before requesting
it from the kernel.

If you still see errors with exclusive port ranges, then you need to
look at what is happening when it's happening. Increase log level, check
interface stats, attach strace if necessary.

Cheers

Sergiu Pojoga

Apr 4, 2024, 4:31:10 PM
to rtpe...@googlegroups.com
Thanks for taking the time to explain all that.

I'll change the configs back to non-overlapping RTP port ranges for rtpengine and Asterisk.

In the meantime - hooked both nodes to Grafana, watching for ports in use vs free and other goodies.

I'm afraid you'll hear from me again should the problem resurface.

--
You received this message because you are subscribed to the Google Groups "Sipwise rtpengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rtpengine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rtpengine/367b8809-4660-408d-9564-154e592947e3%40sipwise.com.