Error allocating media ports


Sergiu Pojoga

Apr 4, 2024, 10:16:51 AM
to Sipwise rtpengine
Hi,

Could use some help chasing down a problem that has been plaguing our systems for quite a while now.

Two identical rtpengine nodes, bare metal, Ubuntu 20.04. 
Receiving commands from several SIP proxies with various purposes. 
With redis backend.
It happens sporadically: once in a while these errors start appearing in the logs.
What's interesting is that they appear on both nodes at around the same time.

Apr  3 09:09:34 sbc2 rtpengine[1692545]: INFO: [59c4d833681dd8225069e3a44a1b0a20]: [control] Received command 'offer' from 10.22.0.222:50326
Apr  3 09:09:34 sbc2 rtpengine[1692545]: NOTICE: [59c4d833681dd8225069e3a44a1b0a20]: [core] Creating new call
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failure while trying to bind a port to the socket
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failure while trying to bind a port to the socket
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on interface 1.1.1.1 for media relay (last error: Success)
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on all locals of logical 'pub'
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Error allocating media ports
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on interface 1.1.1.1 for media relay (last error: Success)
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Failed to get 2 consecutive ports on all locals of logical 'pub'
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Destroying call
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Error allocating media ports
Apr  3 09:09:34 sbc2 rtpengine[1692545]: ERR: [59c4d833681dd8225069e3a44a1b0a20]: [core] Destroying call
Apr  3 09:09:34 sbc2 rtpengine[1692545]: INFO: [59c4d833681dd8225069e3a44a1b0a20]: [core] Final packet stats:
Apr  3 09:09:34 sbc2 rtpengine[1692545]: WARNING: [59c4d833681dd8225069e3a44a1b0a20]: [control] Protocol error in packet from 10.22.0.222:50326: Ran out of ports [d8:supportsl10:load limite3:sdp230:v=0
Apr  3 09:09:34 sbc2 rtpengine[1692545]: WARNING: [59c4d833681dd8225069e3a44a1b0a20]: ... emuxe4:SDESl28:only-AES_CM_128_HMAC_SHA1_803:pade7:call-id59:59c4d833681dd8225069e3a44a1b0a2013:received-froml3:IP411:10.22.0.103e8:from-tag10:as649c0f627:command5:offere]

The RTP range is set to 10000-20000 on each node. 
Each node has a pub and a priv interface. It always runs out of ports on the pub interface only.
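As a quick sanity check, here's the capacity math for that range, assuming one even/odd RTP+RTCP port pair per media stream (an assumption for illustration; actual usage depends on call topology):

```python
# Capacity of a 10000-20000 RTP port range, assuming one even/odd
# RTP+RTCP port pair per media stream (illustration only).
port_min, port_max = 10000, 20000

total_ports = port_max - port_min + 1   # 10001 ports in the range
max_pairs = total_ports // 2            # ~5000 RTP+RTCP pairs

print(total_ports, max_pairs)  # 10001 5000
```

So at the session counts we see, the range should be nowhere near exhausted.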

# rtpengine -v
Version: 11.5.2.0+0~mr11.5.2.0 git-mr11.5-55ff3bdb

What we checked:
- made sure all offers are properly ended by sending a delete from the SIP proxy
 Total managed sessions                          :2922
 Total rejected sessions                         :0
 Total timed-out sessions via TIMEOUT            :7
 Total timed-out sessions via SILENT_TIMEOUT     :0
 Total timed-out sessions via FINAL_TIMEOUT      :0
 Total timed-out sessions via OFFER_TIMEOUT      :0
 Total regular terminated sessions               :2915

- when it happens, rtpengine reports only some 100+ active sessions, a realistic load (via rtpengine-ctl list sessions all)
- netstat also reports under 1K UDP ports in use
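For cross-checking netstat's number, here's a sketch that counts bound UDP ports in the range straight from /proc/net/udp (Linux-only; IPv4 sockets only, and it can't tell which process owns each socket):

```python
# Count IPv4 UDP sockets bound inside the RTP range by parsing
# /proc/net/udp directly (Linux only). A cross-check for netstat;
# does not distinguish which process owns each socket.
def count_udp_ports_in_range(port_min=10000, port_max=20000,
                             proc_path="/proc/net/udp"):
    count = 0
    with open(proc_path) as f:
        next(f)                        # skip the header line
        for line in f:
            local = line.split()[1]    # e.g. "00000000:2710"
            port = int(local.split(":")[1], 16)  # port is hex
            if port_min <= port <= port_max:
                count += 1
    return count

if __name__ == "__main__":
    print(count_udp_ports_in_range())
```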

What else can I check or do about it? Much obliged.

Sergiu Pojoga

Apr 4, 2024, 10:36:50 AM
to Sipwise rtpengine
To add a few things:

- reduced delete delay to 5 seconds from the default 30 (via the SIP proxy's rtpengine_delete("delete-delay=5")) to speed up clean-ups
- kernelized
- no limits in terms of load of any kind, full config below

log-level-core = 6
log-level-spandsp = 6
log-level-ffmpeg = 6
log-level-transcoding = 6
log-level-codec = 6
log-level-rtcp = 6
log-level-ice = 6
log-level-crypto = 6
log-level-srtp = 7
log-level-internals = -1
log-level-http = 6
log-level-control = 6
log-level-dtx = 6
table = 0
max-sessions = -1
timeout = 60
silent-timeout = 3600
final-timeout = 0
offer-timeout = 3600
delete-delay = 30
redis-expires = 86400
tos = 184
control-tos = 184
graphite-interval = 0
redis-num-threads = 4
homer-protocol = 2
homer-id = 2003
no-fallback = 1
port-min = 10000
port-max = 20000
redis-db = 5
redis-write-db = -1
no-redis-required = 1
num-threads = 8
xmlrpc-format = 0
log_format = 0
redis_allowed_errors = -1
redis_disable_time = 10
redis_cmd_timeout = 0
redis_connect_timeout = 1000
max-cpu = 0.0
max-load = 0.00
max-bw = 0

interface[0] = pub\1.1.1.1
interface[1] = priv\10.22.0.20
b2b_url = (null)
redis-auth = (null)
redis-write-auth = (null)
recording-dir = /var/spool/rtpengine
recording-method = proc
recording-format = raw
iptables-chain = (null)
listen-ng = 0.0.0.0:2222
listen-ng = [::]:2222
listen-cli = 127.0.0.1:2224

Richard Fuchs

Apr 4, 2024, 10:51:21 AM
to rtpe...@googlegroups.com
On 04/04/2024 10.16, Sergiu Pojoga wrote:
> The RTP range is set to 10000-20000 on each node.
> Pub/Priv interfaces each. Always runs out of ports on Pub iface only.

What does `rtpengine-ctl list interfaces` say?

The number of used ports is also exported via Prometheus or JSON, you
can add that to your monitoring.

Depending on how you start rtpengine, there is also the possibility of
the FD resource limit kicking in (`ulimit -n` etc).
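The same check can be sketched from code with Python's resource module (it reports the limits of whatever process runs it, so to check rtpengine itself you'd still use prlimit on its PID):

```python
# Read the file-descriptor limit of the current process -- the same
# RLIMIT_NOFILE that `ulimit -n` and `prlimit` report. Every bound
# media port costs one FD, so a low soft limit can make port binds
# fail long before the configured RTP range is exhausted.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"NOFILE soft={soft} hard={hard}")
```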

Cheers

Sergiu Pojoga

Apr 4, 2024, 11:18:54 AM
to Sipwise rtpengine
Thanks for the suggestions. Good commands to know, indeed.

Interface 'pub' address '1.1.1.1' (IPv4)
 Port range: 10000 - 20000
 Ports used:   266 / 10001 (  2.7%)
Interface 'priv' address '10.22.0.10' (IPv4)
 Port range: 10000 - 20000
 Ports used:   101 / 10001 (  1.0%)

Rtpengine is handled by SystemD
[Service]
EnvironmentFile=/etc/default/rtpengine-daemon
RuntimeDirectory=rtpengine
PIDFile=/run/rtpengine/rtpengine-daemon.pid
User=rtpengine
Group=rtpengine
LimitNOFILE=150000

rtpengine@sbc1:/tmp$ prlimit -p $(</run/rtpengine/rtpengine-daemon.pid)
RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                 unlimited unlimited bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space   8388608   8388608 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0
NOFILE     max number of open files              150000    150000 files
NPROC      max number of processes               128032    128032 processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority                     0         0
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals         128032    128032 signals
STACK      max stack size                       8388608 unlimited bytes

root@sbc1:/tmp# runuser -u rtpengine -- bash
rtpengine@sbc1:/tmp$ ulimit -Hn
1048576
rtpengine@sbc1:/tmp$ ulimit -Sn
1024

Anything abnormal?

Richard Fuchs

Apr 4, 2024, 12:56:32 PM
to rtpe...@googlegroups.com
On 04/04/2024 11.18, Sergiu Pojoga wrote:
> Anything abnormal?

No, that all looks good. I assume you've already checked that your
address 1.1.1.1 is actually bound continuously and that no other
services are infringing on that port range?

Enable debug logging at least for the core subsystem, and also for
internals if you can live with the added I/O, and see if the log reveals
anything.
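In the config file format from the earlier post, that would look something like this (assumption: 7 is the debug level, one step above the 6 used for most subsystems in the posted config):

```
log-level-core = 7
log-level-internals = 7
```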

Last resort is to attach an strace to rtpengine and try to catch the
failure in action.

Cheers


Sergiu Pojoga

Apr 4, 2024, 2:53:32 PM
to Sipwise rtpengine
OK, so now we're getting to the truly interesting part. I intentionally didn't mention this until now; not because I meant to waste your time, but to avoid jumping to conclusions like "well... clearly it's because of that".

Yes, there is another process sharing the same UDP port range. It's Asterisk. Essentially, the two nodes run co-located with both rtpengine and Asterisk (don't ask me why), with the same 10000-20000 port range. I did try to split the RTP ranges, e.g. 10000-14000 for Asterisk and 14001-20000 for rtpengine, but it didn't solve the problem.

The thing is, it runs in this setup for weeks at a time before it starts acting up like that. So I'm not convinced the co-location itself is the root problem. Rtpengine, and I assume Asterisk too, must have a mechanism for selecting an available UDP port from the Linux kernel; they don't just randomly choose something that may already be in use by the OS. Or am I wrong?

In your experience, is such a setup strictly prohibited/not recommended and bound to experience issues like this?

Thanks.

Richard Fuchs

Apr 4, 2024, 3:44:55 PM
to rtpe...@googlegroups.com
On 04/04/2024 14.53, Sergiu Pojoga wrote:
> OK so now we are getting to the truly interesting part. I didn't
> mention it intentionally until now, not because I meant to waste your
> time but to avoid jumping into conclusions like "well... clearly it's
> because of that".
And in today's episode of "things that didn't age well" ... 🙃
> Yes, there is another process sharing the same UDP port ranges. It's
> Asterisk. Essentially, the two nodes run co-located with both
> RTPengine and Asterisk (don't ask me why). Same port ranges
> 10000-20000. I did try to separate the RTP ranges into like
> 10000-14000 for Asterisk and 14001-20000 for Rtpengine: it didn't
> solve this problem.
>
> The thing is - it runs in this setup for weeks at a time, until it
> starts acting up like that. So I'm not convinced the co-location
> itself is the root problem. Rtpengine, as well as Asterisk I guess,
> must have a mechanism of selecting an available UDP port from the
> Linux kernel, they don't just randomly choose something that may be in
> use by the OS. Or am I wrong?

Well, no, but yes.

You can request a random free port from the kernel, but then it really
is random (or rather, not predictable), and won't be within a given port
range (except the port range defined by the kernel). Even worse, the
usual RTP+RTCP combination requires the RTP port to be even-numbered,
and RTCP to be RTP +1. So the ports definitely can't be random.

The only other option is to request one specific port, and for that
purpose each process maintains its own free-list of which ports are
available. Now when the process requests a port that is in use by
another process, one of two things can happen, depending on which
options are in use. The request succeeds, and you end up with two
processes having the same port open. Obviously that is to be avoided.
The other thing that can happen is that the request fails, and the
process has to try again with a different port. Depending on how the
code is written, the process might try this only a certain number of
times before giving up.
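To make the two strategies concrete, here's a sketch in Python (pure illustration of the mechanism described above, not rtpengine's actual code; the host and port numbers are made up):

```python
# Illustration of the two port-selection strategies (a sketch,
# not rtpengine's actual code).
import errno
import socket

# Strategy 1: let the kernel pick (bind to port 0). The port is
# free, but it comes from the kernel's ephemeral range -- useless
# for a fixed RTP range and for the "RTP even, RTCP = RTP + 1"
# pairing rule.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))
print("kernel-assigned port:", s.getsockname()[1])
s.close()

# Strategy 2: request specific ports and retry on EADDRINUSE,
# walking even/odd pairs the way an RTP relay must.
def bind_rtp_pair(host, port_min, port_max, max_tries=100):
    """Bind an even RTP port plus the odd RTCP port above it."""
    port = port_min if port_min % 2 == 0 else port_min + 1
    tries = 0
    while port + 1 <= port_max and tries < max_tries:
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind((host, port))        # even port for RTP
            rtcp.bind((host, port + 1))   # odd port for RTCP
            return rtp, rtcp
        except OSError as e:
            rtp.close()
            rtcp.close()
            if e.errno != errno.EADDRINUSE:
                raise
            port += 2                     # try the next even/odd pair
            tries += 1
    raise RuntimeError("ran out of ports")

rtp, rtcp = bind_rtp_pair("127.0.0.1", 10000, 20000)
print("bound pair:", rtp.getsockname()[1], rtcp.getsockname()[1])
rtp.close()
rtcp.close()
```

If another process is sitting on ports inside the range, strategy 2 burns through retries, which is exactly how "Failed to get 2 consecutive ports" can happen with plenty of ports nominally free.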

So a shared port range is not a good idea at all. I don't know about
Asterisk, but rtpengine expects to have the configured port range
available for its own exclusive use, and it makes no provision for
checking whether a port is in use by something else before requesting
it from the kernel.

If you still see errors with exclusive port ranges, then you need to
look at what is happening when it's happening. Increase log level, check
interface stats, attach strace if necessary.

Cheers

Sergiu Pojoga

Apr 4, 2024, 4:31:10 PM
to rtpe...@googlegroups.com
Thanks for taking the time to explain all that.

I'll change the configs back to non-overlapping RTP port ranges for rtpengine and Asterisk.

In the meantime - hooked both nodes to Grafana, watching for ports in use vs free and other goodies.

I'm afraid you'll hear from me again should the problem resurface.

--
You received this message because you are subscribed to the Google Groups "Sipwise rtpengine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rtpengine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rtpengine/367b8809-4660-408d-9564-154e592947e3%40sipwise.com.