mr26.0.1.4: Failed to pin shared kernel memory: Argument list too long


George Diamantopoulos

May 6, 2026, 6:57:53 AM (13 days ago) May 6
to Sipwise rtpengine
Hello all,

I had an incident yesterday where both rtpengine instances in a pool stopped working within a 5-minute period.

By "stopped working" I mean kamailio printed the following in the logs:

rtpengine [rtpengine.c:3181]: rtpp_function_call(): no available proxies

And both rtpengine-daemon instances printed:

CRIT: [xxxxx-xxx-xxx port yyyyy]: [core] Failed to pin shared kernel memory: Argument list too long

However, after the issue manifested, I was able to make calls with audio successfully, while others reported failures, so I'm not sure what was happening. The issue was resolved by restarting both rtpengine instances.

I'm using packages from https://dfx.at/rtpengine on debian trixie, amd64.

Unfortunately I don't have much more information to offer. The 5-minute window during which both instances were affected makes me think that either some traffic with a specific characteristic causes the issue to emerge, or that it's a total-traffic-handled-since-last-restart issue, since both instances were last brought online within a couple of hours of each other and receive about 50% of the traffic each.

Do you think there is enough information there to open an issue on github? What other information could I collect if the issue appears again?

Thanks!
George

Richard Fuchs

May 6, 2026, 7:45:07 AM (13 days ago) May 6
to rtpe...@googlegroups.com
On 06/05/2026 06.57, George Diamantopoulos wrote:
> CRIT: [xxxxx-xxx-xxx port yyyyy]: [core] Failed to pin shared kernel
> memory: Argument list too long

Luckily that is a very unique error code and appears only once in the
module. It happens when `MAX_SHM_AREAS` is exceeded, which in this
version is hard coded to 16.
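
For readers unfamiliar with the message: "Argument list too long" is the standard text for the `E2BIG` errno value. A minimal userspace sketch of a fixed-size table that fails this way is shown below. The `pin_table` type and `pin_table_add` function are hypothetical names for illustration only; this is not the kernel module's code, only a toy model of the limit described above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_SHM_AREAS 16 /* value quoted in the thread for this version */

/* Hypothetical stand-in for a per-table list of pinned memory areas. */
struct pin_table {
    void *areas[MAX_SHM_AREAS];
    unsigned int num;
};

/* Returns 0 on success, or -E2BIG once every slot is taken. */
static int pin_table_add(struct pin_table *t, void *area)
{
    assert(t != NULL);
    if (t->num >= MAX_SHM_AREAS)
        return -E2BIG; /* reported as "Argument list too long" */
    t->areas[t->num++] = area;
    return 0;
}
```

With a table like this, the 17th pin request is the first one to fail, and nothing short of releasing an area (or restarting) makes room again.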

You can inspect the current number by looking at
`/proc/rtpengine/x/status` under "Memory pins". This is part of a memory
pool and the total size is reported as "Memory:" in the same file.

The memory pool is typically recycled over the lifetime of rtpengine, so
either you have an unusual workload for it to run full, or there is a
memory leak somewhere.
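
For monitoring, the two values can be pulled out of the status text with a few lines of C. This is only a sketch: it assumes each value appears on its own line as a label followed by a number (for example `Memory pins: 3`), which is a guess at the layout rather than something confirmed in this thread, and `status_field` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Looks for a line in `text` that starts with `label` (after optional
 * blanks) and parses the unsigned number following it.
 * Returns 0 and stores the value in *out, or -1 if nothing was parsed. */
static int status_field(const char *text, const char *label, unsigned long *out)
{
    assert(text != NULL && label != NULL && out != NULL);
    size_t len = strlen(label);

    for (const char *line = text; line && *line; ) {
        const char *p = line + strspn(line, " \t");
        if (strncmp(p, label, len) == 0) {
            char *end;
            unsigned long v = strtoul(p + len, &end, 10);
            if (end == p + len)
                return -1; /* label found but no number after it */
            *out = v;
            return 0;
        }
        line = strchr(line, '\n'); /* advance to the next line, if any */
        if (line)
            line++;
    }
    return -1;
}
```

Reading the file into a buffer (for instance with `fopen` and `fread`) and calling `status_field(buf, "Memory pins:", &pins)` once a minute is enough to graph the value or alert on it.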

> Do you think there is enough information there to open an issue on
> github? What other information could I collect if the issue appears again?

You can always open an issue on GH if something is a problem with the
code, and this sure looks like it. But it will require more
investigation to find out what exactly led to this.

Cheers

George Diamantopoulos

May 11, 2026, 6:24:06 AM (8 days ago) May 11
to Sipwise rtpengine
Hello again,

Thanks for the input. It happened again today, and here's my thoughts on it.

First of all, the 'Memory:' and 'Memory pins:' metrics never seem to get freed, even during the weekends or during night time when traffic is abysmally low. Is this expected, i.e. allocations are kept forever? Here's some graphs, unfortunately I didn't monitor these values with earlier versions so I'm not sure if this is new behaviour or not:

chart-mem-pins.png

Drop towards the end is due to restart when the issue manifested about an hour ago. Similarly for the 'Memory:' stats:

chart-mem.png
Unfortunately, I haven't been able to definitively find any patterns in the traffic that would explain this. I was previously on bookworm/rtpengine-daemon:12.5.1.33-1~bpo12+1 and the issue manifested after the upgrade to trixie/rtpengine-daemon:26.0.1.4-1~bpo13+1

I've been reviewing my kamailio config and I've noticed I've been calling rtpengine_delete() with a lot of unsupported parameters. I'll try fixing these as a first step and see if it makes a difference, although I understand it's unlikely to be the culprit here.

BR,
George

Richard Fuchs

May 11, 2026, 7:22:32 AM (8 days ago) May 11
to rtpe...@googlegroups.com
On 11/05/2026 06.24, George Diamantopoulos wrote:
> Thanks for the input. It happened again today, and here's my thoughts
> on it.
>
> First of all, the 'Memory:' and 'Memory pins:' metrics never seem to
> get freed, even during the weekends or during night time when traffic
> is abysmally low. Is this expected, i.e. allocations are kept forever?
> Here's some graphs, unfortunately I didn't monitor these values with
> earlier versions so I'm not sure if this is new behaviour or not:

That is in fact expected, and earlier versions (at least 12.5) simply
didn't employ this kind of allocator.

It's a simple bump allocator and usage is tracked internally through
reference counts. Individual memory pools ("pins") are reused when
references drop to zero. So with normal usage, you should never need
more than 2 or 3 pools. The exception would be if you have some very
long running calls (i.e. calls that never terminate) and are a victim of
memory fragmentation.

Unless there is a reference leak somewhere of course.
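
To make the mechanism concrete, here is a toy single-shard bump allocator with a reference count. It is a sketch of the general technique described above, not rtpengine's implementation: the `shard` struct, `shard_alloc`, `shard_release` and the tiny `SHARD_SIZE` are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SHARD_SIZE 1024 /* toy size, far smaller than a real pool */

struct shard {
    unsigned char buf[SHARD_SIZE];
    size_t used;       /* bump pointer: bytes handed out so far */
    unsigned int refs; /* live allocations pointing into buf */
};

/* Bump-allocates len bytes; NULL means this shard has no room left. */
static void *shard_alloc(struct shard *s, size_t len)
{
    assert(s != NULL);
    if (len > SHARD_SIZE - s->used)
        return NULL;
    void *p = s->buf + s->used;
    s->used += len;
    s->refs++;
    return p;
}

/* Drops one reference. There are no individual frees: the space only
 * comes back, all at once, when the last reference is gone.
 * Returns true if the shard was reset and can be reused. */
static bool shard_release(struct shard *s)
{
    assert(s != NULL && s->refs > 0);
    if (--s->refs > 0)
        return false;
    s->used = 0;
    return true;
}
```

The consequence is that one surviving allocation keeps an otherwise empty shard unavailable, which is the fragmentation case mentioned above: a full shard with a single long-lived reference cannot be recycled, so a new one has to be pinned.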

I'll add the internal pool usage to the stats output to hopefully be
able to track down the leak if there is one.

> I've been reviewing my kamailio config and I've noticed I've been
> calling rtpengine_delete() with a lot of unsupported parameters. I'll
> try fixing these as a first step and see if it makes a difference,
> although I understand it's unlikely to be the culprit here.

Probably won't make a difference unless you notice that calls get stuck
and never terminate.

Cheers

George Diamantopoulos

May 11, 2026, 8:05:45 AM (8 days ago) May 11
to Sipwise rtpengine
Thanks for the follow-up. If I'm monitoring "Current Sessions (own)" ($..currentstatistics.sessionsown.first()), and the graph shows significant drops during low traffic, doesn't that exclude the possibility of long-running calls/calls which never terminate in your opinion? Here's a graph for the same time period, only for the rtpengine instances which have exhibited the memory exhaustion so far:

sessions-own.png

BR,
George

Richard Fuchs

May 11, 2026, 8:25:00 AM (8 days ago) May 11
to rtpe...@googlegroups.com
On 11/05/2026 08.05, George Diamantopoulos wrote:
> Thanks for the follow-up. If I'm monitoring "Current Sessions (own)"
> ($..currentstatistics.sessionsown.first()), and the graph shows
> significant drops during low traffic, doesn't that exclude the
> possibility of long-running calls/calls which never terminate in your
> opinion?

Well, the calls simply dropping to a low number wouldn't do that, but
your graph shows "min: 0" and if that's accurate, then that does indeed
rule out that possibility.

Cheers

Richard Fuchs

May 11, 2026, 10:09:43 AM (8 days ago) May 11
to rtpe...@googlegroups.com
On 11/05/2026 07.22, Richard Fuchs wrote:
> I'll add the internal pool usage to the stats output to hopefully be
> able to track down the leak if there is one.

This adds stats for the mem pools to the JSON output:
https://github.com/sipwise/rtpengine/commit/0faa777e172c0c6439201b13c9da16a939989e06

A quick test shows no leaks for basic call scenarios.

Let me know if you need help applying that to whatever version you're
running.

Cheers

George Diamantopoulos

May 11, 2026, 8:07:54 PM (8 days ago) May 11
to Sipwise rtpengine
Thanks for the patch! Do you think you could provide deb-src packages for https://dfx.at/rtpengine/26.0/pool/main/r/rtpengine? I'm not sure how the non-ngcp packages are configured and packaged. Thanks.

Cheers,
George

George Diamantopoulos

May 12, 2026, 6:32:41 AM (7 days ago) May 12
to Sipwise rtpengine
Never mind, the sources are there, I just hadn't added the deb-src repo type. I'll try building the package and come back with any findings.

George Diamantopoulos

May 13, 2026, 5:40:59 AM (6 days ago) May 13
to Sipwise rtpengine
Hello again,

I applied the patch against mr26.0.1.5, and here's what it looks like:

1. For each shard, references show random-walk-like behaviour over short periods, but when zooming out they increase monotonically up to a maximum value. After that, there's a large drop, from which there's no recovery. A new shard is introduced at around the same time as the large drop:
refs-stats.png
2. The used statistic shows a rapid monotonic increase shortly after the creation of the shard, up to a maximum. Some shards stay at maximum use for a while and then drop to zero, and the monotonic increase starts over again. For others, there's no drop, but perhaps this is in their future:
used-stats.png
3. The size metric is simple. It is 0 before the shard's creation, then immediately goes to maximum right after:
size-stats.png
Is this how you expect allocations to behave? The refs distribution seems a little suspicious to me. Let me know if there's any other information I can gather, or if you'd like me to open an issue on github for this. Thanks!

BR,
George

George Diamantopoulos

May 13, 2026, 6:16:12 AM (6 days ago) May 13
to Sipwise rtpengine
I was wrong before: it appears that references do increase again in a similar fashion after they've been at 0 for a while. Here's a graph of a single shard's references for illustration:
refs-single.png

George Diamantopoulos

May 13, 2026, 6:47:18 AM (6 days ago) May 13
to Sipwise rtpengine
I also did the thing and asked claude about this. I'm not betting a lot on it being right, but here's what it thinks after I've fed it with my findings...

Given all findings discussed so far, the current codebase and the error logged when the issue manifests: 
 * what is the most likely trigger for the issue? 
 * where to go from here in terms of dissection?

Most likely trigger
Pulling everything together, the failure mode is structurally: 

A monotonic growth of the bufferpool's shard count, driven by concurrent-allocation contention against a lock-free bufferpool_make_shard, terminating against the kernel module's hard MAX_SHM_AREAS=16 ceiling.

Why this rather than the alternatives: 
- It's not a refcount imbalance — your per-shard patch demonstrated individual shards complete fill/drain/recycle cycles. The __free_* paths are doing their job. 
- It's not the via-branch=auto or failure-route cleanup angle in kamailio — the docs and config audit show the SIP-side lifecycle is clean for the sbcprov path. 
- It's not the DTX random-seq bug — rtpengine isn't transcoding for you; that codepath is dormant. The seq jumps you saw are an Asterisk/peer-side artefact, unrelated to memory. 
- It's the only mechanism left that produces exactly this curve: shards recycle correctly, pin count climbs only at discrete events, plateaus in between, never decreases, and abruptly aborts at 16. 

The smoking gun is the structure of bufferpool_make_shard itself (lib/bufferpool.c:143-203): bufferpool_new_shard invokes bp->alloc() (→ kernel_alloc → kernel_pin_memory) before the CAS that inserts the new shard into the pool. Multiple concurrent allocators that simultaneously find "no shard has room" each call bufferpool_make_shard, each pin 16 MB in the kernel, and then each succeed at inserting into a different slot. The losing thread's shard isn't discarded — it joins the pool, contributing +1 to nshms permanently. There is no path to give a shard back to the kernel.

The acceleration "rare last week, 2–3 times per day since yesterday" is the part that's hardest to attribute without more data, but it's consistent with this trigger because the failure mode is inherently bursty: a single moment of concurrent fill-cliff against n worker threads adds up to n-1 phantom shards. Whatever changed yesterday made that moment more likely — could be a load step (more concurrent calls), a traffic-shape change (more allocation-heavy calls coinciding), or a code path that synchronises allocators (e.g., a periodic timer that touches all packet streams at once). 

The MT#55283 commit cluster is consistent with this being a recently-introduced regression: bcbe751d MT#55283 make bufferpool (mostly) lock-free, the "delegate shm refcount to table" series, and 847d0043 revamp kernel shm strategy together rewrote both the shard accounting and the kernel-pin path while making the hot path lock-free — exactly the sequence of changes that would surface this kind of issue. 

Where to go from here

In order of actionability:

1. Confirm or falsify the race hypothesis directly. One patch, three counters in bufferpool_make_shard:
 static struct bpool_shard *bufferpool_make_shard(struct bufferpool *bp) {
     // ── instrument before pinning ──
     // snapshot the shard count and tally full shards vs. shards with room
     unsigned int snap_num = atomic_get_na(&bp->num_shards);
     unsigned int n_full = 0, n_has_space = 0;
     for (unsigned int i = 0; i < snap_num; i++) {
         struct bpool_shard *s = atomic_get_na(&bp->shards[i]);
         if (!s)
             continue; // skip empty slots defensively
         if (atomic_get(&s->full)) n_full++;
         else n_has_space++;
     }

     // pthread_t is opaque; the cast makes it match %lu on Linux
     ilog(LOG_WARN, "bufferpool_make_shard: num=%u full=%u space=%u tid=%lu",
          snap_num, n_full, n_has_space, (unsigned long) pthread_self());
     // ── end instrumentation ──

     struct bpool_shard *shard = bufferpool_new_shard(bp);
     ...
 }

 Then watch the log lines around a pin-count jump:

 - If every make_shard call shows full == num and space == 0, every new shard was genuinely needed — the issue is real peak growth, not a race.
 - If you see multiple make_shard calls within a millisecond from different tids, and at least one of them shows space > 0, the race is firing.

 This one experiment cleanly decides whether the fix is operational (raise the limit) or code-level (fix the race).
                                                               
 2. If the race is real, the fix in bufferpool_make_shard is straightforward: re-check after pinning, before inserting. Or move the kernel pin to after the CAS reservation slot is claimed, so only one thread per "this slot is mine" actually pins. Both are small patches; the second is structurally cleaner. Worth proposing to upstream alongside the issue.

 3. Operational mitigation regardless of root cause. Two things to do now, since the abort is hitting 2–3× per day:
  - Watch Memory pins: from /proc/rtpengine/<id>/status on a 1-min cadence. Alert at 12. Plan a rolling rtpengine restart cron (every N hours, where N is conservative enough that even worst-case growth keeps you below 14).
  - Bump MAX_SHM_AREAS from 16 to 32 in kernel-module/nft_rtpengine.c:442 and rebuild the kernel module. This is a one-line patch that doubles your headroom and roughly doubles time-to-abort. It buys you days while you investigate, with negligible runtime cost (the array is per-table, sized once at table allocation).
                                                            
 4. Capture the workload at the moment of growth. Independent of the race vs. peak diagnosis, you want to know what allocation site is consuming a shard's 16 MB. A small patch adding a per-allocation site tag (one byte in the back-pointer header is enough — there are only six callsites of bufferpool_alloc* against shm_bufferpool) and a periodic histogram dump would tell you whether you're filling shards with stream stats, SSRC entries, or per-PT entries. That information narrows where, if at all, the per-call footprint could be reduced.
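
If someone did want to run the logging experiment from step 1, the decision rule it describes could be checked mechanically. The sketch below assumes the log lines have already been parsed into records sorted by timestamp; the `make_shard_event` struct and `looks_like_race` function are hypothetical, and the rule itself comes from the quoted analysis, which nobody in this thread has verified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One parsed "bufferpool_make_shard: ..." log line (hypothetical layout). */
struct make_shard_event {
    uint64_t t_us;     /* log timestamp in microseconds */
    unsigned long tid; /* thread that created the shard */
    unsigned int num, full, space;
};

/* True if two events from different threads fall within window_us of
 * each other and at least one of them saw a shard with free space.
 * The events must be sorted by t_us in ascending order. */
static bool looks_like_race(const struct make_shard_event *ev, size_t n,
                            uint64_t window_us)
{
    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (ev[j].t_us - ev[i].t_us > window_us)
                break; /* sorted input: later events are further away */
            if (ev[i].tid != ev[j].tid &&
                (ev[i].space > 0 || ev[j].space > 0))
                return true;
        }
    }
    return false;
}
```

A `false` result over a period in which the pin count grew would point towards genuine demand or fragmentation rather than a creation race.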

Richard Fuchs

May 13, 2026, 6:59:35 AM (6 days ago) May 13
to rtpe...@googlegroups.com
On 13/05/2026 06.16, George Diamantopoulos wrote:
> I was wrong before, in that it appears that references do increase
> after a while in a similar fashion after they've been to 0 for a
> while. Here's a graph with a single shard's references behaviour for
> reference:

That is in fact exactly what is supposed to happen. Do you see some
shards that show a drop in references, but to non-zero and that then
remain at non-zero without ever being reused?
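
When the per-shard `refs` values are being scraped into a time series anyway, that question can be answered automatically. A minimal sketch, assuming the samples for one shard are available as an array in time order; `shard_looks_stuck` and the choice of window are hypothetical and would need tuning to the scrape interval.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the newest `window` samples of one shard's reference count
 * all hold the same non-zero value, i.e. the shard has stopped draining
 * without ever reaching zero. */
static bool shard_looks_stuck(const unsigned int *refs, size_t n, size_t window)
{
    assert(refs != NULL && window > 0);
    if (n < window)
        return false; /* not enough history to judge */

    unsigned int last = refs[n - 1];
    if (last == 0)
        return false; /* drained: the shard can be reused */

    for (size_t i = n - window; i < n; i++)
        if (refs[i] != last)
            return false; /* still moving */
    return true;
}
```

The window should span more than one normal fill and drain cycle, otherwise a shard that is merely slow to drain gets flagged as well.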

Cheers

George Diamantopoulos

May 13, 2026, 7:37:33 AM (6 days ago) May 13
to Sipwise rtpengine
Yes, but I can't say if it will stay that way or refs will reach zero eventually because not enough time has passed, judging from previous fill/cliff/drain cycles. A safe exception is likely shard 0, which reached a low value of 4 refs sometime around 09:00 and hasn't been reused since (or dropped to zero, it's still at 4):

refs-non-zero.png
Here's the current values for all shards, but without historical information aside from value changes in the last 30 seconds:
refs-summary.png
Cheers,
George

Alex Balashov

May 13, 2026, 7:46:39 AM (6 days ago) May 13
to rtpe...@googlegroups.com
Let's save the slop for another place and time. 

Sent from mobile, apologies for brevity and errors.

On May 13, 2026, at 6:47 AM, George Diamantopoulos <georg...@gmail.com> wrote:



Richard Fuchs

May 13, 2026, 8:02:47 AM (6 days ago) May 13
to rtpe...@googlegroups.com
On 13/05/2026 07.37, George Diamantopoulos wrote:
> Yes, but I can't say if it will stay that way or refs will reach zero
> eventually because not enough time has passed, judging from previous
> fill/cliff/drain cycles. A safe exception is likely shard 0, which
> reached a low value of 4 refs sometime around 09:00 and hasn't been
> reused since (or dropped to zero, it's still at 4):

Yes, that one is known and will see some improvement in the future.

Maybe it all is just a matter of memory fragmentation as you seem to be
a heavy user 😁

Will see how that can be improved, unless you can determine that some
shards (other than the first one) do in fact never see their references
go to zero.

Cheers

George Diamantopoulos

May 13, 2026, 8:21:07 AM (6 days ago) May 13
to Sipwise rtpengine
Got it, thanks. I'll upgrade all instances to my custom build with the bufferpool statistics patch so that I have statistics if/when the issue manifests again.

Would it be safe to increase MAX_SHM_AREAS to, say, 24 or 32?

Cheers,
George

Richard Fuchs

May 13, 2026, 8:42:49 AM (6 days ago) May 13
to rtpe...@googlegroups.com
On 13/05/2026 08.21, George Diamantopoulos wrote:
> Got it, thanks. I'll upgrade all instances to my custom build with the
> bufferpool statistics patch so that I have statistics if/when the
> issue manifests again.
>
> Would it be safe to increase MAX_SHM_AREAS to, say, 24 or 32?

Yes, shouldn't be a problem at all, that's why it's a define 😁

Cheers

Richard Fuchs

May 14, 2026, 1:10:26 PM (5 days ago) May 14
to rtpe...@googlegroups.com
On 13/05/2026 08.00, Richard Fuchs wrote:
> Will see how that can be improved

Git master now includes
https://github.com/sipwise/rtpengine/commit/42cd2e019879ee57bf8285be1e7c19b704e26edd
which changes the list of mappings to dynamically allocated. This means
there won't be a hard upper limit and it will grow as needed.
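
The general idea of replacing a fixed array with a list that grows on demand looks roughly like the following in userspace C. This is an illustrative sketch only, not the code from the commit: `pin_list` and `pin_list_add` are hypothetical names, and kernel code would use the kernel's own allocation functions instead of `realloc`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable list of pinned memory areas. */
struct pin_list {
    void **areas;
    size_t num; /* entries in use */
    size_t cap; /* entries allocated */
};

/* Appends an area, doubling the backing array when it is full.
 * Returns 0 on success or -1 if memory could not be allocated. */
static int pin_list_add(struct pin_list *l, void *area)
{
    assert(l != NULL);
    if (l->num == l->cap) {
        size_t ncap = l->cap ? l->cap * 2 : 4;
        void **grown = realloc(l->areas, ncap * sizeof(*grown));
        if (!grown)
            return -1; /* the old array is still valid */
        l->areas = grown;
        l->cap = ncap;
    }
    l->areas[l->num++] = area;
    return 0;
}
```

With this shape the only remaining limit is available memory, which is why a leak would now show up as steady growth in the pin count rather than as a sudden error.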

Feel free to test with it if you want.

There still could be a leak of course, so do keep an eye on how many
pins are active.

Cheers

George Diamantopoulos

May 14, 2026, 2:50:37 PM (5 days ago) May 14
to Sipwise rtpengine
Thanks for tackling this. The issue hasn't manifested since yesterday, so I don't have any updates yet.

I'll upgrade to the next 26.0 release to include that patch if it's ever merged in that branch.

In the meantime, I'm thinking that my aggressive failover policy in kamailio for failing branches may be contributing to this, if it's not a memory leak, of which there is no hard evidence so far. It may be a long shot, but maybe the conditions below cumulatively trigger the problem:
* Low quality/high CPS traffic from wholesale clients or call centres (scanning phone number ranges or similar illicit behaviour)
* Downstream carriers replying erroneously with 5xx codes instead of 4xx/6xx for unallocated or invalid numbers
* Each 5xx causing a new branch receiving 5xx again, resulting in 2-8 failover branches per incoming INVITE
* delete-delay of 30 seconds applies globally, filling shards up quickly

If that's the case, it might help if I do a delete with delay=0 for those serial forking scenarios, since I do rtpengine_manage anew in the post-failure branch route again. I'll give it a shot if you think it's plausible.

Additionally, I did find one call with a duration of 2 days on one of the rtpengine instances yesterday (although not one that manifested the issue). Isn't there a universal timeout for sessions, after which sessions are killed anyway? I'll try to look more into this if/when I have more examples in the future, although there doesn't seem to be enough 'ghost' calls to deplete a significant enough number of SHM shards.

BR,
George

Richard Fuchs

May 14, 2026, 4:15:33 PM (5 days ago) May 14
to rtpe...@googlegroups.com
On 14/05/2026 14.50, George Diamantopoulos wrote:
> Thanks for tackling this. The issue hasn't manifested since yesterday,
> so I don't have any updates yet.
>
> I'll upgrade to the next 26.0 release to include that patch if it's
> ever merged in that branch.
I do plan to backport it but I was going to hold off until confirmation
that there isn't a lingering reference leak.
> ...
>
> If that's the case, it might help if I do a delete with delay=0 for
> those serial forking scenarios, since I do rtpengine_manage anew in
> the post-failure branch route again. I'll give it a shot if you think
> it's plausible.

These sound in fact exactly like the conditions needed to trigger this.

It's quite easy to reproduce it with artificial call scenarios: lots and
lots of short-lived calls (or branches), with the occasional long-lived
one. What exactly counts as "long-lived" depends on how quickly the
offers are coming through.

A shorter delete-delay can mitigate it, but if the call scenarios are
unusual enough, it might still happen.
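
The effect can be illustrated with a toy simulation. Everything in it is an assumption made for the sake of the example: one allocation per call, one call per tick, a fixed number of slots per shard, and the names `simulate`, `CAP`, `MAX_SHARDS` and `MAX_HOLD`. It models the recycle-when-references-reach-zero rule discussed earlier in the thread and nothing else about rtpengine.

```c
#include <assert.h>
#include <stdbool.h>

#define CAP 100       /* allocations per shard (toy value) */
#define MAX_SHARDS 64 /* simulation bound, unrelated to MAX_SHM_AREAS */
#define MAX_HOLD 256  /* largest supported hold time in ticks */

struct sim_shard {
    unsigned int used; /* slots handed out since the last recycle */
    unsigned int refs; /* slots still referenced by a live call */
};

/* One call arrives per tick. Short calls are released `hold` ticks later;
 * every `long_every`-th call (0 = none) is never released.
 * Returns how many shards had to be created. */
static unsigned int simulate(unsigned int calls, unsigned int hold,
                             unsigned int long_every)
{
    struct sim_shard shards[MAX_SHARDS] = {{0, 0}};
    unsigned int owner[MAX_HOLD] = {0}; /* shard index + 1 per short call */
    unsigned int num = 1, cur = 0;

    assert(hold > 0 && hold <= MAX_HOLD);

    for (unsigned int t = 0; t < calls; t++) {
        unsigned int slot = t % hold;

        /* Release the short call that arrived `hold` ticks ago, if any. */
        if (owner[slot])
            shards[owner[slot] - 1].refs--;

        if (shards[cur].used == CAP) {
            /* Current shard is full: recycle a drained one or add one. */
            unsigned int i = 0;
            while (i < num && shards[i].refs != 0)
                i++;
            if (i == num) {
                if (num == MAX_SHARDS)
                    return num; /* simulation bound reached */
                num++;
            }
            shards[i].used = 0;
            cur = i;
        }

        shards[cur].used++;
        shards[cur].refs++;

        bool is_long = long_every != 0 && t % long_every == long_every - 1;
        owner[slot] = is_long ? 0 : cur + 1;
    }
    return num;
}
```

In this model, `simulate(10000, 30, 0)` gets by with two shards that alternate, while `simulate(10000, 30, 250)` ends up with more than 16, because every long-lived call permanently pins the shard it happened to land in. Reducing `hold` does not change that, since the pinned shards are held by the long-lived calls, not by the short ones.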

> Additionally, I did find one call with a duration of 2 days on one of
> the rtpengine instances yesterday (although not one that manifested
> the issue). Isn't there a universal timeout for sessions, after which
> sessions are killed anyway?

Only if enabled! 😁

The config option is "final-timeout".

Cheers
