Handling sequence number reset on same SSRC

501 views
Skip to first unread message

rhollis

unread,
Dec 22, 2023, 12:10:09 PM12/22/23
to rtpengine
Hello.

I have run into the same issue noted in these other tickets, where the sequence numbers for an SSRC are being reset to a lower base after the call is taken off hold, resulting in an error at the webRTC client.

Primarily, I'm just looking to understand why this can only be handled when transcoding is enabled. I am running with in-kernel forwarding for my calls, and I am concerned with the performance impact that changing some calls to transcoding will have. Our customers are putting calls on hold frequently, so even if I only engage transcoding for these scenarios, it will effect a large portion of our traffic.

Has anyone found a good way to handle this scenario when you are trying to keep the bulk of your traffic handled within the kernel space?

Thank you!
-rhonda-

Richard Fuchs

unread,
Dec 22, 2023, 3:16:39 PM12/22/23
to rtpe...@googlegroups.com
On 22/12/2023 12.10, [EXT] rhollis wrote:
> Hello.
>
> I have run into the same issue noted in these other tickets, where the
> sequence numbers for an SSRC are being reset to a lower base after the
> call is taken off hold, resulting in an error at the webRTC client.
> https://github.com/sipwise/rtpengine/issues/673
> https://github.com/sipwise/rtpengine/issues/664
>
> Primarily, I'm just looking to understand why this can only be handled
> when transcoding is enabled. I am running with in-kernel forwarding
> for my calls, and I am concerned with the performance impact that
> changing some calls to transcoding will have. Our customers are
> putting calls on hold frequently, so even if I only engage transcoding
> for these scenarios, it will effect a large portion of our traffic.

Can you give a more concrete example of what you're seeing?

In general, sequence number resets aren't a problem, but SRTP is
sensitive to them as there's an implicitly maintained roll-over counter,
and if sender and receiver disagree about how many times the sequence
numbers have rolled over, then SRTP breaks.

Does your hold scenario involve blocking or black-holing RTP from some
source?

Transcoding can be a solution because then rtpengine itself generates
the RTP stream and so can guarantee consistent sequence numbers, instead
of just forwarding whatever sequence number the other side is sending.

Cheers

rhollis

unread,
Dec 22, 2023, 5:32:01 PM12/22/23
to rtpengine
Thanks for the quick response, Richard.

There is no blocking or black-holing of RTP in our scenario.

I've attached a log of the call flow. Summary of the flow is as follows:
- outbound call from webRTC
- Immediate reInvite to webRTC from the media server to set up call recording
At this point we have the media coming in to the rtpengine from the non-webRTC side as: (RTP seq 39735 TS 1203186969 SSRC 3570bb5e)
- webRTC puts the call on hold
- after 10 minutes, webRTC takes the call off hold (we only see this issue when the call has been on hold for more than 10 minutes, but we have been unable to determine any way to resolve this before it reaches the rtpengine)
When the call comes off hold, we now have media coming in on the same SSRC, but lower sequence number: (RTP seq 9160 TS 1208783769 SSRC 3570bb5e)
This results in no inbound audio to the webRTC client and the following error in Chrome debug logs:
20114:122639:1025/153553.972727:WARNING:srtp_session.cc(254)] Failed to unprotect SRTP packet, err=10, previous failure count: 0
20114:122639:1025/153553.972786:ERROR:srtp_transport.cc(216)] Failed to unprotect RTP packet: size=188, seqnum=9160, SSRC=896580446, previous failure count: 0

I am able to workaround the issue by changing the p-time on the reInvite when I take the call off hold, resulting in transcoding being enabled and a new SSRC. However, as noted previously, I'm concerned about what sort of performance impact this may have.

Thanks again for your help.
-r

resetSeq_rtpengine.log

Richard Fuchs

unread,
Dec 23, 2023, 8:22:39 AM12/23/23
to rtpe...@googlegroups.com
On 22/12/2023 17.32, [EXT] rhollis wrote:
> Thanks for the quick response, Richard.
>
> There is no blocking or black-holing of RTP in our scenario.
>
> I've attached a log of the call flow. Summary of the flow is as follows:
> - outbound call from webRTC
> - Immediate reInvite to webRTC from the media server to set up call
> recording
> At this point we have the media coming in to the rtpengine from the
> non-webRTC side as: (RTP seq 39735 TS 1203186969 SSRC 3570bb5e)
> - webRTC puts the call on hold
> - after 10 minutes, webRTC takes the call off hold (we only see this
> issue when the call has been on hold for more than 10 minutes, but we
> have been unable to determine any way to resolve this before it
> reaches the rtpengine)
> When the call comes off hold, we now have media coming in on the same
> SSRC, but lower sequence number: (RTP seq 9160 TS 1208783769 SSRC
> 3570bb5e)
> This results in no inbound audio to the webRTC client and the
> following error in Chrome debug logs:
> 20114:122639:1025/153553.972727:WARNING:srtp_session.cc(254)] Failed
> to unprotect SRTP packet, err=10, previous failure count: 0
> 20114:122639:1025/153553.972786:ERROR:srtp_transport.cc(216)] Failed
> to unprotect RTP packet: size=188, seqnum=9160, SSRC=896580446,
> previous failure count: 0
So now the question is, what does rtpengine think the ROC is, and what
does WebRTC think the ROC is? Sadly the crypto debug log doesn't include
this information as the stream wasn't running long enough.

Another way to inspect the ROC during runtime is to look at the
information in /proc/rtpengine while the stream is being handled by the
kernel module. Or you can capture the RTP in a pcap and then manually
verify the encryption using the SRTP keys from the log, although this
can be tricky with AEAD.

There's a chance that rtpengine and WebRTC use a different algorithm to
determine whether a rollover has occurred with large sequence jumps like
this.

Cheers

rhollis

unread,
Jan 3, 2024, 12:43:04 PM1/3/24
to rtpengine
Hi Richard.

I'm just getting back to this after a holiday break. 

I ran another test and could see that rtpengine has the ROC as 0 after the call comes off hold and the sequence number has reset. I wasn't able to find any place to identify what webRTC has the ROC as at that point, but it seems like it would be 1, given the unprotect errors from that side. In this last test, the last sequence number before the hold was 65528, and then after the hold the sequence is 33277. Is rtpengine only resetting the ROC if a packet with Sequence number 65535 is received? (I'm learning how the ROC works as I look into this issue, so thank you for being patient with my questions.)

stats: 2736864 bytes, 15912 packets, 0 errors
RTP payload type 0: 0 bytes, 0 packets
RTP payload type 8: 0 bytes, 0 packets
RTP payload type 9: 2736864 bytes, 15912 packets
SSRC in: 30a67c, 6bbe9b4
output #0
SRTP encryption parameters:
cipher: AEAD-AES-GCM-256
master key: 9245a07bc1c8354f8f119974603912b12fdf63967a998d9bde993ebcc9dd2e28
master salt: 3ecf7e90c0d14526536cd52d0000
session key: 95d8a87aa896a1e87e50da4120550e6010475082790487120b2d917b78a741bd
session salt: 5632a3fa8aa0bdbb91022a6f0000
session auth: ec82baa0336f84a220128a66eb90a6d6a5893451
ROC: 0 (49371), 0 (329), 0 (0), 0 (0)
stats: 2914564 bytes, 15503 packets, 0 errors
RTP payload type 0: 0 bytes, 0 packets
RTP payload type 8: 0 bytes, 0 packets
RTP payload type 9: 2914564 bytes, 15503 packets
RTP payload type 110: 0 bytes, 0 packets
SSRC in: 5f06bf7c
SRTP decryption parameters:
cipher: AEAD-AES-GCM-256
master key: 6e77c9a3eeeea74899283f55ccb22afb4a94dbad5206419715d8b494734492bc
master salt: 948a3d5e31324886d335e07b0000
session key: 31461718b8e530063fe66b6a7b16b0fd9e41dafa933af7536f0cbabff812595c
session salt: da80a7c28ae65f0010e357150000
session auth: 8385c3fd5d992fcb1d532eb38bcbf8c68be3fe00
ROC: 0 (19849), 0 (0), 0 (0), 0 (0)
option: rtcp-mux
option: dtls
option: stun
output #0
Jan02_rtpengine.zip

Richard Fuchs

unread,
Jan 4, 2024, 2:06:42 PM1/4/24
to rtpe...@googlegroups.com
On 03/01/2024 12.43, [EXT] rhollis wrote:
> Hi Richard.
>
> I'm just getting back to this after a holiday break.
>
> I ran another test and could see that rtpengine has the ROC as 0 after
> the call comes off hold and the sequence number has reset. I wasn't
> able to find any place to identify what webRTC has the ROC as at that
> point, but it seems like it would be 1, given the unprotect errors
> from that side. In this last test, the last sequence number before the
> hold was 65528, and then after the hold the sequence is 33277. Is
> rtpengine only resetting the ROC if a packet with Sequence number
> 65535 is received? (I'm learning how the ROC works as I look into this
> issue, so thank you for being patient with my questions.)

ROC determination follows appendix A from RFC 3711
(https://datatracker.ietf.org/doc/html/rfc3711#page-51). Essentially the
difference between sequence numbers (sans 16-bit overflow) must be
larger than 32,768 for a ROC increase to occur.

Given what you've posted there should have been a ROC increase. But: ROC
is bound to a particular SSRC. The output you posted has shows two
SSRCs. A brand new SSRC would always start with a ROC of 0 regardless of
what happened with other SSRCs.

Cheers

rhollis

unread,
Jan 4, 2024, 6:41:55 PM1/4/24
to rtpengine
The sequence number rollover is happening within the life of the second SSRC. The call flow has an immediate re-INVITE to a recording media server, which results in the new SSRC. But it's later on within that new SSRC where we see the change in the sequence numbers.

So, in the last log that I attached, it looks like this: 

21:55:39 - Initial outbound INVITE
** SSRC 6bbe9b4: receive from BroadWorks, send to webRTC **
2024-01-02T21:55:41.512577+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] New ingress SSRC for: :0 SSRC: 6bbe9b4
2024-01-02T21:55:41.512954+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] New egress (direct) SSRC for: 192.168.1.145:49743 SSRC: 6bbe9b4
2024-01-02T21:55:42.078872+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Handling packet: remote 10.103.114.192:23526 (expected: 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 28 TS 4480 SSRC 6bbe9b4)
2024-01-02T21:55:42.079000+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Forward to sink endpoint: local 208.89.108.16:20496 -> remote 68.5.242.25:49743 (RTP seq 28 TS 4480 SSRC 6bbe9b4)

** SSRC 5f06bf7c: receive from webRTC, send to BroadWorks**
2024-01-02T21:55:42.232051+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] New ingress SSRC for: 68.5.242.25:49743 SSRC: 5f06bf7c
2024-01-02T21:55:42.232509+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] New egress (direct) SSRC for: 10.103.114.192:23526 SSRC: 5f06bf7c
2024-01-02T21:55:42.232675+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Handling packet: remote 68.5.242.25:49743 (expected: 68.5.242.25:49743) -> local 208.89.108.16:20496 (RTP seq 3190 TS 3994006908 SSRC 5f06bf7c)
2024-01-02T21:55:42.232757+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Forward to sink endpoint: local 10.103.114.27:21110 -> remote 10.103.114.192:23526 (RTP seq 3190 TS 3994006908 SSRC 5f06bf7c)

2024-01-02T21:55:43.997120+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21111]: [core] New ingress SSRC for: 10.103.114.192:23527 SSRC: 6bbe9b4
2024-01-02T21:55:43.997311+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21111]: [core] Handling packet: remote 10.103.114.192:23527 (expected: 10.103.114.192:23527) -> local 10.103.114.27:21111
2024-01-02T21:55:43.997405+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21111]: [core] New egress (direct) SSRC for: 192.168.1.145:60143 SSRC: 6bbe9b4
2024-01-02T21:55:43.998968+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Handling packet: remote 10.103.114.192:23526 (expected: 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 124 TS 19840 SSRC 6bbe9b4)
2024-01-02T21:55:43.999094+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Forward to sink endpoint: local 208.89.108.16:20496 -> remote 68.5.242.25:49743 (RTP seq 124 TS 19840 SSRC 6bbe9b4)

2024-01-02T21:55:44.653666+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] New ingress SSRC for: 192.168.1.145:60143 SSRC: 5f06bf7c
2024-01-02T21:55:44.653865+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] New egress (direct) SSRC for: 10.103.114.192:23527 SSRC: 5f06bf7c
2024-01-02T21:55:44.654147+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Handling packet: remote 68.5.242.25:49743 (expected: 68.5.242.25:49743) -> local 208.89.108.16:20496 (RTP seq 3315 TS 3994026908 SSRC 5f06bf7c)
2024-01-02T21:55:44.654193+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Forward to sink endpoint: local 10.103.114.27:21110 -> remote 10.103.114.192:23526 (RTP seq 3315 TS 3994026908 SSRC 5f06bf7c)

21:55:47 - INVITE to recording
** new SSRC 30a67c: receive from BroadWorks, send to webRTC **
024-01-02T21:55:48.197049+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] New ingress SSRC for: 10.103.114.192:23526 SSRC: 30a67c
2024-01-02T21:55:48.197141+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Handling packet: remote 10.103.114.192:23526 (expected: 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 65255 TS 1072075752 SSRC 30a67c)
2024-01-02T21:55:48.197174+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] New egress (direct) SSRC for: 68.5.242.25:49743 SSRC: 30a67c
2024-01-02T21:55:48.197210+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Forward to sink endpoint: local 208.89.108.16:20496 -> remote 68.5.242.25:49743 (RTP seq 65255 TS 1072075752 SSRC 30a67c)

21:55:53 - INVITE on hold
** last packet before hold **
2024-01-02T21:55:53.658632+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Handling packet: remote 10.103.114.192:23526 (expected: 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 65528 TS 1072119432 SSRC 30a67c)
2024-01-02T21:55:53.658687+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Forward to sink endpoint: local 208.89.108.16:20496 -> remote 68.5.242.25:49743 (RTP seq 65528 TS 1072119432 SSRC 30a67c)

22:06:59 - INVITE off hold
2024-01-02T22:07:02.977260+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Handling packet: remote 10.103.114.192:23526 (expected: 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 33457 TS 1077473992 SSRC 30a67c)

Richard Fuchs

unread,
Jan 5, 2024, 9:22:44 AM1/5/24
to rtpe...@googlegroups.com
On 04/01/2024 18.41, [EXT] rhollis wrote:
> 22:06:59 - INVITE off hold
> 2024-01-02T22:07:02.977260+00:00 rtpengine02-dev-s03
> rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core]
> Handling packet: remote 10.103.114.192:23526 (expected:
> 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 33457 TS
> 1077473992 SSRC 30a67c)
Is there a matching "forward to ..." for this packet?

rhollis

unread,
Jan 5, 2024, 11:47:14 AM1/5/24
to rtpengine
Yes. This is the first packet after the call is taken off hold:

2024-01-02T22:06:59.377060+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core] Handling packet: remote 10.103.114.192:23526 (expected: 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 33277 TS 1077445192 SSRC 30a67c)
2024-01-02T22:06:59.377173+00:00 rtpengine02-dev-s03 rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core] Forward to sink endpoint: local 208.89.108.16:20496 -> remote 68.5.242.25:49743 (RTP seq 33277 TS 1077445192 SSRC 30a67c)

The full log was attached previously in a zip, but here it is again.

-r
n3uhohjsanumk2qlc51o.log

Richard Fuchs

unread,
Jan 5, 2024, 12:28:04 PM1/5/24
to rtpe...@googlegroups.com
On 05/01/2024 11.47, rhollis wrote:
> Yes. This is the first packet after the call is taken off hold:
>
> 2024-01-02T22:06:59.377060+00:00 rtpengine02-dev-s03
> rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 21110]: [core]
> Handling packet: remote 10.103.114.192:23526 (expected:
> 10.103.114.192:23526) -> local 10.103.114.27:21110 (RTP seq 33277 TS
> 1077445192 SSRC 30a67c)
> 2024-01-02T22:06:59.377173+00:00 rtpengine02-dev-s03
> rtpengine[2750852]: DEBUG: [n3uhohjsanumk2qlc51o port 20496]: [core]
> Forward to sink endpoint: local 208.89.108.16:20496 -> remote
> 68.5.242.25:49743 (RTP seq 33277 TS 1077445192 SSRC 30a67c)

Right, so I played this through the pseudo-code from RFC 3711 as well as
the actual code in rtpengine, and I have to take back what I wrote in an
earlier email. The new seq 33277 is actually closer to the previous seq
65528 without rollover, so according to the the algorithm described in
RFC 3711, the ROC should not have increased. So going by this algorithm
rtpengine is actually doing the right thing.

Now if the WebRTC side disagreed and has increased the ROC anyway, then
it must obviously be using a different algorithm (or has had some other
reason to increase the ROC). Sadly my knowledge of WebRTC code bases is
lacking. A quick glance at srtp_estimate_index() from libsrtp seems to
do the right thing, but 🤷

One solution of course would be to start rewriting the sequence numbers
in rtpengine so that we can guarantee a consistent view for the
recipient, but alas that's not a feature we have.

Cheers

rhollis

unread,
Jan 5, 2024, 1:54:48 PM1/5/24
to rtpengine
OK. I appreciate you taking the time to look at this one, and for helping me better understand how the ROC works. If I figure out what's happening on the webRTC side, I'll let you know.

-r

Reply all
Reply to author
Forward
0 new messages