PSA: VP8 Simulcast in the SFU.

Philip Eliasson

Oct 20, 2017, 4:32:19 AM

to discuss-webrtc

Following up on various past requests about how to handle VP8 simulcast streams correctly in an SFU, we would like to share three guidelines for handling picture IDs. WebRTC has no notion of simulcast when receiving a video stream, so it is critical that any switch made by the SFU be completely transparent to the receiver.

1) When sending an RTP stream without picture ids, the SFU has to rewrite the RTP sequence numbers so that the stream looks continuous, and rewrite the RTP timestamps so that there is no significant jump compared to the last frame. The timestamp must not jump backward.

2) If picture ids are used, besides following the steps outlined in 1), the SFU also has to rewrite the picture ids to make them continuous. Note that it is not possible to suddenly drop picture ids from the stream: once picture ids are used, the receiver only keeps state relevant to picture ids and therefore won't fall back to RTP sequence numbers.

3) If picture ids, tl0 picture indexes and temporal indexes are used, besides following the steps outlined in 2), the SFU also has to rewrite the tl0 picture indexes. Note that it is possible to drop the tl0 picture indexes and the temporal indexes from the stream, but if they are used again the tl0 picture index has to continue from where it left off. Also note that the receiver only updates its tl0 picture index state when both the tl0 picture index and the temporal index are received.
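
For illustration only, the three guidelines could be sketched roughly as below. This is not taken from any SFU or from WebRTC itself: `Packet`, `SimulcastRewriter`, `out_ssrc` and `ts_gap` are hypothetical names, packets are assumed to be already parsed and forwarded in order, the 15-bit PictureID form is assumed, and the fixed timestamp step on a switch is a simplification (a real SFU would more likely align streams by capture time).

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD = 1 << 16    # RTP sequence numbers are 16 bits
TS_MOD = 1 << 32     # RTP timestamps are 32 bits
PICID_MOD = 1 << 15  # VP8 PictureID, assuming the 15-bit form
TL0_MOD = 1 << 8     # VP8 TL0PICIDX is 8 bits


@dataclass
class Packet:  # hypothetical, already-parsed packet
    ssrc: int
    seq: int
    timestamp: int
    picture_id: Optional[int] = None
    tl0picidx: Optional[int] = None


class SimulcastRewriter:
    """Maps whichever simulcast stream is forwarded onto one continuous output."""

    def __init__(self, out_ssrc: int, ts_gap: int = 3000):
        self.out_ssrc = out_ssrc
        self.ts_gap = ts_gap  # assumed timestamp step across a switch (90 kHz ticks)
        self.src_ssrc: Optional[int] = None
        self.d_seq = self.d_ts = self.d_pid = self.d_tl0 = 0
        self.last_seq: Optional[int] = None
        self.last_ts = 0
        self.last_pid: Optional[int] = None  # kept even while ids are absent
        self.last_tl0: Optional[int] = None  # so numbering resumes where it left off

    def forward(self, p: Packet) -> Packet:
        if p.ssrc != self.src_ssrc:
            # Source stream changed: recompute offsets so the output continues
            # directly after the last values sent to the receiver.
            self.src_ssrc = p.ssrc
            if self.last_seq is not None:
                self.d_seq = (self.last_seq + 1 - p.seq) % SEQ_MOD
                self.d_ts = (self.last_ts + self.ts_gap - p.timestamp) % TS_MOD
            if p.picture_id is not None and self.last_pid is not None:
                self.d_pid = (self.last_pid + 1 - p.picture_id) % PICID_MOD
            if p.tl0picidx is not None and self.last_tl0 is not None:
                self.d_tl0 = (self.last_tl0 + 1 - p.tl0picidx) % TL0_MOD
        out = Packet(self.out_ssrc,
                     (p.seq + self.d_seq) % SEQ_MOD,
                     (p.timestamp + self.d_ts) % TS_MOD)
        self.last_seq, self.last_ts = out.seq, out.timestamp
        if p.picture_id is not None:
            out.picture_id = (p.picture_id + self.d_pid) % PICID_MOD
            self.last_pid = out.picture_id
        if p.tl0picidx is not None:
            out.tl0picidx = (p.tl0picidx + self.d_tl0) % TL0_MOD
            self.last_tl0 = out.tl0picidx
        return out
```

The sketch assumes the first packet after a switch starts a keyframe on the base layer, so "last value plus one" is the right continuation for both the picture id and the tl0 picture index.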

Sergio Garcia Murillo

Oct 20, 2017, 4:50:52 AM

to discuss...@googlegroups.com

Hi Philip,

Why should we mandate pict id rewriting at all? I think there was broad agreement in the discussion on this issue:

https://bugs.chromium.org/p/webrtc/issues/detail?id=7897

TL0PICIDX gives you full dependency tracking of temporal level 0 frames, and thus long-term decodability.  As long as a TL0 frame's TL0PICIDX is one more than the previous TL0 frame's TL0PICIDX, it's decodable.

For short-term (for higher temporal level pictures between the tl0 frames), there are three things that can tell you a frame is definitely decodable:

1. If you have RTP sequence number continuity since the last TL0 frame.
2. If you have picture ID continuity since the last TL0 frame or frame.
3. If a frame has the Y bit set and has the same TL0PICIDX as the last TL0 frame.

This isn't complete -- it's possible to have a frame be decodable even when none of these three things is true -- but I believe it's the best that can be done.
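
One possible reading of these rules, written out as a small sketch. It assumes, as discussed later in the thread, that a higher-layer frame is accepted if *any* of the three conditions holds. `Frame` and `DecodabilityTracker` are hypothetical names, frames are assumed to arrive in order and already assembled, the 15-bit PictureID form is assumed, and the `keyframe` flag is an addition of the sketch so that there is a starting point.

```python
from dataclasses import dataclass
from typing import Optional

SEQ_MOD, PICID_MOD, TL0_MOD = 1 << 16, 1 << 15, 1 << 8


@dataclass
class Frame:  # hypothetical per-frame summary built by the receiver
    first_seq: int
    last_seq: int
    picture_id: int
    tl0picidx: int
    tid: int              # temporal layer index, 0 = base layer
    y_bit: bool = False   # layer sync bit
    keyframe: bool = False


class DecodabilityTracker:
    def __init__(self) -> None:
        self.prev: Optional[Frame] = None      # previously received frame
        self.last_tl0: Optional[Frame] = None  # last decodable TL0 frame
        self.seq_chain = False  # seq continuity since the last TL0 frame
        self.pid_chain = False  # picture id continuity since the last TL0 frame

    def on_frame(self, f: Frame) -> bool:
        prev = self.prev
        seq_cont = prev is not None and f.first_seq == (prev.last_seq + 1) % SEQ_MOD
        pid_cont = prev is not None and f.picture_id == (prev.picture_id + 1) % PICID_MOD
        self.prev = f
        if f.tid == 0:
            # Long-term rule: TL0PICIDX must be one more than the previous TL0 frame's.
            ok = f.keyframe or (
                self.last_tl0 is not None
                and f.tl0picidx == (self.last_tl0.tl0picidx + 1) % TL0_MOD
            )
            self.last_tl0 = f if ok else None
            self.seq_chain = self.pid_chain = ok
            return ok
        self.seq_chain = self.seq_chain and seq_cont   # rule 1
        self.pid_chain = self.pid_chain and pid_cont   # rule 2
        rule3 = (
            f.y_bit
            and self.last_tl0 is not None
            and f.tl0picidx == self.last_tl0.tl0picidx
        )
        return self.seq_chain or self.pid_chain or rule3
```

As the text above says, a `False` result here does not prove the frame is undecodable; it only means none of the three sufficient conditions could be confirmed.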

Note that this has major implications for implementing PERC and other end-to-end encryption proposals, where you CAN'T rewrite the pict ids at all on the SFU.

Best regards
Sergio

--

---
You received this message because you are subscribed to the Google Groups "discuss-webrtc" group.
To unsubscribe from this group and stop receiving emails from it, send an email to discuss-webrt...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/discuss-webrtc/f5012b3b-bfdd-47f0-a781-d246223da9f0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sergio Garcia Murillo

Oct 20, 2017, 5:04:13 AM

to discuss...@googlegroups.com

Moreover, switching between simulcast streams should be done on an I-frame at the base layer, so why should we require continuity of tl0picidx at all in that case? (Obviously, after the switch all tl0picidx values would be continuous again.)
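
To make "switch on an I-frame" concrete, here is a minimal sketch of how an SFU might recognise the first packet of a VP8 keyframe from the RTP payload, following the payload descriptor layout in RFC 7741. The function name is hypothetical, and the input is assumed to be the RTP payload only (RTP header and header extensions already stripped).

```python
def is_vp8_keyframe_start(payload: bytes) -> bool:
    """True if this RTP payload begins a VP8 keyframe (per the RFC 7741 layout)."""
    if not payload:
        return False
    b0 = payload[0]
    i = 1
    if b0 & 0x80:                  # X: extension byte present
        if len(payload) < 2:
            return False
        ext = payload[1]
        i = 2
        if ext & 0x80:             # I: PictureID present
            if i >= len(payload):
                return False
            i += 2 if payload[i] & 0x80 else 1  # M bit selects the 15-bit form
        if ext & 0x40:             # L: TL0PICIDX present
            i += 1
        if ext & 0x30:             # T or K: TID/Y/KEYIDX byte present
            i += 1
    # S bit set and partition index 0 mark the first packet of a frame.
    starts_frame = bool(b0 & 0x10) and (b0 & 0x07) == 0
    if not starts_frame or i >= len(payload):
        return False
    # Lowest bit of the VP8 payload header is P: 0 means keyframe.
    return (payload[i] & 0x01) == 0
```

An SFU following this approach would keep forwarding the old stream until this returns true for a packet of the target stream, and switch at that packet.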

Best regards
Sergio

Iñaki Baz Castillo

Oct 20, 2017, 6:40:17 AM

to discuss...@googlegroups.com

On 20 October 2017 at 10:32, 'Philip Eliasson' via discuss-webrtc
<discuss...@googlegroups.com> wrote:
> Following-up on various requests in the past about how to handle VP8
> simulcast streams correctly in a SFU, we would like to share three
> guidelines for handling picture IDs.

It seems to me that these guidelines are based on Chrome > M56
behaviour. The question is: does the VP8 RFC state the same
constraints for a VP8 receiver?

IMHO we need something more robust than just how Chrome decides to
implement stuff in each release.

--
Iñaki Baz Castillo
<i...@aliax.net>

Gustavo García

Oct 20, 2017, 9:23:03 AM

to discuss...@googlegroups.com

Thank you very much Philip, I think it is a great summary of the situation.

@sergio
1/ If the SFU is dropping frames (temporal scalability) and you don't rewrite PictureIds, there are cases where rule 2 in your description won't hold, so the frame won't be decoded even though it is actually decodable.
2/ tl0picidx has to be rewritten even if you switch ssrcs on a keyframe boundary, because that keyframe could be lost: https://twitter.com/anarchyco/status/882906846528516096

Sergio Garcia Murillo

Oct 20, 2017, 9:43:30 AM

to discuss...@googlegroups.com

On 20/10/2017 15:22, Gustavo García wrote:

> 1/ If the SFU is dropping frames (temporal scalability) and you don't rewrite PictureIds, there are cases where rule 2 in your description won't hold, so the frame won't be decoded even though it is actually decodable.

They were not my rules but Jonathan Lennox's. ;)

Could you give an example of one of those cases?

> 2/ tl0picidx has to be rewritten even if you switch ssrcs on a keyframe boundary, because that keyframe could be lost: https://twitter.com/anarchyco/status/882906846528516096

In your image neither the picId nor the rtp seq num would be continuous, so it will not be decoded, right?

Best regards
Sergio

Boris Grozev

Oct 20, 2017, 12:51:06 PM

to discuss...@googlegroups.com, Sergio Garcia Murillo

Hi Sergio,

Can you clarify the semantics of the rules please? The only way to
interpret them that makes sense to me is that if *any* of the three
conditions holds, then the frame is decodable (since we want frames to be
decodable even in the absence of rtp continuity).

And in that case Gustavo's example is valid -- according to the rules
the frame is decodable (rule 3), but it shouldn't be.

I wonder if this problem won't be better solved by introducing some
limitations on the senders' side. For example:

* If PictureID is used, all simulcast streams must use the same
numbering (i.e. the same input frame will have the same PictureID in all
streams).
* If tl0picidx is used, all simulcast streams must use the same numbering.

Since streams can be paused, the numbering follows the numbering on the
base simulcast stream. And it may also be worth describing how streams
are allowed to be paused -- e.g. if a stream is paused all streams
"above" it are also paused (that is, it is not allowed to pause the 180p
stream but continue sending the 360p stream).
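
As a rough illustration of the two sender-side constraints proposed above (these are a proposal in this thread, not something any sender is known to enforce), the checks could look like the sketch below. Both function names and the shape of their inputs are hypothetical.

```python
from typing import Dict, Mapping, Sequence


def pause_state_allowed(active: Sequence[bool]) -> bool:
    """active[i] tells whether simulcast stream i is being sent, ordered from
    lowest to highest resolution. A paused stream requires every stream
    above it to be paused as well."""
    paused_below = False
    for is_active in active:
        if not is_active:
            paused_below = True
        elif paused_below:
            return False
    return True


def shared_numbering_ok(ids_by_stream: Mapping[str, Mapping[int, int]]) -> bool:
    """ids_by_stream[stream][input_frame_no] is the PictureID (or tl0picidx)
    the sender used for that input frame on that stream. The same input frame
    must carry the same value on every stream that encodes it."""
    seen: Dict[int, int] = {}
    for ids in ids_by_stream.values():
        for frame_no, value in ids.items():
            if seen.setdefault(frame_no, value) != value:
                return False
    return True
```

For example, `pause_state_allowed([False, True, True])` is false because the lowest stream is paused while higher ones continue, which is the 180p/360p case described above.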

Regards,
Boris

Sergio Garcia Murillo

Oct 20, 2017, 3:48:08 PM

to Boris Grozev, discuss...@googlegroups.com

On 20/10/2017 18:50, Boris Grozev wrote:
> Can you clarify the semantics of the rules please? The only way to
> interpret them that makes sense to me is that if *any* of the three
> conditions holds, then the frame is decodable (since we want frames to
> be decodable even in the absence of rtp continuity).

Yes, that is also my understanding.

>
> And in that case Gustavo's example is valid -- according to the rules
> the frame is decodable (rule 3), but it shouldn't be.

I don't think Gustavo's example matches rule 3, but I acknowledge that
there could be some corner cases in which this could happen (and that
could be prevented by detecting it at the SFU and requesting a new
I-frame).

Please note that my intention is not to prevent anyone from rewriting
pic ids and tl0picidx if they are in an environment that allows it.
What I am trying to avoid is forcing everyone to always do it, which
introduces an artificial constraint that will make video freeze in most
of the situations in which pict id rewriting is not needed at all.

Best regards
Sergio

Boris Grozev

Oct 20, 2017, 4:58:19 PM

to Sergio Garcia Murillo, discuss...@googlegroups.com

On 20/10/2017 14:47, Sergio Garcia Murillo wrote:
> I don't think Gustavo's example matches rule 3,

Oh, you are right. What if the third packet has tl0picidx=0? That's how
I originally (mis)interpreted the example.

Regards,
Boris

Sergio Garcia Murillo

Nov 10, 2017, 7:44:10 AM

to discuss...@googlegroups.com

Hi Philip,

I have created a patch for the VP8 reference frame implementation that checks seq num continuity in order to discard missing frames when there is a pict id discontinuity:

https://webrtc-review.googlesource.com/c/src/+/18460

I added test cases to cover this scenario and to ensure that it doesn't cause any regression with current behavior. This would make rewriting pict ids unnecessary. Could you review it and let me know what you think?
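
The idea, in rough form (this is only a sketch of the approach described above, not the code in the patch; the function name and parameters are hypothetical and the 15-bit PictureID form is assumed):

```python
SEQ_MOD = 1 << 16    # RTP sequence numbers are 16 bits
PICID_MOD = 1 << 15  # VP8 PictureID, assuming the 15-bit form


def missing_frames_between(prev_pid: int, prev_last_seq: int,
                           pid: int, first_seq: int) -> int:
    """Number of frames to treat as missing between two frames received
    one after the other."""
    pid_gap = (pid - prev_pid) % PICID_MOD - 1
    if pid_gap <= 0:
        return 0  # picture ids are consecutive (or repeated)
    if first_seq == (prev_last_seq + 1) % SEQ_MOD:
        # No RTP packet was lost, so the jump in picture ids comes from the
        # numbering itself (e.g. a simulcast switch), not from lost frames.
        return 0
    return pid_gap
```

With sequence number continuity, a jump in picture id is therefore not treated as loss, so the receiver does not wait for frames that were never sent.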

For tl0picidx rewriting, I will work on it next week, but I feel that all but quite extreme corner cases could be covered as well without requiring it to be rewritten.

Best regards
Sergio

Iñaki Baz Castillo

Oct 12, 2018, 8:13:30 AM

to discuss...@googlegroups.com

Hi, it would be great to know whether something like 2) and 3) must also
be performed by the SFU when it's H264 with simulcast. Or is it enough to
just rewrite RTP seqs and timestamps?
