VP8 temporal scalability

Gustavo García

May 20, 2013, 8:43:53 PM
to discuss...@googlegroups.com
Hi guys,

We have been playing with VP8 temporal scalability and it is really awesome!!! After enabling multiple layers in the WebRTC codebase (4 layers), Chrome & Firefox are able to process different streams (one with layers 0-1 and the other with layers 0-3).

Are there plans to enable it soon? I think most of the pieces are in place and we just need to be able to activate it with SDP and/or media constraints.

Regards,
G

Justin Uberti

May 21, 2013, 12:07:35 PM
to discuss-webrtc
Temporal scalability has an efficiency cost, and as you noted we don't have a control surface to enable it yet.

What benefit are you expecting to get from TS?



Dennis E. Dowhy

May 21, 2013, 12:11:57 PM
to discuss...@googlegroups.com
Not everyone is doing peer to peer. Many of us are using mixers/translators for multiparty chat. 

Sent from my iPhone

Justin Uberti

May 21, 2013, 12:19:24 PM
to discuss-webrtc
Sure. The question still remains - what exact benefit do you hope to obtain, and what controls do you need?

Want to understand the requirements before discussing the solution.

Vladimir Ralev

May 21, 2013, 1:35:21 PM
to discuss...@googlegroups.com
The mixer servers can have any sort of logic to decide which participants get high-quality or low-quality streams. That can be based on speaker audio power, some UI configuration, bandwidth, traffic-cost estimation, or playback-device capability (resolution, CPU). Sometimes people want to send a high-quality stream to paying customers and a low-quality stream for free as a sample. Many live TV broadcasters are adopting this, including amateur broadcasters from justin.tv/twitch.tv. It would be a dramatic reduction in costs over there.

A Google Hangouts-like application will also benefit from it, because you don't have to encode and encrypt your own stream separately for each individual participant with your own CPU. The mixer server will do this for you, and it will decide who gets high quality and low quality. While you will still receive N-1 streams from the other conference participants, you will only send one stream, as opposed to a peer-to-peer conference where you encode and encrypt your own stream N-1 times. Encoding is the most expensive operation, so reducing it by a factor of N is a big deal. TS eliminates the need to re-encode the stream in the mixer as well: it just strips the upper layers for each participant. The reciprocal support needed from the browser is simply to accept unidirectional inbound streams in various TS configurations.

Justin Uberti

May 21, 2013, 2:25:20 PM
to discuss-webrtc
Note that you are probably not going to be able to get a low+high quality stream just from TS. TS only controls framerate, and as such has limited ability to affect overall bitrate.

That's why I'm trying to get a good handle on the exact requirements.
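For context on why TS mainly trades framerate: temporal layers are assigned in a fixed hierarchical pattern, and stripping the top layer simply halves the frame rate while leaving the remaining frames untouched. A minimal sketch of the common 4-layer pattern (the function name is hypothetical, not a WebRTC API):

```cpp
// Sketch of the periodic temporal-layer assignment commonly used for
// VP8 (hierarchical pattern; e.g. 4 layers repeat 0,3,2,3,1,3,2,3).
// Dropping the top layer halves the frame rate, which is why TS mainly
// trades framerate rather than per-frame quality.
int TemporalLayerOfFrame(int frame_index, int num_layers) {
  int period = 1 << (num_layers - 1);  // 8-frame period for 4 layers
  int phase = frame_index % period;
  if (phase == 0) return 0;            // base layer
  int layer = num_layers - 1;
  while ((phase & 1) == 0) {           // count trailing zeros of phase
    phase >>= 1;
    --layer;
  }
  return layer;
}
```

With this pattern, keeping layers 0..k out of L total retains 2^k / 2^(L-1) of the frames, so each dropped layer halves the received frame rate.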

Dennis E. Dowhy

May 21, 2013, 2:42:23 PM
to discuss...@googlegroups.com
Not only would I like to see VP8's temporal scalability exposed, I would like to see VP8's simulcast encoding properties exposed too, so that we can get independent streams at different spatial resolutions. Then mixers can choose between both spatial streams and temporal layers. (Quite honestly, I wish H.264 SVC were supported, but there are about a million other 'mandatory to implement codec' threads where that discussion would be more appropriate.)

Vladimir Ralev

May 21, 2013, 2:58:26 PM
to discuss...@googlegroups.com
Someone should post benchmarks on how much of a difference TS makes for bitrate and CPU, because if it's not capable of at least a 50% reduction at some lower settings, then I am sure everyone would agree that it's not worth it.

Gustavo García

May 21, 2013, 3:09:07 PM
to discuss...@googlegroups.com
Current configuration in WebRTC codebase for 4 layers encoding is
{0.25f, 0.4f, 0.6f, 1.0f} // 4 layers {25%, 15%, 20%, 40%}

If you forward only the base layer you get 25% of the original bitrate, i.e. a 75% reduction.
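Those figures are cumulative rate fractions: forwarding layers 0..k yields fraction k of the full bitrate, and the per-layer shares in the comment are the successive differences. A small illustrative check (plain C++, not the WebRTC API):

```cpp
#include <vector>

// Cumulative target-rate fractions for 4 temporal layers, as quoted
// from the WebRTC codebase: forwarding layers 0..k of the stream
// yields kCumulative[k] of the full bitrate.
const std::vector<double> kCumulative = {0.25, 0.4, 0.6, 1.0};

// Per-layer share of the total bitrate (successive differences),
// i.e. the {25%, 15%, 20%, 40%} in the code comment.
double LayerShare(int layer) {
  return layer == 0 ? kCumulative[0]
                    : kCumulative[layer] - kCumulative[layer - 1];
}

// Fraction of the total bitrate actually sent when the mixer
// forwards layers 0..top_layer to a receiver.
double ForwardedFraction(int top_layer) { return kCumulative[top_layer]; }
```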

Gustavo García

May 21, 2013, 3:11:14 PM
to discuss...@googlegroups.com
The benefit is being able to forward different number of layers from a mixer/translator to different receivers depending on the available bandwidth of each one.
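As a sketch of that forwarding decision, using the cumulative layer-rate fractions quoted earlier in the thread (the helper name is hypothetical, not part of any WebRTC API):

```cpp
#include <vector>

// Hypothetical mixer-side helper: given a receiver's available bitrate
// and the sender's total encode bitrate, pick the highest temporal
// layer whose cumulative rate still fits. Uses the cumulative layer
// fractions {0.25, 0.4, 0.6, 1.0} quoted from the WebRTC codebase.
int HighestForwardableLayer(double receiver_bps, double total_bps) {
  const std::vector<double> cumulative = {0.25, 0.4, 0.6, 1.0};
  int best = 0;  // the base layer is always forwarded
  for (int i = 1; i < static_cast<int>(cumulative.size()); ++i) {
    if (cumulative[i] * total_bps <= receiver_bps) best = i;
  }
  return best;
}
```

For a 1 Mbps encode, a receiver with 500 kbps of headroom would get layers 0-1 (400 kbps), while a receiver with full bandwidth would get all four layers.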

The main requirement is being able to tell the sender to encode VP8 in multiple layers, including the number of layers and potentially the configuration or the dependencies between layers. This could be decided out of band (and indicated with media constraints) or negotiated during SDP offer/answer (the offerer would always indicate support for multiple layers, and the answerer would decide whether it should be used).

The receivers should be prepared to receive either single-layer video or N layers of multi-layer video, and be able to decode and present it properly in either case.

In any case it shouldn't be enabled by default, because of the efficiency cost you mentioned (I don't have the data for VP8, but it is >25% overhead in the case of H.264 SVC).

Do you think we should put together something more formal? Or perhaps move the discussion to the W3C/IETF?

Regards,
G.

Steve Mcfarlin

May 21, 2013, 3:30:49 PM
to discuss...@googlegroups.com
Even if there is not a large bitrate reduction, I still think this has value under the assumption that it has decoding-performance benefits. I can see a situation where you have a multi-party 'talk show' that is being broadcast by a mixer. Should there be simulcast streams along with the ability to choose temporal scalability levels, you could target a very wide range of devices. In my experience an iPhone 3GS cannot decode high-quality, high-frame-rate VP8 streams. With simulcast and TS levels, it could be possible to allow a 3GS to consume multiple streams.

Gustavo García

Sep 2, 2013, 11:32:28 PM
to discuss...@googlegroups.com
Hi,

I've seen a lot of new properties in the codebase apparently related to scalability:
// Experimental: Enable multi layer?
Settable<bool> video_three_layers;
// Experimental: Enable one layer screencast?
Settable<bool> video_one_layer_screencast;
// Experimental: Enable WebRTC layered screencast.
Settable<bool> video_temporal_layer_screencast;

What's the status of these capabilities? Are they enabled only in a private Google build?

How can we make some progress on this? I submitted a draft explaining the motivations and requirements of layered video coding for WebRTC [1], and we would be happy to submit a patch with experimental support if there is any chance of having it merged.

Best regards,
G.

[1] http://tools.ietf.org/html/draft-garcia-simulcast-and-layered-video-webrtc-00

