Temporal scalability


Renaud GHIA

Jun 3, 2010, 5:37:16 PM
to webm-d...@webmproject.org
I could not find much information about temporal scalability in VP8. I am not asking about temporal scalability on the encoder side, but about the ability to drop encoded frames for multi-party video conferencing. I have read that "Through use of the golden frame, normal frames, and droppable frames, VP8 achieves four levels of limited temporal scalability. This means we can produce a single bitstream that degrades as needed for each party. High-def parties pay no penalty for the slower connections in the conference."
Specifically, from a single 30 fps bitstream, I want to produce several streams at 15 fps, 5 fps and 2 fps.

How can this be achieved?
Must I encode with a specific frame pattern? Which frames are droppable?

Thanks.

Renaud



John Koleszar

Jun 3, 2010, 5:53:05 PM
to Renaud GHIA, webm-d...@webmproject.org

I can point you to one example of some of the work we've done with
this here[1]. This is the documentation for one of the example
programs that gets built when you build libvpx:

[1]: http://www.webmproject.org/tools/vp8-sdk/example_vp8_scalable_patterns.html

Suman, maybe you can describe some of the work you've done with this?

John

Suman Sunkara

Jun 3, 2010, 6:24:52 PM
to Renaud GHIA, webm-d...@webmproject.org
Hello Renaud,

As you have rightly pointed out, it is possible to achieve temporal scalability in VP8 using different types of frames. The key is to have different reference frames, so that even if a frame is dropped, the other frames can still be decoded (since they do not refer to the dropped frame) without any error. To illustrate it better, look at the following scenario:

There are typically 3 different reference frames in VP8. 
a) Last Frame (L)
b) Golden Frame (G)
c) Alternate Reference Frame (A)

Consider a group of 16 frames as mentioned in [1]:

Frames               Use          Update   Notes
0                    (key frame)  LGA      Updates all 3 reference buffers.
1,3,5,7,9,11,13,15   LGA          none     These frames cannot be used as reference frames.
2,6,10,14            LGA          L        Last frame reference buffer updated.
4,12                 GA           LG       Both last and golden frame reference buffers updated.
8                    A            LGA      All reference buffers updated.
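
As a rough illustration of how one row of the pattern above might translate into per-frame encoder settings: libvpx expresses reference control as "negative" flags, naming what a frame must not reference or must not update. The sketch below only derives flag names from a use/update pair. The flag names are the ones declared in libvpx's vpx/vp8cx.h; the helper `flags_for` is hypothetical, and it assumes that a buffer is refreshed whenever its NO_UPD flag is absent, which the example program in [1] should be consulted to confirm.

```python
# Flag names as declared in libvpx's vpx/vp8cx.h. Only the names are used
# here; this sketch does not call the encoder.
NO_REF = {"L": "VP8_EFLAG_NO_REF_LAST",
          "G": "VP8_EFLAG_NO_REF_GF",
          "A": "VP8_EFLAG_NO_REF_ARF"}
NO_UPD = {"L": "VP8_EFLAG_NO_UPD_LAST",
          "G": "VP8_EFLAG_NO_UPD_GF",
          "A": "VP8_EFLAG_NO_UPD_ARF"}


def flags_for(uses, updates):
    """Hypothetical helper: list the flags that forbid everything a frame
    neither uses nor updates. Assumes an absent NO_UPD flag means 'refresh'."""
    flags = [NO_REF[b] for b in "LGA" if b not in uses]
    flags += [NO_UPD[b] for b in "LGA" if b not in updates]
    return flags


# Non-key rows of the 16-frame pattern: frames -> (uses, updates)
rows = {
    "1,3,5,...,15": ("LGA", ""),
    "2,6,10,14": ("LGA", "L"),
    "4,12": ("GA", "LG"),
    "8": ("A", "LGA"),
}
for frames, (uses, updates) in rows.items():
    print(frames, " | ".join(flags_for(uses, updates)))
```

The key frame (frame 0) is left out because it references nothing by definition and is requested separately from these flags.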

Initial  level: Decode all frames  
Level  1: Drop 1,3,5,7,9,11,13,15 frames
Level  2: Drop Level 1 frames + 2,6,10,14 frames
Level  3: Drop Level 1,2 frames + 4,12 frames
Level  4: Drop all frames except Key frame (Frame 0)

We can have 5 different levels with different frame rates. 
When level 1 frames are dropped,  level 2 frames can still be decoded without errors since no frames from level 1 are used as reference frames for level 2.
Similarly when level 2 frames are dropped, level 3 frames can be decoded and so on.
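
The "no dropped frame is ever referenced" argument can be checked mechanically. The following is a minimal sketch, not part of libvpx: it tracks which frame last wrote each of the L, G and A buffers on the encoder side and in a decoder that skips some frames, and reports kept frames whose references differ between the two. The function names `pattern16` and `broken_frames` are made up for this illustration, and only a single group of 16 frames is simulated.

```python
def pattern16(i):
    """(uses, updates) for frame i of the 16-frame group described above."""
    if i == 0:
        return "", "LGA"       # key frame: references nothing, refreshes all
    if i % 2 == 1:
        return "LGA", ""       # 1,3,5,...,15
    if i % 4 == 2:
        return "LGA", "L"      # 2,6,10,14
    if i % 8 == 4:
        return "GA", "LG"      # 4,12
    return "A", "LGA"          # 8


def broken_frames(pattern, n_frames, dropped):
    """Kept frames that directly reference a buffer whose content differs
    between the encoder and a decoder that never saw the `dropped` frames."""
    enc = {}   # buffer -> frame that last wrote it at the encoder
    dec = {}   # buffer -> frame that last wrote it at the decoder
    bad = []
    for i in range(n_frames):
        uses, updates = pattern(i)
        if i not in dropped:
            if any(enc.get(b) != dec.get(b) for b in uses):
                bad.append(i)
            for b in updates:
                dec[b] = i
        for b in updates:
            enc[b] = i
    return bad


odd = set(range(1, 16, 2))
drop_sets = [odd, odd | {2, 6, 10, 14}, odd | {2, 6, 10, 14, 4, 12},
             set(range(1, 16))]
for dropped in drop_sets:
    print(sorted(dropped), "->", broken_frames(pattern16, 16, dropped))

# Counter-example: dropping a frame that others depend on breaks decoding.
print(broken_frames(pattern16, 16, {4}))
```

Each of the four drop sets listed above yields an empty list, while dropping frame 4 alone breaks frames 5, 6 and 7, which is why frames have to be dropped level by level rather than arbitrarily.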


Suman


Suman Sunkara

Jun 3, 2010, 6:40:12 PM
to Renaud GHIA, webm-d...@webmproject.org
Hi,

Sorry for any confusion created earlier with the levels. This should give a clear picture of the frames in different levels.

Level  1:  1,3,5,7,9,11,13,15 frames
Level  2:  2,6,10,14 frames
Level  3:  4,12 frames
Level  4:  Frame 8
Level  5:  Frame 0

We can have 5 different levels with different frame rates. 
When level 1 frames are dropped, frames from level 2 and higher can still be decoded without errors, since no level 1 frames are used as reference frames.
Similarly, when level 1 and level 2 frames are dropped, frames from level 3 and higher can still be decoded, and so on.
Finally, since level 5 is a key frame, it is decoded without reference to frames from any other level.
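
To relate these levels to the frame rates asked about at the start of the thread, here is a small arithmetic sketch. It assumes the group of 16 repeats indefinitely and that the frame in position 0 of every group belongs to the top level (whether later groups start with a key frame is not stated in this thread). The names `level16` and `frame_rate` are illustrative only.

```python
from fractions import Fraction


def level16(i):
    """Level of frame i, using the level numbering given above."""
    i %= 16
    if i == 0:
        return 5
    if i % 2 == 1:
        return 1
    if i % 4 == 2:
        return 2
    if i % 8 == 4:
        return 3
    return 4


def frame_rate(source_fps, min_level):
    """Frame rate left after dropping every frame below `min_level`."""
    kept = sum(1 for i in range(16) if level16(i) >= min_level)
    return Fraction(source_fps) * kept / 16


for min_level in range(1, 6):
    print(min_level, float(frame_rate(30, min_level)))
```

Under these assumptions a 30 fps source yields 30, 15, 7.5, 3.75 and 1.875 fps, so this particular pattern gives 15 fps exactly but only approximates the 5 fps and 2 fps targets.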

Suman.

Renaud GHIA

Jun 3, 2010, 6:45:28 PM
to Suman Sunkara, webm-d...@webmproject.org
Hello Suman,

Thank you very much for your response. It's very clear now.
When we force a frame pattern at the encoder to achieve this, can the encoder always respect the target bitrate?

Renaud





Suman Sunkara

Jun 3, 2010, 9:25:40 PM
to Renaud GHIA, webm-d...@webmproject.org

Hi Renaud,

The bitrate that you can achieve also depends on the frame rate in temporal scalability. I have not tested extensively for target bitrate but on a small set of clips it worked fine. It would be helpful if you can share your results to get the momentum going forward.

Suman



Satendra

Jun 4, 2010, 3:09:40 AM
to Suman Sunkara, Renaud GHIA, webm-d...@webmproject.org
Hi Suman,

Thanks for the nice explanation; it definitely clarified how to implement temporal scaling in VP8.

Although I have not gone through the complete spec, is it required to use the same reference pattern?
From your example:
Level  1:  Use LGA.   Update none
Level  2:  Use LGA.   Update L
Level  3:  Use GA.    Update LG
Level  4:  Use A.     Update LGA
Level  5:  Use none.  Update LGA

Isn't it sufficient that higher levels do not use frames updated by lower levels, regardless of whether the buffer is L, G or A? Is there any constraint in the spec, or any compression/quality benefit?

For example:
Level  1:  Use LGA.   Update none
Level  2:  Use LGA.   Update L
Level  3:  Use GA.    Update A
Level  4:  Use G.     Update LGA
Level  5:  Use none.  Update LGA

Best Regards
Satendra
--

-----------------------------------------------------------------------------------------------------------------------------------
"We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise."  ------Larry Wall
-----------------------------------------------------------------------------------------------------------------------------------

Suman Sunkara

Jun 4, 2010, 9:44:05 AM
to Satendra, Renaud GHIA, webm-d...@webmproject.org
Hi Satendra,

It is not required to use the same pattern. The example you suggested should work perfectly fine.
However, for level 3 you could update both L and A instead of just A. This would help level 2 and lower, as they refer to the last frame reference buffer.
If it is not updated, they would refer to much earlier frames.

Level 2:  2,6,10,14 frames use LGA
Level 3:  4,12 frames update LA

Frames 6 and 14 would benefit from a better last frame reference buffer (they would refer to frames 4 and 12). Otherwise, they would refer to frames 2 and 10.
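
The effect on the last frame reference can be traced with a short simulation. This is only a sketch: it encodes Satendra's variant of the 16-frame group (level 4 uses G), lets level 3 update either "A" or "LA", and records which frame sits in the L buffer when each frame that uses L is coded. The function name `last_ref_sources` is invented for this example.

```python
def last_ref_sources(level3_updates):
    """Map each frame that uses L to the frame held in the L buffer."""
    def pattern(i):
        if i == 0:
            return "", "LGA"              # key frame
        if i % 2 == 1:
            return "LGA", ""              # level 1
        if i % 4 == 2:
            return "LGA", "L"             # level 2
        if i % 8 == 4:
            return "GA", level3_updates   # level 3: "A" or "LA"
        return "G", "LGA"                 # level 4 (frame 8)

    last = None
    sources = {}
    for i in range(16):
        uses, updates = pattern(i)
        if "L" in uses:
            sources[i] = last
        if "L" in updates:
            last = i
    return sources


only_a = last_ref_sources("A")
l_and_a = last_ref_sources("LA")
print(only_a[6], only_a[14])    # 2 10
print(l_and_a[6], l_and_a[14])  # 4 12
```

With the L update added at level 3, the reference distance for frames 6 and 14 shrinks from 4 frames to 2, and the odd frames 5 and 13 get a reference 1 frame away instead of 3.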

It is also possible to change the frame rate by varying the number of levels.
For example, consider a group of 9 frames as indicated in [1].

Level  1:  Use LGA.  Update L    (1,2,4,5,7,8)
Level  2:  Use GA.   Update LG   (3,6)
Level  3:  Use A.    Update LGA  (9,18,..). Frame 0 should be a key frame.

Here, 2 out of every 3 frames (level 1) can be dropped, which yields a different frame rate.
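
For a conference server forwarding a reduced stream, the 9-frame pattern boils down to a filter on the frame index. The sketch below assumes the pattern repeats every 9 frames exactly as listed; `level9` and `kept_frames` are hypothetical names, not libvpx functions.

```python
def level9(i):
    """Level of frame i in the repeating 9-frame pattern."""
    i %= 9
    if i == 0:
        return 3      # 0, 9, 18, ...
    if i % 3 == 0:
        return 2      # 3, 6
    return 1          # 1, 2, 4, 5, 7, 8


def kept_frames(n_frames, min_level):
    """Indices a forwarder would pass on when dropping levels below min_level."""
    return [i for i in range(n_frames) if level9(i) >= min_level]


print(kept_frames(19, 2))   # every 3rd frame
print(kept_frames(19, 3))   # every 9th frame
```

From a 30 fps source this keeps 10 fps when level 1 is dropped and 30/9 (about 3.3) fps when levels 1 and 2 are dropped.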


Suman

alt250

Jun 15, 2010, 7:27:37 AM
to WebM Discussion
Hi Suman,

I did some tests with the first pattern from [1] (group of 16 frames)
and the quality degrades on frames 0, 4, 8 and 12.
This seems to come from these frames not being based on the same
references as the previous frame.
The result is poor visual quality, as the video quality is constantly
changing.

The second pattern (group of 9 frames) performs better, but has fewer
scalability levels.

Are there some "patterns" to avoid to preserve visual quality?
Do you have any tips to avoid the quality drops in the first pattern?

I noticed that in [1] the code for the second pattern is changing
encoder configuration on the fly (target bitrate & quantizers).
Is this something supported by the current VP8 encoder?
Was it added to improve the visual quality?

Denis

[1] http://www.webmproject.org/tools/vp8-sdk/example_vp8_scalable_patterns.html


Suman Sunkara

Jun 16, 2010, 5:48:12 PM
to alt250, WebM Discussion
Hi Denis,

This problem is more apparent in high motion clips, as frames 0,4,8,12 refer to frames whose content is totally different from the one being encoded. This can result in bad video quality for the higher level frames. As the number of scalable levels decreases, the frames have better references and hence are encoded at better quality. This is why the second scalability group (with 9 frames) performs better. To preserve visual quality, we could allocate more bits to the higher level frames and relatively fewer bits to the lower level frames, while keeping the overall bit rate at the desired target. The second pattern uses a similar scheme to avoid drastic changes in video quality by distributing more bits to higher level frames.
This was done to improve visual quality. A better implementation should be able to change the quantization and bit rate parameters on the fly based on the scalability levels, resolution, target bitrate and the motion (low, medium, high). A trial and error scheme with a few clips at different resolutions and bitrates might help in building a lookup table for setting the quantization and bitrate parameters on the fly.
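
One way to picture "more bits for higher levels at the same overall bitrate" is a weighted split of each group's bit budget. The sketch below is only arithmetic: the weights are hypothetical, untuned values chosen for illustration, and the function `bits_per_frame` does not correspond to anything in the example program in [1].

```python
def bits_per_frame(target_kbps, fps, frames_per_level, weights):
    """Split one group's bit budget so that a frame at level k receives
    weights[k] shares, while the group total still matches the target."""
    group_len = sum(frames_per_level.values())
    group_bits = target_kbps * 1000 * group_len / fps
    shares = sum(frames_per_level[k] * weights[k] for k in frames_per_level)
    return {k: group_bits * weights[k] / shares for k in frames_per_level}


# 9-frame pattern: 6 level-1 frames, 2 level-2 frames, 1 level-3 frame.
frames_per_level = {1: 6, 2: 2, 3: 1}
weights = {1: 1.0, 2: 2.0, 3: 4.0}   # hypothetical weights, not tuned values
alloc = bits_per_frame(300, 30, frames_per_level, weights)
for level, bits in alloc.items():
    print(level, round(bits))
```

In practice the weights would come from the kind of trial and error lookup table described above, indexed by resolution, target bitrate and motion.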

Suman

Suman Sunkara

Jul 29, 2011, 12:18:11 PM
to daniel m, WebM Discussion, agrange
Hi Daniel,

The quality of the scalable pattern will always be lower than in the normal case, since it cannot reference/update all frames. However, the goal is to get output without much degradation. Can you tell us which parameters and clip you have been using for the scalable pattern? You can try changing the GOP size/bitrate/quality for different layers and see if that helps. This is still a work in progress and we are working on making it better.

Suman

On Wed, Jul 27, 2011 at 2:53 PM, daniel m <dan0...@gmail.com> wrote:
I recently implemented a droppable pattern mentioned above but the
quality of the resulting decoded frames (no levels dropped) is
terrible. If I do not encode with the pattern, the quality is much,
much better. Is there any way of making the scalable pattern's
quality the same as the normal encoder's? I gave the higher levels
higher target bitrates and lower quantizer numbers but the quality was
still bad.

Here is the scalable pattern I used:

Level  1:  Use LGA.  Update L  (1,2,4,5,7,8)
Level  2:  Use GA.    Update LG         (3,6)
Level  3:  Uses  A,   Update LGA  (9,18,..) . Frame 0 should use Key
