Temporal / Spatial scalability decode



Ehren Jarosek

Oct 31, 2013, 10:54:11 AM
to apps-...@webmproject.org
I'm trying to understand how the temporal / spatial encodings are decoded.  My understanding of H.264's SVC mode is that sub-streams combine to produce the full-quality stream.

i.e. (temporal scalability example)
Stream 1 - Frame 0, 6, 12, 18, 24
Stream 2 - Frame 2, 4, 8, 10, 14, 16, 20, 22, 26, 28
Stream 3 - Frame 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29

So, you can receive Stream 1 at 5fps
If you want 15fps you receive Stream 1 and Stream 2
If you want 30fps you receive Stream 1, Stream 2 and Stream 3

Is this how the VP8 temporal encoder works or instead does each stream produce all the data for that quality level?

Stream 1 - Frame 0, 6, 12, 18, 24
Stream 2 - Frame 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28
Stream 3 - Frame 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29

I guess my question is: from a decoder perspective, for a temporal stream, do I pass in all streams' data to decode 30fps, or just the best stream (Stream 3)?  If the former, how do you indicate to the decoder which stream the current packet belongs to, or is that information built into the sub-stream itself?

Hopefully this makes sense.  Thank you in advance,

Ehren Jarosek

Oct 31, 2013, 2:00:14 PM
to apps-...@webmproject.org
Responding to myself, as I've partially figured out the answer:

What I missed in the vp8_scalable_patterns example is that the code manually filters frames according to the stream each frame belongs to.  So, to use VP8's temporal scalability, you configure your sub-stream rates and then pick apart the encoder output, selectively sending the appropriate data for the target frame rate.  In other words, the encoder produces a single stream, but that stream can be split into lower frame rates without losing video sync.

However, my question still stands for spatial scalability.  Does VP8's spatial scalability work as a base layer plus enhancement layers, where all layers must be fed into the decoder, or is each layer independent of the others?

Adrian Grange

Oct 31, 2013, 2:04:13 PM
to apps-...@webmproject.org
Hi Ehren,

VP8 works as per the first scenario you mention, i.e. (in terms of your example):

-- a base layer consisting of the 5fps stream,
-- base + 1st enhancement layer at 15fps,
-- base + 2 enhancement layers at the full 30fps.

The idea is that the server can decide which of the 3 options to provide and potentially switch between them at keyframe boundaries.
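The layer setup Adrian describes maps onto the temporal-scalability fields of libvpx's vpx_codec_enc_cfg_t.  A hedged sketch for the 5/15/30fps pattern above; the bitrate values are placeholders, and the 6-frame period is assumed from the example rather than taken from vp8_scalable_patterns.c:

```c
#include <vpx/vpx_encoder.h>

/* Sketch: fill in the temporal-layer fields of an encoder config
   for a 3-layer, 5/15/30fps pattern on a 30fps input. */
void configure_temporal_layers(vpx_codec_enc_cfg_t *cfg)
{
    cfg->ts_number_layers = 3;

    /* Frame-rate decimators relative to the 30fps input:
       30/6 = 5fps base, 30/2 = 15fps, 30/1 = 30fps. */
    cfg->ts_rate_decimator[0] = 6;
    cfg->ts_rate_decimator[1] = 2;
    cfg->ts_rate_decimator[2] = 1;

    /* Cumulative per-layer target bitrates in kbps (placeholders). */
    cfg->ts_target_bitrate[0] = 100;
    cfg->ts_target_bitrate[1] = 250;
    cfg->ts_target_bitrate[2] = 500;

    /* Layer id of each frame within one 6-frame period:
       frame 0 -> base, odd frames -> layer 2, other evens -> layer 1. */
    cfg->ts_periodicity = 6;
    cfg->ts_layer_id[0] = 0;
    cfg->ts_layer_id[1] = 2;
    cfg->ts_layer_id[2] = 1;
    cfg->ts_layer_id[3] = 2;
    cfg->ts_layer_id[4] = 1;
    cfg->ts_layer_id[5] = 2;
}
```

A server holding the full 30fps stream can then drop frames whose layer id exceeds the receiver's target layer, which is exactly the filtering the vp8_scalable_patterns example performs.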

You can take a look at our example temporal scalability application, vp8_scalable_patterns.c, in our codebase.

Hope this helps.


Ehren Jarosek

Oct 31, 2013, 3:44:04 PM
to apps-...@webmproject.org

Thank you for the reply.  How about spatial scalability?

Layer 1: 320x240
Layer 2: 640x480
Layer 3: 1280x960

Do you need L1 and L2 to decode L3, or is each independent?  If all are needed, is there a special way they need to be fed into the decoder?


Yunqing Wang

Oct 31, 2013, 3:58:45 PM
to apps-...@webmproject.org
Hi Ehren,

In VP8, we have multiple resolution encoding (simulcasting), which generates multiple streams corresponding to different resolutions. Each stream can be decoded independently. This is not spatial scalability. Please look at vp8_multi_resolution_encoder.c for more details.


Ehren Jarosek

Oct 31, 2013, 4:14:39 PM
to apps-...@webmproject.org
That explains it.  Thank you both for the quick answers.
