Intrinsic dimensions of video; container or VP8 frame size?

Chris Pearce

Jan 23, 2011, 7:32:47 PM
to WebM Discussion
This is in regards to Firefox bug 626979:
https://bugzilla.mozilla.org/show_bug.cgi?id=626979

There's a disparity in the WebM spec between the dimensions the WebM container says it contains and the actual VP8 frame sizes contained in the video track.

From the WebM container spec http://www.webmproject.org/code/specs/container/#track
we can see that the Track element has children:

DisplayWidth: Width of the video frames to display.
DisplayHeight: Height of the video frames to display.

PixelWidth: Width of the encoded video frames in pixels.
PixelHeight: Height of the encoded video frames in pixels.

PixelCropBottom: The number of video pixels to remove at the bottom of the image (for HDTV content).
PixelCropTop: The number of video pixels to remove at the top of the image.
PixelCropLeft: The number of video pixels to remove on the left of the image.
PixelCropRight: The number of video pixels to remove on the right of the image.

I interpret this to mean that the frames you'll get out of that video track will be PixelWidth x PixelHeight, and you should crop them by PixelCrop* pixels, and then scale the cropped image to DisplayWidth x DisplayHeight and display that.

But the vpx_image_t we get when we decode a VP8 stream can in fact have a size different to PixelWidth x PixelHeight. In particular this could happen in adaptive streaming applications (see the testcase in the Firefox bug linked above for an example of this).

When the image size we get out of a VP8 stream doesn't match Pixel*, what do we do? In particular:
  1. How do we handle PixelCrop*? Should we treat PixelCrop* as a fraction relative to Pixel{Width,Height}, e.g. should we crop PixelCropLeft/PixelWidth*ActualFrameWidth pixels from the left of the image?
  2. Should the intrinsic (display) size of the video (and thus the HTML video element) remain as DisplayWidth x DisplayHeight when the VP8 stream changes size?
We were thinking the above were reasonable, and were going to run with that in Firefox 4, but I'd like to clarify that with you guys first.
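
For illustration, the proportional interpretation in question 1 could be sketched roughly as follows. This is only a sketch: `scaleCrop` and the property names on `track` are hypothetical stand-ins for the container's PixelWidth, PixelHeight and PixelCrop* elements, not part of any real API.

```javascript
// Hypothetical helper: rescale the container's PixelCrop* values to the size
// of the frame the decoder actually produced.
// `track` is an assumed plain object mirroring the container elements.
function scaleCrop(track, frameWidth, frameHeight) {
  // Ratio between the actual frame size and the size the container declares.
  const sx = frameWidth / track.pixelWidth;
  const sy = frameHeight / track.pixelHeight;
  return {
    left: Math.round(track.pixelCropLeft * sx),
    right: Math.round(track.pixelCropRight * sx),
    top: Math.round(track.pixelCropTop * sy),
    bottom: Math.round(track.pixelCropBottom * sy),
  };
}
```

With a track declared as 640x480 and a decoded frame of 320x240, every crop value would simply be halved.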

Can the wording of the spec be tightened up to make it clear that the VP8 stream can change dimensions on the fly, and to make it clear how to handle this situation?


Thanks,
Chris Pearce.

Steve Lhomme

Jan 24, 2011, 7:29:50 AM
to webm-d...@webmproject.org
On Mon, Jan 24, 2011 at 1:32 AM, Chris Pearce <ch...@pearce.org.nz> wrote:
> This is in regards to Firefox bug 626979:
> https://bugzilla.mozilla.org/show_bug.cgi?id=626979
>
> There's a disparity in the WebM spec, between what the dimensions the WebM
> container says it contains, and the actual VP8 frame sizes contained in the
> video track.
>
> From the WebM container spec
> http://www.webmproject.org/code/specs/container/#track
> we can see that the Track element has children:
>
> DisplayWidth: Width of the video frames to display.
> DisplayHeight: Height of the video frames to display.
>
> PixelWidth: Width of the encoded video frames in pixels.
> PixelHeight: Height of the encoded video frames in pixels.
>
> PixelCropBottom: The number of video pixels to remove at the bottom of the
> image (for HDTV content).
> PixelCropTop: The number of video pixels to remove at the top of the image.
> PixelCropLeft: The number of video pixels to remove on the left of the
> image.
> PixelCropRight: The number of video pixels to remove on the right of the
> image.
>
> I interpret this to mean that the frames you'll get out of that video track
> will be PixelWidth x PixelHeight, and you should crop them by PixelCrop*
> pixels, and then scale and the cropped image to DisplayWidth x DisplayHeight
> and display that.

That's correct.

> But the vpx_image_t we get when we decode a VP8 stream can in fact have a
> size different to PixelWidth x PixelHeight. In particular this could happen
> in adaptive streaming applications (see the testcase in the Firefox bug
> linked above for an example of this).

For that to happen, the stream should have a new Segment, and that
should not be supported either. Such a resolution switch should be
handled by the browser so that it can deal with the new layout.

Now for adaptive streaming the display resolution should be the max
possible for that stream. But the switch should not happen in the same
stream/file. Maybe VP8 has a feature to handle this during encoding,
but then it should be transparent on the output of the decoder.

> How do we handle PixelCrop*? Should we treat PixelCrop* as  a fraction
> relative to Pixel{Width,Height}, e.g. should we crop
> PixelCropLeft/PixelWidth*ActualFrameWidth pixels from the left of the image?
> Should the intrinsic (display) size of the video (and thus the HTML video
> element)  remain as DisplayWidth x DisplayHeight when the VP8 stream changes
> size?

PixelCrop*, as the name suggests, is in pixels, not a percentage.
DisplayWidth/DisplayHeight can be in units other than pixels in
Matroska, but I think in WebM they are always in pixels.

> We were thinking the above were reasonable, and were going to run with that
> in Firefox 4, but I'd like to clarify that with you guys first.
>
> Can the wording of the spec be tightened up to make it clear that the VP8
> stream can change dimensions on the fly, and to make it clear how to handle
> this situation?

IMO it shouldn't, or the change should be transparent.

--
Steve Lhomme
Matroska association Chairman

Hiellwenn Fullmoon

Jan 24, 2011, 9:04:11 AM
to webm-d...@webmproject.org
2011/1/24 Steve Lhomme <slh...@matroska.org>

 > > But the vpx_image_t we get when we decode a VP8 stream can in fact have a
 > > size different to PixelWidth x PixelHeight. In particular this could happen
 > > in adaptive streaming applications (see the testcase in the Firefox bug
 > > linked above for an example of this).
 >
 > For that to happen, the stream should have a new Segment and that
 > should not be supported either. Such resolution switch should be
 > handle by the browser to handle the new layout.

I write live streaming software and when I try to open a new segment, all tested players stop reading the stream.
By contrast, changing the picture size on I-frames is pretty well supported.
 
 > Now for adaptive streaming the display resolution should be the max
 > possible for that stream. But the switch should not happen in the same
 > stream/file. Maybe VP8 has a feature to handle this during encoding,
 > but then it should be transparent on the output of the decoder.

Resizing at the output of the decoder (or finding an equivalent way) seems to be a good fix as long as we do not have to deal with aspect ratio changes.
In the live streaming case, some television channels change the aspect ratio pretty often. It would be good to have a way to deal with that.
 

John Koleszar

Jan 24, 2011, 10:19:06 AM
to webm-d...@webmproject.org

The intent of this resizing is to handle the codec's spatial
resampling feature, where the encoder will encode the frame at a
smaller resolution and the decoder will upscale it to the desired
dimension. It wasn't intended to be used for making a change
visible to the user.

The libvpx decoder doesn't do the upscaling automatically, because the
application could have access to a hardware scaler or the stream could
be subject to a second scaling, and it'd be preferable to only do it
once.

So the operation you should perform should be equivalent to: scale up
to PixelWxH, crop according to PixelCrop, scale to DisplayWxH.
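
The three steps above could be sketched as plain geometry, without touching any pixels. This is a rough illustration only: `renderPlan` and the shapes of the `track` and `decoded` objects are assumptions made for the example, not libvpx or container APIs.

```javascript
// Sketch of: scale up to PixelWxH, crop according to PixelCrop, scale to
// DisplayWxH. Property names mirror the container elements; the object
// shapes themselves are hypothetical.
function renderPlan(track, decoded) {
  // Step 1: factor needed to bring the decoded frame up to the size the
  // container declares (1 when no spatial resampling was used).
  const upscale = {
    x: track.pixelWidth / decoded.width,
    y: track.pixelHeight / decoded.height,
  };
  // Step 2: crop rectangle, expressed in PixelWidth x PixelHeight coordinates.
  const crop = {
    x: track.pixelCropLeft,
    y: track.pixelCropTop,
    width: track.pixelWidth - track.pixelCropLeft - track.pixelCropRight,
    height: track.pixelHeight - track.pixelCropTop - track.pixelCropBottom,
  };
  // Step 3: the cropped picture is finally scaled to the display size.
  const display = { width: track.displayWidth, height: track.displayHeight };
  return { upscale, crop, display };
}
```

An application with a hardware scaler could presumably fold steps 1 and 3 into a single scale, which is the reason given above for the decoder not upscaling on its own.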

Steve Lhomme

Jan 24, 2011, 2:09:32 PM
to webm-discuss

This may play well in video players that can already handle (digital)
TV signals. But it may not be suitable for a web browser, even
though a browser can handle dynamic content such as text and graphics. For any
dynamic change in a web page, the browser handles an event. So at the
very least there should be an event telling the browser about a
pending change. I don't know if that exists. But at the
container/stream level that would mean sending a new Segment with new
Track information (where the frame dimensions are set).

Again, this kind of feature works in "advanced" video players that
support the feature in Matroska. But web browsers are unlikely to
handle this right now.

Philip Jägenstedt

Jan 25, 2011, 4:28:39 AM
to webm-d...@webmproject.org, John Koleszar
On Mon, 24 Jan 2011 16:19:06 +0100, John Koleszar <jkol...@google.com>
wrote:

How about a live stream where the aspect ratio changes between 4:3 and
16:9? I imagine that digital TV actually works like that. Is the correct
way to stream that as WebM to use chained WebM? Does anyone support
chained WebM yet?

--
Philip Jägenstedt
Core Developer
Opera Software

Steve Lhomme

Jan 25, 2011, 4:47:52 AM
to webm-d...@webmproject.org
My generic answer to that would be to use adaptive streaming. So the
stream is downloaded in chunks anyway.

Given that this will have to happen one way or another, the browser will
have to be able to decode and append chunks together. One can consider
chaining streams into a single stream as a special case of this.

Now changing resolution (and thus a codec/display reset) is not a
trivial thing, certainly not a smooth operation. But again, it may
happen in adaptive streams as well, for example when a lower-bitrate
version of the stream is used and so has a smaller video resolution. In that
case the codec has to be reset, but not the video/display area.


Frank Galligan

Jan 25, 2011, 9:47:15 AM
to webm-d...@webmproject.org, John Koleszar
So I'm assuming the display size will want to change when the source aspect ratio changes, i.e. video data is being streamed 16:9 at 1920x1080, then the server wants to change to a commercial at 4:3 to be shown at 1440x1080.

If we wanted the smarts to be on the server side then this could be done with chained WebM. Another approach for live is to always encode the video at display size and move the source video around within the encoded video, i.e. from the example above, when the server wants to switch to 4:3 it would add black bars at (0,0)-(239,1079) and (1680,0)-(1919,1079) and place the source video at (240,0)-(1679,1079). This case could also be handled with client-side live adaptive streaming. I know on the FOMS list we were mainly talking about VOD because that is the easier of the two. I think in the end we would want this handled by client-side live adaptive streaming, but it could take a long time to come to agreement on the layout of the streams, the manifest, and the video tag API to control the streams.
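
The black-bar coordinates in that example can be derived mechanically. A small sketch, assuming the source is as tall as the encoded frame and the width difference is even; `pillarbox` is a hypothetical helper name:

```javascript
// Hypothetical helper: center a narrower source picture inside a fixed-size
// encoded frame. Rectangles are [x0, y0, x1, y1] with inclusive corners,
// matching the notation used in the message above.
// Assumes (frameW - srcW) is even and the source fills the full height.
function pillarbox(frameW, frameH, srcW) {
  const bar = (frameW - srcW) / 2; // width of each black bar
  return {
    leftBar: [0, 0, bar - 1, frameH - 1],
    source: [bar, 0, bar + srcW - 1, frameH - 1],
    rightBar: [bar + srcW, 0, frameW - 1, frameH - 1],
  };
}
```

Calling `pillarbox(1920, 1080, 1440)` gives bars 240 pixels wide and the same three rectangles quoted above.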

So I guess the first question is: does everyone think there is a big enough need to switch display size in the middle of a live stream that we can't wait for a client-side live adaptive streaming solution to be defined and implemented? If yes, then how would people want to handle it? Chained WebM, or moving the source video within the encoded video? Or another solution?

As a side note we are still actively working on client side VOD adaptive streaming. I think in a few weeks we will have something to share with everyone to get comments on.

Frank


Frank Galligan

Jan 25, 2011, 9:54:07 AM
to webm-d...@webmproject.org
On Tue, Jan 25, 2011 at 4:47 AM, Steve Lhomme <slh...@matroska.org> wrote:
> My generic answer to that would be to use adaptive streaming. So the
> stream is downloaded in chunks anyway.

I agree.
 

> Given that will have to happen one way or another, the browser will
> have to be able to decode and append chunks together. One can consider
> chained streams into a single stream as a special case of this.
>
> Now changing resolution (and thus a codec/display reset) is not a
> trivial thing, certainly not a smooth operation. But again, it may
> happen in adaptive streams as well. For example when a lower bitrate
> of the stream is used and so has smaller video resolution. In that
> case the codec has to be reset, but not the video/display area.

From the codec side, with respect to VP8, it is not a big deal, assuming the switch happens on a key frame. VP8 does not have any setup data for a stream, so switching encoded resolutions is straightforward. We do have the issue with Vorbis, though.

Frank Galligan

Jan 25, 2011, 10:04:09 AM
to webm-d...@webmproject.org, John Koleszar
Another possible solution to the resolution change could be to set AspectRatioType to 1 (keep aspect ratio); <video> would then scale the decoded video up to display WxH while maintaining the source aspect ratio and centering the decoded video within display WxH.
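
As a rough sketch of that "keep aspect ratio" behavior, the destination rectangle could be computed like this. `fitKeepAspect` is a hypothetical name, and rounding to whole pixels is an assumption of the example:

```javascript
// Hypothetical helper: scale a decoded frame to fit inside the display box
// without changing its aspect ratio, and center it within the box.
function fitKeepAspect(srcW, srcH, boxW, boxH) {
  // Use the smaller of the two scale factors so the whole frame stays visible.
  const scale = Math.min(boxW / srcW, boxH / srcH);
  const width = Math.round(srcW * scale);
  const height = Math.round(srcH * scale);
  return {
    x: Math.round((boxW - width) / 2), // horizontal offset of the picture
    y: Math.round((boxH - height) / 2), // vertical offset of the picture
    width,
    height,
  };
}
```

A 720x540 (4:3) frame in a 1920x1080 box would come out as 1440x1080 at x = 240, while a 960x540 (16:9) frame would fill the box exactly.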

Frank

Sylvain Gadrat

Jan 25, 2011, 11:35:33 AM
to webm-d...@webmproject.org


2011/1/25 Frank Galligan <fgal...@google.com>

> Another approach for live is to encode the video always at display size
> and move the source video around within the encoded video.

It seems to be a good server-side workaround. But if the server can handle the change, the client should handle it the same way... without sending black bars over the network.

> So I guess the first question is does everyone think there is a big enough
> need to switch display size in the middle of a live stream that we can't
> wait for a client side live adaptive streaming solution to be defined and implemented?

I don't know how it is on international television channels, but in France (where I live), the aspect ratio of the video changes very often. Ads are in 16:9 while most of the content is in 4:3.
So, to stream television in WebM, we have to be able to switch the display size.

> If yes then how would people want to handle it? Chained WebM or move the
> source video within the encoded video? Or another solution?

I love the solution where the client just has to download a file and the server has to handle the adaptive bitrate part of things.
First, I had good results with adaptive bitrate based on encoding each cluster at several quality levels, then choosing which clusters to send based on the client's bandwidth.
Second, I think this model induces a lot less delay than any client-side model.
Third, because VP8 I-frames contain information about the size and VP8 is not based on many profiles, it is possible to change the quality of a stream without sending a lot of headers.
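
A minimal sketch of the server-side choice described in the first point, assuming each cluster has been encoded at several bitrates. `pickVariant` and the `bitrate` field are hypothetical, and how the client's bandwidth is measured is left out:

```javascript
// Hypothetical helper: among several encodings of the same cluster, pick the
// highest bitrate that fits the client's estimated bandwidth. Falls back to
// the lowest bitrate when nothing fits.
function pickVariant(variants, bandwidthBps) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...variants].sort((a, b) => a.bitrate - b.bitrate);
  let choice = sorted[0];
  for (const variant of sorted) {
    if (variant.bitrate <= bandwidthBps) {
      choice = variant;
    }
  }
  return choice;
}
```

The server would run something like this once per cluster, so the quality can follow the client's bandwidth at cluster granularity.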

> Another possible solution to the resolution change could be to set
> AspectRatioType to 1 (keep aspect ratio) and then <video> would
> scale the decoded video up to display WxH while maintaining source
> aspect ratio and centering the decoded video within display WxH.

I love that.
It makes it possible to change the resolution (without changing the aspect ratio) in order to change the quality of the stream. When the video is stretched to display WxH, the end user only sees the change in the quality of the display.
It also makes it possible to change the resolution because the aspect ratio changes. The new aspect ratio will be contained in the video element without distortion.

One small thing: what about PixelCrop* when the aspect ratio changes?

Frank Galligan

Jan 25, 2011, 1:04:43 PM
to webm-d...@webmproject.org
On Tue, Jan 25, 2011 at 11:35 AM, Sylvain Gadrat <sga...@gmail.com> wrote:


2011/1/25 Frank Galligan <fgal...@google.com>

> Another possible solution to the resolution change could be to set
> AspectRatioType to 1 (keep aspect ratio) and then <video> would
> scale the decoded video up to display WxH while maintaining source
> aspect ratio and centering the decoded video within display WxH.

> I love that.
> It makes it possible to change the resolution (without changing the aspect ratio) in order to change the quality of the stream. When the video is stretched to display WxH, the end user only sees the change in the quality of the display.
> It also makes it possible to change the resolution because the aspect ratio changes. The new aspect ratio will be contained in the video element without distortion.
>
> One small thing: what about PixelCrop* when the aspect ratio changes?

Unfortunately, if you had a positive PixelCrop* value on one aspect ratio and you wanted to change it on another aspect ratio, you would not be able to with the above solution. So PixelCrop* would be applied to all video frames within the Segment. I haven't seen any clips with PixelCrop* set. I think most people will crop the source before they encode. We could say as a general rule that if you are changing sizes within the same Segment then you should probably not be setting PixelCrop*.

If there was a requirement that PixelCrop* MUST be able to change on an aspect ratio change then we would need to either implement chained segments or client-side live adaptive streaming.

Steve Lhomme

Jan 25, 2011, 2:26:23 PM
to webm-discuss
On Tue, Jan 25, 2011 at 5:35 PM, Sylvain Gadrat <sga...@gmail.com> wrote:
>
>
> 2011/1/25 Frank Galligan <fgal...@google.com>
>> Another approach for live is to encode the video always at display size
>> and move the source video around within the encoded video.
> It seems to be a good server-side workaround. But if the server can handle
> the change, the client should handle it the same way... without sending
> black bars over the network.
>> So I guess the first question is does everyone think there is a big enough
>> need to switch display size in the middle of a live stream that we can't
>> wait for a client side live adaptive streaming solution to be defined and
>> implemented?
> I don't know how it is in international television channels, but in france
> (where I live), the aspect ratio of the video change very often. Adds are in
> 16:9 while most of the content is in 4:3.
> So, to stream television in WebM, we evenly have to switch the display size

But your TV screen doesn't change its resolution in the meantime.
This is the same with a <video> window. It should use the video area
specified initially in the stream. If different segments are sent it
could be the responsibility of the underlying system to stretch (and
add black bars if needed) when the resolution changes.

Philip Jägenstedt

Jan 26, 2011, 4:10:46 AM
to webm-discuss, Steve Lhomme
On Tue, 25 Jan 2011 20:26:23 +0100, Steve Lhomme <slh...@matroska.org>
wrote:

This analogy only goes so far. If width/height attributes aren't given on
the <video> element, it is made the same size as the intrinsic
width/height of the video, that is: the width/height after aspect-ratio
correction and cropping have been applied. The intrinsic size is also
exposed in the videoWidth/videoHeight properties to scripts.

What this means is that if the intrinsic size of the video changes from say
720×576 (4:3) to 1024×576 (16:9, but the physical size could be something
else) then videoWidth/videoHeight must reflect this. It would be
inconsistent if the layout size of <video> didn't change to reflect that,
and harder to implement.

So, it may be that frame size changes within the same stream (segment? my
terminology is fuzzy), but for chained WebM I think that it must be
reflected in the layout and videoWidth/videoHeight. If one doesn't want
the layout to change, then one must specify width/height attributes on
<video>. This should already be the case for chained Ogg, I believe in at
least Opera and Firefox, but I haven't double-checked.

Note that there's no resize event on the <video> element to learn that
such a change has happened, so there's still work to do here...
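
Assuming no resize notification is available, a script could approximate one by comparing videoWidth/videoHeight against the last values it saw. A rough sketch; `makeSizeWatcher` is a hypothetical helper, and calling it from a `timeupdate` listener is just one possible choice:

```javascript
// Hypothetical helper: returns a function that reports a change whenever the
// intrinsic size it is shown differs from the previous one it saw.
// `video` only needs videoWidth and videoHeight properties.
function makeSizeWatcher(onChange) {
  let lastWidth = 0;
  let lastHeight = 0;
  return function check(video) {
    if (video.videoWidth !== lastWidth || video.videoHeight !== lastHeight) {
      lastWidth = video.videoWidth;
      lastHeight = video.videoHeight;
      onChange(lastWidth, lastHeight);
    }
  };
}

// Possible use in a page (not run here):
//   var v = document.querySelector('video');
//   var check = makeSizeWatcher(function (w, h) { /* relayout */ });
//   v.addEventListener('timeupdate', function () { check(v); });
```

This only notices a change the next time the check runs, so it is a workaround rather than a substitute for a real event.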

Matthew Gregan

Jan 26, 2011, 4:26:07 AM
to webm-discuss
At 2011-01-26T10:10:46+0100, Philip Jägenstedt wrote:
> So, it may be that frame size changes within the same stream
> (segment? my terminology is fuzzy), but for chained WebM I think
> that it must be reflected in the layout and videoWidth/videoHeight.
> If one doesn't want the layout to change, then one must specify
> video/height attributes on <video>. This should already the case for
> chained Ogg, I believe in at least Opera and Firefox, but I haven't
> double-checked.

Firefox doesn't support chained Ogg yet, but our intention is to handle
resolution changes in the way you've described.

Cheers,
-mjg
--
Matthew Gregan |/
/| kin...@flim.org

Philip Jägenstedt

Jan 26, 2011, 7:48:24 AM
to Steve Lhomme, webm-discuss
On Wed, 26 Jan 2011 13:20:45 +0100, Steve Lhomme <slh...@matroska.org>
wrote:

> In the case of adaptive streaming the whole point is to make the
> quality/bandwidth changes transparent to the user for a smooth user
> experience. So even if the underlying system has changes it should be
> transparent. But in that case the src="" in the <video> should be the
> manifest file and not a .webm file directly. So I guess the actual
> display size should be contained in it (not sure this is already the
> case in the public proposals).


>
>> What this means that if the intrinsic size of the video changes from say
>> 720×576 (4:3) to 1024×576 (16:9, but the physical size could be
>> something
>> else) then videoWidth/videoHeight must reflect this. It would be
>> inconsistent if the layout size of <video> didn't change to reflect
>> that,
>> and harder to implement.
>

> I agree. Although it could look strange to the user, after all if
> someone is sending such a stream, they should know about the drawback.
> Or they could use a simple manifest to force the display width/height
> no matter what the underlying video sends, or simply force the
> dimensions in the <video> tag. Which leads to another possible issue
> with <video> how is it supposed to handle aspect ratio ? Is it
> possible to have different modes like stretch, fit to area (with black
> bars) or fill the area (no back bar but the aspect ratio is kept) ?

Since the intrinsic size is the size *after* adjusting for aspect ratio
and cropping, if you force it to be WxH, then the video will simply be
shown at WxH. So this only works for switching resolution; if you
switch aspect ratio, the video would be scaled to the wrong size.

The default rendering of <video> is to scale the video (up or down) to the
size of the content box while preserving aspect ratio. It'll also be
centered. This can be changed using the object-fit and object-position
properties to get "stretch to content box" behavior, etc., but only Opera
implements these CSS properties so far, as -o-object-fit and -o-object-position.

>> So, it may be that frame size changes within the same stream (segment?
>> my
>> terminology is fuzzy), but for chained WebM I think that it must be
>> reflected in the layout and videoWidth/videoHeight. If one doesn't want
>> the
>> layout to change, then one must specify video/height attributes on
>> <video>.
>> This should already the case for chained Ogg, I believe in at least
>> Opera
>> and Firefox, but I haven't double-checked.
>>
>> Note that there's no resize event on the <video> element to learn that
>> such
>> a change has happened, so there's still work to do here...
>

> If the zooming/stretching is possible in javascript that might be
> enough to handle the aspect ratio modes described above, and also
> allow more fancy effects.

Sure, scripts could set a fixed width/height and use object-fit: fill to
force the aspect ratio to something.

My preferred solution, though, is to simply set the width/height
properties if you don't want <video> to ever resize. If you don't know the
size in advance and know that the initial size will be OK, just do:

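<video src=anything></video>
<script>
var v = document.querySelector('video');
// Once the metadata has loaded, pin the layout size to the initial
// intrinsic size so that later changes don't resize the element.
v.onloadedmetadata = function() {
  v.width = v.videoWidth;
  v.height = v.videoHeight;
};
</script>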

(Not tested, but should work.)
