Exploring VP9 as a progressive still image codec


Brion Vibber

Jun 14, 2016, 9:52:16 PM
to webm-d...@webmproject.org
At Wikipedia we have long articles containing many images, some of which need a lot of detail and others which will be scrolled past or missed entirely. We're looking into lazy-loading, alternate formats such as WebP, and other ways to balance display density vs network speed.


I noticed that VP9 supports scaling of reference frames from different resolutions, so a frame that changes video resolution doesn't have to be a key frame.

This means that a VP9-based still image format (unlike VP8-based WebP) could encode multiple resolutions to be loaded and decoded progressively, at each step encoding only the differences from the previous resolution level.

So to load an image at 2x "Retina" display density, we'd load up a series of smaller, lower density frames, decoding and updating the display until reaching the full size (say, 640x360). If the user scrolls away before we reach 2x, we can simply stop loading -- and if they scroll back, we can pick up right where we left off.


I tried hacking up vpxenc to accept a stream of concatenated PNG images as input, and it seems plausible...

Demo page with a few sample images (not yet optimized for network load; requires Firefox or Chrome):

Compared to loading a series of intra-coded JPEG or WebP images, the total data payload to reach resolution X is significantly smaller. Compared against only loading the final resolution in WebP or JPEG, without any attempt at tuning I found my total payloads with VP9 to be about halfway between the two formats, and with tuning I can probably beat WebP.

Currently the demo loads the entire .webm file containing frames up to 4x resolution, seeking to the frame with the target density. Eventually I'll try repacking the frames into separately loadable files which can be fed into Media Source Extensions or decoded via JavaScript... That should prevent buffering of unused high resolution frames.


Some issues:

Changing resolutions always forces a keyframe unless doing single-pass encoding with frame lag set to 1. This is not super obvious, but is neatly enforced in encoder_set_config in vp9_cx_iface.h! Use the --passes=1 --lag-in-frames=1 options to vpxenc.

Keyframes are also forced if width/height go above the "initial" width/height, so I had to start the encode with a stub frame of the largest size (solid color, so still compact). I'm a bit unclear on whether there's any other way to force the 'initial' frame size to be larger, or if I just have to encode one frame at the large size...

There's also a validity check on resized frames that forces a keyframe if the new frame is twice or more the size of the reference frame. I used smaller than 2x steps to work around this (tested with steps at 1/8, 1/6, 1/4, 1/2, 2/3, 1, 3/2, 2, 3, 4x of the base resolution).

I had to force updates of the golden & altref on every frame to make sure every frame ref'd against the previous, or the decoder would reject the output. --min-gf-interval=1 isn't enough; I hacked vpxenc to set the flags on the frame encode to VP8_EFLAG_FORCE_GF | VP8_EFLAG_FORCE_ARF.
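For reference, here's a rough sketch of how those vpxenc options and flags map onto the libvpx C API (an untested fragment based on my reading of vpx/vpx_encoder.h and vpx/vp8cx.h, not a complete program):

```c
/* Untested fragment sketching how the workarounds above map onto the
 * libvpx C API; assumes `cfg` came from vpx_codec_enc_config_default(). */
cfg.g_pass = VPX_RC_ONE_PASS;  /* --passes=1: single-pass encoding */
cfg.g_lag_in_frames = 1;       /* --lag-in-frames=1 */

/* Per-frame flags passed to vpx_codec_encode(): force golden and
 * alt-ref updates so every frame can reference the previous
 * resolution level. */
vpx_enc_frame_flags_t flags = VP8_EFLAG_FORCE_GF | VP8_EFLAG_FORCE_ARF;
```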


I'm having trouble loading the VP9 webm files in Chrome on Android; I'm not sure if this is because I'm doing something too "extreme" for the decoder on my Nexus 5x or if something else is wrong... Is there any way I can do a validity check on my encoded frames?

-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)

James Bankoski

Jun 16, 2016, 10:52:53 AM
to webm-d...@webmproject.org
Hi Brion,   

Thanks for looking at all of these technologies!  

I ran some somewhat similar tests, but for a different reason. My tests centered on when it is better, quality-wise, to encode scalable keyframes rather than a single frame. An image encoded at 1920x1080 at a very high quantizer often looks worse, and is measurably worse on metrics like SSIM, than a frame encoded at a lower resolution and scaled up with a residual to catch any egregiously bad artifacts.

I'll try and respond to your issues in line: 
 

Currently the demo loads the entire .webm file containing frames up to 4x resolution, seeking to the frame with the target density. Eventually I'll try repacking the frames into separately loadable files which can be fed into Media Source Extensions or decoded via JavaScript... That should prevent buffering of unused high resolution frames.

Decoding via JavaScript would be kind of a shame, but it would certainly give you a lot of control over things like the container format, and avoid any issues you might have with the way Chrome et al. handle video (e.g. autoplay on Android) vs. images. That said, it will be a lot slower, and big images might take a long time to load.
 


Some issues:

Changing resolutions always forces a keyframe unless doing single-pass encoding with frame lag set to 1. This is not super obvious, but is neatly enforced in encoder_set_config in vp9_cx_iface.h! Use the --passes=1 --lag-in-frames=1 options to vpxenc.

When I played with it I had similar issues. The way the codec handles lag and the messages you send it is a bit weird.

Could you file a bug with exactly what you tried here:   https://bugs.chromium.org/p/webm/issues/list

 
Keyframes are also forced if width/height go above the "initial" width/height, so I had to start the encode with a stub frame of the largest size (solid color, so still compact). I'm a bit unclear on whether there's any other way to force the 'initial' frame size to be larger, or if I just have to encode one frame at the large size...


See this bug: 

I actually used libvpx's internal scaling to do the downscale and thus avoided that problem.

I think this is different, so please file another bug. I'm guessing you did the scaling outside libvpx, and that we don't provide a great way to pass in the maximum original size when doing it that way.


There's also a validity check on resized frames that forces a keyframe if the new frame is twice or more the size of the reference frame. I used smaller than 2x steps to work around this (tested with steps at 1/8, 1/6, 1/4, 1/2, 2/3, 1, 3/2, 2, 3, 4x of the base resolution).

Yes, because our internal scalers won't work when you go outside the viable range.
 

I had to force updates of the golden & altref on every frame to make sure every frame ref'd against the previous, or the decoder would reject the output. --min-gf-interval=1 isn't enough; I hacked vpxenc to set the flags on the frame encode to VP8_EFLAG_FORCE_GF | VP8_EFLAG_FORCE_ARF.

Yes, on its own the codec will surely do the wrong thing.

I'm having trouble loading the VP9 webm files in Chrome on Android; I'm not sure if this is because I'm doing something too "extreme" for the decoder on my Nexus 5x or if something else is wrong... Is there any way I can do a validity check on my encoded frames?

Currently shipped versions of Chrome pass all video tags through to the Android operating system (this changes in M52/M53). Unfortunately, Android has issues with changing frame sizes on non-keyframes, which gets in the way of this.


Jim 



 



Brion Vibber

Jun 16, 2016, 1:01:25 PM
to webm-d...@webmproject.org
On Thursday, June 16, 2016, 'James Bankoski' via WebM Discussion <webm-d...@webmproject.org> wrote:
Hi Brion,   

Thanks for looking at all of these technologies!  

I ran some somewhat similar tests, but for a different reason. My tests centered on when it is better, quality-wise, to encode scalable keyframes rather than a single frame. An image encoded at 1920x1080 at a very high quantizer often looks worse, and is measurably worse on metrics like SSIM, than a frame encoded at a lower resolution and scaled up with a residual to catch any egregiously bad artifacts.

That makes a lot of sense!
 

I'll try and respond to your issues in line: 
 

Currently the demo loads the entire .webm file containing frames up to 4x resolution, seeking to the frame with the target density. Eventually I'll try repacking the frames into separately loadable files which can be fed into Media Source Extensions or decoded via JavaScript... That should prevent buffering of unused high resolution frames.

Decoding via JavaScript would be kind of a shame, but it would certainly give you a lot of control over things like the container format, and avoid any issues you might have with the way Chrome et al. handle video (e.g. autoplay on Android) vs. images. That said, it will be a lot slower, and big images might take a long time to load.

*nod* If I can use native video decoding up to the target frame that should perform best. That could be done by feeding parts of a WebM stream in via MSE and just stopping after the target frame.

I've gotten mediocre performance out of an emscripten JS port of libvpx on VP8, though for the thumbnail sizes I'm targeting it's not too bad. Haven't tested VP9 much yet in JS, but the YUV to RGB at least is easy to do on the GPU with WebGL. (Note the older Theora codec works wonderfully in JavaScript, though it's not nearly as good a compressor as the later VP8/VP9.)

Mostly I'm looking at thumbnail images under 320x240 base size, with 2x density versions up to 640x480. Maybe 3x and 4x for zooms and such, which brings us into single-frame HD territory.
 
 


Some issues:

Changing resolutions always forces a keyframe unless doing single-pass encoding with frame lag set to 1. This is not super obvious, but is neatly enforced in encoder_set_config in vp9_cx_iface.h! Use the --passes=1 --lag-in-frames=1 options to vpxenc.

When I played with it I had similar issues. The way the codec handles lag and the messages you send it is a bit weird.

Could you file a bug with exactly what you tried here:   https://bugs.chromium.org/p/webm/issues/list

Will do.

 

 
Keyframes are also forced if width/height go above the "initial" width/height, so I had to start the encode with a stub frame of the largest size (solid color, so still compact). I'm a bit unclear on whether there's any other way to force the 'initial' frame size to be larger, or if I just have to encode one frame at the large size...


See this bug: 

I actually used libvpx's internal scaling to do the downscale and thus avoided that problem.

I think this is different, so please file another bug. I'm guessing you did the scaling outside libvpx, and that we don't provide a great way to pass in the maximum original size when doing it that way.

Yes, I'm scaling the frames ahead of time with ImageMagick and sending them into vpxenc at their final size.

The internal scaling options confuse me a bit... Do I just set rc_resize_allowed, rc_scaled_width, and rc_scaled_height instead of g_w and g_h? Will test and see if that simplifies things. And will I get the scaled-down frames back out of the decoder, or will those get scaled up by the decoder to my max size? I actually want pixel-exact output at the smaller size if possible, rather than having to scale it back down again to match the display.
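Concretely, here's what I think that would look like (an untested fragment, just my reading of the vpx_codec_enc_cfg_t fields; the sizes are my example numbers):

```c
/* Untested fragment: my reading of the internal-scaling fields in
 * vpx_codec_enc_cfg_t. g_w/g_h stay at the full "initial" size, and
 * the rc_scaled_* fields ask the encoder to downscale internally.
 * 640x360 / 320x180 are example numbers. */
cfg.g_w = 640;                /* maximum ("initial") width */
cfg.g_h = 360;
cfg.rc_resize_allowed = 1;    /* permit encoder-side spatial resampling */
cfg.rc_scaled_width = 320;    /* internal target for the smaller frames */
cfg.rc_scaled_height = 180;
```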




There's also a validity check on resized frames that forces a keyframe if the new frame is twice or more the size of the reference frame. I used smaller than 2x steps to work around this (tested with steps at 1/8, 1/6, 1/4, 1/2, 2/3, 1, 3/2, 2, 3, 4x of the base resolution).

Yes, because our internal scalers won't work when you go outside the viable range.

Hmm... the bitstream spec draft PDF indicates that I should be able to do a 2x scale up in one step:

  5.16 Reference frame scaling

    It is legal for different decoded frames to have different frame sizes (and aspect ratios). VP9 automatically handles resizing predictions from reference frames of different sizes.

    However, reference frames must share the same color depth and subsampling format for reference frame scaling to be allowed, and the amount of up/down scaling is limited to be no more than 16x larger and no less than 2x smaller (e.g. the new frame must not be more than 16 times wider or higher than any of its used reference frames). 


Aha, I think I read this check backwards:

static INLINE int valid_ref_frame_size(int ref_width, int ref_height,
                                       int this_width, int this_height) {
  return 2 * this_width >= ref_width &&
         2 * this_height >= ref_height &&
         this_width <= 16 * ref_width &&
         this_height <= 16 * ref_height;
}

This means the ladder should be able to use full 2x steps after all, but I still need to force the gf/altref updates to make sure I don't exceed 16x from first to last. Will test some more on this end as well.

(I do need at least one non-integral step from 1 to 1.5 to 2, since 150% display density is common on low end Android phones and on medium end Windows laptops and tablets. The lower steps are mostly for quick visual feedback during loading on slow network.)

 
 

I had to force updates of the golden & altref on every frame to make sure every frame ref'd against the previous, or the decoder would reject the output. --min-gf-interval=1 isn't enough; I hacked vpxenc to set the flags on the frame encode to VP8_EFLAG_FORCE_GF | VP8_EFLAG_FORCE_ARF.

Yes, on its own the codec will surely do the wrong thing.

:) On closer look I may have needed --max-gf-interval as well as the min; will test that too.
 

I'm having trouble loading the VP9 webm files in Chrome on Android; I'm not sure if this is because I'm doing something too "extreme" for the decoder on my Nexus 5x or if something else is wrong... Is there any way I can do a validity check on my encoded frames?

Currently shipped versions of Chrome pass all video tags through to the Android operating system (this changes in M52/M53). Unfortunately, Android has issues with changing frame sizes on non-keyframes, which gets in the way of this.

That's a shame, as mobile is a key target audience for this sort of cleverness, and any hw acceleration will be a bonus. Hope that gets fixed up!

Thanks for the feedback!

-- brion 