Encode 8k VP9 for fast decoding

mirosla...@gmail.com

Sep 8, 2014, 5:18:37 AM
to webm-d...@webmproject.org
Hi,

I'm testing 8k (8192 x 4096 @ 30 fps) playback using VP9, but the performance is not that great. I'm wondering what I can do at the encoding stage to optimize for faster playback. Optimized compression is not a goal; I want to be able to play the highest possible quality without dropping frames.

The use case is playing 4k x 4k fisheye stereoscopic movies in domes/planetariums. I'm packing the stereoscopic pairs side by side, hence the 8k x 4k resolution. I'm using OpenGL for displaying the video, and the bottleneck at the moment is decoding the frames. I'm using FFmpeg with libvpx. I have tried encoding the video with VP8, which decodes much faster, but the quality is lower and some frames fail to encode, which results in errors in the video.

What parameters can I use to maximize the decoding speed? 

At the moment I'm using the following parameters with libavcodec/FFmpeg:

cContext->b_frame_strategy = 1;
cContext->max_b_frames = 3;
cContext->gop_size = 30; // intra frame interval
av_opt_set(cContext->priv_data, "quality", "realtime", AV_OPT_SEARCH_CHILDREN); //can be good, best or realtime
av_opt_set(cContext->priv_data, "passes", "2", AV_OPT_SEARCH_CHILDREN);

I have tested different quantizer settings, which affect the quality but not so much the decoding speed. For H.264 there is a tune setting called "fastdecode"; is there something equivalent in libvpx that I can set through FFmpeg?

Best regards,

Miroslav Andel

James Zern

Sep 8, 2014, 3:10:10 PM
to WebM Discussion
You can set '--frame-parallel 1' and use the ffvp9 decoder with
threads to improve the decode performance. Setting --tile-columns with
frame-parallel is another way to get some parallelization in the
decode in libvpx.
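
As a rough illustration of how --tile-columns relates to frame width: the option is the log2 of the number of tile columns, and VP9 bounds the tile width in units of 64-pixel superblocks. The sketch below assumes the commonly cited limits of 4 to 64 superblocks per tile (256 to 4096 pixels). The constant and function names are made up for this example and are not libvpx API.

```cpp
#include <cstdio>

// Assumed VP9 tile-width limits, in 64x64 superblocks (hypothetical names).
const int kMinTileWidthSb64 = 4;   // 256 pixels
const int kMaxTileWidthSb64 = 64;  // 4096 pixels

// Number of 64-pixel superblock columns needed to cover the frame width.
inline int superblockCols(int widthPx) { return (widthPx + 63) / 64; }

// Smallest log2(tile columns) that keeps every tile at or below the maximum width.
inline int minLog2TileCols(int widthPx) {
    int log2Cols = 0;
    while ((kMaxTileWidthSb64 << log2Cols) < superblockCols(widthPx)) ++log2Cols;
    return log2Cols;
}

// Largest log2(tile columns) that keeps every tile at or above the minimum width.
inline int maxLog2TileCols(int widthPx) {
    int log2Cols = 1;
    while ((superblockCols(widthPx) >> log2Cols) >= kMinTileWidthSb64) ++log2Cols;
    return log2Cols - 1;
}

// Prints the usable range of the tile-columns value for a given frame width.
inline void printTileRange(int widthPx) {
    std::printf("width %d: tile-columns (log2) in [%d, %d]\n",
                widthPx, minLog2TileCols(widthPx), maxLog2TileCols(widthPx));
}
```

Under these assumptions an 8192-pixel-wide frame allows log2 values from 1 to 5, i.e. at most 32 tile columns, so a larger requested value would presumably not yield more tiles than 5 does.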

Miroslav Andel

Sep 21, 2014, 10:01:49 AM
to webm-d...@webmproject.org
Thanks, but it didn't help much.

Changing the parameters didn't affect the decoding performance noticeably. The decoder doesn't seem to be that CPU intensive when using a lot of cores. The CPU is mostly idle while decoding, and the usage is around 12-15% when using 12-24 logical cores. I'm reading the files from RAIDed SSD drives, so the bottleneck seems to be in RAM allocation or some kind of inefficient thread synchronisation.

Is there a good profiling tool that I can use on Windows with Visual Studio?

Best regards,

Miroslav

James Zern

Sep 22, 2014, 6:20:37 PM
to WebM Discussion
On Sun, Sep 21, 2014 at 7:01 AM, Miroslav Andel
<mirosla...@gmail.com> wrote:
> Thanks, but it didn't help much.
>
> Changing the parameters didn't affect the decoding performance noticeably.
> The decoder doesn't seem to be that CPU intensive when using a lot of cores.
> The CPU is mostly idle while decoding and the usage is around 12-15% when
> using 12-24 logical cores. I'm reading the files from RAIDed SSD drives so
> the bottleneck seems to be in RAM memory allocation or some kind of
> inefficient thread synchronisation.
>

Possibly. Could you include your full settings and an equivalent
ffmpeg command line you're using for posterity?

> Is there a good profiling tool which I can use on Windows with Visual
> Studio?
>

I haven't used any. VTune, I imagine, is a (costly) option.

Miroslav Andel

Sep 30, 2014, 7:34:17 AM
to webm-d...@webmproject.org
Hi, I'm coding my own encoder and decoder using libavcodec and OpenGL. For encoding I'm using the following, where:

mBitrate is somewhere between 100 and 400 (in Mbit/s, scaled by 1024 * 1024 below)
mWidth = 8192
mHeight = 4096
The time base is set to 30 fps.

avcodec_get_context_defaults3(cContext, codec);

int numberOfThreads = std::thread::hardware_concurrency();
if (numberOfThreads < 2)
    numberOfThreads = 2;

cContext->bit_rate = (mBitrate * 1024 * 1024);
cContext->width = mWidth;
cContext->height = mHeight;
cContext->thread_count = numberOfThreads;
cContext->time_base.num = mFrameRate[NUM];
cContext->time_base.den = mFrameRate[DEN];

cContext->pix_fmt = PIX_FMT_YUV420P;

//use square pixels
cContext->sample_aspect_ratio.num = 1;
cContext->sample_aspect_ratio.den = 1;

// some formats want stream headers to be separate
if(oContext->oformat->flags & AVFMT_GLOBALHEADER)
   cContext->flags |= CODEC_FLAG_GLOBAL_HEADER;

cContext->b_frame_strategy = 1;
cContext->max_b_frames = 3;
cContext->gop_size = 30; // intra frame interval
cContext->qmin = 0;
cContext->qmax = 10;
cContext->i_quant_factor = 0.769f;
cContext->b_quant_factor = 1.4f;
cContext->max_qdiff = 4;

av_opt_set(cContext->priv_data, "realtime", "1", AV_OPT_SEARCH_CHILDREN);
av_opt_set(cContext->priv_data, "cpu-used", "8", AV_OPT_SEARCH_CHILDREN);
av_opt_set(cContext->priv_data, "frame-parallel", "1", AV_OPT_SEARCH_CHILDREN);//Multi-threaded decoding
av_opt_set(cContext->priv_data, "tile-columns", "6", AV_OPT_SEARCH_CHILDREN);





James Zern

Oct 1, 2014, 2:40:12 AM
to WebM Discussion
Hi,

On Tue, Sep 30, 2014 at 4:34 AM, Miroslav Andel
<mirosla...@gmail.com> wrote:
> Hi, I'm coding my own encoder and decoder using libavcodec and OpenGL. For
> encoding I'm using the following where:
>
> [...]
>
> cContext->b_frame_strategy = 1;
> cContext->max_b_frames = 3;

vp9 doesn't support b-frames.

> cContext->gop_size = 30; // intra frame interval

These keyframes will be large and slow to code. If you make this
larger you should get some gain. Note for raw frames of this size
disk/memory bandwidth will likely come into play too.
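
To put a rough number on the bandwidth remark, here is a minimal sketch that assumes 8-bit YUV 4:2:0 frames (matching the PIX_FMT_YUV420P setting quoted earlier in the thread) and ignores any row padding or alignment a real decoder may add. The function names are hypothetical.

```cpp
#include <cstdint>

// Bytes in one 8-bit YUV 4:2:0 frame: a full-resolution luma plane plus two
// quarter-resolution chroma planes, i.e. 1.5 bytes per pixel (no padding assumed).
inline std::uint64_t yuv420FrameBytes(std::uint64_t width, std::uint64_t height) {
    return width * height * 3 / 2;
}

// Bytes per second needed to move decoded frames at the given frame rate.
inline std::uint64_t rawBytesPerSecond(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t fps) {
    return yuv420FrameBytes(width, height) * fps;
}
```

For 8192 x 4096 this works out to 48 MiB per decoded frame, or about 1.5 GB/s at 30 fps, before counting any copies made on the way to the GPU.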

> [...]
> cContext->i_quant_factor = 0.769f;
> cContext->b_quant_factor = 1.4f;
> cContext->max_qdiff = 4;
>
These 3 are meaningless for vp9.

> av_opt_set(cContext->priv_data, "realtime", "1", AV_OPT_SEARCH_CHILDREN);
> av_opt_set(cContext->priv_data, "cpu-used", "8", AV_OPT_SEARCH_CHILDREN);

12 should be the fastest, but for this size I'm not seeing much difference.

Miroslav Andel

Oct 1, 2014, 7:17:02 AM
to webm-d...@webmproject.org
Ok, thanks. In my case I want to optimize the video for the fastest possible decoding on a high-performance workstation with RAIDed SSDs. The level of compression is not that important as long as the file size is smaller than the raw files. I ran a test using these settings, where I increased the GOP size.

cContext->gop_size = 300; // intra frame interval
cContext->qmin = 0;
cContext->qmax = 10;

av_opt_set(cContext->priv_data, "realtime", "1", AV_OPT_SEARCH_CHILDREN);
av_opt_set(cContext->priv_data, "cpu-used", "12", AV_OPT_SEARCH_CHILDREN);
av_opt_set(cContext->priv_data, "frame-parallel", "1", AV_OPT_SEARCH_CHILDREN);//Multi-threaded decoding
av_opt_set(cContext->priv_data, "tile-columns", "6", AV_OPT_SEARCH_CHILDREN);

The times to decode the first 30 frames, in milliseconds, were:

71.3638
160.699
152.207
141.001
134.473
133.347
132.348
127.572
121.072
106.285
17.8069
148.184
135.935
132.837
129.561
130.323
127.71
122.659
116.208
100.039
19.7444
144.588
119.845
123.94
118.536
125.994
120.018
114.818
105.365
16.4283

Running on 12 logical cores, the overall CPU usage was 15-25%, and two of the cores were hardly used at all. Libavcodec/FFmpeg reports that it's using 13 threads for the decode. In order to play a movie at 30 fps, the decoder needs to be a lot faster than that (less than 33 ms per frame).
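
The gap between these timings and the 30 fps budget can be quantified with a small sketch like the one below. It simply averages the 30 decode times listed above and counts how many fit within 1000 / 30 ms. The struct and function names are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

struct DecodeStats {
    double meanMs;             // average decode time per frame
    double effectiveFps;       // 1000 / meanMs
    std::size_t withinBudget;  // frames decoded within the per-frame budget
};

// Summarizes per-frame decode times against a target frame rate.
inline DecodeStats summarize(const std::vector<double>& timesMs, double targetFps) {
    const double budgetMs = 1000.0 / targetFps;
    double sum = 0.0;
    std::size_t within = 0;
    for (double t : timesMs) {
        sum += t;
        if (t <= budgetMs) ++within;
    }
    const double mean = timesMs.empty() ? 0.0 : sum / timesMs.size();
    return {mean, mean > 0.0 ? 1000.0 / mean : 0.0, within};
}

// The 30 decode times reported above, in milliseconds.
inline std::vector<double> reportedTimesMs() {
    return {71.3638, 160.699, 152.207, 141.001, 134.473, 133.347, 132.348, 127.572,
            121.072, 106.285, 17.8069, 148.184, 135.935, 132.837, 129.561, 130.323,
            127.71,  122.659, 116.208, 100.039, 19.7444, 144.588, 119.845, 123.94,
            118.536, 125.994, 120.018, 114.818, 105.365, 16.4283};
}
```

Calling summarize(reportedTimesMs(), 30.0) gives an average of roughly 115 ms per frame (under 9 fps), with only three of the 30 frames fitting the 33.3 ms budget.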

Using VP8 instead decodes faster (~20 ms per frame on average), but the problem is that the decoded image looks corrupted (pink and blocky noise). I just want to find a way to make the playback work...

Cheers,

Miroslav 


James Zern

Oct 1, 2014, 1:47:46 PM
to WebM Discussion
You could try -c:v libvpx to see if the behavior is similar. This
could be either an encoder or a decoder bug, given that this resolution
doesn't see much use, especially with vp8.

Miroslav Andel

Oct 3, 2014, 3:44:54 AM
to webm-d...@webmproject.org
Isn't that the same as setting the codec ID to AV_CODEC_ID_VP8 in code? Like: codec = avcodec_find_encoder(AV_CODEC_ID_VP8);

I get fewer errors in the decoded video if I encode with a very small GOP, but that doesn't make any sense once the GOP is set to 1.

James Zern

Oct 3, 2014, 2:19:29 PM
to WebM Discussion
On Fri, Oct 3, 2014 at 12:44 AM, Miroslav Andel
<mirosla...@gmail.com> wrote:
> Isn't that the same as setting the codec ID to AV_CODEC_ID_VP8 in code? Like:
> codec = avcodec_find_encoder(AV_CODEC_ID_VP8);
>

No, you'd want to use avcodec_find_decoder_by_name() unless you've
explicitly disabled ffvp8, as ffvp8 will otherwise be preferred.