Faster intra4 / intra16 decision for -m 0 and -m 1 methods


Pascal Massimino

Jul 15, 2016, 2:50:24 PM
to WebP Discussion, pablo.e...@gmail.com
(forking to a separate thread)

Pablo,

On Wed, Jun 29, 2016 at 7:28 AM, Pascal Massimino <pascal.m...@gmail.com> wrote:
Hi Pablo,

On Mon, Jun 27, 2016 at 7:44 PM, Pablo Enfedaque <pablo.e...@gmail.com> wrote:

Hi there,


In the function VP8Decimate() there is an if/else that separates two cases:


1. if (rd_opt > RD_OPT_NONE):

For a given macroblock, the rate and distortion of different prediction modes are measured + reconstruction of that MB.

2. else:

Only distortion measurements are performed + reconstruction of that MB.


The comment before the “else” says that:


// At this point we have heuristically decided intra16 / intra4.
// For method >= 2, pick the best intra4/intra16 based on SSE (~tad slower).
// For method <= 1, we don't re-examine the decision but just go ahead with
// quantization/reconstruction.


Is there necessarily a previous decision regarding the prediction mode for that MB? Couldn't this be the first time it is evaluated?


Actually, the mode is 'decided' to be forced to intra16 during the analysis pass: we call DoSegmentJob() in analysis.c, which in turn calls MBAnalyze(), which calls MBAnalyzeBestIntra16Mode() first. Then, for method >= 5, we refine the decision with MBAnalyzeBestIntra4Mode().
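
To make the flow described above concrete, here is a minimal toy model in C. It is only a sketch: every name in it (ToyMode, toy_analysis_mode, toy_decimate_no_rd, refine_prefers_i4) is hypothetical and none of it is libwebp's actual code. It only mirrors what is said in this thread: the analysis pass settles on intra16 unless an intra4 refinement runs (method >= 5), and the no-RD branch re-picks the mode by SSE only for method >= 2.

```c
#include <stdint.h>

/* Toy model only: these names are hypothetical, not libwebp's API. */
typedef enum { TOY_INTRA16, TOY_INTRA4 } ToyMode;

/* Analysis pass as described in the thread: intra16 is chosen first and an
 * intra4 refinement is considered only for method >= 5. The flag
 * `refine_prefers_i4` stands in for whatever that refinement would conclude
 * (an assumption of this sketch). */
static ToyMode toy_analysis_mode(int method, int refine_prefers_i4) {
  if (method >= 5 && refine_prefers_i4) return TOY_INTRA4;
  return TOY_INTRA16;
}

/* The no-RD ("else") branch as described by the quoted comment:
 * method >= 2 re-picks intra4/intra16 from the SSE values,
 * method <= 1 keeps the decision made during analysis. */
static ToyMode toy_decimate_no_rd(int method, ToyMode analysis_mode,
                                  uint64_t sse16, uint64_t sse4) {
  if (method >= 2) return (sse4 < sse16) ? TOY_INTRA4 : TOY_INTRA16;
  return analysis_mode;
}
```

In this model, toy_decimate_no_rd(1, toy_analysis_mode(1, 1), 100, 50) stays at TOY_INTRA16 even though the intra4 error is lower, which is the forced-intra16 behaviour for methods 0 and 1 described above.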

Actually, I went ahead and implemented a better (as in: better than nothing!) heuristic for methods 0 and 1. This is patch #359635[*]. With this heuristic, not only is the compression better, but encoding is also faster.
Thanks for bringing this to our attention!

skal/


A simple example:

BEFORE:
./examples/cwebp bryce.webp -v -short -m 0 -print_ssim
Time to encode picture: 0.865s
4017918 28.8958

./examples/cwebp bryce.webp -v -short -m 1 -print_ssim
Time to encode picture: 1.093s
3934888 28.9114

AFTER:
-m 0
Time to encode picture: 0.760s
3975776 29.1778

-m 1
Time to encode picture: 0.985s
3884778 29.1960


(-m 1 is now very close to -m 2 in terms of speed/distortion, actually).
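
For readers who want the relative gains, the figures above can be turned into ratios with two small helpers. This is just arithmetic on the numbers quoted in this message; the function names are made up for illustration, and it assumes the first number on each result line is the output size in bytes.

```c
/* Ratio of old to new encoding time; values above 1.0 mean "faster". */
static double speedup(double before_s, double after_s) {
  return before_s / after_s;
}

/* Relative size reduction, in percent of the original size.
 * Assumes both arguments are output sizes in bytes. */
static double size_saving_percent(long before_bytes, long after_bytes) {
  return 100.0 * (double)(before_bytes - after_bytes) / (double)before_bytes;
}
```

Plugging in the timings gives speedup(0.865, 0.760) of about 1.14 for -m 0 and speedup(1.093, 0.985) of about 1.11 for -m 1. The sizes give savings of about 1.05% (4017918 to 3975776) and about 1.27% (3934888 to 3884778) respectively.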


hope it helps,
skal/


From what I understand, when inside StatLoop() → OneStatPass() → VP8Decimate(), if method < 3 then rd_opt = RD_OPT_NONE, so the evaluation in VP8Decimate() goes to the “else” branch. However, this would be the first time that this MB is checked, so no previous decision has been made. Is this right?


Depending on the method (0..6) used, in the main loop of VP8EncLoop(), it could also be the first time that a MB is visited, if inside StatLoop() not all macroblocks have been evaluated (fast mode).


Can someone please clarify those scenarios?


Thanks!


Pablo



Pablo Enfedaque

Jul 18, 2016, 8:24:52 PM
to WebP Discussion, pablo.e...@gmail.com
Hi skal,

I have been checking out the code and the idea looks pretty neat. 
That x1.1 speedup is quite impressive, though, given that it comes only from avoiding MBAnalyzeBestIntra16Mode() in methods 0 and 1. I would not have expected that much.

Cheers,

Pablo