We recently checked in an improvement to the VP9 multi-threaded (MT) encoder. In addition to the existing column-tile-based multi-threading, the libvpx VP9 encoder now supports multi-threading within a single column tile using a block-row-based threading approach, resulting in significantly faster encoding. In tests[1] encoding HD videos with 4 column tiles, the improved VP9 MT encoder achieved speedups over the original of 11% with 2 threads, 27% with 4 threads, 101% with 8 threads, and 135% with 16 threads.
With the improved threading scheme, the VP9 encoder can achieve:
>100% speed improvement for 720p/1080p videos by allowing the encoder to use more than 4 threads;
faster encoding of small-resolution videos, which can now be encoded with multiple threads;
>10% speed improvement even when the number of encoding threads is unchanged.
To accommodate the set of adaptive features in VP9[2], the improved MT encoder is non-deterministic. However, our tests show that the quality impact is negligible.
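To illustrate why order-dependent statistics lead to non-determinism, here is a toy sketch in Python. It is not libvpx code: the `encode` function, the running-mean threshold, and the block values are all hypothetical, chosen only to show that a decision based on previously encoded blocks' stats can change when rows are processed in a different order.

```python
def encode(blocks, schedule):
    """Toy model of an adaptive encoder (hypothetical, not libvpx).

    blocks:   dict mapping row index -> list of per-block "complexity" values
    schedule: sequence of row indices giving the order in which blocks are
              encoded; each entry encodes the next pending block of that row
    Returns a dict mapping (row, col) -> decision (True means "split").
    """
    next_col = {row: 0 for row in blocks}
    total, count = 0.0, 0  # stats shared by all rows, updated per block
    decisions = {}
    for row in schedule:
        col = next_col[row]
        value = blocks[row][col]
        # Adaptive step: the threshold depends on every block encoded so far.
        threshold = total / count if count else 0.0
        decisions[(row, col)] = value > threshold
        total += value
        count += 1
        next_col[row] = col + 1
    return decisions


blocks = {0: [10, 6, 9], 1: [1, 5, 3]}

# Row 0 finishes completely before row 1 starts.
serial = encode(blocks, [0, 0, 0, 1, 1, 1])
# Two threads alternate between the rows.
interleaved = encode(blocks, [0, 1, 0, 1, 0, 1])

differing = sorted(k for k in serial if serial[k] != interleaved[k])
print(differing)  # [(0, 1)]
```

In a real multi-threaded run the interleaving depends on thread timing, so the schedule, and therefore the output, can vary from run to run.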
Currently, the improved MT encoder supports 1-pass and 2-pass good quality mode encoding at speeds 0 through 4.
Please note that the block row based MT encoder is off by default. You can use the encoding option "--row-mt=<arg>" to turn it on. For example, if you prefer the original deterministic MT encoder, use the default "--row-mt=0". On the other hand, use "--row-mt=1" to enable block row based multi-threading and get the improved performance.
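As a rough sketch of how the option might be used, the Python snippet below builds a command line for the `vpxenc` tool that ships with libvpx. Only `--row-mt` comes from this post; the choice of `vpxenc`, the other flags, the helper name `build_vpxenc_command`, and the file names are assumptions made for illustration. Note that `--tile-columns` takes a log2 value, so 2 requests 4 column tiles.

```python
import shlex


def build_vpxenc_command(src, dst, threads=8, row_mt=True,
                         cpu_used=2, log2_tile_columns=2):
    """Return an argv list for a 2-pass good-quality VP9 encode.

    Hypothetical helper: only --row-mt is taken from the announcement.
    """
    if not 0 <= cpu_used <= 4:
        # The announcement lists speeds 0 to 4 as supported with row-mt.
        raise ValueError("row-mt is described for speeds 0-4 only")
    return [
        "vpxenc",
        "--codec=vp9",
        "--good",
        "--passes=2",
        f"--cpu-used={cpu_used}",
        f"--threads={threads}",
        f"--tile-columns={log2_tile_columns}",  # log2: 2 means 4 tiles
        f"--row-mt={1 if row_mt else 0}",
        "-o", dst,
        src,
    ]


cmd = build_vpxenc_command("input.y4m", "output.webm")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `vpxenc` is installed; passing `row_mt=False` yields the original deterministic behavior.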
Please test the new MT encoder and file an issue[3] if it doesn't work for you.
[1]. Tests were run on a 16-core desktop with Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz.
[2]. The adaptive features in VP9 use previously encoded blocks' stats to modify encoding parameters and make decisions in the current block's encoding, which results in the non-determinism in the row-based MT encoder.