Contribution proposal for GPU-accelerated VP8 encoder

Alessandro Petrini

May 18, 2016, 10:58:14 AM

to codec...@webmproject.org

Dear all,

As a contribution to the European project "T-NOVA" (www.t-nova.eu), the University of Milan, Department of Computer Science (www.di.unimi.it), and ITALTEL (www.italtel.com) have developed a GPU-accelerated VP8 encoder based on the existing libvpx 1.5.0 code. The main goal was to provide an accelerated encoder that outperforms the open-source vpxenc encoder in computation time while retaining the same visual quality.

After evaluating different approaches to the problem, we found that the most promising strategy was to integrate a novel parallel motion estimation (ME) algorithm into libvpx, with an implementation tailored specifically to GPGPU. We targeted modern NVIDIA graphics cards and developed the parallel code in CUDA. The ME algorithm falls between the diamond search and full search strategies: during each iteration, several candidates (up to 128) inside the diamond area are evaluated at the same time, in order to exploit the massive parallelism offered by GPUs.
We developed three ME CUDA kernels, each with different features and performance. The most accurate kernel features a complete splitmv search (down to 4x4 blocks) and sub-pixel interpolation for motion vectors at up to 1/4-pel precision.
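
To make the "between diamond search and full search" idea concrete, here is a minimal CPU sketch in Python. It is not the authors' CUDA code: the function names, the SAD cost, the candidate ordering and the stopping rule are all assumptions made for illustration, and splitmv and sub-pixel refinement are left out. In each iteration it scores every offset inside a diamond around the current best vector (capped at 128 candidates) and then recentres on the cheapest one; on a GPU, each candidate in the inner loop would presumably be handled by its own thread, followed by a min-reduction.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of luma samples


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_candidates(radius: int, limit: int = 128) -> List[Tuple[int, int]]:
    """Integer offsets with |dx| + |dy| <= radius, nearest first, capped at `limit`."""
    pts = [(dx, dy)
           for dy in range(-radius, radius + 1)
           for dx in range(-radius, radius + 1)
           if abs(dx) + abs(dy) <= radius]
    pts.sort(key=lambda p: (abs(p[0]) + abs(p[1]), p))
    return pts[:limit]


def batched_diamond_search(cur: Frame, ref: Frame, bx: int, by: int,
                           n: int = 16, radius: int = 7,
                           max_iters: int = 8) -> Tuple[Tuple[int, int], int]:
    """Return (motion vector, SAD) for the block at (bx, by).

    Hypothetical illustration only: the block at (bx, by) must lie inside
    both frames, and the cost is plain SAD without any rate term.
    """
    h, w = len(ref), len(ref[0])
    offsets = diamond_candidates(radius)
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for _ in range(max_iters):
        cx, cy = best_mv
        scored = []
        # Sequential here; this is the loop a GPU would evaluate in parallel.
        for ox, oy in offsets:
            dx, dy = cx + ox, cy + oy
            if 0 <= bx + dx <= w - n and 0 <= by + dy <= h - n:
                scored.append((sad(cur, ref, bx, by, dx, dy, n), (dx, dy)))
        cost, mv = min(scored)
        if cost >= best_cost:
            break  # no candidate in the diamond improves on the centre
        best_cost, best_mv = cost, mv
    return best_mv, best_cost
```

A classic diamond search would test only a handful of points per step, and a full search would test every offset in the window; the batch size is what places this scheme between the two.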

Regarding performance, the slowest and most accurate kernel speeds up the entire encoding by up to 2x, while the fastest ME kernel reaches a speed-up of up to 4.5x at the expense of a small loss in visual quality.

We performed some preliminary tests and evaluated the encodings following the submission recommendations on the WebM website. The results, obtained at different video resolutions (CIF, 720p and 1080p) in single-pass, single-thread mode (*), are available at https://github.com/Topopiccione/vpxenc_test

The tests were run on an Intel Xeon E5-2620 v3 @ 2.40 GHz with 64 GB of RAM, running 64-bit Ubuntu Linux 14.04; the graphics card was an NVIDIA GTX 980 with 4 GB of GDDR5 video memory.

At present, the CUDA kernels are fully integrated into the libvpx code, with one small exception: the configure script and the automatically generated makefiles still need to be edited manually. A command-line option (--cuda-me) has been added to vpxenc to switch between the standard ME and our CUDA-accelerated version.
We have also integrated the accelerated libvpx library into the popular open-source libav library to further improve the usability of the accelerated encoder.
We are constantly optimizing the kernels and would like to contribute to the WebM project by submitting our work to the libvpx developer community. Moreover, given the promising results obtained with VP8, we are considering adapting the accelerated ME implementation to the more recent VP9/VP10 standards.

Best regards,

Pietro Paglierani (ITALTEL) - email: pietro.p...@italtel.com
Giuliano Grossi (UNIMI) - email: gro...@di.unimi.it
Federico Pedersini (UNIMI) - email: pede...@di.unimi.it
Alessandro Petrini (UNIMI) - email: alessandr...@unimi.it

(*) command-line:
./vpxenc $i -o $i-$b.vp8.webm --best --cpu-used=0 --target-bitrate=$b --auto-alt-ref=1 -v --minsection-pct=0 --maxsection-pct=800 --lag-in-frames=25 --kf-min-dist=0 --kf-max-dist=99999 --static-thresh=0 --min-q=0 --max-q=63 --drop-frame=0 --bias-pct=50 --minsection-pct=0 --maxsection-pct=800 --psnr --arnr-maxframes=7 --arnr-strength=3 --arnr-type=3 --codec=vp8
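
For anyone who wants to reproduce the runs, the command line above can be generated for several inputs and bitrates with a small helper such as the Python sketch below. It only builds argument lists and does not start the encoder. The flags are copied from the command above, with the repeated --minsection-pct/--maxsection-pct pair listed once. The helper names, file names and bitrates are made up for illustration. The post does not show the syntax of --cuda-me, so treating it as a bare switch is an assumption.

```python
from typing import List


def vpxenc_args(src: str, bitrate_kbps: int, cuda_me: bool = False) -> List[str]:
    """Build the argument list for one test encode of `src` at `bitrate_kbps`."""
    args = [
        "./vpxenc", src, "-o", f"{src}-{bitrate_kbps}.vp8.webm",
        "--best", "--cpu-used=0", f"--target-bitrate={bitrate_kbps}",
        "--auto-alt-ref=1", "-v",
        "--minsection-pct=0", "--maxsection-pct=800",
        "--lag-in-frames=25", "--kf-min-dist=0", "--kf-max-dist=99999",
        "--static-thresh=0", "--min-q=0", "--max-q=63",
        "--drop-frame=0", "--bias-pct=50", "--psnr",
        "--arnr-maxframes=7", "--arnr-strength=3", "--arnr-type=3",
        "--codec=vp8",
    ]
    if cuda_me:
        # Assumption: --cuda-me is a bare switch; its exact syntax is not given.
        args.append("--cuda-me")
    return args


def encode_matrix(inputs: List[str], bitrates: List[int],
                  cuda_me: bool = False) -> List[List[str]]:
    """One argument list per (input, bitrate) pair, mirroring $i and $b above."""
    return [vpxenc_args(i, b, cuda_me) for i in inputs for b in bitrates]
```

Each returned list can be passed to subprocess.run(args, check=True) from a directory that contains the vpxenc binary; running the matrix once with cuda_me=False and once with cuda_me=True would give the CPU and GPU encodes to compare.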


Kagami Hiiragi

May 18, 2016, 4:11:10 PM

to codec...@webmproject.org


Hi!

Very interesting results. Are your modifications available somewhere? I would
like to try them out. (I can see you have a libvpx fork under the "Topopiccione"
account, but it doesn't seem to contain any related commits.)

Best regards.

Alessandro Petrini

May 19, 2016, 10:44:27 AM

to codec...@webmproject.org

Hi,
thanks for your interest!

I just noticed that the link to the test results I posted yesterday is incorrect.
https://github.com/Topopiccione/vpxenc_tests is the right one.

We are still finalizing the code and running more tests on different architectures; the source code of the modified encoder will be released shortly.

Best Regards,
Alessandro Petrini (Unimi)


offra...@gmail.com

Jun 29, 2016, 1:20:13 PM

to Codec Developers, alessandr...@unimi.it

Hi all,
The code for the CUDA-accelerated VP8 encoder has been posted in the following GitHub repository:
https://github.com/Italtel-Unimi/libvpx
I have also uploaded a compiled binary for 64-bit Linux and NVIDIA GPUs with Compute Capability >= 3.5.
Feel free to test it and leave feedback!

Thanks and best regards.
Alessandro Petrini - Università degli studi di Milano

alle...@gmail.com

Sep 27, 2019, 1:50:02 PM

to Codec Developers, alessandr...@unimi.it

Any news about porting your code to the official libvpx? It would be a great improvement :D


Vadim Asadov, CEO

Sep 27, 2019, 2:10:20 PM

to codec...@webmproject.org, alessandr...@unimi.it

I will check with our dev team at Tuesday's weekly meeting and let you know.

Regards


Dennis Mungai

Sep 27, 2019, 5:12:47 PM

to codec...@webmproject.org

Hmm, fascinating! Can't wait to try it out!


offra...@gmail.com

Sep 30, 2019, 8:57:28 AM

to Codec Developers

Dear all,

Thank you for the renewed interest in our contribution proposal.

Since the original post, we have made only minor improvements to the implementation; the codebase has had no major updates and is still tied to v1.6.0 of the libvpx repository. However, we have published the implementation details and test results in the following article:

G. Grossi, P. Paglierani, F. Pedersini, A. Petrini, "Enhanced multicore–manycore interaction in high-performance video encoding", Journal of Real-Time Image Processing, Nov 02 2018, https://doi.org/10.1007/s11554-018-0834-4

Since the EU-funded FP7 project "T-NOVA", which was the framework for developing our contribution, ended three years ago, we currently do not have the resources to keep development active. However, we can provide help and advice on compiling the code and integrating it into the official libvpx code repository, if needed.

All the best,

Alessandro Petrini


Vadim Asadov

Oct 1, 2019, 4:22:24 AM

to codec...@webmproject.org, alessandr...@unimi.it

Dear all!

We are working on a release right now. It should be ready in October. Then we'll be able to provide a DLL with our encoding/decoding implementation.

Regards,

Vadim

l...@3cx.com

Oct 1, 2019, 3:57:20 PM

to Codec Developers, alessandr...@unimi.it

Can't wait to try this out.

