Dear all,
as a contribution to the European Project "T-NOVA" (www.t-nova.eu), the University of Milan - Department of Computer Science (www.di.unimi.it) and ITALTEL (www.italtel.com) have developed a GPU-accelerated VP8 encoder based on the existing libvpx 1.5.0 code. The main goal was to provide an accelerated encoder that outperforms the open-source vpxenc encoder in terms of computational time while retaining the same visual quality.
After evaluating different approaches to the problem, we found that the most promising strategy was to integrate libvpx with a novel parallel motion estimation (ME) algorithm, implemented specifically for GPGPU. We targeted modern NVIDIA graphics cards, and the parallel code was developed in CUDA. The ME algorithm falls between the diamond search and full search strategies: during each iteration, several (up to 128) candidates inside the diamond area are evaluated at the same time, in order to exploit the massive parallelism offered by GPUs.
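To make the idea of scoring a whole diamond of candidates per iteration concrete, here is a minimal CPU-side sketch in Python. It is not the CUDA code described above: the function names, the SAD cost, the diamond radius of 7 (113 candidates, which fits under the 128 quoted above) and the stopping rule are all assumptions chosen for illustration. On a GPU, the inner loop over candidates is the part that would be spread across threads.

```python
def sad(cur, ref, bx, by, mvx, mvy, n=16):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (mvx, mvy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + mvy + y][bx + mvx + x])
    return total


def diamond_candidates(radius):
    """All integer offsets with |dx| + |dy| <= radius (a filled diamond)."""
    return [(dx, dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if abs(dx) + abs(dy) <= radius]


def batched_diamond_search(cur, ref, bx, by, n=16, radius=7, max_iters=8):
    """Hypothetical batched search: every iteration scores all candidates in
    the diamond around the current best vector, then recentres on the winner.
    Returns (best_mv, best_cost) at integer-pel precision."""
    h, w = len(ref), len(ref[0])
    offsets = diamond_candidates(radius)
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for _ in range(max_iters):
        cx, cy = best_mv
        scored = []
        # Sequential here; this batch is what a GPU would evaluate in parallel.
        for dx, dy in offsets:
            mvx, mvy = cx + dx, cy + dy
            inside = (0 <= bx + mvx and bx + mvx + n <= w and
                      0 <= by + mvy and by + mvy + n <= h)
            if inside:
                scored.append((sad(cur, ref, bx, by, mvx, mvy, n), (mvx, mvy)))
        cost, mv = min(scored)
        if cost >= best_cost:
            break  # no candidate improves on the current centre
        best_cost, best_mv = cost, mv
    return best_mv, best_cost
```

Calling batched_diamond_search(cur, ref, 16, 16) on two frames stored as lists of rows returns the integer motion vector and its SAD for the 16x16 block at (16, 16). A plain diamond search would score only a handful of points per step, and a full search would score the entire window once; this sketch sits in between by scoring a filled diamond on every step.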
Three different CUDA ME kernels have been developed, each with different features and performance; the most accurate kernel features a complete splitmv search (down to 4x4 blocks) and sub-pixel interpolation for motion vectors with up to 1/4-pel precision.
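As a rough illustration of what a split search down to 4x4 blocks buys, the Python sketch below compares one motion vector for a whole 16x16 macroblock against independent vectors for its sixteen 4x4 sub-blocks. It covers only the integer-pel split part, not the 1/4-pel interpolation, and it is not the kernel described above: the function names, the exhaustive search in a small window and the pure SAD cost are assumptions. A real encoder would also weigh the bit cost of signalling the extra vectors, since on SAD alone the split can never lose.

```python
def block_sad(cur, ref, x0, y0, mvx, mvy, size):
    """SAD between the size x size block of `cur` at (x0, y0) and the block
    of `ref` displaced by (mvx, mvy)."""
    return sum(abs(cur[y0 + y][x0 + x] - ref[y0 + mvy + y][x0 + mvx + x])
               for y in range(size) for x in range(size))


def best_mv(cur, ref, x0, y0, size, search_range):
    """Exhaustive integer-pel search in a small square window.
    Returns (cost, (mvx, mvy)) for the cheapest in-frame displacement."""
    h, w = len(ref), len(ref[0])
    best = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            inside = (0 <= x0 + mvx and x0 + mvx + size <= w and
                      0 <= y0 + mvy and y0 + mvy + size <= h)
            if not inside:
                continue
            cost = block_sad(cur, ref, x0, y0, mvx, mvy, size)
            if best is None or cost < best[0]:
                best = (cost, (mvx, mvy))
    return best


def split_search(cur, ref, mbx, mby, sub=4, search_range=2):
    """Hypothetical comparison for the 16x16 macroblock at (mbx, mby):
    one vector for the whole block versus one vector per sub x sub block.
    Returns (whole_cost, whole_mv, split_cost, split_mvs)."""
    whole_cost, whole_mv = best_mv(cur, ref, mbx, mby, 16, search_range)
    split_cost = 0
    split_mvs = []
    for sy in range(0, 16, sub):
        for sx in range(0, 16, sub):
            cost, mv = best_mv(cur, ref, mbx + sx, mby + sy, sub, search_range)
            split_cost += cost
            split_mvs.append(mv)
    return whole_cost, whole_mv, split_cost, split_mvs
```

The split pays off when different parts of a macroblock move differently, for example along an object boundary: no single vector matches both halves, while the per-sub-block vectors can each follow their own motion.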
Regarding performance, the slowest and most accurate kernel speeds up the overall encoding by up to 2x, while the fastest kernel reaches a speed-up of up to 4.5x, at the expense of a small loss in visual quality.
Some preliminary tests were performed, and the encodings were evaluated following the submission recommendations on the WebM website. The results for different video resolutions (CIF, 720p and 1080p), in single-pass, single-thread mode (*), are available at https://github.com/Topopiccione/vpxenc_test
The tests were run on an Intel Xeon E5-2620 v3 @ 2.40 GHz with 64 GB of RAM, running 64-bit Ubuntu Linux 14.04; the graphics card was an NVIDIA GTX 980 with 4 GB of GDDR5 video memory.
At present, we have fully integrated the CUDA kernels into the libvpx code, with the exception of the configure script and the automatically generated makefiles, which still need to be edited manually. A command-line option (--cuda-me) has been added to vpxenc to switch between the standard ME and our CUDA-accelerated version.
We have also integrated the accelerated libvpx library into the popular open-source libav library to further improve the usability of the accelerated encoder.
We are continuously optimizing the kernels and would like to contribute to the WebM project by submitting our work to the libvpx developer community. Moreover, given the promising results obtained with VP8, we are considering adapting the accelerated ME implementation to the more recent VP9/VP10 codecs.
Best regards,
Pietro Paglierani (ITALTEL) - email: pietro.p...@italtel.com
Giuliano Grossi (UNIMI) - email: gro...@di.unimi.it
Federico Pedersini (UNIMI) - email: pede...@di.unimi.it
Alessandro Petrini (UNIMI) - email: alessand...@unimi.it
(*) command-line:
./vpxenc $i -o $i-$b.vp8.webm --best --cpu-used=0 --target-bitrate=$b --auto-alt-ref=1 -v --minsection-pct=0 --maxsection-pct=800 --lag-in-frames=25 --kf-min-dist=0 --kf-max-dist=99999 --static-thresh=0 --min-q=0 --max-q=63 --drop-frame=0 --bias-pct=50 --minsection-pct=0 --maxsection-pct=800 --psnr --arnr-maxframes=7 --arnr-strength=3 --arnr-type=3 --codec=vp8
Hmm, fascinating! Can't wait to try it out!
On Wed, May 18, 2016, 17:58 Alessandro Petrini <alessand...@unimi.it> wrote: