WebM CUDA Encoder/Decoder


TiSi

Aug 8, 2011, 12:38:40 PM
to Application Developers
Hello everyone,

My name is Ruben, from Spain, and I'm in the final year of my degree
(Computer Science). I have to do a project to finish my degree, and I
have been thinking of using the CUDA platform with some high-volume data
algorithm. I think video processing is one of the most suitable
areas. So my objective is to use the CUDA platform to accelerate the
decoding/encoding process in WebM.

I read the source code available on the web and I didn't find the
appropriate method that decodes/encodes the data.

Does anyone know if my goal is possible, and how I can begin?

Thanks.

James Zern

Aug 16, 2011, 9:15:35 PM
to apps-...@webmproject.org
On Mon, Aug 8, 2011 at 09:38, TiSi <tisi...@gmail.com> wrote:
> I read the source code available on the web and I didn't find the
> appropriate method that decodes/encodes the data.
>
If you're looking for the high-level encode/decode calls, the examples
are a good place to start: either vpx(dec|enc).c or
simple_encoder/decoder.c (generated by configure).


John Koleszar

Aug 23, 2011, 2:31:20 PM
to apps-...@webmproject.org
On Tue, Aug 16, 2011 at 9:15 PM, James Zern <jz...@google.com> wrote:
> If you're looking for the high-level encode/decode calls the examples
> are a good place to start. Either vpx(dec|enc).c or
> simple_encoder/decoder.c (generated by configure).
>

You also may be interested in an initial OpenCL implementation:

http://review.webmproject.org/gitweb?p=libvpx.git;a=commitdiff;h=sandbox/awatry/initial_opencl_implementation

Geff

Sep 9, 2011, 3:19:08 AM
to Application Developers
If there's any news on that, PLEASE report. A WebM encoder with
CUDA support would be great.

Geff

Sep 9, 2011, 3:18:25 AM
to Application Developers
If there's any news on that, PLEASE report =)
Would love a WebM encoder with CUDA support.


Geff

Sep 9, 2011, 3:33:10 AM
to Application Developers
Come on, devs! CUDA power to WebM! It's one of the two issues that
keep me (and probably a bunch of other people) from diving into WebM.

The other one is Apple adopting WebM decoding in their iDevices
=) (never =/)

But CUDA is a great start!


Ruben Sanchez Castellano

Sep 9, 2011, 9:36:57 AM
to apps-...@webmproject.org

Hi all!

Sorry for the late response. I have been busy for the last month (summer jobs u.u). And thanks to all of you for your interest!!

Now I'm reading the WebM implementation draft in order to understand the libvpx code better.

And I looked at the OpenCL implementation. But it seems to load the entire library and use the OpenCL library methods, doesn't it? My idea is to rewrite the libvpx decoding/encoding methods as CUDA kernels.

On 09/09/2011 14:46, "Geff" <geffa....@googlemail.com> wrote:

Franco Tecchia

Sep 9, 2011, 10:03:55 AM
to apps-...@webmproject.org

Hi all,

Great that someone is adding GPU support to WebM. It's a VERY important feature.

Still, I would favor an OpenCL implementation over CUDA. It's an open standard and, more importantly, it would work on both AMD and NVIDIA hardware. Right now, AMD Brazos (OpenCL only) architectures are far more attractive for low-power solutions than Intel+NVIDIA ION.

Franco

Ruben Sanchez Castellano

Nov 6, 2011, 6:35:12 PM
to apps-...@webmproject.org
Hi all,

I'm studying the simple decoder code. I understand it up to this point in vpx_decoder.c (line 137):

res = ctx->iface->dec.decode(ctx->priv->alg_priv, data, data_sz, user_priv, deadline);

The "decode" member is of type vpx_codec_decode_fn_t (line 303 of vpx_codec_internal.h):

vpx_codec_decode_fn_t     decode;

And the type definition of vpx_codec_decode_fn_t (line 192 of vpx_codec_internal.h) is:

typedef vpx_codec_err_t (*vpx_codec_decode_fn_t)(vpx_codec_alg_priv_t *ctx,
                                                 const uint8_t        *data,
                                                 unsigned int          data_sz,
                                                 void                 *user_priv,
                                                 long                  deadline);

It seems to be a function pointer (a callback), doesn't it? Where is the implementation of this function?

My objective is to find the implementation of the decoding process in order to turn it into a CUDA kernel.

Can anyone help me with this?

Thanks!!

2011/9/9 Franco Tecchia <franco....@sssup.it>



--
Rubén Sánchez Castellano
tisi...@gmail.com
rubsan...@hotmail.com

John Koleszar

Nov 7, 2011, 12:07:26 PM
to apps-...@webmproject.org
Hi Ruben,

See vp8_decode() in vp8/vp8_dx_iface.c
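To see why a call through ctx->iface->dec.decode(...) ends up in vp8_decode(), it can help to look at the pattern in isolation. The following is a minimal, self-contained model of the interface-table idea. A struct holds a function pointer shaped like vpx_codec_decode_fn_t, a codec fills that slot with its concrete function, and a generic front end calls through the pointer. All names here (codec_iface_t, toy_decode, generic_decode, and so on) are hypothetical simplifications, not libvpx's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for libvpx's internal types. */
typedef enum { CODEC_OK = 0, CODEC_ERROR = 1 } codec_err_t;

typedef struct {
    unsigned int frames_decoded; /* per-instance decoder state */
} alg_priv_t;

/* Same shape as vpx_codec_decode_fn_t: a pointer to a decode function. */
typedef codec_err_t (*decode_fn_t)(alg_priv_t *ctx, const uint8_t *data,
                                   unsigned int data_sz, void *user_priv,
                                   long deadline);

/* The interface table: one per codec. */
typedef struct {
    const char *name;
    decode_fn_t decode;
} codec_iface_t;

/* Concrete implementation; for VP8 in libvpx this role is played by
   vp8_decode() in vp8/vp8_dx_iface.c. */
static codec_err_t toy_decode(alg_priv_t *ctx, const uint8_t *data,
                              unsigned int data_sz, void *user_priv,
                              long deadline) {
    (void)user_priv;
    (void)deadline;
    if (data == NULL || data_sz == 0)
        return CODEC_ERROR;
    ctx->frames_decoded++; /* real decoding work would happen here */
    return CODEC_OK;
}

/* The table wires the generic "decode" slot to the concrete function. */
static const codec_iface_t toy_iface = { "toy", toy_decode };

/* Generic front end, analogous to vpx_codec_decode() calling through
   the interface's function pointer. */
static codec_err_t generic_decode(const codec_iface_t *iface, alg_priv_t *priv,
                                  const uint8_t *data, unsigned int data_sz) {
    return iface->decode(priv, data, data_sz, NULL, 0);
}
```

You can follow the real call chain in libvpx the same way. Find where the codec's interface table is defined, then read which function it stores in the decode slot.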

Ruben Sanchez Castellano

Nov 8, 2011, 11:48:19 AM
to apps-...@webmproject.org
Thanks John!! I'll continue along this path.

Maybe my questions seem trivial to you, but I'm finishing my studies and all of this is new to me. I really appreciate your help!

2011/11/7 John Koleszar <jkol...@google.com>

Ruben Sanchez Castellano

Nov 26, 2011, 9:52:54 AM
to apps-...@webmproject.org
Hello everyone!

I'm debugging simple_decoder.cpp and studying the functions it uses. I found that VP8 uses assembly functions, and this worries me.

My idea for a CUDA simple_decoder is this: one CUDA thread decodes a keyframe and all subsequent frames up to (but not including) the next keyframe. But the assembly functions force me to use the CPU for those parts. Imagine 150 CUDA threads decoding frames... maybe my idea is too low-level an implementation.
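The splitting step of this idea can be sketched in plain C. According to RFC 6386, a VP8 keyframe starts with a 3-byte frame tag in which bit 0 of the first byte is 0, followed by the start code 0x9d 0x01 0x2a. Because a keyframe refreshes all of the decoder's reference buffers, each run from one keyframe up to the next should be decodable independently. The helper names (is_vp8_keyframe, split_at_keyframes, segment_t) are hypothetical. The sketch also assumes each input buffer holds exactly one compressed VP8 frame.

```c
#include <stddef.h>
#include <stdint.h>

/* A half-open range of frame indices [start, end). */
typedef struct {
    size_t start;
    size_t end;
} segment_t;

/* Returns 1 if buf looks like a VP8 keyframe: frame-type bit 0 of the
   3-byte frame tag is 0, followed by the 0x9d 0x01 0x2a start code.
   10 bytes = 3 (tag) + 3 (start code) + 4 (width and height). */
static int is_vp8_keyframe(const uint8_t *buf, size_t sz) {
    return sz >= 10 && (buf[0] & 1) == 0 &&
           buf[3] == 0x9d && buf[4] == 0x01 && buf[5] == 0x2a;
}

/* Groups frames into runs that each start at a keyframe and end just
   before the next one. Frames before the first keyframe are skipped,
   since they cannot be decoded without an earlier reference. Returns
   the number of segments written to segs (at most max_segs). */
static size_t split_at_keyframes(const int *is_key, size_t n,
                                 segment_t *segs, size_t max_segs) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_key[i])
            continue;
        if (count > 0)
            segs[count - 1].end = i; /* close the previous run */
        if (count == max_segs)
            break;
        segs[count].start = i;
        segs[count].end = n; /* provisional: runs to the end of the stream */
        count++;
    }
    return count;
}
```

Each run is still strictly sequential inside, because every inter frame depends on the frames decoded before it. The parallelism this exposes is therefore bounded by the number of keyframe intervals in the stream. Each run would also need a complete decoder's worth of state.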

I've seen that simple_decoder uses these assembly functions:

vp8_build_intra_predictors_mbuv_ssse3
vp8_build_intra_predictors_mbuv_x86
vp8_dequant_idct_add_uv_block_sse2
vp8_sixtap_predict16x16_ssse3
vp8_sixtap_predict8x8_ssse3
vp8_intra_pred_uv_dc128_mmx
vp8_intra_pred_uv_dc_mmx2
vp8_intra_pred_uv_dctop_mmx2
vp8_intra_pred_uv_dcleft_mmx2
vp8_loop_filter_mbv_sse2
...

I think this assembly implementation ties the program to the CPU, doesn't it?

Could I implement something at a higher level in CUDA instead?

Thanks!!

Ruben Sanchez Castellano

Nov 27, 2011, 7:23:22 PM
to apps-...@webmproject.org
Hi all!

I'm still reading the libvpx code. After dropping my first idea, I have focused on analyzing the decoding of a single frame. But the problem remains the same: the dequantization code is implemented in assembly!
I don't think I can do much with this code this way.
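For reference, the arithmetic behind VP8 dequantization is simple. Each of the 16 quantized coefficients of a 4x4 block is multiplied by a dequantization factor. One factor applies to the DC coefficient (index 0) and another to the 15 AC coefficients. Here is a plain-C sketch of that step. The function names and the separate DC/AC parameters are assumptions made for illustration, not libvpx's actual signatures.

```c
#include <stdint.h>

/* Fills the 16-entry factor table: index 0 gets the DC factor, the
   remaining 15 entries get the AC factor. */
static void build_dequant_factors(int16_t dc_factor, int16_t ac_factor,
                                  int16_t factors[16]) {
    factors[0] = dc_factor;
    for (int i = 1; i < 16; i++)
        factors[i] = ac_factor;
}

/* Dequantizes one 4x4 block. Every output element depends only on the
   inputs at the same index, so there are no dependencies between loop
   iterations. */
static void dequantize_block(const int16_t q[16], const int16_t factors[16],
                             int16_t dq[16]) {
    for (int i = 0; i < 16; i++)
        dq[i] = (int16_t)(q[i] * factors[i]);
}
```

Because each element is independent, this step maps naturally onto GPU threads (for example, one per coefficient). On its own, though, it may be too little work per block to justify moving data to the device. As the name vp8_dequant_idct_add_uv_block_sse2 in the list above suggests, it is typically fused with the inverse transform.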

Does anyone have an idea of how to continue?

Johann Koenig

Nov 27, 2011, 9:30:22 PM
to apps-...@webmproject.org
On Sun, Nov 27, 2011 at 16:23, Ruben Sanchez Castellano
<tisi...@gmail.com> wrote:
> But the  problem remains the same, the dequantization code is implemented in
> assembler!

There is always a reference version in C. For any given function in
the assembly, you can usually find the C version by replacing the end
of the function name (_sse2 for example) with _c and searching.
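This renaming rule is mechanical enough to script when working through a long list like the one above. The following is a minimal sketch that assumes the SIMD suffix is everything after the last underscore. The helper name reference_name is hypothetical. The result is only a name to search for, since not every assembly function has a C counterpart with exactly that name.

```c
#include <stddef.h>
#include <string.h>

/* Builds the likely name of the C reference version of a SIMD function
   by replacing everything after the last '_' with "c", e.g.
   "vp8_loop_filter_mbv_sse2" -> "vp8_loop_filter_mbv_c".
   Returns 0 on success, -1 if there is no '_' or out is too small. */
static int reference_name(const char *simd_name, char *out, size_t out_sz) {
    const char *last = strrchr(simd_name, '_');
    if (last == NULL)
        return -1;
    size_t prefix_len = (size_t)(last - simd_name) + 1; /* keep the '_' */
    if (prefix_len + 2 > out_sz) /* room for 'c' and the terminator */
        return -1;
    memcpy(out, simd_name, prefix_len);
    out[prefix_len] = 'c';
    out[prefix_len + 1] = '\0';
    return 0;
}
```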

With gcc, you can build the C-only version with --target=generic-gnu.

In order to build all C with something like Visual Studio, you could
just disable all the assembly with
--disable-[mmx|sse|sse2|sse3|ssse3|etc]
--
- johann koenig
  google

Pascal Massimino

Nov 27, 2011, 11:54:19 PM
to apps-...@webmproject.org
Hi Ruben,

Have you tried looking at the reference decoder's code ('dixie') too?
It might be a good alternative source of inspiration.
Code is here:
(or in RFC 6386, but that's less practical)

skal


Ruben Sanchez Castellano

Nov 28, 2011, 2:14:57 PM
to apps-...@webmproject.org
Thanks Johann! The --disable-[mmx|sse|sse2|sse3|ssse3|etc] option in VS9 worked perfectly without modifying the code. 

I had problems with the GNU/Linux version, though. I built it without any options, and I guess the build automatically detected my CPU's features and generated an optimized build of libvpx for it. After rebuilding libvpx with ./configure --target=generic-gnu and building the project again in NetBeans, I could compile simple_decoder without any assembly code.

Pascal, I have seen the dixie implementation, but it's experimental code, isn't it? Thanks anyway.

Now, I'll continue studying the code.

2011/11/28 Pascal Massimino <pascal.m...@gmail.com>

--
Rubén Sánchez Castellano
tisi...@gmail.com

Denis

Jun 5, 2013, 8:18:59 AM6/5/13
to apps-...@webmproject.org
http://csukhyeun.blogspot.com/2013/03/vpxenc-cuda.html

On Monday, August 8, 2011 at 19:38:40 UTC+3, TiSi wrote:

Maulik Prabhudesai

Jun 5, 2013, 8:20:06 AM6/5/13
to apps-...@webmproject.org
Profile the application using gprof or any other tool. You will find the computation-intensive parts that need porting.


Warm regards

Maulik Prabhudesai 
m.prab...@student.tudelft.nl
MSc student, EWI, TU-Delft



geffa.o...@gmail.com

Jan 11, 2016, 2:18:15 PM1/11/16
to Application Developers
Any news on this matter? It's 2016 and I'm still not aware of any GPU hardware-accelerated encoding for WebM VP8 or VP9. How does Google expect to make this format mainstream if encoding is not efficient?

It would also be nice to see decoding make use of our AMD and NVIDIA cards.