Got some problems on accessing decoded image

Xi Yang

May 26, 2021, 4:16:24 AM
to webm-d...@webmproject.org
Hi everyone:

I ran into some problems when using libvpx + libwebm to decode a VP9-encoded WebM video. In brief:

  • the resulting image reports that it has no alpha channel (which it should have).
  • accessing the V plane always causes an out-of-bounds index, although the U and V planes use the same index.

The input video file plays properly in Firefox.

My image access code looks like this:


void handle_frame(const vpx_image_t* img)
{
    bool half_y = img->fmt == VPX_IMG_FMT_I420 || img->fmt == VPX_IMG_FMT_I440 || img->fmt == VPX_IMG_FMT_I42016 || img->fmt == VPX_IMG_FMT_I44016;
    bool half_x = img->fmt == VPX_IMG_FMT_I420 || img->fmt == VPX_IMG_FMT_I422 || img->fmt == VPX_IMG_FMT_I42016 || img->fmt == VPX_IMG_FMT_I42216;

    bool has_alpha = ( img->fmt & VPX_IMG_FMT_HAS_ALPHA ) != 0;
    for ( int y = 0; y < img->h; y++ )
    {
        int y_Y = y;
        int y_U = y;
        int y_V = y;
        int y_A = y;
        if ( half_y )
        {
            y_U /= 2;
            y_V /= 2;
        }

        for ( int x = 0; x < img->w; x++ )
        {
            int x_Y = x;
            int x_U = x;
            int x_V = x;
            int x_A = x;
            if ( half_x )
            {
                x_U /= 2;
                x_V /= 2;
            }

            int idx_Y = y_Y * img->stride[VPX_PLANE_Y] + x_Y;
            int idx_U = y_U * img->stride[VPX_PLANE_U] + x_U;
            int idx_V = y_V * img->stride[VPX_PLANE_V] + x_V;
            int idx_A = y_A * img->stride[VPX_PLANE_ALPHA] + x_A;
            auto iY = img->planes[VPX_PLANE_Y][idx_Y];
            auto iU = img->planes[VPX_PLANE_U][idx_U];
            auto iV = img->planes[VPX_PLANE_V][idx_V]; // it crashes here
            unsigned char iA = has_alpha ? img->planes[VPX_PLANE_ALPHA][idx_A] : 255;
            // do something with color iY iU iV iA
        }
    }
}

MahanStreamer Management

May 26, 2021, 11:05:29 AM
to webm-d...@webmproject.org
Is it at all possible to upload the input file?

My (probably very inaccurate) guess is that you are making wrong assumptions about the chroma subsampling in your code with the half_x and half_y variables.

Thanks
Mahanstreamer Management

--
You received this message because you are subscribed to the Google Groups "WebM Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to webm-discuss...@webmproject.org.
To view this discussion on the web visit https://groups.google.com/a/webmproject.org/d/msgid/webm-discuss/TYBP286MB031896AC50AC2539BFE5789FBA249%40TYBP286MB0318.JPNP286.PROD.OUTLOOK.COM.

Xi Yang

May 26, 2021, 10:48:23 PM
to webm-d...@webmproject.org
A sliced video file is attached to this email.

It seems I am indeed not calculating the indices correctly. When I play this sliced video, the index at which the overflow occurs varies.

I did not find any detailed documentation on the libvpx image struct. I just read some information about reduced-pixel (subsampled) encoding and wrote the index calculation code by guesswork.



crop.webm

err...@gmail.com

May 27, 2021, 8:25:45 AM
to webm-d...@webmproject.org
msprobe (mahanstreamer probe) tells me this video is yuv420p, which means that for every 4x2 pixels there will be 2 sets of UV samples, each one covering 2x2 pixels (Image: https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fi.stack.imgur.com%2FcwG0i.png&f=1&nofb=1).

I don't understand why you are dividing by two here. If you know in advance that the video will be yuv420p (99% of all the videos I've seen are yuv420p), then for every two pixels you take in Y, increment the U and V indices by one.
For the second row of each pair of rows, reset the counter to the U and V indices used at the beginning of the previous row, since the two rows share the same U and V values. Or you can create a loop that processes two rows at a time.
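
One way to express "every 2x2 block of luma pixels shares one U and one V sample" in code is sketched below. This is only an illustration, not libvpx code: the helper name `chroma_index_420` is made up, and it assumes 8-bit samples and a chroma plane whose rows are `chroma_stride` bytes apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of libvpx): index into a 4:2:0 chroma plane
// of the sample that covers luma pixel (x, y). `chroma_stride` is the number
// of bytes between the starts of two consecutive chroma rows.
std::size_t chroma_index_420(std::size_t x, std::size_t y,
                             std::size_t chroma_stride) {
    assert(chroma_stride > 0);
    // Integer division maps luma columns 0,1 -> 0 and 2,3 -> 1, and likewise
    // for rows, so all four pixels of a 2x2 block get the same index.
    return (y / 2) * chroma_stride + (x / 2);
}
```

For a 4x2 image with a chroma stride of 2, both luma rows map to the chroma indices 0, 0, 1, 1, which matches the picture linked above.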

Xi Yang

May 27, 2021, 11:31:04 PM
to webm-d...@webmproject.org
I read the libvpx example code in more detail and found the reason for the index overflow. Only d_w/d_h are actually accessible, but I was iterating over w/h, which are larger than d_w/d_h.
So what do w/h stand for?



MahanStreamer Management

May 28, 2021, 12:09:37 AM
to webm-d...@webmproject.org
What is the example code you are looking at? And what is d_w? Sorry,
it's finals week, so I may be missing a detail or two because I'm tired,
but I don't see d_w, w/h, or d_w/d_h anywhere in your code.

Xi Yang

May 28, 2021, 1:01:33 AM
to webm-d...@webmproject.org
The example code is in the libvpx source package: the simple_decoder program and tools_common.c. I referred to the function vpx_img_write, which simply writes the binary contents of a vpx_image_t to disk. It calls the functions vpx_img_plane_width and vpx_img_plane_height, which use d_w and d_h to calculate the sizes.

The vpx_image_t struct has three sets of members that describe different sizes:
  • w/h for "stored image width/height"
  • d_w/d_h for "displayed image width/height"
  • r_w/r_h for "intended rendering image width/height"
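
The size calculation mentioned above can be sketched roughly as follows. This approximates what vpx_img_plane_width and vpx_img_plane_height do and is not a copy of the libvpx source; `ImageDims` is a hypothetical stand-in that only borrows the field names d_w, d_h, x_chroma_shift and y_chroma_shift from vpx_image_t.

```cpp
#include <cassert>

// Hypothetical stand-in for the few vpx_image_t fields used here; the real
// struct is declared in vpx/vpx_image.h and has many more members.
struct ImageDims {
    unsigned int d_w, d_h;        // displayed width/height in pixels
    unsigned int x_chroma_shift;  // 1 when chroma is halved horizontally
    unsigned int y_chroma_shift;  // 1 when chroma is halved vertically
};

// Width in samples of plane 0 (Y), 1 (U) or 2 (V).
unsigned int plane_width(const ImageDims& img, int plane) {
    assert(plane >= 0 && plane <= 2);
    if (plane > 0 && img.x_chroma_shift > 0)
        return (img.d_w + 1) >> img.x_chroma_shift;  // round up for odd widths
    return img.d_w;
}

// Height in samples of plane 0 (Y), 1 (U) or 2 (V).
unsigned int plane_height(const ImageDims& img, int plane) {
    assert(plane >= 0 && plane <= 2);
    if (plane > 0 && img.y_chroma_shift > 0)
        return (img.d_h + 1) >> img.y_chroma_shift;  // round up for odd heights
    return img.d_h;
}
```

Note that neither function looks at w/h, only at the displayed size.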



MahanStreamer Management

May 28, 2021, 1:13:26 AM
to webm-d...@webmproject.org
I'm about to go to sleep, so I will give you a more authoritative
answer later. But the vpx_img_plane_width function seems to adjust
d_w only if there is subsampling involved, meaning that you
probably want to be using d_w/d_h instead of w/h. I remember hearing
something about how widths/heights can change between frames (like
with ts files), so maybe the stored image width is just an initial
value and d_w is the value that's actually being used.

Xi Yang

May 28, 2021, 2:56:41 AM
to webm-d...@webmproject.org
I have switched to d_w and d_h, and reading the chroma components now works well. Thanks a lot for your help!
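
A minimal sketch of such a loop is given below, under stated assumptions: `Image` is a hypothetical stand-in that only mirrors the vpx_image_t field names it uses, the samples are 8-bit (the 16-bit formats would need different indexing), and `for_each_pixel` is an illustrative name, not a libvpx function. The loop bounds come from d_w/d_h, and the chroma coordinates are derived with the chroma shifts instead of a list of formats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical stand-in that mirrors the vpx_image_t field names used below.
struct Image {
    unsigned int d_w, d_h;                        // displayed size
    unsigned int x_chroma_shift, y_chroma_shift;  // 1 = chroma halved on that axis
    const std::uint8_t* planes[3];                // Y, U, V (8-bit samples)
    int stride[3];                                // bytes per row, may exceed the width
};

using PixelFn = std::function<void(unsigned x, unsigned y, std::uint8_t Y,
                                   std::uint8_t U, std::uint8_t V)>;

// Calls fn once per displayed pixel with its Y, U and V samples.
void for_each_pixel(const Image& img, const PixelFn& fn) {
    assert(img.planes[0] && img.planes[1] && img.planes[2]);
    for (unsigned y = 0; y < img.d_h; ++y) {
        // Row of the chroma planes that covers luma row y.
        const std::ptrdiff_t cy =
            static_cast<std::ptrdiff_t>(y >> img.y_chroma_shift);
        const std::uint8_t* row_y =
            img.planes[0] + static_cast<std::ptrdiff_t>(y) * img.stride[0];
        const std::uint8_t* row_u = img.planes[1] + cy * img.stride[1];
        const std::uint8_t* row_v = img.planes[2] + cy * img.stride[2];
        for (unsigned x = 0; x < img.d_w; ++x) {
            const unsigned cx = x >> img.x_chroma_shift;  // chroma column
            fn(x, y, row_y[x], row_u[cx], row_v[cx]);
        }
    }
}
```

Each plane is addressed through its own stride, so padding at the end of a row never enters the index calculation.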

The remaining problem is the missing alpha channel. Is it actually stored in the WebM video track's block data, together with the YUV components? Or is it stored separately in some auxiliary location, so that I need to extract and decode it manually?


Xi Yang

May 28, 2021, 2:05:48 PM
to webm-d...@webmproject.org
So since there's only one U/V sample for every 2x2 pixels, would it be equivalent to just divide the X/Y coordinates by two to access U and V? In any case, in the future I will regenerate the video in 4:4:4, so the coordinates should be identical.

In addition, I encoded this video using FFmpeg with YUVA420p, and Firefox plays the video with correct transparency. But the image decoded by libvpx reports that it has no alpha channel. Where is it? Do I need extra configuration to get the alpha channel decoded?


MahanStreamer Management

Jun 1, 2021, 1:25:23 PM
to webm-d...@webmproject.org


On Fri, May 28, 2021 at 2:05 PM Xi Yang <jiand...@msn.com> wrote:
> So since there's only one U/V sample for every 2x2 pixels, would it be equivalent to just divide the X/Y coordinates by two to access U and V? In any case, in the future I will regenerate the video in 4:4:4, so the coordinates should be identical.
Do that if you want a slightly larger video file. The hassle of yuv420p is worth it.
You can divide x and y by two to get the RESOLUTION of the chroma planes (if you are using yuv420). But when you are looking for the chroma that belongs to a certain pixel, you'll need more than that. If you want, you can create your own chroma plane that upscales the chroma plane coming with the image so that it has the same resolution as the luma.
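
The upscaling idea could look like the sketch below. It is an assumption-laden illustration, not libvpx code: `upsample_chroma_420` is a made-up name, the samples are 8-bit, the input is a 4:2:0 chroma plane, and each chroma sample is simply repeated over the 2x2 luma block it covers (nearest neighbour, no filtering).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expands one 4:2:0 chroma plane to luma resolution by
// repeating each sample over the 2x2 luma block it covers. The result is
// tightly packed (row length == luma_w).
std::vector<std::uint8_t> upsample_chroma_420(const std::uint8_t* chroma,
                                              std::ptrdiff_t chroma_stride,
                                              unsigned luma_w,
                                              unsigned luma_h) {
    assert(chroma != nullptr);
    std::vector<std::uint8_t> out(static_cast<std::size_t>(luma_w) * luma_h);
    for (unsigned y = 0; y < luma_h; ++y) {
        // Two consecutive luma rows read from the same chroma row.
        const std::uint8_t* src_row =
            chroma + static_cast<std::ptrdiff_t>(y / 2) * chroma_stride;
        std::uint8_t* dst_row = out.data() + static_cast<std::size_t>(y) * luma_w;
        for (unsigned x = 0; x < luma_w; ++x) {
            dst_row[x] = src_row[x / 2];  // two luma columns per chroma column
        }
    }
    return out;
}
```

After this step the U and V buffers can be indexed with the same (x, y) as the luma plane, at the cost of four times the chroma memory.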

> In addition, I encoded this video using FFmpeg with YUVA420p, and Firefox plays the video with correct transparency.

You'll soon be using mahanstreamer as well to encode videos :)

> But the image decoded by libvpx reports that it has no alpha channel. Where is it? Do I need extra configuration to get the alpha channel decoded?
I'll look at this later. But generally, if there's no alpha channel, then maybe check to make sure that one is being generated in the first place.

James Zern

Jun 1, 2021, 5:58:09 PM
to WebM Discussion
Hi,

On Fri, May 28, 2021 at 11:05 AM Xi Yang <jiand...@msn.com> wrote:
> So since there's only one U/V sample for every 2x2 pixels, would it be equivalent to just divide the X/Y coordinates by two to access U and V? In any case, in the future I will regenerate the video in 4:4:4, so the coordinates should be identical.
>
> In addition, I encoded this video using FFmpeg with YUVA420p, and Firefox plays the video with correct transparency. But the image decoded by libvpx reports that it has no alpha channel. Where is it? Do I need extra configuration to get the alpha channel decoded?

The alpha channel is stored as a separate frame. You'll need a separate decoder for handling the data:
 

Xi Yang

Jun 1, 2021, 11:48:56 PM
to webm-d...@webmproject.org
I noticed that the auxiliary frame is fetched by the function av_packet_get_side_data. As I'm not using libavcodec, how can I get it with libwebm?


MahanStreamer Management

Jun 2, 2021, 12:39:18 AM
to webm-d...@webmproject.org
If you read the FAQ, you'll notice that the storage of the alpha channel is
out of band and depends on the container. Since WebM is based on
Matroska, I'm assuming that av_packet_get_side_data simply gets what
is in the BlockAdditional element. So, in order to get the alpha
channel, you would need to somehow get to that part of the file. If
you are not using FFmpeg, then I assume you have your own parser for
MKV or WebM? You'll need to start there, I guess. Every frame should be
a block, and every block should carry the alpha data, meant for another
instance of the decoder, in BlockAdditional.

If you don't want to use FFmpeg at all, there are some C MKV parsers
not affiliated with FFmpeg.
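
Once the BlockAdditional payload has been decoded by the second decoder instance, the alpha values should be, as far as the WebM alpha layout is commonly described, the Y plane of that second frame. A minimal sketch of that last step follows. It makes several assumptions: `PlaneView` and `alpha_from_luma` are hypothetical names, samples are 8-bit, the alpha frame has the same displayed size as the color frame, and extracting BlockAdditional and running the second decoder are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of the Y plane of the frame decoded from BlockAdditional.
struct PlaneView {
    const std::uint8_t* data;  // nullptr when the block carried no alpha payload
    std::ptrdiff_t stride;     // bytes per row
};

// Returns a tightly packed w*h alpha buffer. A frame without alpha data is
// treated as fully opaque (255), like the fallback in the original loop.
std::vector<std::uint8_t> alpha_from_luma(const PlaneView& alpha_y,
                                          unsigned w, unsigned h) {
    std::vector<std::uint8_t> alpha(static_cast<std::size_t>(w) * h, 255);
    if (alpha_y.data == nullptr) return alpha;
    assert(alpha_y.stride >= static_cast<std::ptrdiff_t>(w));
    for (unsigned y = 0; y < h; ++y) {
        const std::uint8_t* src =
            alpha_y.data + static_cast<std::ptrdiff_t>(y) * alpha_y.stride;
        for (unsigned x = 0; x < w; ++x) {
            alpha[static_cast<std::size_t>(y) * w + x] = src[x];
        }
    }
    return alpha;
}
```

The U and V planes of the second frame are ignored in this sketch; only its luma is read.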

James Zern

Jun 2, 2021, 6:00:33 PM
to WebM Discussion
On Tue, Jun 1, 2021 at 8:48 PM Xi Yang <jiand...@msn.com> wrote:
> I noticed that the auxiliary frame is fetched by the function av_packet_get_side_data. As I'm not using libavcodec, how can I get it with libwebm?

mkvparser doesn't support retrieving the block additional data [1]. The webm_parser [2] does, however. Take a look at the demo [3].

 