VP9 Tiles

Pieter Kapsenberg

Jun 6, 2013, 12:40:42 AM

to webm-d...@webmproject.org

Correct me if I am mistaken, but it looks like the size of each tile is a 32-bit number placed at the end of each tile in the coded stream. So a parallel decoder has to start at the end of the frame, read the size, and keep seeking backwards to find the position of each tile, right?

This seems like an unnecessarily challenging design for a streaming hardware decoder. Wouldn't it make much more sense to code the sizes of each tile (except the last one) in the header partition?

In particular, it forces the decoder to buffer the entire frame before it can begin decoding it in parallel, thus introducing more latency than is necessary (undesirable for video conferencing applications).

Ronald Bultje

Jun 6, 2013, 9:27:22 AM

to webm-d...@webmproject.org

Hi Pieter,

if you look at vp9_decodframe.c line 852 in current git experimental branch (http://git.chromium.org/gitweb/?p=webm/libvpx.git;a=blob;f=vp9/decoder/vp9_decodframe.c;h=a9717222ae615b444dbb58ecbb1e7410b50fe081;hb=refs/heads/experimental#l851), the code reads the current tile's offset at the _beginning_ of each tile's data, not the end.

Regardless, you're right that instead of interleaving the tile sizes with the tile data, you could also put them before all of the actual tile data. I don't think this matters much in terms of complexity. In both cases, the offsets are directly readable:

    offset[0] = 0;
    for (tile = 1; tile < n_tiles; tile++)
        offset[tile] = offset[tile - 1] + read_tile_size(tile - 1);

where read_tile_size(n) is either le32(&data_start[n * 4]) (your approach) or le32(&data_start[offset[n]]) (current approach). We went with the current approach so that encoders can start streaming data before they have finished encoding the full frame (particularly with row tiling enabled): they don't need to know the sizes of all tiles before the header partition is finished, because each tile's size is part of its own tile data rather than of the header partition. With this design, end-to-end latency can theoretically be reduced by (n-1)/(n*fps) seconds on the encoder side, where n is the number of row tiles. This doesn't affect the decoder.
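As a rough sketch of the two layouts discussed above, the C below computes each tile's start offset under both, plus the encoder-side latency estimate. It assumes, hypothetically, that every tile except the last carries a 4-byte little-endian size (following the le32() in the reply), that the last tile's size is implied, and that offsets are measured from the start of the tile data. Unlike the short snippet, it also steps over the 4-byte size fields themselves. The function names are illustrative only, not libvpx's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read, standing in for the le32() used in the thread. */
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Current approach (assumed layout): each tile except the last is prefixed
 * by its own 4-byte size. offset[t] is where tile t's payload starts.
 * Returns 0 on success, -1 if a size points past the end of the buffer. */
static int offsets_interleaved(const uint8_t *data, size_t len, int n_tiles,
                               size_t *offset) {
    size_t pos = 0;
    assert(n_tiles >= 1);
    for (int t = 0; t < n_tiles - 1; t++) {
        if (len - pos < 4) return -1;
        uint32_t size = le32(data + pos);
        offset[t] = pos + 4;             /* payload follows its size field */
        if (size > len - offset[t]) return -1;
        pos = offset[t] + size;          /* next tile's size field */
    }
    offset[n_tiles - 1] = pos;           /* last tile has no size field */
    return 0;
}

/* Proposed alternative (assumed layout): the sizes of the first n_tiles - 1
 * tiles are stored up front as a table, followed by all tile payloads. */
static int offsets_upfront(const uint8_t *data, size_t len, int n_tiles,
                           size_t *offset) {
    assert(n_tiles >= 1);
    size_t table = 4 * (size_t)(n_tiles - 1);
    if (table > len) return -1;
    offset[0] = table;                   /* first payload follows the table */
    for (int t = 1; t < n_tiles; t++) {
        uint32_t size = le32(data + 4 * (size_t)(t - 1));
        if (size > len - offset[t - 1]) return -1;
        offset[t] = offset[t - 1] + size;
    }
    return 0;
}

/* Encoder-side latency saving quoted above: (n - 1) / (n * fps) seconds,
 * where n is the number of row tiles. */
static double latency_gain_s(int n_row_tiles, double fps) {
    return (double)(n_row_tiles - 1) / (n_row_tiles * fps);
}
```

Both layouts give the same payload positions up to a constant shift, which is the point made above: the only difference is where the sizes live. For the latency estimate, 4 row tiles at 30 fps gives (4 - 1)/(4 * 30) = 0.025 s, i.e. 25 ms.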

HTH,

Ronald

--
You received this message because you are subscribed to the Google Groups "WebM Discussion" group.

Pieter Kapsenberg

Jun 6, 2013, 11:25:33 AM

to webm-d...@webmproject.org

OK, is this tweak planned to make it into the final version?

Also, for a single-threaded HW design that just consumes stream bytes, can the bool decoder rely on the following procedure working?

Decode tile
read bits until byte-aligned
discard 32 bits (the size of the last or next tile)
re-init bool coder
Decode next tile
etc...

Ronald Bultje

Jun 6, 2013, 12:29:23 PM

to webm-d...@webmproject.org

Hi Pieter,

the tweak is already in git/experimental, yes.

As for the bool decoder: no. We may truncate trailing zeros from the bool-coded data, so the decoder's read position doesn't tell you where the next tile starts. You need to do it like this:

1. (If not the last tile) read the tile size and save the current data pointer.
2. Decode the tile's bits.
3. (If not the last tile) set next_data_ptr = data_ptr + tile_size and go back to step 1 for the next tile.
4. (Else, last tile) finished.
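A minimal single-threaded sketch of that procedure in C, assuming the same hypothetical layout as earlier in the thread (a 4-byte little-endian size before every tile except the last, with the last tile running to the end of the buffer). `tile_fn`, `walk_tiles`, `tile_log` and `log_tile` are illustrative names, not libvpx APIs; a real decoder would re-initialize its bool decoder inside the callback on exactly the bytes it is handed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical per-tile hook: a real decoder would (re)initialize its bool
 * decoder on exactly [buf, buf + size) here and decode the tile. */
typedef void (*tile_fn)(const uint8_t *buf, size_t size, void *ctx);

/* Walks the tiles in order. Returns the number of tiles visited, or -1 if a
 * size field is truncated or points past the end of the frame data. */
static int walk_tiles(const uint8_t *data, size_t len, int n_tiles,
                      tile_fn fn, void *ctx) {
    const uint8_t *ptr = data;
    const uint8_t *end = data + len;
    assert(n_tiles >= 1);
    for (int t = 0; t < n_tiles; t++) {
        size_t size;
        if (t < n_tiles - 1) {
            /* Step 1: read this tile's size; ptr then points at its data. */
            if (end - ptr < 4) return -1;
            size = le32(ptr);
            ptr += 4;
            if (size > (size_t)(end - ptr)) return -1;
        } else {
            /* Last tile: no size field, assumed to run to the end. */
            size = (size_t)(end - ptr);
        }
        /* Step 2: decode using only this tile's bytes. */
        fn(ptr, size, ctx);
        /* Step 3: advance by the signalled size, not by however far the
         * bool decoder happened to read. */
        ptr += size;
    }
    return n_tiles;
}

/* Example callback: records each tile's size and first byte. */
struct tile_log { size_t sizes[8]; uint8_t first[8]; int n; };

static void log_tile(const uint8_t *buf, size_t size, void *ctx) {
    struct tile_log *tl = ctx;
    if (tl->n < 8) {
        tl->sizes[tl->n] = size;
        tl->first[tl->n] = size ? buf[0] : 0;
        tl->n++;
    }
}
```

The key design point is step 3: the next tile's position comes from the signalled size, never from where the bool decoder stopped reading, which is why the byte-alignment procedure proposed earlier can't be relied on.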

Ronald

Pieter Kapsenberg

Jun 6, 2013, 12:51:10 PM

to webm-d...@webmproject.org

Doesn't this prevent a hardware decoder from being able to stream, if it has to flush and reload the bool coder for each tile?
