Extracting Information from Bitstream

Sebastian

Dec 3, 2024, 7:14:51 PM

to gav1-devel

Hi there,

I'm currently doing a student project where the goal is to extract useful information from AV1 bitstreams. Specifically, I'm interested in getting the residuals/motion vectors per decoded block and then processing these blocks further (e.g. for visualization like in AOMAnalyzer).

I've used the gav1_decode example and followed the decoding process, with the following call stack:

DecodeTiles
-> DecodeTilesNonFrameParallel
-> ProcessSuperBlockRow
-> ProcessSuperBlock
-> ProcessPartition
-> ProcessBlock
-> ComputePrediction
-> Residual
-> TransformBlock
-> ReadTransformCoefficients
-> ReconstructBlock
-> Reconstruct

My understanding is that the entire frame is divided into 4x4 blocks, which are then decoded (by a DFS traversal over the superblocks, checking whether each block has already been decoded). The inverse transforms are applied row- and column-wise, and the result is written back into the residual buffer, to which each block holds a reference.

I still have trouble understanding where the residuals are stored and in what format. Before the call to Reconstruct, does the block.residual buffer contain only the residuals (with the inverse transform then applied to them)? And since the plane sizes can differ, how can one access the individual values?

I'd be very happy if someone could help me and provide additional insights.

Best,
Sebastian

James Zern

Dec 4, 2024, 10:34:27 PM

to gav1-devel

Hi Sebastian,

On Tue, Dec 3, 2024 at 4:14 PM Sebastian <mail.bauer...@gmail.com> wrote:

> Hi there,
>
> I'm currently doing a student project where the goal is to extract useful information from AV1 bitstreams. Specifically, I'm interested in getting the residuals/motion vectors per decoded block and then processing these blocks further (e.g. for visualization like in AOMAnalyzer).

Thanks for your interest in AV1!

> I've used the gav1_decode example and followed the decoding process, with the following call stack:
>
> DecodeTiles
> -> DecodeTilesNonFrameParallel
> -> ProcessSuperBlockRow
> -> ProcessSuperBlock
> -> ProcessPartition
> -> ProcessBlock
> -> ComputePrediction
> -> Residual
> -> TransformBlock
> -> ReadTransformCoefficients
> -> ReconstructBlock
> -> Reconstruct
>
> My understanding is that the entire frame is divided into 4x4 blocks, which are then decoded (by a DFS traversal over the superblocks, checking whether each block has already been decoded). The inverse transforms are applied row- and column-wise, and the result is written back into the residual buffer, to which each block holds a reference.

Block is a bit of an overloaded term in the bitstream (partition block, transform block, mode info ...). Mode info blocks are specified in terms of 4x4 blocks, which I believe you're referring to here.
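To make the 4x4 mode info unit concrete, here is a minimal sketch of how luma sample positions map onto it. The helper names MiUnits and MiIndex are hypothetical and are not libgav1 functions; the first follows the MiCols/MiRows computation in the AV1 specification, which rounds the frame dimension up to a multiple of 8 luma samples before counting 4x4 units.

```cpp
#include <cassert>

// Hypothetical helper (not a libgav1 function): number of 4x4 mode info
// units needed to cover `frame_dimension` luma samples. Mirrors the AV1
// spec formula MiCols = 2 * ((FrameWidth + 7) >> 3), i.e. the dimension
// is rounded up to a multiple of 8 samples and then counted in 4x4 units.
inline int MiUnits(int frame_dimension) {
  assert(frame_dimension > 0);
  return 2 * ((frame_dimension + 7) >> 3);
}

// Hypothetical helper: index of the 4x4 mode info unit that contains the
// given luma sample position (one unit covers 4 samples per dimension).
inline int MiIndex(int luma_position) {
  assert(luma_position >= 0);
  return luma_position >> 2;
}
```

For a 1920x1080 frame this gives 480 columns and 270 rows of mode info units, and a 64x64 superblock spans 16 units in each direction.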

> I still have trouble understanding where the residuals are stored and in what format. Before the call to Reconstruct, does the block.residual buffer contain only the residuals (with the inverse transform then applied to them)?

In libgav1, the code (mostly the headers) often has comments like 'Section X.Y.Z' or 'X.Y.Z'. These refer to sections of the AV1 bitstream specification. For instance, Residual() [1] implements 5.11.34 [2].

It may help to have a look at those as well as the 'Decoding process' section [3].

> And since the plane sizes can differ, how can one access the individual values?

The size of the residual is calculated based on the subsampling for the chroma planes. See GetResidualBufferSize() [4].
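As a rough illustration of how subsampling changes the per-plane sizes, here is a minimal sketch. It is not the code of GetResidualBufferSize(); the helper names PlaneDimension and TotalResidualSamples are hypothetical, and the real function in [4] may add padding or differ in other details. The sketch assumes subsampling shifts of 0 or 1, as in 4:4:4, 4:2:2 and 4:2:0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a libgav1 function): size of one plane
// dimension given the luma dimension and the subsampling shift (0 or 1).
// Adding the shift before shifting rounds odd dimensions up.
inline int PlaneDimension(int luma_dimension, int subsampling) {
  assert(luma_dimension > 0);
  assert(subsampling == 0 || subsampling == 1);
  return (luma_dimension + subsampling) >> subsampling;
}

// Hypothetical helper: total number of residual samples for a luma area
// of rows x columns plus the two chroma planes (U and V), which share the
// same subsampled size.
inline std::size_t TotalResidualSamples(int rows, int columns,
                                        int subsampling_x,
                                        int subsampling_y) {
  const std::size_t luma = static_cast<std::size_t>(rows) * columns;
  const std::size_t chroma =
      static_cast<std::size_t>(PlaneDimension(rows, subsampling_y)) *
      PlaneDimension(columns, subsampling_x);
  return luma + 2 * chroma;
}
```

With these assumptions, a 64x64 luma area in 4:2:0 has 32x32 chroma planes. Multiplying the sample count by the size of one residual value (which depends on bit depth) would give a byte count.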

> I'd be very happy if someone could help me and provide additional insights.

[1]: https://chromium.googlesource.com/codecs/libgav1/+/refs/tags/v0.19.0/src/tile.h#538

[2]: https://aomediacodec.github.io/av1-spec/#residual-syntax

[3]: https://aomediacodec.github.io/av1-spec/#decoding-process

[4]: https://chromium.googlesource.com/codecs/libgav1/+/refs/tags/v0.19.0/src/utils/common.h#499
