Re: [chromium-dev] Access spatial and temporal layers using WebAPI's VideoDecoder without WebRTC

Dale Curtis

unread,
Aug 14, 2024, 12:48:26 PM8/14/24
to ma...@shuttle.video, Chromium-dev, media-dev

On Wed, Aug 14, 2024 at 2:42 AM Matthew Kim <ma...@shuttle.video> wrote:
Following the svc.html example, is there a way I can access the temporal and spatial layers from the WebAPI's VideoDecoder without using WebRTC?

EncodedVideoChunkMetadata provides the temporal layer ID for the associated chunk. Does that mean each layer is encoded into its own chunk?



Dale Curtis

unread,
Aug 15, 2024, 1:15:50 PM8/15/24
to Matthew Kim, Chromium-dev, media-dev
No, we haven't completed the design for SVC support yet; there's still a lot of ongoing discussion about what shape it should take:

It's unclear what level of hardware/software support SVC would have either, so if this is required for your use case you'll likely need a fallback wasm decoder anyway.

- dale

On Thu, Aug 15, 2024 at 9:29 AM Matthew Kim <ma...@shuttle.video> wrote:
Thank you Dale. Is there any way to get the spatial layers? Seems like it is on the docket: https://github.com/w3c/webcodecs/pull/756, but is there an existing workaround for this? Our use case doesn't involve WebRTC.

Apologies for the broad question, I'm just a chunk of coal and would like some guidance.

Sincerely,
Matthew

Matthew Kim

unread,
Aug 15, 2024, 1:25:22 PM8/15/24
to Dale Curtis, Chromium-dev, media-dev
Very interesting -- thank you! 

Another question, just to confirm: implementing an SF middlebox would be the developer's responsibility? My current thinking is that since the VideoEncoder can dispatch video chunks by temporal layer, the mechanism that selectively sends those chunks has to be implemented by the app.
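
Roughly the kind of thing I have in mind (a sketch only; the subscriber bookkeeping and sendToSubscriber() are hypothetical stand-ins for our own transport):

    // Sketch: encode with temporal SVC and selectively forward chunks per subscriber.
    interface Subscriber { maxTemporalLayer: number }
    declare const subscribers: Subscriber[];
    declare function sendToSubscriber(
        sub: Subscriber, chunk: EncodedVideoChunk, temporalLayerId: number): void;

    const encoder = new VideoEncoder({
      output: (chunk, metadata) => {
        // EncodedVideoChunkMetadata reports which temporal layer this chunk belongs to.
        const temporalLayerId = metadata?.svc?.temporalLayerId ?? 0;
        for (const sub of subscribers) {
          // The selective-forwarding decision: relay only the layers this subscriber wants.
          if (temporalLayerId <= sub.maxTemporalLayer) {
            sendToSubscriber(sub, chunk, temporalLayerId);
          }
        }
      },
      error: (e) => console.error(e),
    });

    encoder.configure({
      codec: 'vp09.00.10.08',   // VP9, just as an example
      width: 1280,
      height: 720,
      scalabilityMode: 'L1T3',  // 1 spatial layer, 3 temporal layers
    });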

Is there a good example of this? Am I thinking correctly?

Thank you again.
Matthew

Dale Curtis

unread,
Aug 15, 2024, 1:47:35 PM8/15/24
to Matthew Kim, Chromium-dev, media-dev
I'm unfamiliar with the term SF middlebox, but yes, the app would be responsible for everything around sending (after VideoEncoder) and receiving (before VideoDecoder).

- dale

Matthew Kim

unread,
Aug 15, 2024, 2:41:43 PM8/15/24
to Dale Curtis, Chromium-dev, media-dev
Apologies, by SF middlebox I meant a selective forwarding middlebox.

Thank you for your help! I really appreciate it.
Matthew

Matthew Kim

unread,
Aug 15, 2024, 3:02:48 PM8/15/24
to Dale Curtis, Chromium-dev, media-dev
> It's unclear what level of hardware / software support SVC would have either, so if this is required for your use case you'll likely need a fallback wasm decoder anyways.

Interesting. I don't mean to nitpick, but do you mean decoder and encoder? Only needing a wasm decoder would be super cool. Suppose I configure my VideoEncoder to use the AV1 codec; would every EncodedVideoChunk then follow an AV1-specific format?

I ask because Chromium's AV1 decoder calls libgav1 to parse and decode video frames. Would libgav1's decoder plug in nicely in place of VideoDecoder?

The question I really mean to ask: our use case requires retrieving chunks by spatial and temporal layer, and I would like to leverage VideoEncoder/VideoDecoder to get that information without reimagining the entire pipeline.
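
To make it concrete, this is roughly the setup I'm describing (a sketch; the AV1 codec string below is just an example, not something we've settled on):

    // Sketch: an AV1 VideoEncoder. Each EncodedVideoChunk it emits holds the raw
    // encoded bytes for one frame, which copyTo() exposes for our own transport.
    const encoder = new VideoEncoder({
      output: (chunk, metadata) => {
        const bytes = new Uint8Array(chunk.byteLength);
        chunk.copyTo(bytes);  // raw encoded payload for this frame
        // metadata?.decoderConfig shows up on the first chunk (and when the config
        // changes) and can be used to configure the receiving VideoDecoder.
      },
      error: (e) => console.error(e),
    });

    encoder.configure({
      codec: 'av01.0.04M.08',   // example AV1 string: profile 0, level 3.0, 8-bit
      width: 1280,
      height: 720,
      scalabilityMode: 'L1T3',
    });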

I'm inspired by the SVC extension for WebRTC. If they can do it, why can't I?

Thank you again for your super helpful responses.
Matthew



Dale Curtis

unread,
Aug 15, 2024, 3:22:44 PM8/15/24
to Matthew Kim, Chromium-dev, media-dev
Sorry, I did mean encoder. I believe the decoders will output the highest layer for AV1, VP8, and VP9; I'm not sure about H.264 or H.265. I've asked Dan to summarize the results of his investigation on the bug:

To be clear, you should be able to use temporal SVC today without issue; it's just spatial SVC that's not wired up. AFAIK that's the extent of what is supported in WebRTC in Chrome.
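
(Untested sketch, but you can probe at runtime which scalability modes a given browser will accept:)

    // Untested sketch: check which scalabilityMode values the UA supports.
    async function probeScalabilityModes(): Promise<void> {
      const base = { codec: 'vp09.00.10.08', width: 1280, height: 720 };
      for (const scalabilityMode of ['L1T2', 'L1T3', 'L2T3', 'L3T3']) {
        const { supported } =
            await VideoEncoder.isConfigSupported({ ...base, scalabilityMode });
        console.log(scalabilityMode, supported ? 'supported' : 'not supported');
      }
    }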

- dale

Matthew Kim

unread,
Oct 7, 2024, 1:46:41 PM10/7/24
to Dale Curtis, Chromium-dev, media-dev
Hi Dale, 

Thank you for referencing the summary! It's great to see that the AV1 codec can output the highest spatial layer. I wrote a proof of concept that does exactly this; the VideoDecoder intercepts and decodes 'L3T3' SVC-encoded video chunks from a WebRTC peer connection.

Given an 'L3T3'-encoded chunk, how can we filter and serve video content at different spatial resolutions? Filtering by temporal layer is fairly trivial: for every encoded frame, we only decode frames whose temporal index is lower than or equal to the desired index. For reference, here is the code. For spatial layer filtering, I suspect the logic is more complex. I was wondering if you have any guides or references related to spatial layer filtering. Or, phrasing it differently: given a continuous stream of 'L3T3' SVC-encoded frames, how can we use VideoDecoder to decode only the base spatial layer? Or the second layer?
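
In case it helps to see what I mean, the temporal filtering is roughly the sketch below (not the exact code linked above; it assumes the sender tags each encoded frame with its temporal index, e.g. taken from EncodedVideoChunkMetadata at encode time):

    // Sketch: decode only frames at or below the target temporal layer.
    interface TaggedChunk { chunk: EncodedVideoChunk; temporalIndex: number }

    const TARGET_TEMPORAL_LAYER = 1;  // 0 = base layer only, 1 = base + first enhancement, ...

    const decoder = new VideoDecoder({
      output: (frame) => { /* render the frame */ frame.close(); },
      error: (e) => console.error(e),
    });
    decoder.configure({ codec: 'av01.0.04M.08' });  // should match the sender's encoder config

    function onChunkReceived({ chunk, temporalIndex }: TaggedChunk): void {
      // Higher temporal layers are never referenced by lower ones, so skipping
      // them doesn't break decoding of the layers we keep.
      if (temporalIndex <= TARGET_TEMPORAL_LAYER) {
        decoder.decode(chunk);
      }
    }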


Thank you,
Matthew



