Intent to Ship: MediaStreamTrack Insertable Streams (a.k.a. Breakout Box)


Guido Urdaneta

Jul 8, 2021, 9:38:42 AM
to blink-dev


Contact emails

h...@chromium.org, top...@chromium.org, gui...@chromium.org


Explainer

https://github.com/w3c/mediacapture-transform/blob/main/explainer.md


Specification

https://w3c.github.io/mediacapture-transform/


API spec

Yes


Design docs


https://w3c.github.io/mediacapture-transform/

https://github.com/w3c/mediacapture-transform/blob/main/explainer.md


Summary

This feature defines an API surface for manipulating raw media carried by MediaStreamTracks, such as the output of a camera, microphone, or screen capture, and for programmatically producing MediaStreamTracks from raw media frames. It uses WebCodecs interfaces to represent raw media frames and exposes them using streams. We aim to ship this API at the same time as WebCodecs.
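For readers new to the API shape, a minimal pass-through video pipeline looks roughly like this (a sketch based on the explainer; it assumes an async context and an existing <video> element named videoElement, and the transform step is illustrative only):

const stream = await navigator.mediaDevices.getUserMedia({video: true});
const [track] = stream.getVideoTracks();
// MediaStreamTrackProcessor exposes the track as a ReadableStream of VideoFrames.
const processor = new MediaStreamTrackProcessor({track});
// MediaStreamTrackGenerator is a MediaStreamTrack whose media comes from a WritableStream.
const generator = new MediaStreamTrackGenerator({kind: 'video'});
const transformer = new TransformStream({
  transform(frame, controller) {
    // A real application would process the frame here (e.g. draw it to a canvas,
    // apply an effect, and wrap the result in a new VideoFrame); enqueueing the
    // original frame unchanged makes this a pass-through.
    controller.enqueue(frame);
  }
});
processor.readable.pipeThrough(transformer).pipeTo(generator.writable);
// The generator can be used wherever a MediaStreamTrack is accepted.
videoElement.srcObject = new MediaStream([generator]);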



Blink component

Blink>MediaStream


TAG review

https://github.com/w3ctag/design-reviews/issues/603


TAG review status

Complete. 

Resolution: satisfied.


Risks



Interoperability and Compatibility

The main interoperability risk is that other browsers do not implement it, which is likely, at least in the short run. There is no compatibility risk since existing behavior is unaffected if the feature is not used.


WebKit: Negative. See https://github.com/w3c/mediacapture-transform/issues/created_by/youennf

Their specific objections are tracked in the issues linked above. They have been working on a callback-based alternative over the last few months; a draft of their ideas is available at https://github.com/youennf/mediacapture-extensions/blob/main/video-raw-transform.md.



Gecko: Negative. Their main objections concern audio support and allowing processing on the Window scope. See:

https://github.com/w3c/mediacapture-transform/issues/38

All the issues raised in that review have been addressed, except for the points of disagreement:

https://github.com/w3c/mediacapture-transform/issues/23 

https://github.com/w3c/mediacapture-transform/issues/29

They do not object to the stream-based shape of MediaStreamTrackProcessor and MediaStreamTrackGenerator, but part of their objection to the Window scope is that they oppose relying on transferable streams for Worker support and would instead prefer to wait for transferable MediaStreamTracks to be specified and implemented.


The objection of WebKit and Gecko to allowing this API on the Window scope is similar to their objection to allowing the WebCodecs API on the Window scope. In this regard, we intend to align with the resolution of the WebCodecs issue.


Web developers: Positive.


From Origin Trial participants:


Zoom:

They use the API together with WebCodecs. They find both the stream-based API and the WebCodecs VideoFrame API easy to use. They currently use video, but plan to experiment with audio as well. They use the Worker scope.


bash.video:

They use the API in both the Worker and Window scopes, depending on the application type. They find both the stream-based API and the WebCodecs VideoFrame API easy to use. They use only the video part of the API.


More feedback from OT participants, including participants who did not authorize public sharing of their replies (accessible only to Googlers): https://docs.google.com/document/d/1LQMrD1rIv1tJG5Z_x6VXaC5vAa3N9bbiAjXaoxDzJek/edit?usp=sharing


Public version of the above document, containing only feedback authorized to be made public, already summarized above: https://docs.google.com/document/d/1DBmr-VpThkr6uqTn39npXzxGVq1YHFmx3PW51Bv09ws/edit?usp=sharing


The general feedback from all OT participants is that they think the Streams API and the VideoFrame API (from WebCodecs) are easy to use in the context of the Breakout Box API, although some participants have not evaluated it yet for all their intended use cases.  Video is used far more than audio, but audio is used or planned to be used by some participants. Having the same API for both is mentioned as an advantage. Those who use Breakout Box together with WebCodecs think the integration is good. 
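For reference, the audio side of the API has the same shape as the video pipeline sketched under Summary, which is the advantage mentioned above. A minimal pass-through sketch, assuming an async context:

const audioStream = await navigator.mediaDevices.getUserMedia({audio: true});
const [audioTrack] = audioStream.getAudioTracks();
const audioProcessor = new MediaStreamTrackProcessor({track: audioTrack}); // ReadableStream of AudioData
const audioGenerator = new MediaStreamTrackGenerator({kind: 'audio'});     // sink for AudioData
audioProcessor.readable
  .pipeThrough(new TransformStream({
    transform(audioData, controller) {
      controller.enqueue(audioData);  // replace with real audio processing
    }
  }))
  .pipeTo(audioGenerator.writable);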



From Twitter:  

https://twitter.com/search?q=url%3Ahttps%3A%2F%2Fweb.dev%2Fmediastreamtrack-insertable-media-processing%2F%20-from%3A%40tomayac&src=typed_query&f=live


Ergonomics

* Are there any other platform APIs this feature will frequently be used in tandem with?

This API will be used together with:

  • Other MediaStream- and WebRTC-related APIs, such as getUserMedia, getDisplayMedia, and RTCPeerConnection.

  • Transferable streams, in order to move processing to a worker (a sketch follows this list).

  • WebCodecs, in many cases. This API uses the WebCodecs interfaces to represent raw media, which makes integration with WebCodecs encoders and decoders straightforward.
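For the transferable-streams point above, a rough sketch of moving the transform to a worker (it continues from the processor and generator in the Summary sketch; worker.js is a hypothetical file name, and error handling is omitted):

// Main thread: transfer the two stream endpoints to a worker.
const worker = new Worker('worker.js');
worker.postMessage(
    {readable: processor.readable, writable: generator.writable},
    [processor.readable, generator.writable]);

// worker.js: run the TransformStream pipeline off the main thread.
self.onmessage = ({data}) => {
  const transformer = new TransformStream({
    transform(frame, controller) {
      controller.enqueue(frame);  // replace with real frame processing
    }
  });
  data.readable.pipeThrough(transformer).pipeTo(data.writable);
};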



* Could the default usage of this API make it hard for Chrome to maintain good performance (i.e. synchronous return, must run on a certain thread, guaranteed return timing)? 

No.



Activation

* Will it be challenging for developers to take advantage of this feature immediately, as-is? 

No.


* Would this feature benefit from having polyfills, significant documentation and outreach, and/or libraries built on top of it to make it easier to use?

No.



Security

This API defines a MediaStreamTrack source and a MediaStreamTrack sink. The security and privacy of the source (MediaStreamTrackGenerator) rely on the same-origin policy. That is, the data MediaStreamTrackGenerator can make available in the form of a MediaStreamTrack must be visible to the document before a VideoFrame or AudioData object can be constructed and pushed into the MediaStreamTrackGenerator. Any attempt to create VideoFrame or AudioData objects using cross-origin data will fail.
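To make the source side concrete, a sketch of pushing locally produced frames into a MediaStreamTrackGenerator (it assumes an async context and an existing same-origin canvas; the code that draws to the canvas is omitted):

const generator = new MediaStreamTrackGenerator({kind: 'video'});
const writer = generator.writable.getWriter();
// Constructing a VideoFrame from a same-origin canvas succeeds; if the canvas
// were tainted by cross-origin data, the constructor would throw instead.
const frame = new VideoFrame(canvas, {timestamp: performance.now() * 1000});
// As noted below, frames are closed automatically once written to the generator.
await writer.write(frame);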


The MediaStreamTrack sink introduced by this API (MediaStreamTrackProcessor) exposes the same data that is exposed by other MediaStreamTrack sinks such as WebRTC peer connections, Web Audio MediaStreamAudioSourceNode, and media elements. The security and privacy of MediaStreamTrackProcessor rely on the security and privacy of the sources of the tracks to which it is connected. For example, camera, microphone, and screen-capture tracks rely on explicit user authorization via permission dialogs, while element capture and MediaStreamTrackGenerator rely on the same-origin policy.


A potential issue with MediaStreamTrackProcessor is resource exhaustion. For example, a site might hold on to too many open VideoFrame objects and deplete a system-wide pool of GPU-memory-backed frames. The Chromium implementation mitigates this risk by limiting the number of pool-backed frames a renderer process can hold. Accidental exhaustion is also mitigated by automatic closing of VideoFrame and AudioData objects once they are written to a MediaStreamTrackGenerator.


All the code for this feature runs in the sandboxed renderer process and therefore respects the Rule of Two.



Is this feature fully tested by web-platform-tests?

Yes


Flag name

MediaStreamInsertableStreams


Tracking bug

https://crbug.com/1142955


Launch bug

https://crbug.com/1146805


Sample links

https://webrtc.github.io/samples/src/content/insertable-streams/video-processing/

https://webrtc.github.io/samples/src/content/insertable-streams/audio-processing/ 

Guido Urdaneta

Jul 9, 2021, 11:44:51 AM
to Guido Urdaneta, blink-dev

Guido Urdaneta

Jul 10, 2021, 10:33:01 AM
to Guido Urdaneta, blink-dev
And another one, posted yesterday: https://note.com/skyway/n/ned76e8596d9c#uQeH1

Alex Russell

Jul 15, 2021, 8:56:17 PM
to blink-dev, Guido Urdaneta, blink-dev, jungke...@microsoft.com
Hey all,

Thanks for all the context here, Guido. The support from partners is particularly encouraging given the headwinds from other potential implementers. It's also good to hear that there are teams using Breakout Box's Streams API with the callback-based decoder classes from WebCodecs without issue. I'll admit that it has been a bit confusing to track which feature is using which style, but knowing they compose well is helpful.

I can add to this list that there are teams at Microsoft who are keen to use it and have prototypes they're happy with. IIRC, they are not using the API from within workers, although Jungkee can confirm.

In general, every engine is going to need transferable streams sooner or later, so that doesn't strike me as a reason not to use them here.

LGTM1

Yoav Weiss

Jul 16, 2021, 12:19:45 AM
to Alex Russell, blink-dev, Guido Urdaneta, jungke...@microsoft.com
LGTM2 to ship.

It seems like there are multiple contentious points here:
  • Exposure to Window contexts
  • Stream-based API shape
  • Support for audio
The OT gave us a clear signal from developers that Window exposure is important and does not imply performance issues. It similarly showed that audio support is important and is something developers are using and planning to use in the future. Finally, it showed that the stream-based API is something developers find easy to use, and is at the very least a viable candidate.

On the point of Window exposure, I appreciated the intent owners' willingness to align with whatever decision WebCodecs takes. I hope that decision will align with the clear signals we have from the OT regarding developers' needs on that front.

On the point of API shape, looking at the discussions so far, it doesn't seem like we have a high probability of convergence. The discussion mentions a competing proposal, but there doesn't seem to have been progress on it in the last 2 months. We also don't have a clear timeline for how long we can expect this particular bikeshed to continue.

Given that, it seems like we have 2 options:
1) Ship the design that's been field-tested with real applications that want to use this API yesterday, and do that in the face of opposition from other vendors.
2) Wait for another N months for a competing proposal to emerge, and then weigh that theoretical design's pros and cons. We would probably need to do all that while the Origin Trial is dragging on and being (ab)used as a "tentative launch", since this feature is currently benefiting real users of real applications.

None of these options is ideal, but (2) seems categorically worse. As such, I support shipping this API in its current form.


 


Chris Harrelson

Jul 19, 2021, 2:34:24 PM
to Yoav Weiss, Alex Russell, blink-dev, Guido Urdaneta, jungke...@microsoft.com
Thanks for the thorough description of the tradeoffs in this intent, and also the very detailed and useful origin trial feedback. It was very useful and informative!

I have two comments/questions:

1. Regarding exposure on Window: you mention that you intend to align with the decision on WebCodecs. That makes sense to me. Has this point been raised with the working group also, and is it being included in the WebCodecs decision process? Given the intended alignment, I would also prefer to give final approval to this intent only when that decision has been made. (I don't think that would cause any extra delays, but let me know if you have a concern with that.)

2. Regarding audio: my understanding of the OT feedback is that multiple participants thought audio in the same system makes sense, but that they didn't actually use the audio part of the API as yet. Is that correct? If so, how about just shipping the video part for now since it has more consensus and experience, and coming back to audio once partners have an opportunity to try it and see if it has definite advantages vs AudioWorklet (which AIUI is the alternative suggestion raised by Mozilla)? Perhaps that new feedback and experience in practice would help to inform and drive consensus in the working group.

Thanks,
Chris


Dale Curtis

Jul 19, 2021, 3:13:45 PM
to Chris Harrelson, Yoav Weiss, Alex Russell, blink-dev, Guido Urdaneta, jungke...@microsoft.com
On Mon, Jul 19, 2021 at 11:34 AM Chris Harrelson <chri...@chromium.org> wrote:
> Thanks for the thorough description of the tradeoffs in this intent, and also the very detailed and useful origin trial feedback. It was very useful and informative!
>
> I have two comments/questions:
>
> 1. Regarding exposure on Window: you mention that you intend to align with the decision on WebCodecs. That makes sense to me. Has this point been raised with the working group also, and is it being included in the WebCodecs decision process? Given the intended alignment, I would also prefer to give final approval to this intent only when that decision has been made. (I don't think that would cause any extra delays, but let me know if you have a concern with that.)

The Media WG that WebCodecs is a part of is aware of this linkage, but isn't thrilled by it given the already large scope of our discussions there. The relevant minutes are here:

jan-ivar: Google announced an intent to ship of a new API, the question of whether to expose that API to window was tied to this issue
jan-ivar: i would question whether it is correct to tie those decisions together
chcunningham: the googlers here are not the same people who worked on that intent-to-ship
chcunningham: lets not stall or increase the scope of this conversation by pulling in that intent-to-ship into this WG
jan-ivar: that makes sense, and that's why i wanted to bring this up; others in the WG might not have been aware of it
chcunningham: the two APIs are tied together in origin trials, but they're separate APIs and implementations in our process
cpn: in terms of the WG, Insertable Streams is not a deliverable of this WG; it doesn't have a home or an official status

 

Jungkee Song

Jul 19, 2021, 3:57:15 PM
to blink-dev, Chris Harrelson, sligh...@chromium.org, blink-dev, Guido Urdaneta, Jungkee Song, yoav...@chromium.org

I'd like to share feedback from partner teams at Microsoft that have experimented with this feature. They prototyped part of their code with the proposed API, replacing existing code that inevitably used canvas and timer APIs or requestAnimationFrame to apply custom effects to media streams.

They used this API in the Window context and confirmed that it significantly improved performance and runtime predictability. It also allowed them to rely on the media stream parameters directly, without having to guess at adjustments to the output video quality. They also reported that the proposed API has concrete advantages over the callback approach, including support for buffering VideoFrames and additional potential for performance optimization.

The partner teams at Microsoft are keen to use this proposed API as soon as possible to deliver to their users the improved experience they prototyped.


Thanks,

Jungkee

Guido Urdaneta

Jul 19, 2021, 4:31:09 PM
to Chris Harrelson, Yoav Weiss, Alex Russell, blink-dev, Guido Urdaneta, jungke...@microsoft.com
Hi,

See replies below.

On Mon, Jul 19, 2021 at 8:34 PM Chris Harrelson <chri...@chromium.org> wrote:
> Thanks for the thorough description of the tradeoffs in this intent, and also the very detailed and useful origin trial feedback. It was very useful and informative!
>
> I have two comments/questions:
>
> 1. Regarding exposure on Window: you mention that you intend to align with the decision on WebCodecs. That makes sense to me. Has this point been raised with the working group also,
The decision to align with the WebCodecs decision has not been raised with the WebRTC WG. The rationale for the alignment is twofold:
1. The members of the WebRTC WG opposing Window exposure for the functionality provided by this API are the same ones opposing the exposure of WebCodecs on the Window scope in the Media WG, and they have provided the same arguments in both cases. Therefore, the discussion is very similar, even though these are two separate APIs.
2. More importantly, if the VideoFrame and AudioData interfaces from WebCodecs are not exposed on the Window scope, the usefulness of the Breakout Box API on Window is significantly limited.
Decoupling from the WebCodecs decision is acceptable to us too, but Window support in Breakout Box is coupled at least to VideoFrame and AudioData being exposed on Window.

 
> and is it being included in the WebCodecs decision process?
This is not and should not be part of the WebCodecs decision process. We are choosing to align with the WebCodecs decision for the reasons mentioned above. 

 
> Given the intended alignment, I would also prefer to give final approval to this intent only when that decision has been made. (I don't think that would cause any extra delays, but let me know if you have a concern with that.)

This API has a hard dependency on WebCodecs, so it cannot ship before WebCodecs ships. My understanding is that WebCodecs will not ship until that decision is made, so this API cannot ship either until then, independently of whether we align with the decision or not. Approving or delaying approval to ship cannot change that timeline. 

 
> 2. Regarding audio: my understanding of the OT feedback is that multiple participants thought audio in the same system makes sense, but that they didn't actually use the audio part of the API as yet. Is that correct?
The part about participants thinking audio in the same system makes sense is correct.
The part about participants not using the audio part yet is incorrect. At least one partner is using the audio API (maybe their email reply was not very explicit, but they confirmed through other channels that they are using the audio API). 
There is another partner that has used the audio API for internal experiments, but has not decided yet to release products based on it. They found no issues with the audio API. Their decision to ship a product based on it depends on other factors.
There is evidence of usage of the audio API by other developers. See for example Zenn: Handle Audio with MediaStreamTrackProcessor.


 
> If so, how about just shipping the video part for now since it has more consensus and experience, and coming back to audio once partners have an opportunity to try it and see if it has definite advantages vs AudioWorklet (which AIUI is the alternative suggestion raised by Mozilla)? Perhaps that new feedback and experience in practice would help to inform and drive consensus in the working group.

Based on audio usage by some partners as described above, I would strongly prefer to ship with audio support.

Thanks,
GU

Dale Curtis

Jul 29, 2021, 2:13:35 PM
to Guido Urdaneta, Chris Harrelson, Yoav Weiss, Alex Russell, blink-dev, jungke...@microsoft.com
The Media WG has posted the chair's decision on Window/Worker exposure for WebCodecs:

Relevant to this thread is:
> It was brought up at the July 13th Media WG meeting [6] that there is a separate but parallel discussion ongoing around the proposed MediaStreamTrack Insertable Media Processing using Streams API [7]. See the Chrome Intent to Ship for this feature [8] for details.
>
> A decision to expose WebCodecs in Window context should not be interpreted as precedent for allowing the proposed MediaStreamTrackProcessor in Window. That proposal should be discussed and considered on its own merits.

- dale


Guido Urdaneta

Aug 6, 2021, 3:47:47 AM
to Chris Harrelson, Yoav Weiss, Alex Russell, blink-dev, Guido Urdaneta, jungke...@microsoft.com
The WebCodecs decision on Window exposure has been made (the decision is to expose on Window).
While the Media WG says this should not be used as precedent for this intent, we have already explained the reasons why it made sense for this intent to align with that decision. Please let us know if there is any concern left before final approval for this intent can be given. 

Thanks,
GU



Alex Russell

Aug 12, 2021, 3:28:08 PM
to blink-dev, Guido Urdaneta, Yoav Weiss, Alex Russell, blink-dev, jungke...@microsoft.com, Chris Harrelson
Apologies for the delay here. I think we've been tripped up by OWNERS vacation schedules. Hopefully we'll have more input shortly.


Chris Harrelson

Aug 16, 2021, 7:52:33 PM
to Alex Russell, blink-dev, Guido Urdaneta, Yoav Weiss, jungke...@microsoft.com
LGTM3 to ship as you currently designed and implemented the feature (i.e. with audio, and on Window).


