Intent to Ship: Audio Output Devices API


Guido

Oct 23, 2015, 1:03:00 PM
to blink-dev

Intent to Ship: Audio Output Devices API


Contact emails

gui...@chromium.org, h...@chromium.org, to...@chromium.org


Spec

https://w3c.github.io/mediacapture-output/


Summary

This feature incorporates a set of JavaScript APIs that let a Web application direct the audio output of a media element to authorized devices other than the system or user agent default.
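
For illustration (this sketch is not part of the original intent text; the device choice and error handling are assumptions):

```js
const audio = document.querySelector('audio');

navigator.mediaDevices.enumerateDevices()
  .then(devices => {
    // Pick some authorized non-default audio output; the selection is illustrative.
    const sink = devices.find(d => d.kind === 'audiooutput' && d.deviceId !== 'default');
    if (!sink) throw new Error('no alternate audio output found');
    return audio.setSinkId(sink.deviceId); // resolves once the output is switched
  })
  .then(() => console.log('audio routed to device', audio.sinkId))
  .catch(err => console.error('setSinkId failed:', err)); // e.g. NotFoundError for a bad ID
```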


Link to “Intent to Implement” blink-dev discussion

https://groups.google.com/a/chromium.org/forum/#!searchin/blink-dev/intent$20to$20implement$20audio$20output$20devices$20api/blink-dev/Ci7pSnHGMmo/Yw6E2bR0CE8J


Is this feature supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

No. It is supported on all desktop platforms (Windows, Mac, Linux, and Chrome OS).

It is not supported on Android (including WebView) because the Android platform does not provide the ability to programmatically switch individual audio streams to different audio devices. We plan to support Android once the platform provides the required underlying support.


Demo link

https://webrtc.github.io/samples/src/content/devices/multi/

https://guidou.github.io/setsinkid-demo.html


Compatibility Risk

There is some compatibility risk, as we are not aware of any plans to support this feature in other browsers. The risk is mitigated by the small surface area of the change (only the setSinkId function and the sinkId field on HTML media elements), and by the fact that we do not expect the specification to change in an important way with regard to these extensions.

The compatibility risk has not changed since the intent to implement.


OWP launch tracking bug

https://crbug.com/547091


Entry on the feature dashboard

https://www.chromestatus.com/feature/4621603249848320


Other relevant links and info

Security and privacy are being reviewed at https://crbug.com/545992

The feature has been behind a flag since M45.

Chris Harrelson

Oct 23, 2015, 4:19:48 PM
to Guido, blink-dev
Hi,

Have you filed for or completed a W3C TAG Review?

Chris


Guido

Oct 23, 2015, 4:57:45 PM
to blink-dev, gui...@chromium.org, chri...@chromium.org, Harald Alvestrand

+hta

Harald,

Do you know the status of this?


Harald Alvestrand

Oct 24, 2015, 7:46:42 AM
to Guido, blink-dev, Chris Harrelson, Harald Alvestrand
I'm the tech lead for the group, as well as a chair of the WEBRTC WG.

I'll check with my staff contact on the idea of a W3C TAG review - the WEBRTC/TAG reviews I've participated in in the past have been rather high level, more about keeping the TAG in touch than about anything else.

It's news to me that we gate anything on TAG reviews - when did that wrinkle get added?




PhistucK

Oct 24, 2015, 7:48:04 AM
to Harald Alvestrand, Guido, blink-dev, Chris Harrelson, Harald Alvestrand
A few months ago.


PhistucK

Yehonathan Sharvit

Oct 25, 2015, 1:43:09 AM
to blink-dev, h...@google.com, gui...@chromium.org, chri...@chromium.org, h...@chromium.org
Why is the `setSinkId` function not available on the AudioContext?
When will it be available?
I'm on M45 with the flag.

Guido

Oct 25, 2015, 4:45:51 AM
to blink-dev, h...@google.com, gui...@chromium.org, chri...@chromium.org, h...@chromium.org
On Sunday, October 25, 2015 at 6:43:09 AM UTC+1, Yehonathan Sharvit wrote:
Why is the `setSinkId` function not available on the AudioContext?
When will it be available?
I'm on M45 with the flag.


The main reason is that the spec declares the WebAudio extensions "a work in progress" and presents 4 possible implementation options, all open for discussion. We prefer to first ship the HTMLMediaElement extensions, which are well defined, and work on the WebAudio extensions once the spec reflects the results of the WebAudio discussion and is no longer a work in progress.
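
(Concretely, with the flag enabled only the media-element surface exists; a quick console check reflecting the behavior described above:)

```js
typeof HTMLMediaElement.prototype.setSinkId; // "function" (behind the flag)
typeof AudioContext.prototype.setSinkId;     // "undefined" (the WebAudio extension is not implemented)
```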

Rick Byers

Oct 26, 2015, 2:52:14 PM
to Guido, blink-dev, Harald Alvestrand, Chris Harrelson, Harald Alvestrand, Chris Wilson
Seems like there hasn't been much progress on this WebAudio issue.  In general we like our high-level features to be explained by low-level primitives, but it can be reasonable to ship the higher level features that have consensus while still working out details in the primitives.  Chris, what's your opinion on shipping the HTMLMediaElement piece here before having the WebAudio piece ready?

Where are issues tracked for this spec?  There's no GitHub or Bugzilla link in the spec for issue tracking as far as I can see.

Also, have there been any conversations with other browser vendors about this API at all?  We should at least be reaching out to see if anyone has any interest in helping to define this API before we ship it.  But if we've asked and it's just not interesting to any of the other vendors, then that's fine.

Harald Alvestrand

Oct 26, 2015, 7:21:28 PM
to Rick Byers, Guido, blink-dev, Chris Harrelson, Harald Alvestrand, Chris Wilson
There's ongoing discussion in the W3C (I have a meeting this afternoon about it).
My sense is that people are looking for a superset, not a subset, of the functionality that's covered by the intent-to-ship.

You can find the github and bug tracker via the WG page:

Please file a bug saying that the bugtracker link should be in the document!

Philip Jägenstedt

Oct 29, 2015, 11:14:58 AM
to Harald Alvestrand, Alex Russell, Rick Byers, Guido, blink-dev, Chris Harrelson, Harald Alvestrand, Chris Wilson
This feature is a lot like MediaSession in that it needs to interact with audio output, but since there is no suitable primitive for that on the web platform we end up extending both HTMLMediaElement and AudioContext. For both this intent and MediaSession, shipping with only support for HTMLMediaElement isn't out of the question, but how much extra work would it be to finish the Web Audio integration? There is a risk that this reveals some issues that would affect the HTMLMediaElement API as well, after all.

I have an old issue on the spec, and there's also a naming question that would be good to explicitly accept or reject before shipping.

Looking at the implementation, I also see that there's a failure case that's not in the spec, namely the AbortError if there is no WebMediaPlayer backend when setSinkId() is called. Can't this case simply be handled, so that the correct output device is used once needed, much like volume can be set in advance? If not, then this is similar to a MediaSession issue, where we don't want to allow assigning mediaElement.session if there is a media player backend and thus a chance that audio is already playing. We did that with a networkState check. Perhaps the inverse of that check, or checking readyState>HAVE_NOTHING would work for the spec. guido@, since you wrote the code in question, can you file a spec issue describing what the requirements are?

I have filed a TAG review issue. Alex, can you make sure that gets some attention?

Philip

Guido

Oct 29, 2015, 1:20:45 PM
to blink-dev, h...@google.com, sligh...@google.com, rby...@chromium.org, gui...@chromium.org, chri...@chromium.org, h...@chromium.org, cwi...@google.com
On Thursday, October 29, 2015 at 4:14:58 PM UTC+1, Philip Jägenstedt wrote:
This feature is a lot like MediaSession in that it needs to interact with audio output, but since there is no suitable primitive for that on the web platform we end up extending both HTMLMediaElement and AudioContext. For both this intent and MediaSession, shipping with only support for HTMLMediaElement isn't out of the question, but how much extra work would it be to finish the Web Audio integration? There is a risk that this reveals some issues that would affect the HTMLMediaElement API as well, after all.


Most of the work to implement this feature was improvements in the Chromium audio stack to make it better at supporting different audio output devices.
The Blink work for the HTMLMediaElement extensions is, for the most part, plumbing to let WebMediaPlayer implementations take advantage of the improvements in the Chromium lower layers. 
The WebAudio integration work will be largely orthogonal to the HTMLMediaElement work, so I think it's OK to ship them separately.

 
I have an old issue on the spec, and there's also a naming question that would be good to explicitly accept or reject before shipping.

Looking at the implementation, I also see that there's a failure case that's not in the spec, namely the AbortError if there is no WebMediaPlayer backend when setSinkId() is called. Can't this case simply be handled, so that the correct output device is used once needed, much like volume can be set in advance? If not, then this is similar to a MediaSession issue, where we don't want to allow assigning mediaElement.session if there is a media player backend and thus a chance that audio is already playing. We did that with a networkState check. Perhaps the inverse of that check, or checking readyState>HAVE_NOTHING would work for the spec. guidou@, since you wrote the code in question, can you file a spec issue describing what the requirements are?


The case when there is no WebMediaPlayer is being addressed in this CL. We are handling it similarly to the volume case, performing validity checks on the sink ID and saving it for future use if the checks pass. This is consistent with the current text of the spec.
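
(A sketch of the behavior described above, assuming the CL lands as described; the device ID and URL are placeholders:)

```js
async function playOnDevice(deviceId) {
  const el = new Audio();       // no src yet, so no WebMediaPlayer backend exists
  await el.setSinkId(deviceId); // ID is validated and stored, like setting volume in advance
  el.src = 'clip.webm';         // placeholder URL; the stored sink is used once audio starts
  await el.play();
}
```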

On a related note, this feature recently got approval from the privacy and security reviews (crbug.com/545992).


 

Philip Jägenstedt

Nov 2, 2015, 8:58:15 AM
to Guido, Kenneth Russell, Raymond Toy, blink-dev, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org, Chris Wilson
On Thu, Oct 29, 2015 at 6:20 PM, Guido <gui...@chromium.org> wrote:
On Thursday, October 29, 2015 at 4:14:58 PM UTC+1, Philip Jägenstedt wrote:
This feature is a lot like MediaSession in that it needs to interact with audio output, but since there is no suitable primitive for that on the web platform we end up extending both HTMLMediaElement and AudioContext. For both this intent and MediaSession, shipping with only support for HTMLMediaElement isn't out of the question, but how much extra work would it be to finish the Web Audio integration? There is a risk that this reveals some issues that would affect the HTMLMediaElement API as well, after all.


Most of the work to implement this feature was improvements in the Chromium audio stack to make it better at supporting different audio output devices.
The Blink work for the HTMLMediaElement extensions is, for the most part, plumbing to let WebMediaPlayer implementations take advantage of the improvements in the Chromium lower layers. 
The WebAudio integration work will be largely orthogonal to the HTMLMediaElement work, so I think it's OK to ship them separately.

How does Web Audio integrate with the lower layers of the audio stack where support for different audio output devices has been added?

Even if shipping separately is an option, I think we must at least have high confidence in what the API will look like for Web Audio, because certain combinations of HTMLMediaElement and AudioContext extensions are harder to justify than others.

In particular, having AudioContext.addSinkId()/removeSinkId() but restricting HTMLMediaElement to just one audio output device (media files with audio tracks in multiple languages exist) would be sad.

Also, the `new AudioContext({ sinkId: requestedSinkId })` doesn't go great with an HTMLMediaElement solution that's allowed to change the sink at any time, because using createMediaStreamDestination() you could get the AudioContext output into an HTMLMediaElement.

This really only leaves the setSinkId() proposal for AudioContext, so is this the API you expect to implement? modules/webaudio/OWNERS, do you think that API will work?
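
(A sketch of the piping in question, to make the circumvention concrete; requestedSinkId is a placeholder:)

```js
const ctx = new AudioContext();
const dest = ctx.createMediaStreamDestination();
// ...connect the graph to dest instead of ctx.destination...

const el = new Audio();
el.srcObject = dest.stream;    // the element now plays the context's output
el.play();
el.setSinkId(requestedSinkId); // and the element can switch sinks at any time
```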

I have an old issue on the spec, and there's also a naming question that would be good to explicitly accept or reject before shipping.

Looking at the implementation, I also see that there's a failure case that's not in the spec, namely the AbortError if there is no WebMediaPlayer backend when setSinkId() is called. Can't this case simply be handled, so that the correct output device is used once needed, much like volume can be set in advance? If not, then this is similar to a MediaSession issue, where we don't want to allow assigning mediaElement.session if there is a media player backend and thus a chance that audio is already playing. We did that with a networkState check. Perhaps the inverse of that check, or checking readyState>HAVE_NOTHING would work for the spec. guidou@, since you wrote the code in question, can you file a spec issue describing what the requirements are?


The case when there is no WebMediaPlayer is being addressed in this CL. We are handling it similarly to the volume case, performing validity checks on the sink ID and saving it for future use if the checks pass. This is consistent with the current text of the spec.

Yep, that makes sense to me!
 
On a related note, this feature recently got approval from the privacy and security reviews (crbug.com/545992).

That issue is protected, but I'll take your word for it :)

Philip 

Guido

Nov 5, 2015, 6:26:58 AM
to blink-dev, gui...@chromium.org, k...@chromium.org, rt...@chromium.org, h...@google.com, sligh...@google.com, rby...@chromium.org, chri...@chromium.org, h...@chromium.org, cwi...@google.com


On Monday, November 2, 2015 at 2:58:15 PM UTC+1, Philip Jägenstedt wrote:
How does Web Audio integrate with the lower layers of the audio stack where support for different audio output devices has been added?


One of the main integration points is the WebAudioDevice interface. In the Chromium implementation of this interface, an actual Chromium audio output device is created and an audio render callback function is supplied. The current implementation of this interface is hardcoded to use the default device (see here). 
Supporting multiple devices would require being able to supply the WebAudioDevice implementation with the ID of the sink to use, so that it can be passed to the Chromium AudioDeviceFactory.
WebAudioDevice is used by AudioDestination.

The HTMLMediaElement extensions integrate with Chromium using the WebMediaPlayer interface. WebMediaPlayer integrates with WebAudio using the WebAudioSourceProvider interface.  
WebMediaPlayer implementations use this to redirect audio to WebAudio. In this case, audio from the WebMediaPlayer goes to WebAudio instead of the Chromium audio device, so the media player's sink ID is effectively ignored. This is what happens when an HTMLMediaElement outputs to WebAudio.

 
Even if shipping separately is an option, I think we must at least have high confidence in what the API will look like for Web Audio, because certain combinations of HTMLMediaElement and AudioContext extensions are harder to justify than others.

In particular, having AudioContext.addSinkId()/removeSinkId() but restricting HTMLMediaElement to just one audio output device (media files with audio tracks in multiple languages exist) would be sad.

Also, the `new AudioContext({ sinkId: requestedSinkId })` doesn't go great with an HTMLMediaElement solution that's allowed to change the sink at any time, because using createMediaStreamDestination() you could get the AudioContext output into an HTMLMediaElement.

This really only leaves the setSinkId() proposal for AudioContext, so is this the API you expect to implement? modules/webaudio/OWNERS, do you think that API will work?


At this point I don't know what the best choice is for adding support for multiple devices to WebAudio, so we need to think more about it.
However, I don't see that as an impediment to ship the current HTMLMediaElement extensions, since their behavior should not change due to the WebAudio integration work.
The reason is that the audio stack is accessed independently in both cases.
As for the integration between them, I see two main cases:
1. WebAudio can output to an HTMLMediaElement. In this case the sink is determined by the HTMLMediaElement.
2. The HTMLMediaElement can output to WebAudio. In this case the sink is determined by WebAudio.
Both cases work today and I don't expect that to change after we allow WebAudio to output to different devices.
What do you think?

Yehonathan Sharvit

Nov 5, 2015, 6:47:08 AM
to Guido, blink-dev, k...@chromium.org, rt...@chromium.org, h...@google.com, sligh...@google.com, rby...@chromium.org, chri...@chromium.org, h...@chromium.org, Chris Wilson
Guido, what do you mean by "WebAudio can output to HTMLMediaElement"? It would be nice to share a piece of code or documentation for this case.



Philip Jägenstedt

Nov 5, 2015, 8:20:28 AM
to Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org, Chris Wilson
On Thu, Nov 5, 2015 at 12:26 PM, Guido <gui...@chromium.org> wrote:


How does Web Audio integrate with the lower layers of the audio stack where support for different audio output devices has been added?


One of the main integration points is the WebAudioDevice interface. In the Chromium implementation of this interface, an actual Chromium audio output device is created and an audio render callback function is supplied. The current implementation of this interface is hardcoded to use the default device (see here). 
Supporting multiple devices would require being able to supply the WebAudioDevice implementation with the ID of the sink to use, so that it can be passed to the Chromium AudioDeviceFactory.
WebAudioDevice is used by AudioDestination.

The HTMLMediaElement extensions integrate with Chromium using the WebMediaPlayer interface. WebMediaPlayer integrates with WebAudio using the WebAudioSourceProvider interface.  
WebMediaPlayer implementations use this to redirect audio to WebAudio. In this case, audio from the WebMediaPlayer goes to WebAudio instead of the Chromium audio device, so the media player's sink ID is effectively ignored. This is what happens when an HTMLMediaElement outputs to WebAudio.

So for Web Audio it looks like the dependency is:
blink::DefaultAudioDestinationHandler → blink::AudioDestination → blink::WebAudioDevice → content::RendererWebAudioDeviceImpl → media::AudioOutputDevice  → ??? → media::AudioOutputStream → platform implementations

What does this look like starting at HTMLMediaElement, what is the highest-level common point? Is that a point where it's possible to switch the audio output device, or will that have to be implemented separately for Web Audio?

Even if shipping separately is an option, I think we must at least have high confidence in what the API will look like for Web Audio, because certain combinations of HTMLMediaElement and AudioContext extensions are harder to justify than others.

In particular, having AudioContext.addSinkId()/removeSinkId() but restricting HTMLMediaElement to just one audio output device (media files with audio tracks in multiple languages exist) would be sad.

Also, the `new AudioContext({ sinkId: requestedSinkId })` doesn't go great with an HTMLMediaElement solution that's allowed to change the sink at any time, because using createMediaStreamDestination() you could get the AudioContext output into an HTMLMediaElement.

This really only leaves the setSinkId() proposal for AudioContext, so is this the API you expect to implement? modules/webaudio/OWNERS, do you think that API will work?


At this point I don't know what the best choice is for adding support for multiple devices to WebAudio, so we need to think more about it.
However, I don't see that as an impediment to ship the current HTMLMediaElement extensions, since their behavior should not change due to the WebAudio integration work.
The reason is that the audio stack is accessed independently in both cases.
As for the integration between them, I see two main cases:
1. WebAudio can output to an HTMLMediaElement. In this case the sink is determined by the HTMLMediaElement.
2. The HTMLMediaElement can output to WebAudio. In this case the sink is determined by WebAudio.
Both cases work today and I don't expect that to change after we allow WebAudio to output to different devices.
What do you think?

It seems to me that since you can pipe the audio from AudioContext to HTMLMediaElement and vice-versa, no differences in API capabilities would make sense, since you could circumvent them by piping the audio to the place with the more capable API. (This could of course mysteriously fail, but that would be a very unfortunate API.)

In other words, I think shipping HTMLMediaElement.setSinkId() excludes later shipping `new AudioContext({ sinkId: requestedSinkId })` which is less capable, so we should be clear that we're making that decision now and have that option removed from the table.

By the same reasoning, I also don't think it would make sense to ship AudioContext.addSinkId()/removeSinkId() later without adding the same capabilities to HTMLMediaElement, which you might do by moving setSinkId() to AudioTrack, which unfortunately isn't implemented yet.

So in summary, I'd like to make explicit what the underlying model is:
  1. Can the audio output device be changed while playback is ongoing? (yes)
  2. Can multiple audio devices be used to play two synchronized streams?

And then how that model is made to work for both HTMLMediaElement and Web Audio, or perhaps why divergent capabilities for these is justified, if that's the conclusion.

Philip

Harald Alvestrand

Nov 5, 2015, 8:38:31 AM
to Philip Jägenstedt, Guido, blink-dev, Kenneth Russell, Raymond Toy, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org, Chris Wilson
Note - when you say AudioTrack, what are you referring to?

This name doesn't seem to exist either in WebAudio or in Media Capture and Streams.

There exist MediaStreamTrack with kind = "audio", but these don't have any sinkID, and are not designed to have one. A MediaStreamTrack connects to an input device and is connected to an output object; it's the output object (such as a MediaElement or whatever a MediaStreamAudioSourceNode connects to) that would have a sinkID.

Adding a sinkID to a MediaStreamTrack has not been suggested in the WG.

(Note: http://www.w3.org/TR/webaudio/ refers to AudioMediaStreamTrack. That type existed briefly in the Media Capture and Streams spec, but is not present at the moment.)

Philip Jägenstedt

Nov 5, 2015, 9:06:02 AM
to Harald Alvestrand, Guido, blink-dev, Kenneth Russell, Raymond Toy, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org, Chris Wilson
I mean the AudioTrack that is in mediaElement.audioTracks, which is the API that you can use to enable and disable in-band audio tracks of regular media files. It's not implemented yet, but I presume that when you connect a MediaStream with multiple tracks to a media element, those should also appear in the audioTracks and videoTracks lists, and that it would in principle also work for MediaSource.

Guido

Nov 5, 2015, 12:06:18 PM
to blink-dev, gui...@chromium.org, k...@chromium.org, rt...@chromium.org, h...@google.com, sligh...@google.com, rby...@chromium.org, chri...@chromium.org, h...@chromium.org, cwi...@google.com

On Thursday, November 5, 2015 at 2:20:28 PM UTC+1, Philip Jägenstedt wrote:

So for Web Audio it looks like the dependency is:
blink::DefaultAudioDestinationHandler → blink::AudioDestination → blink::WebAudioDevice → content::RendererWebAudioDeviceImpl → media::AudioOutputDevice  → ??? → media::AudioOutputStream → platform implementations
 
What does this look like starting at HTMLMediaElement, what is the highest-level common point? Is that a point where it's possible to switch the audio output device, or will that have to be implemented separately for Web Audio?


For HTMLMediaElement we support two paths (two different blink::WebMediaPlayer implementations):

blink::HTMLMediaElement → blink::WebMediaPlayer = media::WebMediaPlayerImpl → media::WebAudioSourceProviderImpl → media::AudioRendererMixerInput → media::AudioRendererMixer → media::AudioOutputDevice → ...

In this case, media::AudioRendererMixerInput has the capability to change the sink (audio is rendered to a different mixer with a different media::AudioOutputDevice).

For media streams, the path is:

blink::HTMLMediaElement → blink::WebMediaPlayer = content::WebMediaPlayerMS → content::MediaStreamAudioRenderer → media::AudioOutputDevice → ...

In this case, content::MediaStreamAudioRenderer implementations (content::WebRtcAudioRenderer and content::WebRtcLocalAudioRenderer) have the ability to change the sink by creating a new AOD (media::AudioOutputDevice) and rendering output to it. In the future, mixers might be introduced before the media::AudioOutputDevice, similarly to WebMediaPlayerImpl.



Even if shipping separately is an option, I think we must at least have high confidence in what the API will look like for Web Audio, because certain combinations of HTMLMediaElement and AudioContext extensions are harder to justify than others.

In particular, having AudioContext.addSinkId()/removeSinkId() but restricting HTMLMediaElement to just one audio output device (media files with audio tracks in multiple languages exist) would be sad.


I can't find a link right now, but some people knowledgeable about WebAudio have said that the addSinkId()/removeSinkId() approach is not feasible because of synchronization issues. The sink clock is used to synchronize everything, so there can be only one sink. People with more knowledge about WebAudio might be able to elaborate.
 
 
It seems to me that since you can pipe the audio from AudioContext to HTMLMediaElement and vice-versa, no differences in API capabilities would make sense, since you could circumvent them by piping the audio to the place with the more capable API. (This could of course mysteriously fail, but that would be a very unfortunate API.)
 
In other words, I think shipping HTMLMediaElement.setSinkId() excludes later shipping `new AudioContext({ sinkId: requestedSinkId })` which is less capable, so we should be clear that we're making that decision now and have that option removed from the table.


I agree with you that both APIs should be similar, but even if there are differences, there is value in having independent mechanisms for WebAudio and HTMLMediaElement.
Having the same API capability does not mean that everything will work exactly the same. For example, you can pipe from WebAudio to a media element using a MediaStream, but MediaStreams are handled by the lower layers in a way that is optimized for low-latency real-time communication. Piping from WebAudio using a MediaStream might not be optimal for non-real-time WebAudio tasks, so using a less capable WebAudio API to set the output device might be better in many cases than redirecting. That said, I do think that the WebAudio extensions should be similar to the HTMLMediaElement extensions, but I prefer that to be decided in a WebAudio-specific discussion.

 
By the same reasoning, I also don't think it would make sense to ship AudioContext.addSinkId()/removeSinkId() later without adding the same capabilities to HTMLMediaElement, which you might do by moving setSinkId() to AudioTrack, which unfortunately isn't implemented yet.


I think addSinkId()/removeSinkId() is out of the question.


 
So in summary, I'd like to make explicit what the underlying model is:
  1. Can the audio output device be changed while playback is ongoing? (yes)
Yes, it can. 
  2. Can multiple audio devices be used to play two synchronized streams?
If this means something like addSinkId()/removeSinkId(), I am pretty certain that this will not be supported.


And then how that model is made to work for both HTMLMediaElement and Web Audio, or perhaps why divergent capabilities for these is justified, if that's the conclusion.

 
I see no technical impediment to supporting changing audio output devices for WebAudio in the same manner as it is supported for HTMLMediaElement.
It is not yet decided if the WebAudio API should be updated to support this functionality, or how it should be implemented in Chromium.

To summarize, there are three cases:
1. HTMLMediaElement without WebAudio. This is specified and implemented.
2. WebAudio without HTMLMediaElement. This is neither specified nor implemented and is independent of (1).
3. HTMLMediaElement and WebAudio interacting. This is specified and implemented.
 One possible interaction is to use the HTMLMediaElement as an input to a WebAudio context by calling createMediaElementSource(). In this case, the WebAudio spec says that "audio playback from the HTMLMediaElement will be re-routed into the processing graph of the AudioContext". I think it's clear that this necessarily implies ignoring any sink ID the media element may have. This is implemented in Chromium.
 The other interaction is that a MediaStream created with WebAudio can be assigned as the src of an HTMLMediaElement. In this case, that MediaStream will be played on the element's sink just like any other MediaStream. (Both interactions are sketched below.)
 All other possible interactions are combinations of these two.
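
(A sketch of the two interactions, with the behavior as described above; someDeviceId is a placeholder:)

```js
const ctx = new AudioContext();

// Interaction 1: HTMLMediaElement as input to WebAudio. Playback is re-routed
// into the processing graph, so the element's sink ID is effectively ignored;
// output goes to whatever device the AudioContext uses.
const video = document.querySelector('video');
ctx.createMediaElementSource(video).connect(ctx.destination);

// Interaction 2: WebAudio output assigned to an HTMLMediaElement. The stream
// plays on the element's sink like any other MediaStream.
const dest = ctx.createMediaStreamDestination();
const el = new Audio();
el.srcObject = dest.stream;
el.play();
el.setSinkId(someDeviceId); // the element's sink applies here
```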

I don't think the lack of a specification for (2) should block shipping (1) and (3), because (2) is completely independent from (1) and (3).

Philip Jägenstedt

Nov 7, 2015, 6:42:20 PM
to Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org, Chris Wilson
Sorry for the lack of a proper reply so far, I'm busy preparing for BlinkOn... If you're going to be there let's chat, otherwise I'll get back to this thread after BlinkOn is over.

Input from other API owners much appreciated :)



Hongchan Choi

Nov 10, 2015, 1:02:06 PM
to Philip Jägenstedt, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org, Chris Wilson
> "some people knowledgeable about WebAudio  have said that the addSinkId()/removeSinkId() approach is not feasible because of synchronization issues"

For one thing, this is quite correct. AudioContext/Destination are driven by the callback from the underlying audio device's clock. I am not really sure how we could have multiple clock sources, or how to coordinate them to achieve a unified clock.

FWIW, OS X supports this 'aggregated audio interface' feature at the CoreAudio layer, not in the application. If this needs to be done somehow, I think it should be in the Chrome media layer, not WebAudio.

For that reason, I believe options 1 and 2 are viable from WebAudio's perspective.

This might be off-topic but I will just throw it in here: how does this go along with input selection? Currently getUserMedia does not give you a choice of device. This might be a bit problematic, for example, when you feed the incoming audio from device A (via gUM) and pipe it out to device B (Audio Output Devices API) with WebAudio. What's the total latency here? Do we have any plan for an AudioInputDevices API? Why can't this be an AudioDeviceSelection API?

Pro-audio vendors (Roland, Yamaha and more) have consistently requested the API for the input or audio device selection from the beginning of Web Audio API.

Please correct me if I am wrong, but I think we have a bit of disparity between handling audio input and output.

Chris Wilson

Nov 10, 2015, 1:42:01 PM
to Hongchan Choi, Philip Jägenstedt, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org
I have a pretty strong preference to only support #1, until such time as we actually define the underlying audio stream bedrock APIs that Web Audio hypothetically sits on top of.  Web Audio implementations today support changing the sink id (by removing or changing the default audio device), but it's a hack that may have undefined side effects (e.g. the sampleRate may change, which may have higher-level implications in a given app, but there's no way to detect that the device changed - let alone what happened to the timeline).

Philip Jägenstedt

Nov 18, 2015, 8:46:41 AM
to Chris Wilson, Hongchan Choi, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org
Thanks, Chris. What to do with the sampleRate when changing sinks indeed sounds tricky, so when you argue for only `new AudioContext({ sinkId: requestedSinkId })` it's hard to argue otherwise. I suppose that in order to be able to change sinks at runtime with Web Audio, one would need an extra resampling step and possibly more buffering, which would add to the total delay. Or perhaps a mechanism to signal that the sampleRate has changed, but at that point it seems about as easy to just recreate the graph in a new AudioContext.

And still, switching audio output devices mid-stream is useful for media elements. I think that we should ship these capabilities ASAP, but there are outstanding issues:
We have missed the M48 branch, but it should be entirely possible to resolve this well in time before the next branch point. (Sorry for dragging this out, I blame BlinkOn.)

Philip Jägenstedt

Dec 9, 2015, 5:49:43 AM
to Chris Wilson, Hongchan Choi, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org
hta@ has poked the TAG review, so I just want to clarify here that this Intent will not block on that if there's no response. Sorting out the open spec issues together with the other stakeholders is of higher priority, I think.

Chris Wilson

Dec 9, 2015, 1:57:21 PM
to Philip Jägenstedt, Hongchan Choi, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org
Sorry, this hit during my Chrome Dev Summit attention blackout.  :)

I don't think limiting this to only the AudioContext constructor prevents "switching audio output devices mid-stream" for media elements - all that plumbing would be below the AudioContext layer. Or, to put it differently - the way I would model media elements, you could pass the audio buffers or streams of audio buffers to a different context created on a new output device, so this would still work fine. There would of course be a bit of a glitch in passing from one device to another, but that's not preventable, I don't think.
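
(A sketch of that model using the proposed, still unshipped, sinkId constructor option from this thread; buildGraph() and the device IDs are placeholders:)

```js
let ctx = new AudioContext({ sinkId: deviceA }); // proposed option, not shipped
buildGraph(ctx); // hypothetical app function that creates and connects the nodes

function switchDevice(deviceB) {
  ctx.close();                                 // a context stays bound to its device for life
  ctx = new AudioContext({ sinkId: deviceB }); // expect a brief glitch at the handover
  buildGraph(ctx);                             // nodes can't move between contexts, so rebuild
}
```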

Philip Jägenstedt

Dec 10, 2015, 8:15:50 AM
to Chris Wilson, Hongchan Choi, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org
Hmm, so the model is that an AudioContext is bound to the underlying audio output device (not yet web-exposed) for its lifetime, while a media element can automatically open a new audio output device and direct its output to that, possibly with a new resampling step if the sample rate is different. That makes sense.

Chris Wilson

Dec 10, 2015, 1:09:14 PM
to Philip Jägenstedt, Hongchan Choi, Guido, blink-dev, Kenneth Russell, Raymond Toy, Harald Alvestrand, Alex Russell, Rick Byers, Chris Harrelson, h...@chromium.org
Pretty much.  We have an oddity in Chrome's implementation right now where if the default audio output device is switched, the AudioContext *WILL* reset itself, but if I recall, it has a bunch of issues if the sample rate changes (i.e. it doesn't refresh state or rebuild backing data).  Would need to check.

jno...@hirevue.com

Dec 11, 2015, 5:01:52 PM
to blink-dev, phi...@opera.com, hong...@google.com, gui...@chromium.org, k...@chromium.org, rt...@chromium.org, h...@google.com, sligh...@google.com, rby...@chromium.org, chri...@chromium.org, h...@chromium.org
So I see that currently it's possible to enumerate output devices in Chrome 47 without a flag or any experimental web features enabled, but it isn't possible to actually set the sinkId. It seems that 48 doesn't allow setting the sinkId either; Canary does. Is there some timeline for this discrepancy to be resolved? It's very odd to have enumeration but no ability to set.
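
(For what it's worth, the two capabilities can be detected independently, which sidesteps version checks; a sketch:)

```js
const canEnumerate = !!(navigator.mediaDevices && navigator.mediaDevices.enumerateDevices);
const canSetSink = typeof HTMLMediaElement.prototype.setSinkId === 'function';
if (canEnumerate && !canSetSink) {
  // e.g. Chrome 47/48 stable without the flag: devices are listable but not selectable
}
```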

Harald Alvestrand

Dec 11, 2015, 5:14:40 PM
to jno...@hirevue.com, blink-dev, Philip Jägenstedt, Hongchan Choi, Guido, Kenneth Russell, Raymond Toy, Alex Russell, Rick Byers, Chris Harrelson, Harald Alvestrand
Setting the output device is available as an experimental feature (the experimental flag is called AudioOutputDevices).

We are working through some details (wording of the permission request dialogs) before we think the feature is ready for release. We still hope to make M49.

In the meantime, we're very happy to get feedback on the experimental implementation!

Guido Urdaneta

Dec 16, 2015, 10:26:59 AM
to Harald Alvestrand, jno...@hirevue.com, blink-dev, Philip Jägenstedt, Hongchan Choi, Guido, Kenneth Russell, Raymond Toy, Alex Russell, Rick Byers, Chris Harrelson, Harald Alvestrand
TAG review is done, the most important issues in the spec have been addressed,
and we have agreed on a plan for Web Audio support.
Is there any issue remaining that is preventing us from shipping this feature?

Philip Jägenstedt

Dec 16, 2015, 10:33:00 AM
to Guido Urdaneta, Harald Alvestrand, jno...@hirevue.com, blink-dev, Hongchan Choi, Guido, Kenneth Russell, Raymond Toy, Alex Russell, Rick Byers, Chris Harrelson, Harald Alvestrand
I have nothing further, LGTM1 to ship!


Chris Harrelson

Dec 16, 2015, 10:50:30 AM
to Philip Jägenstedt, Guido Urdaneta, Alex Russell, Guido, Harald Alvestrand, Harald Alvestrand, Hongchan Choi, Kenneth Russell, Raymond Toy, Rick Byers, blink-dev, jno...@hirevue.com
LGTM2

Guido

Dec 22, 2015, 10:42:10 AM
to blink-dev, phi...@opera.com, gui...@google.com, sligh...@google.com, gui...@chromium.org, h...@chromium.org, h...@google.com, hong...@google.com, k...@chromium.org, rt...@chromium.org, rby...@chromium.org, jno...@hirevue.com, chri...@chromium.org
This needs one more lgtm. Any takers?

Dimitri Glazkov

Dec 22, 2015, 11:12:48 AM
to Guido, blink-dev, Philip Jägenstedt, gui...@google.com, Alex Russell, h...@chromium.org, Harald Alvestrand, Hongchan Choi, Kenneth Russell, rt...@chromium.org, Rick Byers, jno...@hirevue.com, Chris Harrelson
LGTM3.

:DG<


Yehonathan Sharvit

Dec 22, 2015, 12:55:21 PM
to Dimitri Glazkov, Guido, blink-dev, Philip Jägenstedt, gui...@google.com, Alex Russell, h...@chromium.org, Harald Alvestrand, Hongchan Choi, Kenneth Russell, rt...@chromium.org, Rick Byers, jno...@hirevue.com, Chris Harrelson
LGTM4

asher...@audyx.com

Mar 30, 2016, 1:15:40 PM
to blink-dev
With much excitement we started last week to work with the new API in order to direct the audio from our app to devices other than the default device.

Unfortunately, we encountered a bug: https://bugs.chromium.org/p/chromium/issues/detail?id=595635. This seems to be an old bug in Chrome where, when connecting an AudioContext source node to a MediaStreamAudioDestinationNode and then streaming it through an HTML audio element, the audio comes out broken and noisy.
Since this is currently the only way to direct WebAudio output to a non-default output device, it seems as if the new API cannot be used for this.

Is there a way around this, or a planned fix for this issue? 


webrt...@gmail.com

Mar 31, 2016, 3:42:02 PM
to blink-dev, asher...@audyx.com
Asher, I believe we have seen this issue as well, at least related to WebRTC; the relevant bug with repro steps is here: