MediaController proposal

Andrea Marchesini

Oct 23, 2014, 1:09:26 PM
to dev-w...@lists.mozilla.org
Hi all,

This is the proposal for a new object called MediaController.
Ehsan and I have been working on this proposal for a while, and we would now like to share it with the webapi mailing list.
Any comments are welcome!


Use cases
=========

* I have a media player app, and I want to use the media keys on my keyboard on desktop to control it.

* I have a media player app and I want to use the headset controls on my headphones to control it.

* I have a media player app and I want to use the mobile soft keys (e.g. as done on iOS and Android lock screens) to control it.

Possible future use cases
=========================

* I have registered a media player app and I want it to start up when I press the play key. (We could possibly dispatch an event to the SW, etc., out of scope for now)

Prior art
=========

* http://smus.com/remote-controls-web-media/
* http://paulrouget.com/e/mediaevents/
* http://beardedspice.com/
* https://groups.google.com/forum/#!msg/mozilla.dev.webapi/TSDZCeWYiDU/kEXkuQG83ngJ

Platform specific implementation notes
======================================

1. Windows

It seems like Windows will dispatch a WM_APPCOMMAND message when a media key is pressed. The application has an opportunity to handle the message and return TRUE, or pass it along to DefWindowProc to allow the message to be delivered to another application.

See http://msdn.microsoft.com/en-us/library/windows/desktop/ms646275%28v=vs.85%29.aspx

2. Mac OS X

You need to write specific code to relinquish the event interceptor when your application goes to the background. http://overooped.com/post/2593597587/mediakeys explains the reason. This is how to do it: https://github.com/nevyn/SPMediaKeyTap. This library is used by apps such as VLC.

3. iOS

There is a way to capture the iPod music controls on the lockscreen: http://stackoverflow.com/questions/3196330/how-to-enable-ipod-controls-in-the-background-to-control-non-ipod-music-in-ios-4 But it seems like we need to know in advance whether we need to capture these keys, which maps well to what we are thinking of (navigator.requestMediaController).

The same API seems to give you access to the media keys on the headphones too. This is documented here: https://developer.apple.com/library/ios/documentation/EventHandling/Conceptual/EventHandlingiPhoneOS/Remote-ControlEvents/Remote-ControlEvents.html#//apple_ref/doc/uid/TP40009541-CH7-SW3 It seems like your app can claim to be the first responder to the event, and if the event is not handled by your app, iOS will send it to the next one, and so on.

4. Android

https://developer.android.com/training/managing-audio/volume-playback.html

The app has to register a BroadcastReceiver in its manifest that listens for the ACTION_MEDIA_BUTTON action broadcast. When the action is received, it contains EXTRA_KEY_EVENT information such as KeyEvent.KEYCODE_MEDIA_PLAY, KeyEvent.KEYCODE_MEDIA_PAUSE, and others.

The receiver has to be registered and unregistered.

5. Linux

Xorg defines dedicated keysyms for media buttons, such as XF86AudioMute, XF86AudioNext, XF86AudioPause, XF86AudioPlay, etc. They are delivered as normal key events.
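
As a rough illustration of how the platform-specific identifiers mentioned in these notes could be funneled into one small set of event types, here is a minimal sketch. The names PLATFORM_KEY_MAP and normalizeMediaKey are hypothetical, and the mapping choices are assumptions made for illustration only; nothing here is specified by the proposal.

```javascript
// Hypothetical lookup table (an assumption, not part of the proposal):
// native media key identifiers -> the event type names used in this thread.
const PLATFORM_KEY_MAP = {
  // Windows WM_APPCOMMAND commands
  APPCOMMAND_MEDIA_PLAY: "play",
  APPCOMMAND_MEDIA_PAUSE: "pause",
  APPCOMMAND_MEDIA_PLAY_PAUSE: "playpause",
  APPCOMMAND_MEDIA_NEXTTRACK: "next",
  APPCOMMAND_MEDIA_PREVIOUSTRACK: "previous",
  // Android KeyEvent key codes
  KEYCODE_MEDIA_PLAY: "play",
  KEYCODE_MEDIA_PAUSE: "pause",
  KEYCODE_MEDIA_PLAY_PAUSE: "playpause",
  KEYCODE_MEDIA_NEXT: "next",
  KEYCODE_MEDIA_PREVIOUS: "previous",
  // X keysyms
  XF86AudioPlay: "play",
  XF86AudioPause: "pause",
  XF86AudioNext: "next",
  XF86AudioPrev: "previous",
};

// Returns the normalized event type, or null for keys that are not
// media transport keys (for example XF86AudioMute).
function normalizeMediaKey(nativeName) {
  return Object.prototype.hasOwnProperty.call(PLATFORM_KEY_MAP, nativeName)
    ? PLATFORM_KEY_MAP[nativeName]
    : null;
}
```

With a table like this, the web-facing event would look the same regardless of which platform produced the key press.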

Rough sketch of a proposal
==========================

[NoInterfaceObject]
interface NavigatorMediaController {
  Promise<MediaController> requestMediaController();
};

Navigator implements NavigatorMediaController;

enum MediaKeyEventType {
  "play",
  "pause",
  "playpause",
  "next",
  "previous",
  // possibly other codes
};

dictionary MediaKeyEventInit : EventInit {
  MediaKeyEventType detail = "play";
};

[Constructor(DOMString type, optional MediaKeyEventInit init)]
interface MediaKeyEvent : Event {
  readonly attribute MediaKeyEventType detail;
};

interface MediaController : EventTarget {
  attribute EventHandler onmediakey;

  // Do we need a revoke method?
  // void revoke();

  // This attribute is used to see if the play/pause event has been received correctly.
  attribute boolean mediaActive;

  // An image to show when the device is locked.
  attribute (DOMString or URL or Blob or HTMLImageElement or HTMLCanvasElement or HTMLVideoElement) mediaImage;

  // Extra info to show.
  attribute DOMString mediaTitle;
  attribute DOMString mediaDuration;
  // other attributes.
};

Usage
=====

navigator.requestMediaController().then(function(controller) {
  // Tell the platform that this app is currently playing media.
  controller.mediaActive = true;
  controller.mediaTitle = "Maurice Ravel - Piano Concerto for the left hand";

  controller.onmediakey = function(e) {
    switch (e.detail) {
      case "play":
        // ...
        break;
      case "pause":
        // ...
        break;
    }
  };
}).catch(function() {
  alert("Access to media keys not granted");
});

Notes
=====

We intentionally do not use the DOM3 media KeyboardEvent keys, since on some OSes such as Android and iOS it would not be possible to map these events to a keydown/keyup event in a sensible way.

Technically we could get rid of MediaController, and dispatch the event on Window/Navigator/etc. ehsan has no strong preferences, and baku prefers to keep things this way. ehsan likes the fact that this model plays nicely with the promise returned from requestMediaController().

The asynchronous requestMediaController() function gives the UA a chance to show UI asking the user for confirmation, if the UA chooses to do so.

In the future we may be able to extend MediaController to allow the application to provide some information about the currently playing track, such as the title, the picture, etc. That will enable us to build soft media controls on the lockscreen, for example, similar to the way that iOS and Android do.

We should also add the ability for the webapp to use the MediaController to send information to the platform about song-name/album-name/artist-name/album-art/timeposition.

We also need the webapp to send information about whether it is currently playing, so that the platform knows if it should display a "play" or a "pause" button on a software keyboard or a widget. This can also be useful to enable the platform to display playing-status next to the song name etc. This might also enable us to dispatch the "play" or "pause" event rather than the "playpause" event.

Open questions
==============

The MediaKey* terminology is close to what is used in the EME spec. ehsan thinks this is fine because it is not likely for most apps to want to use EME, so there are no big chances for confusion.

Is it OK to diverge from the DOM KeyboardEvent codes for this?

The policy choice as to what to do when two web apps request access to this API is left to the UA, and the UA has the freedom to adopt their own policies. Is that OK? (ehsan thinks yes.)

As far as the ergonomics of using the API is concerned, should we return the same MediaController object from requestMediaController() no matter how many times the author calls it?

Given the possibility of future extensions to add more things to MediaController as discussed above, should we pick a better name for it?

Anne van Kesteren

Oct 23, 2014, 2:19:12 PM
to Andrea Marchesini, dev-w...@lists.mozilla.org
On Thu, Oct 23, 2014 at 7:08 PM, Andrea Marchesini
<amarc...@mozilla.com> wrote:
> Given the possibility of future extensions to add more things to MediaController as discussed above, should we pick a better name for it?

Yes, but mostly because your name
https://html.spec.whatwg.org/multipage/embedded-content.html#mediacontroller
already exists.


--
https://annevankesteren.nl/

Ehsan Akhgari

Oct 23, 2014, 2:22:58 PM
to Anne van Kesteren, Andrea Marchesini, dev-w...@lists.mozilla.org
Heh, right. Please suggest better names if you can. :)

Anne van Kesteren

Oct 23, 2014, 2:27:21 PM
to Ehsan Akhgari, dev-w...@lists.mozilla.org, Andrea Marchesini
On Thu, Oct 23, 2014 at 8:22 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
> Heh, right. Please suggest better names if you can. :)

The main thing I wonder about is whether this is low-level enough. Do
we want to keep running the application to play back audio? I don't
think that's e.g. what iOS does and I doubt that if we don't have a
more low-level approach we could compete on battery usage down the
line.


--
https://annevankesteren.nl/

Ehsan Akhgari

Oct 23, 2014, 2:58:04 PM
to Anne van Kesteren, dev-w...@lists.mozilla.org, Andrea Marchesini
I think given APIs such as MSE and Web Audio, that's unavoidable in the
general case. Although we may be able to avoid that for simple
playbacks of media elements.

Ehsan Akhgari

Oct 23, 2014, 3:00:15 PM
to Anne van Kesteren, dev-w...@lists.mozilla.org, Andrea Marchesini
And even that wouldn't be very simple if we want to support things such
as going back/forward to the previous/next tracks, and still be able to
relaunch the app in the foreground if the user attempts to bring it into
foreground too...

Shawn Huang

Oct 24, 2014, 10:20:39 AM
to Andrea Marchesini, dev-w...@lists.mozilla.org
Hi Andrea,
Can we add one more use case? It would be good to add a use case for Bluetooth AVRCP (the Audio/Video Remote Control Profile).
We already have an implementation for b2g, but in the long term I think the AVRCP profile on Firefox OS should also follow MediaController.

Shawn
----- Original Message -----
From: "Andrea Marchesini" <amarc...@mozilla.com>
To: dev-w...@lists.mozilla.org
Sent: Friday, October 24, 2014 1:08:31 AM
Subject: MediaController proposal
Given the possibility of future extensions to add more things to MediaController as discussed above, should we pick a better name for it?
_______________________________________________
dev-webapi mailing list
dev-w...@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-webapi

Ehsan Akhgari

Oct 24, 2014, 1:27:54 PM
to Shawn Huang, Andrea Marchesini, dev-w...@lists.mozilla.org
On 2014-10-24 10:19 AM, Shawn Huang wrote:
> Hi Andrea,
> Can we add one more use case? It would be good to add a use case for Bluetooth AVRCP (the Audio/Video Remote Control Profile).
> We already have an implementation for b2g, but in the long term I think the AVRCP profile on Firefox OS should also follow MediaController.

Can't we make the remote controller send the exact same events to the
app as do the media keys on desktop or the physical keys on a headset,
for example? Hiding the source of these commands behind this API is one
of the main reasons why it's useful.

Jonas Sicking

Nov 4, 2014, 9:32:48 PM
to Andrea Marchesini, dev-webapi
On Thu, Oct 23, 2014 at 10:08 AM, Andrea Marchesini
<amarc...@mozilla.com> wrote:
> Hi all,
>
> This is the proposal for a new object called MediaController.

Yay! I hadn't seen this thread until just now. But coincidentally I've
spent some time on this issue recently too, thanks to Kevin filing

https://bugzilla.mozilla.org/show_bug.cgi?id=1084464

> Use cases
> =========
>
> * I have a media player app, and I want to use the media keys on my keyboard on desktop to control it.
>
> * I have a media player app and I want to use the headset controls on my headphones to control it.
>
> * I have a media player app and I want to use the mobile soft keys (e.g. as done on iOS and Android lock screens) to control it.

I'd like to also add the following use cases:

* I have a media player app and I want to show information about the
currently playing song on the lock screen.

* I have a homescreen app and I want it to show information about the
currently playing song.

* I have a media player app and I want it to pause audio when there's
an incoming phone call, and resume the audio once the phone call is
over.

* I have a media player app and I want it to pause audio when the user
starts watching a video in YouTube or another media player app.

There's also a somewhat more controversial:

* I have a media player app and I want it to keep playing music when
the user switches away to another app, even on platforms where apps
are normally muted when the user switches away.

> Possible future use cases
> =========================
>
> * I have registered a media player app and I want it to start up when I press the play key. (We could possibly dispatch an event to the SW, etc., out of scope for now)

Oh, interesting! But I agree that we can leave this one for later.

> [NoInterfaceObject]
> interface NavigatorMediaController {
> Promise<MediaController> requestMediaController();
> };
>
> Navigator implements NavigatorMediaController;
> enum MediaKeyEventType {
> "play",
> "pause",
> "playpause",
> "next",
> "previous",
> // possibly other codes
> };

We need "stop" as well. This usually rewinds to the beginning of the
current song. I don't know how commonly used it is these days, but I
think some hardware keyboards still have it separate from the "pause"
button.

Also, would it be possible to fire "play" and "pause" events rather
than "playpause" events? Could we use the MediaController.mediaActive
property to determine which one to fire?
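
To make the idea concrete, here is a minimal sketch of how a UA might turn a toggle key into a specific event based on the app's declared state. The function name resolveMediaKey is hypothetical, and the rule itself is only an assumption about one possible behaviour, not something the proposal defines.

```javascript
// Hypothetical helper, not part of the proposal: given the raw key and the
// app's current mediaActive value, pick the event type to dispatch.
function resolveMediaKey(rawKey, mediaActive) {
  if (rawKey !== "playpause") {
    // "play", "pause", "next", "previous", ... pass through unchanged.
    return rawKey;
  }
  // A toggle key pauses active playback and resumes paused playback.
  return mediaActive ? "pause" : "play";
}
```

Under this rule an app would never need to handle "playpause" itself, as long as it keeps mediaActive up to date.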

> dictionary MediaKeyEventInit : EventInit {
> MediaKeyEventType detail = "play";
> };
>
> [Constructor(DOMString type, optional MediaKeyEventInit init)]
> interface MediaKeyEvent : Event {
> readonly attribute MediaKeyEventType detail;
> };
>
> interface MediaController : EventTarget {
> attribute EventHandler onmediakey;
> // Do we need a revoke method?
> // void revoke();
> // This attribute is used to see if the play/pause event has been received correctly.
> attribute boolean mediaActive;
>
> // An image to show when the device is locked.
> attribute (DOMString or URL or Blob or HTMLImageElement or HTMLCanvasElement or HTMLVideoElement) mediaImage;

I'd simplify this and only enable passing a DOMString (url) or Blob
here. URL objects automatically stringify to a useable url.

I don't see a reason why passing a HTMLImageElement or
HTMLVideoElement is needed given that they are always loaded from
URLs. And I don't think the use cases for dynamically generated images
are strong enough that we couldn't ask developers to turn the
HTMLCanvasElement into a Blob.

> // Extra info to show.
> attribute DOMString mediaTitle;
> attribute DOMString mediaDuration;
> // other attributes.

We also need the current position so that we can show that on a lockscreen.

Not sure if we should add an "attribute double currentMediaTime;" which
the platform would automatically increase as long as mediaActive is
set to true. Alternatively we could add two functions like:

indicatePlaying((DOMString or Blob) image, DOMString title, duration,
position); // All arguments would be optional
indicateNotPlaying();
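
As a sketch of the first option, the platform could derive the position to display from the last value the page reported plus the time elapsed since then. The function name estimatePosition and the shape of the state object are hypothetical, and treating the duration as a number of seconds is an assumption made for this example.

```javascript
// Hypothetical sketch: estimate the position to show on the lockscreen.
// state = { mediaActive, position, updatedAtMs, duration }
//   position and duration are in seconds; updatedAtMs is the timestamp
//   (in milliseconds) at which the page last reported `position`.
function estimatePosition(state, nowMs) {
  if (!state.mediaActive) {
    return state.position; // paused: the position does not advance
  }
  const elapsedSeconds = (nowMs - state.updatedAtMs) / 1000;
  const position = state.position + elapsedSeconds;
  // Clamp to the duration when one is known.
  return Number.isFinite(state.duration)
    ? Math.min(position, state.duration)
    : position;
}
```

The page would then only need to report a new position when it seeks or changes tracks, rather than on every tick.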

We might also want to allow the page to indicate that it's currently
buffering audio. This could be useful for streaming services so that
they can reflect in the lockscreen UI why audio isn't playing. Adding
something like "attribute boolean buffering" would cover that.


Regarding the "pause while on a phone call" use case, I think most
mobile platforms would want to forcefully pause any application audio
when there's an incoming phone call. FirefoxOS, Chrome for
Android and Safari for iOS all do this. I think Android automatically
mutes Fennec, but I'm not sure if Fennec in turn pauses any webpage
audio.

So I think we should forcefully pause any <audio>/<video>/WebAudio audio.

In addition we could fire some form of "interrupted"/"resumed" events
on the MediaController object.

This would also cover the "pause when playing in youtube" use case. We
could do essentially exactly what we're doing today using the
mozaudiochannel=content policy, but treat audio from any app which has
MediaController.mediaActive=true as channel=content.

> Open questions
> ==============
>
> The MediaKey* terminology is close to what is used in the EME spec. ehsan thinks this is fine because it is not likely for most apps to want to use EME, so there are no big chances for confusion.

I don't really have an opinion.

> Is it OK to diverge from the DOM KeyboardEvent codes for this?

*I* think so, but I'm sure others might disagree. I think we should
propose this spec to W3C and then we can see what other browser
vendors think.

> The policy choice as to what to do when two web apps request access to this API is left to the UA, and the UA has the freedom to adopt their own policies. Is that OK? (ehsan thinks yes.)

I think so yes.

> As far as the ergonomics of using the API is concerned, should we return the same MediaController object from requestMediaController() no matter how many times the author calls it?

I think that if we return different instances then those instances
should act independently. It might be easier to simply always return
the same instance. But I don't feel strongly.

/ Jonas

Jonas Sicking

Nov 4, 2014, 9:40:15 PM
to Anne van Kesteren, dev-w...@lists.mozilla.org, Ehsan Akhgari, Andrea Marchesini
On Thu, Oct 23, 2014 at 11:27 AM, Anne van Kesteren <ann...@annevk.nl> wrote:
> On Thu, Oct 23, 2014 at 8:22 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>> Heh, right. Please suggest better names if you can. :)
>
> The main thing I wonder about is whether this is low-level enough. Do
> we want to keep running the application to play back audio? I don't
> think that's e.g. what iOS does and I doubt that if we don't have a
> more low-level approach we could compete on battery usage down the
> line.

I like the idea of enabling pages to pass a HTMLMediaElement to the
API and then have the platform ensure that it keeps playing even if
the app is killed. This would then be coupled with some form of SW
event which is fired when a certain time position is reached.

I don't know the WebAudio spec well enough to know if this makes sense
to do for AudioContext objects in addition to HTMLMediaElements.

But I feel like this might be a more advanced use case and we should
get the basics in place first? Like Ehsan points out, some sites, like
YouTube, will want to use MSE and keep actively running in order to
continuously feed the platform with the highest quality data possible.

/ Jonas

Marco Chen

Nov 4, 2014, 10:32:21 PM
to Ehsan Akhgari, Shawn Huang, Andrea Marchesini, dev-w...@lists.mozilla.org
Hi,

A question: how should a media key be handled when more than one app has registered for it?

User scenario:
1. The user has launched the default music app and a second music app from the Marketplace.
a. The default app is playing (or paused) in the foreground and the second app is paused (or playing) in the background.
b. The default app and the second app are both paused in the background.

What will happen when the play media key is pressed in cases 1-a and 1-b?

It seems this is the same issue as with "using system messages to carry AVRCP commands into apps".
(A system message already hides the source of these commands.)

Thanks,
Sincerely yours.
----- Original Message -----

Robert O'Callahan

Nov 5, 2014, 12:45:31 AM
to Anne van Kesteren, dev-w...@lists.mozilla.org, Ehsan Akhgari, Andrea Marchesini
On Fri, Oct 24, 2014 at 7:27 AM, Anne van Kesteren <ann...@annevk.nl> wrote:

> On Thu, Oct 23, 2014 at 8:22 PM, Ehsan Akhgari <ehsan....@gmail.com>
> wrote:
> > Heh, right. Please suggest better names if you can. :)
>
> The main thing I wonder about is whether this is low-level enough. Do
> we want to keep running the application to play back audio? I don't
> think that's e.g. what iOS does and I doubt that if we don't have a
> more low-level approach we could compete on battery usage down the
> line.
>

I'm not sure what you mean exactly, but we do have an optimization where
media element playback can use custom hardware so that the application does
not wake up very often while music is playing.

It seems to me that waking up an application just to handle "volume
up/volume down" events is unlikely to significantly impact battery life.

Rob
--
oIo otoeololo oyooouo otohoaoto oaonoyooonoeo owohooo oioso oaonogoroyo
owoiotoho oao oboroootohoeoro oooro osoiosotoeoro owoiololo oboeo
osouobojoeocoto otooo ojouodogomoeonoto.o oAogoaoiono,o oaonoyooonoeo
owohooo
osoaoyoso otooo oao oboroootohoeoro oooro osoiosotoeoro,o o‘oRoaocoao,o’o
oioso
oaonosowoeoroaoboloeo otooo otohoeo ocooouoroto.o oAonodo oaonoyooonoeo
owohooo
osoaoyoso,o o‘oYooouo ofooooolo!o’o owoiololo oboeo oiono odoaonogoeoro
ooofo
otohoeo ofoioroeo ooofo ohoeololo.

Jonas Sicking

Nov 5, 2014, 1:16:10 AM
to Marco Chen, Shawn Huang, Ehsan Akhgari, Andrea Marchesini, dev-webapi
On Tue, Nov 4, 2014 at 7:32 PM, Marco Chen <mc...@mozilla.com> wrote:
> Hi,
>
> A question: how should a media key be handled when more than one app has registered for it?
>
> User scenario:
> 1. The user has launched the default music app and a second music app from the Marketplace.
> a. The default app is playing (or paused) in the foreground and the second app is paused (or playing) in the background.
> b. The default app and the second app are both paused in the background.
>
> What will happen when the play media key is pressed in cases 1-a and 1-b?

Media keys will be sent to whichever app currently has "media focus".
That means that if any app is currently playing, then that's the app
that will receive the keys, if no app is playing we'd use some policy
to determine which one to send the keys to. Most likely we should send
the keys to the app that most recently played music.
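
A minimal sketch of such a policy, assuming the UA keeps a small record per registered app, could look like the following. The function name pickMediaFocus and the record fields (id, playing, lastPlayedAt) are hypothetical; breaking ties between several playing apps by recency is an extra assumption.

```javascript
// Hypothetical policy sketch: decide which app receives media keys.
// apps = [{ id, playing, lastPlayedAt }], lastPlayedAt is a timestamp.
function pickMediaFocus(apps) {
  // Prefer apps that are playing right now; otherwise consider all of them.
  const playing = apps.filter((app) => app.playing);
  const candidates = playing.length > 0 ? playing : apps;

  // Among the candidates, pick the one that played most recently.
  let best = null;
  for (const app of candidates) {
    if (best === null || app.lastPlayedAt > best.lastPlayedAt) {
      best = app;
    }
  }
  return best ? best.id : null;
}
```

In scenario 1-a this returns whichever app is playing; in scenario 1-b it returns the app that played last.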

/ Jonas

Tim Chien

Dec 15, 2014, 4:21:46 AM
to Jonas Sicking, dev-webapi, Marco Chen, Ehsan Akhgari, Andrea Marchesini, Shawn Huang
Any update on this? What's the bug # we should follow up on?

IMHO this is *the* example for FxOS v3 (and Firefox Desktop) where we
could say the OS did advance the Open Web, instead of working around
everything with mozSystemMessage.
--
Tim Guan-tin Chien, Engineering Manager and Front-end Lead, Firefox
OS, Mozilla Corp. (Taiwan)

Jonas Sicking

Dec 15, 2014, 10:03:53 PM
to Tim Chien, dev-webapi, Marco Chen, Ehsan Akhgari, Andrea Marchesini, Shawn Huang
I agree. I'm also super interested to see this move forward.

/ Jonas

Andrea Marchesini

Dec 16, 2014, 6:18:56 AM
to Tim Chien, dev-webapi, Marco Chen, Ehsan Akhgari, Jonas Sicking, Shawn Huang
Bug 111203.
Ehsan and I will work on this soon, probably starting in January.

Tim Guan-tin Chien

Dec 16, 2014, 7:06:45 AM
to Andrea Marchesini, Shawn Huang, Ehsan Akhgari, dev-webapi, Tim Chien, Marco Chen, Jonas Sicking
You mean https://bugzilla.mozilla.org/show_bug.cgi?id=1112032

Thanks for opening the bug. I can help with this for B2G, and especially
the Gaia UI.

On Tue, Dec 16, 2014 at 7:18 PM, Andrea Marchesini <amarc...@mozilla.com>
wrote:
>

Ehsan Akhgari

Dec 17, 2014, 12:37:07 PM
to Jonas Sicking, Anne van Kesteren, dev-w...@lists.mozilla.org, Andrea Marchesini
On 2014-11-04 9:39 PM, Jonas Sicking wrote:
> On Thu, Oct 23, 2014 at 11:27 AM, Anne van Kesteren <ann...@annevk.nl> wrote:
>> On Thu, Oct 23, 2014 at 8:22 PM, Ehsan Akhgari <ehsan....@gmail.com> wrote:
>>> Heh, right. Please suggest better names if you can. :)
>>
>> The main thing I wonder about is whether this is low-level enough. Do
>> we want to keep running the application to play back audio? I don't
>> think that's e.g. what iOS does and I doubt that if we don't have a
>> more low-level approach we could compete on battery usage down the
>> line.
>
> I like the idea of enabling pages to pass a HTMLMediaElement to the
> API and then have the platform ensure that it keeps playing even if
> the app is killed. This would then be coupled with some form of SW
> event which is fired when a certain time position is reached.
>
> I don't know the WebAudio spec well enough to know if this makes sense
> to do for AudioContext objects in addition to HTMLMediaElements.
>
> But I feel like this might be a more advanced use case and we should
> get the basics in place first? Like Ehsan points out, some sites, like
> YouTube, will want to use MSE and keep actively running in order to
> continuously feed the platform with the highest quality data possible.

The same issue with MSE exists with Web Audio as well, since the audio
playback can be manipulated by script while playback is occurring.
