Intent to ship: media-capabilities

Jean-Yves Avenard

May 14, 2018, 11:20:13 AM

to dev-platform

Media Capabilities allows web sites to better determine what content to serve to the end user.
Currently a media element offers the canPlayType method (https://html.spec.whatwg.org/multipage/media.html#dom-navigator-canplaytype-dev) to determine if a container/codec can be used, but the answer is limited to a maybe/probably type of answer.

It gives no ability to determine whether a particular resolution can be played smoothly enough, or in a power-efficient manner (e.g. whether it will be hardware accelerated).

This has been a particular problem with sites such as YouTube, which serve VP9 under all circumstances even if the user agent won't play it well (VP9 is mostly decoded in software and is CPU intensive). This has forced us to indiscriminately disable VP9 altogether.
If YouTube knew that VP9 could be used for low resolutions but not for high-definition ones, it could select the right codec from the start.
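
As a rough sketch of the kind of selection this would enable, a site could query each candidate rendition and take the first one that is expected to play well. The names VideoQuery, CapabilityInfo, pickRendition and chooseRendition are hypothetical; only decodingInfo and the three booleans (supported, smooth, powerEfficient) come from the proposed spec, and the query function is passed in so the sketch does not depend on a browser.

```typescript
// Video part of a query, loosely mirroring the spec's video configuration.
interface VideoQuery {
  contentType: string; // e.g. 'video/webm; codecs="vp9"'
  width: number;
  height: number;
  bitrate: number; // bits per second
  framerate: number; // frames per second
}

// The three booleans the spec returns for each query.
interface CapabilityInfo {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// Shape of navigator.mediaCapabilities.decodingInfo, narrowed to video.
type DecodingInfoFn = (config: {
  type: "file" | "media-source";
  video: VideoQuery;
}) => Promise<CapabilityInfo>;

// Hypothetical helper: candidates are listed in the site's order of
// preference. Return the first one that is smooth and power efficient,
// else the first one that is at least smooth, else undefined.
function pickRendition(
  candidates: VideoQuery[],
  infos: CapabilityInfo[],
): VideoQuery | undefined {
  let fallback: VideoQuery | undefined;
  for (let i = 0; i < candidates.length; i++) {
    const info = infos[i];
    if (!info || !info.supported || !info.smooth) continue;
    if (info.powerEfficient) return candidates[i];
    if (fallback === undefined) fallback = candidates[i];
  }
  return fallback;
}

// Promise.all keeps the input order, so infos[i] answers candidates[i].
async function chooseRendition(
  decodingInfo: DecodingInfoFn,
  candidates: VideoQuery[],
): Promise<VideoQuery | undefined> {
  const infos = await Promise.all(
    candidates.map((video) => decodingInfo({ type: "media-source", video })),
  );
  return pickRendition(candidates, infos);
}
```

In a browser that implements the API, the first argument would presumably be something like (c) => navigator.mediaCapabilities.decodingInfo(c).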

This issue is tracked in bugzilla 1409664 (https://bugzilla.mozilla.org/show_bug.cgi?id=1409664)

The proposed spec is available at https://wicg.github.io/media-capabilities/

Chrome shipped it a while ago, and in talks with several partners (including YouTube, Netflix, Facebook etc.), Media Capabilities support has been the number one request.

We intend to implement and ship this API very soon.

Early comments and feedback are welcome.

Kind regards
Jean-Yves

James Graham

May 14, 2018, 11:38:45 AM

to dev-pl...@lists.mozilla.org

What is the testing situation for this feature? Do we have
web-platform-tests?

Tom Ritter

May 14, 2018, 12:48:20 PM

to Jean-Yves Avenard, dev-platform

It seems like this will reveal a lot of information about the user's
hardware. Does the Resist Fingerprinting preference disable the API or
report standardized results? If not, can we get that bug on file (and
if it's easy, point out exactly where we would want to add the 'if()
return false'?)

-tom


Boris Zbarsky

May 14, 2018, 12:53:40 PM

to

On 5/14/18 11:19 AM, Jean-Yves Avenard wrote:
> The proposed spec is available at https://wicg.github.io/media-capabilities/

I have some questions about this spec and our implementation:

1) What are the fingerprinting implications? What effect, if any, do
our "resist fingerprinting" preferences have on our API implementation
here? The spec tries to address this but, as usual, mostly handwaves
around it.

2) It looks to me that given a MediaCapabilitiesInfo there is no way to
figure out what configuration it represents. Should there be such a
way? It seems like it would make it simpler to deal with asking for the
capabilities for several configurations at once and then examining the
results if you don't have to keep track of which returned promise
corresponds to which passed-in configuration. Doubly so if you
Promise.race things (though why one would do that in this case is not so
clear to me).

Note that even the example in section 5.1 of the spec gets this wrong:
it uses result.contentType, but "result" is a MediaCapabilitiesInfo and
doesn't have a .contentType property.

3) The booleans in MediaCapabilitiesInfo (apart from "supported") seem
rather vaguely defined. As a concrete example, if I am on a 4-core (+
hyperthreading) "desktop"-level system with nothing running except the
video, "smooth" should clearly be set to true. Should it still be set
to true on the same hardware but in a situation where I am heavily
swapping and my load average is 150? This is a bit of a caricature, but
it seems to me that if people are going to treat this as a _reliable_
signal then it needs to be more clearly spelled out what things it does
or does not take into account.

4) For the "change" event on Screen, does that apply to any property
defined in any specification, not just the properties defined in this
specification? That would be a pretty significant monkeypatch in its
own right. It would be better if whatever specifications define
properties also define what events fire if those properties change.
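
To illustrate point 2, one way a page could work around the missing back-reference today is to carry the configuration along with each promise. This is only a sketch; Tagged and tagWithConfig are hypothetical names, and the query function stands in for the spec's decodingInfo.

```typescript
// An answer bundled with the configuration that produced it.
interface Tagged<C, I> {
  config: C;
  info: I;
}

// Resolve to both the configuration and its answer, so the pairing
// survives Promise.race or any out-of-order handling of the results.
function tagWithConfig<C, I>(
  config: C,
  query: (config: C) => Promise<I>,
): Promise<Tagged<C, I>> {
  return query(config).then((info) => ({ config, info }));
}
```

With this wrapper, whichever promise settles first still says which configuration it answers, which is the bookkeeping a built-in property on the result would make unnecessary.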

-Boris

Jean-Yves Avenard

May 14, 2018, 2:58:03 PM

to Tom Ritter, dev-platform

Hi

> On 14 May 2018, at 6:47 pm, Tom Ritter <t...@mozilla.com> wrote:
>
> It seems like this will reveal a lot of information about the user's
> hardware. Does the Resist Fingerprinting preference disable the API or
> report standardized results? If not, can we get that bug on file (and
> if it's easy, point out exactly where we would want to add the 'if()
> return false'?)
>
> -tom

This concern has been raised previously. The same information can ultimately be obtained with existing APIs, but typically only after the fact, and by then it’s already too late to give the user a decent media playback experience.

Existing canPlayType can tell you if we support a particular codec or not.
During playback, we already expose various metrics (starting with bug https://bugzilla.mozilla.org/show_bug.cgi?id=580531, which later became an official spec) to determine whether the content plays well: number of frames dropped, number of frames decoded, how many were painted, etc.

As such, MediaCapabilities doesn’t expose much more than what someone can already gather over time with what already exists.

There are various ways we can build the Media Capabilities answer: collecting past metrics and building up a dictionary, or making assumptions based on the decoders (e.g. we know a hardware H.264 decoder will always be smooth and power efficient).
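
A minimal sketch of the first approach (a dictionary built from past metrics) might look like the following. It is not the actual implementation: the PlaybackHistory class, the key format and the 10% drop threshold are all assumptions made for illustration.

```typescript
// Frame counters accumulated for one class of content.
interface FrameStats {
  total: number;
  dropped: number;
}

// Hypothetical history keyed by codec and frame height.
class PlaybackHistory {
  private stats: Record<string, FrameStats> = {};
  private maxDropRatio: number;

  // Assumed threshold: more than 10% dropped frames is "not smooth".
  constructor(maxDropRatio: number = 0.1) {
    this.maxDropRatio = maxDropRatio;
  }

  private key(codec: string, height: number): string {
    return `${codec}@${height}`;
  }

  // Add the counters observed during one finished playback session.
  record(codec: string, height: number, total: number, dropped: number): void {
    const k = this.key(codec, height);
    const s = this.stats[k] ?? { total: 0, dropped: 0 };
    s.total += total;
    s.dropped += dropped;
    this.stats[k] = s;
  }

  // true or false when there is history, undefined when there is none.
  smooth(codec: string, height: number): boolean | undefined {
    const s = this.stats[this.key(codec, height)];
    if (!s || s.total === 0) return undefined;
    return s.dropped / s.total <= this.maxDropRatio;
  }
}
```

The undefined case is where the second approach (assumptions based on the decoder) would have to take over, since there is nothing recorded to answer from.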

To get around fingerprinting, at the user’s choice, the obvious workaround would be to report that everything is always supported and will always play smoothly with great battery savings. This is something we already do for the existing apps. The user will end up with a poor video experience, however, as they will typically be served content that is not adapted to their machine’s capabilities.

Providing a way to ensure the user will get a good video experience is paramount IMHO. Watching video on their web browser is what people do the most…

JY

Jean-Yves Avenard

May 14, 2018, 3:18:28 PM

to Boris Zbarsky, dev-pl...@lists.mozilla.org

Hi

> On 14 May 2018, at 6:53 pm, Boris Zbarsky <bzba...@mit.edu> wrote:
>
> On 5/14/18 11:19 AM, Jean-Yves Avenard wrote:
>> The proposed spec is available at https://wicg.github.io/media-capabilities/ <https://wicg.github.io/media-capabilities/>
>
> I have some questions about this spec and our implementation:
>

We’re at an early stage of the implementation and, anticipating that some would have concerns, I wanted to present it early on.

> 1) What are the fingerprinting implications? What effect, if any, do our "resist fingerprinting" preferences have on our API implementation here? The spec tries to address this but, as usual, mostly handwaves around it.

The most obvious choice considered was to provide information identical to what the existing canPlayType provides, that is, no extra details.
So if canPlayType reports that "video/webm; codecs=vp9" is supported, then so will MediaCapabilities, but the answer will not vary with the resolution or the bitrate specified.
It is currently possible with canPlayType to query a much deeper level of information, in particular bitrate, colorspace, HDR support, codec level, etc. We haven’t fully implemented those because canPlayType is a synchronous API, and doing so properly with our asynchronous backend is hard.
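
As a sketch of that option, the answer under "resist fingerprinting" could be derived from the canPlayType result alone. The function name uniformAnswer and the choice to mirror "supported" into the other two booleans are assumptions for illustration, not a description of what will ship.

```typescript
// canPlayType returns one of these three strings.
type CanPlayTypeResult = "" | "maybe" | "probably";

// The three booleans the spec returns for each query.
interface CapabilityInfo {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// Hypothetical fingerprinting-resistant answer: it depends only on what
// canPlayType already reveals for the content type. Resolution, bitrate
// and frame rate are deliberately not inputs.
function uniformAnswer(canPlay: CanPlayTypeResult): CapabilityInfo {
  const supported = canPlay !== "";
  return { supported, smooth: supported, powerEfficient: supported };
}
```

Because the function takes no resolution or bitrate, two machines that agree on canPlayType cannot be told apart through this API.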

>
> 2) It looks to me that given a MediaCapabilitiesInfo there is no way to figure out what configuration it represents. Should there be such a way? It seems like it would make it simpler to deal with asking for the capabilities for several configurations at once and then examining the results if you don't have to keep track of which returned promise corresponds to which passed-in configuration. Doubly so if you Promise.race things (though why one would do that in this case is not so clear to me).
>
> Note that even the example in section 5.1 of the spec gets this wrong: it uses result.contentType, but "result" is a MediaCapabilitiesInfo and doesn't have a .contentType property.

I would invite you to submit any such bugs and concerns on the WICG site:
https://github.com/wicg/media-capabilities/issues

Or I can do so if you prefer.

> 3) The booleans in MediaCapabilitiesInfo (apart from "supported") seem rather vaguely defined. As a concrete example, if I am on 4-core (+ hyperthreading) "desktop"-level system with nothing running except the video, "smooth" should clearly be set to true. Should it still be set to true on the same hardware but in a situation where I am heavily swapping and my load average is 150? This is a bit of a caricature, but it seems to me that if people are going to treat this as a _reliable_ signal then it needs to be more clearly spelled out what things it does or does not take into account.

This is an issue I’ve raised frequently: there’s no way to determine whether the capabilities change over time. Receiving a notification when such a temporary workload occurs would be of benefit.
The spec isn’t set in stone, and I’m hoping that a new event could be dispatched on the media element to indicate that the capabilities have changed.

Having said that, with hardware decoders, typically whatever you may be doing has no impact on performance: it’s a dedicated circuit (even if for some there’s a limit on how many decoders can be used at the same time).

>
> 4) For the "change" event on Screen, does that apply to any property defined in any specification, not just the properties defined in this specification? That would be a pretty significant monkeypatch in its own right. It would be better if whatever specifications define properties also define what events fire if those properties change.
>

I’m not sure I understand your question. onChange and the change event are only defined for the Screen interface (https://drafts.csswg.org/cssom-view/#the-screen-interface).
Or are you suggesting that, as the MediaCapabilities Screen extension is only about gamut and luminance, each should get its own event so that future extensions to the Screen interface do not conflict?

JY

Tom Ritter

May 14, 2018, 4:14:17 PM

to Jean-Yves Avenard, dev-platform

On Mon, May 14, 2018 at 1:57 PM, Jean-Yves Avenard
<jyav...@mozilla.com> wrote:
> Hi
>

>> On 14 May 2018, at 6:47 pm, Tom Ritter <t...@mozilla.com> wrote:
>>
>> It seems like this will reveal a lot of information about the user's
>> hardware. Does the Resist Fingerprinting preference disable the API or
>> report standardized results? If not, can we get that bug on file (and
>> if it's easy, point out exactly where we would want to add the 'if()
>> return false'?)
>>
>> -tom
>
> This is a concern that has been raised previously, and one that you can ultimately get with existing APIs, but those are typically after the fact, and by then it’s already too late to allow the user to have a decent media playback experience
>
> Existing canPlayType can tell you if we support a particular codec or not.

Okay, it sounds like canPlayType needs to respect Resist
Fingerprinting (RFP) as well then. I've filed
https://bugzilla.mozilla.org/show_bug.cgi?id=1461454 for this.

> During playback, we already expose various metrics (starting from bug https://bugzilla.mozilla.org/show_bug.cgi?id=580531) this became an official spec, to determine if the content plays well : number of frames dropped, number of frames decoded, how many were painted etc...

This is the Media Statistics API? We return constant values for this in RFP.

> As such MediaCapabilities doesn’t expose much more than what someone can already gather over time with what’s already existing.
>
> There are various ways we can build the Media Capabilities answer: collecting past metrics and build up a dictionary, or make assumptions based on the decoders (e.g. we know a hardware h264 decoder will always be smooth and power efficient).
>
> To get around fingerprinting, at the user’s choice, the obvious work around would be to report that everything is always supported, will always do so smoothly with great battery savings. This is something we already do for the existing apps. The user will end up with a poor video experience however. As it will typically be served content not always adapted to his machine capabilities.

Right, RFP results in a degraded web experience as it is currently
implemented in several areas - our focus so far has been to create as
consistent and tight a line as we can, and then investigate how we can
improve the experience.

-tom

Steven Englehardt

May 14, 2018, 5:43:39 PM

to Jean-Yves Avenard, dev-platform, Tom Ritter

> > On 14 May 2018, at 6:47 pm, Tom Ritter <t...@mozilla.com> wrote:
> >
> > It seems like this will reveal a lot of information about the user's
> > hardware. Does the Resist Fingerprinting preference disable the API or
> > report standardized results? If not, can we get that bug on file (and
> > if it's easy, point out exactly where we would want to add the 'if()
> > return false'?)
> >
> > -tom
>
> This is a concern that has been raised previously, and one that you can
> ultimately get with existing APIs, but those are typically after the fact,
> and by then it’s already too late to allow the user to have a decent media
> playback experience
>
> Existing canPlayType can tell you if we support a particular codec or not.

> During playback, we already expose various metrics (starting from bug
> https://bugzilla.mozilla.org/show_bug.cgi?id=580531) this became an
> official spec, to determine if the content plays well : number of frames
> dropped, number of frames decoded, how many were painted etc...
>

> As such MediaCapabilities doesn’t expose much more than what someone can
> already gather over time with what’s already existing.
>

Not a domain expert, so I'd like to make sure I understand the difference
between what was possible with `canPlayType` + Media Statistics and what is
possible with Media Capabilities. Please correct me if I have it wrong!
Previously, a script wanting to fingerprint the user would have had to (1)
probe content type/codec support with `canPlayType`, (2) play videos using
each of these configurations, (3) measure and compute metrics for each
configuration using the media statistics API. With Media Capabilities, a
script still needs to probe each content type/codec pair (i.e., they can't
retrieve a list of all supported codecs?), and will receive a
classification of whether the device supports it, is expected to play the
video smoothly, and is expected to play the video in a power efficient way.

If my understanding is correct, Media Capabilities does expose a
considerably larger fingerprinting surface in practice. While it may have been
theoretically possible for all trackers to gather statistics on video
playback for each configuration, the only scripts that could practically
carry out those attacks without degrading user experience would have been
video providers. This will be especially true if browsers start blocking
autoplay by default (https://bugzilla.mozilla.org/show_bug.cgi?id=1376321),
since users will never interact with media elements from fingerprinting
scripts. With the Media Capabilities API, it seems that a script like
fingerprintjs2 (https://github.com/Valve/fingerprintjs2) could run through
a big list of types/codecs and retrieve device information regarding
smoothness and energy efficiency with relatively little overhead?
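
To make the size of that surface concrete, here is a sketch of how the answers for a fixed list of probes could be folded into a single value. The function answersToBits is hypothetical; the point is only that each probe contributes up to three booleans and that no playback is needed to collect them.

```typescript
// The three booleans the spec returns for each query.
interface CapabilityInfo {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// Encode each answer as three characters ("1" or "0"). Concatenated over
// a fixed probe list, the string is the kind of value a script could hash.
function answersToBits(infos: CapabilityInfo[]): string {
  return infos
    .map((i) =>
      [i.supported, i.smooth, i.powerEfficient]
        .map((b) => (b ? "1" : "0"))
        .join(""),
    )
    .join("");
}
```

A uniform answer that ignores resolution and bitrate collapses many of these probes to the same bits, which is why the RFP behaviour matters here.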

If autoplay is eventually blocked by default could we gate the response of
this API on user interaction with the media element?

> To get around fingerprinting, at the user’s choice, the obvious work
> around would be to report that everything is always supported, will always
> do so smoothly with great battery savings. This is something we already do
> for the existing apps. The user will end up with a poor video experience
> however. As it will typically be served content not always adapted to his
> machine capabilities.
>

It would be great to have the "fingerprint-proof" mode described in more
detail in the spec. That will go a long way to making sure the protection
implemented in RFP / Tor Browser doesn't break sites and provides the best
possible performance given the restrictions.

> Providing a way to ensure the user will get a good video experience is
> paramount IMHO. Watching video on their web browser is what people do the
> most…
>
> JY

Boris Zbarsky

May 14, 2018, 6:26:00 PM

to

On 5/14/18 3:18 PM, Jean-Yves Avenard wrote:
> The most obvious choice considered was to provide identical information to what the existing canPlayType information provide: that is not providing extra details.

OK. All I'm saying is that this needs to be sorted out before we ship.

> I would invite you to submit such bug and concern you have on the wicg site:
> https://github.com/wicg/media-capabilities/issues

I can do that, sure. Figured I'd check whether these issues had been
considered yet first. Filed
https://github.com/WICG/media-capabilities/issues/82

> Having said that, with hardware decoders, typically whatever you may be doing has no impact on performance: it’s a dedicated circuit (even if for some there’s a limit on how many decoders can be used at the same time).

Sure; the question is what happens when the decoders are not hardware.

>> 4) For the "change" event on Screen, does that apply to any property defined in any specification, not just the properties defined in this specification? That would be a pretty significant monkeypatch in its own right. It would be better if whatever specifications define properties also define what events fire if those properties change.
>>
>
> I’m not sure I understand your question. onChange and the change event is only defined for the Screen interface (https://drafts.csswg.org/cssom-view/#the-screen-interface).

Yes... For which properties of that interface? For example, should it
fire for changes to availWidth? Changes to pixelDepth?

> Or are you suggesting that, as the MediaCapabilities Screen extension is only about gamut and luminance, each should get its own event so that future extensions to the Screen interface do not conflict?

I don't have a strong opinion on that, though having a single "change"
event mean "yeah, one of these N things changed" is not that great; then
you have to re-poll all those N things to figure out which one changed.
But my real objection is that the MediaCapabilities spec is currently
saying that changes to _any_ property on Screen, defined by _any_
specification, necessitate a change event to be fired. That is a pretty
severe constraint on what other specifications can expose on Screen, no?
In particular exposing anything that the OS doesn't notify on changes
for and is somewhat expensive (whatever that means) to query the OS for
would suddenly become a non-starter.
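
To illustrate the re-polling cost of a single catch-all "change" event, a listener would need something like the following on every notification. ScreenSnapshot and changedProperties are hypothetical names; the snapshot would be filled from whatever Screen properties the page cares about.

```typescript
// A copy of the Screen properties a page is interested in.
type ScreenSnapshot = Record<string, number | string>;

// Return the names of the properties whose values differ between two
// snapshots. Every property has to be re-read to build "after".
function changedProperties(
  before: ScreenSnapshot,
  after: ScreenSnapshot,
): string[] {
  const names = Object.keys(before).concat(Object.keys(after));
  return names
    .filter((name, i) => names.indexOf(name) === i) // drop duplicates
    .filter((name) => before[name] !== after[name])
    .sort();
}
```

The event itself carries no hint about which property changed, so the cost of the diff grows with every property that any specification adds to Screen.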

-Boris

Randell Jesup

May 15, 2018, 1:41:51 PM

to

>If my understanding is correct, Media Capabilities does expose quite a
>larger fingerprinting surface in practice. While it may have been
>theoretically possible for all trackers to gather statistics on video
>playback for each configuration, the only scripts that could practically
>carry out those attacks without degrading user experience would have been
>video providers. This will be especially true if browsers start blocking
>autoplay by default (https://bugzilla.mozilla.org/show_bug.cgi?id=1376321),
>since users will never interact with media elements from fingerprinting
>scripts. With the Media Capabilities API, it seems that a script like
>fingerprintjs2 (https://github.com/Valve/fingerprintjs2) could run through
>a big list of types/codecs and retrieve device information regarding
>smoothness and energy efficiency with relatively little overhead?

Probably so, yes. We could reduce but not eliminate the exposure by
rate-limiting requests (perhaps even on a sliding scale, allowing a
small number before delays are introduced). This is likely insufficient
as a mitigation, however.
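
A sliding scale of that kind could be sketched as below. All of the numbers (five free requests, a 100 ms base delay that doubles, a 5 second cap) and the function name delayForRequest are assumptions for illustration only.

```typescript
// All three values are assumed for the sake of the example.
const FREE_REQUESTS = 5;
const BASE_DELAY_MS = 100;
const MAX_DELAY_MS = 5000;

// Delay to apply before answering the n-th query (1-based) from one page:
// nothing for the first few, then doubling up to a cap.
function delayForRequest(n: number): number {
  if (n <= FREE_REQUESTS) return 0;
  const doublings = n - FREE_REQUESTS - 1;
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, doublings));
}
```

A legitimate player asking about a handful of renditions would see no delay, while a script walking a long list of codecs slows down quickly but is not stopped, which matches the "reduce but not eliminate" caveat.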

>If autoplay is eventually blocked by default could we gate the response of
>this API on user interaction with the media element?

That might be possible, but if so the spec should discuss it, including
how to get "real" data after user interaction. (Perhaps give fake
data until user interaction, but then one needs to warn developers about
this, and explain how to reliably get real data once interaction occurs.)

--
Randell Jesup, Mozilla Corp
remove "news" for personal email

Karl Tomlinson

May 15, 2018, 5:05:52 PM

to

Steven Englehardt writes:

> While it may have been
> theoretically possible for all trackers to gather statistics on video
> playback for each configuration, the only scripts that could practically
> carry out those attacks without degrading user experience would have been
> video providers. This will be especially true if browsers start blocking
> autoplay by default (https://bugzilla.mozilla.org/show_bug.cgi?id=1376321),
> since users will never interact with media elements from fingerprinting

> scripts. [...]

> If autoplay is eventually blocked by default could we gate the response of
> this API on user interaction with the media element?

Note that current playback blocking work is focused on media
elements that produce audio. Video-only elements would not be
affected.

The blocking is also gated by interaction with the document,
rather than any particular element.

(There may be other non-default behaviors available.)

Jean-Yves Avenard

Jul 3, 2018, 8:16:46 PM

to dev-platform

Hi

The code is now in central and in the last nightly.

It's currently disabled by default behind the pref
media.media-capabilities.enabled

Fingerprinting concerns are tracked in
https://bugzilla.mozilla.org/show_bug.cgi?id=1461454

Feel free to enable it and watch videos on YouTube. On Mac in particular it
would allow us to re-enable the free VP9 codec (which has been disabled for
performance reasons)

Kind regards
Jean-Yves


Jean-Yves Avenard

Aug 9, 2018, 4:32:46 PM

to dev-platform

Hi

There have been some concerns about parts of the spec, in particular the one extending the Screen interface.

The plan now is to keep the Screen extensions disabled by default and to enable the remaining parts, which relate purely to playback and encoding capabilities.

This is tracked in bug 1480190

JY


Jean-Yves Avenard

Aug 11, 2018, 8:45:16 AM

to dev-platform

It appears that I hadn’t provided all the information earlier…

So here it is again:

Summary:

Media Capabilities allows web sites to better determine what content to serve to the end user.

Currently a media element offers the canPlayType method (https://html.spec.whatwg.org/multipage/media.html#dom-navigator-canplaytype-dev) to determine if a container/codec can be used, but the answer is limited to a maybe/probably type of answer.

It gives no ability to determine whether a particular resolution can be played smoothly enough, or in a power-efficient manner (e.g. whether it will be hardware accelerated).

This has been a particular problem with sites such as YouTube, which serve VP9 under all circumstances even if the user agent won't play it well (VP9 is mostly decoded in software and is CPU intensive). This has forced us to indiscriminately disable VP9 altogether.
If YouTube knew that VP9 could be used for low resolutions but not for high-definition ones, it could select the right codec from the start.

Chrome shipped it a while ago, and in talks with several partners (including YouTube, Netflix, Facebook etc.), Media Capabilities support has been the number one request.

Bug: This issue is tracked in bugzilla 1409664 (https://bugzilla.mozilla.org/show_bug.cgi?id=1409664)

Link to standard: The proposed spec is available at https://wicg.github.io/media-capabilities/
Platform coverage: It will be available on all platforms, and exposed to all sites, including insecure (http) ones
Estimated or target release: 63
Preference behind which this will be implemented: the feature is controllable via media.media-capabilities.enabled
Is this feature enabled by default in sandboxed iframes? If not, is there a proposed sandbox flag to enable it? If allowed, does it preserve the current invariants in terms of what sandboxed iframes can do?
DevTools bug: No particular requirements for additional devtools
Do other browser engines implement this? Chrome has shipped this since late 2017
web-platform-tests: http://w3c-test.org/media-capabilities/

We do not enable the Screen Media-Capabilities extension due to spec issues (in particular https://github.com/WICG/media-capabilities/issues/89). Additionally, we have no way at present to implement it.