web-platform-tests that fail only in Firefox (from wpt.fyi data)

Philip Jägenstedt

Oct 11, 2018, 4:22:59 PM

to dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, David Burns

Hi all,

I sent the result of some investigation to webkit-dev [1] today and
thought you might be interested to take a look at the equivalent list for
Firefox.

https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
list of tests that fail in Firefox Nightly, but pass in stable
versions of Chrome, Edge and Safari. Although not all of them will be
high-value and really impact web developers, these are probably more
valuable to fix than a random WPT failure. Triage and prioritization
required, of course.

Skimming the list, I'd guess that css-flexbox, css-grid, fetch and
streams might be the most worth digging into.
cors-cookies-redirect.any.html, for example, seems like something that
could matter in the real world.

Making this part of the wpt.fyi UI is a current priority [2] but I
thought this one-off analysis might still be useful to y'all.

[1] https://lists.webkit.org/pipermail/webkit-dev/2018-October/030209.html
[2] https://github.com/web-platform-tests/wpt.fyi/issues/201

Boris Zbarsky

Oct 11, 2018, 4:34:41 PM

to Philip Jägenstedt, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, David Burns

On 10/11/18 4:22 PM, Philip Jägenstedt wrote:
> https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
> list of tests that fail in Firefox Nightly, but pass in stable
> versions of Chrome, Edge and Safari.

Or more precisely have some sub-test that has that property, right?

Thank you for putting this list together.

-Boris

Boris Zbarsky

Oct 11, 2018, 4:37:03 PM

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1498357 to track
these failures.

-Boris

Philip Jägenstedt

Oct 13, 2018, 3:17:26 AM

to Boris Zbarsky, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, David Burns

On Thu, Oct 11, 2018, 22:34 Boris Zbarsky <bzba...@mit.edu> wrote:

> On 10/11/18 4:22 PM, Philip Jägenstedt wrote:

> > https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
> > list of tests that fail in Firefox Nightly, but pass in stable
> > versions of Chrome, Edge and Safari.
>

> Or more precisely have some sub-test that has that property, right?
>

Right, since there's no way to link to a subtest, in those cases I've
linked to the test and it might take some work to spot which subtest it
was. If this is a problem I could improve the report.

Thanks for filing the tracking bug, I hope there are some failures in here
that point to problems that really affect web developers that can be fixed.


Philip Jägenstedt

Oct 13, 2018, 3:27:46 AM

to Boris Zbarsky, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, David Burns

On Sat, Oct 13, 2018, 09:17 Philip Jägenstedt <foo...@chromium.org> wrote:

> On Thu, Oct 11, 2018, 22:34 Boris Zbarsky <bzba...@mit.edu> wrote:
>
>> On 10/11/18 4:22 PM, Philip Jägenstedt wrote:

>> > https://gist.github.com/foolip/a77c88e62aa3cfc461c2879f3e5d4855 is a
>> > list of tests that fail in Firefox Nightly, but pass in stable
>> > versions of Chrome, Edge and Safari.
>>

>> Or more precisely have some sub-test that has that property, right?
>>
>
> Right, since there's no way to link to a subtest, in those cases I've
> linked to the test and it might take some work to spot which subtest it
> was. If this is a problem I could improve the report.
>
> Thanks for filing the tracking bug, I hope there are some failures in here
> that point to problems that really affect web developers that can be fixed.
>

There's another caveat worth mentioning. Tests can be definitely passing or
definitely failing, but then there are various crash/error/timeout/etc.
results where the validity of the test is uncertain, or it's quite likely
to be a flake or infra issue. In my report I've been conservative and used
1 FAIL + 3 PASS as the criterion. Fiddling with these rules can reveal lots
more potential issues, and if you like I could provide reports on that too.
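
A minimal sketch of the conservative rule described above, reading it as "the browser in question definitely fails and every other browser definitely passes". The test names, browser keys, status strings and the helper `is_lone_failure` are illustrative assumptions, not the script that produced the report.

```python
# Hypothetical per-browser statuses for a handful of tests.
# Names and status strings are assumptions made for illustration only.
def is_lone_failure(statuses, target):
    """True if `target` is a definite FAIL and all other browsers are a definite PASS."""
    if statuses.get(target) != "FAIL":
        return False
    others = [s for browser, s in statuses.items() if browser != target]
    return len(others) > 0 and all(s == "PASS" for s in others)


results = {
    "a.html": {"chrome": "PASS", "edge": "PASS", "firefox": "FAIL", "safari": "PASS"},
    # A timeout is not a definite failure, so the conservative rule skips it.
    "b.html": {"chrome": "PASS", "edge": "PASS", "firefox": "TIMEOUT", "safari": "PASS"},
    # Two browsers fail, so this is not a lone failure for either of them.
    "c.html": {"chrome": "PASS", "edge": "FAIL", "firefox": "FAIL", "safari": "PASS"},
}

lone = [name for name, statuses in results.items() if is_lone_failure(statuses, "firefox")]
print(lone)
```

Only `a.html` is reported here; loosening either condition (for example, accepting TIMEOUT for the target) is the kind of rule change that would surface more candidates.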


Emilio Cobos Álvarez

Oct 16, 2018, 8:23:21 PM

to Philip Jägenstedt, Boris Zbarsky, David Burns, James Graham, dev-pl...@lists.mozilla.org, Michael Taylor

Hi Philip,

Do you know how reftests are run in order to get that data?

I'm particularly curious about this Firefox-only failure:

css/selectors/selection-image-001.html

It passes both on our automation and locally. I'm curious because I was
the author of that test (whoops) and the Firefox fix (bug 1449010).

Does it use the same mechanism as our automation to wait for image
decodes and such? Is there any way to see the test images?

IIRC one potential difference here is that Firefox blocks the load event
for image loads but, unlike other browsers, doesn't decode images
synchronously, so we may fire the load event but not paint the image. Our
reftest harness has to use internal APIs to ensure that the screenshot is
taken with all the images decoded.

I suspect that can't be the cause of this test failure, since the image
is really small and I would've expected it to get synchronously decoded
anyway (we sync-decode if fast by default), but I'm no expert about how
wpt.fyi is set up, thus the curiosity, I'd love to be able to see the
screenshots of that test.

Thanks in advance,

-- Emilio


James Graham

Oct 17, 2018, 5:13:06 AM

to Emilio Cobos Álvarez, Philip Jägenstedt, Boris Zbarsky, David Burns, James Graham, dev-pl...@lists.mozilla.org, Michael Taylor

On 17/10/2018 01:23, Emilio Cobos Álvarez wrote:
> Hi Philip,
>
> Do you know how reftests are run in order to get that data?
>
> I'm particularly curious about this Firefox-only failure:
>
> css/selectors/selection-image-001.html
>
> It passes both on our automation and locally. I'm curious because I was
> the author of that test (whoops) and the Firefox fix (bug 1449010).
>
> Does it use the same mechanism as our automation to wait for image
> decodes and such? Is there any way to see the test images?

It's using the same harness as we use in gecko, so it should be giving
the same results, but of course it's possible that there's some
difference in the configuration that could cause different results for
some tests.

Unfortunately there isn't yet a way to see the images; because of the
number of failures per run, and the number of runs, putting all the
screenshots in the logs would be prohibitively large, but there is a
plan to start uploading previously unseen screenshots to wpt.fyi [1]

Having said that the infrastructure is all containerised and it's
possible to repeat the run locally with relatively little effort. I'm
happy to help out with that if you like.

[1] https://github.com/web-platform-tests/wpt.fyi/issues/57

James Graham

Oct 17, 2018, 5:56:14 AM

to dev-pl...@lists.mozilla.org

On 17/10/2018 10:12, James Graham wrote:
> On 17/10/2018 01:23, Emilio Cobos Álvarez wrote:
>> Hi Philip,
>>
>> Do you know how reftests are run in order to get that data?
>>
>> I'm particularly curious about this Firefox-only failure:
>>
>> css/selectors/selection-image-001.html
>>
>> It passes both on our automation and locally. I'm curious because I
>> was the author of that test (whoops) and the Firefox fix (bug 1449010).
>>
>> Does it use the same mechanism as our automation to wait for image
>> decodes and such? Is there any way to see the test images?
>
> It's using the same harness as we use in gecko, so it should be giving
> the same results, but of course it's possible that there's some
> difference in the configuration that could cause different results for
> some tests.
>
> Unfortunately there isn't yet a way to see the images; because of the
> number of failures per run, and the number of runs, putting all the
> screenshots in the logs would be prohibitively large, but there is a
> plan to start uploading previously unseen screenshots to wpt.fyi [1]

OK, I investigated this and it turns out that we accidentally started
uploading tbpl-style logs with screenshots for full runs when we turned
on taskcluster for PRs. So the screenshot is available through

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://taskcluster-artifacts.net/U6OIGr7ZTjurDYjy_KgyCg/0/public/results/log_tbpl.log
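
The link above is the reftest-analyzer page with the log location passed in a `#logurl=` fragment. A small sketch of building such a link for another log, assuming the fragment is appended unencoded as in the example; the helper name `analyzer_url` is made up for illustration.

```python
# Base page taken from the link in this message.
ANALYZER = (
    "https://hg.mozilla.org/mozilla-central/raw-file/tip/"
    "layout/tools/reftest/reftest-analyzer.xhtml"
)


def analyzer_url(log_url):
    """Return a reftest-analyzer link that loads the given tbpl-style log."""
    # Assumption: the log URL is appended as-is, matching the example link.
    return f"{ANALYZER}#logurl={log_url}"


log = "https://taskcluster-artifacts.net/U6OIGr7ZTjurDYjy_KgyCg/0/public/results/log_tbpl.log"
url = analyzer_url(log)
print(url)
```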

Emilio Cobos Álvarez

Oct 17, 2018, 8:03:27 AM

to James Graham, dev-pl...@lists.mozilla.org

Thanks! So it looks like the reftest screenshots are taken on inactive
windows?

We don't respect ::selection for inactive windows, so the failure now
makes sense.

Still I think there's something fishy there, but it may be related to
the widget toolkit that is on wpt's CI or something...

-- Emilio

Philip Jägenstedt

Oct 17, 2018, 9:10:49 AM

to Emilio Cobos Álvarez, James Graham, dev-pl...@lists.mozilla.org

Thanks James for accidentally storing screenshots in Taskcluster logs
and figuring out how to use them with reftest-analyzer, that's great
and I'll pass along this tip to blink-dev as well :D

Boris Zbarsky

Oct 17, 2018, 5:53:51 PM

to Philip Jägenstedt, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, David Burns

On 10/13/18 3:27 AM, Philip Jägenstedt wrote:
> Fiddling with these rules can reveal lots
> more potential issues, and if you like I could provide reports on that too.

I would be pretty interested in that, yes. In particular, a report
where there is 1 "not PASS and not FAIL" and 3 "PASS" would be pretty
helpful, I suspect.
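
A sketch of the variant requested here: one browser has a result that is neither PASS nor FAIL while the other browsers all PASS, with a tally of which statuses show up. Status strings, test names and the helper `lone_indefinite` are assumptions for illustration.

```python
from collections import Counter

DEFINITE = {"PASS", "FAIL"}


def lone_indefinite(statuses, target):
    """Return target's status if it is neither PASS nor FAIL while all others PASS, else None."""
    status = statuses.get(target)
    if status is None or status in DEFINITE:
        return None
    others = [s for browser, s in statuses.items() if browser != target]
    if others and all(s == "PASS" for s in others):
        return status
    return None


# Hypothetical results; real data would come from wpt.fyi runs.
results = {
    "w.html": {"chrome": "PASS", "edge": "PASS", "firefox": "FAIL", "safari": "PASS"},
    "x.html": {"chrome": "PASS", "edge": "PASS", "firefox": "TIMEOUT", "safari": "PASS"},
    "y.html": {"chrome": "PASS", "edge": "PASS", "firefox": "CRASH", "safari": "PASS"},
    "z.html": {"chrome": "TIMEOUT", "edge": "PASS", "firefox": "TIMEOUT", "safari": "PASS"},
}

# Count which indefinite statuses account for the matches.
tally = Counter(
    status
    for statuses in results.values()
    if (status := lone_indefinite(statuses, "firefox")) is not None
)
print(tally)
```

Grouping by status makes it quick to see whether the matches are dominated by one failure mode, such as timeouts.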

By the way, I recently found some tests that fail when run directly but
pass in the harness. :( For example
http://w3c-test.org/html/infrastructure/common-dom-interfaces/collections/htmlallcollection.html
fails various subtests in all browsers due to the <div id="log"> being
in the DOM when running directly. Not really sure what we can do with that.

-Boris

Philip Jägenstedt

Oct 19, 2018, 8:42:33 AM

to Boris Zbarsky, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, dbu...@mozilla.com

On Wed, Oct 17, 2018 at 11:53 PM Boris Zbarsky <bzba...@mit.edu> wrote:
>
> On 10/13/18 3:27 AM, Philip Jägenstedt wrote:

> > Fiddling with these rules can reveal lots
> > more potential issues, and if you like I could provide reports on that too.
>

> I would be pretty interested in that, yes. In particular, a report
> where there is 1 "not PASS and not FAIL" and 3 "PASS" would be pretty
> helpful, I suspect.

Rerunning my script, it's apparent that unreliable Edge results [1]
lead to the same tests being considered lone failures or not for the
other browsers. So, I've used the same set of runs for this report of
what you suggested:
https://gist.github.com/foolip/e6014c9bcc8ca405219bf18542eb5d69

It's not a long list, so I checked them all and they are timeouts.
This is sometimes the failure mode for genuine problems, so looking
over these might be valuable.

> By the way, I recently found some tests that fail when run directly but
> pass in the harness. :( For example
> http://w3c-test.org/html/infrastructure/common-dom-interfaces/collections/htmlallcollection.html
> fails various subtests in all browsers due to the <div id="log"> being
> in the DOM when running directly. Not really sure what we can do with that.

That's a bit odd, the <div id="log"> is in the markup and would be
when running manually or under automation. Are you sure that explains
the difference? If it does, then just removing it from the markup and
adapting any affected tests would be the way to go. I updated the test
pretty recently, if you're confident it's broken can you file a wpt
issue and assign me?

[1] https://github.com/web-platform-tests/results-collection/issues/563

Boris Zbarsky

Oct 19, 2018, 11:52:55 AM

to Philip Jägenstedt, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, dbu...@mozilla.com

On 10/19/18 8:42 AM, Philip Jägenstedt wrote:
> That's a bit odd, the <div id="log"> is in the markup and would be
> when running manually or under automation. Are you sure that explains
> the difference?

Yes. I filed https://github.com/web-platform-tests/wpt/issues/13625

-Boris

Philip Jägenstedt

Dec 14, 2018, 3:42:02 AM

to Boris Zbarsky, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, dbu...@mozilla.com

On Fri, Oct 19, 2018 at 2:42 PM Philip Jägenstedt <foo...@chromium.org> wrote:
>
> On Wed, Oct 17, 2018 at 11:53 PM Boris Zbarsky <bzba...@mit.edu> wrote:
> >
> > On 10/13/18 3:27 AM, Philip Jägenstedt wrote:

> > > Fiddling with these rules can reveal lots
> > > more potential issues, and if you like I could provide reports on that too.
> >

> > I would be pretty interested in that, yes. In particular, a report
> > where there is 1 "not PASS and not FAIL" and 3 "PASS" would be pretty
> > helpful, I suspect.
>
> Rerunning my script, it's apparent that unreliable Edge results [1]
> lead to the same tests being considered lone failures or not for the
> other browsers. So, I've used the same set of runs for this report of
> what you suggested:
> https://gist.github.com/foolip/e6014c9bcc8ca405219bf18542eb5d69
>
> It's not a long list, so I checked them all and they are timeouts.
> This is sometimes the failure mode for genuine problems, so looking
> over these might be valuable.

Given the recent news [1] it won't be as relevant to consider the
status of EdgeHTML for prioritization in other engines. Given that and
the unreliable results, I've updated my script to consider only
Chrome, Firefox and Safari. I also have the reports auto-updating on a
daily basis:
https://foolip.github.io/ad-hoc-wpt-results-analysis/chrome-lone-failures.html
https://foolip.github.io/ad-hoc-wpt-results-analysis/firefox-lone-failures.html
https://foolip.github.io/ad-hoc-wpt-results-analysis/safari-lone-failures.html

[1] https://github.com/MicrosoftEdge/MSEdge/blob/master/README.md

Philip Jägenstedt

Dec 14, 2018, 3:50:21 AM

to Boris Zbarsky, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor, dbu...@mozilla.com

On Fri, Dec 14, 2018 at 9:41 AM Philip Jägenstedt <foo...@chromium.org> wrote:
>
> On Fri, Oct 19, 2018 at 2:42 PM Philip Jägenstedt <foo...@chromium.org> wrote:
> >
> > On Wed, Oct 17, 2018 at 11:53 PM Boris Zbarsky <bzba...@mit.edu> wrote:
> > >
> > > On 10/13/18 3:27 AM, Philip Jägenstedt wrote:

> > > > Fiddling with these rules can reveal lots
> > > > more potential issues, and if you like I could provide reports on that too.
> > >

> > > I would be pretty interested in that, yes. In particular, a report
> > > where there is 1 "not PASS and not FAIL" and 3 "PASS" would be pretty
> > > helpful, I suspect.
> >
> > Rerunning my script, it's apparent that unreliable Edge results [1]
> > lead to the same tests being considered lone failures or not for the
> > other browsers. So, I've used the same set of runs for this report of
> > what you suggested:
> > https://gist.github.com/foolip/e6014c9bcc8ca405219bf18542eb5d69
> >
> > It's not a long list, so I checked them all and they are timeouts.
> > This is sometimes the failure mode for genuine problems, so looking
> > over these might be valuable.
>
> Given the recent news [1] it won't be as relevant to consider the
> status of EdgeHTML for prioritization in other engines. Given that and
> the unreliable results, I've updated my script to consider only
> Chrome, Firefox and Safari. I also have the reports auto-updating on a
> daily basis:
> https://foolip.github.io/ad-hoc-wpt-results-analysis/chrome-lone-failures.html
> https://foolip.github.io/ad-hoc-wpt-results-analysis/firefox-lone-failures.html
> https://foolip.github.io/ad-hoc-wpt-results-analysis/safari-lone-failures.html
>
> [1] https://github.com/MicrosoftEdge/MSEdge/blob/master/README.md

And, to spell it out, the effect of that is to increase the number of
product-specific failures for all three by quite a lot. Firefox goes from
~700 to ~1300.

Chrome went from ~300 to ~900, and I'm suggesting that we get to
below 500 and stay there. (I suspect many failures are for trivial
reasons, so that it'll be easy to make progress in the beginning.)
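
Why the counts go up: with one browser fewer, "all other browsers pass" has one condition less to satisfy, so every earlier lone failure still qualifies and some new ones appear. A toy illustration with made-up data; the function `count_lone_failures` and the status strings are assumptions, not the actual analysis script.

```python
def count_lone_failures(results, target, browsers):
    """Count tests where `target` FAILs and every other browser in `browsers` PASSes."""
    count = 0
    for statuses in results.values():
        others = [statuses[b] for b in browsers if b != target]
        if statuses[target] == "FAIL" and all(s == "PASS" for s in others):
            count += 1
    return count


# Hypothetical results for four tests.
results = {
    "a.html": {"chrome": "PASS", "edge": "PASS", "firefox": "FAIL", "safari": "PASS"},
    "b.html": {"chrome": "PASS", "edge": "FAIL", "firefox": "FAIL", "safari": "PASS"},
    "c.html": {"chrome": "PASS", "edge": "TIMEOUT", "firefox": "FAIL", "safari": "PASS"},
    "d.html": {"chrome": "FAIL", "edge": "PASS", "firefox": "FAIL", "safari": "PASS"},
}

with_edge = count_lone_failures(results, "firefox", ["chrome", "edge", "firefox", "safari"])
without_edge = count_lone_failures(results, "firefox", ["chrome", "firefox", "safari"])
print(with_edge, without_edge)
```

In this toy data, `b.html` and `c.html` only become Firefox lone failures once the Edge result is ignored, while `d.html` never qualifies because Chrome also fails.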

Philip Jägenstedt

Dec 15, 2018, 6:27:57 AM

to dbu...@mozilla.com, Luke Bjerring, Boris Zbarsky, dev-pl...@lists.mozilla.org, James Graham, Michael Taylor

That's fantastic, there's a lot to triage but hopefully it's well worth it.
If you create any ad-hoc mapping between failures and bugs, please comment
on https://github.com/web-platform-tests/wpt.fyi/issues/64 and perhaps we
can populate using that data when the linking feature exists. +Luke Bjerring
<lukebj...@google.com> FYI.

On Fri, Dec 14, 2018 at 4:09 PM David Burns <dbu...@mozilla.com> wrote:

> Thanks for this Philip.
>
> I have started raising bugs and blocking
> https://bugzilla.mozilla.org/show_bug.cgi?id=1498357.
>
> David

David Burns

Dec 15, 2018, 6:32:48 AM

to Philip Jägenstedt, Boris Zbarsky, dev-platform, James Graham, Michael Taylor

Thanks for this Philip.

I have started raising bugs and blocking
https://bugzilla.mozilla.org/show_bug.cgi?id=1498357.

David
