Update on running the W3C tests


Dirk Pranke

Dec 20, 2013, 8:31:16 PM
to blink-dev
Hi all,

Here's another holiday-timed update on this long-running task ... as of r164262 (chromium r242227), we are running the first sets of tests imported from the W3C (227 of them). The process is not yet fully automated, and there are more improvements to follow, but if you have particular suites of tests you'd like to run, shoot me a note and we can add them.

Details:

We are now importing modified clones of the W3C's "csswg-test" and "web-platform-tests" repos on GitHub. 

The repos are being pulled in via entries in the chromium DEPS file, meaning that to pull in more tests, or get new versions, we need to roll that file.
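To make the DEPS mechanism concrete, an entry for one of these mirrors might look roughly like the sketch below. DEPS files are evaluated as Python; the paths, mirror URLs, and revision pins here are all invented for illustration, not the real entries.

```python
# Hypothetical sketch of chromium DEPS entries for the W3C test mirrors.
# The checkout paths and pinned revisions below are illustrative only.
deps = {
    "src/third_party/WebKit/LayoutTests/w3c/web-platform-tests":
        "https://chromium.googlesource.com/external/w3c/web-platform-tests.git"
        "@1234abcd",
    "src/third_party/WebKit/LayoutTests/w3c/csswg-test":
        "https://chromium.googlesource.com/external/w3c/csswg-test.git"
        "@5678dcba",
}
```

"Rolling" the file just means updating the revision after the "@" and landing that change.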

We are not using a DEPS file in Blink because there's no good way to do that w/ gclient :( 

(If you want the details on this, contact me off-list, or start a new thread. This is really annoying (at least to me), and it remains to be seen if I can fully automate this process so that it isn't annoying, or if we need to modify gclient, or if we need to wait to merge the repos, or find another solution.)

In order to handle test failures introduced during a new W3C roll, I have added a new TestExpectations file in src/webkit/tools/layout_tests/test_expectations_w3c.txt. Any entries in that file will be honored on both the canary and deps bots (this works just like the skia_test_expectations.txt file). Entries in that file should be temporary and eventually be merged into the main file.
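For illustration, entries in that file use the same syntax as the main TestExpectations file; something like the following (the bug numbers and test paths here are made up):

```
crbug.com/123456 [ Win ] imported/web-platform-tests/dom/example.html [ Failure ]
crbug.com/123457 imported/csswg-test/flexbox/example-ref.html [ ImageOnlyFailure ]
```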

We currently do not have a way to handle generic test baselines for the W3C tests, i.e., there are no -expected.txt files next to the tests themselves. (We can't put them next to the tests, because that would require putting them in the w3c repos.) You should be able to check in platform-specific expectations, but the baseline optimizer might get confused; I'll fix this soon.

This is less of a problem than you might think, because I've modified run-webkit-tests to not require -expected.txt files at all for tests that use the "testharness.js" script for assertion-based tests. As long as all of the asserts pass, we'll consider the test as passing. We are also not importing any manual (aka pixel) tests, just text-only/script-based tests and reftests.

The tests that we do run are automatically modified during the import process (via the import-w3c-tests script) to prefix CSS attributes and update other things as needed to run inside content_shell. Ultimately I hope to modify enough things in content_shell directly so that we can just run the tests as-is and not need this step.
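As a rough illustration of the kind of rewriting the import step does, a sketch of CSS property prefixing is below. The function name and property list are hypothetical, not the actual import-w3c-tests code.

```python
import re

# Hypothetical subset of properties that still needed a -webkit- prefix.
PROPERTIES_TO_PREFIX = ["flex", "flex-direction", "transform", "animation-name"]

def prefix_css_properties(source):
    """Rewrite known unprefixed CSS property names to their -webkit- forms.
    A sketch of the idea behind import-w3c-tests, not its real code."""
    for prop in PROPERTIES_TO_PREFIX:
        # Match the property name at a declaration boundary, e.g. "transform:",
        # without re-prefixing names that are already prefixed.
        source = re.sub(r'(?<![-\w])%s(\s*:)' % re.escape(prop),
                        r'-webkit-%s\1' % prop, source)
    return source

css = "div { transform: scale(2); color: red; }"
rewritten = prefix_css_properties(css)
```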

Arguably, this whole process would be a lot easier if we just manually copied the tests into Blink. I'm not doing that because (a) I want to preserve the upstream repo history and (b) I think it's a bit more transparent and will ultimately allow us to more easily run new tests as they appear in the upstream repos. We'll see if this turns out to be a mistake :).

The process for uploading new tests to the W3C is entirely GitHub-centric and separate from running the tests. As we do more and more of this, I will add docs for this as well.

I will upload much of this information onto dev.chromium.org and send out a link to it.

Let me know if you have any questions, concerns, or other feedback!

-- Dirk


Nico Weber

Dec 22, 2013, 12:14:51 AM
to Dirk Pranke, blink-dev

Dirk Pranke

Dec 23, 2013, 4:52:44 PM
to Nico Weber, blink-dev
Nope, definitely not a known issue (to me, at least). I'll look into it and/or get the infra folks looking as well.

-- Dirk

Philip Jägenstedt

Jan 15, 2014, 6:00:43 AM
to Dirk Pranke, blink-dev
Thanks for the update, Dirk!

On Sat, Dec 21, 2013 at 8:31 AM, Dirk Pranke <dpr...@chromium.org> wrote:

> We currently do not have a way to handle generic test baselines for the W3C
> tests, i.e., there are no -expected.txt files next to the tests themselves.
> (We can't put them next to the tests because that would require putting them
> in the w3c repos) You should be able to check in platform-specific
> expectations, but the baseline optimizer might get confused; I'll fix this
> soon.
>
> This is less of a problem than you might think, because I've modified
> run-webkit-tests to not require -expected.txt files at all for tests that
> use the "testharness.js" script for assertion-based tests. As long as all of
> the asserts pass, we'll consider the test as passing. We are also not
> importing any manual (aka pixel) tests, just text-only/script-based tests
> and reftests.

I suppose this means that we can't use a test that isn't fully
passing? That is a bit of a shame because when writing a test for
web-platform-tests, you'd typically test against the spec and not
remove bits that happen to fail in Blink. As a concrete example, I've
been thinking about moving the GlobalEventHandlers tests to
web-platform-tests, but it sounds like I can't do that as long as
there are events in that interface which Blink doesn't support yet.
Correct?

> The tests that we do run are automatically modified during the import
> process (via the import-w3c-tests script) to prefix CSS attributes and
> update other things as needed to run inside content_shell. Ultimately I hope
> to modify enough things in content_shell directly so that we can just run
> the tests as-is and not need this step.
>
> Arguably, this whole process would be a lot easier if we just manually
> copied the tests into Blink. I'm not doing that because (a) I want to
> preserve the upstream repo history and (b) I think it's a bit more
> transparent and will ultimately allow us to more easily run new tests as
> they appear in the upstream repos. We'll see if this turns out to be a
> mistake :).

It might turn out to be a mistake, but I'm glad you did it this way.
Trying to figure out which repository is the most up to date between
Opera's internal tests, Blink's LayoutTests and web-platform-tests is
a waste of time, so one repo to rule them all is great!

> The process for uploading new tests to the W3C is entirely GitHub-centric
> and separate from running the tests. As we do more and more of this, I will
> add docs for this as well.
>
> I will upload much of this information onto dev.chromium.org and send out a
> link to it.
>
> Let me know if you have any questions, concerns, or other feedback!

Where do you want us to be in a few years? Personally, I'd love it if
we could write tests in web-platform-tests directly and gradually move
more tests from LayoutTests to web-platform-tests. However, that would
require either branching web-platform-tests or making it very quick to
get tests into upstream and then pull them back into Blink. (This
discussion is probably premature.)

Philip

Dirk Pranke

Jan 15, 2014, 12:16:10 PM
to Philip Jägenstedt, blink-dev
On Wed, Jan 15, 2014 at 3:00 AM, Philip Jägenstedt <phi...@opera.com> wrote:
> Thanks for the update, Dirk!
>
> On Sat, Dec 21, 2013 at 8:31 AM, Dirk Pranke <dpr...@chromium.org> wrote:
>
> > We currently do not have a way to handle generic test baselines for the W3C
> > tests, i.e., there are no -expected.txt files next to the tests themselves.
> > (We can't put them next to the tests because that would require putting them
> > in the w3c repos) You should be able to check in platform-specific
> > expectations, but the baseline optimizer might get confused; I'll fix this
> > soon.
> >
> > This is less of a problem than you might think, because I've modified
> > run-webkit-tests to not require -expected.txt files at all for tests that
> > use the "testharness.js" script for assertion-based tests. As long as all of
> > the asserts pass, we'll consider the test as passing. We are also not
> > importing any manual (aka pixel) tests, just text-only/script-based tests
> > and reftests.
>
> I suppose this means that we can't use a test that isn't fully
> passing? That is a bit of a shame because when writing a test for
> web-platform-tests, you'd typically test against the spec and not
> remove bits that happen to fail in Blink. As a concrete example, I've
> been thinking about moving the GlobalEventHandlers tests to
> web-platform-tests, but it sounds like I can't do that as long as
> there are events in that interface which Blink doesn't support yet.
> Correct?

Not quite. We need to support running tests that aren't fully passing, and you can kinda do it today: 

Start w/ checking in platform-specific expectations; since all the ports fall back to either mac or windows, that means that we'd check in two files rather than one. 

This is annoying, but may be okay for now. (I need to check to see what the baseline optimizer will do in this situation; I probably need to fix it to not try to de-dup said files).

If this turns out to be too awkward, there are three other options that I've thought of:

1) Have win fall back to mac, and then to a generic baseline alongside the test.
2) Create a separate platform/generic fallback
3) Figure out a way to check in the generic baselines as part of the import process on the blink branch in the repo.

I'm not sure that any of these are big improvements over just having two platform baselines, though (3) might be best. I'm open to votes here, or other ideas :).
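To illustrate the fallback idea in option (1), the baseline search could be sketched like this. This is my own sketch; the function and directory names are hypothetical, not run-webkit-tests internals.

```python
def find_baseline(test_path, fallback_dirs, exists):
    """Return the first -expected.txt found along the fallback path.
    A sketch of option (1): win falls back to mac, and then to a generic
    baseline next to the test (the empty-string entry)."""
    expected = test_path.replace(".html", "-expected.txt")
    for d in fallback_dirs:
        candidate = d + "/" + expected if d else expected
        if exists(candidate):
            return candidate
    return None

# Example with a fake checkout where only the mac baseline is checked in:
fake_fs = {"platform/mac/imported/t-expected.txt"}
found = find_baseline("imported/t.html",
                      ["platform/win", "platform/mac", ""],
                      exists=lambda p: p in fake_fs)
```

With only a mac baseline present, win resolves to it via the fallback, which is why two checked-in files (win and mac) would cover all ports today.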

> > The tests that we do run are automatically modified during the import
> > process (via the import-w3c-tests script) to prefix CSS attributes and
> > update other things as needed to run inside content_shell. Ultimately I hope
> > to modify enough things in content_shell directly so that we can just run
> > the tests as-is and not need this step.
> >
> > Arguably, this whole process would be a lot easier if we just manually
> > copied the tests into Blink. I'm not doing that because (a) I want to
> > preserve the upstream repo history and (b) I think it's a bit more
> > transparent and will ultimately allow us to more easily run new tests as
> > they appear in the upstream repos. We'll see if this turns out to be a
> > mistake :).
>
> It might turn out to be a mistake, but I'm glad you did it this way.
> Trying to figure out which repository is the most up to date between
> Opera's internal tests, Blink's LayoutTests and web-platform-tests is
> a waste of time, so one repo to rule them all is great!

Yes, the goal is to ultimately run mostly tests from the w3c and only have to run tests from LayoutTests (or elsewhere) when testing things that are browser-specific.

> > The process for uploading new tests to the W3C is entirely GitHub-centric
> > and separate from running the tests. As we do more and more of this, I will
> > add docs for this as well.
> >
> > I will upload much of this information onto dev.chromium.org and send out a
> > link to it.
> >
> > Let me know if you have any questions, concerns, or other feedback!
>
> Where do you want us to be in a few years? Personally, I'd love it if
> we could write tests in web-platform-tests directly and gradually move
> more tests from LayoutTests to web-platform-tests. However, that would
> require either branching web-platform-tests or making it very quick to
> get tests into upstream and then pull them back into Blink. (This
> discussion is probably premature.)

Well, as described earlier, web-platform-tests is already branched (in our mirror), but using that branch (or another branch) for incoming tests may just be confusing; I'm not sure yet.

However, I think the general hope we have (and that the other w3c test contributors share) is that we can make it very quick to upstream the tests. Pulling them back into Blink should become pretty automatic (I hope to eventually track tip of tree and automatically roll new versions into Blink).

-- Dirk

ja...@hoppipolla.co.uk

Jan 15, 2014, 2:56:35 PM
to blin...@chromium.org, Philip Jägenstedt
So FWIW, the main blocker at the moment to getting tests into web-platform-tests quickly is the review queue. We have test submissions that have been languishing in the queue for months, and a low-single-digit number of people doing actual review work. This is a significant problem that is easily seen in the number of open pull requests over time [1]. Any effort that Blink contributors can put into reducing that queue would be very much appreciated.

However for tests that are being upstreamed from vendors, we don't intend to do double-review; if there is a review of the test submission in another publicly accessible location it will be considered sufficient to land the tests in web-platform-tests. If necessary we will blacklist any contributors whose tests repeatedly fail to meet the quality bar, and require a second round of up-front review for their submissions.

For Mozilla, my latest (but not final) thinking is that we will have a directory in our local vcs where contributors add new tests. These will then be upstreamed, at first manually, and then, once we understand the process well enough, with a script, carrying the review forward from Bugzilla. Compared to making everyone interact with w-p-t directly, this is significantly simpler for most of the involved parties.

[1] http://i.imgur.com/VJ7YpvQ.png

Dirk Pranke

Jan 15, 2014, 3:56:30 PM
to James Graham, blink-dev, Philip Jägenstedt
Yeah, that's roughly my thinking for Blink as well.

-- Dirk
