content shell and layout tests

Jochen Eisinger

Apr 22, 2013, 5:57:20 PM

to blink-dev

Hi,

we're getting closer to switching layout test execution from DRT to content shell on Linux.

Here's how I plan to proceed:

1. all_webkit will start to build content_shell instead of DRT on Linux

2. run-webkit-tests will execute content shell instead of DRT on Linux

3. I'll add the ContentShellTestExpectations to the regular TestExpectations file

This means that all bots that run webkit tests on Linux (incl. trybots) will also start to use content shell instead of DRT.

Steps 1-3 can happen in one CL, so the switch should be atomic. Ideally, you shouldn't notice much of a difference, and if something goes wrong, we can easily revert.

If you have questions or concerns, please let me know!

best

-jochen

James Robinson

Apr 22, 2013, 5:59:33 PM

to Jochen Eisinger, blink-dev

On Mon, Apr 22, 2013 at 2:57 PM, Jochen Eisinger <joc...@chromium.org> wrote:

Hi,

we're getting closer to switching layout test execution from DRT to content shell on Linux.

Here's how I plan to proceed:

1. all_webkit will start to build content_shell instead of DRT on Linux
2. run-webkit-tests will execute content shell instead of DRT on Linux

What's blocking turning this on for other platforms? I think having the layout tests behave in a different way on Linux vs other platforms is going to be really confusing. It'll introduce another axis of divergence: since both the platform and the harness will be different on Linux, it'll be hard to tell which one is responsible when something breaks.

- James

Jochen Eisinger

Apr 22, 2013, 6:02:43 PM

to James Robinson, blink-dev

On Mon, Apr 22, 2013 at 11:59 PM, James Robinson <jam...@chromium.org> wrote:

What's blocking turning this on for other platforms? I think having the layout tests behave in a different way on Linux vs other platforms is going to be really confusing. It'll introduce another axis of divergence: since both the platform and the harness will be different on Linux, it'll be hard to tell which one is responsible when something breaks.

On Windows, about 200 tests that use drag & drop time out; on Mac, we still see too many crashes (about 40).

It's hard to tell how long it'll take to fix those failures. We could just skip the tests and fix them later, but it's a large number of tests.

best

-jochen

Kenneth Rohde Christiansen

Apr 22, 2013, 6:04:04 PM

to Jochen Eisinger, blink-dev

This is really good news!

Kenneth

--
Kenneth Rohde Christiansen
Senior Engineer, WebKit, Qt, EFL
Phone +45 4294 9458 / E-mail kenneth at webkit.org

Adam Barth

Apr 22, 2013, 6:05:24 PM

to Jochen Eisinger, James Robinson, blink-dev

On Mon, Apr 22, 2013 at 3:02 PM, Jochen Eisinger <joc...@chromium.org> wrote:

On Windows, about 200 tests that use drag & drop time out; on Mac, we still see too many crashes (about 40).

It's hard to tell how long it'll take to fix those failures. We could just skip the tests and fix them later, but it's a large number of tests.

That doesn't seem like too many tests. I'd be in favor of marking them as failing and working through them after switching the bots over.

(I think the plan of switching Linux first is a good one. We should just follow it up quickly with the other platforms.)

Adam
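Marking the failing tests in bulk could look roughly like the following Python sketch. It assumes the bracketed "<bug> [ <platform> ] <test> [ <result> ]" line format used in Blink's TestExpectations files; the helper name, the placeholder bug link and the test paths are made up for illustration and do not come from this thread.

```python
# Hypothetical helper, not part of webkitpy: formats TestExpectations-style
# lines for tests that fail only under content_shell. The line format is an
# assumption; check the TestExpectations documentation before relying on it.
from typing import Iterable, List


def expectation_lines(bug: str, platform: str, tests: Iterable[str],
                      result: str) -> List[str]:
    """Return one '<bug> [ <platform> ] <test> [ <result> ]' line per test, sorted by path."""
    return [f"{bug} [ {platform} ] {test} [ {result} ]" for test in sorted(tests)]


if __name__ == "__main__":
    # Placeholder bug link and made-up test paths, for illustration only.
    timeouts = ["fast/events/drag-b.html", "fast/events/drag-a.html"]
    for line in expectation_lines("crbug.com/NNNNN", "Win", timeouts, "Timeout"):
        print(line)
```

Sorting keeps the generated block stable between runs, so it diffs cleanly when entries are later removed as tests get fixed.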

Dirk Pranke

Apr 22, 2013, 6:13:06 PM

to Adam Barth, Jochen Eisinger, James Robinson, blink-dev

On Mon, Apr 22, 2013 at 3:05 PM, Adam Barth <aba...@chromium.org> wrote:

That doesn't seem like too many tests. I'd be in favor of marking them as failing and working through them after switching the bots over.

(I think the plan of switching Linux first is a good one. We should just follow it up quickly with the other platforms.)

I would be inclined to switch all of the bots over as fast as possible, but I agree that perhaps not all in a single change, just to minimize the number of breakages that might happen at once. Ideally we'd turn things on on all platforms over the course of an afternoon or a day.

-- Dirk

Stephen Chenney

Apr 22, 2013, 6:32:45 PM

to Dirk Pranke, Adam Barth, Jochen Eisinger, James Robinson, blink-dev

Is there anything special required to debug tests inside content_shell? What is the ninja target?

If the answer to either of these is non-obvious, could we update http://dev.chromium.org/developers/testing/webkit-layout-tests?

Actually, that wiki page will need to be updated regardless.

Stephen.

--

Stephen Chenney | Software Engineer | sche...@google.com | 404-314-1809

Slavomir Kaslev

Apr 22, 2013, 7:03:24 PM

to Stephen Chenney, Dirk Pranke, Adam Barth, Jochen Eisinger, James Robinson, blink-dev

The target is content_shell: ninja -C out/Debug content_shell

--

Slavomir Kaslev | Software Engineer | ska...@google.com | 562 217 8497

Jochen Eisinger

Apr 23, 2013, 1:06:06 AM

to Slavomir Kaslev, Stephen Chenney, Dirk Pranke, Adam Barth, James Robinson, blink-dev

Yes, currently the target is content_shell. In the future, all_webkit will depend on content_shell.

To debug content_shell, you should follow the instructions on dev.chromium.org/developers for getting a renderer into your debugger on your platform, e.g. on Linux, from https://code.google.com/p/chromium/wiki/LinuxDebugging:

out/Debug/content_shell --dump-render-tree --no-timeout --no-sandbox --renderer-cmd-prefix='xterm -title renderer -e gdb --args'

best

-jochen
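If you launch this from a wrapper script, the same invocation can be built as an argument list. This Python sketch reuses only the flags quoted in Jochen's command; the helper name and the out/Debug default are assumptions. Passing a list avoids shell quoting, so the whole renderer prefix stays a single argument.

```python
# Sketch: build the content_shell debugging command as an argv list.
# Only flags quoted in this thread are used; the helper name is hypothetical.
import shlex
from typing import List


def content_shell_debug_argv(build_dir: str = "out/Debug",
                             prefix: str = "xterm -title renderer -e gdb --args") -> List[str]:
    return [
        f"{build_dir}/content_shell",
        "--dump-render-tree",
        "--no-timeout",
        "--no-sandbox",
        # A single argv entry: no shell quoting needed when passed to subprocess directly.
        f"--renderer-cmd-prefix={prefix}",
    ]


if __name__ == "__main__":
    argv = content_shell_debug_argv()
    # Shell-quoted form for copy/paste; pass argv to subprocess.run() to launch it instead.
    print(shlex.join(argv))
```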

Daniel Cheng

May 28, 2013, 5:44:32 PM

to Jochen Eisinger, Slavomir Kaslev, Stephen Chenney, Dirk Pranke, Adam Barth, James Robinson, blink-dev

Is there a --renderer-cmd-prefix that will work nicely if I'm working over SSH? In this case, I think I can fudge it with --single-process, but it'd be nice not to have to rely on that. IIRC, there was some flag to have the renderer pause at startup as well, but I can't find it atm (and I think that might require me to click a dialog button as well).

Also, the old DRT used to populate crash-log.txt with a stack trace. content_shell doesn't appear to do that--is there any way to get the old behavior back, since it could be useful at times.

Daniel

Jochen Eisinger

May 29, 2013, 5:20:00 AM

to Daniel Cheng, Slavomir Kaslev, Stephen Chenney, Dirk Pranke, Adam Barth, James Robinson, blink-dev

On Tue, May 28, 2013 at 11:44 PM, Daniel Cheng <dch...@chromium.org> wrote:

Is there a --renderer-cmd-prefix that will work nicely if I'm working over SSH? In this case, I think I can fudge it with --single-process, but it'd be nice not to have to rely on that. IIRC, there was some flag to have the renderer pause at startup as well, but I can't find it atm (and I think that might require me to click a dialog button as well).

You can use --wait-for-debugger. That'll print out the PID and give you one minute to attach to the process.

Also, the old DRT used to populate crash-log.txt with a stack trace. content_shell doesn't appear to do that--is there any way to get the old behavior back, since it could be useful at times.

On Linux, you need to disable the sandbox to get symbolized stack traces, e.g. run-webkit-tests --additional-drt-flag=--no-sandbox

On Mac, the sandbox allows for symbolizing stack traces, so this should just work.

There are a few places where you don't get a backtrace; e.g., when the browser decides to kill a renderer, we report it as a renderer crash, but there's no stack trace.

hth

-jochen

Adam Barth

May 29, 2013, 1:32:01 PM

to Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, Dirk Pranke, James Robinson, blink-dev

On Wed, May 29, 2013 at 2:20 AM, Jochen Eisinger <joc...@chromium.org> wrote:

On Linux, you need to disable the sandbox to get symbolized stack traces, e.g. run-webkit-tests --additional-drt-flag=--no-sandbox

Should we add a --no-sandbox flag to run-webkit-tests to make this easier? Also, can we print out instructions when we generate an unsymbolized stack trace? I didn't know this trick, and it would have saved me a bunch of time.

Adam

Dirk Pranke

May 29, 2013, 1:51:59 PM

to Adam Barth, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, James Robinson, blink-dev

On Wed, May 29, 2013 at 10:32 AM, Adam Barth <aba...@chromium.org> wrote:

Should we add a --no-sandbox flag to run-webkit-tests to make this easier? Also, can we print out instructions when we generate an unsymbolized stack trace? I didn't know this trick, and it would have saved me a bunch of time.

You can always pass arguments through run-webkit-tests using --additional-drt-flag, but it might also make sense to add a dedicated flag for this.

I wonder if it might also make sense to either always pass --no-sandbox or retry crashed tests with this flag...

-- Dirk
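The retry option could look roughly like this Python sketch. It assumes a hypothetical run(tests, extra_flags) callback that wraps run-webkit-tests (for example by turning each extra flag into --additional-drt-flag=...) and returns a result per test; it is not a webkitpy API. The sandboxed result stays the verdict, so the unsandboxed retry only adds diagnostics (per this thread, symbolized traces for Linux renderer crashes) and cannot mask a crash that happens only inside the sandbox.

```python
# Sketch: rerun only the crashed tests without the sandbox to get symbolized
# stack traces. `run` is a hypothetical callback, not a webkitpy API.
from typing import Callable, Dict, Sequence, Tuple

# run(tests, extra_flags) -> {test_name: result}, where result is e.g. "PASS" or "CRASH".
Runner = Callable[[Sequence[str], Sequence[str]], Dict[str, str]]


def run_with_unsandboxed_retry(tests: Sequence[str],
                               run: Runner) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (sandboxed results, unsandboxed retry results for the crashed tests)."""
    results = run(tests, [])  # normal, sandboxed run: this is the verdict
    crashed = [t for t in tests if results.get(t) == "CRASH"]
    if not crashed:
        return results, {}
    # Retry only the crashes; the retry exists for diagnostics, not for pass/fail.
    retry = run(crashed, ["--no-sandbox"])
    return results, retry
```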

Adam Barth

May 29, 2013, 1:59:45 PM

to Dirk Pranke, Adam Barth, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, James Robinson, blink-dev

Getting stack traces from the bots is very helpful when trying to understand what is causing the crash. You're right that we'll want to do something that works automatically.

Adam

Eric Seidel

May 29, 2013, 3:17:39 PM

to Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, James Robinson, blink-dev

I think the "sandbox" configuration is sufficiently tested by other
layers, and we should just run with --no-sandbox (at least for
platforms where sandboxing gives us stacktrace trouble). If some day
we'd like to run the layout tests with the sandbox, that can be a
separate migration effort. :)

James Robinson

May 29, 2013, 3:22:21 PM

to Eric Seidel, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

I don't agree. Before we had content_shell, we had bugs where code was added that violated the sandbox's assumptions and was only caught when run as part of a full Chrome build. These sorts of bugs were very nasty since the layout tests ran without issue. The problem only showed up on the WebKit roll and cost a lot of time to figure out. A big advantage of using content_shell is running with the same configuration, including sandbox restrictions, that we use in the actual product. We should take advantage of that as much as possible.

- James

Eric Seidel

May 29, 2013, 3:26:02 PM

to James Robinson, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

I'm more stating that these issues can be made separable. We can have
our content_shell, make sure it's not regressing developer
productivity (flakiness, stacktraces, speed, etc.), and then turn on
all its bells and whistles over time.

Dirk Pranke

May 29, 2013, 3:26:52 PM

to Eric Seidel, James Robinson, Adam Barth, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

Well, the regressing speed ship sailed already :).

-- Dirk

Jochen Eisinger

May 30, 2013, 3:39:06 AM

to Dirk Pranke, Eric Seidel, James Robinson, Adam Barth, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

Maybe we can punch a few holes into the sandbox for development that would allow the renderer to symbolize its own stack traces?

I think Windows correctly symbolizes its stack traces, but they're printed to the renderer's stderr, which is invisible :(

-jochen

John Abd-El-Malek

May 30, 2013, 10:55:31 AM

to James Robinson, Eric Seidel, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

+1

The benefits of running tests in an environment as close as possible to the one our users run are hard to overstate. We should figure out ways of getting the stack traces without disabling the sandbox.

Justin Schuh

May 30, 2013, 11:57:44 AM

to John Abd-El-Malek, Julien Tinnes, jor...@chromium.org, James Robinson, Eric Seidel, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

+[jln, jorgelo]

It may be feasible to do so and put it behind a command line switch.

Christian Biesinger

May 30, 2013, 12:16:12 PM

to Justin Schuh, John Abd-El-Malek, Julien Tinnes, jor...@chromium.org, James Robinson, Eric Seidel, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev

Can't the testrunner pipe the stacks through addr2line or something?
Or will that fail due to unknown base addresses?

-christian

Julien Tinnes

May 30, 2013, 12:59:55 PM

to Christian Biesinger, Justin Schuh, John Abd-El-Malek, Jorge Lucangeli Obes, James Robinson, Eric Seidel, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev, Alexander Potapenko, k...@chromium.org

I would rather not disable the sandbox for these tests. Or if we
really need to, there should be one bot with and one bot without
(which would become annoying for flakiness). But a ~24 hour delay
before they get tested inside the sandbox would be bad.

Adding Alexander and Kostya, because I think ASAN is working on a similar issue.

- We could add a step similar to asan_symbolize.py (addr2line, etc.)
- More work: figure out how to make the symbols accessible
- Lots more work: use Breakpad in Chromium to get the stack trace.
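The first option might look roughly like this Python sketch, assuming the crash output already provides (module, offset) pairs. It shells out to binutils' addr2line (-e selects the binary, -f prints function names, -C demangles C++ names) and pairs up its two-lines-per-address output. The helper names are hypothetical; this is not asan_symbolize.py itself.

```python
# Sketch of an offline symbolization step in the spirit of asan_symbolize.py
# (not its actual code). Helper names are hypothetical.
import subprocess
from typing import List, Sequence, Tuple


def addr2line_argv(module: str, offsets: Sequence[int]) -> List[str]:
    # -f: print function names, -C: demangle C++ names, -e: binary to read symbols from.
    return ["addr2line", "-f", "-C", "-e", module] + [hex(o) for o in offsets]


def parse_addr2line(output: str) -> List[Tuple[str, str]]:
    """With -f, addr2line prints two lines per address: function, then file:line."""
    lines = output.splitlines()
    return [(lines[i], lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def symbolize(module: str, offsets: Sequence[int]) -> List[Tuple[str, str]]:
    # Meant to run in the test harness, outside the renderer sandbox.
    proc = subprocess.run(addr2line_argv(module, offsets),
                          capture_output=True, text=True, check=True)
    return parse_addr2line(proc.stdout)
```

Unknown addresses come back from addr2line as "??" and "??:0", so callers can fall back to printing the raw module+offset.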

Eric Seidel

May 30, 2013, 1:07:54 PM

to Julien Tinnes, Christian Biesinger, Justin Schuh, John Abd-El-Malek, Jorge Lucangeli Obes, James Robinson, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev, Alexander Potapenko, k...@chromium.org

We have to walk before we can run. :)

Getting ContentShell stable and fast, and keeping it default for a
while is walking. It already offers us huge testing wins w/o all its
bells and whistles.

If the sandbox (or any other new CS feature which was not the test
harness we used for the last 5 years) is blocking development, we
should turn it off. And then work to enable that awesome new testing
feature separately.

I would like sandbox'd testing too, but we somehow shipped a stable
product for the last 5 years w/o it and enabling it here seems like a
separate issue from fixing development regressions from the CS
transition. Being able to quickly debug crashes from the bots (and
even my local machine) is much more important to me to keep until a
stacktrace-enabled sandbox option is ready.

Julien Tinnes

May 30, 2013, 1:17:40 PM

to Eric Seidel, Christian Biesinger, Justin Schuh, John Abd-El-Malek, Jorge Lucangeli Obes, James Robinson, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev, Alexander Potapenko, k...@chromium.org

Also to make things clear: you should be getting a symbolized stack
trace, just not the line numbers at the moment.

Alexis Menard

May 30, 2013, 7:31:03 PM

to Eric Seidel, blink-dev, Jorge Lucangeli Obes, Dirk Pranke, Julien Tinnes, Slavomir Kaslev, Adam Barth, Stephen Chenney, Alexander Potapenko, Daniel Cheng, John Abd-El-Malek, k...@chromium.org, James Robinson, Justin Schuh, Jochen Eisinger, Christian Biesinger

On May 30, 2013 14:07, "Eric Seidel" <ese...@chromium.org> wrote:
>
> We have to walk before we can run. :)
>
> Getting ContentShell stable and fast, and keeping it default for a
> while is walking. It already offers us huge testing wins w/o all its
> bells and whistles.
>
> If the sandbox (or any other new CS feature which was not the test
> harness we used for the last 5 years) is blocking development, we
> should turn it off. And then work to enable that awesome new testing
> feature separately.
>
> I would like sandbox'd testing too, but we somehow shipped a stable
> product for the last 5 years w/o it and enabling it here seems like a
> separate issue from fixing development regressions from the CS
> transition. Being able to quickly debug crashes from the bots (and
> even my local machine) is much more important to me to keep until a
> stacktrace-enabled sandbox option is ready.

Huge +1.

Let's bring CS to a stable and fast level before we think about adding great new features. Having easy-to-get backtraces while making CS a good replacement for DRT is very useful.

Alexander Potapenko

May 31, 2013, 9:22:19 AM

to Julien Tinnes, Christian Biesinger, Justin Schuh, John Abd-El-Malek, Jorge Lucangeli Obes, James Robinson, Eric Seidel, Adam Barth, Dirk Pranke, Jochen Eisinger, Daniel Cheng, Slavomir Kaslev, Stephen Chenney, blink-dev, Kostya Serebryany

If I'm understanding the task correctly and it's about printing the
symbols for the crash stacks with the sandboxing turned on, this
sounds doable.
It's fairly easy to write a script that converts the <module name,
offset> pairs into symbol names, line numbers, etc. (asan_symbolize.py
can be adapted to handle custom stack trace regexps).
The only problem is obtaining the list of module names once the
sandbox has been turned on and the process no longer has access to
/proc/self/maps.
In ASan we provide an API function that lets the user process
notify our tool that it's about to turn on the sandbox. In this case
ASan reads and caches the contents of /proc/self/exe and
/proc/self/maps, which allows it to get the module names after the
sandbox is on.

--
Alexander Potapenko
Software Engineer
Google Moscow
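The caching idea can be sketched in Python: snapshot /proc/self/maps before the sandbox engages, then turn raw crash addresses into (module, offset) pairs for an offline symbolizer. This assumes position-independent modules (offset = address minus the lowest address mapped from that file); a non-PIE executable would need the absolute address instead. The function names are hypothetical and this is not the ASan API.

```python
# Sketch: map raw crash addresses to (module, offset) pairs using a snapshot
# of /proc/<pid>/maps taken before the sandbox is engaged. Hypothetical helpers.
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int
    end: int
    path: str


def snapshot_maps(path: str = "/proc/self/maps") -> str:
    # Call before the sandbox is engaged; afterwards the file may be unreadable.
    with open(path) as f:
        return f.read()


def parse_maps(text: str) -> List[Mapping]:
    """Parse 'start-end perms offset dev inode pathname' lines, keeping file-backed ones."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 6 or fields[5].startswith("["):
            continue  # anonymous mapping or pseudo-entry like [stack]: nothing to symbolize
        start, end = (int(x, 16) for x in fields[0].split("-"))
        mappings.append(Mapping(start, end, fields[5].strip()))
    return sorted(mappings)


def to_module_offset(addr: int, mappings: List[Mapping]) -> Optional[Tuple[str, int]]:
    """Return (module path, addr - module base), or None if addr is not file-backed."""
    bases: Dict[str, int] = {}
    for m in mappings:  # module base = lowest start address mapped from that file
        bases[m.path] = min(bases.get(m.path, m.start), m.start)
    i = bisect_right([m.start for m in mappings], addr) - 1
    if i < 0 or addr >= mappings[i].end:
        return None
    return mappings[i].path, addr - bases[mappings[i].path]
```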

TAMURA, Kent

Jun 3, 2013, 2:25:17 AM

to Jochen Eisinger, Dirk Pranke, blink-dev

I'd like to add --no-sandbox by default until content shell gets stable.

We observed random crashes on some layout test bots, but they're very hard to investigate because there are no stack traces and the crashes are random.

Jochen Eisinger

Jun 3, 2013, 3:21:56 AM

to TAMURA, Kent, Dirk Pranke, blink-dev

On Mon, Jun 3, 2013 at 8:25 AM, TAMURA, Kent <tk...@chromium.org> wrote:

I'd like to add --no-sandbox by default until content shell gets stable.

We observed random crashes in some layout bots. But it's very hard to investigate them because of no stack traces and the randomness.

Note that --no-sandbox won't get you stack traces; it will just symbolize existing ones, i.e. it'll only make a difference for Linux renderer crashes. However, most random crashes seem to happen on Windows and Mac :-/

-jochen

Alan Cutter

Jun 3, 2013, 3:43:48 AM

to Jochen Eisinger, TAMURA, Kent, Dirk Pranke, blink-dev

A quick sampling of the buildbots tells me the random crashes occur on Linux and Mac about 60% of the time while Windows is closer to 90% of the time.

On Windows these random crashes keep happening around the 3-5 minute mark, while the total testing time is between 13 and 16 minutes. I wonder if the time at which the crash occurs would stay in this range if the order of the tests were randomised. I believe DumpRenderTree had this capability.
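One way to run that experiment, sketched in Python: shuffle the test list with an explicit seed so a crashing order can be replayed exactly, and record how far into the order the crash happened. Position in the order is only a rough proxy for elapsed time, and the helper names are hypothetical, not run-webkit-tests options.

```python
# Sketch: reproducible random test ordering. Helper names are hypothetical.
import random
from typing import List, Sequence


def shuffled_order(tests: Sequence[str], seed: int) -> List[str]:
    """Return a random permutation of tests; the same seed gives the same order."""
    order = list(tests)
    random.Random(seed).shuffle(order)  # private RNG: leaves global random state alone
    return order


def crash_fraction(order: Sequence[str], crashed_test: str) -> float:
    """Fraction of the run completed before crashed_test started (0.0 = first test)."""
    return order.index(crashed_test) / len(order)
```

If the fraction stayed near the observed window (3-5 minutes out of 13-16 is roughly 0.2 to 0.4) across many seeds, that would hint at accumulated state rather than one specific preceding test; if it moved with the order, the preceding tests become the suspects.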

Jochen Eisinger

Jun 3, 2013, 11:35:54 AM

to Alan Cutter, TAMURA, Kent, Dirk Pranke, blink-dev

I debugged several of the crashes today, and a common pattern of the ones I happened to pick was that they came after the fast/filesystem/workers tests. It looks like those are corrupting the AtomicString table.

I've skipped those tests for now, hopefully this decreases the number of random crashes we see.

Another odd thing is that even without the sandbox, I wouldn't get a backtrace for the crashes - the process just died :-/

-jochen
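When a crash seems to depend on which tests ran earlier in the same content_shell process, as with the crashes following the fast/filesystem/workers tests, the culprit can be narrowed down by bisection. A minimal Python sketch, assuming a hypothetical reproduces(prefix) callback that runs the given tests followed by the crashing test in one fresh process, and assuming a single culprit whose damage persists once it has run. Flaky crashes would need several runs per step.

```python
# Sketch: bisect the tests that ran before a crash to find the one that
# leaves bad state behind. `reproduces` is a hypothetical callback.
from typing import Callable, Optional, Sequence


def find_culprit(preceding: Sequence[str],
                 reproduces: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Return the earliest test whose inclusion makes the crash reproduce."""
    if not reproduces(preceding) or reproduces([]):
        return None  # no single culprit among `preceding` under these assumptions
    lo, hi = 0, len(preceding)  # invariant: preceding[:lo] is clean, preceding[:hi] crashes
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if reproduces(preceding[:mid]):
            hi = mid
        else:
            lo = mid
    return preceding[hi - 1]
```

This needs about log2(n) reruns instead of one per preceding test, which matters when each rerun means restarting content_shell.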

Adam Barth

Jun 3, 2013, 1:08:38 PM

to Jochen Eisinger, Alan Cutter, TAMURA, Kent, Dirk Pranke, blink-dev

Sounds like a thread safety issue in that code. I wonder if the code in content is passing WebStrings across threads.... Is there a bug where I can follow up?

Adam

Jochen Eisinger

Jun 3, 2013, 1:16:39 PM

to Adam Barth, blink-dev, Kent TAMURA, Alan Cutter, Dirk Pranke

It's here: http://code.google.com/p/chromium/issues/detail?id=246193
