WebKit Leak bot guidance?

20 views
Skip to first unread message

Alice Boxhall

unread,
May 25, 2017, 9:22:24 PM5/25/17
to blink-dev
I ran into an issue yesterday with a patch getting reverted by a sheriff because it triggered a failure on https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Linux%20Trusty%20Leak

As I tried to dig into the issue, I found it difficult to
1) figure out how to reproduce the issue locally (eventually a colleague helped me figure out which arguments from the bot's gn args and webkit tests command line were significant)
2) debug the actual issue (in the end I concluded that it was a pre-existing issue as I could reproduce it without my changes, but if the issue had indeed been in my code I would have been at a loss).

Is there some guidance or a known contact person/team that I missed for devs who find themselves in this situation?

Daniel Cheng

unread,
May 25, 2017, 9:29:55 PM5/25/17
to Alice Boxhall, blink-dev
+1, I find myself wishing that the the GN config used by the bots, as well as the targets it builds and the test commands were in more discoverable places. It's true that you can find them if you dig through the build steps, but you have to click around for a bit to find it. Usually I just try copying the commands literally, ignoring the obviously bot things like the custom goma dir.

It's especially bad for tests that most developers don't run very often, like the telemetry "unittests".

Daniel

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAMQHGLy2jBq%2Beue2gbXrd2hAnWRDy874UuMwPKLciwCJ3yRe9g%40mail.gmail.com.

Jeremy Roman

unread,
May 25, 2017, 9:56:25 PM5/25/17
to Daniel Cheng, Alice Boxhall, blink-dev
On Thu, May 25, 2017 at 9:29 PM, Daniel Cheng <dch...@chromium.org> wrote:
+1, I find myself wishing that the the GN config used by the bots, as well as the targets it builds and the test commands were in more discoverable places. It's true that you can find them if you dig through the build steps, but you have to click around for a bit to find it. Usually I just try copying the commands literally, ignoring the obviously bot things like the custom goma dir.

This exists!

tools/mb/mb.py gen -m [master] -b [bot] out/foo

It takes extra flags to override the goma dir, etc. I think
 
It's especially bad for tests that most developers don't run very often, like the telemetry "unittests".

Daniel

On Thu, May 25, 2017 at 6:22 PM Alice Boxhall <abox...@chromium.org> wrote:
I ran into an issue yesterday with a patch getting reverted by a sheriff because it triggered a failure on https://build.chromium.org/p/chromium.webkit/builders/WebKit%20Linux%20Trusty%20Leak

As I tried to dig into the issue, I found it difficult to
1) figure out how to reproduce the issue locally (eventually a colleague helped me figure out which arguments from the bot's gn args and webkit tests command line were significant)
2) debug the actual issue (in the end I concluded that it was a pre-existing issue as I could reproduce it without my changes, but if the issue had indeed been in my code I would have been at a loss).

Is there some guidance or a known contact person/team that I missed for devs who find themselves in this situation?

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAMQHGLy2jBq%2Beue2gbXrd2hAnWRDy874UuMwPKLciwCJ3yRe9g%40mail.gmail.com.

--
You received this message because you are subscribed to the Google Groups "blink-dev" group.

Kouhei Ueno

unread,
May 25, 2017, 10:08:58 PM5/25/17
to Daniel Cheng, Alice Boxhall, blink-dev
There are two doc related to this:

1) reproing steps
Please run content_shell -- --run-layout-test --enable-leak-detection test.html, or run_webkit_tests --enable-leak-detection to repro locally.
IIRC, special GN flags should not be needed.

2) debugging steps
Debugging a leak is hard. I have a done a lot myself (I authored initial prototype of the leak detector), but still there isn't a silver bullet. I spent more than a day on a recent leak I introduced.
This is now especially hard with Oilpan Blink GC, which mitigates most of the leaks we had in the RefPtr days. This only leaves out super tricky cases where a V8 object graph is involved or bug inside Blink GC.

When debugging a leak, I personally try to minimize the case as much as possible.
- Minimize the test case layout test: Many leak can be minimized to be around 3 lines of Javascript code.
- Minimize the object graph. Simplify object graph just enough to run the repro case, then try commenting out object graph mutations in C++ code path touched in the minimized repro case.

To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+unsubscribe@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAF3XrKqgUfcsTHwnbv6L%2BBy5nEhN37Magh5jm1O3grcpgTn%3D1w%40mail.gmail.com.



--
kouhei

Dirk Pranke

unread,
May 25, 2017, 10:32:18 PM5/25/17
to Kouhei Ueno, Daniel Cheng, Alice Boxhall, blink-dev
We in Ops-land are definitely open to suggestions for how to make these sorts of things easier to determine.

One idea I've just had based on this is to perhaps add a bullet to each test step that shows just the command line of the step.

Another idea that we've talked about -- but is harder -- is to have the failures in Sheriff-o-Matic link to some doc specific to that class of failure.

A third would be to make sure that all of our test harnesses used a consistent command line interface

out/Release/bin/run_webkit_layout_tests fast/html/the_test_that_failed.html

Anyone have other suggestions?

-- Dirk

Victor Costan

unread,
May 26, 2017, 2:39:10 AM5/26/17
to Dirk Pranke, Kouhei Ueno, Daniel Cheng, Alice Boxhall, blink-dev
Much like Alice, I also had a CL reverted a while ago, for the same reason. I dug around in logs and eventually figured out the magical incantation"content_shell -- --run-layout-test --enable-leak-detection." Unlike Alice, I complained to myself for a bit that there's no guidance on this process, and then I forgot all about it and moved on to the next thing. Thank you for spending the energy to get us talking about this issue!

Here's what I propose, based on what I thought about. I tried to aim for a maximally lightweight process that will eventually get us in a position to do interesting stuff.

1) Create a google group, e.g. reverted-cls@chromium. (a bug label is a worthy contender, see "expected benefits" later on)

2) Whenever a CL gets reverted, for any reason, the person (or bot) reverting posts a message to reverted-cls@ including the CL link, the failing bot (if applicable), and the relevant output.

3) (Optional) while debugging, the author can post messages documenting things that helped and things that didn't. This can include commands and URLs for docs / other bots.

4) When the author figures out the issue and relands the CL, the author posts a message describing the "golden path" of what worked.

Here are the benefits I'm looking to get out of this:

1) When my CL gets reverted, I implicitly learn about the mailing list. I can do searches and learn from others' experience.
Note 1: The wisdom is probably spread out on blink-dev@ or other lists. A designated mailing list would pool all this together, and is lighter-weight than having people maintain docs with best practices.
Note 2: I'm sure asking would get me the information I need, but I'm too much of an introvert to ask blink-dev@ to solve all my problems :)

2) Ops can monitor the list for patterns and use the data to improve the trybots and CQ.

3) Once there's enough data collected on the list, we can start playing with building models for auto-suggestions. We'd set up a couple of templates for the mailing list messages, to help parse the messages.

While benefit 3) is the biggest reason for the mailing list, I claim that it'd create enough value to offset the cost of posting very soon after it is launched.

Thank you for reading. I look forward to your thoughts!
    Victor



Alice Boxhall

unread,
May 28, 2017, 8:54:26 PM5/28/17
to Victor Costan, Dirk Pranke, Kouhei Ueno, Daniel Cheng, blink-dev
I love Victor's suggestion. It definitely seems like we could have a mail created as part of the auto revert process, and possibly hinted at as something for a manual reverter to do in the event that an auto-revert fails.

To Dirk's ideas: a bullet point with the relevant command line would help SO MUCH. Links to relevant docs would also help SO MUCH if feasible. I don't quite follow the "consistent command line interface" part.

Kouhei, thanks for the tips! Are you able to point to any CLs you've made fixing leaks as examples of problems/solutions/regression tests, by any chance?
Reply all
Reply to author
Forward
0 new messages