The effect of timeouts on run-webkit-tests

Aleks Totic

Oct 6, 2017, 3:47:23 PM
to blink-infra
Sometimes I'd like to run the full LayoutNG test suite several times a day:

python third_party/WebKit/Tools/Scripts/run-webkit-tests --target=Optimized --additional-driver-flag=--enable-blink-features=LayoutNG

It takes 20 minutes, which is annoying, especially because it feels like it should be faster. Those first 10K tests complete so fast...

startup: 25s
10K tests: 1:10
20K tests: 2:20
30K tests: 3:50
40K tests: 4:50
50K tests: 7:00
50800 tests: 18:20
50830 tests: 19:43
40 unexpected failures, retries.
Total time:
real 20m11s
user 32m12s

The end of the test run is dominated by a few directories with lots of lengthy tests, many of which time out. So I hacked up the test runner to skip timeout tests, and this brought the total time down to seven and a half minutes.

startup 25
10K: 59s
20K: 1:55
30K: 2:45
40K: 3:25
49700: 5:30
49910: 7:02
75 unexpected failures
real: 7m29s
user: 16m34s
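
The checkpoint figures from the first (unmodified) run can be turned into per-interval throughput to show how much the tail dominates. This is a minimal sketch: the helper names are made up for illustration, and the startup line is treated as a 0-test checkpoint at 0:25, which is an assumption about how the timings were taken.

```python
def to_seconds(stamp):
    """Convert an 'M:SS' elapsed-time stamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


# (tests completed, elapsed time) checkpoints copied from the first run.
# Assumption: startup (25s) is the point at which 0 tests have completed.
FULL_RUN = [
    (0, "0:25"), (10000, "1:10"), (20000, "2:20"), (30000, "3:50"),
    (40000, "4:50"), (50000, "7:00"), (50800, "18:20"), (50830, "19:43"),
]


def interval_rates(checkpoints):
    """Return (tests_in_interval, seconds, tests_per_second) for each interval."""
    rates = []
    for (n0, t0), (n1, t1) in zip(checkpoints, checkpoints[1:]):
        seconds = to_seconds(t1) - to_seconds(t0)
        rates.append((n1 - n0, seconds, (n1 - n0) / seconds))
    return rates


if __name__ == "__main__":
    for count, seconds, rate in interval_rates(FULL_RUN):
        print(f"{count:>6} tests in {seconds:>4}s ({rate:7.2f} tests/s)")
```

With these numbers, the 800 tests between the 50000 and 50800 checkpoints take 680 seconds, which is more than the 395 seconds spent on the first 50000 tests after startup.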

Can we add a --skip-timeout flag to run-webkit-tests? I can contribute the patch that skips the tests, but I am not sure how to pass in the flag.
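
One way such a flag could be wired up is sketched below. This is not the run-webkit-tests code: `parse_args`, `select_tests`, and the `expectations` mapping are hypothetical names, and the sketch assumes that "timeout tests" means tests whose expectations include `Timeout`.

```python
import argparse


def parse_args(argv):
    """Parse a toy command line with a hypothetical --skip-timeout flag."""
    parser = argparse.ArgumentParser(description="toy test runner")
    parser.add_argument(
        "--skip-timeout", action="store_true",
        help="skip tests that are expected to time out")
    parser.add_argument("tests", nargs="*", help="test paths to run")
    return parser.parse_args(argv)


def select_tests(all_tests, expectations, skip_timeout):
    """Return the tests to run.

    `expectations` maps a test name to the set of expected results
    (an assumed data shape, chosen for this sketch only).
    """
    if not skip_timeout:
        return list(all_tests)
    return [t for t in all_tests
            if "Timeout" not in expectations.get(t, set())]


if __name__ == "__main__":
    args = parse_args(["--skip-timeout", "fast/ok.html", "slow/hang.html"])
    expectations = {"slow/hang.html": {"Timeout"}}
    print(select_tests(args.tests, expectations, args.skip_timeout))
```

Filtering before the tests are scheduled means skipped tests never occupy a worker, which is where the time saving in the second run would come from.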

Part 2: running fully parallel

Running fully parallel should be faster. I was hoping for <5min, which would be awesome. It is not, because we get many more unexpected failures:

Skip timeout tests, fully parallel
startup 25
10K: 2:11
20K: 4:18
30K: 6:29
40K: 8:35
49910: 10:55

182 unexpected failures
real: 12m5s
user: 39m2s

Looking at the failures, many of them are caused by content_shell having the wrong window size. My guess is that there is a test that resizes the window, and subsequent tests running in the same shell are stuck at the wrong window size.

Is there a way for run-webkit-tests to reset the window size to 800x600 before each test?

Gimme these 2, and everyone will want to run full test suite locally.

Aleks

Daniel Cheng

Oct 6, 2017, 3:54:24 PM
to Aleks Totic, blink-infra

Gimme these 2, and everyone will want to run full test suite locally.

Aleks


Aleks Totic

Oct 6, 2017, 8:21:17 PM
to Daniel Cheng, blink-infra
I've confirmed that this is the problem.

python third_party/WebKit/Tools/Scripts/run-webkit-tests --target=Optimized svg external/wpt -f

mixes svg tests that run in 640, with regular tests that run in 800, and I get 100s of unexpected failures.

I've looked at the code, and as usual, it is not obvious to me how the web window eventually gets its new size when

   main_window_->web_contents()
        ->GetRenderViewHost()
        ->GetWidget()
        ->WasResized();
gets called. I'll put this on my "nice to have when I am bored" list.
Aleks




Dirk Pranke

Oct 6, 2017, 8:28:50 PM
to Daniel Cheng, Aleks Totic, blink-infra
On Fri, Oct 6, 2017 at 12:54 PM, Daniel Cheng <dch...@chromium.org> wrote:
We used to have flags to skip different kinds of failures in the past. Looking at --help now, I don't see them, so I'm guessing I removed them at some point. It wouldn't be hard to re-add them. There is currently a flag (--skipped) to skip tests marked [ Skip ] that you could look at for inspiration, or you can ask me how to proceed.
 

Part 2: running fully parallel

Running fully parallel should be faster. I was hoping for <5min, which would be awesome. It is not, because we get many more unexpected failures:

Skip timeout tests, fully parallel
startup 25
10K: 2:11
20K: 4:18
30K: 6:29
40K: 8:35
49910: 10:55

182 unexpected failures
real: 12m5s
user: 39m2s

Between the additional failures and the way we spin content_shells up and down as we change the command line arguments for virtual test suites, I would expect --fully-parallel to be slower. You are correct that if we didn't have bugs (and we were somewhat smarter about handling the command line args), it should be faster.
 
Looking at the failures, many of them are caused by content_shell having the wrong window size. My guess is that there is a test that resizes the window, and subsequent tests running in the same shell are stuck at the wrong window size.

Is there a way for run-webkit-tests to reset the window size to 800x600 before each test?


But perhaps there are some bugs here...

It's definitely supposed to, and if that's not working, no question that's going to cause lots of problems.

-- Dirk

Aleks Totic

Oct 9, 2017, 3:30:21 AM
to Dirk Pranke, Daniel Cheng, blink-infra
We used to have flags to skip different kinds of failures in the past. Looking at --help now, I don't see them, so I'm guessing I removed them at some point. It wouldn't be hard to re-add them. There is currently a flag (--skipped) to skip tests marked [ Skip ] that you could look at for inspiration, or you can ask me how to proceed.

I can figure it out. I was going to just add a --skip-timeout flag. Is that a good flag name?
 
Between the additional failures and the way we spin content_shells up and down as we change the command line arguments for virtual test suites, I would expect --fully-parallel to be slower. You are correct that if we didn't have bugs (and we were somewhat smarter about handling the command line args), it should be faster.

Virtual test suites are handled correctly: the fully parallel flag is ignored for virtual suites. I would expect a 2min speedup on the full test suite if tests were not failing.
 
Is there a way for run-webkit-tests to reset the window size to 800x600 before each test?


But perhaps there are some bugs here...

It's definitely supposed to, and if that's not working, no question that's going to cause lots of problems.

It does not cause lots of problems; it is an occasional timing bug (affects 100/3000 on my machine). It really only shows up in force when running SVG-1.1 tests fully parallel, because these tests run in a different size shell from all others.

I've confirmed that this is definitely a problem. I've tried fixing it and failed: I don't know enough about host/renderer interactions. I filed a bug with the results of my investigation at crbug.com/772811

Aleks

Dirk Pranke

Oct 9, 2017, 3:18:03 PM
to Aleks Totic, Daniel Cheng, blink-infra
On Mon, Oct 9, 2017 at 12:29 AM, Aleks Totic <ato...@google.com> wrote:
We used to have flags to skip different kinds of failures in the past. Looking at --help now, I don't see them, so I'm guessing I removed them at some point. It wouldn't be hard to re-add them. There is currently a flag (--skipped) to skip tests marked [ Skip ] that you could look at for inspiration, or you can ask me how to proceed.

I can figure it out. I was going to just add a --skip-timeout flag. Is that a good flag name?

I'd probably use --skip-timeouts, but otherwise that's fine.

-- Dirk

Aleks Totic

Oct 10, 2017, 7:06:48 PM
to Dirk Pranke, Daniel Cheng, blink-infra
The --skip-timeouts patch has been sent for review. With this patch, almost the entire test suite runs in 8m5s on my machine.

- I've also sent a CL to make svg tests run in 800x600.

I was hoping that eliminating failures due to screen size changes would make --fully-parallel run fast.
It did not; it is too flaky.

The problem with SVG and resizes was that content_shell was not getting fully reset
between two tests. Notifications triggered by the old test (resize) would be received by
the new test, which would cause tests to fail.
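
The failure mode described here can be reproduced in a toy model. The sketch below is not content_shell code: `ToyShell` is a made-up class, the 640x480 size is illustrative (only the 640 width appears in this thread), and tagging notifications with a per-test generation counter is one common way to drop stale events, not the fix that was sent for review.

```python
from collections import deque


class ToyShell:
    """Toy stand-in for a long-lived shell that runs tests back to back."""

    def __init__(self, use_generations):
        self.use_generations = use_generations
        self.generation = 0          # incremented once per test
        self.size = (800, 600)
        self.pending = deque()       # queued (generation, size) notifications

    def start_test(self):
        """Begin a new test: bump the generation and reset the window size."""
        self.generation += 1
        self.size = (800, 600)

    def request_resize(self, size):
        """Resizes are asynchronous: queue a notification for later delivery."""
        self.pending.append((self.generation, size))

    def deliver_notifications(self):
        """Apply queued notifications, optionally dropping stale ones."""
        while self.pending:
            generation, size = self.pending.popleft()
            if self.use_generations and generation != self.generation:
                continue             # came from an earlier test, ignore it
            self.size = size


def run_two_tests(use_generations):
    """Return the window size seen by the second test."""
    shell = ToyShell(use_generations)
    shell.start_test()
    shell.request_resize((640, 480))  # first test resizes, then finishes
    shell.start_test()                # second test expects 800x600
    shell.deliver_notifications()     # the late notification arrives now
    return shell.size


if __name__ == "__main__":
    print("without generation check:", run_two_tests(False))
    print("with generation check:   ", run_two_tests(True))
```

Without the generation check, the second test ends up at the first test's window size even though the shell was reset when it started, which matches the symptom of tests running at the wrong size.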

I think this might happen frequently with some other test suites too. 
For example, look at the flaky tests at:


Many of the http/tests/devtools tests consistently fail on the first attempt, and succeed on retry.
It is just something to be aware of. 

Without flaky tests, full run might be down to 7mins, which would be even more 
awesome, but for me not worth the effort to track it down.

Aleks