Time outs on page loads


Simon Stewart

Aug 4, 2011, 7:12:45 AM
to selenium-developers
One of the problems I hear from users is that it's currently
impossible to control how long timeouts take when loading a page. I'm
going to suggest two changes, and I'd like your opinion before
throwing them into the code.

The first change is to strictly define what we mean by "loaded" when
calling "get". There are several different interpretations of this in
our code base right now, and I'd like to pick something that we can
implement consistently. The ideal state would be to wait until the
"onload" event has fired for the frame that we're currently
"switchTo()'d" or the default content if the frame is no more. Since
it's possible that we could miss this event, the alternative would be
to wait until the document.readyState is "complete" or (if we
wanted to appear fast though more likely to cause tests to be flaky)
"interactive".

The second change is to add an API to allow users to control page timeouts:

driver.manage().timeouts().forPageLoad(long duration, TimeUnit in);
driver.manage().timeouts().forPageLoad(long duration, TimeUnit in, boolean fatal);

The boolean in the second signature would be to indicate that page
load timeouts should be logged and shouldn't throw a TimeoutException
as it turns out that quite often people don't care whether webdriver
has detected a page load as they're using other signals (such as
implicit/explicit waits) to determine when a page is loaded.
Naturally, the defaults would be to model the existing behaviour.
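
A minimal sketch of how the two proposed signatures might behave, assuming the timeout simply polls a "page is loaded" check. Everything here (`PageLoadTimeoutSketch`, `awaitLoad`, `PageLoadTimeoutException`, the 10 ms poll interval) is hypothetical and stands in for driver internals; it is not WebDriver's API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical model of the proposed page-load timeout; not Selenium's real API.
class PageLoadTimeoutSketch {
    // Unchecked stand-in for WebDriver's TimeoutException.
    static class PageLoadTimeoutException extends RuntimeException {
        PageLoadTimeoutException(String message) {
            super(message);
        }
    }

    private final long timeoutNanos;
    private final boolean fatal;

    PageLoadTimeoutSketch(long duration, TimeUnit unit, boolean fatal) {
        this.timeoutNanos = unit.toNanos(duration);
        this.fatal = fatal;
    }

    // Polls `loaded` until it reports true or the timeout elapses.
    // Returns true if the page loaded in time, false on a non-fatal timeout.
    boolean awaitLoad(BooleanSupplier loaded) {
        long deadline = System.nanoTime() + timeoutNanos;
        while (true) {
            if (loaded.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                break;
            }
            try {
                Thread.sleep(10); // assumed poll interval
            } catch (InterruptedException e) {
                // Treat an interrupt like a timeout, but keep the flag set.
                Thread.currentThread().interrupt();
                break;
            }
        }
        if (fatal) {
            throw new PageLoadTimeoutException("page load timed out");
        }
        // Non-fatal: log and let the caller fall back to its own waits.
        System.err.println("page load timed out; continuing");
        return false;
    }
}
```

With `fatal` set to false the caller gets control back after the timeout and can rely on implicit or explicit waits instead, which is the use case described above.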

Your thoughts?

Simon

Eran Mes

Aug 4, 2011, 8:43:05 AM
to selenium-...@googlegroups.com
Both suggestions sound good - I think that an API for setting the page load timeout would be more handy, as it will allow developers who currently experience issues with page loads to immediately address (or at least limit) the problem.

Strictly defining the meaning of "loaded" will be very useful as a future reference - the current javadoc for get() only says that it blocks until the "load is complete". Occasionally we get users asking what that means or (mistakenly) assuming *everything* on the page is stable before get returns. There's additional gain in clarity if we use document.readyState to explain page loads.

Eran




Patrick Lightbody

Aug 4, 2011, 9:01:42 AM
to selenium-...@googlegroups.com
So now *I'm* confused: do we wait until onload or do we wait until something else (like onDOMReady)? :) 

--
Patrick Lightbody
Schedule a meeting with me at https://tungle.me/lightbody





Simon Stewart

Aug 4, 2011, 9:26:48 AM
to selenium-...@googlegroups.com
At the moment "it depends". The firefox driver hooks into the
internals of firefox and waits until the XPCOM component responsible
for tracking active downloads reports that nothing is going on. Chrome
uses the same mechanism as the throbber to figure out when a load is
complete. IE certainly used to iterate through every available
document and check the readyState, but now uses listeners to be more
robust. I have no idea how HtmlUnit does it, but it seems to work. The
mobile browsers are a bun-fight as well. I'd be interested to know how
Opera decides that it's complete too.

As you can see, there's no standard, and as Eran points out, the
wording on "get" is particularly vague (deliberately, I hasten to add,
cos I had no idea how to measure these things when I started
webdriver)

Simon

Patrick Lightbody

Aug 4, 2011, 10:11:06 AM
to selenium-...@googlegroups.com
Interesting… by the way, this conversation smells of the same type as "can I inspect HTTP traffic"… it's inching towards performance :)


Luke Inman-Semerau

Aug 4, 2011, 10:57:38 AM
to selenium-...@googlegroups.com
+1 for driver.manage().timeouts().forPageLoad

I like having control ;)

-0.5 for this since usually this is just masking another issue 
+0.5 there are edge cases where there seems to be an issue with the readystate, like having an iframe open for a long poll

-Luke

Simon Stewart

Aug 4, 2011, 11:06:40 AM
to selenium-...@googlegroups.com
Which would be an argument for waiting for "interactive" rather than "complete".

Simon

Stuart Knightley

Aug 4, 2011, 11:45:27 AM
to selenium-...@googlegroups.com
OperaDriver currently says the page is ready after all external
resources are loaded[1]. It has a timeout of 30 seconds. For some other
methods we check that readyState is "complete" (.back(), .click() and
others) (I'm not actually sure why we use the two different methods,
I'll have to check)

We also implemented something called OperaIdle. For Opera's internal
testing we rely a lot on comparing screenshots. This means we want the
page to have stopped "doing things" before we take the screenshot. We
defined these things as:

* plugin activity (Flash etc.)
* running Ecmascript (JS)
* pending reflows and paints
* any animated images (i.e. gifs)
* pending meta-refreshes
* svg animations

Once all of these have finished, the page can be screenshotted.
However, as you've probably already guessed, a page is often ready
*before* all of these conditions are satisfied: for us, e.g. when
testing SVG animations, but more importantly on the web, where pretty
much every major site is constantly running some JS to animate banners
or do XHR. While it originally seemed a great way to solve the "page
is ready" problem, it actually takes you further away than the simple
solution.

In response to the managing timeouts proposal, I am in favour. I
remember a couple of times the New York Times site has exceeded our 30s
timeout (it seems to be better now), and I can imagine some other sites
do too. I'm also in favour of being able to specify whether hitting the
timeout is fatal, like Simon says (heh), people often only care if an
element exists, rather than the loading state of the page.

[1]
https://github.com/operasoftware/operadriver/blob/master/protos/window_manager-2.0.proto#L109

Luke Inman-Semerau

Aug 4, 2011, 11:45:51 AM
to selenium-...@googlegroups.com
Yep... but the control freak in me likes the other one better :) But yeah, I can only think of convincing arguments for waiting till interactive. (Like... if someone sets the timeout really small, possibly no response has come back at all, and then possibly wonky things will happen. And waiting for interactive reinforces the idea of using the implicit/explicit waits, which most are now accustomed to.)

Ken Kania

Aug 4, 2011, 2:17:54 PM
to Simon Stewart, selenium-developers
Sounds good, just a couple questions:

1) Should we always wait for HTTP redirects to finish?  The user may set the timeout to 0 not caring about the DOM being loaded, but he may care that he's on the page he intended.

2) The timeout is applied to indirect navigations (like clicking an anchor) too, right?  I suppose it's fine to start timing as soon as we notice the browser is navigating (we don't have to wait for the old page to be unloaded or anything).

3) Why do users want to set a specific timeout for page load? Is it because some sites/ads are slow? Or because they just don't care about the page contents? Would it be better to have a separate solution for each problem?  A generic page load timeout will lead to flakiness when normal servers take a long time to return.  Instead of a timeout, would the ability to specify what to wait for be better? Wait for all frames to load, main frame to load, document load start, no wait?

Ken

Simon Stewart

Aug 4, 2011, 4:50:00 PM
to Ken Kania, selenium-developers
Hi Ken,

In answer to your questions:

1) The firefox driver (and maybe IE too) will wait for "low threshold"
meta redirects to fire if they notice them. A "low threshold" is
generally considered to be 0 or 1 second. The reason for this is that
meta redirects seem to fall into two camps: the "I'd love to do a
server-side redirect, but can't" and the "interstitial page that
reloads periodically as a long running operation completes". The former
tend to have very low meta redirect timeouts, the latter longer.

2) It seems reasonable to apply this time out to any page load, no
matter whether it's triggered by a "get" or a "click". There is the
nasty case of detecting whether a click has actually caused a page
reload when using native events, but that's a different problem to
solve.

3) Generally the problem of page loads falls into two camps: those
where a third party dependency of a page (ads, trackers) responds way
too slowly, causing a page that is otherwise ready to appear inactive
for too long, and those where the app under test is either meant to
have completed responses after a very short or long duration
(typically, people seem keen on extending timeouts). A timeout of some
sort seems to be the major user request.
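
The "low threshold" rule from point 1 could be sketched roughly as follows. This is only an illustration: `MetaRefreshSketch`, its method names and the exact 1-second cut-off are assumptions, and the real drivers' parsing of the meta refresh `content` attribute may well differ.

```java
// Hypothetical helper; not taken from the firefox or IE driver sources.
class MetaRefreshSketch {
    // Assumed cut-off: the thread only says 0 or 1 second is "generally" low.
    static final double LOW_THRESHOLD_SECONDS = 1.0;

    // Returns the delay in seconds from a meta refresh "content" value such as
    // "0; url=/next", or -1 if the value carries no parsable, non-negative delay.
    static double delaySeconds(String content) {
        if (content == null) {
            return -1;
        }
        String first = content.split("[;,]", 2)[0].trim();
        try {
            double delay = Double.parseDouble(first);
            return delay >= 0 ? delay : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // True when the redirect is quick enough that "get" should wait for it.
    static boolean shouldWaitForRedirect(String content) {
        double delay = delaySeconds(content);
        return delay >= 0 && delay <= LOW_THRESHOLD_SECONDS;
    }
}
```

Under these assumptions a `content` of "0; url=/next" is followed before "get" returns, while a 30-second interstitial refresh is left alone.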

Control over the mechanism used for determining a page load is
interesting, but I'm unsure of the best way of implementing it
cross-browser (my concern is for those browsers where we have to fall
back to Selenium Core's way of telling that a page is loaded, meaning
that we can't just hook in an event handler for "onload")

Users already have a wealth of ways to determine whether a page is
ready for interactions via the explicit and implicit waits that are
already present, and a simple "wait until the document is interactive
or ready" approach would allow other more sophisticated checks (eg:
"wait until all frames are loaded" becomes a simple case of calling
"switchTo().frame" sequentially) so my initial preference would be to
keep things as simple as can be while still keeping to the principle
of least surprise in the common case.

Simon

Simon Stewart

Aug 4, 2011, 4:52:54 PM
to selenium-...@googlegroups.com
Figuring out when a page is loaded used to be so simple (has "onload"
fired?) but is increasingly hard to tell in a meaningful way. I've
been coming round more and more to thinking that the AUT needs to be
involved in the decision about whether a page is actually loaded. For
"throw testing over the wall" coding shops this idea is an anathema,
but I really can't think of a reliable way of doing so otherwise.

Simon

Mel Llaguno

Aug 4, 2011, 7:18:47 PM
to selenium-...@googlegroups.com
It is not unreasonable to have the AUT be instrumented for testing, and I believe you are right in assuming that relying on the various interpretations of "done and ready for automation" between browsers leaves a lot to be desired. I would suggest that WebDriver provide a client side mechanism that can be queried for its state which the "wait until interactive" commands can use to block on. I did something similar a few years ago and it worked for the most part. 

What is really required is a Model-View-Presenter pattern on the client side where all ajax-aware controls registered themselves with the Presenter and the Presenter was responsible for determining whether the application was busy or not. Waits were wired to Selenium 1.0 via a delegate which would be called to evaluate Javascript regarding the busy state.
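
As a rough sketch of the busy-tracking idea: the version described above lived in client-side JavaScript, so this Java rendering only models the logic, and `BusyPresenterSketch` and its method names are made up for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical presenter-side registry of controls with work in flight.
class BusyPresenterSketch {
    private final Set<String> busyControls = ConcurrentHashMap.newKeySet();

    // An ajax-aware control calls this when it starts a request.
    void started(String controlId) {
        busyControls.add(controlId);
    }

    // ...and this when the request has completed and the view is updated.
    void finished(String controlId) {
        busyControls.remove(controlId);
    }

    // The single question a wait would poll: is anything still in flight?
    boolean isBusy() {
        return !busyControls.isEmpty();
    }
}
```

A wait would then block until `isBusy()` reports false, rather than guessing from browser-level load events.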

Not sure if this is helpful, but I thought I'd share my experience.

M.

Ross Patterson

Aug 5, 2011, 8:30:26 AM
to selenium-...@googlegroups.com
I thought the Project was officially now a browser automation framework, not a testing tool? If so, I don't think decisions should be made with the assumption that the web site(s) being automated are under the control of the people writing the automation scripts. In other words, don't think about how an Application Under Test can be involved, because the Application Under *Automation* might be GMail, http://www.uspto.gov, or http://bronies.wikia.com/wiki/Wiki_Home (shudder).

Ross

Patrick Lightbody

Aug 5, 2011, 10:45:52 AM
to selenium-...@googlegroups.com
I think that's fine if we go this route, but if that's the case then get() and other "wait for page to load events" need to return sooner rather than later so that we can have additional code that waits for specific elements to appear or JS to evaluate a certain way. Waiting for onload may be too late because the specific element appeared a long time ago (in relative terms).


Ian

Aug 5, 2011, 11:07:17 AM
to selenium-...@googlegroups.com
My vote, as vague as it may be, is for the least flaky option :P

I might be misunderstanding some of the browser interactions, but it seems like an implicit wait in this case might be too broad to be reliable.  Yes, there is a robust implicit wait API for making sure elements are loaded, but it seems like there could be non-obvious side effects that would cause testers to code too defensively, as I've personally seen before.

An explicit timeout is flaky as well, but it strikes me as being deterministically flaky :)

Typically the problem that I see in my automation has been related to iframes in Internet Explorer, which are loading third party content and ads.  I'd imagine this would be too complicated, but would it be possible to adopt a model that allows configurable blocking on likely culprits for the page not loading? 

If I had to tack on another dimension to what I think is important in the solution, it would be configurability for whatever solution is picked.

Simon Stewart

Aug 5, 2011, 1:47:24 PM
to selenium-...@googlegroups.com
I don't think we can completely ignore the fact that the major use
case for selenium is testing web apps, but the gmail example is a
fantastic one for demonstrating the problems we face. For a start,
most of the content is loaded via XHR after the "onload" event fires.
With the chat integration up and running, I believe that there's a
long hanging get, which means that we can't wait for all the HTTP
requests to finish.

The advice I'd give the gmail team is to use an explicit wait and some
sort of JS variable to indicate that the site is ready for input.

Simon

Santiago Suarez Ordoñez

Aug 5, 2011, 4:12:46 PM
to selenium-...@googlegroups.com
I just like to think that Selenium should behave as real users do.
A real person will wait until the basic DOM structure is rendered. They may not wait for all images or ads to be there before they start clicking around and reading the content; they just need the basic links and text to be available.
I think that would basically represent "wait for the DOM hierarchy to be constructed and rendered". I believe this is what jQuery uses as well.

My two cents.

Santi

Patrick Lightbody

Aug 5, 2011, 9:35:59 PM
to selenium-...@googlegroups.com
Yup, this is why I worry that if some browser impls wait until onload it might not allow for full testing. Since onload waits for the images to load, it could be quite a long time and not allow the tester to test situations where users click items that are visible on the page.







Simon Stewart

Aug 7, 2011, 4:00:17 PM
to selenium-...@googlegroups.com
So that's a vote for waiting for the readyState to be "interactive".

Simon

Simon Stewart

Aug 7, 2011, 4:10:10 PM
to selenium-...@googlegroups.com
These will be incredibly flaky tests unless the user takes care to
control the loading of the extra resources, but that's a vote for
"interactive" on the readyState.

The question I'm asking myself is whether this meets the principle of
least surprise. Lots of people expect a page to be fully loaded (and
JS event handlers hooked in via the "onload" event) once a call to
"get" or Selenium RC's "open" completes. A user who wanted to be able
to interact with a page while loading might be able to set the page
load timeout to 0 and mark the failure of a page to load as non-fatal.
Having said that, I've seen enough tests being very flaky because of
pages taking too long to load, so "onload" may be a little too
generous a time out.

Simon

Daniel Wagner-Hall

Aug 7, 2011, 8:06:23 PM
to selenium-...@googlegroups.com
It seems to me that we're looking for a convenient way to make
explicit waits implicit for driver.get calls. I propose we promote
ExpectedCondition from support to the core API, and add two new API
methods:

1: driver.manage().pageLoadedCondition(ExpectedCondition);
2: driver.get(String url, ExpectedCondition);

When 1 is called, any subsequent calls to driver.get(String) (without
specifying an explicit expectedCondition) wait on that
ExpectedCondition, as explicit waits currently do.

We provide a number of pre-defined loaded conditions, which we
support; examples that come to mind:
* return immediately
* readyState=interactive
* readyState=complete
* onload=fired
* allFrames=loaded
* millisecondsHavePassed(ms)

If no condition is passed or has been set using manage, we default to
onload has fired.

It's fairly trivial to create the above ExpectedConditions (except for
allFrames=loaded, but we know how tricky that is anyway).

We start running the ExpectedCondition checks as soon as we can after
the get has started executing.

This way, we make it easy for the same wait conditions to be shared
between implicit and explicit waits, and give access to the DOM for
the user to make up their own mind.

Fundamentally, the overload of get is a convenience over having to put
an explicit wait after every get; I could probably justify adding it
to submit, navigate().back() and navigate().forward(). I would have
more difficulty justifying adding it to click. I definitely couldn't
justify adding it to any other APIs (tbh I feel a bit uncomfortable
with that many overloads as it is, but we are talking about a
convenience here).

We should probably transparently swallow "DOM not yet ready to be
queried" exceptions, but that's it.
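
To make the shape of this proposal concrete, here is a minimal sketch that models only the decision logic, assuming a plain `Predicate` in place of `ExpectedCondition`. `LoadConditionSketch`, `PageState` and `isLoaded` are hypothetical names; nothing here navigates a browser or touches the real WebDriver API.

```java
import java.util.function.Predicate;

// Hypothetical model of "a default loaded condition plus a per-call override".
class LoadConditionSketch {
    // Minimal stand-in for what a driver can observe about the current page.
    static class PageState {
        final String readyState;   // "loading", "interactive" or "complete"
        final boolean onloadFired;

        PageState(String readyState, boolean onloadFired) {
            this.readyState = readyState;
            this.onloadFired = onloadFired;
        }
    }

    // A few of the pre-defined conditions listed above.
    static final Predicate<PageState> RETURN_IMMEDIATELY = s -> true;
    static final Predicate<PageState> INTERACTIVE =
            s -> "interactive".equals(s.readyState) || "complete".equals(s.readyState);
    static final Predicate<PageState> COMPLETE = s -> "complete".equals(s.readyState);
    static final Predicate<PageState> ONLOAD_FIRED = s -> s.onloadFired;

    // Default when nothing has been set: onload has fired.
    private Predicate<PageState> defaultCondition = ONLOAD_FIRED;

    // Models driver.manage().pageLoadedCondition(condition).
    void pageLoadedCondition(Predicate<PageState> condition) {
        this.defaultCondition = condition;
    }

    // Models the check behind get(url): uses the configured default.
    boolean isLoaded(PageState state) {
        return defaultCondition.test(state);
    }

    // Models the check behind get(url, condition): the explicit condition wins.
    boolean isLoaded(PageState state, Predicate<PageState> explicit) {
        return explicit.test(state);
    }
}
```

In a real driver the chosen predicate would be polled repeatedly once navigation has started; the sketch shows only which condition applies to which call.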

Mark Collin

Aug 8, 2011, 1:23:09 AM
to selenium-...@googlegroups.com

But a user will not try to start interacting with the elements as quickly as an automated process, so is less likely to see issues.



Simon Stewart

Aug 8, 2011, 7:20:22 AM
to selenium-...@googlegroups.com
I'd much rather be clear on what we mean by "get". A user who wants
more can then use the normal Wait class to enhance what they expect to
happen.

I'm deeply uncomfortable about adding lots of overloaded calls to the
API. It doesn't feel "right" and breaks the aim of trying to keep the
API tight and small. Also, ExpectedCondition is in support because it
requires a third party dependency (guava-libraries). Though that's
nice, I'm not keen on tying core webdriver to a library we're using to
provide functionality that's available from several different
competing libraries.

Simon

Patrick Lightbody

Aug 9, 2011, 11:24:30 AM
to selenium-...@googlegroups.com
> These will be incredibly flaky tests unless the user takes care to
> control the loading of the extra resources, but that's a vote for
> "interactive" on the readyState.

Yes.

> The question I'm asking myself is whether this meets the principle of
> least surprise. Lots of people expect a page to be fully loaded (and
> JS event handlers hooked in via the "onload" event) once a call to
> "get" or Selenium RC's "open" completes. A user who wanted to be able
> to interact with a page while loading might be able to set the page
> load timeout to 0 and mark the failure of a page to load as non-fatal.
> Having said that, I've seen enough tests being very flaky because of
> pages taking too long to load, so "onload" may be a little too
> generous a time out.

I think it offers the most flexibility to end users and is the least surprising, since it allows me to simulate what real users do (click on things before images are done loading).

Just yesterday we got bit by this very bug: EC2 had a huge outage last night (40 minutes of no connectivity), so some of our images and 3rd party JS were resulting in timeouts. Most of the page had loaded fine, but because we had some JS initialize on onload instead of on readyState, users were clicking on our interactive charts and getting strange JS errors. Right now I can't accurately test that scenario with Selenium.

Simon Stewart

Aug 10, 2011, 2:03:06 PM
to selenium-...@googlegroups.com
OK. Here's my plan:

* Add the proposed method to "timeouts". It doesn't seem contentious
and fits neatly with our existing APIs

* Implement a handful of strategies for page loading in the Firefox
driver, selectable with a preference but defaulting to what we have
now.

The strategies will be:

* Wait until the document loader informs us there are no active downloads
* Wait for "onload" in the currently used frame
* Wait for "document.readyState == 'interactive'" in the current frame
  * Or "DOMContentLoaded" on older firefoxen
* Wait for "document.readyState == 'complete'" in the current frame

The latter two tie into the document readiness section of HTML5:

http://www.whatwg.org/specs/web-apps/current-work/multipage/dom.html#current-document-readiness

The reason for implementing several strategies is to allow us to try
each of them out on existing tests to see the effect. This is a pretty
fundamental change, so we should be taking care to make sure we know
the impact of what we're about to do.

Simon

Santiago Suarez Ordoñez

Aug 10, 2011, 3:04:05 PM
to selenium-...@googlegroups.com
I think that giving too many options for users is just going to be counterproductive. Control and customization go against simplicity and ease of use. In this case, I'm guessing it will lead us to tons of questions in the user list, huge explanations of what each wait strategy does, when to switch, when not to and a long and painful list of etc's.

My vote goes in favor of choosing a single strategy that does the right thing in 90% of the cases and letting the other 10% use get in a way it returns right away and code their own condition for detecting load.

Now, on "choosing the right strategy", I googled a little bit and found a nice test page to help understand the order in which events are triggered.
Sadly (but not surprisingly), the way browsers behave is inconsistent:
https://saucelabs.com/bugs/c4c1237d507d7ce79216a3a812e70955?auth=21385f1a3d9fc641011fbf8ba7fad244
https://saucelabs.com/bugs/c4c1237d507d7ce79216a3a812d4cbf9?auth=89b4a52833892fd0a73abbb99ff050c9
https://saucelabs.com/bugs/c4c1237d507d7ce79216a3a812e8deaa?auth=18860b6ad185ccc673eed5b5a5921c65

Now, what worries me the most is that IE sets document.readyState == 'interactive' even before document.body is loaded, which I'm guessing will lead to a hell of a lot of confusion and broken tests.

Now, time to bring up jquery again. They have worked on this problem already and have way more eyeballs and testers than we ever could. At the same time, their solution is JS based, which means... Atoms!
I'll be reading their code and reporting back on how it works soon.

Best,
Santi

Simon Stewart

Aug 10, 2011, 4:36:00 PM
to selenium-...@googlegroups.com
I'm not going to expose those to the users. Those options are for us
to investigate which approach is best, but we need those options
available to more than just me in order to allow a broader range of
tests to be run.

The whole discussion has been kicked off by the fact that currently
all the drivers behave differently. I'd love some consistent
behaviour. As for jquery, a quick look says that they already use a
combination of what we're thinking of doing, but using "complete"
instead of "interactive":

https://github.com/jquery/jquery/blob/master/src/core.js#L425
https://github.com/jquery/jquery/blob/master/src/core.js#L891

Simon

Lukus

Sep 21, 2011, 1:54:05 AM
to Selenium Developers
Has this lost steam?

As I read through all of the comments in this post, I most agree with
Daniel on giving the developer the choice. Isn't Selenium written for
developers? Having the default option for get() to be consistent,
like waiting on readyState to be 'interactive' makes sense, but I
support the idea of being able to set what I want, either waiting on a
specified readyState, or being able to set a timeout. Heck, I'd be
happy if get() did what a user does, which is only type in a url and
press enter. (Yes, the user then waits for visual results before
clicking things, but I am simplifying the get() command down to its
most basic function).

We use Selenium on all browsers, all OSs (well Mac, Ubuntu, and
Windows versions), and across thousands of websites monthly. The
problem in Internet Explorer with frames is a big challenge at this
point and though I've started to mess with the browser.cpp file to fix
this issue, I am sort of waiting to see what decision is made here.
If I at least had an option for get() to call the url and immediately
return, I could manage the page load stuff myself and not feel like my
hands are tied.

At a minimum, it sounds like what you stated in your first post might
be the way to go, where get() returns after readyState is interactive
and the user can specify a timeout.

Thanks,

The other Luke

Lukus

Sep 21, 2011, 1:58:07 AM
to Selenium Developers
Sorry, that last post was written without realizing there was a second
page of comments, from which it does look like this is being
investigated. Thanks for your work guys!