Status of research on the IE driver

Jim Evans

Aug 30, 2010, 9:13:42 AM
to Selenium Developers
I've taken it upon myself to try to improve the IE driver code base.
My immediate goals are to solve two problems with the driver, namely
the need to sometimes elevate privileges on Vista and above for IE 7
and 8, and the inability to create more than one driver instance.
Secondary goals are to reduce the complexity of the language binding
driver code, probably by implementing some form of RemoteWebDriver
server component to actually drive the browser, so the language
bindings can derive from RemoteWebDriver; and to better encapsulate
the user interaction code so that it can be replaced when the new user
interaction API is ready. Tertiary goals are to allow for future
implementation of a browser tab API, and an existing browser
attachment API. I've not yet written any code I'm ready to share, but
there are some decisions to be made with respect to the direction we
want to go. The results of my research have been enlightening, and I
thought I'd share with the group, so I could get help making these
directional decisions.

The main problem is with Protected Mode in IE 7 or 8 on Windows Vista
or above. Whenever you transition into or out of Protected Mode, IE
requires that another session be created. In IE 7, this will usually
manifest itself as a new top-level browser window; in IE 8, a new
IExplore.exe process will be created, but IE will usually (not
always!) seamlessly attach it to the existing IE top-level frame
window. As near as I can tell, any browser automation framework that
drives IE externally will run into these problems.[1]

The good news is that we can attach to these new processes, including
sinking events, and do it without requiring user privilege elevation
or using a Browser Helper Object (BHO, or IE add-in)[2]. The bad news
is that it seems impossible to avoid race conditions with the
navigation events in that instance. As near as I can tell, there is no
one-size-fits-all, just-make-it-work solution, but there are some
alternatives that I can suggest.

First, we could dictate that to work with IE, all zones must have the
same Protected Mode setting. As long as it's on for all zones, or off
for all zones, we can make it work pretty well using many of the
techniques in the existing code base. The pros of this solution are
that protected mode settings are per user, and usually don't require
admin privileges to set. It allows our users to continue to run with
UAC turned on, and to run securely in the browser if they set
Protected Mode "on" for all zones. On the downside, there's no
documented or easy way to programmatically set this setting; the user
will have to do it themselves.
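
A minimal sketch of the check this first option implies: read each
zone's Protected Mode value and confirm they all match. The registry
path, the value name "2500", and the zone numbering are assumptions
based on commonly reported layouts, not on documented API; the function
names are hypothetical.

```python
# Assumption: Protected Mode is stored per zone as a DWORD named "2500"
# under this per-user key. This layout is widely reported but not
# officially documented, so verify it before relying on it.
ZONES_KEY = r"Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones"
PROTECTED_MODE_VALUE = "2500"
ZONES = (1, 2, 3, 4)  # intranet, trusted, internet, restricted


def read_zone_settings():
    """Read the raw Protected Mode value for each zone (Windows only)."""
    import winreg  # imported lazily so the pure check below runs anywhere

    settings = {}
    for zone in ZONES:
        path = ZONES_KEY + "\\" + str(zone)
        try:
            with winreg.OpenKey(winreg.HKEY_CURRENT_USER, path) as key:
                settings[zone], _ = winreg.QueryValueEx(key, PROTECTED_MODE_VALUE)
        except FileNotFoundError:
            settings[zone] = None  # value not set for this user
    return settings


def zones_agree(settings):
    """Return True when every zone carries the same Protected Mode value."""
    return len(set(settings.values())) <= 1
```

On Windows, a driver could call zones_agree(read_zone_settings()) at
startup and refuse to run with a clear error message if the zones differ.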

Second, we could let the Protected Mode preferences alone and do some
sort of "best guess" algorithm to find new browser windows as they
appear. On the plus side, it will appear to "just work" for users. On
the other hand, it almost completely invalidates the multiple IE
driver use case, and it's almost a certainty that we will guess wrong
at some point, leading to unexpected behavior in tests.

So now, I'm looking for input. What am I missing? These are the most
promising alternatives I've managed to think of, but I'm sure there
are others, so what are they? I'd love to hear some other ideas, so
please speak up.

Regards,
--Jim Evans

[1] Yes, even Watir is subject to these potential problems. Please see
http://wiki.openqa.org/display/WTR/FAQ#FAQ-WhatshouldIdoiftwobrowserwindowsappearwhenrunningatestunderWindowsVista%3F.
[2] Using a BHO would not only require an actual installer, but also
require user elevation to install. It has other challenges too, which
I'm more than happy to discuss off-list, particularly with identifying
which browser windows we are really controlling.

Kevin Menard

Aug 30, 2010, 9:35:08 AM
to selenium-...@googlegroups.com
Hi Jim,

Great analysis. Some thoughts and questions appear in-line.

On Mon, Aug 30, 2010 at 6:13 AM, Jim Evans <james.h....@gmail.com> wrote:

> First, we could dictate that to work with IE, all zones must have the
> same Protected Mode setting. As long as it's on for all zones, or off
> for all zones, we can make it work pretty well using many of the
> techniques in the existing code base. The pros of this solution are
> that protected mode settings are per user, and usually don't require
> admin privileges to set. It allows our users to continue to run with
> UAC turned on, and to run securely in the browser if they set
> Protected Mode "on" for all zones. On the downside, there's no
> documented or easy way to programmatically set this setting; the user
> will have to do it themselves.

How is this a better situation than installing a DLL with regsvr32?
It could be installed under HKCU and not affect the entire machine.
I'm probably missing something very obvious, but it seems if the user
already has to jump through hoops, we may as well make it easier for
ourselves. It'd probably be easier for the user, too, instead of
setting the zone settings.

Having said that, I'm pretty sure you can manipulate the zone settings
via the registry. We already do some of this in the IE launcher for
Se 1.x. The tricky point (if it works at all) is figuring out how to
unroll the changes to end back up in a clean state when multiple IE
instances are being run.

> Second, we could let the Protected Mode preferences alone and do some
> sort of "best guess" algorithm to find new browser windows as they
> appear. On the plus side, it will appear to "just work" for users. On
> the other hand, it almost completely invalidates the multiple IE
> driver use case, and it's almost a certainty that we will guess wrong
> at some point, leading to unexpected behavior in tests.

This could be tricky, but always grabbing the most recently created IE
window should be a good enough heuristic. It probably would require
looking up the process hierarchy to make sure a new tab wasn't created
in an older window versus a new window being created. But, if it's
documented and deterministic, it's probably good enough.

Otherwise, an in-process COM server should let you operate on the
correct window.

--
Kevin

Jim Evans

Aug 30, 2010, 10:13:47 AM
to Selenium Developers
> How is this a better situation than installing a DLL with regsvr32?
> It could be installed under HKCU and not affect the entire machine.
> I'm probably missing something very obvious, but it seems if the user
> already has to jump through hoops, we may as well make it easier for
> ourselves. It'd probably be easier for the user, too, instead of
> setting the zone settings.

If you're talking about creating and installing a BHO via regsvr32,
you can't create a per-user BHO. It *must* be per-machine, and you
*have* to register it in HKLM, which requires elevation. I'd be
interested in seeing how the zone manipulation works though. I used
procmon to view the registry calls made by IE when I changed my
settings, and the results looked to be really, really obscure.

> This could be tricky, but always grabbing the most recently created IE
> window should be a good enough heuristic.

Could be, but there is a small chance that we could get the wrong
window. This chance goes up when you think about multiple drivers
driving different browser instances, which could all be opening new
windows at roughly the same time. Right now, we're restricted to a
single browser instance, so the chance would be a lot lower if we kept
that model. I'd really like to see us allow multiple driver instances,
though.
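
To make the multiple-driver concern concrete, here is a small sketch of
the pairing problem. It assumes each driver session can say whether it
is waiting for a new window and that newly appeared windows can be
listed; the function name and the identifiers are hypothetical.

```python
def assign_new_windows(pending_sessions, new_windows):
    """Pair sessions that expect a new window with windows that just appeared.

    Returns (assignments, ambiguous). A pairing is made only when exactly
    one session is waiting and exactly one new window appeared. Any other
    mix of waiting sessions and new windows cannot be resolved safely.
    """
    if len(pending_sessions) == 1 and len(new_windows) == 1:
        return {pending_sessions[0]: new_windows[0]}, False
    # Windows with no waiting session are assumed to belong to the user.
    ambiguous = bool(pending_sessions) and bool(new_windows)
    return {}, ambiguous
```

With a single driver the one-to-one case is the common one; with two
drivers clicking links at roughly the same time, the function reports
ambiguity instead of guessing.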

Incidentally, how do you tell the "most recently created" one? I don't
know what API to call to get that information. Do we always keep a
list of every IE window open on the system? Use IShellWindows (or more
correctly DShellWindowsEvents)?

--Jim

Kevin Menard

Aug 30, 2010, 10:38:02 AM
to selenium-...@googlegroups.com
On Mon, Aug 30, 2010 at 7:13 AM, Jim Evans <james.h....@gmail.com> wrote:

> If you're talking about creating and installing a BHO via regsvr32,
> you can't create a per-user BHO. It *must* be per-machine, and you
> *have* to register it in HKLM, which requires elevation.

Ahh, right. Sorry, I forgot about that. I've been playing with NPAPI
which will read out of HKCU.

> I'd be interested in seeing how the zone manipulation works though. I used
> procmon to view the registry calls made by IE when I changed my
> settings, and the results looked to be really, really obscure.

Take a look through the source and you can see where some of these
entries exist. I've found it much easier to snapshot the registry,
make the change, snapshot again, and then diff the results. It's
laborious, but seems to work rather well.
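
The snapshot-and-diff approach can be sketched as follows, assuming each
snapshot has already been exported into a dictionary that maps a
(key path, value name) pair to its data. The function name and the
snapshot format are hypothetical; how the export is produced is left
open.

```python
def diff_snapshots(before, after):
    """Compare two registry snapshots taken before and after a change.

    Each snapshot maps (key_path, value_name) -> data.
    Returns three dictionaries: added, removed, and changed entries.
    """
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed
```

The "changed" dictionary keeps the old value next to the new one, which
is also what would be needed to roll a setting back afterwards.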

> Incidentally, how do you tell the "most recently created" one? I don't
> know what API to call to get that information. Do we always keep a
> list of every IE window open on the system? Use IShellWindows (or more
> correctly DShellWindowsEvents)?

You could have timing issues at creation time, but then again, they're
browsers that haven't been associated with any WebDriver instance yet
so it probably doesn't matter a whole lot. Once one is associated
with the WebDriver instance, you'd just keep track of the proc ID or
HWND.

GetProcessTimes looks like the API call you'd want to dig up the creation time:

http://msdn.microsoft.com/en-us/library/ms683223

Of course, you'll still need to find the list of processes with name
"iexplore.exe," but that should be fairly straightforward.
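
A rough sketch of the selection step, assuming the process list and the
creation times (for example from GetProcessTimes) have already been
gathered into simple records. ProcessInfo, newest_ie_process, and the
exclude_pids parameter are hypothetical names used only for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    name: str
    creation_time: float  # any monotonically comparable timestamp


def newest_ie_process(processes, exclude_pids=frozenset()):
    """Return the most recently created iexplore.exe process, or None.

    Processes whose PID is in exclude_pids (already associated with a
    WebDriver instance) are skipped.
    """
    candidates = [
        p for p in processes
        if p.name.lower() == "iexplore.exe" and p.pid not in exclude_pids
    ]
    return max(candidates, key=lambda p: p.creation_time, default=None)
```

Once a process has been handed to a driver, its PID would go into
exclude_pids so that it is never picked up a second time.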

--
Kevin

Jim Evans

Aug 30, 2010, 12:01:33 PM
to Selenium Developers
Could be that I'm way off base here and trying to make this harder
than it is. My approaches to solving these problems rest on the
following assumptions:

1. We do not want to interfere with browser processes the user creates
manually (either before or during test execution).
2. We want to be able to instantiate multiple drivers and run them in
parallel.
3. We want a driver session to be able to control all of the browsers
that are opened by actions taken in the context of that session.

As a check of assumptions and/or by-design limitations, do we
collectively agree that these assumptions are correct? I am not an
expert in the internal workings of the Firefox or Chrome drivers, but
do they follow these assumptions or am I mistaken?

Consider the following specific sequence of events using IE 7:

1. User has a WebDriver session browsed to a page with a link that
navigates to a page in a different Protected Mode zone.
2. User finds the link element and clicks it with WebElement.click().
3. The browser navigates to the target location using a new session.

IE 7 will launch a new top-level window, but you don't receive any
notification via any of the COM events you can subscribe to that a
navigation occurred at all. In other words, BeforeNavigate2 never
fires. That means you can't tell WebDriver, "Hey, I just caused a new
window to open up. Go attach to and manage that new window." Since we
use asynchronous native events for the click, and not every click
event leads to a navigation, you have a lot of work to do to figure
out if you should expect a navigation, and there's no way we'll get
that 100% right. And that doesn't even account for the more
complicated case of when there's some sort of redirect after the
navigation. IE 8 is a little better, because you'll get a Quit event
for the browser you were attached to before the link click, but you
still have a race condition to contend with in the post-navigation
redirect case.

As you say, I could go grab the last IE instance created, but I have
no way to know for sure if that's the one I want. Additionally, what
does WebDriver.getWindowHandles() return? I suppose it could return
every window handle opened since the driver was created, but that
violates assumption 1 and makes it really hard to fulfill assumption 2.

Regards,
--Jim

Kevin Menard

Aug 30, 2010, 12:11:34 PM
to selenium-...@googlegroups.com
Hey Jim,

I should preface that you've spent way more time looking at this than
I have, so I'll defer to you on the finer points. My BHO experience
is largely limited to out-of-Selenium work and the work done on
SnapsIE. I've audited a bit of the WebDriver IE code, but don't know
it well enough to speak authoritatively about it.

More comments below:

On Mon, Aug 30, 2010 at 9:01 AM, Jim Evans <james.h....@gmail.com> wrote:
> Could be that I'm way off base here and trying to make this harder
> than it is. My approaches to solving these problems rest on the
> following assumptions:
>
> 1. We do not want to interfere with browser processes the user creates
> manually (either before or during test execution).

I guess this one I don't care all that much about. It'd be great if
we could do it, but if we can't, I'm personally okay with dropping it.
A test machine really shouldn't be interactively used.

> 2. We want to be able to instantiate multiple drivers and run them in
> parallel.

+1

> 3. We want a driver session to be able to control all of the browsers
> that are opened by actions taken in the context of that session.

+1

I'd have to look at the process hierarchy to see if any ancestry is
established or not to see how easily that could be supported.

--
Kevin

Jim Evans

Aug 30, 2010, 1:55:33 PM
to Selenium Developers
> I'd have to look at the process hierarchy to see if any ancestry is
> established or not to see how easily that could be supported.

I'm probably wrong about this, but if I'm seeing this correctly, all
of the iexplore.exe processes are created by the broker process
(ieuser.exe in IE7, an instance of iexplore.exe in IE8), of which
there is only ever one. I'd love to be wrong, but I don't think you're
going to be able to determine how a particular IE window came to be by
looking only at its process information. Let me know if you find
differently.

--Jim
