Accessing HTML fragments

Beders

Jul 1, 2009, 2:16:42 PM
to webdriver
Hi,

First of all, thanks for WebDriver. It allowed me to put together a
really nice demo within a couple of hours. Great!

I was wondering if there is a way to get an HTML fragment (as either a
partial DOM tree or a String) from a WebElement.
I tried to use webelement.findElements(By.xpath("node()|text()")),
but the API is not equipped to handle an arbitrary node-set coming
back from the XPath engine.

Is there a way to get access to the internal representation of the
current document?
I could provide a patch for this in case there is no other way to
access the DOM tree. How can I submit that?

Thanks,
Jochen

Steve CCRP

Jul 1, 2009, 7:03:17 PM
to webdriver
You could use JavaScript to pull out the contents of a WebElement as a
String using the innerHTML property. Here's an example of pulling the
whole source from the <html> tag:

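// Assumes "driver" is an existing WebDriver that supports JavaScript.
WebElement htmlTag = driver.findElement(By.tagName("html"));
JavascriptExecutor js = (JavascriptExecutor) driver;
// arguments[0] is the element passed in after the script string.
String contents = (String) js.executeScript("return arguments[0].innerHTML;", htmlTag);
System.out.println(contents);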

Check if the output is exactly what you want.

Beders

Jul 2, 2009, 2:40:49 AM
to webdriver
Thanks! Will try. Looks evil and dirty. I like it :)

Simon Stewart

Jul 2, 2009, 5:28:05 AM
to webd...@googlegroups.com
Hi Jochen,

You're right: the API doesn't support pulling back something other
than an element coming back from the find methods. As Steve points out
in a later email, you can use the JavascriptExecutor to extract the
innerHTML of an element if that's what you need.

The reason for this is to keep the APIs clear, small and unambiguous.
If we allowed "findElement" to return an arbitrary node, we'd need to
change the return type from WebElement (which is clear) to "Node" (or
some variant, which is more ambiguous). In the common case, we do
actually want to return and interact with a WebElement. We're trying
to make the easy stuff easy, and the hard stuff possible (to steal a
phrase :)

Now, there is an argument to be made that we could "re-purpose"
WebElement to act as a Node. In that case, returning a text node would
mean that only "getText" would be a valid method to call. We already
do something a little like this; WebElement has every method you might
want, and it throws an exception if you attempt something illogical.
For example, if you call "toggle" on something other than a checkbox,
an exception will be thrown.
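
To make that trade-off concrete, here is a minimal sketch of what such a
re-purposing might look like. It is plain Java, not WebDriver code:
PageNode, TextNode and CheckboxNode are hypothetical names invented for
this illustration, and the choice of UnsupportedOperationException is
also an assumption. The point is only that a text node would have to
reject almost every method on the shared interface.

```java
// Hypothetical sketch: none of these types exist in the WebDriver API.
// One interface carries every method; nodes reject what makes no sense.
interface PageNode {
    String getText();
    void toggle();
}

// A text node can only report its text; anything else is illogical.
final class TextNode implements PageNode {
    private final String text;

    TextNode(String text) {
        this.text = text;
    }

    public String getText() {
        return text;
    }

    public void toggle() {
        throw new UnsupportedOperationException("cannot toggle a text node");
    }
}

// A checkbox supports toggle and has no text of its own in this sketch.
final class CheckboxNode implements PageNode {
    private boolean checked;

    public String getText() {
        return "";
    }

    public void toggle() {
        checked = !checked;
    }

    boolean isChecked() {
        return checked;
    }
}

class NodeSketchDemo {
    public static void main(String[] args) {
        PageNode text = new TextNode("hello");
        System.out.println(text.getText());
        try {
            text.toggle(); // illogical on a text node, so it throws
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Callers only find out at run time which methods a given node accepts,
which is the ambiguity described above.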

I'm not sure I like that re-purposing. I'm always happy to debate on-list :)

Simon

Beders

Jul 2, 2009, 2:42:20 PM
to webdriver
Well, I would be happy to be able to use the full power of XPath,
maybe through a new method, for example:
List<HtmlNode> WebElement.findNodes(By by);

The use case here is not so much testing websites as screen scraping,
for example, collecting postings from a web forum.
The text in those forums consists of a restricted set of HTML tags,
i.e. a MIXED content model in DTD speak. I will try to use innerHTML
for that purpose, but I'd prefer to use as little JavaScript as
possible.
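
For what it's worth, once the fragment is available as a String, the
node-level XPath query can be run outside the browser. Below is a
minimal sketch using only the JDK's built-in javax.xml APIs. It assumes
the fragment is well-formed XML, which real forum HTML often is not, so
a tidy-up step may be needed first. FragmentNodes, describeChildren and
the <post> wrapper element are hypothetical names made up for this
example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FragmentNodes {
    // Returns one entry per child node of the fragment, in document order:
    // "text:<value>" for text nodes, "element:<name>=<content>" for elements.
    static List<String> describeChildren(String fragment) {
        try {
            // Wrap the fragment so the parser sees a single root element.
            String xml = "<post>" + fragment + "</post>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // node() matches text nodes as well as elements.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("/post/node()", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                if (n.getNodeType() == Node.TEXT_NODE) {
                    out.add("text:" + n.getNodeValue());
                } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                    out.add("element:" + n.getNodeName() + "=" + n.getTextContent());
                }
            }
            return out;
        } catch (Exception e) {
            // Malformed input (or a parser setup problem) ends up here.
            throw new IllegalArgumentException("could not process fragment", e);
        }
    }

    public static void main(String[] args) {
        for (String s : describeChildren("Hello <b>bold</b> and <i>italic</i> text")) {
            System.out.println(s);
        }
    }
}
```

For the sample input in main, this should list five nodes: three text
runs with the b and i elements between them, which is the mixed content
I am after.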

I agree that the API should be as simple as possible, and the WebDriver
API is a pretty neat example.
Still, having very fine-grained access to the HTML content would be
great.

Thanks,
Jochen