Modified getText method


Ardesco

Sep 29, 2009, 4:09:13 PM
to selenium-developers
With the latest revision of the site I'm currently testing we have
been having some issues with a change to the markup that is designed
to make it more human readable. This change results in a carriage
return being put in text that is longer than 100 characters, and tabs
being used to move the next line up to the start of the text block so
that everything is aligned neatly.

An unfortunate consequence of this is that all of our selenium tests
that use getText to pull in text from elements where this has occurred
are now failing due to the added carriage returns and tabs. The
markup is being automatically turned into this human readable form, so
we don't have a guaranteed placement for these extra characters, and the
idea of rewriting every single text validation script we have was quite
daunting.

The easiest fix was to tweak selenium instead.

The fix is very simple and requires minor modifications to two files
as follows:

Selenium.java

Old:

/**
 * Gets the text of an element. This works for any element that contains
 * text. This command uses either the textContent (Mozilla-like browsers) or
 * the innerText (IE-like browsers) of the element, which is the rendered
 * text shown to the user.
 *
 * @param locator an <a href="#locators">element locator</a>
 * @return the text of the element
 */
String getText(String locator);

New:

/**
 * Gets the text of an element. This works for any element that contains
 * text. This command uses either the textContent (Mozilla-like browsers) or
 * the innerText (IE-like browsers) of the element, which is the rendered
 * text shown to the user.
 *
 * @param locator an <a href="#locators">element locator</a>
 * @return the text of the element
 */
String getText(String locator);

/**
 * Gets the text of an element. This works for any element that contains
 * text. This command uses either the textContent (Mozilla-like browsers) or
 * the innerText (IE-like browsers) of the element, which is the rendered
 * text shown to the user with tabs and carriage returns stripped out.
 *
 * @param locator an <a href="#locators">element locator</a>
 * @param action defaults to normal, optional setting is clean
 * @return the text of the element
 */
String getText(String locator, String action);

DefaultSelenium.java

Old:

public String getText(String locator) {
    return commandProcessor.getString("getText", new String[] {locator,});
}

New:

public String getText(String locator, String action) {
    // "clean".equals(action) also copes with a null action
    if ("clean".equals(action)) {
        return commandProcessor.getString("getText", new String[] {locator,})
                .replace("\n", "").replace("\t", "");
    } else {
        return commandProcessor.getString("getText", new String[] {locator,});
    }
}

public String getText(String locator) {
    return getText(locator, "normal");
}

To utilise the new functionality you just need to add an optional
parameter with the text "clean" to the getText call, e.g.

selenium.getText("//div[@id='foo']", "clean");

This will strip all \n and \t characters out of the returned text.

The existing functionality of getText is untouched so the following
would still work as normal:

selenium.getText("//div[@id='foo']");

This is also potentially extensible in the future, just in case other
operations on the returned text are wanted.

Hopefully it will be of use to somebody.

Santiago Suarez Ordoñez

Sep 29, 2009, 4:52:02 PM
to selenium-...@googlegroups.com
On Tue, Sep 29, 2009 at 5:09 PM, Ardesco <ma...@ardescosolutions.com> wrote:
> The easiest fix was to tweak selenium instead.

The advantages of open source. :)
Thanks for sharing!

Santi

QA_manager

Sep 30, 2009, 9:31:47 AM
to selenium-developers
Yes, thank you Ardesco; I am sure someone else can use that.

Because that code is sure to get re-used, please add some comments to:
- explain the purpose of the optional 'clean' parameter
- explain the code a little bit

You explained everything quite well above - the problem, the solution,
and the optional Action parameter - but that explanation isn't
included in the code.

Here is what I would do:
- take your explanation of the problem and solution, condense them a
bit and put them into a comment section in the code
- add a one-line comment explaining what the two halves of the if/else
are doing
- write an enhancement request in Jira
- attach your code

That way your code is much more likely to be included in the next
release, and your code is well documented for the next programmer who
needs to understand it.

Additional questions for everyone:
- are there any other Selenium commands that take parameters similar
in purpose to 'clean'? If so, should we standardize the parameter
names?
- should parameters be case sensitive?
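
One way to sidestep both questions, sketched here purely as an illustration: replace the free-form string with a small enum, so the set of names is fixed in one place and case handling is decided once. `TextMode` and `parse` are hypothetical names, not part of the Selenium API, and falling back to NORMAL for unknown input is an assumption that mirrors the "anything else behaves as normal" rule of the patch.

```java
import java.util.Locale;

/** Hypothetical replacement for the free-form "action" string; not Selenium API. */
enum TextMode {
    NORMAL,
    CLEAN;

    /**
     * Parses a mode name case-insensitively. Null or unknown names fall back
     * to NORMAL, matching the patch's behaviour for unrecognised values.
     */
    static TextMode parse(String name) {
        if (name == null) {
            return NORMAL;
        }
        try {
            return valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return NORMAL;
        }
    }
}
```

With something like this, `TextMode.parse("Clean")` and `TextMode.parse("clean")` would both select the cleaning behaviour, and a typo in a test would at least fail in one well-defined way.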


Dave Hunt

Sep 30, 2009, 10:47:17 AM
to selenium-developers
I'm being devil's advocate here, but is this something Selenium should
be doing? Although there might not be a better way in IDE, I would
have looked into cleaning the text in my RC client language.

Dave.
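
For anyone wanting to try the client-side route, a minimal sketch in Java follows. It assumes nothing about Selenium itself: `TextCleaner` and `stripNewlinesAndTabs` are made-up names for a helper that would live in the test code and be applied to whatever getText returns.

```java
/** Hypothetical test-side helper; not part of Selenium. */
final class TextCleaner {
    private TextCleaner() {}

    /** Removes every newline and tab from the given text, leaving spaces alone. */
    static String stripNewlinesAndTabs(String raw) {
        return raw.replace("\n", "").replace("\t", "");
    }

    public static void main(String[] args) {
        // Simulates text that was wrapped and indented in the markup.
        String raw = "A long line of text \n\t\tthat was wrapped in the markup";
        System.out.println(stripNewlinesAndTabs(raw));
    }
}
```

In a test this would be used as `TextCleaner.stripNewlinesAndTabs(selenium.getText("//div[@id='foo']"))`, leaving the Selenium jar unmodified.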

Ardesco

Sep 30, 2009, 3:30:48 PM
to selenium-developers
It was so simple a change that I didn't think it was worth adding any,
to be honest. I've always worked from the mindset that comments are
useful to describe complex code, things that you are doing that may not
be obvious to another person, etc., but largely pointless if the code is
blatantly obvious. But I digress :)

Comments added as requested

Selenium.java

/**
 * Gets the text of an element. This works for any element that contains
 * text. This command uses either the textContent (Mozilla-like browsers) or
 * the innerText (IE-like browsers) of the element, which is the rendered
 * text shown to the user.
 *
 * @param locator an <a href="#locators">element locator</a>
 * @return the text of the element
 */
String getText(String locator);

/**
 * Gets the text of an element. This works for any element that contains
 * text. This command uses either the textContent (Mozilla-like browsers) or
 * the innerText (IE-like browsers) of the element, which is the rendered
 * text shown to the user.
 * An optional switch of "clean" can be set that will remove all tabs and
 * carriage returns from the collected text. Using anything else in the
 * optional setting will default to normal behaviour.
 *
 * @param locator an <a href="#locators">element locator</a>
 * @param action defaults to "normal", optional setting is "clean"
 * @return the text of the element
 */
String getText(String locator, String action);

DefaultSelenium.java

public String getValue(String locator) {
return commandProcessor.getString("getValue", new String[]
{locator,});
}

public String getText(String locator, String action) {
    /* Read in the value assigned to the string action */
    if ("clean".equals(action)) {
        /* If the value of action is equal to clean, strip out
           all \n (newline) and \t (tab) characters */
        return commandProcessor.getString("getText", new String[] {locator,})
                .replace("\n", "").replace("\t", "");
    } else {
        /* If the value of action is anything else (e.g. "normal",
           random text, random characters, etc.) return the text
           collected verbatim */
        return commandProcessor.getString("getText", new String[] {locator,});
    }
}

Ardesco

Sep 30, 2009, 3:43:18 PM
to selenium-developers
Dave,

In this case I would say yes. This code does not remove any of the
markup, just junk characters hidden within the markup that the browser
will ignore anyway. Selenium currently trims the text it returns,
this is really not so different, just an evolution in my opinion. The
option is still there to not remove these characters and I know I have
spent time before trying to debug a getText call that has failed only
to find out that the text shown in firebug is not entirely accurate
because a newline character has been slipped in somewhere. If
anything, this should help people who are less experienced with
selenium and don't realise that these characters could be causing
failures.

If the markup contains something like a <br> tag this change will not
touch the tag and if you are trying to match something across a <br>
and attempting to use the added clean functionality to remove the line
break it will fail.

It would have been quite possible to add this function to my existing
test framework, but this seemed to be a cleaner way to do it, required
less modification of my code base and is now something that I will
have available in every test framework I generate without needing an
additional clean function.

Patrick Lightbody

Oct 1, 2009, 8:07:25 AM
to selenium-developers
I'd love to hear Simon's thoughts on this from the WebDriver/Selenium
2.0 perspective. In general, I like the suggested feature, but I know
the API is something Simon has a lot more leadership/intelligence on
than I do, and I'd like to see how he'd implement this...

Patrick


Simon Stewart

Oct 1, 2009, 10:39:19 AM
to selenium-...@googlegroups.com
TBH, I'd not noticed the fact that we'd tweaked the text we are
returning from Selenium. I'd really like it if the API _stayed
stable_. That's partly because the promise of 1.0 was a stable API and
partly because I'm attempting to put together an implementation based
on webdriver.

In this case, my preference would be to roll back the change to the
formatting of getText, which would restore our previous behaviour and
remove the need for a change to the API, moving the responsibility for
reformatting to the code that requested the text.

Simon

Philippe Hanrigou

Oct 1, 2009, 11:54:22 AM
to selenium-...@googlegroups.com
Ditto Simon

Selenium's core mission is to automate the browser, not to strip
whitespace... which can be done by any programming language!

Cheers,
- Philippe


Santiago Suarez Ordoñez

Oct 1, 2009, 12:24:30 PM
to selenium-...@googlegroups.com
> Ditto Simon
>
> Selenium's core mission is to automate the browser, not to strip
> whitespace... which can be done by any programming language!


+1 for making the API transparent to the user. They should receive exactly what the browser receives.

Santi

Ardesco

Oct 1, 2009, 3:57:06 PM
to selenium-developers
I ought to qualify my trim comment above.

The getText function in Selenium does not itself trim space, but I am
sure that space is trimmed somewhere. As an example take the
following element:

<div id="test">This text <span>gets</span> trimmed</div>

The following will pass:

verifyEquals(selenium.getText("//div[@id='test']/text()[1]"), "This text");
verifyEquals(selenium.getText("//div[@id='test']/text()[2]"), "trimmed");
verifyEquals(selenium.getText("//div[@id='test']"), "This text gets trimmed");

The following will fail:

verifyEquals(selenium.getText("//div[@id='test']/text()[1]"), "This text ");
verifyEquals(selenium.getText("//div[@id='test']/text()[2]"), " trimmed");

Therefore a trim is occurring at some point. I haven't hunted through
the code to find out where this happens; I am just used to it
happening and expect it to happen.

The original function that I provided does NOT remove whitespace, all
it does is remove \n and \t characters. All whitespace is preserved!

Coincidentally, I had to make a modification to the above function
because the code we are testing seems to randomly put in spaces as
well as tabs and carriage returns. My revised version is as follows:

public String getText(String locator, String action) {
    /* Read in the value assigned to the string action */
    if ("clean".equals(action)) {
        /* If the value of action is equal to clean, strip out all \n
         * (newline) and \t (tab) characters, then replace all blocks of
         * multiple spaces with one space, just in case our actions have
         * added any into the returned text stream */
        return commandProcessor.getString("getText", new String[] {locator,})
                .replaceAll("\n", "").replaceAll("\t", "").replaceAll("\\s+", " ");
    } else {
        /* If the value of action is anything else (e.g. "normal",
         * random text, random characters, etc.) return the text collected
         * verbatim */
        return commandProcessor.getString("getText", new String[] {locator,});
    }
}

This one takes all blocks of multiple spaces and replaces them with a
single space. This is done to ensure that we are cleaning up after
ourselves, and multiple spaces are generally ignored by browsers
anyway.

Again very useful for us, but maybe not everybody's cup of tea.
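
To make the effect of that chain concrete, here is a small standalone sketch that applies the same three replaceAll calls to plain strings, with no Selenium involved. `CleanChainDemo`, `clean` and `collapseOnly` are illustrative names only. One thing it shows is that the order matters: in Java's regex syntax `\s` also matches newlines and tabs, so running only the last step would turn a wrapped line break into a space rather than deleting it.

```java
/** Standalone illustration of the replaceAll chain; names are hypothetical. */
final class CleanChainDemo {
    private CleanChainDemo() {}

    /** Same chain as the revised getText: drop \n and \t, then collapse whitespace runs. */
    static String clean(String text) {
        return text.replaceAll("\n", "").replaceAll("\t", "").replaceAll("\\s+", " ");
    }

    /** Only the final step, for comparison: every whitespace run becomes one space. */
    static String collapseOnly(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String wrapped = "first\n\tsecond";
        System.out.println("[" + clean(wrapped) + "]");        // newline and tab deleted
        System.out.println("[" + collapseOnly(wrapped) + "]"); // newline and tab become a space
        System.out.println("[" + clean("first line\n\t\t  second line") + "]");
    }
}
```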



jcmeyrignac

Oct 6, 2009, 9:35:27 AM
to selenium-developers
It's in htmlutils.js:

// Returns the text in this element
function getText(element) {
    var text = getTextContent(element);
    text = normalizeNewlines(text);
    text = normalizeSpaces(text);
    return text.trim();
}

Note that removing double spaces is useless, since normalizeSpaces
does the job properly.
Be careful! getText replaces &nbsp;&nbsp; with two spaces, and your
code collapses these two spaces into one, which is wrong.
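
If preserving deliberate double spaces matters, one possible variation is to touch only those whitespace runs that actually contain a newline, carriage return or tab, and leave runs of plain spaces alone. The sketch below is an assumption about what such a cleaner could look like, not the code from the patch: `WrapAwareCleaner` is a made-up name, and unlike the patch it substitutes a single space for the wrapped run instead of deleting it.

```java
import java.util.regex.Pattern;

/** Hypothetical alternative cleaner; differs from the patch discussed above. */
final class WrapAwareCleaner {
    // Optional leading spaces, then a newline, carriage return or tab,
    // then any further mix of spaces, newlines, carriage returns and tabs.
    private static final Pattern WRAP = Pattern.compile(" *[\\n\\r\\t][ \\n\\r\\t]*");

    private WrapAwareCleaner() {}

    /** Replaces each wrap-induced whitespace run with one space; plain space runs are kept. */
    static String clean(String text) {
        return WRAP.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + clean("long line \n\t\tcontinues") + "]");
        System.out.println("[" + clean("two  spaces stay") + "]");
    }
}
```

Whether replacing with a space or with nothing is right depends on how the markup formatter breaks lines, which the thread does not pin down.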

If you want to remove the trim(), the easiest way is to create a
custom user-extensions.js containing your own getText() function.

JC

Mark Collin

Oct 12, 2009, 1:43:12 PM
to selenium-...@googlegroups.com
I agree that if I wanted to trim the text, normalizeSpaces would be the way to do it, but I don't!

The whole point of my function is to remove \n and \t characters.  The bit of code that removes multiple spaces and replaces them with one space is a way to clean up the remains once the \n and \t have been removed.

The solution may well only be applicable for us, as our devs have added code to align long lines of text into a nice human readable block when you look at the source.  This has resulted in a bunch of \t, \n and spaces getting inserted into blocks of text on the fly so we needed a solution to get rid of them all.  I realise that this can result in two non breaking spaces being converted into one space, but it is the lesser of two evils, especially considering the fact that generally speaking people don't use lots of non breaking spaces within paragraphs of text.

To be very clear, I don't want to trim, I have never wanted to trim (since selenium already does that for me) and my code is not supposed to trim.  My code does what I want it to do properly, it may however not do what others want it to do, but then they are quite free to modify it to meet their own requirements :)

jcmeyrignac

Oct 13, 2009, 8:53:03 AM
to selenium-developers
In your case, why not use a span with a fixed width in pixels, so that
the browser renders it correctly?
Your solution adds a lot of risk, like when using proportional fonts
(try comparing a line containing only 'i's with a line containing
'w's).

Adding \n and \t programmatically seems a lot of work.
And your solution has no value outside of your scope.

In your case, since you just want to remove the \n and \t, you could
just add to user-extensions.js something like:

function getText(element) {
    var text = getTextContent(element);
    text = normalizeNewlines(text);
    text = normalizeSpaces(text);
    text = text.trim();
    // Strip carriage returns, newlines and tabs
    text = text.replace(/[\r\n\t]/g, "");
    return text;
}

Previously, I patched Selenium RC to output XML instead of HTML, but
now, if I had to do it again, I'd avoid modifying Selenium RC by
using an external HTML to XML converter.
Patching the Java code in Selenium forces you to maintain your own
version, and it may become a nightmare to update (it is the case for
me).

JC

Mark Collin

Oct 13, 2009, 9:33:36 AM
to selenium-...@googlegroups.com
Unfortunately I don't code the site, I test it. So I invariably have to
work with what I'm given :)

It's a relatively minor patch, and it's easy enough to just chuck a new
jar into the vendor directory to get it to work.

To be clear, the \n and \t are in the source; they have no effect on
the way the browser renders the source. They are purely there to make
the source more readable when you do a view source. Changing fonts
will have no effect on the way the browser renders the text, as it
ignores the \n and \t anyway.

------------------------------
Mark Collin
Managing Director
Ardesco Solutions Ltd