Capturing text between elements


robbiewinston

Sep 7, 2012, 8:44:18 AM
to seleniu...@googlegroups.com
Hi,

I have the following snippet of HTML:

<p>
abc
<br>
def
<br>
ghi
<br>
jkl
<br>
mno
<br>
pqr
<br>
stu
<br>
vwx
<br>
yz
<br>
</p>


I want to capture the text between the <br> tags as separate elements so that I can prove the string has been broken up correctly.

I can use this XPath to identify the text items: //div[@class='content']//div[@id='offer_terms']/p/br/preceding-sibling::text()

However, I cannot use this in FindElements, as it throws this error:

findElements execution failed;
 The result of the xpath expression "//div[@class='content']//div[@id='offer_terms']/p/br/preceding-sibling::text()" is: [object Text]. It should be an element. (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 25 milliseconds
For documentation on this error, please visit: http://seleniumhq.org/exceptions/invalid_selector_exception.html
Build info: version: '2.25.0', revision: '17482', time: '2012-07-18 21:08:56'
System info: os.name: 'Windows 7', os.arch: 'amd64', os.version: '6.1', java.version: '1.7.0_07'
Driver info: driver.version: EventFiringWebDriver
Session ID: 9c3726770e6dcfd11a32e8c529fa3bbf

I understand why the error is thrown, but I cannot think of any way to grab this text and assert that the <br> tags are in the right place.

Thanks 

Robbie

robbiewinston

Sep 7, 2012, 8:47:34 AM
to seleniu...@googlegroups.com
Formatting problems in my post: the <br> tags are actually <br /> in the real HTML.

Edwolb

Sep 7, 2012, 8:50:11 AM
to seleniu...@googlegroups.com
Since Selenium is meant to be a tool that replicates a user's web browser experience, I don't believe there are easy and native ways to pick out things that are delimited by tags that are somewhat invisible to a user (other than a line break).  There are probably two ways of doing this:

1. Get the <p> element, call getText() on it, and then analyze the text based on new lines.  The BRs won't show up, but the text should have new lines that represent what the user sees.  You could validate it using a regular expression of some sort.  I'll be honest though, I'm not sure if a BR translates to a newline through a getText() call.

2. Use the JavaScript executor to grab the source of the element, and analyze it that way.
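
As a rough illustration of the "regular expression of some sort" in option 1, here is a minimal sketch in Java. It assumes each line is one to three lowercase letters, as in the sample HTML, and that the text has already been fetched with getText(). The class and method names (LineFormatCheck, isBrokenUpCorrectly) are made up for this example and are not part of Selenium.

```java
import java.util.regex.Pattern;

class LineFormatCheck {
    // \R matches any line break (\n, \r\n, ...), available since Java 8.
    // Assumption: every line is 1 to 3 lowercase letters, as in the sample.
    private static final Pattern LINES =
            Pattern.compile("[a-z]{1,3}(\\R[a-z]{1,3})*");

    // Returns true only if the whole text is short lines separated by
    // exactly one line break each.
    static boolean isBrokenUpCorrectly(String text) {
        return LINES.matcher(text.trim()).matches();
    }

    public static void main(String[] args) {
        // Hard-coded stand-ins for the value returned by getText().
        System.out.println(isBrokenUpCorrectly("abc\ndef\nyz"));
        System.out.println(isBrokenUpCorrectly("abcdef"));
    }
}
```

The pattern would need adjusting to whatever the real line content looks like; the point is only that a missing or doubled break makes the match fail.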

--
Chris

robbiewinston

Sep 7, 2012, 8:50:29 AM
to seleniu...@googlegroups.com
Okay, so I can get the text of the <p> and split on \r\n, but I was wondering whether there is a prettier solution?

Edwolb

Sep 7, 2012, 8:56:32 AM
to seleniu...@googlegroups.com
From a user experience point of view, that's what you'd be validating.  "Is each set of text on a new line", not "Are all the <br> tags in the right place".  So I'd say that solution is a little prettier than a tag analysis :)  I think the problem is that "text" itself isn't considered an element, so if you're hoping to get a list of elements, some that are text and some that represent BR tags, I'm not sure that'll be possible.  If you're not using WebElements, you're not using Selenium's native capabilities, so you're looking for a workaround, which perhaps isn't pretty, but it should do the job :)

--
Chris

Peter Gale

Sep 7, 2012, 9:24:31 AM
to Selenium Users
Does the design require the text to be split into separate lines?

If not, you couldn't raise a bug if they weren't on separate lines, so there'd be no point in running the tests anyway.

If it does, you could ask the developers to help facilitate automated testing against the design by wrapping each line in a non-formatting pair of tags (perhaps 'spans'?).

They'd still need to leave a <br> tag in before each subsequent span, but you'd then be able to validate each line separately, as well as check for a line break in between, using WebDriver.



sirus tula

Sep 7, 2012, 9:36:07 AM
to seleniu...@googlegroups.com
Hi robbie,
 
I had a similar problem when I had to assert text that was broken into several lines using <br>. I used the approach shown below, with :\r\n for the broken-up lines.
 

string desc = driver.FindElement(By.XPath("//div[@id='<uniquedivnamebeforethatp>']")).Text;

Assert.AreEqual(desc, "abc:\r\ndef"\r\ngh");

Hope that helps.

Sirus

 



sirus tula

Sep 7, 2012, 9:50:53 AM
to seleniu...@googlegroups.com

Typo
 
Assert.AreEqual(desc, "abc:\r\ndef\r\ngh");
 

robbiewinston

Sep 7, 2012, 11:41:24 AM
to seleniu...@googlegroups.com
Thanks, looks like I'm going to have to go with splitting on the \r\n.

Darrell Grainger

Sep 7, 2012, 3:46:28 PM
to seleniu...@googlegroups.com
The text is not part of the BR tags. So anything like finding all the BR tags then looking for the text will not work. Getting the P element then splitting based on return characters would be the most elegant solution. If you did:

    WebElement p = driver.findElement(By.cssSelector("div#offer_terms>p"));
    String text = p.getText();

you could then use the split method of the String class to split it into an array of strings, and loop through the array checking each string.
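
A minimal sketch of that split-and-check step, in plain Java so it runs without a browser. The input string stands in for the value of p.getText(). Splitting on an optional \r followed by \n is an assumption to cover drivers that return either line ending, and the names LineSplitter and splitLines are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineSplitter {
    // Splits on \n or \r\n and trims each piece, so the comparison does not
    // depend on which line terminator the driver happens to return.
    static List<String> splitLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String line : text.trim().split("\\r?\\n")) {
            lines.add(line.trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for p.getText(); in a real test this comes from the WebElement.
        String text = "abc\ndef\nghi";
        List<String> expected = Arrays.asList("abc", "def", "ghi");
        List<String> actual = splitLines(text);
        if (!actual.equals(expected)) {
            throw new AssertionError("Expected " + expected + " but got " + actual);
        }
    }
}
```

Comparing the whole list at once reports a misplaced or missing break as a single readable failure, rather than one assertion per line.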

Darrell



David

Sep 8, 2012, 11:24:01 PM
to seleniu...@googlegroups.com
Just wanted to mention that if you want to check the BR tags along with the text, rather than the formatted equivalent of newlines, you might be able to do so by using getAttribute("innerHTML") instead of getText(), which preserves the source inside the p tags.
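
Once the innerHTML string is in hand, one possible way to break it up on the tags themselves is sketched below in plain Java. The regular expression is an assumption intended to cover <br>, <br/> and <br /> in either case, and BrSplitter and splitOnBr are hypothetical names for this example. Real markup with attributes on the tag or nested elements would need a proper HTML parser instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class BrSplitter {
    // Matches <br>, <br/> and <br />, case-insensitively.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");

    // Returns the non-empty text pieces found between <br> tags.
    // Note: empty pieces are dropped, so two consecutive <br> tags are
    // not distinguished from one.
    static List<String> splitOnBr(String innerHtml) {
        List<String> parts = new ArrayList<>();
        for (String piece : BR.split(innerHtml)) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Stand-in for p.getAttribute("innerHTML").
        String innerHtml = "\nabc\n<br />\ndef\n<br />\nghi\n<br />\n";
        System.out.println(splitOnBr(innerHtml));
    }
}
```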

Nitya Jakkam

Aug 29, 2017, 11:49:37 PM
to Selenium Users
I tried xpath, tagname and cssSelectors with .getText() and .toString(). None of them worked. getAttribute("innerHTML") worked, finally! Thanks David!