Cross-browser function for Text content

Rüdiger Plantiko

Apr 12, 2010, 9:05:55 AM

to Prototype & script.aculo.us

Hi there,

is there a cross-browser function for retrieving the text content of
an element?

If I have an element like

I get the number 4711 in IE with $("test").innerText and in FF with
$("test").textContent - does Prototype provide a browser-independent
abstraction for this?

Regards,
Rüdiger

T.J. Crowder

Apr 12, 2010, 1:04:51 PM

to Prototype & script.aculo.us

Hi,

> I get the number 4711 in IE with $("test").innerText and in FF with $
> ("test").textContent - does Prototype provide a browser-independent
> abstraction for this?

Hopefully you get the *string* "4711" rather than the number 4711
(unless you parse it). :-)

`innerHTML` works on all major browsers. It was introduced by IE back
in v5 or so, supported by every major browser, and is now standardized
in the HTML5 stuff. Of course, it returns the HTML rather than the
text, so given:

$('test').innerHTML will return "<em>4711</em>". If you want just the
text with the tags stripped away, Prototype adds `stripTags`[1] to the
`String` prototype, so $('test').innerHTML.stripTags() will return
"4711".

If you wanted to shorten that a bit, you could add a `text` function
via Element.addMethods[2]:

    Element.addMethods({
        text: function(element) {
            if (!(element = $(element))) return;
            return element.innerHTML.stripTags();
        }
    });

(That's off-the-cuff, but I think it's correct; more on the addMethods
page.)

[1] http://api.prototypejs.org/language/string/prototype/striptags/
[2] http://api.prototypejs.org/dom/element/addmethods/

HTH,
--
T.J. Crowder
Independent Software Consultant
tj / crowder software / com
www.crowdersoftware.com

Rüdiger Plantiko

Apr 12, 2010, 3:35:10 PM

to Prototype & script.aculo.us

Hi TJ,

> > I get the number 4711 in IE with $("test").innerText and in FF with $
> > ("test").textContent - does Prototype provide a browser-independent
> > abstraction for this?
>
> Hopefully you get the *string* "4711" rather than the number 4711
> (unless you parse it). :-)

You are right, in a posting every word is important, in order
to avoid misunderstandings. So, yes: I am getting the string, not the
number.

> ... innerHTML ...

Yeah, if the document structure guarantees that the element in
question contains only a text node, then I could use innerHTML
interchangeably with innerText/textContent.

> Element.addMethods({
> text: function(element) {
> if (!(element = $(element))) return;
> return element.innerHTML.stripTags();
> }
>
> });

Thanks for the reference to String.stripTags() - I hadn't realized
that such a function existed.

- Regards,
Rüdiger

Eric

Apr 13, 2010, 5:27:18 AM

to Prototype & script.aculo.us

What I use in this case is:

    $('test').textContent || $('test').innerText

If textContent is defined, it is used; if it isn't, innerText is used.
This would be a lot faster than stripping tags.
The only drawback I can think of is that you may sometimes get
undefined instead of an empty string (both of which evaluate to false
in a boolean context).
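
A minimal sketch of that drawback, and of one way around it. The helper name `getText` is hypothetical (it is not part of Prototype), and plain objects stand in for DOM elements so the snippet runs outside a browser:

```javascript
// Hypothetical helper, not a Prototype API: read whichever text property
// the browser defines, without treating an empty string as "missing".
function getText(element) {
  if (typeof element.textContent === "string") {
    return element.textContent;
  }
  if (typeof element.innerText === "string") {
    return element.innerText;
  }
  return "";
}

// Stand-in for an empty element in a browser that only has textContent.
var emptyElement = { textContent: "" };

// "" is falsy, so the || expression falls through to the missing innerText.
var viaOr = emptyElement.textContent || emptyElement.innerText;

// The typeof checks keep the empty string.
var viaHelper = getText(emptyElement);
```

Here `viaOr` ends up `undefined` while `viaHelper` is the empty string. Whether that difference matters depends on what the caller does with the result.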

Eric


Eric

Apr 13, 2010, 5:39:13 AM

to Prototype & script.aculo.us

Oooops, gmail sent the message before I finished... :o)

Here is the correct message (please ignore the previous one)

On Apr 12, 7:04 pm, "T.J. Crowder" <t...@crowdersoftware.com> wrote:

> Element.addMethods({
> text: function(element) {
> if (!(element = $(element))) return;
> return element.innerHTML.stripTags();
> }

> });

wouldn't it be wiser to check for the native method once and use it?

Something like (untested)

    Element.addMethods({
        text: ($$('BODY').first().textContent === undefined)
            ? function(element) {
                  if (!(element = $(element))) return;
                  return element.innerText;
              }
            : function(element) {
                  if (!(element = $(element))) return;
                  return element.textContent;
              }
    });

Eric

NB: I know, the testing condition is ugly... feel free to post a
better one :o)

T.J. Crowder

Apr 13, 2010, 8:20:59 AM

to Prototype & script.aculo.us

On Apr 13, 10:39 am, Eric <lefauv...@gmail.com> wrote:
> wouldn't it be wiser to check for the native method once and use it?

Probably. I'd also check for innerText (in fact, I'd check for that
first), since it's supported by IE, WebKit (so Chrome, Safari), and
Opera; only Mozilla holds out. textContent is supported by all of them
except IE. So:

    Element.addMethods((function() {

        return {
            /**
             * Element.text() -> String
             *
             * Gets the text within the element, ignoring any tags
             * (essentially the sum of all of the text nodes within).
             **/
            text: (function() {
                var element, testvalue;

                element = document.createElement("span");
                element.innerHTML = testvalue = "foo";
                if (text_fromInnerText(element) == testvalue) {
                    return text_fromInnerText;
                }
                if (text_fromTextContent(element) == testvalue) {
                    return text_fromTextContent;
                }
                return text_fromStripping;
            })()
        };

        // Get the element's inner text via innerText if available
        // (IE, WebKit, Opera, ...)
        function text_fromInnerText(element) {
            if (!(element = $(element))) return;
            return element.innerText;
        }

        // Get the element's inner text via textContent if available
        // (Gecko, WebKit, Opera, ...)
        function text_fromTextContent(element) {
            if (!(element = $(element))) return;
            return element.textContent;
        }

        // Get the element's inner text by getting innerHTML and
        // stripping tags (fallback)
        function text_fromStripping(element) {
            if (!(element = $(element))) return;
            return element.innerHTML.stripTags();
        }

    })());

Do people think I should submit this to core? jQuery has an equivalent
function, and I think I saw one in Closure as well. So it's not just
the OP who wants to do this...

-- T.J. :-)

kangax

Apr 13, 2010, 9:42:05 AM

to Prototype & script.aculo.us

We've been getting these requests in the past. Take a look at, for
example: <URL:
http://groups.google.com/group/prototype-core/browse_thread/thread/8ef26e7cedb43afc/47033b4bc8dc4c74#47033b4bc8dc4c74>

I still think that it's not a trivial solution (for the reasons
outlined in the post linked above) and so is best handled by a
standalone plugin. And using context-unaware `stripTags` on something
like HTML is usually asking for trouble :) (imagine what stripTags
would do to a string like this: "foo bar <script>function wrap(html)
{ return '<div>' + html + '</div>'}</script> baz"; and then there are
other elements with a CDATA content model, like STYLE)
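
To make that concrete, here is a rough sketch using a simplified regex-based stripper. `naiveStripTags` is a hypothetical stand-in, not Prototype's actual `stripTags` implementation, and the input assumes the script body builds a `<div>` wrapper:

```javascript
// Simplified stand-in for a context-unaware tag stripper (an assumption
// for illustration; Prototype's real stripTags uses a different regex).
function naiveStripTags(html) {
  return html.replace(/<\/?[^>]+>/g, "");
}

var html = "foo bar <script>function wrap(html) " +
           "{ return '<div>' + html + '</div>'}</script> baz";

// Every "<...>" run is removed, including the ones inside the script's
// string literals, while the script source itself stays in the output.
var stripped = naiveStripTags(html);
// stripped === "foo bar function wrap(html) { return '' + html + ''} baz"
```

So the result is neither the visible text ("foo bar  baz") nor the original script source. The stripper has no notion of where markup ends and script content begins.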

--
kangax

T.J. Crowder

Apr 13, 2010, 11:29:11 AM

to Prototype & script.aculo.us

Hi,

On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
> We've been getting these requests in the past. Take a look at, for

> example: <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>

>
> I still think that it's not a trivial solution (for the reasons

> outlined in the post linked above)...

"Oh my sweet Lord in heaven!" he exclaimed, after reading the
stackoverflow answer linked from the above and seeing all of the
myriad inconsistencies.

Blech. That's useful to know, thanks.

jQuery seems to go with the collect-all-text-nodes answer, completely
ignoring innerText and textContent, presumably for these very reasons.
It also fails to strip the content of script tags (just as FF's
textContent does), which seems odd (it doesn't look very optimized,
either, but perhaps it's fast enough without).

But I would argue that these reasons are exactly why Prototype should
have this feature. This is Prototype's raison d'etre, smoothing out
various browser differences (and outright insanities, such as
including script contents!).

I coded up a simple text node gatherer[1] that omits the contents of
script elements, and the performance isn't bad at all. Even using the
slowest major browser, it happily gave me the 12k of text content in a
moderately-complex page (various menus and controls, plus a 580-row 3-
column table containing links) in about a third of a second on my
little Atom-class netbook.

I created a bookmarklet[2] of it that reports character count, time,
and such and ran it against a large, complex document (the current all-
in-one-page HTML5 specification[3]) using Chrome, which gave me all
2,090,693 characters spread across 86,018 elements (I didn't count all
nodes, just elements) in just under two seconds (again on the
netbook). Firefox did the same in just under three seconds, and IE7
(after taking several *minutes* -- and several script errors -- just
to load the document) ran the bookmarklet in 12.5 seconds. Pretty
decent for IE. :-) The character counts were identical between Chrome
and Firefox; IE saw slightly fewer characters (1,891,293) and elements
(85,972), but that could have been down to the script errors. Firefox
reported one fewer element than Chrome.

I haven't particularly tested or optimized that code, it's just a
starting point. It builds things up in an array and uses #join at the
end, which is probably slower for small tasks than jQuery's approach
(string concatenation), but probably faster for large tasks (like the
HTML spec). I say "probably" in each case because I haven't tested,
and I've learned not to make performance assertions without data. :-)

[1] http://pastie.org/917566 (also quoted inline below)
[2] http://pastie.org/917567
[3] http://www.w3.org/TR/html5/Overview.html (warning: *LARGE*
document)

Code from [1] pasted inline:
* * * *
    Element.addMethods((function() {

        /**
         * Element.textValue() -> String
         *
         * Gets the text within the element, ignoring any tags; i.e.,
         * returns the sum of all of the text nodes. Omits the text
         * nodes within `script` elements.
         **/
        function textValue(element) {
            if (!(element = $(element))) return;

            var collector = [];
            textValueCollector(element, collector);
            return collector.join("");
        }

        function textValueCollector(element, collector) {
            var node;

            for (node = element.firstChild; node; node = node.nextSibling) {
                switch (node.nodeType) {
                    case 3: // text
                    case 4: // cdata
                        collector.push(node.nodeValue);
                        break;
                    case 8: // comment
                        break;
                    case 1: // element
                        if (node.tagName == 'SCRIPT') {
                            break;
                        }
                        // FALL THROUGH TO DEFAULT
                    default:
                        // Descend
                        textValueCollector(node, collector);
                        break;
                }
            }
        }

        return {textValue: textValue};
    })());
* * * *

-- T.J. :-)

T.J. Crowder

Apr 15, 2010, 9:35:41 AM

to Prototype & script.aculo.us

Interestingly, Firefox's `textContent` behavior of including the
script element's contents (which I called "insanity") is *standard* as
far as I can tell -- and has been for years:

http://www.w3.org/TR/html5/infrastructure.html#textcontent
http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

Surprisingly, IE8 still doesn't support it, but even if it did,
frankly it doesn't do what _I'd_ want.
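
A small sketch of the difference, using plain objects as stand-ins for DOM nodes so it runs without a browser. The helper names (`text`, `comment`, `element`, `collectText`) are made up for this illustration, and the "standard-like" mode only approximates the textContent definition (all descendant text, comments excluded):

```javascript
// Minimal stand-ins for DOM nodes; only the fields the walker reads.
function text(value) { return { nodeType: 3, nodeValue: value, childNodes: [] }; }
function comment(value) { return { nodeType: 8, nodeValue: value, childNodes: [] }; }
function element(tagName, childNodes) {
  return { nodeType: 1, tagName: tagName, childNodes: childNodes };
}

// Concatenate descendant text. With skipScripts === false this roughly
// mirrors textContent; with true it leaves out SCRIPT contents.
function collectText(node, skipScripts) {
  var out = [];
  (function walk(current) {
    for (var i = 0; i < current.childNodes.length; i++) {
      var child = current.childNodes[i];
      if (child.nodeType === 3 || child.nodeType === 4) {
        out.push(child.nodeValue);          // text or CDATA
      } else if (child.nodeType === 1) {
        if (skipScripts && child.tagName === "SCRIPT") continue;
        walk(child);                        // descend into elements
      }
      // comments (8) and other node types contribute nothing
    }
  })(node);
  return out.join("");
}

var tree = element("DIV", [
  text("foo "),
  element("SCRIPT", [text("var x = 1;")]),
  comment("ignored"),
  element("EM", [text("bar")])
]);

var standardLike = collectText(tree, false);  // "foo var x = 1;bar"
var withoutScripts = collectText(tree, true); // "foo bar"
```

The second result is the behavior I'd want from a text accessor. The first is what the standard describes.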

-- T.J.

Rüdiger Plantiko

Apr 21, 2010, 10:38:11 AM

to Prototype & script.aculo.us

Hi Eric and TJ,

thanks for your further research in this matter!

> Surprisingly, IE8 still doesn't support it, but even if it did,
> frankly it doesn't do what _I'd_ want.

Does IE8 claim to be HTML5 compliant?

> $('test').textContent || $('test').innerText

> The only drawback I can think of is that sometime you may get an
> undefined instead of an empty string (which would be equivalent to
> false).

If somebody really is annoyed by this drawback, he could write

$('test').textContent || $('test').innerText || ""

This solution is fine if you have only a handful of text content
evaluations, as will usually be the case. But anyway, a
browser-independent abstraction would be nice, so that this construct
appears in only one place in the code. This is why I posted the
question.

As you point out in your next posting, an optimization would be to
check only once whether the property "HTMLElement.textContent" is
implemented, instead of with each text content evaluation.

TJ, thanks for the solution that traverses the DOM tree with the nice
recursive textValueCollector() function. In the 90% case, only one
descent will be necessary. This makes your solution even better: it
works with satisfactory performance for complex text contents, and it
returns quickly in the 90% case.

Thanks and regards,
Rüdiger

T.J. Crowder

Apr 21, 2010, 11:22:09 AM

to Prototype & script.aculo.us

Hi,

On Apr 21, 3:38 pm, Rüdiger Plantiko <ruediger.plant...@astrotexte.ch>
wrote:

> > Surprisingly, IE8 still doesn't support it, but even if it did,
> > frankly it doesn't do what _I'd_ want.
>
> Does IE8 claim to be HTML5 compliant?

I doubt it (it would be quite a trick as the HTML5 working group
hasn't even stopped accepting new proposals yet), but the second[1] of
my textContent references was from six years ago. Microsoft's brought
out two major revisions since then, and yet...

> If somebody really is annoyed by this drawback, he could write
>
> $('test').textContent || $('test').innerText || ""

See Yuriy's links for why those aren't the same thing. :-)

> TJ, thanks for the solution propagating the DOM tree with the nice
> recursive textValueCollector() function. In the 90% case, only one
> descent will be necessary. This makes your solution even better: It
> works with satisfactory performance for complex text contents, and it
> returns quickly in the 90% case.

Thanks. Yeah, actually, it's really quick even in the more complex
case, too. I was surprised and pleased by that. :-)

[1] http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent

--
T.J. Crowder
Independent Software Consultant
tj / crowder software / com
www.crowdersoftware.com
