Is there a cross-browser function for retrieving the text content of
an element?
If I have an element like
<span id="test">4711</span>
I get the number 4711 in IE with $("test").innerText and in FF with
$("test").textContent - does Prototype provide a browser-independent
abstraction for this?
Regards,
Rüdiger
> I get the number 4711 in IE with $("test").innerText and in FF with $
> ("test").textContent - does Prototype provide a browser-independent
> abstraction for this?
Hopefully you get the *string* "4711" rather than the number 4711
(unless you parse it). :-)
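To illustrate the distinction, here is a tiny sketch; the variable names are made up for this example, and the string literal stands in for whatever innerText/textContent would hand back:

```javascript
// The DOM hands back text, so the value is a string until it is parsed.
var raw = "4711";                // stand-in for the element's text
var parsed = parseInt(raw, 10);  // explicit base-10 parse yields a number

console.log(typeof raw);    // "string"
console.log(typeof parsed); // "number"
```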
`innerHTML` works on all major browsers. It was introduced by IE back
in v5 or so, supported by every major browser, and is now standardized
in the HTML5 stuff. Of course, it returns the HTML rather than the
text, so given:
<span id="test"><em>4711</em></span>
$('test').innerHTML will return "<em>4711</em>". If you want just the
text with the tags stripped away, Prototype adds `stripTags`[1] to the
`String` prototype, so $('test').innerHTML.stripTags() will return
"4711".
If you wanted to shorten that a bit, you could add a `text` function
via Element.addMethods[2]:
Element.addMethods({
  text: function(element) {
    if (!(element = $(element))) return;
    return element.innerHTML.stripTags();
  }
});
(That's off-the-cuff, but I think it's correct; more on the addMethods
page.)
[1] http://api.prototypejs.org/language/string/prototype/striptags/
[2] http://api.prototypejs.org/dom/element/addmethods/
HTH,
--
T.J. Crowder
Independent Software Consultant
tj / crowder software / com
www.crowdersoftware.com
On Apr 12, 2:05 pm, Rüdiger Plantiko <ruediger.plant...@astrotexte.ch>
wrote:
> > I get the number 4711 in IE with $("test").innerText and in FF with $
> > ("test").textContent - does Prototype provide a browser-independent
> > abstraction for this?
>
> Hopefully you get the *string* "4711" rather than the number 4711
> (unless you parse it). :-)
You are right, in a posting every word is important, in order
to avoid misunderstandings. So, yes: I am getting the string, not the
number.
> ... innerHTML ...
Yeah, if the document structure guarantees that the element in
question contains only a text node, then I could use innerHTML
interchangeably with innerText/textContent.
> Element.addMethods({
> text: function(element) {
> if (!(element = $(element))) return;
> return element.innerHTML.stripTags();
> }
>
> });
Thanks for the reference to String.stripTags() - I wasn't aware that
such a function existed.
- Regards,
Rüdiger
If textContent is defined, it is used. If it isn't, innerText is used.
This would be a lot faster than stripping tags.
The only drawback I can think of is that sometimes you may get
undefined instead of an empty string (both are falsy, so a simple
truthiness check treats them the same).
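One way to sidestep the undefined-versus-empty-string issue is to normalise the result. The following is only a sketch: `elementText` is a hypothetical helper, not part of Prototype, and plain objects stand in for DOM elements so it can run outside a browser.

```javascript
// Hypothetical helper (not part of Prototype): prefer textContent,
// fall back to innerText, and normalise a missing value to "".
function elementText(element) {
  if (!element) return "";
  var value = (typeof element.textContent === "string")
    ? element.textContent
    : element.innerText;
  return (typeof value === "string") ? value : "";
}

// Plain objects stand in for DOM elements (an assumption for this demo).
var withTextContent = { textContent: "4711" }; // exposes textContent
var withInnerText   = { innerText: "4711" };   // exposes innerText only
var withNeither     = {};                      // exposes neither property

console.log(elementText(withTextContent));    // "4711"
console.log(elementText(withInnerText));      // "4711"
console.log(elementText(withNeither) === ""); // true
```

Callers then always receive a string, at the cost of no longer being able to tell "no text" apart from "property unsupported".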
Eric
On Apr 12, 9:35 pm, Rüdiger Plantiko <ruediger.plant...@astrotexte.ch>
wrote:
Here is the correct message (please ignore the previous one)
On Apr 12, 7:04 pm, "T.J. Crowder" <t...@crowdersoftware.com> wrote:
> Element.addMethods({
> text: function(element) {
> if (!(element = $(element))) return;
> return element.innerHTML.stripTags();
> }
> });
wouldn't it be wiser to check for the native method once and use it?
Something like (untested)
Element.addMethods({
  text: ($$('BODY').first().textContent === undefined)
    ? function(element) {
        if (!(element = $(element))) return;
        return element.innerText;
      }
    : function(element) {
        if (!(element = $(element))) return;
        return element.textContent;
      }
});
Eric
NB: I know, the testing condition is ugly... feel free to post a
better one :o)
Probably. I'd also check for innerText (in fact, I'd check for that
first), since it's supported by IE, WebKit (so Chrome, Safari), and
Opera; only Mozilla holds out. textContent is supported by all of them
except IE. So:
Element.addMethods((function() {
  return {
    /**
     * Element.text() -> String
     *
     * Gets the text within the element, ignoring any tags
     * (essentially the sum of all of the text nodes within).
     **/
    text: (function() {
      var element, testvalue;
      element = document.createElement("span");
      element.innerHTML = testvalue = "foo";
      if (text_fromInnerText(element) == testvalue) {
        return text_fromInnerText;
      }
      if (text_fromTextContent(element) == testvalue) {
        return text_fromTextContent;
      }
      return text_fromStripping;
    })()
  };

  // Get the element's inner text via innerText if available
  // (IE, WebKit, Opera, ...)
  function text_fromInnerText(element) {
    if (!(element = $(element))) return;
    return element.innerText;
  }

  // Get the element's inner text via textContent if available
  // (Gecko, WebKit, Opera, ...)
  function text_fromTextContent(element) {
    if (!(element = $(element))) return;
    return element.textContent;
  }

  // Get the element's inner text by getting innerHTML and stripping
  // tags (fallback)
  function text_fromStripping(element) {
    if (!(element = $(element))) return;
    return element.innerHTML.stripTags();
  }
})());
Do people think I should submit this to core? jQuery has an equivalent
function, and I think I saw one in Closure as well. So it's not just
the OP who wants to do this...
-- T.J. :-)
We've been getting these requests in the past. Take a look at, for
example: <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
I still think that it's not a trivial solution (for the reasons
outlined in the post linked above) and so is best handled by a
standalone plugin. And using context-unaware `stripTags` on something
like HTML is usually asking for trouble :) (imagine what stripTags
would do to a string like this — "foo bar <script>function wrap(html)
{ return 'div' + html + '</div>'}</script> baz"; and then there are
other elements with CDATA content model, like STYLE)
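To make that concrete, here is a rough sketch using a simplified, context-unaware stripper. The regex in `naiveStripTags` is an assumption chosen for illustration; it is NOT Prototype's actual `stripTags` pattern, though any approach that only looks for tag-shaped substrings runs into the same kind of problem.

```javascript
// Simplified stand-in for a context-unaware tag stripper. This regex is
// an assumption for illustration, NOT Prototype's actual stripTags pattern.
function naiveStripTags(html) {
  return html.replace(/<\/?[^>]+>/g, "");
}

var html = "foo bar <script>function wrap(html) " +
           "{ return 'div' + html + '</div>'}</script> baz";

var stripped = naiveStripTags(html);
console.log(stripped);
// foo bar function wrap(html) { return 'div' + html + ''} baz
```

The script source ends up in the "text", and the string literal inside it is mangled because the stripper cannot tell markup from script content.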
--
kangax
On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
> We've been getting these requests in the past. Take a look at, for
> example: <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
>
> I still think that it's not a trivial solution (for the reasons
> outlined in the post linked above)...
"Oh my sweet Lord in heaven!" he exclaimed, after reading the
stackoverflow answer linked from the above and seeing all of the
myriad inconsistencies.
Blech. That's useful to know, thanks.
jQuery seems to go with the collect-all-text-nodes answer, completely
ignoring innerText and textContent, presumably for these very reasons.
Like FF's textContent, it also fails to strip the content of script
tags, which seems odd (doesn't look very optimized, either, but
perhaps it's fast enough without).
But I would argue that these reasons are exactly why Prototype should
have this feature. This is Prototype's raison d'etre, smoothing out
various browser differences (and outright insanities, such as
including script contents!).
I coded up a simple text node gatherer[1] that omits the contents of
script elements, and the performance isn't bad at all. Even using the
slowest major browser, it happily gave me the 12k of text content in a
moderately-complex page (various menus and controls, plus a 580-row
3-column table containing links) in about a third of a second on my
little Atom-class netbook.
I created a bookmarklet[2] of it that reports character count, time,
and such and ran it against a large, complex document (the current
all-in-one-page HTML5 specification[3]) using Chrome, which gave me all
2,090,693 characters spread across 86,018 elements (I didn't count all
nodes, just elements) in just under two seconds (again on the
netbook). Firefox did the same in just under three seconds, and IE7
(after taking several *minutes* -- and several script errors -- just
to load the document) ran the bookmarklet in 12.5 seconds. Pretty
decent for IE. :-) The character counts were identical between Chrome
and Firefox; IE saw slightly fewer characters (1,891,293) and elements
(85,972), but that could have been down to the script errors. Firefox
reported one fewer element than Chrome.
I haven't particularly tested or optimized that code, it's just a
starting point. It builds things up in an array and uses #join at the
end, which is probably slower for small tasks than jQuery's approach
(string concatenation), but probably faster for large tasks (like the
HTML spec). I say "probably" in each case because I haven't tested,
and I've learned not to make performance assertions without data. :-)
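For anyone who wants to gather that data, a rough sketch of the comparison follows. The function names and the workload size are made up for this example, and it makes no claim about which approach wins; results will vary by browser and input size.

```javascript
// Two ways to assemble many text fragments; both yield the same string.
function buildByConcat(pieces) {
  var result = "";
  for (var i = 0; i < pieces.length; i++) {
    result += pieces[i];
  }
  return result;
}

function buildByJoin(pieces) {
  var collector = [];
  for (var i = 0; i < pieces.length; i++) {
    collector.push(pieces[i]);
  }
  return collector.join("");
}

// Hypothetical workload: 100,000 short fragments.
var pieces = [];
for (var n = 0; n < 100000; n++) {
  pieces.push("text" + n);
}

// Run one builder and report elapsed wall-clock time in milliseconds.
function timeIt(label, fn) {
  var start = new Date().getTime();
  var out = fn(pieces);
  var elapsed = new Date().getTime() - start;
  console.log(label + ": " + elapsed + " ms, " + out.length + " chars");
  return out;
}

var viaConcat = timeIt("concat", buildByConcat);
var viaJoin = timeIt("join", buildByJoin);
console.log(viaConcat === viaJoin); // true
```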
[1] http://pastie.org/917566 (also quoted inline below)
[2] http://pastie.org/917567
[3] http://www.w3.org/TR/html5/Overview.html (warning: *LARGE*
document)
Code from [1] pasted inline:
* * * *
Element.addMethods((function() {
  /**
   * Element.textValue() -> String
   *
   * Gets the text within the element, ignoring any tags; i.e.,
   * returns the sum of all of the text nodes. Omits the text nodes
   * within `script` elements.
   **/
  function textValue(element) {
    if (!(element = $(element))) return;
    var collector = [];
    textValueCollector(element, collector);
    return collector.join("");
  }

  // Recursively pushes the text of every text/CDATA node under
  // `element` onto `collector`, in document order.
  function textValueCollector(element, collector) {
    var node;
    for (node = element.firstChild; node; node = node.nextSibling) {
      switch (node.nodeType) {
        case 3: // text
        case 4: // cdata
          collector.push(node.nodeValue);
          break;
        case 8: // comment
          break;
        case 1: // element
          if (node.tagName == 'SCRIPT') {
            break;
          }
          // FALL THROUGH TO DEFAULT
        default:
          // Descend
          textValueCollector(node, collector);
          break;
      }
    }
  }

  return {textValue: textValue};
})());
* * * *
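To try the traversal outside a browser, here is a sketch that applies the same idea to hand-built node-like objects. `makeText`, `makeElement` and `collectText` are hypothetical names for this demo only; they are not Prototype or DOM APIs, and the objects expose just the properties the walk reads.

```javascript
// Hypothetical helpers building node-like objects that expose only
// nodeType, nodeValue, tagName, firstChild and nextSibling.
function makeText(value) {
  return { nodeType: 3, nodeValue: value, firstChild: null, nextSibling: null };
}

function makeElement(tagName, children) {
  var el = { nodeType: 1, tagName: tagName, firstChild: null, nextSibling: null };
  for (var i = 0; i < children.length; i++) {
    if (i === 0) el.firstChild = children[i];
    else children[i - 1].nextSibling = children[i];
  }
  return el;
}

// Same idea as the collector: gather text and CDATA nodes, skip
// comments and SCRIPT elements, and descend into everything else.
function collectText(node, collector) {
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType === 3 || child.nodeType === 4) {
      collector.push(child.nodeValue);
    } else if (child.nodeType === 8 ||
               (child.nodeType === 1 && child.tagName === "SCRIPT")) {
      continue;
    } else {
      collectText(child, collector);
    }
  }
  return collector;
}

// Mimics: <span>foo <em>bar</em><script>var x = 1;</script> baz</span>
var tree = makeElement("SPAN", [
  makeText("foo "),
  makeElement("EM", [makeText("bar")]),
  makeElement("SCRIPT", [makeText("var x = 1;")]),
  makeText(" baz")
]);

var text = collectText(tree, []).join("");
console.log(text); // "foo bar baz"
```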
-- T.J. :-)
On Apr 13, 2:42 pm, kangax <kan...@gmail.com> wrote:
> We've been getting these requests in the past. Take a look at, for
> example: <URL:http://groups.google.com/group/prototype-core/browse_thread/thread/8e...>
http://www.w3.org/TR/html5/infrastructure.html#textcontent
http://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
Surprisingly, IE8 still doesn't support it, but even if it did,
frankly it doesn't do what _I'd_ want.
-- T.J.
> [1] http://pastie.org/917566 (also quoted inline below)
> [2] http://pastie.org/917567
> [3] http://www.w3.org/TR/html5/Overview.html (warning: *LARGE*