If you're wondering why I need it: I'm scripting an
InternetExplorer.Application object and parsing the text on a page.
It's not an InternetExplorer.Application question though.
It's a "how do you get all the text between body tags" question.
Thanks.
Use something like:
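function getText(el) {
  // Prefer the W3C DOM property, fall back to IE's proprietary one.
  if (typeof el.textContent == 'string') return el.textContent;
  if (typeof el.innerText == 'string') return el.innerText;
  return ''; // neither property is supported
}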
Then call it with:
alert(getText(document.body));
Note that it will also return the content of script elements in some
browsers. Also, some browsers support both innerText and textContent
and may return a different value for each.
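If the content of script elements is unwanted, one option is to bypass both properties and collect text nodes yourself, skipping SCRIPT subtrees. The sketch below is only an illustration under assumptions: the name `getTextWithoutScripts` is hypothetical, skipping STYLE as well is my own choice, and the small hand-built objects stand in for real DOM nodes so the code can run outside a browser. It relies only on the standard node properties `nodeType`, `nodeName`, `nodeValue`, `firstChild` and `nextSibling`.

```javascript
// Hypothetical helper: concatenates all text-node values under `el`,
// skipping the contents of SCRIPT (and, by assumption, STYLE) elements.
function getTextWithoutScripts(el) {
  var skip = { SCRIPT: true, STYLE: true };
  var parts = [];
  function walk(node) {
    for (var child = node.firstChild; child; child = child.nextSibling) {
      if (child.nodeType == 3) {                 // text node
        parts.push(child.nodeValue);
      } else if (child.nodeType == 1 &&          // element node
                 !skip[child.nodeName.toUpperCase()]) {
        walk(child);
      }
    }
  }
  walk(el);
  return parts.join('');
}

// Minimal stand-ins for DOM nodes (in a browser, pass document.body instead).
function text(value) {
  return { nodeType: 3, nodeValue: value, nextSibling: null };
}
function elem(name, children) {
  var node = { nodeType: 1, nodeName: name,
               firstChild: children[0] || null, nextSibling: null };
  for (var i = 0; i < children.length - 1; i++) {
    children[i].nextSibling = children[i + 1];
  }
  return node;
}

var body = elem('BODY', [
  text('Hello, '),
  elem('SCRIPT', [text('var x = 1;')]),
  elem('B', [text('world')])
]);
console.log(getTextWithoutScripts(body)); // prints "Hello, world"
```

Comment nodes are ignored as well, since only element nodes are descended into.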
--
Rob
You want
document.body.innerHTML
or
document.body.innerText
depending on whether you want the text only (innerText) or the markup
(as serialized by innerHTML).
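To make the distinction concrete, the sketch below computes both views of the same tiny, hand-built tree: a markup string roughly in the spirit of innerHTML and a text-only string roughly in the spirit of innerText. It is a deliberate simplification: the node shape `{ text }` / `{ tag, children }` is hypothetical, and real serialization also writes attributes and handles void elements, while innerText applies rendering rules (line breaks, hidden content) that are ignored here.

```javascript
// Simplified stand-in nodes: { text: '...' } or { tag: 'B', children: [...] }.
function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Rough analogue of innerHTML: serializes the children as markup.
function markupOf(node) {
  return node.children.map(function (child) {
    if ('text' in child) return escapeText(child.text);
    var tag = child.tag.toLowerCase();
    return '<' + tag + '>' + markupOf(child) + '</' + tag + '>';
  }).join('');
}

// Rough analogue of innerText/textContent: keeps only the character data.
function textOf(node) {
  return node.children.map(function (child) {
    return 'text' in child ? child.text : textOf(child);
  }).join('');
}

var body = { tag: 'BODY', children: [
  { text: 'a < b and ' },
  { tag: 'B', children: [{ text: 'bold' }] }
] };

console.log(markupOf(body)); // prints "a &lt; b and <b>bold</b>"
console.log(textOf(body));   // prints "a < b and bold"
```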
--
Martin Honnen
http://JavaScript.FAQTs.com/
I just noticed that Safari 2.0 through 2.0.4 (and maybe earlier versions)
actually has `innerText`, but that `innerText` is always an empty string :/
Apparently, a simple `typeof` check is not enough; we should at least
fall back to manually traversing the descendants (or to innerHTML) when
`innerText` is faulty.
But then, the whole thing becomes quite monstrous in size:
var getInnerText = (function(){
  // Fallback used when no suitable host features are found.
  var f = function(element) {
    return null;
  };
  if (document.createElement && document.createTextNode) {
    // Probe element with known content: test actual values, not just types.
    var testee = document.createElement('div');
    var textNode = document.createTextNode('x');
    if (testee && testee.appendChild && textNode) {
      testee.appendChild(textNode);
      if (typeof testee.textContent == 'string' &&
          testee.textContent === 'x') {
        f = function(element) {
          return element.textContent;
        };
      } else if (typeof testee.innerText == 'string' &&
                 testee.innerText === 'x') {
        f = function(element) {
          return element.innerText;
        };
      } else {
        // Neither property works: collect text-node values recursively.
        f = function(element) {
          var result = [];
          function getChildren(parent) {
            var child = parent.firstChild;
            while (child) {
              if (child.nodeType == 3) {  // text node
                result.push(child.nodeValue);
              } else {
                getChildren(child);
              }
              child = child.nextSibling;
            }
          }
          getChildren(element);
          return result.join('');
        };
      }
      testee = null;  // release the probe element
    }
  }
  return f;
})();
And we need to take care of script elements' content as well.
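The key point of the feature test above is that it checks a property's *value* on a probe element with known content, not just its `typeof`, since Safari 2's `innerText` is a string but an empty one. A condensed sketch of just that decision follows. The probe is passed in so the logic can be exercised without a browser; the function name `pickTextProperty` and the fake probe objects are hypothetical.

```javascript
// Returns the name of the first property that reports the probe's known
// text correctly, or null if neither does (the caller should then walk nodes).
function pickTextProperty(probe, expected) {
  var candidates = ['textContent', 'innerText'];
  for (var i = 0; i < candidates.length; i++) {
    var name = candidates[i];
    // typeof alone is not enough: Safari 2 has a string innerText that is ''.
    if (typeof probe[name] == 'string' && probe[name] === expected) {
      return name;
    }
  }
  return null;
}

// In a browser the probe would be a <div> containing the text node 'x'.
var standardsLike = { textContent: 'x' };
var ieLike        = { innerText: 'x' };
var safari2Like   = { innerText: '' };   // a string, but the wrong value

console.log(pickTextProperty(standardsLike, 'x')); // prints "textContent"
console.log(pickTextProperty(ieLike, 'x'));        // prints "innerText"
console.log(pickTextProperty(safari2Like, 'x'));   // prints null
```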
--
kangax
My mistake.
Safari *does* return "proper" `innerText`, but only if an element is
neither hidden nor orphaned (there was an element with "display:none" in
my test). Considering these two "issues", the method unfortunately
becomes even more complex (still without proper script element handling).
I'm not sure such "forking" is worth the trouble. Recursively
collecting node values would produce more consistent results, albeit
more slowly.
var getInnerText = (function(){
  var f = function(element) {
    return null;
  };
  if (document.createElement && document.createTextNode) {
    var root = (document.body || document.documentElement);
    var testee = document.createElement('div');
    var textNode = document.createTextNode('x');
    if (root && testee && testee.appendChild && textNode) {
      // Safari 2.x returns empty string as `innerText`
      // of an orphaned element
      // so we append it to a document temporarily
      testee.appendChild(textNode);
      root.appendChild(testee);
      if (typeof testee.textContent == 'string' &&
          testee.textContent == 'x') {
        f = function(element) {
          return element.textContent;
        };
      } else if (typeof testee.innerText == 'string' &&
                 testee.innerText == 'x') {
        // store
        var value = testee.style.display;
        testee.style.display = 'none';
        // test
        var HIDDEN_ELEMENTS_ARE_BUGGY = (testee.innerText !== 'x');
        // restore
        testee.style.display = value;
        if (HIDDEN_ELEMENTS_ARE_BUGGY) {
          f = function(element) {
            var el = element, values = [];
            // display all ancestors (only an inline display:none is undone;
            // a stylesheet rule would still hide the element)
            while (element && element.style) {
              values.push(element.style.display);
              element.style.display = '';
              element = element.parentNode;
            }
            // get value
            var s = el.innerText;
            // restore ancestors' display values: start again from `el`,
            // otherwise this loop would never run
            element = el;
            while (element && element.style) {
              element.style.display = values.shift();
              element = element.parentNode;
            }
            return s;
          };
        } else {
          f = function(element) {
            return element.innerText;
          };
        }
      } else {
        f = function(element) {
          var result = [];
          function getChildren(parent) {
            var child = parent.firstChild;
            while (child) {
              if (child.nodeType == 3) {  // text node
                result.push(child.nodeValue);
              } else {
                getChildren(child);
              }
              child = child.nextSibling;
            }
          }
          getChildren(element);
          return result.join('');
        };
      }
      root.removeChild(testee);
      testee = null;
    }
  }
  return f;
})();
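For comparison, the recursive alternative mentioned above reads only the `nodeValue` of text nodes, so its result does not depend on whether an element is displayed or attached to the document. The sketch below is an iterative variant using an explicit stack instead of recursion; that is my own design choice (it avoids deep call stacks on heavily nested markup), not something from this thread. The plain objects stand in for DOM nodes, and `style.display` is set on one of them only to show that it has no effect.

```javascript
// Collects all text-node values under `root` in document order,
// using an explicit stack of open elements instead of recursion.
function collectText(root) {
  var parts = [];
  var stack = [];
  var node = root.firstChild;
  while (node || stack.length) {
    if (!node) {
      // Finished a subtree: resume with that element's next sibling.
      node = stack.pop().nextSibling;
      continue;
    }
    if (node.nodeType == 3) {    // text node
      parts.push(node.nodeValue);
      node = node.nextSibling;
    } else {
      stack.push(node);          // remember where to come back to
      node = node.firstChild;
    }
  }
  return parts.join('');
}

// Stand-in nodes: link() wires up firstChild/nextSibling like a real DOM.
function link(parent, children) {
  parent.firstChild = children[0] || null;
  for (var i = 0; i < children.length; i++) {
    children[i].nextSibling = children[i + 1] || null;
  }
  return parent;
}
var hidden = link({ nodeType: 1, style: { display: 'none' } },
                  [{ nodeType: 3, nodeValue: 'hidden ' }]);
var div = link({ nodeType: 1 }, [
  { nodeType: 3, nodeValue: 'visible ' },
  hidden,
  { nodeType: 3, nodeValue: 'text' }
]);
console.log(collectText(div)); // prints "visible hidden text"
```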
--
kangax
Safari 2 had issues with hidden elements.
> I'm not sure if such "forking" is worth the trouble. Recursively
> collecting node values would produce more consistent results, albeit
> being slower.
>
There are bugs with Safari 2 and hidden elements (including
visibility: hidden and display: none). If this presents a problem, there
are other ways around it.
For example, the element that needed to be hidden could be positioned
outside the viewport instead of using visibility: hidden. Another
alternative would be to set the element's visibility to "visible" just
before reading the innerText.
A saved reference to the property name is faster than calling a function:
var dom = {};
dom.textContent = "textContent" in document.documentElement ?
"textContent" : "innerText";
alert( el[dom.textContent] );
Note: Posting failed on the news server I usually use:
nntp.motzarella.org
Garrett
That would confuse screen readers, which skip `display:none` content but
announce content that is merely positioned outside the viewport.
> alternative would be to set the element's visibility to "visible" just
> prior to reading the innerText.
Yes, and not only on the element but on all of its ancestors as well (for
obvious reasons). Now that you mention `visibility:hidden` (which, as
I just tested, does indeed prevent proper `innerText`), we would also
need to take care of that while traversing ancestors. Each ancestor's
`visibility` style value should be saved and then restored, just like
the `display` style values.
I really don't like how complex this solution becomes.
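The save/clear/restore step described above, extended to `visibility`, could at least be factored into one helper. A minimal sketch, assuming that inline styles are what hide the elements (clearing `style.display` does not override a stylesheet rule); the name `withAncestorsShown` is hypothetical, and the `try`/`finally` makes sure the saved values are restored even if the read throws. The plain objects at the end stand in for DOM elements.

```javascript
// Temporarily clears inline `display` and forces `visibility: visible` on
// `element` and all of its ancestors, calls `read(element)`, then restores
// every saved value. Assumes hiding is done with inline styles.
function withAncestorsShown(element, read) {
  var saved = [];
  for (var node = element; node && node.style; node = node.parentNode) {
    saved.push({ node: node, display: node.style.display,
                 visibility: node.style.visibility });
    node.style.display = '';
    node.style.visibility = 'visible';
  }
  try {
    return read(element);
  } finally {
    for (var i = 0; i < saved.length; i++) {
      saved[i].node.style.display = saved[i].display;
      saved[i].node.style.visibility = saved[i].visibility;
    }
  }
}

// Stand-ins: a hidden container holding the element we want to read.
// In a browser, `read` would be something like function (e) { return e.innerText; }
var container = { style: { display: 'none', visibility: 'hidden' }, parentNode: null };
var el = { style: { display: '', visibility: '' }, parentNode: container };

var seen = withAncestorsShown(el, function (e) {
  return e.parentNode.style.display + '|' + e.parentNode.style.visibility;
});
console.log(seen);                    // prints "|visible"
console.log(container.style.display); // prints "none" (restored)
```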
>
> A saved reference to the property is faster.
>
> var dom = {};
>
> dom.textContent = "textContent" in document.documentElement ?
> "textContent" : "innerText";
>
> alert( el[dom.textContent] );
Interesting. I would expect the opposite: after all, `dom.textContent`
needs to be resolved to a string before the property lookup occurs
(in `el[dom.textContent]`).
--
kangax
I see. So the alternatives are:
visibility: hidden
  - problem: innerText is "" in Safari 2
position away from view
  - problem: a screen reader will read it
>> alternative would be to set the element's visibility to "visible" just
>> prior to reading the innerText.
>
> Yes, and not only on element but on all of its ancestors as well (for
> obvious reasons). Now that you mentioned `visibility:hidden` (which, as
> I just tested, does indeed prevent proper `innerText`) we would also
> need to take care of that while traversing ancestors. Each of the
> ancestor's `visibility` style values should be saved and then restored -
> just like with `display` style values.
>
> I really don't like how complex this solution becomes.
>
I don't either.
I would address the problem if and when it arises. It could be
special-cased to first show the div, then grab its innerText.
I understand that this would pose a problem if the constraint involved
potentially unknown ancestors.
>>
>> A saved reference to the property is faster.
>>
>> var dom = {};
>>
>> dom.textContent = "textContent" in document.documentElement ?
>> "textContent" : "innerText";
>>
>> alert( el[dom.textContent] );
>
> Interesting. I would expect the opposite - after all `dom.textContent`
> needs to be resolved to a string before property lookup occurs
> (in `el[dom.textContent]`)
>
>
Calling a function would be slower than resolving dom.textContent.
A string literal "textContent" would be faster, but would fail in IE
versions that lack textContent.
Within a function scope, the property name could be saved in a local variable:
(function(){
  var dom = Lib.dom,
      textContent = dom.textContent;
  //...
})();
If the script is minified and the local variable |textContent| is used many
times, renaming it to a one-letter identifier would also make the script
a bit smaller.
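Which access pattern is actually faster is easy to measure rather than argue. The rough micro-benchmark sketch below compares a function call, `el[dom.textContent]`, a local copy of the name, and a string literal. It runs on a plain object rather than a DOM element, so it measures property-access patterns only, not DOM cost. Each variant is wrapped in a closure, which adds the same overhead to all four. Absolute numbers, and even the ranking, depend on the engine (a modern JIT may inline the small function entirely), so treat any result as indicative at best. The object shapes here are assumptions for illustration.

```javascript
// Rough timing of four ways to read the same property; results vary by engine.
var el = { textContent: 'hello' };
var dom = { textContent: 'textContent' };   // saved property name
function getText(e) { return e.textContent; }

function now() {
  return typeof performance != 'undefined' ? performance.now() : Date.now();
}

function time(label, fn, iterations) {
  var result, start = now();
  for (var i = 0; i < iterations; i++) result = fn();
  console.log(label + ': ' + (now() - start).toFixed(1) + ' ms');
  return result;
}

var results = (function () {
  var textContent = dom.textContent;        // local copy of the saved name
  var N = 1e6;
  return [
    time('function call',   function () { return getText(el); }, N),
    time('dom.textContent', function () { return el[dom.textContent]; }, N),
    time('local variable',  function () { return el[textContent]; }, N),
    time('string literal',  function () { return el.textContent; }, N)
  ];
})();
```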
--
comp.lang.javascript FAQ <URL: http://jibbering.com/faq/ >