Need something like doc.getElementById().innerHTML but want everything between body tags

gimme_this...@yahoo.com

Jan 23, 2009, 9:08:56 PM

Is there a document command that returns the text between body tags -
something similar to getElementById("myid").innerHTML ?

If you're wondering why I need it: I'm scripting an
InternetExplorer.Application object and parsing the text on a page.

It's not an InternetExplorer.Application question though.

It's a "how do you get all the text between body tags" question.

Thanks.

RobG

Jan 23, 2009, 10:28:08 PM

On Jan 24, 12:08 pm, "gimme_this_gimme_t...@yahoo.com" wrote:

Use something like:

function getText(el) {
  if (typeof el.textContent == 'string') return el.textContent;
  if (typeof el.innerText == 'string') return el.innerText;
}

Then call it with:

alert(getText(document.body));

Note that it will also return the content of script elements in some
browsers. Also, some browsers support both innerText and textContent
and may return a different value for each.
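
To see whether the two properties disagree for a given element, a
small diagnostic along these lines may help. This is only a sketch:
compareTextProperties is a hypothetical name, not part of any
library, and it only reports what each property returns.

```javascript
// Hypothetical helper: report what each text property returns for `el`.
function compareTextProperties(el) {
  var report = { textContent: null, innerText: null, agree: null };
  if (typeof el.textContent == 'string') report.textContent = el.textContent;
  if (typeof el.innerText == 'string') report.innerText = el.innerText;
  // `agree` stays null unless both properties are supported
  if (report.textContent !== null && report.innerText !== null) {
    report.agree = (report.textContent === report.innerText);
  }
  return report;
}
```

In a browser it could be called as compareTextProperties(document.body).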

--
Rob

Martin Honnen

Jan 24, 2009, 6:37:47 AM

gimme_this...@yahoo.com wrote:
> Is there a document command that returns the text between body tags -
> something similar to getElementById("myid").innerHTML ?
>
> If you're wondering why I need it. I'm scripting an
> InternetExplorer.Application object and parsing the text on a page.

You want
document.body.innerHTML
or
document.body.innerText
depending on whether you want the text only (innerText) or the markup
(as serialized by innerHTML).

--

Martin Honnen
http://JavaScript.FAQTs.com/

kangax

Jan 25, 2009, 12:37:48 AM

RobG wrote:
[...]

> Note that it will also return the content of script elements in some
> browsers. Also, some browsers support both innerText and textContent
> and may return a different value for each.

I just noticed that Safari 2 - 2.0.4 (and maybe earlier) actually has
`innerText` but that `innerText` is always an empty string : /

Apparently, a simple `typeof` check is not enough (we could at least
fall back to manual descendant traversal or innerHTML when
`innerText` is faulty).

But then, the whole thing becomes quite monstrous in size:

And we need to take care of script elements' content as well.
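
As a rough sketch of the manual traversal with script content left
out: the function names are made up for illustration, and skipping
STYLE elements as well is my own assumption, not something settled in
this thread.

```javascript
// Sketch: collect text node values, skipping SCRIPT and STYLE subtrees.
function collectText(node, parts) {
  for (var child = node.firstChild; child; child = child.nextSibling) {
    if (child.nodeType == 3) {          // text node
      parts.push(child.nodeValue);
    } else if (child.nodeType == 1) {   // element node
      var name = String(child.nodeName).toUpperCase();
      if (name != 'SCRIPT' && name != 'STYLE') {
        collectText(child, parts);
      }
    }
  }
  return parts;
}

// Hypothetical entry point: returns the concatenated text of `element`.
function getTextWithoutScripts(element) {
  return collectText(element, []).join('');
}
```

Unlike `innerText`, this does no whitespace or line-break handling; it
simply concatenates node values in document order.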

--
kangax

kangax

Jan 25, 2009, 2:38:14 AM

kangax wrote:
> RobG wrote:
> [...]
>> Note that it will also return the content of script elements in some
>> browsers. Also, some browsers support both innerText and textContent
>> and may return a different value for each.
>
> I just noticed that Safari 2 - 2.0.4 (and maybe earlier) actually has
> `innerText` but that `innerText` is always an empty string : /

My mistake.

Safari *does* return "proper" `innerText` but only if an element is
neither hidden nor orphaned (there was an element with "display:none" in
my test). Considering these 2 "issues", the method, unfortunately,
becomes even more complex (still without proper script element handling).

I'm not sure if such "forking" is worth the trouble. Recursively
collecting node values would produce more consistent results, albeit
being slower.

var getInnerText = (function(){
  // default: no usable strategy was detected
  function f(element) {
    return null;
  }
  if (document.createElement && document.createTextNode) {

    var root = (document.body || document.documentElement);

    var testee = document.createElement('div');
    var textNode = document.createTextNode('x');

    if (root && testee && testee.appendChild && textNode) {
      // Safari 2.x returns empty string as `innerText`
      // of an orphaned element
      // so we append it to a document temporarily
      testee.appendChild(textNode);
      root.appendChild(testee);

      if (typeof testee.textContent == 'string' &&
          testee.textContent == 'x') {

        f = function(element) {
          return element.textContent;
        };
      } else if (typeof testee.innerText == 'string' &&
                 testee.innerText == 'x') {

        // store
        var value = testee.style.display;
        testee.style.display = 'none';
        // test
        var HIDDEN_ELEMENTS_ARE_BUGGY = (testee.innerText !== 'x');
        // restore
        testee.style.display = value;

        if (HIDDEN_ELEMENTS_ARE_BUGGY) {
          f = function(element) {
            var el = element, values = [ ];
            // display all ancestors
            while (element && element.style) {
              values.push(element.style.display);
              element.style.display = '';
              element = element.parentNode;
            }
            // get value
            var s = el.innerText;
            // restore ancestors' display value; restart from `el`,
            // since `element` now points past the topmost ancestor
            element = el;
            while (element && element.style) {
              element.style.display = values.shift();
              element = element.parentNode;
            }
            return s;
          };
        }
        else {
          f = function(element) {
            return element.innerText;
          };
        }
      } else {
        // neither property is usable: collect text node values recursively
        f = function(element) {
          var result = [];
          function getChildren(parent) {
            var child = parent.firstChild;
            while (child) {
              if (child.nodeType == 3) {
                result.push(child.nodeValue);
              }
              else {
                getChildren(child);
              }
              child = child.nextSibling;
            }
          }
          getChildren(element);
          return result.join('');
        };
      }

      root.removeChild(testee);

      testee = null;
    }
  }
  return f;
})();

--
kangax

dhtml

Jan 25, 2009, 4:13:59 AM

On Jan 24, 11:38 pm, kangax <kan...@gmail.com> wrote:
> kangax wrote:
> > RobG wrote:
> > [...]
> >> Note that it will also return the content of script elements in some
> >> browsers. Also, some browsers support both innerText and textContent
> >> and may return a different value for each.
>
> > I just noticed that Safari 2 - 2.0.4 (and maybe earlier) actually has
> > `innerText` but that `innerText` is always an empty string : /
>
> My mistake.
>
> Safari *does* return "proper" `innerText` but only if an element is
> neither hidden nor orphaned (there was an element with "display:none" in
> my test). Considering these 2 "issues", the method, unfortunately,
> becomes even more complex (still without proper script element handling).
>

Safari 2 had issues with hidden elements.

> I'm not sure if such "forking" is worth the trouble. Recursively
> collecting node values would produce more consistent results, albeit
> being slower.
>

There are bugs with Safari 2 and hidden elements (including
visibility: hidden and display: none). If this presents a problem,
there are other ways around it.

For example, the element that needed to be hidden could be positioned
outside the viewport, instead of using visibility: hidden. Another
alternative would be to set the element's visibility to "visible" just
prior to reading the innerText.

A saved reference to the property name is faster than calling a
function each time.

var dom = {};

dom.textContent = "textContent" in document.documentElement ?
"textContent" : "innerText";

alert( el[dom.textContent] );

Note: Posting failed on the news server I usually use:
nntp.motzarella.org

Garrett

kangax

Jan 25, 2009, 12:00:22 PM

dhtml wrote:
[...]

> There are bugs with Safari 2 and hidden elements (including
> visibility:
> hidden and display: none). If this presents a problem, there are other
> ways around it.
>
> For example, the element that needed to be hidden could be positioned
> outside the viewport, instead of using visibility: hidden. Another

That would confuse screen readers, which skip `display:none` content but
announce content that is merely positioned outside of the viewport.

> alternative would be to set the element's visibility to "visible"just
> prior to reading the innerText.

Yes, and not only on the element but on all of its ancestors as well
(for obvious reasons). Now that you mentioned `visibility:hidden` (which,
as I just tested, does indeed prevent proper `innerText`) we would also
need to take care of that while traversing ancestors. Each ancestor's
`visibility` style value should be saved and then restored, just like
the `display` style values.
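
Roughly, that save/restore might look like the sketch below. The
function name is hypothetical, and it only touches inline styles: an
ancestor hidden through a stylesheet rule would not be revealed by
clearing `style.display`, so treat it as an illustration rather than a
complete fix.

```javascript
// Sketch: read innerText while the element and its ancestors are shown.
function readInnerTextWhileShown(element) {
  var saved = [], node = element, text, i;
  // walk up, remembering and overriding inline display/visibility
  while (node && node.style) {
    saved.push({
      node: node,
      display: node.style.display,
      visibility: node.style.visibility
    });
    node.style.display = '';
    node.style.visibility = 'visible';
    node = node.parentNode;
  }
  text = element.innerText;
  // restore every saved value, topmost ancestor first
  for (i = saved.length - 1; i >= 0; i--) {
    saved[i].node.style.display = saved[i].display;
    saved[i].node.style.visibility = saved[i].visibility;
  }
  return text;
}
```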

I really don't like how complex this solution becomes.

>
> A saved reference to the property is faster.
>
> var dom = {};
>
> dom.textContent = "textContent" in document.documentElement ?
> "textContent" : "innerText";
>
> alert( el[dom.textContent] );

Interesting. I would expect the opposite - after all `dom.textContent`
needs to be resolved to a string before property lookup occurs
(in `el[dom.textContent]`)

--
kangax

Garrett Smith

Jan 25, 2009, 4:08:20 PM

kangax wrote:
> dhtml wrote:
> [...]
>> There are bugs with Safari 2 and hidden elements (including
>> visibility:
>> hidden and display: none). If this presents a problem, there are other
>> ways around it.
>>
>> For example, the element that needed to be hidden could be positioned
>> outside the viewport, instead of using visibility: hidden. Another
>
> That would confuse screen readers, which skip `display:none` content but
> announce the one that's merely positioned outside of the viewport.
>

I see. So the alternatives are:
  visibility: hidden
    - problem: innerText is "" in Safari 2
  position away from view
    - problem: a screen reader will read it

>> alternative would be to set the element's visibility to "visible"just
>> prior to reading the innerText.
>
> Yes, and not only on element but on all of its ancestors as well (for
> obvious reasons). Now that you mentioned `visibility:hidden` (which, as
> I just tested, does indeed prevent proper `innerText`) we would also
> need to take care of that while traversing ancestors. Each of the
> ancestor's `visibility` style values should be saved and then restored -
> just like with `display` style values.
>
> I really don't like how complex this solution becomes.
>

I don't either.

I would address the problem if and when it arises. It could be
special-cased to first show the div, then grab its innerText.

I understand that this would pose a problem if the constraint involved
potentially unknown ancestors.

>>
>> A saved reference to the property is faster.
>>
>> var dom = {};
>>
>> dom.textContent = "textContent" in document.documentElement ?
>> "textContent" : "innerText";
>>
>> alert( el[dom.textContent] );
>
> Interesting. I would expect the opposite - after all `dom.textContent`
> needs to be resolved to a string before property lookup occurs
> (in `el[dom.textContent]`)
>
>

Calling a function would be slower than finding dom.textContent.

A string literal "textContent" would be faster, but would fail in IE.

In scope, a local reference could be saved:-

(function(){

  var dom = Lib.dom,
      textContent = dom.textContent;
  //...
})();

If the script is minified, and the local var |textContent| is used many
times, the renaming of textContent to a one-letter identifier would also
have the effect of making the script a bit smaller.
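
A minimal sketch of that pattern, with the detection done once and the
property name held in a local variable. Both function names and the
`probe` parameter are hypothetical, chosen only for illustration.

```javascript
// Decide which property name to use; `probe` is any element.
function detectTextProperty(probe) {
  return ('textContent' in probe) ? 'textContent' : 'innerText';
}

// Concatenate the text of several elements using the cached name.
function joinTexts(elements, probe) {
  var textContent = detectTextProperty(probe), // local reference
      out = [], i;
  for (i = 0; i < elements.length; i++) {
    // plain property lookup; no function call per element
    out.push(elements[i][textContent]);
  }
  return out.join('');
}
```

In a page, `probe` would typically be document.documentElement.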

--
comp.lang.javascript FAQ <URL: http://jibbering.com/faq/ >