Convert an HTTP responseText into a DOM document object

Water Cooler v2

Nov 2, 2007, 12:42:16 PM

I send an XmlHttpRequest and GET a response back. Is it possible to
convert that response into a DOM Level 3 HTML 'document' object and
investigate it?

Martin Honnen

Nov 2, 2007, 12:54:31 PM

The response is parsed into a DOM document if the server sends it as
text/xml or application/xml. Then the responseXML property is populated
as a DOM document. Whether that is DOM Level 3 document depends on the
browser (Opera 9 has some DOM Level 3 support).
I don't think any browser currently allows you to take a string (e.g.
responseText) and parse it into a HTML DOM document, unless you
document.wrote it to a frame window.
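
A minimal sketch of the first case, assuming `xhr` is a completed XMLHttpRequest whose response was served with an XML media type. The helper name `getResponseDocument` is made up for illustration and is not part of any API:

```javascript
// Hypothetical helper: return the document the browser parsed for us,
// or null if the response was not parsed as XML.
function getResponseDocument(xhr) {
  if (xhr.readyState !== 4) {
    throw new Error("request has not completed yet");
  }
  var doc = xhr.responseXML;
  // responseXML is null (or, in some browsers, an empty document)
  // when the response could not be parsed as XML.
  return doc && doc.documentElement ? doc : null;
}
```

Called from an onreadystatechange handler, this yields either a document to walk with the usual DOM methods or null, in which case only responseText is available.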

--

Martin Honnen
http://JavaScript.FAQTs.com/

Thomas 'PointedEars' Lahn

Nov 8, 2007, 6:38:26 AM

Martin Honnen wrote:
> Water Cooler v2 wrote:
>> I send an XmlHttpRequest and GET a response back. Is it possible to
>> convert that response into a DOM Level 3 HTML 'document' object and
>> investigate it?
>
> The response is parsed into a DOM document if the server sends it as
> text/xml or application/xml. Then the responseXML property is populated
> as a DOM document. Whether that is DOM Level 3 document depends on the
> browser (Opera 9 has some DOM Level 3 support).

As have Gecko-based UAs since a while.

http://developer.mozilla.org/en/docs/DOM_Levels#DOM_Level_3

> I don't think any browser currently allows you to take a string (e.g.
> responseText) and parse it into a HTML DOM document, unless you
> document.wrote it to a frame window.

Your own FAQ entry says that it is possible to parse the response into an
X(HT)ML document with MSXML and with Gecko's DOMParser::parseFromString(),
which could suffice here:

http://www.faqts.com/knowledge_base/entry/versions/index.phtml?aid=15302

PointedEars
--
"Use any version of Microsoft Frontpage to create your site. (This won't
prevent people from viewing your source, but no one will want to steal it.)"
-- from <http://www.vortex-webdesign.com/help/hidesource.htm>

Darko

Nov 8, 2007, 7:14:03 PM

After collecting different pieces of code for a while and separating
the good from the bad, I now use the following function, which I hope
(and think) you'll find useful. Just hand it the ajax object (after it
has sent the request and readyState has reached 4), and it returns the
document. If I copied it correctly, it will work as it does on my site,
that is, in Opera, IE and Mozilla.

function getXMLDocument( ajax )
{
    // Provide a minimal DOMParser where the browser lacks one
    if (typeof DOMParser == "undefined") {
        DOMParser = function()
        {};

        DOMParser.prototype.parseFromString = function(str, contentType)
        {
            if (typeof ActiveXObject != "undefined") {
                // IE: parse with MSXML
                var doc = new ActiveXObject("MSXML.DomDocument");
                doc.loadXML(str);
                return doc;
            } else if ( typeof XMLHttpRequest != "undefined" ) {
                // Otherwise: fetch the string back through a synchronous
                // data: URI request and let the browser parse it
                contentType = contentType || "application/xml";
                var req = new XMLHttpRequest();
                req.open("GET", "data:" + contentType +
                    ";charset=utf-8," + encodeURIComponent(str), false);
                if ( req.overrideMimeType )
                    req.overrideMimeType(contentType);
                req.send(null);
                return req.responseXML;
            } else
                throw new Error( "AJAX::getXMLDocument(): " +
                    "can't find a valid XML parser" );
        };
    }
    var strDocument = ajax.responseText;
    var xmlDocument = ajax.responseXML;
    try {
        // Fall back to parsing responseText when responseXML is missing or empty
        if( ! xmlDocument || xmlDocument.childNodes.length === 0 )
            xmlDocument = (new DOMParser()).parseFromString( strDocument,
                "application/xml" );
        return xmlDocument;
    } catch( e ) {
        return null;
    }
}

Regards

Daniel Webb

Jan 27, 2011, 8:51:33 AM

Thanks for this code, Дарко. I've found that adding the line:

doc.validateOnParse = false;

after

var doc = new ActiveXObject("MSXML.DomDocument");

improves the success rate of the loadXML call. In my case, I found that html with a <!DOCTYPE html> doctype was causing loadXML to return null when validateOnParse was enabled.
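
For illustration, the MSXML branch with that line in place might look like the sketch below. The function name `parseWithMsxml` and the `ActiveXCtor` parameter are assumptions introduced here (the parameter only exists so the function can be exercised outside Internet Explorer, where you would pass `ActiveXObject`); the error check via `parseError` is an addition, not part of the original code:

```javascript
// Sketch: parse a string with MSXML, with DTD validation switched off.
function parseWithMsxml(str, ActiveXCtor) {
  var doc = new ActiveXCtor("MSXML.DomDocument");
  // Check well-formedness only; do not validate against a DTD.
  doc.validateOnParse = false;
  // loadXML() returns false on failure; details are in doc.parseError.
  if (!doc.loadXML(str)) {
    throw new Error("XML parse error: " + doc.parseError.reason);
  }
  return doc;
}
```

Note that turning validation off does not make MSXML accept arbitrary HTML: the input still has to be well-formed XML.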

Seeing as how your post was written 4 odd years ago in 2007, I'm sure there's an easier way to do the same thing now, but it's working for me in ie8/7/chrome/firefox.

Regards

Thomas 'PointedEars' Lahn

Jan 27, 2011, 9:08:26 AM

Daniel Webb wrote:

> Thanks for this code, Дарко. I've found that adding the line:
>
> doc.validateOnParse = false;
>
> after
> var doc = new ActiveXObject("MSXML.DomDocument");
>
> improves the success rate of the loadXML call. In my case, I found that
> html with a <!DOCTYPE html> doctype was causing loadXML to return null
> when validateOnParse was enabled.

Sure, because HTML is not X(HT)ML. Web 101.

> Seeing as how your post was written 4 odd years ago in 2007, I'm sure
> there's an easier way to do the same thing now, but it's working for me in
> ie8/7/chrome/firefox.

Perhaps you would care to explain who and what you are referring to?

<http://jibbering.com/faq/#posting>

PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.
-- Richard Cornford, cljs, <cife6q$253$1$8300...@news.demon.co.uk> (2004)

opensource junkie

Jan 27, 2011, 11:47:10 PM

On Jan 27, 9:08 am, Thomas 'PointedEars' Lahn <PointedE...@web.de>
wrote:

> -- Richard Cornford, cljs, <cife6q$253$1$8300d...@news.demon.co.uk> (2004)

For anyone who stumbles upon this thread:
I searched the web for about an hour, looking for a function that
would simply convert an html string to a DOM tree. Seemed like
something that should be in native javascript, but no such luck.
After trying a few workarounds that didn't quite work for my
situation, I finally gave up on finding a solution, and decided to
write my own.

The code can be found at http://www.terrasketch.com/export/DomConverter.js.
It's an html parser that's far from perfect (at the moment, it doesn't
work with html comments, for example), and I've only really tested it
for my own needs. Thus it may be a little buggy, but if someone else
can use it, please do.

Any bug reports, comments, or modifications can be sent to
opensour...@gmail.com.
Regards,
-- Nathanael Schmolze

Evertjan.

Jan 28, 2011, 8:51:16 AM

opensource junkie wrote on 28 jan 2011 in comp.lang.javascript:

> For anyone who stumbles upon this thread:

Stumbles? Newsgroups like c.l.j. are rather specific,
and have an audience that doesn't just stumble..

> I searched the web for about an hour, looking for a function that
> would simply convert an html string to a DOM tree. Seemed like

> something that should be in native javascript, ...

Native?

Like in "non-external"?

Why should it?

Javascript in itself has nothing to do with DOM-functions,
in other words: The DOM-functions are "external" to javascript,
and only available if javascript is used in a browser,
and even then browser dependent.

So the DOM-tree is also an effect of the specific browser.

Javascript can very well be used in non-browser applications, like in
cscript/wscript, as a serverside scripting language or in an editor.

--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)

opensource junkie

Jan 28, 2011, 11:24:13 AM

That's a good point; I myself have used javascript to write asp applications, for instance. 'Native to javascript' would be the wrong concept, but 'native to the DOM specifications for ECMAScript' would be a more accurate description of my desire and frustration.

Pardon my imprecision,
-- Nate

Dr J R Stockton

Jan 29, 2011, 3:16:06 PM

In comp.lang.javascript message
<79c0ccf6-579e-4fa8-972a-996326e75abd@l18g2000yqm.googlegroups.com>,
Thu, 27 Jan 2011 20:47:10, opensource junkie <opensour...@gmail.com>
posted:

>For anyone who stumbles upon this thread:
>I searched the web for about an hour, looking for a function that
>would simply convert an html string to a DOM tree. Seemed like
>something that should be in native javascript, but no such luck.
>After trying a few workarounds that didn't quite work for my
>situation, I finally gave up on finding a solution, and decided to
>write my own.
>
>The code can be found at http://www.terrasketch.com/export/DomConverter.js.
>It's an html parser that's far from perfect (at the moment, it doesn't
>work with html comments, for example), and I've only really tested it
>for my own needs. Thus it may be a little buggy, but if someone else
>can use it, please do.

Firstly, write the HTML string as an HTML file on your local machine.

Secondly, using another page on your local machine, read that file into
an iframe.

You then can access, from the outer page, the DOM tree in that iframe,
and can traverse it as desired - unless you are using recent Chrome.

AFAIK, it may not be *necessary* to use an iframe. But iframes suit me
for this. The following pages on my site do so :
js-grphx.htm linxchek.htm pageindx.htm sitedata.htm

Page linxchek.htm, with ToID checked, now traverses the entire DOM tree
to list the IDs on the page. ANNOUNCE: So can handle pages like the CLJ
FAQ, which links internally to IDs rather than to traditional anchors.

AFAICS, one should be able to open the HTML string in a new window and
(which I have not tried) read it there.

One thing to note : your code should give the same results on all
browsers, even with malformed HTML. The above approach gives the DOM
tree obtained by fudging all possible HTML errors, and that is likely to
be browser-dependent.

If, in a local copy of my page js-quick.htm , one puts
F.innerHTML = document.body.innerHTML
in the obvious textarea and presses the nearby Eval button, one gets a
page visibly including a copy of its previous self. That suggests that
you only need to assign your string to an innerHTML property and it will
turn into a DOM tree.

One then wonders : if your string is imperfect HTML, will it read back
as written or as fudged?
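
The innerHTML approach can be sketched as follows. The helper name `parseHtmlFragment` and the `doc` parameter are assumptions for illustration; in a page you would pass `document`:

```javascript
// Sketch: let the browser's own HTML parser turn a string into a subtree.
function parseHtmlFragment(html, doc) {
  var container = doc.createElement("div");
  // Assigning to innerHTML parses the string and builds child nodes.
  container.innerHTML = html;
  return container;
}
```

On the read-back question: in a browser, `container.innerHTML` is serialized from the tree that was built, not from the string that was assigned, so imperfect markup generally comes back in its repaired ("fudged") form, and that form can differ between browsers. Because the container here is a div, any html, head and body tags in the string are dropped rather than producing a full document.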

ASIDE : in recent Chrome, including current 8.0, a page loaded from my
local machine can visibly show a local text file in an iframe;
but trying to read Fram.contentDocument.body gives an error,
preceded by a warning :
"Unsafe JavaScript attempt to access frame with URL
file:///myob/$dir.txt from frame with URL
file:///myob/linxchek.htm#ToF.
Domains, protocols and ports must match.
TypeError: Cannot read property 'body' of undefined"

Further aside : there, myob is a mung.

I can find no working way of reporting this fault, which my
other four browsers lack, to Google. (I do now catch it in
linxchek; but the only work-round is to suggest a different
browser.)

--
(c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)

Gildas

Jan 31, 2011, 9:07:30 AM

On chrome, I use document.implementation.createHTMLDocument() and
document.write().

Thomas 'PointedEars' Lahn

Feb 4, 2011, 6:35:16 PM

Gildas wrote:

> On chrome, I use document.implementation.createHTMLDocument() and
> document.write().

This reads like a quite pointless, mystical incantation of a DOM API instead
of something based on informed DOM use. It certainly must strongly be
recommended against.

It is the nature of document.write() to allow (for the time being) a
document to be written that is not driven by a DTD, let alone well-formed.
The W3C DOM Level 2 HTML Specification explicitly says that it would "Write
a string of text to a document stream opened by open()":

<http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-75233634>

HTMLDOMImplementation::createHTMLDocument(), which had been *removed* from
the W3C DOM Level 2 Specification for the Proposed Recommendation in favor
of DOMImplementation::createDocument() already long ago (in 2000), was not
specified to call HTMLDocument::open().

<http://www.w3.org/TR/2000/CR-DOM-Level-2-20000510/html.html#ID-HTML-DOM>
<http://www.w3.org/TR/2000/PR-DOM-Level-2-Core-20000927/>

Nor is that specified so in the HTML 5 *Draft*. However, that draft is the
first Specification to state that open() must be called when write() is
called and it is not used within the current stream. Insofar that *might*
work for the time being. However, implementations are still unstable and
even the ever-changing WHATWG HTML draft is wise enough to warn against
document.write() being used:

<http://www.whatwg.org/specs/web-apps/current-work/multipage/apis-in-html-documents.html#dom-document-write>

Further, the return value of document.implementation.createHTMLDocument()
had nothing to do with the value of the `document' property, nor could a
HTMLDocument be inserted into an existing document in a standards-compliant
way (you would need a W3C DOM Level 3 Core DocumentFragment, and
importNode(), for that instead).

PointedEars
--
Danny Goodman's books are out of date and teach practices that are
positively harmful for cross-browser scripting.

-- Richard Cornford, cljs, <cife6q$253$1$8300...@news.demon.co.uk> (2004)

Gildas

Feb 5, 2011, 9:07:25 PM

On 5 fév, 00:35, Thomas 'PointedEars' Lahn <PointedE...@web.de> wrote:
> Gildas wrote:
> > On chrome, I use document.implementation.createHTMLDocument() and
> > document.write().
>
> This reads like a quite pointless, mystical incantation of a DOM API instead
> of something based on informed DOM use. It certainly must strongly be
> recommended against.
>
> It is the nature of document.write() to allow (for the time being) a
> document to be written that is not driven by a DTD, let alone well-formed.
> The W3C DOM Level 2 HTML Specification explicitly says that it would "Write
> a string of text to a document stream opened by open()":
>
> <http://www.w3.org/TR/DOM-Level-2-HTML/html.html#ID-75233634>
>
> HTMLDOMImplementation::createHTMLDocument(), which had been *removed* from
> the W3C DOM Level 2 Specification for the Proposed Recommendation in favor
> of DOMImplementation::createDocument() already long ago (in 2000), was not
> specified to call HTMLDocument::open().
>
> <http://www.w3.org/TR/2000/CR-DOM-Level-2-20000510/html.html#ID-HTML-DOM>
> <http://www.w3.org/TR/2000/PR-DOM-Level-2-Core-20000927/>
>
> Nor is that specified so in the HTML 5 *Draft*. However, that draft is the
> first Specification to state that open() must be called when write() is
> called and it is not used within the current stream. Insofar that *might*
> work for the time being. However, implementations are still unstable and
> even the ever-changing WHATWG HTML draft is wise enough to warn against
> document.write() being used:
>

> <http://www.whatwg.org/specs/web-apps/current-work/multipage/apis-in-h...

> documents.html#dom-document-write>
>
> Further, the return value of document.implementation.createHTMLDocument()
> had nothing to do with the value of the `document' property, nor could a
> HTMLDocument be inserted into an existing document in a standards-compliant
> way (you would need a W3C DOM Level 3 Core DocumentFragment, and
> importNode(), for that instead).
>

Of course, I use open and close methods too. I thought it was obvious.

Actually, I do not know any other way to parse any full HTML document
(with doctype declaration) on chrome. I need to do this kind of code in
an extension that saves any page on the web and displays saved pages for
offline reading.

Here's a sample code:
var doc = document.implementation.createHTMLDocument();
doc.open();
doc.writeln('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" ' +
    '"http://www.w3.org/TR/html4/strict.dtd">');
doc.write('<html><head></head><body></body></html>');
doc.close();
console.log(doc.querySelector("body"));
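
The same steps wrapped in a reusable function, as a sketch only. `parseHtmlDocument` and the `impl` parameter are names introduced here (in a page, `impl` would be `document.implementation`), and whether open() and write() work on a document created this way depends on the browser:

```javascript
// Sketch: build a detached HTML document from a complete markup string.
function parseHtmlDocument(html, impl) {
  // The argument is the document title; it is replaced by whatever
  // the written markup contains.
  var doc = impl.createHTMLDocument("");
  doc.open();
  doc.write(html);
  doc.close();
  return doc;
}
```

The returned document is not attached to any window, so it can be queried without anything being rendered.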

With createDocument method, I would write:
var doc = document.implementation.createDocument("", "HTML",
    document.implementation.createDocumentType("HTML",
        "-//W3C//DTD HTML 4.01//EN",
        "http://www.w3.org/TR/html4/strict.dtd"));

But then, how to parse '<html><head></head><body></body></html>'
content ?

Thomas 'PointedEars' Lahn

Feb 10, 2011, 2:34:35 PM

Gildas wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Gildas wrote:
>> > On chrome, I use document.implementation.createHTMLDocument() and
>> > document.write().
>>

>> [document.write() can write documents that are not driven by a DTD, let
>> alone well-formed. W3C DOM Level 2 HTML: "Write a string of text to a
>> document stream opened by open()."]
>>
>> [HTMLDOMImplementation::createHTMLDocument() is OBSOLETE since 2000 CE]
>>
>> [Only HTML 5 *Draft* specifies open() to be called on write(), but
>> it recommends against using document.write().]

>>
>> Further, the return value of document.implementation.createHTMLDocument()
>> had nothing to do with the value of the `document' property, nor could a
>> HTMLDocument be inserted into an existing document in a
>> standards-compliant way (you would need a W3C DOM Level 3 Core
>> DocumentFragment, and importNode(), for that instead).
>
> Of course, I use open and close methods too. I thought it was obvious.

Still, those methods are not designed to operate in that context.

> Actually, I do not know any other way to parse any full HTML document
> (with doctype declaration) on chrome. I need to do this kind of code in an
> extension that saves any page on the web and displays saved pages for
> offline reading.
>
> Here's a sample code:
> var doc = document.implementation.createHTMLDocument();
> doc.open();
> doc.writeln('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd">');
> doc.write('<html><head></head><body></body></html>');
> doc.close();
> console.log(doc.querySelector("body"));
>
> With createDocument method, I would write:
> var doc = document.implementation.createDocument("",
> "HTML", document.implementation.createDocumentType("HTML",
> "-//W3C//DTD HTML 4.01//EN",
> "http://www.w3.org/TR/html4/strict.dtd"));

The first argument to DOMImplementation::createDocument() must be a
namespaceURI; since HTML does not support namespaces, the value must be
`null', not the zero-length string. Interestingly, though, the HTML5/WHATWG
HTML Working Draft specifies both DOMImplementation::createDocument() and
DOMImplementation::createHTMLDocument(), stating that createDocument()
should only be used for creating XML documents.

So you might have an HTMLDocument, then. You are still one or two steps
away from including the content of that document into another: in
particular, to include the content as rendered (if that was the objective;
in case you did not notice, you *cannot* insert one document into another.)
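
One way to bridge that gap is to import the nodes one by one. The sketch below assumes the source is an HTML document with a `body` and that the target element already lives in the page's document; `importBodyContent` is a hypothetical name:

```javascript
// Sketch: copy the body content of one document into an element
// that belongs to another document.
function importBodyContent(sourceDoc, targetParent) {
  var targetDoc = targetParent.ownerDocument;
  var nodes = sourceDoc.body.childNodes;
  for (var i = 0; i < nodes.length; i++) {
    // importNode() makes a deep copy owned by the target document;
    // the source document is left unchanged.
    targetParent.appendChild(targetDoc.importNode(nodes[i], true));
  }
  return nodes.length;
}
```

Since importNode() copies rather than moves, the parsed document can still be inspected afterwards.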

> But then, how to parse '<html><head></head><body></body></html>'
> content ?

A) You do not. You serve a HTML fragment, then assign to the `innerHTML'
property. While still being proprietary, it has the virtue of being
widely implemented and used, and defined in the January 13 HTML5 Working
Draft, and the February 5 HTML5 Editor's Draft, and on Last Call in
WHATWG HTML as of February 5, 2011. It does not look as if it was going
to go away anytime soon.

B) You do not. You serve JSON or JSON-like content instead, and let the
client-side script modify the document tree accordingly.

C) You use a markup parser. You could use an existing implementation,
or make use of your own one. Since built-in HTML parsers are AFAIK
non-existent, but there are built-in XML parsers, it would be
possible, with little effort and assuming the HTML markup would conform
to a Strict variant, to convert it to XHTML and let that be parsed. Then
you would have at least a Document object to work with.

Or you would use the `innerHTML' property with the serialized parse
result.

Your example could be parsed in Gecko-based, WebCore-based, and
KHTML-based browsers as follows:

  var doc = (new DOMParser()).parseFromString(
    "<html><head></head><body></body></html>",
    "application/xhtml+xml");
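
A slightly more defensive sketch of the same call follows. The function name `parseXhtml` and the `ParserCtor` parameter are assumptions for illustration (in a browser you would pass `DOMParser`), and the `parsererror` check is the commonly used convention rather than something a specification guarantees:

```javascript
// Sketch: parse markup as XHTML and report failure as null.
function parseXhtml(markup, ParserCtor) {
  var doc = (new ParserCtor()).parseFromString(markup,
    "application/xhtml+xml");
  // XML parse failures are commonly reported in-band, as a
  // <parsererror> element inside the returned document.
  if (!doc || doc.getElementsByTagName("parsererror").length > 0) {
    return null;
  }
  return doc;
}
```

Without xmlns="http://www.w3.org/1999/xhtml" on the root element, the parsed elements are in no namespace, so the result is a generic XML tree rather than a tree of HTML elements.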

PointedEars
--
realism: HTML 4.01 Strict
evangelism: XHTML 1.0 Strict
madness: XHTML 1.1 as application/xhtml+xml
-- Bjoern Hoehrmann