Original DOM of an XSLT'd XML

Rowan Nairn

Mar 30, 2006, 12:41:21 PM

Hi,

I'm testing for certain types of XML documents when a page loads in the
xul:browser. But if an XML document has an XSLT stylesheet associated
with it (feedburner feeds for instance) then the browser document is
the DOM for the transform result but not the original XML document.
How can I get hold of the original? I'd be happy even if I could get
it as a string which I could maybe test with a regex and then parse
with DOMParser if I needed to.

I don't want to do XMLDocument.load() every time a page loads just to
see if it really is the XML I want. XMLDocument.load probably doesn't
even use the browser cache, does it? And I don't want to do
nsIWebBrowserPersist.saveURI() to a local file either. Is there any
way to get at the in memory DOM of the original XML doc or even its xml
string?

Thanks for your help
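
The "test with a regex, then parse with DOMParser" idea from the post above could look roughly like the following. This is only a sketch: the helper names (sniffRootName, looksLikeFeed, parseIfFeed) are hypothetical, the choice of feed root elements is an assumption based on the feedburner example, and the regex does not handle a DOCTYPE with an internal subset. It also presumes the XML text is already in hand, which is the part the thread is trying to solve.

```javascript
// Hypothetical helper, not part of any Mozilla API: a cheap regex pre-check
// on the XML text that reports the root element's local name.
function sniffRootName(xmlText) {
  // Skip an optional BOM, then any processing instructions, comments, or a
  // simple DOCTYPE, and capture the first element's name without its prefix.
  var re = /^\uFEFF?\s*(?:(?:<\?[\s\S]*?\?>|<!--[\s\S]*?-->|<!DOCTYPE[^>]*>)\s*)*<(?:[A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)/;
  var m = re.exec(xmlText);
  return m ? m[1] : null;
}

// Assumption for illustration: the documents of interest are feeds
// (RSS 2.0, Atom, or RSS 1.0/RDF).
function looksLikeFeed(xmlText) {
  var name = sniffRootName(xmlText);
  return name === "rss" || name === "feed" || name === "RDF";
}

// Pay for a full parse only when the pre-check passes.
function parseIfFeed(xmlText) {
  if (!looksLikeFeed(xmlText)) {
    return null;
  }
  return new DOMParser().parseFromString(xmlText, "text/xml");
}
```

The regex only looks at the start of the string, so it stays cheap even for large documents; DOMParser runs only on the candidates that pass.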

Alex Vincent

Mar 30, 2006, 1:10:10 PM

See https://bugzilla.mozilla.org/show_bug.cgi?id=153281 . :-)

Jonas Sicking

Mar 30, 2006, 1:33:46 PM

Rowan Nairn wrote:
> Hi,
>
> I'm testing for certain types of XML documents when a page loads in the
> xul:browser. But if an XML document has an XSLT stylesheet associated
> with it (feedburner feeds for instance) then the browser document is
> the DOM for the transform result but not the original XML document.
> How can I get hold of the original? I'd be happy even if I could get
> it as a string which I could maybe test with a regex and then parse
> with DOMParser if I needed to.

Unfortunately, no. One reason is that we don't want to waste memory
keeping the original document around when most people don't need it.

What you can do, however, is simply copy the source document through and
insert it into your result DOM. Something like this, for example:

<html>
<head> .... other stuff that you currently have
in your stylesheet ...
<div id="source"><xsl:copy-of select="/" /></div>
</head>
... the rest of your stylesheet

You can then access the source document using
document.getElementById('source')

Hope that helps

/ Jonas
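
For the case where you do control the stylesheet, reading the copied source back out of the result DOM might look like this. A minimal sketch: getCopiedSourceXml is a hypothetical helper name, and it assumes the stylesheet wraps the copy in an element with id "source" as in the fragment above.

```javascript
// Hypothetical helper: pull the copied source back out of the result DOM.
// `doc` is the transformed document; `serializer` is an XMLSerializer.
function getCopiedSourceXml(doc, serializer) {
  var holder = doc.getElementById("source");
  if (!holder) {
    return null; // the stylesheet did not include the copy
  }
  // copy-of select="/" copies every child of the root node, so comments and
  // processing instructions may come before the root element; skip them.
  for (var node = holder.firstChild; node; node = node.nextSibling) {
    if (node.nodeType === 1) { // 1 === Node.ELEMENT_NODE
      return serializer.serializeToString(node);
    }
  }
  return null;
}

// In the page: getCopiedSourceXml(document, new XMLSerializer());
```

The serializer is passed in rather than created inside the function so the lookup logic can be exercised without a browser.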

r0wb0t

Mar 30, 2006, 2:54:37 PM

Thanks Alex. I guess I should register a +1 for that bug?

Jonas, I'm talking about accessing XML documents on the web from XUL:
documents I don't own and whose stylesheets I have no control over
(and they must be loaded in the browser, not with
XMLDocument.load).

I'll look into document.documentElement.textContent as mentioned by
this comment:
https://bugzilla.mozilla.org/show_bug.cgi?id=153281#c10

As a last resort I wouldn't mind turning off all automatic XSL
processing by the xul:browser. Anyone know if that's possible?

Rowan

Christian Biesinger

Apr 6, 2006, 11:14:02 AM

Rowan Nairn wrote:
> XMLDocument.load probably doesn't even use the browser cache, does it?

Why wouldn't it? Everything uses the cache, unless it specifically wants
to bypass it. (I think we do that for XMLHttpRequest POST requests, and
of course shift+reload, but not much else.)

(And why are you cross-posting to 4 groups without setting Followup-To?
Setting one here.)

r0wb0t

Apr 8, 2006, 7:46:06 PM

Christian Biesinger wrote:
> Rowan Nairn wrote:
> > XMLDocument.load probably doesn't even use the browser cache, does it?
>
> Why wouldn't it? Everything uses the cache, unless it specifically wants
> to bypass it. (I think we do that for XMLHTTPRequest POST requests, and
> of course shift+reload, but not much else)

Ok, great! So my follow-up question is: How can I tell
XMLDocument.load to only use the cache or else abort? Or even better,
to tell if a certain URL will be taken from the cache before I call
load.

Thanks for the reply,
Rowan

> (And why are you cross-posting to 4 groups without setting Followup-To?
> Setting one here.)

Sorry. Missed that option.

Christian Biesinger

Apr 8, 2006, 9:03:41 PM

r0wb0t wrote:
> Ok, great! So my follow-up question is: How can I tell
> XMLDocument.load to only use the cache or else abort? Or even better,
> to tell if a certain URL will be taken from the cache before I call
> load.

I don't think you can do either. Hm... you could use the cache APIs to
check whether a certain URL is in cache, but you'd also have to verify
that it's not expired of course. This would be around nsICacheService
(the "key" is the URL, basically, at least for GET requests)
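
A check along those lines might be sketched as follows for privileged (chrome) code. This assumes the cache service interfaces of that era (nsICacheService, nsICacheSession, nsICache) and the "HTTP" client ID used for the HTTP cache; isFreshInCache is a hypothetical helper name. Comparing expirationTime against the clock is only an approximation of the freshness decision the HTTP channel itself makes.

```javascript
// Hypothetical helper for chrome code: is there an unexpired HTTP cache
// entry whose key is this URL? Assumes the legacy nsICacheService API.
function isFreshInCache(url) {
  var Cc = Components.classes;
  var Ci = Components.interfaces;
  var session = Cc["@mozilla.org/network/cache-service;1"]
    .getService(Ci.nsICacheService)
    .createSession("HTTP", Ci.nsICache.STORE_ANYWHERE, true);
  // Only look; do not evict entries that turn out to be stale.
  session.doomEntriesIfExpired = false;

  var entry;
  try {
    // Non-blocking read access; throws if no entry exists under this key.
    entry = session.openCacheEntry(url, Ci.nsICache.ACCESS_READ, false);
  } catch (e) {
    return false;
  }
  try {
    // expirationTime is in seconds since the epoch.
    return entry.expirationTime > Date.now() / 1000;
  } finally {
    entry.close();
  }
}
```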

What you could also do is use the lower-level necko APIs and set the
LOAD_ONLY_FROM_CACHE load flag. That ensures that the server won't be
contacted. Maybe you can also use that together with XMLHttpRequest:
after .open() and before .send(), do req.channel.loadFlags |=
Components.interfaces.nsICachingChannel.LOAD_ONLY_FROM_CACHE. I'm not
sure whether this will work, though.
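
Spelled out, the XMLHttpRequest variant described above might look like this in privileged (chrome) code. A sketch only: loadFromCacheOnly and the onDone callback are hypothetical names, and the assumption that a cache miss surfaces through onerror is not confirmed in this thread.

```javascript
// Hypothetical helper for chrome code: fetch `url` as XML, but only if it
// can be served from the cache. `onDone` receives the document or null.
function loadFromCacheOnly(url, onDone) {
  var req = new XMLHttpRequest();
  req.open("GET", url, true);
  // The channel exists only after open(); the flag must be set before send().
  req.channel.loadFlags |=
    Components.interfaces.nsICachingChannel.LOAD_ONLY_FROM_CACHE;
  req.onload = function () {
    onDone(req.responseXML);
  };
  // Assumption: a request that cannot be satisfied from cache ends up here.
  req.onerror = function () {
    onDone(null);
  };
  req.send(null);
}
```

Because the flag keeps the request off the network, this only helps for documents the browser has actually cached.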

r0wb0t

May 8, 2006, 11:58:01 AM

Christian Biesinger wrote:

> What you could also do is use lower level necko APIs and set the
> LOAD_ONLY_FROM_CACHE load flag. That ensures that the server won't be
> contacted. Maybe you can also use that together with XMLHttpRequest
> (after .open() and before .send(), do req.channel.loadFlags |=
> Components.interfaces.nsICachingChannel.LOAD_ONLY_FROM_CACHE. but I'm
> not sure whether this will work)

FYI, this seems to work fine. Thanks! However, it turns out it's not
really what I need. I'm looking for the XML source of every page that
loads in the browser, not just the ones that get cached (it turns out
a lot of the ones I'm interested in have no-cache headers).

It's a pity that XSLTs obliterate the original document like this. I'm
still looking for another way to get at it....

Rowan