Problem with DOMParser for img elements

Foteos Macrides

Jul 11, 2005, 5:33:08 PM

If I use this test markup for an application/xhtml+xml document with a
container element that has id="foo"

the str value is converted, inserted and displayed within the container
element, but for the img element the alt string is displayed instead of the
image. It doesn't matter whether the src attribute value is a partial or
complete URL. When I look at the innerHTML, both the alt and src
attribute=value pairs are present.

Is this a bug in the handling of img elements or a problem in my test markup?

Also, the innerHTML string that is returned has <hr>, <br> and <img etc.> tags
instead of the <hr />, <br /> and <img etc. /> elements that were used in the str
string variable definition and are needed in application/xhtml+xml documents.
Is there a way to get back the XHTML rather than the HTML forms of those elements
in XHTML documents (short of writing my own function to convert what
innerHTML returns)?
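
A minimal sketch of such a conversion function, for illustration only. The helper name toXhtmlEmptyTags and the VOID_TAGS list are hypothetical (not a browser API), the sketch covers only the three elements named above, and it assumes attribute values contain no '>' characters. A DOM serializer, where one is available, would be more robust than a regular expression.

```javascript
// Hypothetical helper, not part of any browser API: rewrite HTML-style empty
// tags (<br>, <hr>, <img ...>) in a string as XHTML-style self-closing tags.
// Assumption: attribute values never contain a '>' character.
const VOID_TAGS = ['br', 'hr', 'img'];

function toXhtmlEmptyTags(html) {
  const pattern = new RegExp('<(' + VOID_TAGS.join('|') + ')\\b([^>]*)>', 'gi');
  return html.replace(pattern, (match, name, attrs) => {
    const trimmed = attrs.replace(/\s+$/, '');
    // Tags that are already self-closing are left untouched.
    if (trimmed.endsWith('/')) {
      return match;
    }
    return '<' + name + trimmed + ' />';
  });
}

console.log(toXhtmlEmptyTags('one<br>two<hr><img src="x.png" alt="X">'));
// prints: one<br />two<hr /><img src="x.png" alt="X" />
```

The result could then be used for a write into another container in an application/xhtml+xml document.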

Fote
--

Boris Zbarsky

Jul 11, 2005, 5:55:47 PM

Foteos Macrides wrote:
> the str value is converted, inserted and displayed within the container
> element, but for the img element the alt string is displayed instead of the
> image.

See https://bugzilla.mozilla.org/show_bug.cgi?id=251354 (I assume you're using
an old Gecko here).

> Also, the innerHTML string that is returned has <hr>, <br> and <img etc.> tags
> instead of the <hr />, <br /> <img etc. /> elements that were used in the str
> string variable definition and are needed in application/xhtml+xml documents.

See https://bugzilla.mozilla.org/show_bug.cgi?id=155723

-Boris

Foteos Macrides

Jul 11, 2005, 7:26:44 PM

"Boris Zbarsky" <bzba...@mit.edu> wrote in message
news:daupt3$gt...@ripley.netscape.com...

> Foteos Macrides wrote:
> > the str value is converted, inserted and displayed within the
> > container element, but for the img element the alt string is
> > displayed instead of the image.
>
> See https://bugzilla.mozilla.org/show_bug.cgi?id=251354 (I
> assume you're using an old Gecko here).

I tried it in Firefox 1.0.4 (20050511), which I hadn't perceived as
old. The currently last entry in that bug report is on 20050416. I
also have a Moz v1.8b2 around which has the problem. In what
formally released versions of the Geckos has your fix actually
been included?

> > Also, the innerHTML string that is returned has <hr>, <br> and
> > <img etc.> tags instead of the <hr />, <br /> <img etc. /> elements
> > that were used in the str string variable definition and are needed
> > in application/xhtml+xml documents.
>
> See https://bugzilla.mozilla.org/show_bug.cgi?id=155723
>
> -Boris

I had difficulty following the comments in that bug. My impression is
that you intend eventually to have innerHTML use the HTML or XHTML
forms for br, hr and img depending on the content type, for both reads
and writes. Is that correct? In Firefox 1.0.4 it does both reads and
writes with text/html, and it handles reads with application/xhtml+xml
(but returns the HTML forms). However, it throws an exception on
attempted writes with application/xhtml+xml no matter what markup is
in the string (i.e., including <br />, <hr /> or <img etc. />). The same is
true for createContextualFragment(), which possibly reflects the
underlying problem for those exceptions.

Fote
--

Boris Zbarsky

Jul 11, 2005, 7:48:03 PM

Foteos Macrides wrote:
> I tried it in Firefox 1.0.4 (20050511), which I hadn't perceived as
> old.

It's using a Gecko from April 2004, with some security fixes. Pretty old, if
you ask me. ;)

> The currently last entry in that bug report is on 20050416. I
> also have a Moz v1.8b2 around which has the problem.

True. It was fixed after 1.8b2.

> In what formally released versions of the Geckos has your fix actually
> been included?

Deer Park Alpha 1 is the only one I'm aware of (for some values of "formally
released").

> I had difficulty following the comments in that bug. My impression is
> that you intend eventually to have innerHTML use the HTML or XHTML
> forms for br, hr and img depending on the content type, for both reads
> and writes. Is that correct?

That's correct.

> In Firefox 1.0.4 it does both reads and writes with text/html

Right.

> However, it throws an exception on
> attempted writes with application/xhtml+xml no matter what markup is
> in the string (i.e., including <br />, <hr /> or <img etc. />)

This is fixed in current trunk builds, and in Deer Park Alpha 1 in particular.

-Boris

Foteos Macrides

Jul 11, 2005, 8:24:35 PM

"Boris Zbarsky" <bzba...@mit.edu> wrote in message

news:dav0fk$is...@ripley.netscape.com...

> Foteos Macrides wrote:
> > I tried it in Firefox 1.0.4 (20050511), which I hadn't
> > perceived as old.
>
> It's using a Gecko from April 2004, with some security
> fixes. Pretty old, if you ask me. ;)

Could you elaborate on that? If Firefox 1.0.4 reports
Gecko/20050511 but I should treat it as Gecko/200404xx
I find that puzzling. What do those numbers in the
user-agent strings and Help-About displays actually mean,
and are they in register across flavors of Gecko-based
browsers?

Fote
--

Boris Zbarsky

Jul 11, 2005, 9:49:59 PM

Foteos Macrides wrote:
> Could you elaborate on that? If Firefox 1.0.4 reports
> Gecko/20050511 but I should treat it as Gecko/200404xx
> I find that puzzling. What do those numbers in the
> user-agent strings and Help-About displays actually mean,
> and are they in register across flavors of Gecko-based
> browsers?

The number after "Gecko/" is the date that that build was compiled. It
indicates nothing about what code was used to compile the build, and compiling
the same exact code on different days will give different numbers there.

The number that's most relevant to the actual Gecko version used is the number
that comes after "rv:" in the User-Agent string.
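
To illustrate the distinction, here is a small sketch that pulls both numbers out of a User-Agent string. The function name parseGeckoUserAgent is made up for this example, and the sample strings only illustrate the format being discussed.

```javascript
// Hypothetical helper: extract the "rv:" revision (the Gecko code version)
// and the "Gecko/" build date (the day the build was compiled) from a UA string.
function parseGeckoUserAgent(ua) {
  const rv = /\brv:([^\s;)]+)/.exec(ua);
  const build = /\bGecko\/(\d{8})\b/.exec(ua);
  return {
    revision: rv ? rv[1] : null,      // most relevant to the actual Gecko version
    buildDate: build ? build[1] : null // compile date only, says nothing about the code
  };
}

// Illustrative UA strings in the format discussed in this thread.
const release = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4';
const nightly = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b3) Gecko/20050711 Firefox/1.0+';

console.log(parseGeckoUserAgent(release).revision); // prints: 1.7.8
console.log(parseGeckoUserAgent(nightly).revision); // prints: 1.8b3
```

Two builds can share the same revision while reporting different build dates, which is why scripts should compare the "rv:" value rather than the date.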

-Boris

Foteos Macrides

Jul 12, 2005, 1:18:31 AM

"Boris Zbarsky" <bzba...@mit.edu> wrote in message

news:dav7k7$pc...@ripley.netscape.com...

> The number after "Gecko/" is the date that that build was
> compiled. It indicates nothing about what code was used
> to compile the build, and compiling the same exact code
> on different days will give different numbers there.
>
> The number that's most relevant to the actual Gecko version
> used is the number that comes after "rv:" in the User-Agent string.

Thanks for the explanation, but at the risk of seeming dense I want
to be sure I have this straight. Tonight (2005-07-11) I went to the
Developers page and under Nightly Builds I clicked on the
Windows link for Firefox. On installation it describes itself as
Deer Park Alpha 2 and indicates rv:1.8b3 Gecko/20050711
Firefox 1.0+. If I had clicked on the Windows link for the Mozilla
Suite and installed the browser component would I have ended up
with the same engine and only have differences in the chrome?

Writes via the DOMParser's parseFromString() vs. innerHTML
now both work fine for application/xhtml+xml documents. I still
need to indicate the name space for the DOMParser, but not
for innerHTML. Is the latter assuming
xmlns="http://www.w3.org/1999/xhtml" and how does the issue
of Strict vs. Transitional vs. Frameset DTDs come into play?

The hr, br and img elements need a slash before their
close-angle-bracket but don't need a space before the slash
(though I, and I presume most people, will be including the
space anyway, in keeping with the W3C recommendation).

But no matter what method I use for writing, when I read the
innerHTML the hr, br and img elements have the space and
slash stripped from before the close-angle-bracket. This is a
problem if I read to a string variable and want to use that to
create a child for another container element in the
application/xhtml+xml documents. Do I need to roll my own
function for putting in the slashes, or is that a "bug fix" on
your agenda?

Fote
--

Boris Zbarsky

Jul 12, 2005, 1:59:30 AM

Foteos Macrides wrote:
> Thanks for the explanation, but at the risk of seeming dense I want
> to be sure I have this straight. Tonight (2005-07-11) I went to the
> Developers page and under Nightly Builds I clicked on the
> Windows link for Firefox. On installation it describes itself as
> Deer Park Alpha 2 and indicates rv:1.8b3 Gecko/20050711
> Firefox 1.0+ If I had clicked on the Windows link for the Mozilla
> Suite and installed the browser component would I have ended up
> with the same engine and only have differences in the chrome?

I don't know which page "the Developers page" is, so I don't know... Try and
see what the "rv:" part says? If it says "rv:1.8b3" then yes.

> I still need to indicate the name space for the DOMParser, but not
> for innerHTML. Is the latter assuming
> xmlns="http://www.w3.org/1999/xhtml" and how does the issue
> of Strict vs. Transitional vs. Frameset DTDs come into play?

innerHTML assumes the namespace of the node you're setting innerHTML on,
basically. Or rather, createContextualFragment assumes that.

> The hr, br and img elements need a slash before their
> close-angle-bracket but don't need a space before the slash

Right. The whole thing is just parsed as XML.

> But no matter what method I use for writing, when I read the
> innerHTML the hr, br and img elements have the space and
> slash stripped from before the close-angle-bracket.

Right. That's https://bugzilla.mozilla.org/show_bug.cgi?id=155723

-Boris

Foteos Macrides

Jul 12, 2005, 9:52:25 AM

"Boris Zbarsky" <bzba...@mit.edu> wrote in message

news:davm84$6e...@ripley.netscape.com...

> Foteos Macrides wrote:
>
> > describes itself as
> > Deer Park Alpha 2 and indicates rv:1.8b3 Gecko/20050711
>
> I don't know which page "the Developers page" is, so I don't know.
> Try and see what the "rv:" part says? If it says "rv:1.8b3" then yes.

The documents at http://www.mozilla.org/ have a menu at top which
includes a "Developers" link to http://www.mozilla.org/developer/
The latter document has a "Nightly Builds" section with links for
"Firefox" or the "Mozilla Suite" and the browser (installed by me on
Windows) presently says "Alpha 2" in words but has "rv:1.8b3" in
the User-Agent string and Help-About display. I have been
interpreting the "rv:1.8b3" to mean that it was a "Beta 3" (plus very
recent patches as of the date indicated by "Gecko/YYYYMMDD")
heading toward a Mozilla 1.8 and Firefox 1.2 "formal release" with
associated advertising for "non-developers" to get the latest and
greatest version. The other documents at www.mozilla.org, e.g.,
http://www.mozilla.org/download.html would lead visitors to get either
Firefox 1.0.4 or Mozilla 1.7.8 and still give the impression that those
are the latest and greatest, though you have brought out in this
thread that their Gecko engine has become more than a year old.

But the most important point, if I understand you correctly, is that in
the next Firefox and Mozilla releases without an Alpha or Beta
qualifier one's javascript for DHTML that uses innerHTML for writes,
and works with text/html documents, should also work in
application/xhtml+xml documents if the string in the innerHTML
write conforms appropriately to the XHTML 1.0 or 1.1 specs,
PLUS, what you get for reads with innerHTML also will be
consistent with those specs (though it isn't yet with last night's
"rv:1.8b3 plus recent patches" nightly build).

I presume innerHTML will never become precisely specified in the
ECMAScript standards, but with respect to across-browser
compatibility, the expectations for the Geckos stated in the
preceding paragraph do now appear to apply for Opera8. Does
anyone know what the situation and development plans for
innerHTML with XHTML are in Safari and Konqueror? Is there an
online resource which keeps track of this issue in a straightforward
way?

Fote
--

Martin Honnen

Jul 12, 2005, 12:36:55 PM

Foteos Macrides wrote:

> I presume innerHTML will never become precisely specified in the
> ECMAScript standards, but with respect to across-browser
> compatibility, the expectations for the Geckos stated in the
> preceding paragraph do now appear to apply for Opera8.

Not really. Boris has stated that setting innerHTML in an
application/xhtml+xml document uses an XML parser, whereas with Opera 8 I can
happily throw HTML that is not well-formed XML at an element in such a
document and the HTML is parsed without parse errors. For example, if you do
document.body.innerHTML = '<ul><li>Kibo<li>Xibo</ul>'
in an application/xhtml+xml document, Opera 8 parses that snippet
without error, while in Mozilla nightlies the XML parser throws an error.
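
A rough way to see why that snippet is rejected by an XML parser: every start tag must be matched by an end tag or be written as an empty-element tag. The checker below is a hypothetical, deliberately naive sketch that only tracks tag nesting. It is not a real XML parser and ignores comments, CDATA sections, entities and attribute syntax.

```javascript
// Hypothetical, simplified nesting check (NOT a real XML parser).
// Returns true only if every start tag has a matching, properly nested end tag.
function isBalancedMarkup(markup) {
  const tagPattern = /<(\/?)([A-Za-z][\w:.-]*)([^>]*)>/g;
  const open = [];
  let m;
  while ((m = tagPattern.exec(markup)) !== null) {
    const isClose = m[1] === '/';
    const name = m[2];
    const selfClosing = /\/\s*$/.test(m[3]);
    if (selfClosing && !isClose) {
      continue; // <br />, <hr/>, ... need no end tag
    }
    if (isClose) {
      if (open.pop() !== name) {
        return false; // end tag does not match the innermost open element
      }
    } else {
      open.push(name);
    }
  }
  return open.length === 0;
}

console.log(isBalancedMarkup('<ul><li>Kibo<li>Xibo</ul>'));           // prints: false
console.log(isBalancedMarkup('<ul><li>Kibo</li><li>Xibo</li></ul>')); // prints: true
```

A tag-soup HTML parser would infer the missing </li> end tags; an XML parser is not allowed to.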

--

Martin Honnen
http://JavaScript.FAQTs.com/

Boris Zbarsky

Jul 12, 2005, 2:11:41 PM

Foteos Macrides wrote:
> The documents at http://www.mozilla.org/ have a menu at top which
> includes a "Developers" link to http://www.mozilla.org/developer/
> The latter document has a "Nightly Builds" section with links for
> "Firefox" or the "Mozilla Suite"

Ah, OK. Yes, the builds linked from this page at any given time should have
more or less equivalent Geckos.

> I have been
> interpreting the "rv:1.8b3" to mean that it was a "Beta 3" (plus very
> recent patches as of the date indicated by "Gecko/YYYYMMDD")

Actually, it's "not quite beta3 yet". The "rv" string has already been updated,
but beta3 hasn't been shipped yet. That will hopefully happen soon.

> The other documents at www.mozilla.org, e.g.,
> http://www.mozilla.org/download.html would lead visitors to get either
> Firefox 1.0.4 or Mozilla 1.7.8 and still give the impression that those
> are the latest and greatest

For released versions, they unfortunately are. There are no stable releases
with a 1.8anything Gecko in them yet, which is quite sad.

> But the most important point, if I understand you correctly, is that in
> the next Firefox and Mozilla releases without an Alpha or Beta
> qualifier one's javascript for DHTML that uses innerHTML for writes,
> and works with text/html documents, should also work in
> application/xhtml+xml documents if the string in the innerHTML
> write conforms appropriately to the XHTML 1.0 or 1.1 specs

That's correct.

> PLUS, what you get for reads with innerHTML also will be
> consistent with those specs

This might not be the case -- the bug on this is still open, and may not end up
fixed in time.

> Does anyone know what the situation and development plans for
> innerHTML with XHTML are in Safari and Konqueror? Is there an
> online resource which keeps track of this issue in a straightforward
> way?

I'm pretty sure WHATWG plans to standardize innerHTML, so you may want to ask
them...

-Boris

Martin Honnen

Jul 12, 2005, 2:38:42 PM

Boris Zbarsky wrote:

>> Does anyone know what the situation and development plans for
>> innerHTML with XHTML are in Safari and Konqueror? Is there an
>> online resource which keeps track of this issue in a straightforward
>> way?
>
>
> I'm pretty sure WHATWG plans to standardize innerHTML, so you may want
> to ask them...

It seems to be planned but not done:
<http://www.whatwg.org/specs/web-apps/current-work/#serialization>

Foteos Macrides

Jul 12, 2005, 3:50:21 PM

"Martin Honnen" <maho...@yahoo.de> wrote in message
news:db0rj9$ho...@ripley.netscape.com...

Yes, Opera 8 still does "tag soup" parsing if it "needs to" for
innerHTML in application/xhtml+xml documents according to
the attitude of the early web (in the last century :) that a browser
should do "something reasonable" to display, somehow or
other, whatever a document's author throws at it rather than just
"failing" to display the bad markup. I suspect that if IE is ever
upgraded for application/xhtml+xml documents, its innerHTML
will be that way too.

But my stated "expectation" for the eventual Mozilla 1.8 and
Firefox 1.1 "release with fanfare for non-developers" was that
if one does use well-formed XHTML 1.0 or 1.1 strings for writes
with innerHTML in application/xhtml+xml documents, they will
work (as they did in last night's nightly for Firefox that I tried, as
well as in Opera 8), PLUS then return a string with well-formed
markup for reads with innerHTML (as Opera 8 also did, but the
Gecko nightly I tried did not, and Boris now says might not for
the 1.8 release). It's not reasonable, IMHO, to expect across
browser compatibility for all error handling of bad markup, but
one should expect such compatibility for markup which complies
with the relevant standards (particularly if innerHTML indeed will
be standardized at some point, as you and Boris have just
indicated).

At this point the DOMParser's parseFromString, though I really
liked it while I was trying it out, involves more code than a write
with innerHTML and thus may go the way of using ranges plus
createContextualFragment as far as overt usage in scripts is
concerned. Opera 8 passes a simple capability test for the
DOMParser, but it is "inactive" code. Do you know whether
those developers, or ones of other browsers, plan to pursue it
further?

One thing I really liked about it, is that it promptly throws up a
display pointing to the start of ill-formed markup in the string,
instead of just failing and requiring an author to call up the
JavaScript Console to start figuring out why it failed, as is the
case for innerHTML writes with Gecko. The former seems more
in keeping with the notion of doing "something reasonable" in the
twenty-first century (help the author learn to do it correctly, though
lots of casual authors might still prefer that it "just works" more or
less no matter what they do :).

Fote
--

Foteos Macrides

Jul 13, 2005, 4:03:58 PM

"Foteos Macrides" <fot...@hotmail.com> wrote in message
news:db3eon$64...@ripley.netscape.com...
>
> I just checked and see there is a newer release of Opera 8
> out. I'll get its free version and explore it this weekend.

My curiosity proved too great for me to wait until the weekend,
but getting Opera 8.01 didn't help. I figured out the problem,
though: I had been misinterpreting the failure and Opera's
JavaScript console message as indicating that the DOMParser
hadn't actually created a DOM tree. Note that the test script I
had posted at the beginning of this thread was not importing
the DOM tree into the current document, which Opera requires,
before making calls such as appendChild. Once I added that
step, all of my scripts worked fine with both Firefox and Opera.
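
The import step described above might look roughly like the following sketch. The function name appendParsedXhtml and the wrapper div are assumptions made for illustration, and the document and parser are passed in as arguments so the same code can be exercised outside a browser. The check for a parsererror root element reflects how Gecko reports ill-formed input and may not hold for other engines.

```javascript
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: parse an XHTML fragment, import the resulting tree into
// the target document, and only then append it to the container element.
function appendParsedXhtml(targetDoc, container, markup, parser) {
  // The parser needs the namespace stated explicitly, so wrap the fragment.
  const wrapped = '<div xmlns="' + XHTML_NS + '">' + markup + '</div>';
  const parsed = parser.parseFromString(wrapped, 'application/xhtml+xml');
  const root = parsed.documentElement;
  // Assumption: ill-formed input is reported as a <parsererror> root element.
  if (root.nodeName === 'parsererror') {
    throw new Error('markup is not well-formed XML');
  }
  // Import the parsed tree into the target document before inserting it.
  const imported = targetDoc.importNode(root, true);
  container.appendChild(imported);
  return imported;
}

// In a browser this might be called as:
//   appendParsedXhtml(document, document.getElementById('foo'),
//                     '<p>Hello<br />world</p>', new DOMParser());
```

Without the importNode call the parsed nodes still belong to the parser's own document, which is the situation Opera refuses.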

Why don't the Geckos also require that step?

Fote
--

Martin Honnen

Jul 13, 2005, 7:43:52 AM

Foteos Macrides wrote:

> Opera 8 passes a simple capability test for the
> DOMParser, but it is "inactive" code. Do you know whether
> those developers, or ones of other browsers, plan to pursue it
> further?

I am not sure what you want to say with "inactive" code.
The object is there and its method works, e.g.

var xmlDocument = new DOMParser().parseFromString(
  '<gods><god>Kibo</god></gods>',
  'application/xml'
);

creates an XML DOM document in Opera 8.
Opera 8 is also the first browser to have implemented the W3C DOM Level 3
Load and Save module, so it has XMLSerializer, DOMParser, and DOM
Level 3 Load and Save. I am not sure what you want them to pursue beyond that.

> One thing I really liked about it, is that it promptly throws up a
> display pointing to the start of ill-formed markup in the string,
> instead of just failing and requiring an author to call up the
> JavaScript Console to start figuring out why it failed, as is the
> case for innerHTML writes with Gecko.

I think that an end-user browser should certainly not display script
errors to the user, but browsers should have settings that let web
developers show the script console when an error occurs. I think that
is possible with both Firefox and Opera, though I am not sure whether you
need an extension for Firefox.

Boris Zbarsky

Jul 13, 2005, 5:27:23 PM

Foteos Macrides wrote:
> Why don't the Geckos also require that step?

Long story, but basically because it wasn't implemented at the very beginning of
the DOM impl and at this point fixing it would involve fixing all of our chrome
to cope, at the very least.

-Boris

Foteos Macrides

Jul 13, 2005, 12:16:12 PM

"Martin Honnen" <maho...@yahoo.de> wrote in message

news:db2upr$ob...@ripley.netscape.com...

>
> I am not sure what you want to say with "inactive" code.
> The object is there and its method works

I don't use Opera seriously, but had the free version of 8.00
for testing. It passed relevant typeof tests, but my scripts
that worked with Firefox didn't with that version of Opera.

I just checked and see there is a newer release of Opera 8
out. I'll get its free version and explore it this weekend.

Fote
--

Brendan Eich

Jul 17, 2005, 12:51:02 PM

to Foteos Macrides

Foteos Macrides wrote:

> I presume innerHTML will never become precisely specified in the
> ECMAScript standards,

innerHTML is not specified *at all* (not even imprecisely) by ECMA-262
or another ECMA standard. Nor will it be. It should have been
standardized by the DOM working group during the browser wars, but that
did not happen for both obvious and subtle reasons. As others have
noted, it looks like the WHAT Working Group will standardize it among at
least some browser vendors.

/be

Foteos Macrides

Jul 17, 2005, 5:09:23 PM

"Brendan Eich" <bre...@meer.net> wrote in message
news:42DA8C76...@meer.net...

> As others have
> noted, it looks like the WHAT Working Group will standardize
> [innerHTML] among at least some browser vendors.

According to recent comments in

https://bugzilla.mozilla.org/show_bug.cgi?id=155723

Boris did fix the handling of empty elements for innerHTML
reads in application/xhtml+xml documents. So I got the
rv:1.8b4 Gecko/20050716 Firefox nightly and have been
comparing what it does for writes with innerHTML or the
DOMParser's parseFromString, and then reads with
innerHTML or the XMLSerializer's serializeToString, versus
what Opera 8.01 does for such writes, and then such reads
or ones with the Load and Save LSSerializer's writeToString.

One difference already noted by Martin is that for writes
Opera's innerHTML (but not its parseFromString) accepts
strings that have empty tags without the />, as well as
containers with only an implied end tag, and "fixes up" such
strings to be well-formed for XHTML, whereas the Geckos
require that the strings have well-formed XHTML markup in
the first place. I've also noticed that Opera converts tags such
as <hr noshade> to <hr noshade="true"/> and that if you use
<hr noshade="noshade" /> or <hr noshade="true" /> or
<hr noshade="false" /> they all end up as <hr noshade="true"/>,
whereas Firefox retains whatever value was used in the
noshade name="value" pair to make it OK for XHTML. Since
the value is irrelevant for whether the noshade attribute is acted
upon, it would seem this difference between the browsers is
irrelevant, but correct me if those with more experience see a
situation in which it might matter.

Another difference is that reads with Opera's innerHTML return
start tags with xmlns="http://www.w3.org/1999/xhtml" in them only
if the tags overtly indicated that name space in the original markup.
In contrast, the Gecko's innerHTML adds that name="value" pair
for all start tags which are not encased in a container whose start
tag has it (in the markup for the overall XHTML document, the name
space may have been inherited such that the start tags didn't need
that name="value" pair). This difference between the two browsers
might make a difference in the level of certainty one can have that
one's javascript can use innerHTML to fish the content out of one
element and plug it into another element without any problems in
application/xhtml+xml documents.

Fote
--

Boris Zbarsky

Jul 17, 2005, 7:52:44 PM

Foteos Macrides wrote:
> Another difference is that reads with Opera's innerHTML return
> start tags with xmlns="http://www.w3.org/1999/xhtml" in them only
> if the tags overtly indicated that name space in the original markup.
> In contrast, the Gecko's innerHTML adds that name="value" pair
> for all start tags which are not encased in a container whose start
> tag has it (in the markup for the overall XHTML document, the name
> space may have been inherited such that the start tags didn't need
> that name="value" pair). This difference between the two browsers
> might make a difference in the level of certainty one can have that
> one's javascript can use innerHTML to fish the content out of one
> element and plug it into another element without any problems in
> application/xhtml+xml documents.

Right. This is exactly why we set the namespace on all the "toplevel" nodes --
this way if you stick the string into some random element that happens to be in
a different namespace (eg MathML/SVG/XUL/whatever) it'll still get parsed correctly.

-Boris