bug in e4x? missing = in XML attribute

Leni

Feb 12, 2009, 6:16:14 PM

to dev-te...@lists.mozilla.org

Hi, I think I may have encountered a bug in e4x parsing related to a
3-byte sequence of UTF-8.

The reason I think it's a bug is that it seems unreasonable that the
test case XML is parsable by the DOM parser but not the e4x parser.

Before filing in bugzilla I thought I would post here to see if anyone
has another explanation for the behaviour.

An email describing the problem with a test case is attached.

Regards -

Leni.

Martin Honnen

Feb 14, 2009, 11:35:03 AM

Can you post the XML you are trying to parse?

--

Martin Honnen
http://JavaScript.FAQTs.com/

Leni

Feb 14, 2009, 3:12:53 PM

to Martin...@gmx.de, dev-te...@lists.mozilla.org

Test-case xml is attached.

I also have a question about a workaround I was considering using:

var serializer = new XMLSerializer();
var str = serializer.serializeToString(req.responseXML);
var xml = new XML(str);

Running the DOM document through XMLSerializer to produce a string and
then giving that string to the e4x parser does at least parse.

But XMLSerializer turns that three-byte UTF-8 sequence into a '('
character. So two more questions:
a) can someone offer a pointer to how XMLSerializer is supposed
to behave when there is a 3-byte UTF-8 sequence in the content
of an element?
b) can anyone suggest any other workaround?

The real-world thing I am trying to do is get a UTF-8 encoded Atom feed
coming from Google into an e4x XML object.

Leni.

Leni

Feb 14, 2009, 3:27:15 PM

to Martin...@gmx.de, dev-te...@lists.mozilla.org

The earlier attached xml didn't pass through the email correctly so here
it is again in a .zip.

Leni.

Leni

Feb 14, 2009, 3:33:37 PM

to Martin...@gmx.de, dev-te...@lists.mozilla.org

Leni wrote:
> The earlier attached xml didn't pass through the email correctly so here
> it is again in a .zip.

OK, it looks like the mailing list software is removing the attachment,
so here is a URL:
http://www.zindus.com/tmp/1.xml.zip

Leni.

Martin Honnen

Feb 15, 2009, 8:29:15 AM

Leni wrote:

>> Can you post the XML you are trying to parse?
>
> Test-case xml is attached.

I can't reproduce the issue with Firefox 3.0 (Mozilla/5.0 (Windows; U;
Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6). Test
case is at
http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html
and loads XML document from
http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501Test.xml
which is the file you sent.

I don't get any script or XML parsing errors.

Leni

Feb 15, 2009, 5:05:17 PM

to Martin...@gmx.de, dev-te...@lists.mozilla.org

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0 (Mozilla/5.0 (Windows; U;
> Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6). Test
> case is at
> http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html
> and loads XML document from
> http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501Test.xml
> which is the file you sent.
>
> I don't get any script or XML parsing errors.

Yes, you are right.

The extension I am working on is for Thunderbird2 and Thunderbird3, and
I can only reproduce the problem under Thunderbird2, not Thunderbird3.
Sorry for not making this clear in the original posting (I didn't test tb3).

If you are curious to reproduce this problem in Thunderbird using
Martin's test case, install the ThunderBrowse extension:
https://addons.mozilla.org/en-US/thunderbird/addon/5373

Then visit the link in ThunderBrowse:
http://home.arcor.de/martin.honnen/javascript/2009/02/test2009021501.html

In Thunderbird3, the page is served correctly - the XML is shown.

In Thunderbird2, the page is not served correctly - the javascript error
console reports:

Error: e.target.parentNode.hasAttribute is not a function
Source File: chrome://tbrowse/content/tburlclk.js
Line: 377

I won't file a bug report for this tb2-only problem then because I doubt
it would get much attention.

About a workaround for Thunderbird 2, the DOM ==> XMLSerializer ==> e4x
technique does parse the XML but converts that 3-byte UTF-8 sequence
into a '(' which makes it lossy. If someone can shed any light on what
is going on here and in particular, what class of UTF-8 byte sequences
might be affected by such lossy conversion, it would help me evaluate
whether this technique is acceptable.

Or if anyone can think of a better workaround for tb2 it will be welcome!

Thanks -

Leni.
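One observation that may bear on the question above about which characters are affected: if the problem character is U+2028 (the character later posts in this thread identify), then '(' is exactly its low byte, since 0x2028 & 0xFF is 0x28. That would be consistent with a 16-bit code unit being truncated to 8 bits somewhere in the serializer path, but that is only an assumption; nothing in the thread confirms it. Under that assumption, every character above U+00FF would be altered. The helper names below are hypothetical and the sketch is plain JavaScript, not e4x:

```javascript
// Assumption: the lossy step keeps only the low 8 bits of each UTF-16 code unit.
// Hypothetical helper: the character a code unit would turn into under that assumption.
function lowByteChar(codeUnit) {
  return String.fromCharCode(codeUnit & 0xff);
}

// Hypothetical helper: list every code unit in a string that such truncation would change.
function charsChangedByTruncation(str) {
  var changed = [];
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    if (unit > 0xff) {
      changed.push({ index: i, codeUnit: unit.toString(16), becomes: lowByteChar(unit) });
    }
  }
  return changed;
}

console.log(lowByteChar(0x2028)); // prints "("
console.log(charsChangedByTruncation("Alice, Kerry \u2028Ex: Jones"));
```

If the list comes back empty for a given feed, this particular kind of loss would not apply to it; a non-empty list shows what would be at risk.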

shows.G...@gmail.com

Feb 16, 2009, 3:38:38 PM

On Feb 15, 2:05 pm, Leni <mozilla....@zindus.com> wrote:
> In Thunderbird2, the page is not served correctly - the javascript error
> console reports:
>
> Error: e.target.parentNode.hasAttribute is not a function
> Source File: chrome://tbrowse/content/tburlclk.js
> Line: 377

Actually, that error is a bug in ThunderBrowse's JavaScript link handling.
ThunderBrowse 3.2.3 fixes it.

Leni

Feb 16, 2009, 4:40:45 PM

to Martin...@gmx.de, dev-te...@lists.mozilla.org

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0

After the posting from shows.G...@gmail.com I did some more testing
and found that I can't reproduce it when the e4x parsing happens
inside a <browser> element.

So ... here is another test case along the same lines.

To run the test:
- copy and paste the code below into a text editor and remove
all the newlines - all the code should be on one line
- copy and paste into the javascript error console and click evaluate

The error console reports:
Error: missing = in XML attribute
Source File:
Line: 3, Column: 2
Source Code:
le><content>Alice, Kerry

I can reproduce this in tb2, tb3beta1 and Firefox 3.0.6. It's the \u2028
character in the code below which causes the problem.

var str = "<?xml version='1.0' encoding='UTF-8'?><feed
xmlns='http://www.w3.org/2005/Atom'
xmlns:openSearch='http://a9.com/-/spec/opensearch/1.1/'><id>exa...@gdomain.example.com</id><updated>2009-02-11T05:58:32.673Z</updated><category
scheme='http://schemas.google.com/g/2005#kind'
term='http://schemas.google.com/contact/2008#contact'/><generator
version='1.0'
uri='http://www.google.com/m8/feeds'>Contacts</generator><entry><app:edited
xmlns:app='http://www.w3.org/2007/app'>2009-02-11T05:48:11.672Z</app:edited><title>Alice
Midxxxxxx</title><content>Alice, Kerry \u2028Ex:
Jones</content></entry></feed>";var xml = new XML(str.replace(/\<\?xml
version=.*?\?\>/,""));

Regards -

Leni.

Leni

Feb 16, 2009, 4:43:50 PM

to shows.G...@gmail.com, dev-te...@lists.mozilla.org

shows.G...@gmail.com wrote:
> Actually, it's a bug that deals with javascript link handling in
> ThunderBrowse. 3.2.3 fixes the bug.

Yes - thanks for that. With ThunderBrowse 3.2.3 Martin's test case now
works for me too.

Leni.

Leni

Feb 17, 2009, 12:25:19 AM

to Martin...@gmx.de, dev-te...@lists.mozilla.org

Martin Honnen wrote:
> I can't reproduce the issue with Firefox 3.0

Just for good measure, I can now reproduce the problem using a test case
similar to the one you used.

Test case:
http://www.zindus.com/tmp/test-case-2009-02-17-1.html
The xml:
http://www.zindus.com/tmp/test-case-2009-02-17-1.xml

Firefox 3.0.6 error console reports:
Error: illegal XML character

The .xml is different to the one provided earlier, but the problem is
the same: it is related to that Unicode character, which in this example
sits just before the string "Jones".

I hope I am not making a big noise over something that has a simple
explanation.

Leni.

Martin Honnen

Feb 17, 2009, 8:21:55 AM

Leni wrote:
> Martin Honnen wrote:
>> I can't reproduce the issue with Firefox 3.0
>
> Just for good measure, I can now reproduce the problem using a test case
> similar to the one you used.
>
> Test case:
> http://www.zindus.com/tmp/test-case-2009-02-17-1.html
> The xml:
> http://www.zindus.com/tmp/test-case-2009-02-17-1.xml
>
> Firefox 3.0.6 error console reports:
> Error: illegal XML character
>
> The .xml is different to the one provided earlier, but the problem is
> the same - related to that unicode character, in this example it is just
> before the string "Jones".

I see that problem too with Firefox 3.0.6.

Now to move the problem into a bug report it would be best to have a
minimal test case, preferably, as the E4X XML constructor is implemented
by the JavaScript engine itself, a test case not even needing to load an
XML document with XMLHttpRequest, but rather a script test case doing
new XML(string) and causing the error.

I am however struggling to identify the character causing the problem.
According to your earlier post, it is encoded in UTF-8 as 0xe2 0x80 0xa8,
which would be the Unicode character U+2028, I think.
However doing
var el = new XML('<foo>Line 1.\u2028Line 2.</foo>');
in Firefox 3.0.6 does not cause any error, so that way the character is
parsed fine. So either it is not that character causing the error or
that error only occurs with longer strings.
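The byte-to-character mapping above can be checked by hand. The sketch below decodes the three bytes with the standard UTF-8 bit layout and cross-checks the result against the platform's TextDecoder; the function name is made up for illustration, and TextDecoder is assumed to be available (it is in current browsers and Node.js, though not in the 2009 builds discussed here):

```javascript
// Decode one 3-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx) into a code point.
// Illustrative only: it does not reject overlong encodings or surrogate code points.
function decodeThreeByteUtf8(b0, b1, b2) {
  if ((b0 & 0xf0) !== 0xe0 || (b1 & 0xc0) !== 0x80 || (b2 & 0xc0) !== 0x80) {
    throw new RangeError("not a 3-byte UTF-8 sequence");
  }
  // 4 payload bits from the lead byte, 6 from each continuation byte.
  return ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
}

var codePoint = decodeThreeByteUtf8(0xe2, 0x80, 0xa8);
console.log("U+" + codePoint.toString(16).toUpperCase()); // prints "U+2028"

// Cross-check with the built-in decoder.
var decoded = new TextDecoder("utf-8").decode(new Uint8Array([0xe2, 0x80, 0xa8]));
console.log(decoded.charCodeAt(0) === codePoint); // prints "true"
```

U+2028 is LINE SEPARATOR, which ECMAScript also treats as a line terminator in source text. Whether that is what trips up the e4x parser here is not established at this point in the thread.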

Boris Zbarsky

Feb 17, 2009, 10:03:30 AM

Martin Honnen wrote:
>> Test case:
>> http://www.zindus.com/tmp/test-case-2009-02-17-1.html
>> The xml:
>> http://www.zindus.com/tmp/test-case-2009-02-17-1.xml
>>
>> Firefox 3.0.6 error console reports:
>> Error: illegal XML character

I get that too in trunk Gecko.

However, if I start reducing it (and it's possible to reduce it a good
bit while still getting that error), I eventually get to a point where
the error starts changing (e.g. complaining about there being a missing
'=' in an attribute).

If I breakpoint on the "illegal XML character" error, I see that it
happens when we get a '<' while we think we're in the process of parsing
an open tag.

In particular, it thinks it's looking at a string that looks something like:

<author/www.google.com/m8/feeds/contacts/a.b%40gdomain.example.com/thin?start-index=2681&max-results=10'

Which is pretty clearly bogus.

-Boris

Boris Zbarsky

Feb 17, 2009, 10:26:55 AM

OK, I have this minimized to this script:

// One XML string, split into concatenated literals only to fit the line width.
var xmlEl = new XML("<feed " +
  "xmlns:gContact='http://schemas.google.com/contact/2' " +
  "xmlns:batch='http://schemas.google.com/gdata/batch' " +
  "xmlns:gd='http://schemas.google.com/g/2005' " +
  "gd:etag='W/\"xxxxxxxxxxxxxxxxxxxxxxw.\"'><updated>2009-02-1</updated><e><c>\u2028</c></e></feed>");
var pre = document.createElement('pre');
pre.appendChild(document.createTextNode(xmlEl.toXMLString()));
document.body.appendChild(pre);

with no XMLHttpRequest required. Deleting chars from the string
sometimes changes the error, and sometimes makes it go away entirely,
but I bet it can be minimized some more. If someone wants to take a
shot at that, great.

This doesn't look like an XML issue, though, but a JS engine one.

-Boris
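For anyone needing a stopgap while the engine issue stands, one possible workaround is to replace U+2028 (and its sibling U+2029) with XML numeric character references before handing the string to the parser. This is only a sketch: the helper name is hypothetical, it has not been tested against e4x, and it assumes the parser expands numeric character references in element content.

```javascript
// Hypothetical helper (not from the thread): rewrite U+2028 and U+2029 as
// numeric character references so the raw characters never reach the parser.
function escapeLineSeparators(xmlText) {
  return xmlText.replace(/[\u2028\u2029]/g, function (ch) {
    return "&#x" + ch.charCodeAt(0).toString(16).toUpperCase() + ";";
  });
}

var input = "<c>Alice, Kerry \u2028Ex: Jones</c>";
var output = escapeLineSeparators(input);
console.log(output); // prints "<c>Alice, Kerry &#x2028;Ex: Jones</c>"
```

The replacement is not safe inside CDATA sections or comments, where character references are not expanded, so it suits feeds whose text lives in ordinary element content or attribute values.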

Martin Honnen

Feb 17, 2009, 1:19:30 PM

Thanks for the reduction. I agree it is a JavaScript engine issue, I
have filed https://bugzilla.mozilla.org/show_bug.cgi?id=478905 on this.