Javascript Xml performance


David Peek

Mar 3, 2012, 8:13:31 PM
to haxe...@googlegroups.com
Hi list,

Massive does a lot of work in the connected-TV space, which means two things: first, we parse a lot of XML (catalogues, EPGs, etc.), and second, we often work in embedded browsers on very low-spec hardware. The Haxe JavaScript target has served us very well in every way except one: XML parsing can be prohibitively slow in some browsers, particularly on low-end devices.

The test compares parsing 100 KB of XML with Xml.parse() against DOMParser().parseFromString().
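For anyone wanting to reproduce this kind of comparison outside Haxe, here is a minimal timing harness in TypeScript. It is a sketch under assumptions, not the original test: `makeCatalogueXml`, `timeParse` and `nativeParser` are hypothetical helpers, and the generated catalogue is a made-up stand-in for a real 100 KB payload. Any parse function can be plugged in; the native one is only used where the host actually provides `DOMParser`.

```typescript
// Any function that parses a string; the result is ignored for timing.
type ParseFn = (xml: string) => unknown;

// Hypothetical generator: a catalogue-style document of at least `minChars`
// characters (equal to bytes here, since the payload is plain ASCII).
function makeCatalogueXml(minChars: number): string {
  const parts: string[] = ["<catalogue>"];
  let size = parts[0].length;
  for (let i = 0; size < minChars; i++) {
    const item = `<item id="${i}" type="show"><title>Programme ${i}</title></item>`;
    parts.push(item);
    size += item.length;
  }
  parts.push("</catalogue>");
  return parts.join("");
}

// Mean wall-clock milliseconds over `runs` parses of the same input.
function timeParse(parse: ParseFn, xml: string, runs = 5): number {
  const start = Date.now();
  for (let i = 0; i < runs; i++) parse(xml);
  return (Date.now() - start) / runs;
}

// The browser's native parser, or null where DOMParser does not exist (e.g. Node.js).
function nativeParser(): ParseFn | null {
  const Ctor = (globalThis as any).DOMParser;
  return Ctor === undefined
    ? null
    : (xml: string) => new Ctor().parseFromString(xml, "text/xml");
}

const xml = makeCatalogueXml(100 * 1024);
const native = nativeParser();
if (native !== null) console.log(`DOMParser: ${timeParse(native, xml)}ms`);
```

Averaging over several runs smooths out timer granularity; on very slow devices a single run may already be informative.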

The results on my fairly capable MacBook Pro:

Chrome:
Main.hx:11: haxe: 49ms
Main.hx:15: haxe: 9ms

Safari:
Main.hx:11: haxe: 2625ms (apparently WebKit isn't WebKit anymore)
Main.hx:15: haxe: 6ms

Firefox:
Main.hx:11: haxe: 3005ms
Main.hx:15: haxe: 7ms

Opera: (installed especially for this test)
Main.hx:11: haxe: 63ms
Main.hx:15: haxe: 20ms

You can imagine how this test goes on a device with the hardware and browser of an early-'90s PC.

The XmlNative class (included in the source) is something we use internally to address this issue. It wraps native parsing in an API that mirrors the Haxe Xml API. It is by no means feature-complete, or even a particularly efficient solution, but it meets our needs so far.
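The wrapping idea can be sketched in TypeScript as follows. This is not Massive's XmlNative; `NativeXml` and the `DomLikeNode` interface are hypothetical, and only a few accessors are shown. Their names follow Haxe's `Xml` class (`nodeName`, `get`, `exists`, `elements`, `elementsNamed`, `firstElement`), while the tree underneath stays the native DOM tree.

```typescript
// Minimal structural view of a DOM node (hypothetical interface); browser
// Elements returned by DOMParser satisfy it structurally.
interface DomLikeNode {
  nodeType: number; // DOM constants: 1 = element, 3 = text, 4 = CDATA section
  nodeName: string;
  nodeValue: string | null;
  childNodes: ArrayLike<DomLikeNode>;
  getAttribute?(name: string): string | null;
}

const ELEMENT_NODE = 1;

// Hypothetical wrapper: native tree underneath, Haxe-Xml-style accessors on top.
class NativeXml {
  constructor(private readonly node: DomLikeNode) {}

  get nodeName(): string {
    if (this.node.nodeType !== ELEMENT_NODE) throw new Error("Not an element");
    return this.node.nodeName;
  }

  // Like Haxe's Xml.get: null when the attribute is absent.
  get(att: string): string | null {
    return this.node.getAttribute?.(att) ?? null;
  }

  exists(att: string): boolean {
    return this.get(att) !== null;
  }

  // Haxe returns an Iterator; an array keeps this sketch short.
  elements(): NativeXml[] {
    const out: NativeXml[] = [];
    const kids = this.node.childNodes;
    for (let i = 0; i < kids.length; i++) {
      if (kids[i].nodeType === ELEMENT_NODE) out.push(new NativeXml(kids[i]));
    }
    return out;
  }

  elementsNamed(name: string): NativeXml[] {
    return this.elements().filter((e) => e.nodeName === name);
  }

  firstElement(): NativeXml | null {
    const els = this.elements();
    return els.length > 0 ? els[0] : null;
  }
}
```

In a browser, the root would be something like `new NativeXml(new DOMParser().parseFromString(text, "text/xml").documentElement)`. Because children are wrapped lazily on traversal, no second tree is built after parsing.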

I understand the RegEx-based parser was built to address differences in browser XML parsing. Has the browser landscape changed enough for us to remove this fallback? What problems does the RegEx parser address, and are there other ways we can resolve them? I'm no JavaScript developer, but it seems very strange that a language would roll its own XML parsing on a platform built on XML parsing ;)

Anyhow, I just wanted to start a discussion and see if there was any way we could improve the situation.

Best,
David

Marc Weber

Mar 3, 2012, 8:27:38 PM
to haxelang
Excerpts from David Peek's message of Sun Mar 04 02:13:31 +0100 2012:

> Anyhow, just wanted to start a discussion and see if there was any way we
> could improve the situation.
If I were unsure, I'd do both: keep using the Haxe parser but also run the
browser parser, then send the browser parser's result (e.g. as a serialized
JSON string) to your server.
Compare the results of the runs after a day or a week. If the slowdown is
too large (it shouldn't be, because browsers parse natively), only ask
clients to do so when you have no parsed result for that browser engine yet.

At least then you'll have an idea of which problems you might run into,
and in how many cases.
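A sketch of the client-side half of that comparison, in TypeScript. It assumes (hypothetically) that each parser's output has first been converted into a neutral `Elem` tree; `canonical`, `fingerprint` and `sameResult` are illustrative names, and the server round-trip is left out. Attributes are sorted so that two parsers reporting them in different orders are not flagged as disagreeing.

```typescript
// Hypothetical neutral tree shape; each parser's output would first be converted to it.
interface Elem {
  name: string;
  attrs: Record<string, string>;
  children: (Elem | string)[];
}

// Canonical form: attributes become a key-sorted list of [name, value] pairs,
// so attribute order cannot cause a false mismatch.
function canonical(node: Elem | string): unknown {
  if (typeof node === "string") return node;
  const attrs = Object.keys(node.attrs)
    .sort()
    .map((k) => [k, node.attrs[k]]);
  return [node.name, attrs, node.children.map(canonical)];
}

// JSON string that could be sent to a server and compared across browsers/parsers.
function fingerprint(root: Elem): string {
  return JSON.stringify(canonical(root));
}

function sameResult(a: Elem, b: Elem): boolean {
  return fingerprint(a) === fingerprint(b);
}
```

Text nodes are compared verbatim, so whitespace handling differences between parsers will show up as mismatches, which is arguably what such a survey should reveal.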

Maybe somebody else has better ideas.

Marc Weber

Bruno Garcia

Mar 3, 2012, 8:40:56 PM
to haxe...@googlegroups.com
I hear you. We were parsing large XML files and parsing was actually timing
out in Firefox and mobile Safari. We only fixed it by switching to JSON.
Of course, that XML file was 900 KB, but still :)

On 03/03/2012 05:13 PM, David Peek wrote:
> I understand the RegEx based parser was built to address differences
> in browser XML parsing. Has the browser landscape changed enough for
> us to remove this fallback?

As far as I know there's still no standard for XML parsing. DOMParser
doesn't work in IE, but IE's API looks similar enough.

We should keep the regex-based parser as a fallback when no native API is
available, e.g. in non-browser environments like Node.js.
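Backend selection along these lines could look like the following TypeScript sketch. The function names and the three-way split are assumptions for illustration; `DOMParser.parseFromString` and MSXML's `ActiveXObject("Microsoft.XMLDOM")` with `async`/`loadXML` are the real browser APIs. Returning `null` signals the caller to use the portable regex-based parser. It does not paper over behavioural differences between engines (for example, Gecko and WebKit report malformed input via a `parsererror` element in the returned document rather than by throwing).

```typescript
type XmlBackend = "DOMParser" | "ActiveX" | "fallback";

// Pick a backend from what the host exposes. Uses `!== undefined` rather than
// typeof === "function", since some older engines report host constructors as objects.
function chooseXmlBackend(env: any = globalThis): XmlBackend {
  if (env.DOMParser !== undefined) return "DOMParser"; // most browsers, IE9+
  if (env.ActiveXObject !== undefined) return "ActiveX"; // older IE (MSXML)
  return "fallback"; // e.g. Node.js
}

// Parse with whatever native API is available; null tells the caller to use
// the portable (regex-based) parser instead.
function parseNative(text: string, env: any = globalThis): unknown {
  switch (chooseXmlBackend(env)) {
    case "DOMParser":
      return new env.DOMParser().parseFromString(text, "text/xml");
    case "ActiveX": {
      const doc = new env.ActiveXObject("Microsoft.XMLDOM");
      doc.async = false; // load synchronously
      doc.loadXML(text);
      return doc;
    }
    default:
      return null;
  }
}
```

Passing the environment as a parameter keeps the selection logic testable outside a browser.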

Could you open an issue for this?

Bruno

David Peek

Mar 3, 2012, 9:37:06 PM
to haxe...@googlegroups.com
I've posted the issue here: http://code.google.com/p/haxe/issues/detail?id=676

I'll see if I can get XmlNative to parse and traverse consistently across IE, Gecko and WebKit. DOM manipulation is something I haven't even looked at yet.

Best,
David

Nicolas Cannasse

Mar 4, 2012, 5:52:51 AM
to haxe...@googlegroups.com
On 04/03/2012 02:13, David Peek wrote:

> I understand the RegEx based parser was built to address differences in
> browser XML parsing. Has the browser landscape changed enough for us to
> remove this fallback? What are the problems the RegEx parser addresses,
> and are there other ways we can resolve them? I'm no JavaScript
> developer, but it seems very strange that a language would roll its own
> XML parsing on a platform /built/ on XML parsing ;)

Well, I think the RegEx parser could be optimized by using a single
regexp of the form (node|pcdata|cdata|...) instead of trying each
alternative in turn.
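The single-alternation idea can be sketched as a TypeScript tokenizer: one global regex whose alternatives each have their own capture group, so whichever group matched tells you the node type. This is an illustration under assumptions, not the std parser's code; the `Token` shapes and `tokenize` are hypothetical, attributes are left as a raw string, and DOCTYPE and entity decoding are not handled.

```typescript
type Token =
  | { kind: "cdata" | "comment" | "pi" | "pcdata"; value: string }
  | { kind: "close"; name: string }
  | { kind: "open"; name: string; attrs: string; selfClosing: boolean };

// One regex, one alternative per node type:
//   1: CDATA   2: comment   3: <?...?>   4: closing tag
//   5-7: opening tag (name, raw attributes, "/" if self-closing)   8: text
const TOKEN =
  /<!\[CDATA\[([\s\S]*?)\]\]>|<!--([\s\S]*?)-->|<\?([\s\S]*?)\?>|<\/([^\s>]+)\s*>|<([^\s\/>!?]+)((?:\s+[^\s=\/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*(\/?)>|([^<]+)/g;

function tokenize(xml: string): Token[] {
  const tokens: Token[] = [];
  TOKEN.lastIndex = 0;
  while (TOKEN.lastIndex < xml.length) {
    const start = TOKEN.lastIndex;
    const m = TOKEN.exec(xml);
    // A gap before the match means a '<' no alternative accepts (e.g. <!DOCTYPE>).
    if (m === null || m.index !== start) {
      throw new Error(`Unrecognised markup at offset ${start}`);
    }
    if (m[1] !== undefined) tokens.push({ kind: "cdata", value: m[1] });
    else if (m[2] !== undefined) tokens.push({ kind: "comment", value: m[2] });
    else if (m[3] !== undefined) tokens.push({ kind: "pi", value: m[3] });
    else if (m[4] !== undefined) tokens.push({ kind: "close", name: m[4] });
    else if (m[5] !== undefined)
      tokens.push({ kind: "open", name: m[5], attrs: m[6], selfClosing: m[7] === "/" });
    else tokens.push({ kind: "pcdata", value: m[8] });
  }
  return tokens;
}
```

Every alternative consumes at least one character, so the loop always advances. Whether one combined regex actually beats several separate ones depends on the engine and would need measuring.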

We could also port the Neko XML parser (written in C, here:
http://nekovm.googlecode.com/svn/trunk/libs/std/xml.c), which should, I
guess, be faster than regexps on most modern browsers.
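To illustrate the kind of approach such a port would take, here is a much-reduced single-pass scanner in TypeScript. It is not a port of `xml.c` and makes no claim about its structure; `parseXml` and `XmlNode` are hypothetical. It walks the input once with `charAt`/`indexOf` and a stack of open elements instead of regex matching. Comments and `<?...?>` are skipped, and DOCTYPE and entity decoding are not supported.

```typescript
// Hypothetical tree shape; text and CDATA content are kept as plain strings.
interface XmlNode {
  name: string;
  attrs: Record<string, string>;
  children: (XmlNode | string)[];
}

// Single pass over the input, no regular expressions.
function parseXml(src: string): XmlNode {
  const root: XmlNode = { name: "#document", attrs: {}, children: [] };
  const stack: XmlNode[] = [root];
  let pos = 0;

  const isSpace = (c: string): boolean => c === " " || c === "\t" || c === "\n" || c === "\r";
  const at = (s: string): boolean => src.slice(pos, pos + s.length) === s;
  const fail = (msg: string): never => {
    throw new Error(`${msg} at offset ${pos}`);
  };
  const skipSpaces = (): void => {
    while (pos < src.length && isSpace(src.charAt(pos))) pos++;
  };
  const indexOrFail = (s: string, from: number): number => {
    const i = src.indexOf(s, from);
    return i < 0 ? fail(`Missing "${s}"`) : i;
  };
  const readName = (): string => {
    const start = pos;
    while (pos < src.length) {
      const c = src.charAt(pos);
      if (isSpace(c) || c === "=" || c === "/" || c === ">") break;
      pos++;
    }
    return pos > start ? src.slice(start, pos) : fail("Expected a name");
  };

  while (pos < src.length) {
    const top = stack[stack.length - 1];
    if (src.charAt(pos) !== "<") {
      // Text run up to the next '<'; whitespace-only runs are kept as text.
      const end = src.indexOf("<", pos);
      const stop = end < 0 ? src.length : end;
      top.children.push(src.slice(pos, stop));
      pos = stop;
    } else if (at("<![CDATA[")) {
      const end = indexOrFail("]]>", pos);
      top.children.push(src.slice(pos + 9, end)); // 9 = "<![CDATA[".length
      pos = end + 3;
    } else if (at("<!--")) {
      pos = indexOrFail("-->", pos) + 3;
    } else if (at("<?")) {
      pos = indexOrFail("?>", pos) + 2;
    } else if (at("</")) {
      pos += 2;
      const name = readName();
      skipSpaces();
      if (src.charAt(pos) !== ">") fail("Expected '>'");
      pos++;
      if (stack.length < 2 || top.name !== name) fail(`Unexpected </${name}>`);
      stack.pop();
    } else {
      pos++; // consume '<'
      const node: XmlNode = { name: readName(), attrs: {}, children: [] };
      top.children.push(node);
      for (;;) {
        skipSpaces();
        if (at("/>")) { pos += 2; break; } // self-closing: never pushed
        if (src.charAt(pos) === ">") { pos++; stack.push(node); break; }
        const key = readName();
        skipSpaces();
        if (src.charAt(pos) !== "=") fail("Expected '='");
        pos++;
        skipSpaces();
        const quote = src.charAt(pos);
        if (quote !== '"' && quote !== "'") fail("Expected a quoted value");
        const end = indexOrFail(quote, pos + 1);
        node.attrs[key] = src.slice(pos + 1, end);
        pos = end + 1;
      }
    }
  }
  if (stack.length > 1) fail(`Unclosed <${stack[stack.length - 1].name}>`);
  return root;
}
```

Whether this style is faster than the regex parser on a given browser is exactly the open question in the thread and would need benchmarking.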

Providing an alternative, non-cross-platform, JS-specific XML parser (in
js.FastXml or something similar) is also a solution, but not my favorite :)

Best,
Nicolas
