data.xml


Ryan Senior

Jan 5, 2012, 10:53:22 AM
to cloju...@googlegroups.com
With the recent movement in data.xml, I figured it would be a good time to discuss some changes I made a while back. I sent a message to Chouser asking how I could help get data.xml to a releasable state. We're using lazy-xml at Revelytix, and it's one of the things keeping us on Clojure 1.2.1. The key things he mentioned were CDATA and comment support, and possibly moving to the StAX parser included in JDK 1.6 (data.xml currently uses Apache Axiom). I'm waiting on Chouser to get repo access, but I'm hoping to push my changes to a branch soon (my work in progress is here: https://github.com/senior/data.xml/tree/jdk16-pull-parser). Below are some of the changes I've made so far:

* Switched from Axiom to the JDK-included StAX parser

This has the benefit of data.xml not needing any additional dependencies if you're on JDK 1.6+. If you're on JDK 1.5, you'll need to include the StAX JAR (http://stax.codehaus.org/).
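
For anyone who hasn't used the JDK's built-in StAX API, here is a minimal Java sketch of pull parsing with javax.xml.stream. It is not code from data.xml; the class and method names (StaxPullSketch, events) are made up for illustration. The caller asks for the next event when it wants one, which is what makes the model a natural fit for a lazy sequence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of data.xml.
final class StaxPullSketch {

    /** Pulls events one at a time and records a short description of each. */
    static List<String> events(String xml) {
        List<String> out = new ArrayList<String>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent text chunks so each text run arrives as one event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader =
                factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // the caller drives the parser
                if (event == XMLStreamConstants.START_ELEMENT) {
                    out.add("start:" + reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    out.add("end:" + reader.getLocalName());
                } else if (event == XMLStreamConstants.CHARACTERS
                        && !reader.isWhiteSpace()) {
                    out.add("text:" + reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }
}
```

Calling StaxPullSketch.events("<a><b>hi</b><c/></a>") walks the document in order; a lazy wrapper would call reader.next() only when the consumer asks for another element.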

* SAX based parsing has been removed

With the JDK supporting StAX directly, I couldn't see any reason to keep the SAX parsing around. SAX is a push (callback) model, which doesn't fit well with lazy parsing, so the code was complex. Everything I have read indicates there is no significant performance advantage of SAX over StAX. Keeping SAX also adds build-time and dependency complexity.

* Collapsed multi-module project

Since the SAX parsing was gone, there was no need for the project to produce multiple artifacts.  I have collapsed the modules into a single project/artifact.

* Emitting XML with StAX 

I found using the XML stream writer boosted performance significantly over the XML transformers that were being used.  IIRC it was something like 20% faster in my tests (see below for drawbacks).
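
As a rough illustration of the stream-writer approach, here is a minimal Java sketch using javax.xml.stream.XMLStreamWriter. It is not the data.xml emitter; the class name StaxEmitSketch and the emit signature are invented for the example, and it makes no claim about the performance numbers above.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration class, not part of data.xml.
final class StaxEmitSketch {

    /** Writes a single element with one attribute and a text body. */
    static String emit(String tag, String attrName, String attrValue, String text) {
        StringWriter sink = new StringWriter();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
            w.writeStartDocument();            // XML declaration
            w.writeStartElement(tag);
            w.writeAttribute(attrName, attrValue);
            w.writeCharacters(text);           // markup characters are escaped
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }
}
```

The writer serializes directly as each call is made, without building an intermediate tree, which is presumably where the speedup over the transformer path comes from.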

* Comments and CDATA

I have initial support for these. I need to review this again to make sure I have all the pieces right.
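
To show which StAX calls are involved, here is a small Java round-trip sketch: it writes a comment and a CDATA section with XMLStreamWriter, then reads them back with XMLStreamReader. This is an assumption-laden illustration, not the data.xml implementation; the names CommentCdataSketch, write, and read are hypothetical. Since a parser may report CDATA either as a CDATA event or as plain CHARACTERS depending on its configuration, the reader accepts both.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration class, not part of data.xml.
final class CommentCdataSketch {

    /** Emits <doc> containing one comment and one CDATA section. */
    static String write(String comment, String cdata) {
        StringWriter sink = new StringWriter();
        try {
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(sink);
            w.writeStartElement("doc");
            w.writeComment(comment);
            w.writeCData(cdata);   // written verbatim, no escaping
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return sink.toString();
    }

    /** Returns {comment text, concatenated character data} from the input. */
    static String[] read(String xml) {
        StringBuilder comment = new StringBuilder();
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.COMMENT) {
                    comment.append(r.getText());
                } else if (event == XMLStreamConstants.CDATA
                        || event == XMLStreamConstants.CHARACTERS) {
                    // CDATA may surface as either event type.
                    text.append(r.getText());
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return new String[] { comment.toString(), text.toString() };
    }
}
```

Preserving the distinction between CDATA and ordinary text on the way in is the part that depends on parser settings, so that is likely where the review effort would go.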

* Indentation 

The current version of data.xml supports indentation through XML transformers, but the XML stream writer does not. I'm planning to add indentation as a separate feature to keep the default emitting path fast. It would be a transformer-based approach like before.

* Omit XML declaration

Currently supported with the transformers, but not (as far as I can tell) by the XML stream writer. I'm not planning to implement this.
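
For reference, the two transformer-side options mentioned in the last two items (indentation and omitting the declaration) correspond to standard javax.xml.transform output properties. The Java sketch below runs already-serialized XML through an identity transformer as a separate pass; the class name TransformerOptionsSketch and the reserialize method are hypothetical, and this is only one way such a post-processing step could look.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration class, not part of data.xml.
final class TransformerOptionsSketch {

    /** Re-serializes XML text with the requested output options. */
    static String reserialize(String xml, boolean indent, boolean omitDeclaration) {
        try {
            // newTransformer() with no stylesheet is an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                                omitDeclaration ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The exact whitespace produced with indentation enabled can vary between JDK versions, so it is safer not to depend on a particular layout.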

-Ryan


Chouser

Jan 10, 2012, 9:08:35 PM
to cloju...@googlegroups.com

This is fantastic, Ryan, thanks for all your work on it. I think any
minor feature regression is well worth the simplified build.

--Chouser
