Woodstox-core


Klacee Sawatzky

Aug 5, 2024, 2:30:50 AM
to fugamilto
I have a Java Spring Boot application A that depends on B, a third-party JAR. B in turn depends on C. When C needs to be upgraded (say from v1.0 to v2.0), a common approach is to use Maven's exclusion feature in A's pom.xml to exclude C from B, and then either declare C v2.0 as a direct dependency or add C v2.0 to the dependencyManagement section.

This approach is not guaranteed to work in every situation. One example: org.glassfish.metro:webservices-rt:2.4.3 depends on woodstox-core:5.1.0, which has high-severity security vulnerabilities and needs to be upgraded to 6.4.0.


My project A has a direct dependency on webservices-rt:2.4.3. Applying the approach above doesn't remove woodstox-core:5.1.0 from my project. Note: the Maven dependency tree no longer shows woodstox-core:5.1.0, but Aqua Scan still reports that webservices-rt depends on woodstox-core:5.1.0.
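One way to narrow this down is to ask the running JVM where the Woodstox classes actually come from. The sketch below is only a diagnostic, not a fix. It assumes (and this is an assumption, not something stated above) that the scanner may be reacting to Woodstox classes packaged inside another JAR. The class name ClassOrigin and the method locate are hypothetical. com.ctc.wstx.stax.WstxInputFactory is Woodstox's StAX input factory class.

```java
import java.security.CodeSource;

final class ClassOrigin {

    /** Returns the location a class was loaded from, or a short note if it cannot be determined. */
    static String locate(String className) {
        try {
            // initialize=false: we only want to find the class, not run its static initializers
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "no code source (JDK/bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        // Prints the JAR (or directory) that provides Woodstox's input factory, if any
        System.out.println(locate("com.ctc.wstx.stax.WstxInputFactory"));
    }
}
```

If the printed location is the webservices-rt JAR rather than a woodstox-core JAR, a Maven exclusion alone would not be expected to remove those classes.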


The Woodstox artifact currently in use was renamed after 4.x and continues as woodstox-core. The update matters because it lets axis2 use stax2-api 4.x (currently 3.x), which axis2 needs in order to be compatible with some important libraries such as the newer Jackson versions.


Woodstox 3.2.x is no longer maintained. Starting with version 1.2.14, Axiom depends on Woodstox 4.1.x, although using 3.2.x (and 4.0.x) is still supported. This may have an impact on projects that use Maven, because the artifact ID used by Woodstox changed from wstx-asl to woodstox-core-asl. These projects may need to update their dependencies to avoid depending on two different versions of Woodstox.


In contrast to previous versions, the OMFactory implementations for DOOM are stateless in Axiom 1.2.14. This makes it easier to write application code that is portable between LLOM and DOOM (in the sense that code that is known to work with LLOM will usually work with DOOM without changes). However, this slightly changes the behavior of DOOM with respect to owner documents, which means that in some cases existing code written for DOOM may trigger WRONG_DOCUMENT_ERR exceptions if it uses the DOM API on a tree created or manipulated using the Axiom API.


I have written an XML parser in Java using Woodstox. The parser will be working through extremely large files, possibly 5+ GB. Its goal is to convert a nested XML file into a CSV. The XML file is formatted so that there is a 'rowTag' element that holds the actual information the parser is interested in. Take for example XML file:


My goal is to make this faster and/or consume less memory. Any other advice is appreciated, of course. I am running it with the JVM options -Xms4g -Xmx4g. Right now, it takes around 25 seconds to run on an XML file of approximately 1.5 GB.
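For reference, a minimal sketch of the kind of streaming rowTag-to-CSV conversion described here, written against the standard javax.xml.stream API so that it runs on whatever StAX implementation is on the classpath. It is not the poster's code. The class name RowTagToCsv, the row tag "row", and the sample input are hypothetical. It also assumes each child of the row element holds plain text only (getElementText() throws if it meets a nested element).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RowTagToCsv {

    /** Streams the XML and writes one CSV line per rowTag element. */
    static void convert(Reader in, Writer out, String rowTag)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        StringBuilder line = new StringBuilder(); // reused for every row
        boolean inRow = false;
        int fields = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (!inRow) {
                        if (rowTag.equals(reader.getLocalName())) {
                            inRow = true;
                            line.setLength(0);
                            fields = 0;
                        }
                    } else {
                        // Each child of the row becomes one CSV field
                        if (fields++ > 0) line.append(',');
                        appendEscaped(line, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && inRow
                        && rowTag.equals(reader.getLocalName())) {
                    out.append(line).append('\n');
                    inRow = false;
                }
            }
        } finally {
            reader.close();
        }
    }

    /** Quotes a field only if it contains a comma, quote, or line break. */
    private static void appendEscaped(StringBuilder sb, String v) {
        boolean quote = v.indexOf(',') >= 0 || v.indexOf('"') >= 0
                || v.indexOf('\n') >= 0 || v.indexOf('\r') >= 0;
        if (!quote) {
            sb.append(v);
            return;
        }
        sb.append('"');
        for (int i = 0; i < v.length(); i++) {
            char c = v.charAt(i);
            if (c == '"') sb.append('"'); // double embedded quotes
            sb.append(c);
        }
        sb.append('"');
    }

    public static void main(String[] args) throws Exception {
        String xml = "<data><row><id>1</id><name>a,b</name></row>"
                + "<row><id>2</id><name>c</name></row></data>";
        StringWriter csv = new StringWriter();
        convert(new StringReader(xml), csv, "row");
        System.out.print(csv);
    }
}
```

For multi-gigabyte input, the Reader and Writer would normally be buffered file streams. Only one row is held in memory at a time, so heap use should not grow with file size.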


First of all, there's a newer version of Woodstox on Maven Central.

Gradle dependency: implementation 'com.fasterxml.woodstox:woodstox-core:6.0.3'

They now have XMLInputFactory2 with a .configureForSpeed() option. I didn't really check what it does, but in my test it didn't make much difference.


The core thing for XML parsing speed (apart from just good I/O code) is to not allocate unnecessary garbage: the parser should not be allocating Strings all the time just to do a comparison or to give you an array of a tag's attributes. Javolution avoids this by using an internal sliding buffer and referencing it through a type called CharArray, which behaves like a java.lang.CharSequence. It's important to use CharArray#contentEquals() when comparing to Strings to avoid extra String creation.
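The same idea can be approximated with the standard StAX API, without Javolution, by comparing against the parser's character buffer instead of calling getText(). The sketch below is a substitute illustration, not Javolution's CharArray. The names NoCopyTextMatch, sliceEquals, and countMatches are hypothetical. Whether getTextCharacters() really avoids a copy depends on the StAX implementation and its settings.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class NoCopyTextMatch {

    /** Compares a slice of a char buffer with 'expected' without building a String. */
    static boolean sliceEquals(char[] buf, int start, int len, String expected) {
        if (len != expected.length()) return false;
        for (int i = 0; i < len; i++) {
            if (buf[start + i] != expected.charAt(i)) return false;
        }
        return true;
    }

    /** Counts text nodes whose content equals 'expected'. */
    static int countMatches(Reader in, String expected) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Coalescing delivers each run of text as a single CHARACTERS event
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        int matches = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS
                        && sliceEquals(reader.getTextCharacters(), reader.getTextStart(),
                                       reader.getTextLength(), expected)) {
                    matches++;
                }
            }
        } finally {
            reader.close();
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<r><v>yes</v><v>no</v><v>yes</v></r>";
        System.out.println(countMatches(new StringReader(xml), "yes"));
    }
}
```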


It implements the StAX API (as well as the Stax2 extension and SAX), and as long as you do not need full DTD handling (which I suspect you don't), it has the feature set you need. For common read use cases I think it can be 30-40% faster; but most importantly, it should be very easy to just try out.


And the XMLInputFactory implementation is com.fasterxml.aalto.stax.InputFactoryImpl. I would recommend creating an instance directly, instead of using XMLInputFactory.newInstance(), so you can be sure of the exact implementation you have (if you have multiple StAX implementations on the classpath, the choice is arbitrary).
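To see which implementation the lookup mechanism picks in a given environment, the resolved factory class can simply be printed. This is a small sketch using only the JDK. The class name WhichStaxFactory and the method resolvedFactoryClass are hypothetical.

```java
import javax.xml.stream.XMLInputFactory;

final class WhichStaxFactory {

    /** Returns the class name of the factory that XMLInputFactory.newInstance() resolves to. */
    static String resolvedFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The result depends on which StAX implementations are on the classpath
        System.out.println(resolvedFactoryClass());
    }
}
```

If the printed name is not the implementation you expected, that is the situation direct instantiation is meant to avoid.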
