DOM benchmarks

admin


Nov 16, 2020, 2:14:16 PM

to css4j

Yesterday I ran a few benchmarks probing DOM performance, and they are now available in the benchmarks repository on GitHub. Four DOM implementations are tested: css4j-dom4j, plain DOM4J, Css4j's native DOM and, finally, the DOM implementation that comes with the JDK (a Xerces DOM fork).

Currently, the tests only measure how fast a document is parsed into each DOM implementation. One test uses the validator.nu HTML5 parser to parse a small (38 KB) HTML file, while the other two parse XML files with the SAX parser bundled with the JDK: one parses a small (38 KB) XHTML file (not the same one as in the HTML test), and the other parses a larger (1 MB) XML file.

I ran another test (not shown here) that parses the small XHTML file with the validator.nu HTML5 parser, and found that the HTML parser causes a slowdown of nearly 40% compared to parsing the same file with the XML parser.

The tests were run on 4 CPU cores.
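For readers unfamiliar with what a single "build" operation means, here is a rough sketch of one for the JDK contender. It assumes the JDK's DocumentBuilder API is used to produce the DOM (the actual benchmark code lives in the benchmarks repository and may do it differently), and the tiny XHTML string is a hypothetical stand-in for the real 38 KB test file. The timing loop is deliberately naive: single-threaded, with no warm-up control, forks or statistics, which is exactly what JMH provides in the real measurements, so its numbers are not comparable with the tables below.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class DomBuildSketch {

    // Hypothetical stand-in for the 38 KB XHTML test file.
    static final String XHTML = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Test</title></head>"
            + "<body><p class=\"a\">Hello</p><p>World</p></body></html>";

    // Namespace-aware builder, so elements get namespace URIs and local names.
    static DocumentBuilder newBuilder() {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try {
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // One "build" operation: parse the bytes into a fresh JDK DOM document.
    static Document build(DocumentBuilder builder, byte[] data) {
        try {
            return builder.parse(new ByteArrayInputStream(data));
        } catch (SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        DocumentBuilder builder = newBuilder();
        byte[] data = XHTML.getBytes(StandardCharsets.UTF_8);

        // Naive throughput loop (see the caveats above).
        int ops = 2_000;
        long start = System.nanoTime();
        Document doc = null;
        for (int i = 0; i < ops; i++) {
            doc = build(builder, data);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("root: " + doc.getDocumentElement().getLocalName());
        System.out.printf("~%.0f ops/s (rough, single thread)%n", ops / seconds);
    }
}
```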

HTML build

A small HTML file (the Css4j usage guide) is parsed with the HTML5 parser. Results (higher is better):

Implementation   Mode   Cnt   Score     Error      Units
Css4j-DOM4J      thrpt  32    321.760   ± 7.279    ops/s
Css4j DOM        thrpt  32    309.080   ± 2.576    ops/s
JDK              thrpt  32    359.725   ± 11.114   ops/s

Stand-alone DOM4J could not be tested, as it is not DOM-compliant enough to be used with the HTML parser.

XML build (38 KB file)

Results (higher is better):

Implementation   Mode   Cnt   Score     Error      Units
Css4j-DOM4J      thrpt  32    612.553   ± 1.988    ops/s
Css4j DOM        thrpt  32    505.988   ± 5.809    ops/s
DOM4J            thrpt  32    672.043   ± 2.660    ops/s
JDK              thrpt  32    696.178   ± 2.391    ops/s

XML build (1 MB file)

Results (higher is better):

Implementation   Mode   Cnt   Score     Error      Units
Css4j-DOM4J      thrpt  32    88.693    ± 3.131    ops/s
Css4j DOM        thrpt  32    64.077    ± 1.827    ops/s
DOM4J            thrpt  32    114.941   ± 0.839    ops/s
JDK              thrpt  32    136.875   ± 1.183    ops/s

Profiling

I did some profiling to identify performance bottlenecks, and the results were interesting. JMH allows some basic profiling with command lines like:

java -jar build/benchmarks.jar XMLBuildBenchmark -prof "stack:lines=5;top=3;detailLine=true;period=1"

and something stands out whenever DOM4J is involved (either stand-alone or in css4j-dom4j):

Secondary result "io.sf.carte.doc.style.css.mark.XMLBuildBenchmark.markBuildDOM4J: stack":
Stack profiler:

....[Thread state distributions]....................................................................
 72,3%         BLOCKED
 27,6%         RUNNABLE

....[Thread state: BLOCKED].........................................................................
 72,3% 100,0% java.util.Collections$SynchronizedMap.get
              org.dom4j.tree.QNameCache.get
              org.dom4j.DocumentFactory.createQName
              org.dom4j.tree.NamespaceStack.createQName
              org.dom4j.tree.NamespaceStack.pushQName

Yes, DOM4J has contention problems, and its performance on many-core systems is poor, as explained in DOM4J's issue #114, which reports a 6x improvement on a 64-core machine when the original QNameCache is replaced with a non-synchronized version. None of the other contenders shows a similar BLOCKED state in the current benchmarks.
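To illustrate why a cache wrapped in Collections.synchronizedMap shows up as BLOCKED time, here is a minimal sketch contrasting it with a ConcurrentHashMap-backed cache. This is not DOM4J's code nor the change discussed in issue #114: the QName class and the "{uri}local" key are hypothetical, and ConcurrentHashMap is simply one common lock-free-read alternative. With the synchronized wrapper every get() acquires the same monitor, so parser threads queue behind each other even though they are only reading.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class QNameCacheSketch {

    /** Hypothetical qualified name (not DOM4J's org.dom4j.QName). */
    static final class QName {
        final String namespaceURI;
        final String localName;

        QName(String namespaceURI, String localName) {
            this.namespaceURI = namespaceURI;
            this.localName = localName;
        }
    }

    // Every call on this map locks one shared monitor: readers block each other.
    static final Map<String, QName> synchronizedCache =
            Collections.synchronizedMap(new HashMap<>());

    // ConcurrentHashMap.get() takes no lock, so concurrent readers do not block.
    static final Map<String, QName> concurrentCache = new ConcurrentHashMap<>();

    static final String[] NAMES = {"html", "head", "body", "div", "p", "span"};

    static QName lookup(Map<String, QName> cache, String uri, String local) {
        String key = "{" + uri + "}" + local; // hypothetical Clark-style key
        QName name = cache.get(key);           // fast path for names already seen
        if (name == null) {
            // Both map types run computeIfAbsent atomically: one QName per key.
            name = cache.computeIfAbsent(key, k -> new QName(uri, local));
        }
        return name;
    }

    // Runs `threads` threads doing `lookups` lookups each; returns elapsed ms.
    static long hammer(Map<String, QName> cache, int threads, int lookups)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < lookups; i++) {
                    lookup(cache, "http://www.w3.org/1999/xhtml", NAMES[i % NAMES.length]);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        int lookups = 1_000_000;
        System.out.println("synchronizedMap:   " + hammer(synchronizedCache, threads, lookups) + " ms");
        System.out.println("ConcurrentHashMap: " + hammer(concurrentCache, threads, lookups) + " ms");
    }
}
```

The timings printed by main() are crude (no JIT warm-up) and depend heavily on the core count; as issue #114 suggests, the gap is expected to widen as more cores contend for the lock.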

The performance of Css4j's native DOM is disappointing, though, and the profiling reveals one cause:

....[Thread state: RUNNABLE]........................................................................
 72,3%  72,5% java.lang.String.intern
              io.sf.carte.doc.dom.DOMDocument.createElementNS
              io.sf.carte.doc.dom.XMLDocumentBuilder$MyContentHandler.startElement
              com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement

 26,2%  26,3% java.lang.String.intern
              io.sf.carte.doc.dom.DOMDocument.createAttributeNS
              io.sf.carte.doc.dom.XMLDocumentBuilder$MyContentHandler.setAttributes
              io.sf.carte.doc.dom.XMLDocumentBuilder$MyContentHandler.startElement
              com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement

That's because the native DOM interns the local names of elements and attributes to use less memory (and be a bit faster in some operations).
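A small illustration of what interning buys and costs, using only java.lang.String (the canonicalName helper is hypothetical, not css4j code). Names built from the parser's buffer are distinct objects even when equal; intern() returns one pooled copy per distinct name, which saves memory and lets equal names be compared with ==. The price is a lookup in the JVM-wide string table on every call, i.e. once per element and attribute parsed, which is consistent with String.intern dominating the RUNNABLE profile.

```java
public class InternSketch {

    // Hypothetical helper: builds a name from a parser buffer and returns the
    // pooled copy, as a DOM that interns its local names would store it.
    static String canonicalName(char[] buf, int start, int len) {
        return new String(buf, start, len).intern();
    }

    public static void main(String[] args) {
        char[] buf = "<div><div>".toCharArray();
        String a = new String(buf, 1, 3); // "div" from the first tag
        String b = new String(buf, 6, 3); // "div" from the second tag
        System.out.println(a.equals(b));  // true: same characters
        System.out.println(a == b);       // false: two separate objects

        String ia = canonicalName(buf, 1, 3);
        String ib = canonicalName(buf, 6, 3);
        System.out.println(ia == ib);     // true: one shared, pooled object
        System.out.println(ia == "div");  // true: string literals are pooled too
    }
}
```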

I prepared a fe-performance branch that no longer interns strings (except namespace URIs), and the speed improved by about 4%. I plan further improvements to that experimental branch: the native DOM has more features than the other implementations and can be allowed to be a bit slower, but probably not by this much.
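As a hedged sketch of the "namespace URIs only" trade-off, the hypothetical ElementName class below (not css4j's actual code) interns the namespace URI, since a document typically uses very few distinct URIs, while keeping local names exactly as the parser delivered them. Namespace comparison can then stay an identity check, whereas local names must be compared by content.

```java
/**
 * Hypothetical sketch, not css4j's code: namespace URIs are interned,
 * local names are stored as delivered by the parser.
 */
public final class ElementName {
    final String namespaceURI; // interned, or null for no namespace
    final String localName;    // not interned

    ElementName(String namespaceURI, String localName) {
        this.namespaceURI = namespaceURI == null ? null : namespaceURI.intern();
        this.localName = localName;
    }

    boolean sameName(ElementName other) {
        // Both URIs went through intern() in the constructor, so identity suffices...
        return namespaceURI == other.namespaceURI
                // ...but local names were not interned and are compared by content.
                && localName.equals(other.localName);
    }

    public static void main(String[] args) {
        String xhtml = "http://www.w3.org/1999/xhtml";
        ElementName a = new ElementName(new String(xhtml), new String("p"));
        ElementName b = new ElementName(xhtml, "p");
        System.out.println(a.sameName(b)); // true
    }
}
```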

admin


Nov 21, 2020, 12:03:50 PM


I merged the performance improvements into the native DOM and prepared an updated set of benchmarks, then created a new "Benchmarks" section on the website:

https://css4j.github.io/benchmarks.html

You can find the new results there, as well as updated SAC benchmarks comparing the SAC implementation in css4j 1.x with other SAC parsers. For a glimpse, here is the new XML build benchmark graphic:

It looks a bit better for the native DOM than the previous results (shown in the initial post of this thread). The chart above, like all the DOM benchmark charts on the website, also shows that the JDK's DOM is now slower; this could be caused by the different JDK used in the tests: Oracle JDK 8 was used for the benchmarks in the initial post, while AdoptOpenJDK 15 was used for the chart above (and for the website's charts).

A glimpse of one of the SAC benchmarks:

Yes, Batik's simple SAC parser is 45 times faster than the SteadyState CSSParser, and only 1.7 times as fast as css4j's parser. With a larger CSS file, the performance gap with the SS CSSParser grows.
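Combining the two quoted ratios gives the implied gap between css4j's parser and the SteadyState CSSParser in that benchmark. This is only arithmetic on the figures stated above (with T denoting throughput in ops/s), not a separate measurement:

```latex
\frac{T_{\text{css4j}}}{T_{\text{SS}}}
  = \frac{T_{\text{Batik}} / T_{\text{SS}}}{T_{\text{Batik}} / T_{\text{css4j}}}
  = \frac{45}{1.7} \approx 26
```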

admin


Nov 24, 2020, 8:59:48 AM


More results have been added to the DOM benchmarks page, this time about DOM traversal and modification. The JDK DOM is the absolute performance winner.
