DOM benchmarks


admin

Nov 16, 2020, 2:14:16 PM
to css4j
Yesterday I ran a few benchmarks probing DOM performance, and they are now available in the benchmarks repository on GitHub. Several DOM implementations are tested: css4j-dom4j, plain DOM4J, css4j's native DOM, and the DOM implementation that comes with the JDK (a Xerces DOM fork).

Currently, the tests only measure the speed at which a document is parsed into each DOM implementation. One test uses the validator.nu HTML5 parser to parse a small (38 KB) HTML file. The other two parse XML files with the SAX parser that comes bundled with the JDK: one parses a small (38 KB) XHTML file (not the same one as in the HTML test), and the other parses a larger (1 MB) XML file.

I ran another test (not shown here) that parses the small XHTML file with the validator.nu HTML5 parser, and found that the HTML parser is nearly 40% slower than the XML parser on the same file.

The tests are run on 4 CPU cores.
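To make the kind of measurement described above concrete, here is a minimal sketch of a "parse into a DOM" throughput test for the JDK's DOM only. It is not the code from the benchmarks repository: the class name, the tiny sample document, and the hand-rolled timing loop are all assumptions made for illustration, and it uses the JDK's DocumentBuilder, which may differ from how the real benchmarks drive the parser. The real numbers come from JMH, which adds warm-up, forking and multi-threaded runs that this loop lacks.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomBuildSketch {
    // Tiny made-up XHTML sample; the real tests parse 38 KB and 1 MB files.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<p id=\"a\">Hello</p></body></html>";

    // Parse the text into the JDK's own DOM implementation.
    static Document build(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Crude throughput estimate in operations per second (one thread, no warm-up).
    static double opsPerSecond(String xml, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            build(xml);
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        return iterations * 1e9 / elapsed;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f ops/s%n", opsPerSecond(SAMPLE, 1000));
    }
}
```

The figure printed by a loop like this is only a rough indication and is not comparable with the JMH scores in the tables.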


HTML build
A small HTML file (the css4j usage guide) is parsed with the HTML5 parser. Results (higher is better):

Implementation    Mode   Cnt    Score   Error   Units
Css4j-DOM4J       thrpt   32  321,760 ±  7,279  ops/s
Css4j DOM         thrpt   32  309,080 ±  2,576  ops/s
JDK               thrpt   32  359,725 ± 11,114  ops/s
HTMLBuildBenchmark.png

Stand-alone DOM4J could not be tested because it is not DOM-compliant enough to be used with the HTML parser.

XML build (38 KB file)
Results (higher is better):
Implementation    Mode   Cnt    Score   Error  Units
Css4j-DOM4J       thrpt   32  612,553 ± 1,988  ops/s
Css4j DOM         thrpt   32  505,988 ± 5,809  ops/s
DOM4J             thrpt   32  672,043 ± 2,660  ops/s
JDK               thrpt   32  696,178 ± 2,391  ops/s
XMLBuildBenchmark38K.png

XML build (1 MB file)
Results (higher is better):
Implementation    Mode   Cnt    Score   Error  Units
Css4j-DOM4J       thrpt   32   88,693 ± 3,131  ops/s
Css4j DOM         thrpt   32   64,077 ± 1,827  ops/s
DOM4J             thrpt   32  114,941 ± 0,839  ops/s
JDK               thrpt   32  136,875 ± 1,183  ops/s
XMLBuildBenchmark1M.png

Profiling
I did some profiling to identify performance bottlenecks, and the results were interesting. JMH allows some basic profiling with command lines like:

 java -jar build/benchmarks.jar XMLBuildBenchmark -prof "stack:lines=5;top=3;detailLine=true;period=1"

and something stands out whenever DOM4J is involved (either stand-alone or in css4j-dom4j):

Secondary result "io.sf.carte.doc.style.css.mark.XMLBuildBenchmark.markBuildDOM4J: stack":
Stack profiler:

....[Thread state distributions]....................................................................
 72,3%         BLOCKED
 27,6%         RUNNABLE

....[Thread state: BLOCKED].........................................................................
 72,3% 100,0% java.util.Collections$SynchronizedMap.get
              org.dom4j.tree.QNameCache.get
              org.dom4j.DocumentFactory.createQName
              org.dom4j.tree.NamespaceStack.createQName
              org.dom4j.tree.NamespaceStack.pushQName

Yes, DOM4J has a contention problem. Its performance on many-core systems is poor, as explained in DOM4J's issue #114, which claims a 6x improvement on a 64-core machine when replacing the original QNameCache with a non-synchronized version. None of the other contenders shows a similar BLOCKED state in the current benchmarks.
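The effect seen in the profile can be illustrated with a small sketch. This is not DOM4J's code nor the patch from issue #114: the class, the Name type and both lookup methods are hypothetical, and swapping in a ConcurrentHashMap is just one well-known way to avoid a single shared lock. With Collections.synchronizedMap, every get() from every parser thread takes the same monitor, which is where threads end up BLOCKED.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NameCacheSketch {
    /** Hypothetical stand-in for a qualified name; not DOM4J's QName class. */
    static final class Name {
        final String value;

        Name(String value) {
            this.value = value;
        }
    }

    // Every get() and put() acquires the same monitor, so threads that
    // parse in parallel queue up here even when they only read.
    static final Map<String, Name> synchronizedCache =
        Collections.synchronizedMap(new HashMap<>());

    // A concurrent map has no single monitor shared by all operations.
    static final Map<String, Name> concurrentCache = new ConcurrentHashMap<>();

    static Name lookupSynchronized(String key) {
        Name name = synchronizedCache.get(key);
        if (name == null) {
            // Check-then-act: two threads may both create a Name for a new key.
            name = new Name(key);
            synchronizedCache.put(key, name);
        }
        return name;
    }

    static Name lookupConcurrent(String key) {
        // Creates the Name at most once per key, atomically.
        return concurrentCache.computeIfAbsent(key, Name::new);
    }

    public static void main(String[] args) {
        System.out.println(lookupConcurrent("p") == lookupConcurrent("p"));
        System.out.println(lookupSynchronized("p").value);
    }
}
```

Both variants return the cached object on a hit; the difference only shows up as lock contention when several threads parse at once, as in the 4-core runs above.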

The performance of css4j's native DOM is disappointing, though, and the profiling shows one cause:

....[Thread state: RUNNABLE]........................................................................
 72,3%  72,5% java.lang.String.intern
              io.sf.carte.doc.dom.DOMDocument.createElementNS
              io.sf.carte.doc.dom.DOMDocument.createElementNS
              io.sf.carte.doc.dom.XMLDocumentBuilder$MyContentHandler.startElement
              com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement

 26,2%  26,3% java.lang.String.intern
              io.sf.carte.doc.dom.DOMDocument.createAttributeNS
              io.sf.carte.doc.dom.XMLDocumentBuilder$MyContentHandler.setAttributes
              io.sf.carte.doc.dom.XMLDocumentBuilder$MyContentHandler.startElement
              com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement

That's because the native DOM interns the local names of elements and attributes to use less memory (and be a bit faster in some operations).
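For readers unfamiliar with the trade-off, here is a minimal sketch of what interning buys and what it costs. It is not code from css4j; the class and method names are made up for illustration. Two equal strings created separately are distinct objects, while their interned forms are the same object, so names can share storage and be compared by reference. The price is a lookup in the JVM-wide string table on every intern() call, which is what dominates the profile above.

```java
class InternSketch {
    // Returns a distinct String object with the same characters,
    // as a parser would produce for each element it encounters.
    static String freshCopy(String s) {
        return new String(s);
    }

    public static void main(String[] args) {
        String a = freshCopy("div");
        String b = freshCopy("div");

        // Two separate objects: a reference comparison fails.
        System.out.println(a == b); // prints "false"

        // intern() returns one canonical instance per distinct content,
        // so a cheap reference comparison is enough afterwards.
        System.out.println(a.intern() == b.intern()); // prints "true"
    }
}
```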

I prepared a branch named fe-performance that no longer interns local names (only namespace URIs are still interned), and the speed improved by about 4%. I plan more improvements to that experimental branch: the native DOM has more features than the other implementations, so it can be allowed to be a bit slower, but probably not by this much.

admin

Nov 21, 2020, 12:03:50 PM
to css4j
I merged the performance improvements into the native DOM and prepared an updated set of benchmarks, then created a new "Benchmarks" section on the website:


You can find the new results there, as well as updated SAC benchmarks comparing the SAC implementation in css4j 1.x with other SAC parsers. For a glimpse, here is the new XML build benchmark graphic:

xml-build.png

which looks a bit better (for the native DOM) than the previous one (shown in the initial post of this thread). The above chart, like all the DOM benchmark charts on the website, also shows that the JDK's DOM is now slower. That could be caused by the different JDK used in the tests: Oracle JDK 8 was used for the benchmarks shown in the initial post, while AdoptOpenJDK 15 was used for the above chart and for the website's charts.

A glimpse of one of the SAC benchmarks:

sac-small.png

Yes, Batik's simple SAC parser is 45 times faster than the SteadyState CSSParser, but only 1.7 times as fast as css4j's parser. If a larger CSS file is used, the performance gap with the SS CSSParser grows.

admin

Nov 24, 2020, 8:59:48 AM
to css4j
More results have been added to the DOM benchmarks page, this time about DOM traversal and modification. The JDK DOM is the absolute performance winner.