A simple set of benchmarks has been created that can be used to detect performance regressions in CSS4J, using the JMH tool. So far there is only a single file with six benchmarks, which can be found in the new css4j-mark git repository. Those six benchmarks are in fact two tests that are run with three different SAC parsers:
- The first three benchmarks (markParseCSSStyleSheet, markParseCSSStyleSheetBatik and markParseCSSStyleSheetSSParser) measure the process of reading a style sheet and building a CSS Object Model representation of it; their performance depends on both the speed of this library's object model implementation and the speed of the SAC parser.
- The last three (markSACParseCSSStyleSheet, markSACParseCSSStyleSheetBatik and markSACParseCSSStyleSheetSSParser) essentially test the raw speed of the SAC parsers (CSS4J's, Batik's and SS Cssparser's) with an empty CSS handler, and give a glimpse of their relative performance.
The style sheet used in the process is the HTML5 default sheet (with nearly 100 rules) that the library uses as its default user agent style sheet. Being a default sheet, it has a lot of type and attribute selectors but no ID or class conditions, so it is not exactly an 'average' sheet; still, it gives an idea of the performance differences among the parsers.
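For reference, here is a minimal sketch (not the actual css4j-mark source) of the shape the raw SAC-parse benchmarks take: a JMH throughput benchmark that feeds a style sheet to a SAC parser wired to a do-nothing DocumentHandler. The class name, the inlined sample sheet and the choice to instantiate the parser inside the benchmark method are illustrative assumptions; only the SAC, JMH and Cssparser APIs shown are real.

import java.io.StringReader;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.w3c.css.sac.CSSException;
import org.w3c.css.sac.DocumentHandler;
import org.w3c.css.sac.InputSource;
import org.w3c.css.sac.LexicalUnit;
import org.w3c.css.sac.Parser;
import org.w3c.css.sac.SACMediaList;
import org.w3c.css.sac.SelectorList;

public class SACParseBenchmark {

	// Hypothetical: the real benchmark would load the HTML5 default sheet instead.
	private static final String SHEET = "body { margin: 8px; } b, strong { font-weight: bold; }";

	@Benchmark
	@BenchmarkMode(Mode.Throughput)
	@OutputTimeUnit(TimeUnit.SECONDS)
	public void markSACParseCSSStyleSheetSSParser() throws Exception {
		Parser parser = new com.steadystate.css.parser.SACParserCSS3();
		parser.setDocumentHandler(new EmptyDocumentHandler());
		parser.parseStyleSheet(new InputSource(new StringReader(SHEET)));
	}

	// A DocumentHandler with empty callbacks, so only the parser itself is measured.
	static class EmptyDocumentHandler implements DocumentHandler {
		public void startDocument(InputSource source) throws CSSException {}
		public void endDocument(InputSource source) throws CSSException {}
		public void comment(String text) throws CSSException {}
		public void ignorableAtRule(String atRule) throws CSSException {}
		public void namespaceDeclaration(String prefix, String uri) throws CSSException {}
		public void importStyle(String uri, SACMediaList media, String defaultNamespaceURI) throws CSSException {}
		public void startMedia(SACMediaList media) throws CSSException {}
		public void endMedia(SACMediaList media) throws CSSException {}
		public void startPage(String name, String pseudoPage) throws CSSException {}
		public void endPage(String name, String pseudoPage) throws CSSException {}
		public void startFontFace() throws CSSException {}
		public void endFontFace() throws CSSException {}
		public void startSelector(SelectorList selectors) throws CSSException {}
		public void endSelector(SelectorList selectors) throws CSSException {}
		public void property(String name, LexicalUnit value, boolean important) throws CSSException {}
	}
}

The object-model benchmarks follow the same pattern, except that instead of an empty handler they let the library build its CSSOM representation of the parsed sheet.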
These are the results of two independent runs (same computer, different sessions) with the CSS4J 0.37 release (units are operations per second, so bigger is better):
# Run complete. Total time: 00:42:05

Benchmark                                       Mode  Cnt     Score    Error  Units
MyBenchmark.markParseCSSStyleSheet             thrpt  200   256,381 ±  2,498  ops/s
MyBenchmark.markParseCSSStyleSheetBatik        thrpt  200   454,101 ±  5,288  ops/s
MyBenchmark.markParseCSSStyleSheetSSParser     thrpt  200    67,151 ±  0,315  ops/s
MyBenchmark.markSACParseCSSStyleSheet          thrpt  200   384,928 ±  1,772  ops/s
MyBenchmark.markSACParseCSSStyleSheetBatik     thrpt  200   999,741 ±  5,836  ops/s
MyBenchmark.markSACParseCSSStyleSheetSSParser  thrpt  200    76,757 ±  0,230  ops/s

# Run complete. Total time: 00:42:04

Benchmark                                       Mode  Cnt     Score    Error  Units
MyBenchmark.markParseCSSStyleSheet             thrpt  200   255,803 ±  1,556  ops/s
MyBenchmark.markParseCSSStyleSheetBatik        thrpt  200   468,556 ±  4,698  ops/s
MyBenchmark.markParseCSSStyleSheetSSParser     thrpt  200    67,033 ±  0,218  ops/s
MyBenchmark.markSACParseCSSStyleSheet          thrpt  200   394,692 ±  1,768  ops/s
MyBenchmark.markSACParseCSSStyleSheetBatik     thrpt  200  1054,781 ± 10,514  ops/s
MyBenchmark.markSACParseCSSStyleSheetSSParser  thrpt  200    77,380 ±  0,310  ops/s
Given that the last three benchmarks measure the raw performance of the SAC parsers, the following can be observed:
- Batik's parser is by far the fastest, being between 13 and 14 times faster than the SS Cssparser, and 2.6 times faster than this library's own parser.
- The CSS4J native parser is in the middle, being slower than Batik's but five times faster than Cssparser.
The comparison is not totally fair, however, as Batik supports only a subset of the features that the other parsers have (and the CSS4J parser is the one that supports the most features, with full level 4 selectors and level 3 values including calc()). So in principle it is more advisable to use Cssparser, despite it being slow, than Batik's parser. However, if your use case only requires simple level 2 selectors and values, and performance is the main consideration (for example, when CSS is used to specify the user interface of a Java application), Batik's parser could be the most adequate.
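Switching between SAC implementations is straightforward because they all implement the same org.w3c.css.sac.Parser interface; the standard ParserFactory helper instantiates whatever class the org.w3c.css.sac.parser system property names. A small example of picking one parser or the other along the trade-off just described (the Batik and Cssparser class names are real, the rest is illustrative):

import org.w3c.css.sac.Parser;
import org.w3c.css.sac.helpers.ParserFactory;

public class ParserSelection {
	public static void main(String[] args) throws Exception {
		// Performance-oriented choice (CSS2-level features only):
		System.setProperty("org.w3c.css.sac.parser", "org.apache.batik.css.parser.Parser");
		Parser fast = new ParserFactory().makeParser();

		// Robustness/feature-oriented choice:
		System.setProperty("org.w3c.css.sac.parser", "com.steadystate.css.parser.SACParserCSS3");
		Parser robust = new ParserFactory().makeParser();

		System.out.println(fast.getClass().getName());
		System.out.println(robust.getClass().getName());
	}
}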
The above figures should not be a surprise: the Steadystate Cssparser is an automatically generated parser intended to be robust rather than fast, while Batik's is hand-written and focused on performance. The CSS4J parser sits in the middle: it is based on a CSS-agnostic micro-parser that sends micro-events (like words) to a handler that does know about CSS. This architecture allowed it to be written quickly, but it may not reach the execution speed of a fully hand-tailored parser like Batik's.
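To illustrate the idea (this is not css4j's actual code, just a toy sketch with hypothetical names), a micro-parser of this kind could look roughly like the following: the scanner knows nothing about CSS and merely forwards words and delimiter characters to a handler, and it would be the handler's job to recognize selectors, properties and values.

// Hypothetical handler receiving low-level "micro-events".
interface TokenHandler {
	void word(String word);          // e.g. "color", "red"
	void character(char delimiter);  // e.g. '{', ':', ';', '}'
	void endOfStream();
}

final class MicroParser {
	// Scans the input and forwards micro-events; it has no notion of rules or selectors.
	static void parse(String css, TokenHandler handler) {
		StringBuilder word = new StringBuilder();
		for (int i = 0; i < css.length(); i++) {
			char c = css.charAt(i);
			if (Character.isLetterOrDigit(c) || c == '-' || c == '#' || c == '.' || c == '%') {
				word.append(c);
			} else {
				if (word.length() != 0) {
					handler.word(word.toString());
					word.setLength(0);
				}
				if (!Character.isWhitespace(c)) {
					handler.character(c);
				}
			}
		}
		if (word.length() != 0) {
			handler.word(word.toString());
		}
		handler.endOfStream();
	}
}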
As mentioned, the benchmarks should be useful to detect performance regressions; they were run against today's git tree to test a memory-footprint patch that was applied today. These were the results:
# Run complete. Total time: 00:42:04

Benchmark                                       Mode  Cnt     Score    Error  Units
MyBenchmark.markParseCSSStyleSheet             thrpt  200   256,253 ±  1,786  ops/s
MyBenchmark.markParseCSSStyleSheetBatik        thrpt  200   467,440 ±  4,592  ops/s
MyBenchmark.markParseCSSStyleSheetSSParser     thrpt  200    68,024 ±  0,150  ops/s
MyBenchmark.markSACParseCSSStyleSheet          thrpt  200   390,197 ±  2,390  ops/s
MyBenchmark.markSACParseCSSStyleSheetBatik     thrpt  200  1030,599 ±  6,078  ops/s
MyBenchmark.markSACParseCSSStyleSheetSSParser  thrpt  200    75,472 ±  0,501  ops/s
So the patch has no measurable impact (if anything, the effect seems to be slightly positive). With tests like these, it should be easier to detect regressions.
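As a closing note, one way to automate such regression checks is to launch the benchmarks programmatically through the JMH Runner API, for instance from a job that compares the scores against a stored baseline; the include pattern matches the benchmark class shown above, while the fork count here is just an assumption.

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class RegressionCheck {
	public static void main(String[] args) throws RunnerException {
		Options opts = new OptionsBuilder()
				.include("MyBenchmark")  // run all six benchmarks
				.forks(2)
				.build();
		new Runner(opts).run();
	}
}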