PSA: Shipping new CSS parser

1,398 views
Skip to first unread message

Timothy Loh

unread,
Mar 27, 2015, 12:22:39 AM3/27/15
to blin...@chromium.org

Hey blink-dev, I plan to promote the new CSS Parser to stable in the next few days. This is an entirely hand-written parser, based on the CSS Syntax Module Level 3 spec, which will replace the existing Bison CSS Parser. The flag flip will make all CSS parsing use the new parser, aside from when the inspector uses a CSSParserObserver (I’ll add support for this soon). If all goes well, I intend to remove the Bison parser shortly after the M43 branch, which will allow simplifying some of the interactions between the parser implementation and the rest of Blink. Performance: I tested parsing performances of various stylesheets gathered from the internet on a Nexus 7 (2013 model). On most stylesheets, the new parser performed 15-25% faster. https://docs.google.com/a/chromium.org/spreadsheets/d/1pItJ74dODk6x26JNvp35OrCh6x8gtpviX-I9lt3VtVo/edit#gid=1791732769 Spec compliance: http://dev.w3.org/csswg/css-syntax The new implementation generally adheres to the spec very closely. The differences from the spec I’m aware of are all very small and match existing behaviour (we still create <unicode-range-token>s even though they were removed from the spec a few months ago, we don’t yet handle at-rules in declaration lists, we don’t drop descriptors with !important, we don’t yet insert comment tokens during serialization where needed). There are also a number of small behaviour changes from the Bison parser, which are listed at https://crrev.com/1015303002/. Future work: While the parser rewrite has been primarily a code-health effort, this also enables CSS Custom Properties work to proceed. This also opens up various possibilities in the future, such as off-main-thread parsing (some of the interfaces to the rest of the codebase use AtomicStrings, but everything else should be thread-safe), lazily parsing declaration blocks, and interruptible parsing. There’s also plenty of clean-up to do after removing the Bison parser (the largest of these is making the CSSPropertyParser use CSSParserTokens directly). Cheers, Timothy Loh

Mike Lawther

unread,
Mar 27, 2015, 1:24:12 AM3/27/15
to Timothy Loh, blin...@chromium.org
Good times! Happy to see this ship, and I won't be sad to see bison go.

Yoav Weiss

unread,
Mar 27, 2015, 4:01:19 AM3/27/15
to Mike Lawther, Timothy Loh, blin...@chromium.org
WOOT! This makes me extremely happy :D
And ~20% perf boost on mobile. /me got something in my eye...

Rune Lillesveen

unread,
Mar 27, 2015, 4:54:50 AM3/27/15
to Timothy Loh, blin...@chromium.org
On Fri, Mar 27, 2015 at 5:22 AM, Timothy Loh <tim...@chromium.org> wrote:
> Hey blink-dev, I plan to promote the new CSS Parser to stable in the next
> few days. This is an entirely hand-written parser, based on the CSS Syntax
> Module Level 3 spec, which will replace the existing Bison CSS Parser.

Yay!

--
Rune Lillesveen

Addy Osmani

unread,
Mar 27, 2015, 5:25:42 AM3/27/15
to blin...@chromium.org, tim...@chromium.org
This is amazing work. Equally as excited about the perf wins on mobile. Looking forward to us being able to explore off-main-thread parsing in the not too distant future :)

dio.r...@gmail.com

unread,
Mar 27, 2015, 5:58:50 AM3/27/15
to blin...@chromium.org, tim...@chromium.org
Awesome!

Alexis Menard

unread,
Mar 27, 2015, 8:04:48 AM3/27/15
to dio.r...@gmail.com, blink-dev, tim...@chromium.org
Very nice work Timothy!
> To unsubscribe from this group and stop receiving emails from it, send an
> email to blink-dev+...@chromium.org.

Chris Harrelson

unread,
Mar 27, 2015, 2:22:59 PM3/27/15
to Timothy Loh, blin...@chromium.org
Great work! A question though:

Since your parser adheres closely to the spec, what is the likelihood of web compatibility issues?

Chris

On Thu, Mar 26, 2015 at 9:22 PM, Timothy Loh <tim...@chromium.org> wrote:

Hey blink-dev, I plan to promote the new CSS Parser to stable in the next few days. This is an entirely hand-written parser, based on the CSS Syntax Module Level 3 spec, which will replace the existing Bison CSS Parser. The flag flip will make all CSS parsing use the new parser, aside from when the inspector uses a CSSParserObserver (I’ll add support for this soon). If all goes well, I intend to remove the Bison parser shortly after the M43 branch, which will allow simplifying some of the interactions between the parser implementation and the rest of Blink. Performance: I tested parsing performances of various stylesheets gathered from the internet on a Nexus 7 (2013 model). On most stylesheets, the new parser performed 15-25% faster. https://docs.google.com/a/chromium.org/spreadsheets/d/1pItJ74dODk6x26JNvp35OrCh6x8gtpviX-I9lt3VtVo/edit#gid=1791732769 Spec compliance: http://dev.w3.org/csswg/css-syntax The new implementation generally adheres to the spec very closely. The differences from the spec I’m aware of are all very small and match existing behaviour (we still create <unicode-range-token>s even though they were removed from the spec a few months ago, we don’t yet handle at-rules in declaration lists, we don’t drop descriptors with !important, we don’t yet insert comment tokens during serialization where needed). There are also a number of small behaviour changes from the Bison parser, which are listed at https://crrev.com/1015303002/. Future work: While the parser rewrite has been primarily a code-health effort, this also enables CSS Custom Properties work to proceed. This also opens up various possibilities in the future, such as off-main-thread parsing (some of the interfaces to the rest of the codebase use AtomicStrings, but everything else should be thread-safe), lazily parsing declaration blocks, and interruptible parsing. There’s also plenty of clean-up to do after removing the Bison parser (the largest of these is making the CSSPropertyParser use CSSParserTokens directly). Cheers, Timothy Loh

To unsubscribe from this group and stop receiving emails from it, send an email to blink-dev+...@chromium.org.

Timothy Loh

unread,
Mar 29, 2015, 11:35:58 PM3/29/15
to Chris Harrelson, nd...@chromium.org, blin...@chromium.org
On Sat, Mar 28, 2015 at 5:22 AM, Chris Harrelson <chri...@chromium.org> wrote:
Great work! A question though:

Since your parser adheres closely to the spec, what is the likelihood of web compatibility issues?

Chris

I don't expect any web-compat issues from these changes. Most of the changes are already handled correctly by Firefox, and the ones which aren't are edge cases that probably won't be depended upon (e.g. escaping in dimensions, "left: 16\70x")

On Sat, Mar 28, 2015 at 4:20 AM, Nat Duca <nd...@chromium.org> wrote:
Do we have a telemetry benchmark that captures parse times?  Presumably you could copy the layout benchmarks from benjhayden...
 
I had a look at the perf dashboard and there's at least blink_perf.parser/css-parser-yui and blink_perf.css/StyleSheetInsert-bootstrap. The tests blink_perf.parser/html-parser and blink_perf.parser/html5-full-render also showed a small improvement on some bots. Eventually we'll probably have a suite of benchmarks which report on parsing/selector matching/property application/etc.

Peter Kasting

unread,
Mar 30, 2015, 12:39:38 AM3/30/15
to Timothy Loh, blin...@chromium.org
This is great work.  As someone who's put a lot of time into warning fixes, the Bison-generated code is a source of warnings in our codebase that until now we've just had to suppress.

When you rip out the Bison code entirely, make sure to grep for any references in .gyp file warning sections, valgrind exception lists, and other "hidden" places where we might declare various functions or files as "known to be crappy".  I will be super happy to see all that stuff vanish!

PK

Nat Duca

unread,
Mar 30, 2015, 2:01:44 PM3/30/15
to Timothy Loh, Chris Harrelson, blin...@chromium.org
On Sat, Mar 28, 2015 at 4:20 AM, Nat Duca <nd...@chromium.org> wrote:
Do we have a telemetry benchmark that captures parse times?  Presumably you could copy the layout benchmarks from benjhayden...
 
I had a look at the perf dashboard and there's at least blink_perf.parser/css-parser-yui and blink_perf.css/StyleSheetInsert-bootstrap. The tests blink_perf.parser/html-parser and blink_perf.parser/html5-full-render also showed a small improvement on some bots. Eventually we'll probably have a suite of benchmarks which report on parsing/selector matching/property application/etc.

I was asking about a telemetry test, so that we could confirm that there are no parsing time regressions on key mobile sites, desktop sites, etc. It sounds like we're flying a bit blind here. The layout.py in telemetry/... is a good starting point for a telemetry benchmark.

Mike Lawther

unread,
Mar 31, 2015, 4:05:39 AM3/31/15
to Nat Duca, Timothy Loh, Chris Harrelson, blin...@chromium.org
Tim's original mail contained results of perf testing on a mobile device against ~60 test case style sheets gathered from top sites. 15-25% perf wins is pretty excellent for what was primarily a code health rewrite. There was one regression (-1.51% on fb4.css).

This could be turned into a telemetry benchmark to add to the existing ones that Tim mentioned above so we can look out for future regressions once this lands.

I don't think we are flying blind here with respect to testing the perf of this new code against the old code.

Eric Seidel

unread,
Mar 31, 2015, 9:45:45 AM3/31/15
to Timothy Loh, blin...@chromium.org
Very exciting.

Could you speak to your parser's performance for inline style?  I would presume it's significantly faster for element.style.cssText = "foo:bar;" and for lots of very short style-sheets.  The old parser was very bad at both.

Could you also speak to why you have a single outlier of being < 1% slower on fb4.css?  As discussed previously on this list, outliers are often a very useful source of insight. :)

Thank you so much for seeing this through.



On Thu, Mar 26, 2015 at 9:22 PM, Timothy Loh <tim...@chromium.org> wrote:

Elliott Sprehn

unread,
Apr 1, 2015, 12:58:13 AM4/1/15
to Timothy Loh, blin...@chromium.org
This is really great work! While the performance improvement is fantastic, having a parser we feel comfortable maintaining long term and iterating on is even more important. I've already heard from one engineer how much easier it was to add features to the new parser.

The detailed perf analysis you did is really great as well. I'd love to see more projects do similar testing.

btw, you might try running the fb4.css benchmark again, it looks like the "New" one has a lot of variance. Mostly it looks about the same as the old parser, but sometimes it's a fair bit worse. You may have gotten hit by the OS being loaded down or a background process descheduling you, or maybe there's some funny CPU cache stuff going on. Either way I don't think we we should let that block shipping.

On Thu, Mar 26, 2015 at 9:22 PM, Timothy Loh <tim...@chromium.org> wrote:

Daniel Bratell

unread,
Apr 1, 2015, 11:12:35 AM4/1/15
to blin...@chromium.org, Timothy Loh
On Fri, 27 Mar 2015 05:22:12 +0100, Timothy Loh <tim...@chromium.org> wrote:

Hey blink-dev, I plan to promote the new CSS Parser to stable in the next few days.


Me like!

One string management question: I learned a few years back that there are huge (many hundred KB, maybe MB) CSS files used by common sites because they embed fonts. At the same time most CSS documents are very close to 100% ASCII, but not always 100% (small unicode symbol characters come to mind).

Will large strings (fonts) be stored in ASCII if possible?

/Daniel

Timothy Loh

unread,
Apr 1, 2015, 10:02:41 PM4/1/15
to Daniel Bratell, blin...@chromium.org, Timothy Loh
On Thu, Apr 2, 2015 at 2:12 AM, Daniel Bratell <bra...@opera.com> wrote:
One string management question: I learned a few years back that there are huge (many hundred KB, maybe MB) CSS files used by common sites because they embed fonts. At the same time most CSS documents are very close to 100% ASCII, but not always 100% (small unicode symbol characters come to mind).

Will large strings (fonts) be stored in ASCII if possible?

Yep, in this case there's no difference to the Bison parser; we just store a view of the input during tokenization and the property parser makes a String object. Of the stylesheets I used to test performance, wp3.css and fb4.css both consisted of many long strings. Since we scan over <url-token>s at a similar speed as the Bison tokenizer and there is comparatively little other tokenization/parsing in these tests, the performance ended up very similar.

Daniel Bratell

unread,
Apr 2, 2015, 11:35:02 AM4/2/15
to Timothy Loh, blin...@chromium.org, Timothy Loh
The Bison parser used to (a year or so ago) make an additional buffer the size of the stylesheet if it encountered an UTF-16 character, unintentionally using 3 bytes per character. Just wanted to make sure that "feature" wasn't preserved. It shouldn't be since it was mostly a side-effect of Bison being ... cattle.

/Daniel





林作健

unread,
Jun 8, 2015, 10:44:22 AM6/8/15
to blin...@chromium.org, tim...@chromium.org

I don't get it, why bison generates a slower parser? And I skim the sources. A new tokenizer too? What's wrong with perfect hash based tokenizer? It supposes to be the fastest algorithm for a tokensizer.
If there is a improvement inside bison or flex, I think it's better to patch them too. It will help many people using them to generate parsers.
在 2015年3月27日星期五 UTC+8下午12:22:39,Timothy Loh写道:
Reply all
Reply to author
Forward
0 new messages