
Selector matching and caching (aka kill RuleProcessorData)


Boris Zbarsky

Jan 6, 2010, 5:04:47 PM
I've been doing some more profiling of selector matching and looking at
webkit's implementation a bit, and one difference is that they don't
have an equivalent of our RuleProcessorData setup: their selector
matching uses only data that's available on the node itself, or caches
data on the node or on its RenderStyle. In particular, any cache used
there persists across possibly multiple style resolutions and is not
dynamically allocated.

I tried hacking up SelectorMatches/SelectorMatchesTree to just match
directly on the node instead of on RuleProcessorData and changed various
callsites to not actually allocate RuleProcessorData structs, more or
less. This sped up the SlickSpeed querySelector test by a good bit (40%
or so), as expected, since in that case we only match each data against
one selector so the caching is not worth it. I also tried the patch on
a complete reframe of the HTML5 single-page spec, and it looks like it's
a slight win there too (order of 50ms out of 1200ms). This test is more
interesting, since this is the situation the RuleProcessorData is
supposed to help with... however it might be that
namespace+tag+id+classes are fast enough to get and eliminate enough
selectors that in practice the more complicated caching isn't worth it,
at least on this page.

Now it _is_ possible to create testcases where such a non-caching
approach will be a lot slower than what we have now; I'm just not sure
how common they are in practice.

Looking in detail into what RuleProcessorData stores (after sdwilsh
lands his async history stuff) we have:

* mPresContext -- only needed to allocate the parent/prevsibling data,
so can go away if RuleProcessorData does.
* mContent -- would become an argument
* mParentContent -- cheap to get
* mRuleWalker -- only needed for the rulehash enumeration, NOT for
selector matching. Rulehash enumeration could keep
using a struct that has the rulewalker, prescontext
and maybe a few other things.
* mScopedRoot -- Need to figure out the right place to stash this.
Most importantly, this is tied to the rule being
matched, not to the element or selector.
* mContentTag -- cheap to get
* mContentID -- cheap enough to get; can be made cheaper
* mIsHTMLContent -- cheap to get
* mIsHTML -- involves a check on the document, but the document boolean
here is invariant across a wide range of things (e.g. all
of style resolution for a node, or an entire querySelector
invocation), so can be passed around to selector-matching
code explicitly.
* mHasAttributes -- in practice, cheap enough to get
* mCompatMode -- can be passed around like the document HTML boolean;
doesn't depend on element.
* mNameSpaceID -- cheap to get
* mClasses -- cheap enough; can be made cheaper
* mPreviousSiblingData -- would go away
* mParentData -- would go away
* mLanguage -- could just be computed each time in the rare cases it's
needed, I think. It _would_ be possible to write
pathological testcases that are slower as a result, but
I don't think we care.
* mNthIndices -- see below
* mContentState -- could be cached in the node, I think... Or we could
stop using a bitfield here and use the webkit setup
of explicit boolean getters plus casts to subclasses
in some cases (e.g. for the form control states).
Or we could just get it each time we need it (for
pseudo-class matches only). It's usually not THAT
expensive to compute, and not needed that much.
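To make the direction above concrete, here is a rough standalone sketch of what a post-RuleProcessorData call could look like: the element is passed directly, and the per-document invariants (the document HTML boolean, the compat mode) ride along as explicit arguments instead of living in a heap-allocated per-node struct. All types and names here are simplified illustrations, not the real Gecko API.

```cpp
#include <string>

enum class CompatMode { FullStandards, Quirks };

// Illustrative stand-in for an element; real matching would use
// atoms/namespace IDs rather than strings.
struct Element {
    std::string tag;
    std::string id;
    const Element* parent = nullptr;
};

struct Selector {
    std::string tag;  // empty = match any tag
    std::string id;   // empty = no id requirement
};

// Everything needed comes from the node itself plus the two passed-in
// invariants; nothing is allocated per node per match.
bool SelectorMatches(const Element& e, const Selector& sel,
                     bool /*documentIsHTML*/, CompatMode /*mode*/) {
    if (!sel.tag.empty() && sel.tag != e.tag)
        return false;
    if (!sel.id.empty() && sel.id != e.id)
        return false;
    return true;
}
```

The point of the shape, not the matching logic itself: the two boolean-ish invariants are computed once per style resolution (or per querySelector invocation) and threaded through, so no per-node setup cost remains.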

That leaves mNthIndices. As a first cut we could just not cache these,
but that can lead to a bit of pain if multiple :nth-child() selectors
are around that might all match a given node. That's the only case it
helps us right now, though, except in querySelector where we use the
previous sibling's indices to good effect. What Webkit does here is to
cache a 17-bit index in nodes (using some spare bits they have); they
only cache the :nth-child index, so there's no caching for nth-of-type
or either of the *-last pseudos. If we wanted to, we could move this
cache to either the element or to slots, but then invalidation becomes
an issue. Webkit invalidates by simply triggering a reresolve on the
parent on DOM mutations if one of these selectors was used; since they
reresolve along the DOM this works fine (earlier kids' indices are
recomputed before later ones, so the later ones can use the earlier
cache to figure out theirs). We reresolve along the frame tree, and XBL
makes it such that the order here does not match DOM order. So we would
not in fact have the needed earlier/later guarantee.
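The earlier-sibling reuse that makes the WebKit scheme work can be sketched like this (standalone C++; `Node` and the field names are illustrative, not real Gecko or WebKit classes):

```cpp
#include <cstdint>

struct Node {
    Node* prevSibling = nullptr;
    uint32_t cachedNthIndex = 0;  // 0 means "not yet computed"
};

// Compute the 1-based :nth-child index. If an earlier sibling already has
// its index cached, we only walk back that far, so a left-to-right pass
// over the children is O(1) amortized per node. This is exactly the
// property that breaks if later siblings can be resolved before earlier
// ones (as with frame-tree-order reresolution under XBL).
uint32_t NthIndex(Node* n) {
    if (n->cachedNthIndex != 0)
        return n->cachedNthIndex;
    uint32_t count = 0;
    Node* s = n->prevSibling;
    while (s && s->cachedNthIndex == 0) {
        ++count;
        s = s->prevSibling;
    }
    uint32_t index = s ? s->cachedNthIndex + count + 1 : count + 1;
    n->cachedNthIndex = index;
    return index;
}
```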

So apart from mNthIndices and maybe mContentState, I think this struct
can just go away. If we can figure out a good plan for mNthIndices, I
think we should just kill it off....

Thoughts?

-Boris

L. David Baron

Jan 6, 2010, 6:03:55 PM
to dev-tec...@lists.mozilla.org
On Wednesday 2010-01-06 17:04 -0500, Boris Zbarsky wrote:
> I tried hacking up SelectorMatches/SelectorMatchesTree to just match
> directly on the node instead of on RuleProcessorData and changed
> various callsites to not actually allocate RuleProcessorData
> structs, more or less. This sped up the SlickSpeed querySelector
> test by a good bit (40% or so), as expected, since in that case we
> only match each data against one selector so the caching is not
> worth it. I also tried the patch on a complete reframe of the HTML5
> single-page spec, and it looks like it's a slight win there too
> (order of 50ms out of 1200ms). This test is more interesting, since
> this is the situation the RuleProcessorData is supposed to help
> with... however it might be that namespace+tag+id+classes are fast
> enough to get and eliminate enough selectors that in practice the
> more complicated caching isn't worth it, at least on this page.

Sounds good to me. When RuleProcessorData was created (and it was a
big performance win at the time), there was much more XPCOM involved
in getting the various pieces of information involved.

> That leaves mNthIndices. As a first cut we could just not cache

Maybe we could cache (somewhere?) the indices for the two elements
that needed them most recently? That would let us do prev-sibling
optimizations and node-matches-multiple-selectors optimizations.

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

Boris Zbarsky

Jan 6, 2010, 7:51:57 PM
On 1/6/10 6:03 PM, L. David Baron wrote:
> Sounds good to me. When RuleProcessorData was created (and it was a
> big performance win at the time), there was much more XPCOM involved
> in getting the various pieces of information involved.

Sure. I'm not saying we didn't need it before!

>> That leaves mNthIndices. As a first cut we could just not cache
>
> Maybe we could cache (somewhere?) the indices for the two elements
> that needed them most recently? That would let us do prev-sibling
> optimizations and node-matches-multiple-selectors optimizations.

Most simply, the document could have a two-element cache. Each element
would have a pointer to an nsIContent and its indices. When an
nsIContent is destroyed or changes ownerDocument, it would check the
cache and if one of the cached pointers is to itself it would clear the
cache.

We'd probably also need to clear the cache any time there's a DOM
mutation or something.
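A minimal sketch of that two-entry cache (the types and names here are illustrative stand-ins, not the real nsIContent or document interfaces):

```cpp
#include <cstddef>

struct Content {};  // stand-in for nsIContent

// A document-level cache holding the :nth-* indices for the two most
// recently queried elements, with round-robin replacement.
struct NthIndexCache {
    struct Entry {
        const Content* content = nullptr;
        int nthChildIndex = -1;  // -1 means "not computed"
    };
    Entry entries[2];
    size_t next = 0;

    Entry* Lookup(const Content* c) {
        for (Entry& e : entries)
            if (e.content == c)
                return &e;
        return nullptr;
    }

    Entry* Insert(const Content* c) {
        Entry* e = &entries[next];
        next = (next + 1) % 2;
        *e = Entry{c, -1};
        return e;
    }

    // Called when a node is destroyed or changes ownerDocument; pass null
    // to clear everything, e.g. on a DOM mutation that can shift indices.
    void Invalidate(const Content* c) {
        for (Entry& e : entries)
            if (!c || e.content == c)
                e = Entry{};
    }
};
```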

Seem ok?

-Boris

L. David Baron

Jan 6, 2010, 9:02:47 PM
to dev-tec...@lists.mozilla.org

Sounds fine to me, if that's easy to implement.

(I'd note that this would allow us to maintain all the different
indices that we have now.)

Boris Zbarsky

Jan 6, 2010, 10:34:10 PM
On 1/6/10 9:02 PM, L. David Baron wrote:
> Sounds fine to me, if that's easy to implement.

I think it should be pretty simple.

> (I'd note that this would allow us to maintain all the different
> indices that we have now.)

Yep, exactly.

I'll wait for Shawn to land his stuff, then do this.

-Boris

Boris Zbarsky

Jan 8, 2010, 10:41:05 AM
On 1/6/10 5:04 PM, Boris Zbarsky wrote:
> I've been doing some more profiling of selector matching and looking at
> webkit's implementation a bit, and one difference is that they don't
> have an equivalent of our RuleProcessorData setup: their selector
> matching uses only data that's available on the node itself, or caches
> data on the node or on its RenderStyle. In particular, any cache used
> there persists across possibly multiple style resolutions and is not
> dynamically allocated.

I lied about the multiple style resolutions thing, at least for the
RenderStyle cache. Their RenderStyle is sort of like our nsStyleContext
(but not shared between nodes, looks like), and the node gets a new one
every time its style is reresolved. Which is why they don't have
dynamic update bugs in that stuff, but it also means the cache is only
for the duration of a single style resolution.

I don't think that much affects our plan here. ;)

-Boris

Boris Zbarsky

Jan 17, 2010, 12:30:44 AM
On 1/6/10 6:03 PM, L. David Baron wrote:
>> That leaves mNthIndices. As a first cut we could just not cache
>
> Maybe we could cache (somewhere?) the indices for the two elements
> that needed them most recently? That would let us do prev-sibling
> optimizations and node-matches-multiple-selectors optimizations.

I thought about it some more, and it _sort_ of does. It doesn't for the
querySelector(":nth-child()") case, because you would match a node, then
all its kids (thrashing your index cache) and only _then_ its next sibling.

For normal style computation (from frame construction) it would work a
bit better, since we do all the kids one after the other building our
frame construction item list, except for inline kids (where we descend
into all inline descendants).

I'll still do it and see how it goes. In practice, the most common use
of these is likely to be for table rows and such, where the "normal
style computation" case would be hit. The most likely querySelector
issue, I suspect, is benchmarks and not production code. And the
benchmarks use DOMs that are small enough that the caching is not that
relevant, I suspect. I'll have to check.

-Boris

Boris Zbarsky

Jan 19, 2010, 12:36:21 AM
On 1/17/10 12:30 AM, Boris Zbarsky wrote:
> I'll still do it and see how it goes. In practice, the most common use
> of these is likely to be for table rows and such, where the "normal
> style computation" case would be hit.

OK, I tried doing this (as in, implement the nth-index cache, and get
rid of RuleProcessorData usage in selector matching altogether). One
issue is that the way we currently compute the nth index means that the
cache is basically never hit for anything but multiple :nth-* selectors
for the same element; in particular going back to a previous sibling is
actually somewhat expensive (involves an IndexOf), so we don't do it in
GetNthIndex.... and hence can't benefit from having our previous
sibling's index cached. That's about where we are right now for style
matching, I think, but we do better on querySelector at the moment. I
think we could change to use IndexOf, but I'm not sure it would be much
of a win, if any. Perhaps worth revisiting once we have a linked-list
DOM child storage.

The other issue I ran into is that IntrinsicState() can actually be
somewhat expensive if called repeatedly without caching the result (e.g.
the QI+addref+release that Link::LinkState does accounts for about 10%
of the time on reframe of the HTML5 page with the patches in my tree).

Would people be ok with adding an mIntrinsicState to nsGenericElement,
say, and caching the state? We know all the places the state might
change anyway (they're the ones that currently call
ContentStatesChanged), so those could be responsible for updating the
cached state.

It might also be possible to remove the QI from LinkState by making
bind/unbind on Link subclasses do more work to flag the in-document
state on the Link. Would that be preferable to caching the state? It
might be good enough to at least not regress performance on the HTML5
reframe test.
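The mIntrinsicState caching idea might look roughly like this (a standalone sketch; the computation here is a placeholder for the real QI-heavy path, and all names are hypothetical):

```cpp
#include <cstdint>

using EventStates = uint32_t;  // stand-in for the state bitfield

struct Element {
    EventStates mCachedState = 0;
    bool mStateCached = false;
    bool hovered = false;  // example input to the (expensive) computation

    // Placeholder for the expensive path (the QI+addref+release in
    // Link::LinkState, etc.).
    EventStates ComputeIntrinsicState() const {
        return hovered ? 0x1 : 0x0;
    }

    EventStates IntrinsicState() {
        if (!mStateCached) {
            mCachedState = ComputeIntrinsicState();
            mStateCached = true;
        }
        return mCachedState;
    }

    // The places that currently call ContentStatesChanged would also
    // refresh (or, as here, simply drop) the cached state.
    void NotifyStateChanged() {
        mStateCached = false;
    }
};
```

The invariant this relies on is the one the message states: every place the state can change already fires ContentStatesChanged, so invalidation piggybacks on notification.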

-Boris

Shawn Wilsher

Jan 19, 2010, 1:08:22 PM
On Jan 18, 9:36 pm, Boris Zbarsky <bzbar...@mit.edu> wrote:
> The other issue I ran into is that IntrinsicState() can actually be
> somewhat expensive if called repeatedly without caching the result (e.g.
> the QI+addref+release that Link::LinkState does accounts for about 10%
> of the time on reframe of the HTML5 page with the patches in my tree).
It would be pretty easy for mozilla::dom::Link to cache a pointer to
the nsIContent.

Cheers,

Shawn

Boris Zbarsky

Jan 19, 2010, 1:22:00 PM
On 1/19/10 1:08 PM, Shawn Wilsher wrote:
> It would be pretty easy for mozilla::dom::Link to cache a pointer to
> the nsIContent.

Hmm. That might be the best way to go short-term, yeah.

It might still be worth caching the state.

-Boris
