Revisiting IndexOf

Boris Zbarsky

unread,

Jun 22, 2009, 9:54:44 AM6/22/09

to

I was looking at some profiles recently that happened to spend a lot of
time in IndexOf because they thrashed our IndexOf cache. Specifically,
the use case (selection) had a range and then compared a whole bunch of
node+offset pairs to the range's endpoints. Due to the structure of the
markup (<textarea>, basically), the endpoint of the range was a sibling
of most of the nodes being tested and there were a _lot_ of nodes
involved (long text in the textarea).

One thought I had was caching the content offset of its endpoints and
all their ancestors in the range, but that leads to the question of how
to invalidate the cache. This seemed difficult.

Another thought I had involves fundamentally changing how IndexOf and
nsContentUtils::ComparePoints work. The proposal is as follows:

1) We add a floating point (or double; see below) member to nsINode.
Call it the nodeIndex, say.
2) Nodes that are not in a child list, including anonymous nodes, have
nodeIndex set to some flag value. Negative infinity seems like a
good choice for reasons to follow.
3) Nodes that are in a child list can have nodeIndex set to any finite
and non-NaN floating-point value, with the constraint that all the
nodes in a child list have distinct values of nodeIndex and that
sorting nodes by nodeIndex does not change the order of the child
list. In other words, a (parent, nodeIndex) pair uniquely
identifies a node in the parent's childlist and we can do binary
search on the child list given the nodeIndex.
4) IndexOf uses said binary search. We can either keep the current
indexof cache in a simplified form to keep .nextChild traversal O(N)
instead of O(N log N), or just get rid of it altogether. We know
our N is bounded above by (1<<32), so the slowdown is a known
quantity; some measuring can tell us whether the indexof cache is
still worth it. For nodes not in any child list, IndexOf can
fast-path to return -1 by checking for our flag value.
5) On removal of a node from the child list, nothing needs to be done
other than resetting its nodeIndex to the flag value.
6) On insertion at the start of the list, we can set the nodeIndex of
the new node to (the nodeIndex of the old start of the list - 1).
7) On append to the child list, we can set the nodeIndex of the new
node to (the nodeIndex ofthe old end of the list + 1).
8) On insertion into the list we can set the nodeIndex of the new node
to the arithmetic mean of the nodeIndices of the two nodes between
which it has been inserted.
9) Change ComparePoints to directly compare nodeIndex instead of doing
IndexOf. Using negative infinity as the "anonymous or disconnected
content" flag value will ensure that these come before any "real"
nodes just like they do now. The function already ensures that
both nodes have the same parent before doing IndexOf.
10) Due to the limited precision of floats and doubles, steps 6,7,8
will need a fallback mode for the case when adding/subtracting 1,
or averaging, gives the same value as one of the values we started
out with. In that case, we'll need to renumber the nodes around
the insertion location to give us more numeric range to work with.
This is the hardest part of the proposal to implement, honestly,
and the part I haven't completely worked out yet. For doubles,
a simple option is to walk the whole child list and renumber the
nodes using 1,...,N (their actual IndexOf indices). Since the
double exact-integer range covers all of PRUint32, this has no
overflow issues. For floats, we'd need to do something else.
Even for doubles we might want to walk less than the whole child
list.

Benefits of this setup:
1) Faster IndexOf in cases that currently blow out the cache.
2) Possibly faster ComparePoints due to not needing to IndexOf at all.

Drawbacks of this setup:
1) Extra memory used by nodes (4 or 8 bytes depending on the type
of nodeIndex).
2) Possibly slower IndexOf in cases with few nodes due to decreased
locality of reference (have to actually fetch the node's nodeIndex
instead of just comparing the pointer give to all the pointers in
the array). On the other hand, fewer compare operations...
3) The renumbering step can end up being triggered reasonably often.
To be precise, consider a testcase with three nodes, A, B, C,
that come in the child list in that order. Remove A and insert
after B. Then remove B and insert after A. Continue doing that.
Since we average with the nodeIndex of C every time, we will run
out of precision after a number of inserts equal to the number
of digits in our mantissa (52 for doubles, 23 for floats, iirc).
This doesn't seem like a terribly contrived testcase where web
stuff is concerned. It would be easy for the renumbering to end
up slow or complicated or both...

Additional notes:

* If we use doubles, then instead of adding/subtracting 1 at the
start/end of the list we can use a bit more of the number space by
adding/subtracting a bigger number. Otherwise by default we'd be using
lessthan the 52 digits we have available for integer indices. I'd have
to think about this a bit more carefully to see whether this would
actually help reduce how often we have to renumber.

Thoughts? Comments? Votes up/down? ;)

-Boris

Boris Zbarsky

unread,

Jun 22, 2009, 9:55:41 AM6/22/09

to

Oh, and I ran some trial patches with this setup on try server last
night. Performance metrics seemed unchanged; Tp_rss might have been up
0.5% or so.

-Boris

Smaug

unread,

Jun 23, 2009, 10:49:05 AM6/23/09

to

On 6/22/09 4:54 PM, Boris Zbarsky wrote:
> I was looking at some profiles recently that happened to spend a lot of
> time in IndexOf because they thrashed our IndexOf cache.

Bug numbers?

>
> Thoughts? Comments? Votes up/down? ;)

The setup doesn't sound too complicated, but would be great to
have some worst-case-tests and also best-case-tests -
at least to see where the time is spent currently and where with the
approach.

(I wish someone would make also nsIFrame tree handling fast, especially
getting the last child - that shows up in the profiles pretty often.)

-Olli

Boris Zbarsky

unread,

Jun 23, 2009, 11:05:08 AM6/23/09

to

Smaug wrote:
> On 6/22/09 4:54 PM, Boris Zbarsky wrote:
>> I was looking at some profiles recently that happened to spend a lot of
>> time in IndexOf because they thrashed our IndexOf cache.
> Bug numbers?

https://bugzilla.mozilla.org/show_bug.cgi?id=440641 comes to mind.

> The setup doesn't sound too complicated, but would be great to
> have some worst-case-tests and also best-case-tests -
> at least to see where the time is spent currently and where with the
> approach.

Yeah, indeed. I'm not quite sure what he worst-case for my proposal is
yet; that depends on the details of the renumbering algorithm.

I guess the question I have is whether this is worth pursuing at all (as
in, whether we're willing to add a double member to nsINode).

> (I wish someone would make also nsIFrame tree handling fast, especially
> getting the last child - that shows up in the profiles pretty often.)

https://bugzilla.mozilla.org/show_bug.cgi?id=233463 if you care... It's
hard to do because all sorts of code assumes that frames are a
singly-linked list with the owner holding a pointer to the first node
and does manual surgery on this list. If we actually used the
nsFrameList abstraction throughout when dealing with a frame list, that
would make it much easier to make the last frame fast; the changes would
be localized to nsFrameList.

It would probably also make it easier to do other changes to the way we
store frames... So if someone wants to pick it up, that would be great.

-Boris

Boris Zbarsky

unread,

Aug 27, 2009, 9:46:52 PM8/27/09

to

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=513169 on
investigating alternate storage strategies for our DOM.

-Boris