Avoiding two-level atom lookups by violating atomicity

Henri Sivonen

unread,

Oct 26, 2009, 10:41:53 AM10/26/09

to

When enabling the HTML5 parser to create atoms off the main thread
(https://bugzilla.mozilla.org/show_bug.cgi?id=514661) I was thinking in
terms of breaking the rules as little as possible while still avoiding
locking.

It occurred to me that less code is needed if I approached the problem
from the point of view of what's the minimum that needs to happen so
that the parser doesn't break.

It turns out that guaranteeing atomicity across all atoms per parser per
thread is rather useless. That is, the property that a global static
atom table provides is rather useless. The parser only needs atomicity
across element names per parser per thread, atomicity across attribute
names per parser per thread and that the string "html" atomicizes to the
static atom for "html" as the doctype name.

OTOH, satisfying only the necessary constraints would violate the
expectations of how nsIAtom works even more--potentially handing a
footgun to whoever comes and changes the HTML5 parser later.

Should I simplify the code here when simplification would make atoms in
the HTML5 parser violate the nsIAtom assumptions even more?

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

Jonas Sicking

unread,

Nov 12, 2009, 11:38:48 PM11/12/09

to

Why do you need to use nsIAtoms at all? The whole point of using
nsIAtoms is so that they maintain their identity with all other code
that uses atoms.

What do you use these atoms for?

/ Jonas

Henri Sivonen

unread,

Nov 16, 2009, 4:10:12 AM11/16/09

to

In article <p4adnXCHzt4kf2HX...@mozilla.org>,
Jonas Sicking <jo...@sicking.cc> wrote:

> Henri Sivonen wrote:
> > When enabling the HTML5 parser to create atoms off the main thread
> > (https://bugzilla.mozilla.org/show_bug.cgi?id=514661) I was thinking in
> > terms of breaking the rules as little as possible while still avoiding
> > locking.
> >
> > It occurred to me that less code is needed if I approached the problem
> > from the point of view of what's the minimum that needs to happen so
> > that the parser doesn't break.
> >
> > It turns out that guaranteeing atomicity across all atoms per parser per
> > thread is rather useless. That is, the property that a global static
> > atom table provides is rather useless. The parser only needs atomicity
> > across element names per parser per thread, atomicity across attribute
> > names per parser per thread and that the string "html" atomicizes to the
> > static atom for "html" as the doctype name.
> >
> > OTOH, satisfying only the necessary constraints would violate the
> > expectations of how nsIAtom works even more--potentially handing a
> > footgun to whoever comes and changes the HTML5 parser later.
> >
> > Should I simplify the code here when simplification would make atoms in
> > the HTML5 parser violate the nsIAtom assumptions even more?
>
> Why do you need to use nsIAtoms at all?

(We discussed this in email before I implemented the scheme used in the
pending off-the-main-thread patches.)

When the identifiers represented by these atoms exit the HTML5 parser,
they need to be nsIAtoms. The vast majority of these identifiers are
currently static atoms, so using nsIAtoms for scoped interned strings
inside the HTML5 parser makes the migration from the HTML5 parser's
interned strings into nsIAtoms simple in the most common case:
One virtual method call to IsStaticAtom() to decide that the object can
be passed outside the HTML5 parser as-is.

If the HTML5 parser had its own interned string type (or multiple
interned string types for different roles), either the common case would
require another lookup to migrate to nsIAtom or the static non-nsIAtom
interned strings for the HTML5 parser would need to wrap the
corresponding nsIAtoms.

There's a spectrum from more tricky and less bloated/inefficient to less
tricky and more bloated/inefficient as far as potential solutions go.

The least tricky way of doing things would be making the global nsIAtom
table thread-safe again by putting a mutex around it, but that was an
unwanted solution. The most tricky way of doing things is what I wrote
in the quoted part above.

I thought we already settled on at least the amount of trickiness
already proposed in bug 514661. My question quoted above was about
whether even more trickiness should be used in a future iteration in
order to cut RAM usage and startup time.

> The whole point of using
> nsIAtoms is so that they maintain their identity with all other code
> that uses atoms.

The idea here is to keep the usual atom table thread-unsafe and,
therefore, to keep the more weakly atomic atoms from coming into contact
with the strongly atomic atoms by having a single exit point
(nsHtml5TreeOperation.cpp) where dynamic parser-scoped atoms are
re-obtained as normal dynamic atoms (and static atoms pass as
themselves).

> What do you use these atoms for?

Element local names, attributes local names, attribute prefixes and
doctype root element names.