
Thread-safe allocation and deallocation


Henri Sivonen

Mar 30, 2009, 9:08:59 AM
I've pondered off-the-main-thread issues in the context of the HTML5
parser. I don't know how memory allocation and deallocation work across
threads in Gecko. Hence, some questions:

Are plain |new| and |delete| always thread-safe given the underlying
allocator in Gecko? If not, is there some boilerplate for importing
thread-safe variants?

If they are thread-safe, how is thread contention avoided upon
deallocation if another thread owns the malloc pool (or something like
that)? Is there a deferred deallocation mechanism that doesn't acquire a
lock cross-thread for each deallocation?

Are addref/release on nsISupports-derived objects (including
cycle-collecting ones) thread-safe?

Do strings manage their backing buffer ownership in a thread-safe way?

Would it be feasible to make nsNodeInfoManager thread-safe (for
off-the-main-thread DOM node allocation)? (It seems non-thread-safe upon
first look.) Alternatively: Is it important that each document has only
one nodeinfo hashtable? Would it be feasible to have a split
nsNodeInfoManager with separate hashtables for on-the-main-thread and
off-the-main-thread nodeinfo creation/lookups?

--
Henri Sivonen
hsiv...@iki.fi
http://hsivonen.iki.fi/

Robert O'Callahan

Mar 30, 2009, 8:23:23 AM
On 31/3/09 2:08 AM, Henri Sivonen wrote:
> Are plain |new| and |delete| always thread-safe given the underlying
> allocator in Gecko? If not, is there some boilerplate for importing
> thread-safe variants?

They are thread-safe.

> If they are thread-safe, how is thread contention avoided upon
> deallocation if another thread owns the malloc pool (or something like
> that)? Is there a deferred deallocation mechanism that doesn't acquire a
> lock cross-thread for each deallocation?

I don't know, but it definitely depends on the allocator. There might be
one answer for jemalloc and another answer for the Mac allocator.

> Are addref/release on nsISupports-derived objects (including
> cycle-collecting ones) thread-safe?

They are if you use NS_IMPL_THREADSAFE_ISUPPORTS, otherwise they aren't.
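[Editor's note: as an illustration only, here is a minimal sketch of what thread-safe refcounting amounts to, i.e. the increments and decrements become atomic operations. This is written with C++11 atomics, which post-date this thread, and the class name is hypothetical; it is not the actual expansion of NS_IMPL_THREADSAFE_ISUPPORTS.]

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical class illustrating atomic refcounting in the spirit of
// NS_IMPL_THREADSAFE_ISUPPORTS; not the real Gecko macro expansion.
class RefCountedThreadSafe {
public:
  RefCountedThreadSafe() : mRefCnt(0) {}
  virtual ~RefCountedThreadSafe() {}

  std::size_t AddRef() {
    // Atomic increment; safe to call from any thread.
    return mRefCnt.fetch_add(1, std::memory_order_relaxed) + 1;
  }

  std::size_t Release() {
    // The acq_rel ordering makes all writes to the object visible to the
    // thread that performs the final decrement and deletes it.
    std::size_t count = mRefCnt.fetch_sub(1, std::memory_order_acq_rel) - 1;
    if (count == 0) {
      delete this;
    }
    return count;
  }

private:
  std::atomic<std::size_t> mRefCnt;
};
```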

Rob

Neil

Mar 30, 2009, 11:37:21 AM
Robert O'Callahan wrote:

> On 31/3/09 2:08 AM, Henri Sivonen wrote:
>
>> Are plain |new| and |delete| always thread-safe given the underlying
>> allocator in Gecko? If not, is there some boilerplate for importing
>> thread-safe variants?
>
> They are thread-safe.
>
>> If they are thread-safe, how is thread contention avoided upon
>> deallocation if another thread owns the malloc pool (or something
>> like that)? Is there a deferred deallocation mechanism that doesn't
>> acquire a lock cross-thread for each deallocation?
>
> I don't know, but it definitely depends on the allocator. There might
> be one answer for jemalloc and another answer for the Mac allocator.

And then there are the developers who have to use the MSVC allocator...

>> Are addref/release on nsISupports-derived objects (including
>> cycle-collecting ones) thread-safe?
>
> They are if you use NS_IMPL_THREADSAFE_ISUPPORTS, otherwise they aren't.

Which basically excludes cycle-collecting ones, of course.

--
Warning: May contain traces of nuts.

Jonas Sicking

Mar 30, 2009, 9:58:30 PM
Henri Sivonen wrote:
> Do strings manage their backing buffer ownership in a thread-safe way?

Yes. Well.. it depends on how you are planning on using it. The
refcounting of the buffer is done in a threadsafe way. However you can't
write to a string on one thread, while reading from it on another.

> Would it be feasible to make nsNodeInfoManager thread-safe (for
> off-the-main-thread DOM node allocation)? (It seems non-thread-safe upon
> first look.) Alternatively: Is it important that each document has only
> one nodeinfo hashtable? Would it be feasible to have a split
> nsNodeInfoManager with separate hashtables for on-the-main-thread and
> off-the-main-thread nodeinfo creation/lookups?

Had a conversation with jst about this.

The main problem with what you are proposing is that you can't create
atoms when you're off the main thread.

But what we could do is to not give the node a nodeinfo at all (give it
a dummy nodeinfo), then store the node name outside of the node. This
information would have to make its way into the parse event that's
eventually sent to the main thread.

Once we're on the main thread we grab the correct nodeinfo and push that
into the node.

/ Jonas

Henri Sivonen

Mar 31, 2009, 11:30:37 AM
In article <E6Odnc49P6rD5EzU...@mozilla.org>,
Jonas Sicking <jo...@sicking.cc> wrote:

> Henri Sivonen wrote:
> > Do strings manage their backing buffer ownership in a thread-safe way?
>
> Yes. Well.. it depends on how you are planning on using it. The
> refcounting of the buffer is done in a threadsafe way. However you can't
> write to a string on one thread, while reading from it on another.

The intent is to move attribute values, doctype public and system ids,
text node content and comment node content across the thread boundary by
letting a string created on one thread be Assign()ed to a string on
another thread. There'd be no concurrent access to any given string
backing buffer.

> > Would it be feasible to make nsNodeInfoManager thread-safe (for
> > off-the-main-thread DOM node allocation)? (It seems non-thread-safe upon
> > first look.) Alternatively: Is it important that each document has only
> > one nodeinfo hashtable? Would it be feasible to have a split
> > nsNodeInfoManager with separate hashtables for on-the-main-thread and
> > off-the-main-thread nodeinfo creation/lookups?
>
> Had a conversation with jst about this.
>
> The main problem with what you are proposing is that you can't create
> atoms when you're off the main thread.

That's bad. Fortunately, most atoms are well-known and are static atoms
created at app startup. This includes all atoms for conforming
documents except atoms for HTML5 data-* attribute names.

When the tokenizer does need to create an atom for a name that isn't
well-known ahead of compile time, it seems very inconvenient not to be
able to do it using a method call that returns the atom pointer
synchronously. Would it be acceptable to make the parser thread
synchronize with the main thread to obtain an atom in this case? This
would happen for data-* attributes and for random non-conforming cruft.
Non-random non-conforming stuff such as MS Office <o:p> or the most
common Dojo custom attributes could be made 'well-known' for token
interning purposes.

> But what we could do is to not give the node a nodeinfo at all (give it
> a dummy nodeinfo), then store the node name outside of the node. This
> information would have to make its way into the parse event that's
> eventually sent to the main thread.
>
> Once we're on the main thread we grab the correct nodeinfo and push that
> into the node.

In that case, wouldn't it make sense to use some kind of node proxy
handles on the parser thread and defer the entire node allocation to the
main thread? In this case, instead of working with nsIContent*, the
parser would work with some kind of nsIContentProxyHandle* and an
nsIContentProxyHandle would have an nsIContent* field that starts out as
null and a refcount that can be incremented and decremented across
threads.

This would add complexity:

1) Another layer of indirection (the proxy handles vs. direct
nsIContent*)

2) Attributes would have to be transferred across the thread boundary
in a holder other than the element node itself. (In practice, the holder
would be the attribute holder from the tokenizer. Currently it is
recycled for each token, but if it traveled across the thread boundary,
a new attribute holder would have to be allocated for each start tag
token that has attributes.)

3) More items (deferred node creation operations) to put into the tree
builder operation queue.
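[Editor's note: the proxy-handle idea above can be sketched as follows. nsIContentProxyHandle is Henri's hypothetical name, not an existing Gecko class, and the sketch uses C++11 atomics, which post-date this thread.]

```cpp
#include <atomic>

class nsIContent; // opaque here; the real node type

// Hypothetical proxy handle: a cross-thread refcount plus an nsIContent*
// field that starts out null and is filled in on the main thread.
class nsIContentProxyHandle {
public:
  nsIContentProxyHandle() : mContent(nullptr), mRefCnt(1) {}

  // Callable from either thread.
  void AddRef() { mRefCnt.fetch_add(1, std::memory_order_relaxed); }
  void Release() {
    if (mRefCnt.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete this;
    }
  }

  // The main thread fills in the real node once it has been created there.
  void SetContent(nsIContent* aContent) { mContent = aContent; }
  nsIContent* GetContent() const { return mContent; }

private:
  nsIContent* mContent; // starts null; only the main thread writes it
  std::atomic<int> mRefCnt;
};
```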

In article <X9CdneLHy-qvek3U...@mozilla.org>,
Neil <ne...@parkwaycc.co.uk> wrote:

> >> Are addref/release on nsISupports-derived objects (including
> >> cycle-collecting ones) thread-safe?
> >

> > They are if you use NS_IMPL_THREADSAFE_ISUPPORTS, otherwise they aren't.
>
> Which basically excludes cycle-collecting ones, of course.

This seems like another reason to defer node creation entirely. Stepping
through an element node constructor seemed to involve cycle collecting
stuff. :-(

Chris Jones

Mar 31, 2009, 12:28:30 PM
> If they are thread-safe, how is thread contention avoided upon
> deallocation if another thread owns the malloc pool (or something like
> that)? Is there a deferred deallocation mechanism that doesn't acquire a
> lock cross-thread for each deallocation?
>

This is definitely something we should look into. There's a nice
"thread-caching allocator" (tcmalloc) from Google:

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

The idea behind it is pretty simple, so we should be able to integrate
it into jemalloc fairly easily (or just use tcmalloc directly) for
Linux systems at least. Might also work on OS X. Windows I don't
know about ...

Cheers,
Chris

Chris Jones

Mar 31, 2009, 2:18:50 PM

Please ignore this. jemalloc already appears to do something similar
to tcmalloc, with some extra goodies. Benchmarking the two could be
interesting, though.

Cheers,
Chris

Jonas Sicking

Mar 31, 2009, 6:12:07 PM
Henri Sivonen wrote:
> In article <E6Odnc49P6rD5EzU...@mozilla.org>,
> Jonas Sicking <jo...@sicking.cc> wrote:
>
>> Henri Sivonen wrote:
>>> Do strings manage their backing buffer ownership in a thread-safe way?
>> Yes. Well.. it depends on how you are planning on using it. The
>> refcounting of the buffer is done in a threadsafe way. However you can't
>> write to a string on one thread, while reading from it on another.
>
> The intent is to move attribute values, doctype public and system ids,
> text node content and comment node content across the thread boundary by
> letting a string created on one thread be Assign()ed to a string on
> another thread. There'd be no concurrent access to any given string
> backing buffer.

Cool, should be fine then.

>>> Would it be feasible to make nsNodeInfoManager thread-safe (for
>>> off-the-main-thread DOM node allocation)? (It seems non-thread-safe upon
>>> first look.) Alternatively: Is it important that each document has only
>>> one nodeinfo hashtable? Would it be feasible to have a split
>>> nsNodeInfoManager with separate hashtables for on-the-main-thread and
>>> off-the-main-thread nodeinfo creation/lookups?
>> Had a conversation with jst about this.
>>
>> The main problem with what you are proposing is that you can't create
>> atoms when you're off the main thread.
>
> That's bad. Fortunately, most atoms are well-known and are static atoms
> created at app startup. This includes all atoms for conforming
> documents except atoms for HTML5 data-* attribute names.
>
> When the tokenizer does need to create an atom for a name that isn't
> well-known ahead of compile time, it seems very inconvenient not to be
> able to do it using a method call that returns the atom pointer
> synchronously. Would it be acceptable to make the parser thread
> synchronize with the main thread to obtain an atom in this case? This
> would happen for data-* attributes and for random non-conforming cruft.
> Non-random non-conforming stuff such as MS Office <o:p> or the most
> common Dojo custom attributes could be made 'well-known' for token
> interning purposes.

It's a matter of how common these random non-conforming values will be.
It takes a really long time to synchronize with the main thread.

One alternative might be to make the atom code threadsafe.

>> But what we could do is to not give the node a nodeinfo at all (give it
>> a dummy nodeinfo), then store the node name outside of the node. This
>> information would have to make its way into the parse event that's
>> eventually sent to the main thread.
>>
>> Once we're on the main thread we grab the correct nodeinfo and push that
>> into the node.
>
> In that case, wouldn't it make sense to use some kind of node proxy
> handles on the parser thread and defer the entire node allocation to the
> main thread? In this case, instead of working with nsIContent*, the
> parser would work with some kind of nsIContentProxyHandle* and an
> nsIContentProxyHandle would have an nsIContent* field that starts out as
> null and a refcount that can be incremented and decremented across
> threads.
>
> This would add complexity:
>
> 1) Another layer of indirection (the proxy handles vs. direct
> nsIContent*)
>
> 2) Attributes would have to be transferred across the thread boundary
> in a holder other than the element node itself. (In practice, the holder
> would be the attribute holder from the tokenizer. Currently it is
> recycled for each token, but if it traveled across the thread boundary,
> a new attribute holder would have to be allocated for each start tag
> token that has attributes.)
>
> 3) More items (deferred node creation operations) to put into the tree
> builder operation queue.

It's going to be very hard to make setting attributes work off the main
thread. The attribute parsing code touches a lot of code that isn't
threadsafe. For example style attributes use the CSS parser, and class
attributes are parsed into a list of atoms.

Then there is also sharing of attribute values going on (search for
"mappedattribute")

I think the idea was for now to *just* create the node off the main
thread, but then treat it as an opaque object after that. Then on the
main thread set all attributes and stuff like that. Ideal would of
course be if we didn't have to do that.

> In article <X9CdneLHy-qvek3U...@mozilla.org>,
> Neil <ne...@parkwaycc.co.uk> wrote:
>
>>>> Are addref/release on nsISupport-derived objects (including cycle
>>>> collecting ones) thread-safe?
>>> They are if you use NS_IMPL_THREADSAFE_ISUPPORTS, otherwise they aren't.
>> Which basically excludes cycle-collecting ones, of course.
>
> This seems like another reason to defer node creation entirely. Stepping
> through an element node constructor seemed to involve cycle collecting
> stuff. :-(

What we can do is to set a flag on the node indicating that it lives off
the main thread. The AddRef implementation would check this flag and not
call into the cycle collector when the flag is set.

We could also use that flag to avoid going into code paths that aren't
threadsafe, for example in the setAttribute code.

When the node is transitioned to the main thread we'd call the node to
let it know, at which point it would clear the flag and do all
appropriate cycle collection and attribute parsing stuff.
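[Editor's note: a minimal sketch of the flag scheme described above. All names are hypothetical, the cycle-collector interaction is reduced to a counter, and the deferred bookkeeping on transfer is elided.]

```cpp
#include <atomic>

// Hypothetical node with an "off the main thread" flag. AddRef checks the
// flag and skips cycle-collector bookkeeping while it is set; the same flag
// could guard other non-threadsafe code paths such as setAttribute.
class Node {
public:
  explicit Node(bool aOffMainThread)
      : mOffMainThread(aOffMainThread), mRefCnt(0), mCycleCollectorCalls(0) {}

  int AddRef() {
    int count = mRefCnt.fetch_add(1, std::memory_order_relaxed) + 1;
    if (!mOffMainThread) {
      // Placeholder for the real cycle-collector interaction, which is not
      // threadsafe and must only run on the main thread.
      ++mCycleCollectorCalls;
    }
    return count;
  }

  // Called once the node has been handed to the main thread: clears the flag
  // and would perform the deferred cycle collection and attribute parsing.
  void TransferredToMainThread() { mOffMainThread = false; }

  int CycleCollectorCalls() const { return mCycleCollectorCalls; }

private:
  bool mOffMainThread;
  std::atomic<int> mRefCnt;
  int mCycleCollectorCalls;
};
```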

/ Jonas

Stuart Parmenter

Mar 31, 2009, 11:14:02 PM

tcmalloc (at least a year ago) sucks bad at fragmentation and isn't
great with performance. If thread contention becomes an issue with
jemalloc, we can have it do thread specific arenas each with their own
locks to reduce most of the overhead with thread-heavy stuff. We've
got that turned off right now.

stuart

Henri Sivonen

Apr 1, 2009, 4:38:33 AM
In article <NtGdnW-MTtxMCE_U...@mozilla.org>,
Jonas Sicking <jo...@sicking.cc> wrote:

> One alternative might be to make the atom code threadsafe.

Are there reasons not to do that except the lock overhead for the main
thread where the main thread now has no lock? Does the codebase already
have a hashtable that allows concurrent reads when there are no writes
going on?

> It's going to be very hard to make setting attributes work off the main
> thread. The attribute parsing code touches a lot of code that isn't
> threadsafe. For example style attributes use the CSS parser, and class
> attributes are parsed into a list of atoms.
>
> Then there is also sharing of attribute values going on (search for
> "mappedattribute")
>
> I think the idea was for now to *just* create the node off the main
> thread, but then treat it as an opaque object after that. Then on the
> main thread set all attributes and stuff like that. Ideal would of
> course be if we didn't have to do that.

If the attributes cannot be set on the parser thread, I think just
allocating the node memory on the parser thread isn't worthwhile.

The parser-thread handles to nsIContent* could be made lower overhead if
they weren't refcounted but instead were released at the end of the
parse. This way, the handles could be simple nsIContent** backed by
arrays of nsIContent*.

Would it be acceptable to defer handle deallocation like this? It seems
like a non-issue in desktop memory conditions. Would it be an issue on
mobile? Having refcounts on each handle would at least double the
footprint of each handle. Perhaps triple it if there's also a lock
around the refcount.
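[Editor's note: a sketch of the non-refcounted handle idea above. The arena class and its methods are hypothetical; the point is that handles are stable nsIContent** slots owned by the parser and all freed together at the end of the parse.]

```cpp
#include <cstddef>
#include <deque>

class nsIContent; // opaque here

// Hypothetical arena of parser-thread handles. std::deque::push_back never
// invalidates references to existing elements, so earlier handles stay valid
// as the arena grows.
class HandleArena {
public:
  // Returns a stable slot; the main thread later writes the real node
  // pointer into it.
  nsIContent** AllocateHandle() {
    mHandles.push_back(nullptr);
    return &mHandles.back();
  }

  std::size_t HandleCount() const { return mHandles.size(); }

  // At the end of the parse everything is dropped at once; no per-handle
  // refcount or lock is ever needed.
  void ReleaseAll() { mHandles.clear(); }

private:
  std::deque<nsIContent*> mHandles;
};
```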

Robert O'Callahan

Apr 1, 2009, 8:36:51 PM
On 1/4/09 9:38 PM, Henri Sivonen wrote:
> In article<NtGdnW-MTtxMCE_U...@mozilla.org>,
> Jonas Sicking<jo...@sicking.cc> wrote:
>
>> One alternative might be to make the atom code threadsafe.
>
> Are there reasons not to do that except the lock overhead for the main
> thread where the main thread now has no lock? Does the codebase already
> have a hashtable that allows concurrent reads when there are no writes
> going on?

I think it means that every addref/release of an nsIAtom (which happens
a lot) becomes an incredibly expensive atomic operation.

Rob

Boris Zbarsky

Apr 1, 2009, 8:59:49 PM
Robert O'Callahan wrote:
>>> One alternative might be to make the atom code threadsafe.
>>
>> Are there reasons not to do that except the lock overhead for the main
>> thread where the main thread now has no lock? Does the codebase already
>> have a hashtable that allows concurrent reads when there are no writes
>> going on?
>
> I think it means that every addref/release of an nsIAtom (which happens
> a lot) becomes an incredibly expensive atomic operation.

For static atoms it wouldn't need to, but every string-to-atom
conversion would need to lock around the table. This also happens a lot.

-Boris

L. David Baron

Apr 1, 2009, 9:18:46 PM
to dev-pl...@lists.mozilla.org

We were in that situation (AtomImpl had threadsafe refcounting,
PermanentAtomImpl and StaticAtomWrapper did not) for quite a while
(October 2001 (bug 92141) until August 2007 (bug 387445)), and I
don't think it was that expensive since most of the atoms that
mattered were StaticAtomWrapper. The part that was never threadsafe
was the atom lookup and atom table management.

-David

--
L. David Baron http://dbaron.org/
Mozilla Corporation http://www.mozilla.com/

Neil

Apr 2, 2009, 5:14:45 AM
Robert O'Callahan wrote:

> I think it means that every addref/release of an nsIAtom (which
> happens a lot) becomes an incredibly expensive atomic operation.

Pun not intended ;-)

Henri Sivonen

Apr 2, 2009, 8:11:43 AM
In article <veWdnXYnRfyYk0nU...@mozilla.org>,
Boris Zbarsky <bzba...@mit.edu> wrote:

But most often the atom already exists, right? Would it be feasible to
make atom creation only lock when a new atom needs to be created as
opposed to a pre-existing atom getting retrieved from the hashtable? So
that reads would be concurrent but upon needing to write to the
hashtable, threads would block on a lock writing one by one? Can
hashtable buckets be appended as a single pointer write so that readers
don't need to be locked against the hashtable being in an inconsistent
state during a write (that doesn't resize and rehash the whole
hashtable)? Could resizing and rehashing be done into a copy of the
hashtable so that readers can read the old one during the rehashing?

Mike Shaver

Apr 2, 2009, 7:21:35 AM
to Henri Sivonen, dev-pl...@lists.mozilla.org
On Thu, Apr 2, 2009 at 8:11 AM, Henri Sivonen <hsiv...@iki.fi> wrote:
> But most often the atom already exists, right? Would it be feasible to
> make atom creation only lock when a new atom needs to be created as
> opposed to a pre-existing atom getting retrieved from the hashtable?

You mean to just fetch and if NULL is found, take a lock, recheck, and
add the new atom if necessary? I want to believe that could be made
safe with respect to other racing threads, especially if all mutations
are done in a copy of the table/bucket and then updated per below. I
don't always get what I want, though! Chris Jones around?

> Could resizing and rehashing be done into a copy of the
> hashtable so that readers can read the old one during the rehashing?

Possible using read-copy-update patterns, I expect, which would also
be useful for other common-read/rare-write cases like the various
component manager tables.

Mike

Robert O'Callahan

Apr 2, 2009, 4:26:11 PM
On 3/4/09 12:21 AM, Mike Shaver wrote:
> On Thu, Apr 2, 2009 at 8:11 AM, Henri Sivonen<hsiv...@iki.fi> wrote:
>> But most often the atom already exists, right? Would it be feasible to
>> make atom creation only lock when a new atom needs to be created as
>> opposed to a pre-existing atom getting retrieved from the hashtable?
>
> You mean to just fetch and if NULL is found, take a lock, recheck, and
> add the new atom if necessary? I want to believe that could be made
> safe with respect to other racing threads, especially if all mutations
> are done in a copy of the table/bucket and then updated per below. I
> don't always get what I want, though! Chris Jones around?

That is basically "double-checked locking".

For correctness, it requires a memory barrier before the fetch.
Otherwise consider a situation where CPU 1 creates an atom and stores a
pointer in the hashtable. CPU 2 can see the write that fills in the
hashtable entry happen *before* the writes that initialize the atom, so
it sees an invalid atom.
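[Editor's note: the double-checked locking pattern described above can be sketched as follows, using C++11 atomics for the barriers (these post-date this thread; Gecko at the time would have used NSPR or arch-specific primitives). The table is reduced to a single slot for brevity, so a real implementation would key on the name; the release store when publishing pairs with the acquire load on lookup, which is the memory barrier roc mentions.]

```cpp
#include <atomic>
#include <mutex>
#include <string>

struct Atom {
  std::string mName;
  explicit Atom(const std::string& aName) : mName(aName) {}
};

std::atomic<Atom*> gSlot(nullptr); // one hash bucket, reduced to one slot
std::mutex gWriteLock;

Atom* GetOrCreateAtom(const std::string& aName) {
  // Fast path: acquire load, no lock. A reader that sees the pointer also
  // sees the fully initialized atom, thanks to the paired release store.
  Atom* atom = gSlot.load(std::memory_order_acquire);
  if (atom) {
    return atom;
  }
  // Slow path: take the lock, re-check, then publish with a release store.
  std::lock_guard<std::mutex> guard(gWriteLock);
  atom = gSlot.load(std::memory_order_relaxed);
  if (!atom) {
    atom = new Atom(aName);
    gSlot.store(atom, std::memory_order_release);
  }
  return atom;
}
```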

>> Could resizing and rehashing be done into a copy of the
>> hashtable so that readers can read the old one during the rehashing?
>
> Possible using read-copy-update patterns, I expect, which would also
> be useful for other common-read/rare-write cases like the various
> component manager tables.

Cliff Click at Azul had a nice presentation about a massively parallel
hashtable like that.

Rob

Jason Duell

Apr 2, 2009, 4:44:36 PM
> Would it be feasible to
> make atom creation only lock when a new atom needs to be created as
> opposed to a pre-existing atom getting retrieved from the hashtable? So
> that reads would be concurrent but upon needing to write to the
> hashtable, threads would block on a lock writing one by one? Can
> hashtable buckets be appended as a single pointer write so that readers
> don't need to be locked against the hashtable being in an inconsistent
> state during a write (that doesn't resize and rehash the whole
> hashtable)?

Yes. Assuming we're talking about a standard hashtable (hash into an
array of pointers to linked lists), and we're only adding (never
deleting) atoms (right?), you can make it so that readers never need
to perform an expensive memory flush or atomic operation (well, not on
most architectures: see below):

You can do this entirely without locks if there's only one writer at a
time, but it sounds like we've got concurrency there. So...

Possibly-concurrent Writer:
---------
1) Construct new atom
2) Issue Write_FLUSH (exact instruction varies by architecture:
sorry!)
3) Grab "hashtable write lock" (via compare and swap, or whatever)
4) change "next" pointer of last item in hash bucket linked list from
NULL to the addr of the new atom
5) Release lock

(note: if you use a pthread lock--probably a good idea--
pthread_mutex_lock() will perform steps 2 and 3 for you)
(2nd note: This doesn't handle the case where two threads try to
concurrently add the same atom, but adding a check once you've got the
lock ought to do it).

Reader:
----------
1) Just do regular hashtable lookup as usual: will see either NULL or
addr of new atom, as pointer-sized writes are atomic on virtually all
systems (maybe not really old 16 bit windows when it supported >16 bit
addressing, but that's moot now, thank god).
2) EXCEPT: on some architectures, the writer's WRITE_FLUSH is not
enough to ensure that the reader sees the correct ordering (i.e. that
the writes to init the atom occur before the write to "next" that adds
it to the list): on the DEC alpha, for instance, you had to make the
reader issue a "read_flush" before you tried to read "next".
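[Editor's note: the writer and reader steps above can be sketched on a single hash bucket like so. This uses C++11 atomics, which post-date this thread: the release store stands in for the WRITE_FLUSH and the acquire loads stand in for the read flush, and all names are illustrative.]

```cpp
#include <atomic>
#include <mutex>
#include <string>

struct Entry {
  std::string mKey;
  std::atomic<Entry*> mNext;
  explicit Entry(const std::string& aKey) : mKey(aKey), mNext(nullptr) {}
};

std::atomic<Entry*> gBucketHead(nullptr);
std::mutex gBucketLock;

// Reader: lock-free lookup; acquire loads ensure the entry's fields are
// seen initialized before the pointer that published it.
Entry* Lookup(const std::string& aKey) {
  for (Entry* e = gBucketHead.load(std::memory_order_acquire); e;
       e = e->mNext.load(std::memory_order_acquire)) {
    if (e->mKey == aKey) {
      return e;
    }
  }
  return nullptr;
}

// Writer: construct fully, then append under the lock; the release store is
// the single pointer write that makes the entry visible to readers.
Entry* Add(const std::string& aKey) {
  std::lock_guard<std::mutex> guard(gBucketLock);
  if (Entry* existing = Lookup(aKey)) { // re-check once we hold the lock
    return existing;
  }
  Entry* entry = new Entry(aKey);
  std::atomic<Entry*>* tail = &gBucketHead;
  while (Entry* e = tail->load(std::memory_order_relaxed)) {
    tail = &e->mNext;
  }
  tail->store(entry, std::memory_order_release);
  return entry;
}
```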

I've emailed the atomics guru at my last gig to ask if he knows of any
other chips out there that require read flushes, and/or whether they
are likely to show up in the future. (the latter, of course, would be
his opinion only. Possibly the former, too :). I'm not too worried
for desktops, but it might come up for us on certain mobile
architectures.

The way we handled this at my last project was with macros that
expanded into the correct arch-specific instructions (or to nothing
for read_flush on most arch's). If you use pthreads everywhere, this
is all handled for you, but then you'd have to have readers locking
and unlocking the list, which is exactly what you want to avoid. I'm
pretty sure there's no standard read/write "memory flush" call in
pthreads or elsewhere in the C/Posix standards.

> Could resizing and rehashing be done into a copy of the
> hashtable so that readers can read the old one during the rehashing?

Sure, with the exact same caveat about ensuring that the readers see
the writes in the correct order. On most systems, the creator of the
new hashtable issuing pthread_unlock() before writing the ptr to the
hashtable will ensure that following readers will see the new,
completely initialized hashtable, with no special handling by the
readers. For Alpha-like chips, you'd need a read flush. Of course,
you'd also have the problem that you'd need to figure out when it was
safe to delete the old hashtable (i.e. when all the readers had
exited). Can't you just make the hashtable index array big enough
that you don't need to worry about re-hashing? How often do you need
to add atoms?
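[Editor's note: a sketch of resizing into a copy, as discussed above, using C++11 atomics (which post-date this thread). The grown table is built fully, then published with a single release store; the old table is simply leaked here, which matches the open problem of knowing when all readers have exited.]

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <vector>

struct Table {
  std::vector<std::string> mSlots;
  explicit Table(std::size_t aSize) : mSlots(aSize) {}
};

std::atomic<Table*> gTable(new Table(4));

// Reader: acquire load pairs with the writer's release store, so readers see
// a fully initialized table.
const Table* CurrentTable() {
  return gTable.load(std::memory_order_acquire);
}

// Writer (assumed to hold the write lock): build the bigger table off to the
// side, copy everything over, then swap the pointer in one atomic write.
void Grow() {
  Table* old = gTable.load(std::memory_order_relaxed);
  Table* bigger = new Table(old->mSlots.size() * 2);
  for (std::size_t i = 0; i < old->mSlots.size(); ++i) {
    bigger->mSlots[i] = old->mSlots[i]; // real code would rehash here
  }
  gTable.store(bigger, std::memory_order_release);
  // "old" is intentionally leaked: freeing it safely requires knowing that
  // no reader still holds a pointer to it.
}
```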

Cheers,

Jason

Jason Duell

Apr 2, 2009, 4:56:08 PM
Here's what my atomics guru friends said:

---
While Alpha need(ed) a specific read memory barrier, other modern
chips may still need a way to squash speculative/pre-fetched loads.
So, the reader should do something like:
p = p->next;
RMB();
if (p) {
    do_something(p);
}

Most modern processors (incl Opteron, Itanium, PowerPC) need this.
---

so Roc is right: readers will need a read memory barrier (note that a
RMB is usually cheaper than a full read/write barrier, so it's worth
using if available).

If we want to implement this, my last project was open source (BSD
license) and very cross-platform, so we can steal the code if we don't
already have it:

http://upc.lbl.gov/

Cheers,

Jason

Jonas Sicking

Apr 2, 2009, 5:13:11 PM

nsIAtom is already threadsafe actually. Which is why we have static
atoms whose addref/release are no-ops. So you can pass atoms around to
various threads, you just can't create new ones, or kill existing ones.

/ Jonas

Jonas Sicking

Apr 2, 2009, 5:14:22 PM

Nevermind, apparently we changed that.

/ Jonas

Johnny Stenback

Apr 2, 2009, 6:11:18 PM
to Jason Duell, dev-pl...@lists.mozilla.org
Jason Duell wrote:
>> Would it be feasible to
>> make atom creation only lock when a new atom needs to be created as
>> opposed to a pre-existing atom getting retrieved from the hashtable? So
>> that reads would be concurrent but upon needing to write to the
>> hashtable, threads would block on a lock writing one by one? Can
>> hashtable buckets be appended as a single pointer write so that readers
>> don't need to be locked against the hashtable being in an inconsistent
>> state during a write (that doesn't resize and rehash the whole
>> hashtable)?
>
> Yes. Assuming we're talking about a standard hashtable (hash into an
> array of pointers to linked lists), and we're only adding (never
> deleting) atoms (right?), you can make it so that readers never need
> to perform an expensive memory flush or atomic operation (well, not on
> most architectures: see below):

We do unfortunately delete atoms. But doing that is pretty rare, and the
deletion can be however slow it needs to be.

--
jst

Jason Duell

Apr 2, 2009, 8:09:41 PM
> We do unfortunately delete atoms. But doing that is pretty rare, and the
> deletion can be however slow it needs to be.

Then you're pretty much stuck with expensive atomic refcounting, no?

I don't know how often we use atoms, or why we switched from having
them be threadsafe and non-deletable, but if they're used often,
making them non-deletable might be a reasonable speed/size
tradeoff.

Boris Zbarsky

Apr 2, 2009, 8:16:58 PM
Jason Duell wrote:
> I don't know how often we use atoms, or why we switched from having
> them be threadsafe and non-deletable

They were never non-deletable.

Or rather, some were deletable, and some were not. That's still the case.

> but if they're used often,
> making them non-deletable might save be a reasonable speed/size
> tradeoff.

The often-used ones are in fact non-deletable. The deletable atoms are
only those that are not in any of our static atom tables.

-Boris

Henri Sivonen

Apr 3, 2009, 5:25:38 AM
In article <98mdnZCVqrz5gkjU...@mozilla.org>,

Robert O'Callahan <rob...@ocallahan.org> wrote:

> On 3/4/09 12:21 AM, Mike Shaver wrote:
> > On Thu, Apr 2, 2009 at 8:11 AM, Henri Sivonen<hsiv...@iki.fi> wrote:
> >> But most often the atom already exists, right? Would it be feasible to
> >> make atom creation only lock when a new atom needs to be created as
> >> opposed to a pre-existing atom getting retrieved from the hashtable?
> >
> > You mean to just fetch and if NULL is found, take a lock, recheck, and
> > add the new atom if necessary?

This was what I meant. With the assumption that the hashtable state that
the readers see is behind a single pointer change so that it's
impossible to see partial writes. (And that CPUs don't see memory writes
from other CPUs out of order.)

> > I want to believe that could be made
> > safe with respect to other racing threads, especially if all mutations
> > are done in a copy of the table/bucket and then updated per below. I
> > don't always get what I want, though! Chris Jones around?
>
> That is basically "double-checked locking".
>
> For correctness, it requires a memory barrier before the fetch.
> Otherwise consider a situation where CPU 1 creates an atom and stores a
> pointer in the hashtable. CPU 2 can see the write that fills in the
> hashtable entry happen *before* the writes that initialize the atom, so
> it sees an invalid atom.

Do you mean that CPU 1 has actually initialized the atom but the cache
on CPU 2 doesn't know?

> >> Could resizing and rehashing be done into a copy of the
> >> hashtable so that readers can read the old one during the rehashing?
> >
> > Possible using read-copy-update patterns, I expect, which would also
> > be useful for other common-read/rare-write cases like the various
> > component manager tables.
>
> Cliff Click at Azul had a nice presentation about a massively parallel
> hashtable like that.

Without having seen the presentation, I wonder how the thread that
updates the hashtable knows that all the readers are gone from the old
copy so that the old copy can be deallocated.

Henri Sivonen

Apr 3, 2009, 5:42:49 AM
In article
<fafca822-4cb5-414a...@r33g2000yqn.googlegroups.com>,
Jason Duell <jduell...@gmail.com> wrote:

> > Would it be feasible to
> > make atom creation only lock when a new atom needs to be created as
> > opposed to a pre-existing atom getting retrieved from the hashtable? So
> > that reads would be concurrent but upon needing to write to the
> > hashtable, threads would block on a lock writing one by one? Can
> > hashtable buckets be appended as a single pointer write so that readers
> > don't need to be locked against the hashtable being in an inconsistent
> > state during a write (that doesn't resize and rehash the whole
> > hashtable)?
>
> Yes. Assuming we're talking about a standard hashtable (hash into an
> array of pointers to linked lists),

This is what I assumed. I don't know what the atom table internals are
actually like.

> and we're only adding (never deleting) atoms (right?),

As was pointed out, atoms that didn't exist at app startup can also go
away before app shutdown.

> You can do this entirely without locks if there's only one writer at a
> time, but it sounds like we've got concurrency there. So...

Indeed, the expectation is that there can be threads attempting
concurrent writes.

> The way we handled this at my last project was with macros that
> expanded into the correct arch-specific instructions (or to nothing
> for read_flush on most arch's). If you use pthreads everywhere, this
> is all handled for you, but then you'd have to have readers locking
> and unlocking the list, which is exactly what you want to avoid. I'm
> pretty sure there's no standard read/write "memory flush" call in
> pthreads or elsewhere in the C/Posix standards.

I wonder if NSPR already abstracts lower-level operations than pthreads
does. At first glance, NSPR currently seems to be higher-level than
this.
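
For illustration, the arch-specific macros Jason describes correspond to
what C++11 later standardized as release/acquire ordering. A hedged
sketch of the publish/observe pattern those barriers exist for (names
invented; std::atomic was not available to 2009-era Gecko):

```cpp
#include <atomic>
#include <cassert>

struct Atom { int refcount; const char* str; };

std::atomic<Atom*> gSlot{nullptr};

// Writer: fully initialize the atom, then publish the pointer with
// release ordering so the initializing stores become visible first.
void PublishAtom(Atom* a) {
  gSlot.store(a, std::memory_order_release);
}

// Reader: acquire ordering pairs with the release store, so if we see
// the pointer we also see the initialized fields behind it. On x86 the
// acquire side compiles to a plain load, matching Jason's remark that
// read_flush expands to nothing on most architectures.
Atom* ObserveAtom() {
  return gSlot.load(std::memory_order_acquire);
}
```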

> Can't you just make the hashtable index array big enough
> that you don't need to worry about re-hashing?

Perhaps. I don't know.

> How often do you need to add atoms?

The HTML5 parser needs to add atoms relatively seldom, since all the
standard (apart from data-* attributes) and common non-standard
identifiers are static atoms. Furthermore, the HTML5 parser doesn't need
to look up static atoms from the shared atom table at all, because it
already has its own read-only mechanisms for getting pointers to the
static atoms it needs.

I don't know about the other parts of the app, though.

Henri Sivonen

Apr 3, 2009, 7:32:49 AM
In article <hsivonen-690FA3...@news.mozilla.org>,
Henri Sivonen <hsiv...@iki.fi> wrote:

> Furthermore, the HTML5 parser doesn't need
> to look up static atoms from the shared atom table at all, because it
> already has its own read-only mechanisms for getting pointers to the
> static atoms it needs.

Oops. This isn't true. It is possible that another part of the app has
registered a static atom for a string that the HTML5 parser doesn't know
about, so if such a string is used as an element or attribute name, the
HTML5 parser calls NS_NewAtom() with a string that resolves to a static
atom.

Anyway, it seems that as far as locking the hash table goes, there's an
opportunity to have two hashtables: one completely unlocked for static
atoms and another write-locked for dynamic atoms (with lookups always
hitting the static table first).
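
A minimal sketch of that two-table split, assuming the static table is
frozen before any worker thread starts (all names invented; the real
Gecko atom table is nothing like std::unordered_map):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

struct Atom { std::string str; };

class AtomTable {
 public:
  // Static atoms are registered once, before any other thread runs,
  // so lookups in mStatic need no lock thereafter.
  void RegisterStatic(Atom* a) { mStatic[a->str] = a; }

  Atom* GetAtom(const std::string& s) {
    auto it = mStatic.find(s);  // unlocked: the table is frozen
    if (it != mStatic.end()) return it->second;

    std::lock_guard<std::mutex> lock(mLock);  // writers serialize
    auto& slot = mDynamic[s];
    if (!slot) slot = new Atom{s};
    return slot;
  }

 private:
  std::unordered_map<std::string, Atom*> mStatic;   // frozen after startup
  std::unordered_map<std::string, Atom*> mDynamic;  // guarded by mLock
  std::mutex mLock;
};
```

This version still takes the lock on every dynamic lookup, hit or miss;
avoiding that for hits needs the double-checked fast path and barriers
discussed elsewhere in the thread.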

Neil

Apr 3, 2009, 6:37:42 AM
Johnny Stenback wrote:

> We do unfortunately delete atoms. But doing that is pretty rare, and
> the deletion can be however slow it needs to be.

Deletion sounds very scary to me; what if one of the readers is trying
to look up the atom that you're trying to delete? Assuming that's a
solvable problem, is it worth making the hashtable own and periodically
expire unused atoms?

Robert O'Callahan

Apr 3, 2009, 7:57:32 AM
On 3/4/09 10:25 PM, Henri Sivonen wrote:
> In article<98mdnZCVqrz5gkjU...@mozilla.org>,
> Robert O'Callahan<rob...@ocallahan.org> wrote:
>
>> On 3/4/09 12:21 AM, Mike Shaver wrote:
>>> On Thu, Apr 2, 2009 at 8:11 AM, Henri Sivonen<hsiv...@iki.fi> wrote:
>>>> But most often the atom already exists, right? Would it be feasible to
>>>> make atom creation only lock when a new atom needs to be created as
>>>> opposed to a pre-existing atom getting retrieved from the hashtable?
>>> You mean to just fetch and if NULL is found, take a lock, recheck, and
>>> add the new atom if necessary?
>
> This was what I meant. With the assumption that the hashtable state that
> the readers see is behind a single pointer change so that it's
> impossible to see partial writes. (And that CPUs don't see memory writes
> from other CPUs out of order.)

This is a bad assumption in general, partly due to the memory system and
partly due to optimizations that compilers might perform.

>>> I want to believe that could be made
>>> safe with respect to other racing threads, especially if all mutations
>>> are done in a copy of the table/bucket and then updated per below. I
>>> don't always get what I want, though! Chris Jones around?
>> That is basically "double-checked locking".
>>
>> For correctness, it requires a memory barrier before the fetch.
>> Otherwise consider a situation where CPU 1 creates an atom and stores a
>> pointer in the hashtable. CPU 2 can see the write that fills in the
>> hashtable entry happen *before* the writes that initialize the atom, so
>> it sees an invalid atom.
>
> Do you mean that CPU 1 has actually initialized the atom but the cache
> on CPU 2 doesn't know?

Yes.

>>>> Could resizing and rehashing be done into a copy of the
>>>> hashtable so that readers can read the old one during the rehashing?
>>> Possible using read-copy-update patterns, I expect, which would also
>>> be useful for other common-read/rare-write cases like the various
>>> component manager tables.
>> Cliff Click at Azul had a nice presentation about a massively parallel
>> hashtable like that.
>
> Without having seen the presentation, I wonder how the thread that
> updates the hashtable knows that all the readers are gone from the old
> copy so that the old copy can be deallocated.

I think Cliff Click's version relied on garbage collection.
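
Without a garbage collector, one hedged alternative is deferred
reclamation: publish the new table with a single release store and park
the old one on a retired list, to be freed only at a known-quiescent
point (e.g. shutdown, or a moment when no reader threads exist). A rough
sketch, with illustrative names rather than Gecko code:

```cpp
#include <atomic>
#include <cassert>
#include <vector>

struct Table { std::vector<void*> buckets; };

std::atomic<Table*> gTable{new Table{std::vector<void*>(8, nullptr)}};
std::vector<Table*> gRetired;  // freed later, at a known-quiescent point

// Writer (assumed to already hold the write lock): build the bigger
// table, publish it with one release store so readers see either the
// old or the new table but never a partial one, and retire the old
// table instead of deleting it while readers may still be traversing.
void Resize() {
  Table* oldT = gTable.load(std::memory_order_relaxed);
  Table* newT =
      new Table{std::vector<void*>(oldT->buckets.size() * 2, nullptr)};
  // ... rehash entries from oldT into newT ...
  gTable.store(newT, std::memory_order_release);
  gRetired.push_back(oldT);  // the hard part: when is this safe to free?
}
```

The retired list dodges rather than answers the question of when the
last reader is gone; Cliff Click's design gets that answer for free
from the garbage collector.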

Rob

Robert O'Callahan

Apr 3, 2009, 7:58:28 AM
On 3/4/09 1:09 PM, Jason Duell wrote:
> I don't know how often we use atoms, or why we switched from having
> them be threadsafe and non-deletable, but if they're used often,
> making them non-deletable might be a reasonable speed/size tradeoff.

I think we atomize all attribute values shorter than a certain length;
failing to delete those atoms would mean leaks.

Rob
