[Sbcl-help] Hash table thread-safety

313 views
Skip to first unread message

Victor Kryukov

unread,
Feb 18, 2008, 9:57:12 PM2/18/08
to sbcl...@lists.sourceforge.net
Hello,

I'm a bit confused about which hash tables operations are thread-safe
in SBCL - I know about locks, but I want to avoid them as much as
possible due to efficiency reasons. I'd appreciate if anybody would
tell me whether the following statements are true or false for the
latest x86-64 SBCL on the multi-core CPU. Even simple yes/no would be
fine; if you could provide some brief reasoning/implementation details
- that's even better. Feel free to refer me to any external sources of
information, of course.

1/ Multiple threads can safely read from the same hash table.

2/ Multiple threads can write to the same hash table. If two different
threads attempt to overwrite the same key simultaneously, the
resulting value (first or second one) is not specified, but it will be
one of the values provided, and no other problems would occur.

3/ Multiple threads can read and write to the same hash table. If one
thread attempts to read a value while another thread attempts to write
a new value, the value that reader gets (old or new) is not specified,
but the new value would be written, and no other problems would occur.

4/ Same as 2,3 above if one thread removes a key and another tries to
write/read it.

5/ Last, but not the least - I'd like to understand the behaviour of
maphash when other threads write to the hash-table. Is it true that
maphash is guaranteed about looping over all the _initial_ hash
elements, while behaviour for new added values is not specified?

6/ Same as above if maphash removes or updates some values while
walking along the hash table.

Regards,
Victor.

--
http://macrodefinition.blogspot.com


-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Sbcl-help mailing list
Sbcl...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sbcl-help

Nikodemus Siivola

unread,
Feb 19, 2008, 2:27:14 AM2/19/08
to Victor Kryukov, sbcl...@lists.sourceforge.net
On 2/19/08, Victor Kryukov <victor....@gmail.com> wrote:

First off:

CL-USER> (documentation 'make-hash-table 'function)
"Create and return a new hash table. The keywords are as follows:
:TEST -- Indicates what kind of test to use.
:SIZE -- A hint as to how many elements will be put in this hash
table.
:REHASH-SIZE -- Indicates how to expand the table when it fills up.
If an integer, add space for that many elements. If a floating
point number (which must be greater than 1.0), multiply the size
by that amount.
:REHASH-THRESHOLD -- Indicates how dense the table can become before
forcing a rehash. Can be any positive number <=1, with density
approaching zero as the threshold approaches 0. Density 1 means an
average of one entry per bucket.
:WEAKNESS -- If NIL (the default) it is a normal non-weak hash table.
If one of :KEY, :VALUE, :KEY-AND-VALUE, :KEY-OR-VALUE it is a weak
hash table.
Depending on the type of weakness the lack of references to the
key and the value may allow for removal of the entry. If WEAKNESS
is :KEY and the key would otherwise be garbage the entry is eligible
for removal from the hash table. Similarly, if WEAKNESS is :VALUE
the life of an entry depends on its value's references. If WEAKNESS
is :KEY-AND-VALUE and either the key or the value would otherwise be
garbage the entry can be removed. If WEAKNESS is :KEY-OR-VALUE and
both the key and the value would otherwise be garbage the entry can
be removed.
:SYNCHRONIZED -- If NIL (the default), the hash-table may have
multiple concurrent readers, but results are undefined if a
thread writes to the hash-table concurrently with another
reader or writer. If T, all concurrent accesses are safe, but
note that CLHS 3.6 (Traversal Rules and Side Effects) remains
in force. See also: SB-EXT:WITH-LOCKED-HASH-TABLE. This keyword
argument is experimental, and may change incompatibly or be
removed in the future."

CL-USER> (documentation 'sb-ext:with-locked-hash-table 'function)
"Limits concurrent accesses to HASH-TABLE for the duration of BODY.
If HASH-TABLE is synchronized, BODY will execute with exclusive
ownership of the table. If HASH-TABLE is not synchronized, BODY will
execute with other WITH-LOCKED-HASH-TABLE bodies excluded -- exclusion
of hash-table accesses not surrounded by WITH-LOCKED-HASH-TABLE is
unspecified."

> 1/ Multiple threads can safely read from the same hash table.

Yes, assuming there are no writers.

> 2/ Multiple threads can write to the same hash table. If two different
> threads attempt to overwrite the same key simultaneously, the
> resulting value (first or second one) is not specified, but it will be
> one of the values provided, and no other problems would occur.

No. Multiple unlocked writers are unsafe -- at the very least you can
expect type-errors. We try fairly hard not to corrupt your whole lisp
image, but the hash-table as a whole may be trashed.

> 3/ Multiple threads can read and write to the same hash table. If one
> thread attempts to read a value while another thread attempts to write
> a new value, the value that reader gets (old or new) is not specified,
> but the new value would be written, and no other problems would occur.

No.

> 4/ Same as 2,3 above if one thread removes a key and another tries to
> write/read it.

No.

> 5/ Last, but not the least - I'd like to understand the behaviour of
> maphash when other threads write to the hash-table. Is it true that
> maphash is guaranteed about looping over all the _initial_ hash
> elements, while behaviour for new added values is not specified?

No.

CL-USER> (documentation 'maphash 'function)
"For each entry in HASH-TABLE, call the designated two-argument function on
the key and value of the entry. Return NIL.

Consequences are undefined if HASH-TABLE is mutated during the call to
MAPHASH, except for changing or removing elements corresponding to the
current key. The applies to all threads, not just the current one --
even for synchronized hash-tables. If the table may be mutated by
another thread during iteration, use eg. SB-EXT:WITH-LOCKED-HASH-TABLE
to protect the MAPHASH call."

> 6/ Same as above if maphash removes or updates some values while
> walking along the hash table.

Same as above.

If you have tricky parallelism requirements, I recommend you look into
various lockless algorithms and SB-EXT:COMPARE-AND-SWAP.

Cheers,

-- Nikodemus

Victor Kryukov

unread,
Feb 19, 2008, 2:57:37 AM2/19/08
to sbcl...@lists.sourceforge.net
"Nikodemus Siivola" <niko...@random-state.net> writes:

> On 2/19/08, Victor Kryukov <victor....@gmail.com> wrote:
>
> First off:
>
> CL-USER> (documentation 'make-hash-table 'function)

> CL-USER> (documentation 'sb-ext:with-locked-hash-table 'function)

Thanks Nikodemus! I was also wondering whether SBCL support weak hash
tables, and was thinking the answer is no, because they are not
mentioned in the SBCL manual and Google is not your friend, either[1],
but apparently the answer was right in front of my nose.

BTW, are weak hash tables considered stable at this point?

>> 1/ Multiple threads can safely read from the same hash table.
>
> Yes, assuming there are no writers.
>
>> 2/ Multiple threads can write to the same hash table. If two different
>> threads attempt to overwrite the same key simultaneously, the
>> resulting value (first or second one) is not specified, but it will be
>> one of the values provided, and no other problems would occur.
>
> No. Multiple unlocked writers are unsafe -- at the very least you can
> expect type-errors. We try fairly hard not to corrupt your whole lisp
> image, but the hash-table as a whole may be trashed.

Is it also true if ":synchronized t" is set? Otherwise I don't really
understand how to parse "If T, all concurrent accesses are safe, but
note that CLHS 3.6 remains in force.", as CLHS 3.6 only talks about
hash-table traversal, not ordinary access.

In general, do you have any rough idea how much performance is
affected if a) ":SYNCHRONIZED T" is used, b) WITH-LOCKED-HASH-TABLE is
used around the code that changes hash, and c) an explicit mutex is
used to control access to the hash table? 'cause if the impact on
performance is not huge, I'd probably live with a) or b). I assume
you've created b) because c) was not fast enough?

> If you have tricky parallelism requirements, I recommend you look into
> various lockless algorithms and SB-EXT:COMPARE-AND-SWAP.

I'll look into them.

Thanks a lot for your fast and helpful response!

Regards,
Victor.

[1] http://www.google.com/search?q=sbcl+weak+hash+table

--
http://macrodefinition.blogspot.com

Nikodemus Siivola

unread,
Feb 19, 2008, 4:12:49 AM2/19/08
to Victor Kryukov, sbcl...@lists.sourceforge.net
On 2/19/08, Victor Kryukov <victor....@gmail.com> wrote:

> BTW, are weak hash tables considered stable at this point?

Yes, but I'm not sure how hard they have been pounded at by the world
at large. They are definitely expected to work, though.

> >> 1/ Multiple threads can safely read from the same hash table.
> >
> > Yes, assuming there are no writers.
> >
> >> 2/ Multiple threads can write to the same hash table. If two different
> >> threads attempt to overwrite the same key simultaneously, the
> >> resulting value (first or second one) is not specified, but it will be
> >> one of the values provided, and no other problems would occur.
> >
> > No. Multiple unlocked writers are unsafe -- at the very least you can
> > expect type-errors. We try fairly hard not to corrupt your whole lisp
> > image, but the hash-table as a whole may be trashed.
>
> Is it also true if ":synchronized t" is set? Otherwise I don't really
> understand how to parse "If T, all concurrent accesses are safe, but
> note that CLHS 3.6 remains in force.", as CLHS 3.6 only talks about
> hash-table traversal, not ordinary access.

No. If the hash-table is synchronized, concurrent readers and writers
are safe -- just not really concurrent. The documentation string tries
to say that modifications during traversal are not safe, except for
the cases specifically allowed by the CLHS. (The reason the keyword is
:synchronized, not :locked, is that if we at some point decide to
implement lockless hash-tables, the descriptive value of the keyword
argument remains the same.)

> In general, do you have any rough idea how much performance is
> affected if a) ":SYNCHRONIZED T" is used, b) WITH-LOCKED-HASH-TABLE is
> used around the code that changes hash, and c) an explicit mutex is
> used to control access to the hash table? 'cause if the impact on
> performance is not huge, I'd probably live with a) or b). I assume
> you've created b) because c) was not fast enough?

I sense a misunderstanding:

If you have concurrent readers and writers, it is not enough to stick
WITH-LOCKED-HASH-TABLE around writers but not readers.

At any rate, A and B are pretty much the same. C is currently
considerably more expensive unless the hash-table lock is heavily
contested, in which case it becomes cheaper -- the internal lock is an
unfair spinlock.

The reason we have A is that it is simple to use, and doesn't require
modifying multiple call-sites for safety. (You may still need to
modify _correctness_, though -- see below.)

The reason we have B is that it can use the internal lock which all
hash-tables have anyway, so that

(defvar *table* (make-hash-table :synchronized t))

(defun ensure-foo (key)
(let ((table *table*))
(with-locked-hash-table (table)
(or (gethash key table)
(setf (gethash key table) (make-foo key))))))

(defun get-foo (key)
(gethash key *table*))

doesn't need to acquire more then a single lock (also easier to audit
then having an external lock.) This is also an exemplar of the kind of
code that needs the WITH-LOCKED-HASH-TABLE for correctness even in the
presence of :SYNCHRONIZED T.

Cheers,

-- Nikodemus

Victor Kryukov

unread,
Feb 19, 2008, 1:23:40 PM2/19/08
to sbcl...@lists.sourceforge.net
"Nikodemus Siivola" <niko...@random-state.net> writes:

Thanks Nikodemus - it's clear now.

Reply all
Reply to author
Forward
0 new messages