temporal locking of strings via C API

Eric Wong

Apr 4, 2013, 11:00:53 PM
to rubini...@googlegroups.com
Hi all, MRI has rb_str_locktmp/rb_str_unlocktmp functions for preventing
several threads from writing to the same string. io.c in the MRI source
has examples of their use.

Is there something similar in Rubinius?

If not, I will probably hash the VALUE to a fixed set of mutexes (maybe
#locks will be scaled to processor count) and use pthread_mutex_trylock
to emulate that behavior...
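
A rough sketch of the striped-mutex idea, written in plain Ruby rather than C so it runs on any implementation. Mutex#try_lock stands in for pthread_mutex_trylock, and the object's identity stands in for the VALUE. StripedLocks and its methods are hypothetical names used only for illustration, and the stripe count of 8 is arbitrary.

```ruby
# Sketch only: StripedLocks is a hypothetical helper, not an MRI or Rubinius API.
class StripedLocks
  def initialize(stripes = 8)
    @locks = Array.new(stripes) { Mutex.new }
  end

  # Map an object to one of the fixed mutexes by identity.
  # Integer#hash spreads the ids; Ruby's % keeps the index non-negative.
  def lock_for(obj)
    @locks[obj.object_id.hash % @locks.size]
  end

  # Non-blocking: true if the stripe was acquired, false if it is held.
  def try_lock(obj)
    lock_for(obj).try_lock
  end

  # Must be called by the thread that acquired the stripe.
  def unlock(obj)
    lock_for(obj).unlock
  end
end

locks = StripedLocks.new
buffer = String.new

first_attempt = locks.try_lock(buffer)                         # => true
second_attempt = Thread.new { locks.try_lock(buffer) }.value   # => false
locks.unlock(buffer)
```

Nothing here waits: a caller that gets false would raise or retry, which is the behaviour being emulated.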

Dirkjan Bussink

Apr 6, 2013, 8:40:33 AM
to rubini...@googlegroups.com
Hi Eric,

No, there isn't something similar. The overhead of this would actually be non-trivial in Rubinius, since we have actual concurrent threads, so an implementation similar to MRI's would suffer from all kinds of race conditions in Rubinius. There is no guarantee whatsoever that the lock state you read will still hold by the time you actually read or write something. So the only solution is to actually lock the entire string for every modifying operation.

This would also be necessary for every operation that happens when the string is not "locked", so it would mean a general performance impact for all string operations. This is necessary so no locking happens while another thread is modifying something, a problem MRI does not have because of the GIL. We don't want to add synchronization across every string operation, so there is nothing that works like this and probably never will be.

How is the string shared for this problem? If it's some output string you share, the only way to do it is to lock it. In Rubinius you can actually use every object as a lock, much like in Java:

Rubinius.synchronize(string) do
  # read from or modify the string here while holding its lock
end

This functionality is not (yet) exposed to the C-API, but it might be an idea to expose it there too, so that extensions can use it.

--
Dirkjan Bussink

Eric Wong

Apr 6, 2013, 2:58:50 PM
to rubini...@googlegroups.com
Dirkjan Bussink <d.bu...@gmail.com> wrote:
> No, there isn't something similar. The overhead of this would actually
> be non-trivial in Rubinius, since we have actual concurrent threads,
> so an implementation similar to MRI's would suffer from all kinds of race
> conditions in Rubinius. There is no guarantee whatsoever that the lock
> state you read will still hold by the time you actually read or write
> something. So the only solution is to actually lock the entire string
> for every modifying operation.
>
> This would also be necessary for every operation that happens when the
> string is not "locked", so it would mean a general performance impact
> for all string operations. This is necessary so no locking happens
> while another thread is modifying something, a problem MRI does not
> have because of the GIL. We don't want to add synchronization across
> every string operation, so there is nothing that works like this and
> probably never will be.

Good to know, I hate unnecessary synchronization :)

Just curious, will using shared strings improperly raise a Ruby
exception, or will it just segfault the VM? I wouldn't even mind the
latter.

> How is the string shared for this problem? If it's some output string
> you share, the only way to do it is to lock it. In Rubinius you can
> actually use every object as a lock, much like in Java:

AFAIK, MRI only uses rb_str_locktmp/rb_str_unlocktmp to detect bugs
where strings are accidentally shared.

MRI tests a per-string bit and raises immediately if the string is
already locked by another thread. There's no waiting at all. Since
this bit is guarded by the GVL, no additional memory barrier is needed
in MRI.

If Rubinius were to implement it, it'd still need the per-string bit
flag and would have to use atomic test/set operations. No waiting, but
there'd need to be memory barriers at least.

> Eric Wong wrote:
> > If not, I will probably hash the VALUE to a fixed set of mutexes (maybe
> > #locks will be scaled to processor count) and use pthread_mutex_trylock
> > to emulate that behavior...

On second thought, that approach would not work as-is: two unrelated
strings can hash to the same mutex, so it would generate false positives
without a per-string bit.
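
To make the false positive concrete, here is a small Ruby sketch under assumed names (STRIPES, LOCK_TABLE and stripe_index are illustrative only). With a fixed table of mutexes, any STRIPES + 1 distinct strings must contain two that share a mutex, so a trylock on the second one fails even though nobody is using that string.

```ruby
# Sketch only: a fixed table of mutexes indexed by object identity.
STRIPES = 4
LOCK_TABLE = Array.new(STRIPES) { Mutex.new }

def stripe_index(obj)
  obj.object_id.hash % STRIPES
end

# Pigeonhole: among STRIPES + 1 distinct strings, two must share a stripe.
strings = Array.new(STRIPES + 1) { |i| "buffer #{i}" }
a, b = strings.group_by { |s| stripe_index(s) }.values.find { |group| group.size > 1 }

a_locked = LOCK_TABLE[stripe_index(a)].try_lock   # => true, a's stripe was free
b_locked = LOCK_TABLE[stripe_index(b)].try_lock   # => false, although b itself is unused
LOCK_TABLE[stripe_index(a)].unlock
```

A per-string bit avoids this because the flag belongs to exactly one string.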

Maybe something like this:

# will raise if lock bit is already set
Rubinius.synchronize(string) { string.set_lock_bit! }

io.read(8192, string)

# will raise if lock bit was not set
Rubinius.synchronize(string) { string.clear_lock_bit! }
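
The same idea can be sketched in portable Ruby. This is an emulation under stated assumptions, not MRI's or Rubinius's implementation: TmpLock is a hypothetical module, an identity-keyed Hash stands in for the per-string bit, and one global Mutex stands in for the atomic test/set. As with the lock bit, a caller never waits for the string itself; it gets an exception immediately.

```ruby
require "stringio"

# Sketch only: TmpLock is a hypothetical helper, not an MRI or Rubinius API.
module TmpLock
  GUARD = Mutex.new                  # makes test-and-set on the table atomic
  LOCKED = {}.compare_by_identity    # stands in for the per-string lock bit

  # Set the bit, or raise at once if it is already set.
  def self.lock(str)
    GUARD.synchronize do
      raise "string is already locked" if LOCKED.key?(str)
      LOCKED[str] = true
    end
    str
  end

  # Clear the bit, or raise if it was not set.
  def self.unlock(str)
    GUARD.synchronize do
      raise "string is not locked" unless LOCKED.delete(str)
    end
    str
  end
end

io = StringIO.new("payload")
buffer = String.new

TmpLock.lock(buffer)
begin
  io.read(8192, buffer)    # fills buffer while it is marked as in use
ensure
  TmpLock.unlock(buffer)
end
```

The table is keyed by identity, so two different strings with equal contents do not interfere with each other.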

Dirkjan Bussink

Apr 8, 2013, 10:27:43 AM
to rubini...@googlegroups.com

On Apr 6, 2013, at 20:58, Eric Wong <normal...@yhbt.net> wrote:

> Good to know, I hate unnecessary synchronization :)
>
> Just curious, will using shared strings improperly raise a Ruby
> exception, or will it just segfault the VM? I wouldn't even mind the
> latter.

Most probably you'd just see mixed-up string data. If you're able to crash the VM, we actually consider that to be a bug. Regardless of races / concurrency issues, we don't want people to be able to crash the VM. So it could raise an exception, lose data, duplicate data, whatever, but no crashes.

If you're able to crash it in this way, please let us know because that means we should fix that :).


> AFAIK, MRI only uses rb_str_locktmp/rb_str_unlocktmp to detect bugs
> where strings are accidentally shared.
>
> MRI tests a per-string bit and raises immediately if the string is
> already locked by another thread. There's no waiting at all. Since
> this bit is guarded by the GVL, no additional memory barrier is needed
> in MRI.
>
> If Rubinius were to implement it, it'd still need the per-string bit
> flag and would have to use atomic test/set operations. No waiting, but
> there'd need to be memory barriers at least.

Actually this is exactly what happens when an object is only locked by a single thread and there is no contention. We use CAS to set a locked flag together with the thread id and only inflate to a full mutex when there is contention on the lock.

This means locking around non-contended resources is a lot cheaper, but it's still not free of course.


> Maybe something like this:
>
> # will raise if lock bit is already set
> Rubinius.synchronize(string) { string.set_lock_bit! }
>
> io.read(8192, string)
>
> # will raise if lock bit was not set
> Rubinius.synchronize(string) { string.clear_lock_bit! }

Like I mentioned, if the string is not often contended, the easiest is just to lock the string itself, since that already uses a cheaper CAS locking mechanism which only inflates to a full mutex when the lock is contended. You can also use Rubinius.try_lock, which will try to lock and return whether it succeeded.

So I would just write this code like this:

Rubinius.synchronize(string) { io.read(8192, string) }

This will be at least as fast as your example, since your example has the same locking overhead plus the overhead of setting and clearing the lock bits.
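
For readers without Rubinius at hand, a rough portable analogue of the pattern above is sketched below. It is only an approximation: Rubinius.synchronize locks the object itself, whereas this sketch pairs the string with a separate Mutex. GuardedBuffer and try_synchronize are hypothetical names, and the non-blocking variant is built on Mutex#try_lock rather than on Rubinius.try_lock.

```ruby
require "stringio"

# Sketch only: GuardedBuffer is a hypothetical wrapper, not a Rubinius API.
class GuardedBuffer
  attr_reader :string

  def initialize
    @string = String.new
    @lock = Mutex.new
  end

  # Blocking form: wait for the lock, then yield the string.
  def synchronize
    @lock.synchronize { yield @string }
  end

  # Non-blocking form: returns false at once if another thread holds the lock,
  # true after the block has run.
  def try_synchronize
    return false unless @lock.try_lock
    begin
      yield @string
      true
    ensure
      @lock.unlock
    end
  end
end

io = StringIO.new("hello world")
buffer = GuardedBuffer.new
buffer.synchronize { |str| io.read(8192, str) }
```

The non-blocking form suits the bug-detection use case from earlier in the thread: a false return means the buffer is being shared when it should not be.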

--
Dirkjan