Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

An idea on fast multithreaded refcounted strings

0 views
Skip to first unread message

Sergey P. Derevyago

unread,
Jan 28, 2002, 6:31:39 AM1/28/02
to
Andrei Alexandrescu wrote:
> An idea just occured to me.
>
> Every threading system offers a "threadid()" system call through which you
> figure out in which thread you currently are in. Usually the thread id is an
> integer.
Actually, the things are much more complicated. In particular, POSIX has

pthread_t pthread_self(void);

function, where pthread_t is _not_ an integer (in particular, you can take a
look at
http://groups.google.com/groups?selm=tSgw7.80%24RL6.390%40news.cpqcorp.net)
We can obtain pthread_t with pthread_self() and compare with pthread_equal(),
but we don't have something like pthread_before() to create a map of
pthread_t.

> I was thinking, an effective refcounted strings implementation would be:
>
> 1. In all of string's constructors, save the current thread id in a member
> variable of the string.
>
> 2. When copying the string, compare the creator's thread id saved at point 1
> with the current thread id. If the thread ids are the same, increment the
> reference count. Otherwise (and here's the whole point), lock the string,
> perform a deep copy of it, and unlock it.
>
> The MT implementations of refcounted strings that I know of insist on *not*
> copying the string by using atomic counter operations etc. I think it's very
> effective to give up and do a deep copy upon first interthread access.
IMHO it's not guaranteed that a simple string copying is more expensive than
pthread_self() and pthread_equal() calls, so we can't speak about the
"effective strings implementation".

> This way:
>
> * In single-threaded applications, the cost of MT support is an extra data
> member and an extra integer comparison during copy construction.
>
> * In multithreaded applications, only strings shared between threads are not
> refcounted. All others will be nicely shared appropriately.
>
> Is there some mistake I'm making?
Probably, yes :)
--
With all respect, Sergey. http://cpp3.virtualave.net/
mailto : ders at skeptik.net

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

David Schwartz

unread,
Jan 28, 2002, 10:51:33 AM1/28/02
to

> > 2. When copying the string, compare the creator's thread id saved at point 1
> > with the current thread id. If the thread ids are the same, increment the
> > reference count. Otherwise (and here's the whole point), lock the string,
> > perform a deep copy of it, and unlock it.

This sounds impossible to me. If the thread that used it last didn't
lock it, what good is locking it in this thread going to do?

DS

Alexander Terekhov

unread,
Jan 28, 2002, 2:12:32 PM1/28/02
to
David Schwartz <dav...@webmaster.com> wrote in message news:<3C557385...@webmaster.com>...

ummm... I do not think that he really meant to lock the string
to perform a deep copy of it. The shared string is most likely
meant to be immutable (const) with respect to all threads (while
being shared, or read/write access fully synchronized on some
higher level). I guess that what he wants to achieve is that
copies of such string(s) in *other* (non-owner) threads would
NOT need synchronization being private to those other (non-owner)
threads... thread_shared_string/thread_private_string might
work better...

regards,
alexander.

David Schwartz

unread,
Jan 28, 2002, 5:31:59 PM1/28/02
to
Alexander Terekhov wrote:

> ummm... I do not think that he really meant to lock the string
> to perform a deep copy of it. The shared string is most likely
> meant to be immutable (const) with respect to all threads (while
> being shared, or read/write access fully synchronized on some
> higher level). I guess that what he wants to achieve is that
> copies of such string(s) in *other* (non-owner) threads would
> NOT need synchronization being private to those other (non-owner)
> threads... thread_shared_string/thread_private_string might
> work better...

It seems to me that you simply keep pushing the problem somewhere else
and it never goes away. So now, before we can mutate a string, we need
to make sure no other threads might be deep copying it. So we need a
table of threads that are or might be using the string. How do we lock
that?

I went through a brief period of time when I was addicted to lockless
algorithms and thought that locks were bad because they took time. Long
painful experience has taught me that this is just not the right way to
go. You do the lock, you hold it for as little time as possible, and you
keep in mind that locks that don't block or unblock threads are nearly
free.

I'm just tired of code that works perfectly on uniprocessor machines
and breaks subtly on multiprocessor. Or code that works fine on x86 but
blow up on a multi-processor R10K machine.

DS

Alexander Terekhov

unread,
Jan 28, 2002, 6:47:38 PM1/28/02
to
"Sergey P. Derevyago" <non-ex...@iobox.com> wrote in message
news:<3C552998...@iobox.com>...

> Andrei Alexandrescu wrote:
> > An idea just occured to me.
> >
> > Every threading system offers a "threadid()" system call through which you
> > figure out in which thread you currently are in. Usually the thread id is
an
> > integer.
> Actually, the things are much more complicated. In particular, POSIX
has
>
> pthread_t pthread_self(void);
>
> function, where pthread_t is _not_ an integer (in particular, you can take a
> look at
> http://groups.google.com/groups?selm=tSgw7.80%24RL6.390%40news.cpqcorp.net)
> We can obtain pthread_t with pthread_self() and compare with
pthread_equal(),
> but we don't have something like pthread_before() to create a map of
> pthread_t.

ummm... I think that something along the lines of
"thread_ptr" COULD work nicely for Andrei's strings:

http://groups.yahoo.com/group/boost/message/23741
(Pardon me for my poor English and late-night writting)

regards,
alexander.
(Recently degraded to "Moderated" status on boost.org for
the time being until "the tone of discussion improves")

Andrei Alexandrescu

unread,
Jan 28, 2002, 6:50:19 PM1/28/02
to
"Sergey P. Derevyago" <non-ex...@iobox.com> wrote in message
news:3C552998...@iobox.com...
> We can obtain pthread_t with pthread_self() and compare with
pthread_equal(),
> but we don't have something like pthread_before() to create a map of
> pthread_t.

I see. That's not really complicated. (You don't need an ordering on
threads, just to figure out whether you are in the same thread or
anther one).

> IMHO it's not guaranteed that a simple string copying is
more expensive than
> pthread_self() and pthread_equal() calls, so we can't speak about
the
> "effective strings implementation".

But I was talking about savings in other directions. My major gripe
about MT refcounted strings is that ST applications have to pay for
them. The scheme I am talking about would make that cost negligible.
Then, you didn't mention the savings of all refcount atomic increment
and decrement. As Herb Sutter has shown, they are significant.

> > Is there some mistake I'm making?
> Probably, yes :)

Waiting :o).


Andrei

---------------------------------
Check out THE C++ Seminar: 3 Days with 5 Experts
http://www.gotw.ca/cpp_seminar/

Alexander Terekhov

unread,
Jan 29, 2002, 3:11:25 AM1/29/02
to
David Schwartz <dav...@webmaster.com> wrote in message news:<3C55D15F...@webmaster.com>...

> Alexander Terekhov wrote:
>
> > ummm... I do not think that he really meant to lock the string
> > to perform a deep copy of it. The shared string is most likely
> > meant to be immutable (const) with respect to all threads (while
> > being shared, or read/write access fully synchronized on some
> > higher level). I guess that what he wants to achieve is that
> > copies of such string(s) in *other* (non-owner) threads would
> > NOT need synchronization being private to those other (non-owner)
> > threads... thread_shared_string/thread_private_string might
> > work better...
>
> It seems to me that you simply keep pushing the problem somewhere else
> and it never goes away. So now, before we can mutate a string, we need
> to make sure no other threads might be deep copying it. So we need a
> table of threads that are or might be using the string. How do we lock
> that?

Well, as I've indicated I did not fully capture the idea
of "thread-owned" strings myself. But what I meant with
thread_shared_string/thread_private_string is that the
thread_shared_string is supposed to be either non-ref.counted
or ref.counted-fully-synchronized. On the other hand,
a thread_private_string COULD be ref.counted *without*
need for any synchronization (its private/non-shared).

>
> I went through a brief period of time when I was addicted to lockless
> algorithms and thought that locks were bad because they took time. Long
> painful experience has taught me that this is just not the right way to
> go. You do the lock, you hold it for as little time as possible, and you
> keep in mind that locks that don't block or unblock threads are nearly
> free.

Not sure that I am really "addicted to lockless algorithms"...
other than strongly believing that a portable interface for
fast/non-blocking/lockless ref.counting would be really a nice
thing to have:

http://groups.google.com/groups?as_umsgid=3BC8463D.6E23002A%40web.de

Do you see any problems with it, David?

regards,
alexander.

Sergey P. Derevyago

unread,
Jan 29, 2002, 5:21:16 AM1/29/02
to
{WARNING: topic drift. This thread has drifted from C++ issues, please
remove clc++m from follow-ups. -mod/fwg}

1861235868Andrei Alexandrescu wrote:
> > We can obtain pthread_t with pthread_self() and compare with
> > pthread_equal(), but we don't have something like pthread_before() to
> > create a map of pthread_t.
> I see. That's not really complicated. (You don't need an ordering on
> threads, just to figure out whether you are in the same thread or
> anther one).

Sorry, I failed to put the pthread_t ordering issue into <BTW> tag :)

The point is:
1. pthread_t is not an integer.
2. pthread_ts must be compared using pthread_equal() function.
So the obtaining and comparing of pthread_t may not be quite reasonable for


the "effective strings implementation".

--
With all respect, Sergey. http://cpp3.virtualave.net/
mailto : ders at skeptik.net

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

C Nick Beaudrot

unread,
Jan 29, 2002, 5:29:15 PM1/29/02
to
No followup-to set yet, as we tread the murky waters in between threads
and C++.

On 28 Jan 2002 18:50:19 -0500, Andrei Alexandrescu
<andre...@hotmail.com> wrote:
: [snip]
:
: But I was talking about savings in other directions. My major gripe


: about MT refcounted strings is that ST applications have to pay for
: them.

Why doesn't
#ifndef _REENTRANT
// threadunsafe implementation
#else
// threadsafe
#endif
solve this problem? This seems like a perfectly legitimate use of
#ifdef. If you hate the preprocessor (like I do), and you love policy
classes so much, I'm sure you can cook up some ThreadingPolicy to select
between the two :-).

The STL implementation of std::string distributed with Sun Workshop 6.x
includes such ifdefs for whether or not threading is enabled/disabled.
you get different versions of the C++ runtime library depending on
whether or not you specify that the library or executable you are
building is multithreaded.

If you're going to distribute a shared library of some sort, you will
now have to distribute two versions, one that's threadsafe and one
that's not. This is a minor pain, I'll admit. But it's at least
manageable.


: The scheme I am talking about would make that cost negligible.


: Then, you didn't mention the savings of all refcount atomic increment
: and decrement. As Herb Sutter has shown, they are significant.

Sutter's tests were run on a Pentium 2. More modern processors (such as
the P3 or the Ultra-Sparc 3) have significantly less overhead for atomic
instructions.

Sutter's tests are also not const correct. This results in some
"performance degradation", though it might be bogus to speak of
performance degradation in such a test :-).

If you make Sutter's test const correct, and compile and run them on a
P3, the difference in performance between unsafe and save COW is
somewhere between 30 and 40%. But it's *still* faster than an
implementation that uses a deep copy every time, unless all you do is
iterate through your strings character by character.

Cheers,
Nick (long live the atomic cow)

--
Don't inflict upon me an idiotic belief of a God who doesn't answer the
prayers of sick children, at least in a tangible way that we understand.
Who doesn't intervene apparently in cases of war or poverty and famine.
But hey Mr self-absorbed NFL wide receiver, you need the first down on
third-and-seven? God's right there, he's got you covered.
-Bob Costas

David Bradley

unread,
Mar 4, 2002, 3:50:43 PM3/4/02
to
Andrei Alexandrescu wrote:

> them. The scheme I am talking about would make that cost negligible.
> Then, you didn't mention the savings of all refcount atomic increment
> and decrement. As Herb Sutter has shown, they are significant.

The "negligible" depends on the cost of obtaining the thread "identity".

David Bradley

unread,
Mar 4, 2002, 3:51:25 PM3/4/02
to
C Nick Beaudrot wrote:

> Why doesn't
> #ifndef _REENTRANT
> // threadunsafe implementation
> #else
> // threadsafe
> #endif
> solve this problem?


Even in a MT application, 95% of the strings may never cross thread
boundaries. So you're paying the price needlessly.

0 new messages