On Saturday, 25 February 2017 02:40:16 UTC+2, Alf P. Steinbach wrote:
> On 25.02.2017 01:01, Öö Tiib wrote:
> > On Saturday, 25 February 2017 01:30:23 UTC+2, Alf P. Steinbach wrote:
> >> On 24.02.2017 23:59, Öö Tiib wrote:
> >>> Most programs written nowadays are multi-threaded and so CoW
> >>> is not too good an idea most of the time.
> >>
> >> Uhm, thread safety is an orthogonal issue. Adding illusory thread safety
> >> to a string class is going to cost, no matter the internal implementation.
> >>
> >> I write “illusory” because if one shares a mutable string object between
> >> threads, it would be impractical to lock down the object any time its
> >> `operator[]` returns a reference that possibly could be stored by the
> >> thread and used to modify the string at arbitrary times.
> >
> > Thread safety is not orthogonal when we assume that particular data can
> > be accessed only by one thread. That is "embarrassingly parallel".
> > For example, if a task needs a string then it is copied into the input
> > data of the task and then the task can be given to another thread to solve.
> > However, if the copy was lazy (with CoW) then we now have either the race
> > conditions of naive CoW or the inefficiency of non-naive CoW.
>
> To be clear, the situation you envision is that in thread B there is
>
> const Cow_string sb = ...
>
> initialized as a copy of thread A's
>
> Cow_string sa = ...
>
Roughly, yes. That sb is most likely a non-const member of some
object (the task's input data) and may later be copied by B into some
other non-const string, but in essence that is what I meant. Both sa and
sb are first owned by A, then ownership of sb is transferred to B. CoW,
however, keeps that shared ownership alive behind the scenes.
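
To make the scenario concrete, here is a minimal sketch of the kind of
CoW string I have in mind. Everything in it is an assumption made for
illustration: the Buffer layout, shares_buffer_with() and set_char() are
hypothetical names, not any real library's implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical minimal CoW string; all names are illustrative only.
class Cow_string {
    struct Buffer {
        std::atomic<std::size_t> refs{1};  // shared by every logical copy
        std::string chars;
        explicit Buffer(std::string s) : chars(std::move(s)) {}
    };
    Buffer* buf_;

    void release() noexcept {
        // acq_rel so the last owner sees earlier accesses before deleting.
        if (buf_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete buf_;
        }
    }

    // Runs on every write access: the synchronized "just in case" check.
    void ensure_unique() {
        if (buf_->refs.load(std::memory_order_acquire) > 1) {
            Buffer* fresh = new Buffer(buf_->chars);  // the real copy
            release();
            buf_ = fresh;
        }
    }

public:
    explicit Cow_string(std::string s) : buf_(new Buffer(std::move(s))) {}

    // Copying is lazy: only the shared counter is touched.
    Cow_string(const Cow_string& other) noexcept : buf_(other.buf_) {
        buf_->refs.fetch_add(1, std::memory_order_relaxed);
    }

    Cow_string& operator=(Cow_string other) noexcept {
        std::swap(buf_, other.buf_);
        return *this;
    }

    ~Cow_string() { release(); }

    // True while two logical copies still point at one buffer.
    bool shares_buffer_with(const Cow_string& other) const noexcept {
        return buf_ == other.buf_;
    }

    const std::string& str() const noexcept { return buf_->chars; }

    void set_char(std::size_t i, char c) {
        ensure_unique();
        buf_->chars.at(i) = c;
    }
};
```

After `Cow_string sb = sa;` both objects point at one Buffer, so handing
sb to thread B does not end the sharing. It ends only when one side
writes, and that write has to touch the counter that the other thread
may be touching too.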
> Then when thread A modifies its `sa` it will have to stop the shared
> buffer ownership, which modifies some reference counter or whatever
> that's shared with `sb`, and might be concurrently accessed by B.
Yes, and all I'm saying is that there is an additional internal synchronized
check of the reference count on every write access, just in case, because
figuring out from outside which two strings may share content is
non-trivial. That makes such a string slow compared to a string that does
not do that but instead has the small string optimization (strings of up
to 22 bytes have one level of indirection less).
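
A rough way to observe the small string optimization is to check whether
the character data lies inside the string object itself. This is only a
sketch: stored_inline() is a hypothetical helper, and the size limit for
inline storage differs between standard library implementations.

```cpp
#include <functional>
#include <string>

// Reports whether the character data lives inside the string object
// itself rather than in a separately allocated buffer.
inline bool stored_inline(const std::string& s) {
    const char* first = reinterpret_cast<const char*>(&s);
    const char* last = first + sizeof(s);
    std::less<const char*> before;  // total order even for unrelated pointers
    return !before(s.data(), first) && before(s.data(), last);
}
```

On an implementation with the small string optimization I would expect
`stored_inline(std::string("abc"))` to be true, while a 1000-character
string cannot fit inside the object and so reports false.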
>
> But this is only problem with an incorrect COW implementation, because
> the cost of ensuring atomicity of the reference counting is minuscule,
> completely insignificant, compared to the cost of copying the data,
> which is the other part of that un-sharing-of-buffer operation.
How recently have you profiled that? In my tests the lock-free atomic
thingies seem more expensive than good old mutexes on multiprocessor systems.
Somewhere in the space of copy frequency, write frequency and object size
there is a surface beyond which CoW starts to win: frequent copies of and
rare writes into large objects. However, making lots of copies of large
objects can likely be reduced by other means than making the copy lazy.
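
One such other means, as a sketch only: move the string into the task
instead of copying it at all. Task and make_task() are hypothetical
names I made up for the example.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical task type: it owns its input outright, no hidden sharing.
struct Task {
    std::string input;
    std::size_t run() const { return input.size(); }  // stand-in for real work
};

// Thread A gives up its string instead of copying it. For a heap-allocated
// string the move typically hands over the buffer without copying characters.
inline Task make_task(std::string&& text) {
    return Task{std::move(text)};
}
```

The resulting Task can then be handed to another thread, and no
reference count stays behind for A and B to contend on.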
>
> >> The standard library's approach of most classes being thread-agnostic is
> >> fine, I think. It's the responsibility of the code that uses those
> >> classes, to ensure thread safe operation. E.g. by simply not sharing
> >> mutable objects between threads.
> >
> > We can't if the CoW is done behind scenes. Then that means locks at
> > wrong level of granularity inside of string AND locks outside as well
> > if we really plan to use same string from multiple threads concurrently.
>
> I think we can.
>
> Yes it involves atomic operations or locking for the reference counting.
> That's IMHO at the right level. It doesn't make the class generally
> thread safe, which I see as an impractical goal (and on the way to
> discovering how impractical, introducing awesome inefficiency), but it
> makes it just thread safe enough to allow simple logical copying over to
> another thread, without having to explicitly clone the object or check
> its reference count – which is of course another option, to just be
> completely explicit instead of relying on implicit, but then in a very
> non-ideal way “leaking implementation details”.
You are correct that synchronized access to the reference counts at the
low level does not make the class thread safe. It is still mandatory for
CoW, even if we have external locks as well.
I do not understand what scenario you meant by leaking details.
If something is thread-safe then that is not an implementation detail but
an important feature.