
Lock Free Programming with Boost::shared_ptr instead of Hazard Pointers


Roshan Naik

Mar 19, 2005, 5:47:23 AM
After attending Andrei's talk at SDWest yesterday on Lock Free Prog (a
pretty good talk by the way), it left me wondering... Do we really need
the whole hazard-pointer-list based pseudo garbage collection? The
talk seemed to suggest that the purpose of the technique was to
collect/free unused pointers. So why can't boost::shared_ptr be put to
task here?

I tried to think of a few reasons... but they didn't seem good enough...

1) One might say that shared_ptr does not solve the problem of cyclic
references. But I think the same problem exists anyway with the raw
pointers used within the hazard pointers technique too.

2) Shared_ptr is not thread safe?? I don't know. But it is possible that
it is currently not implemented to perform its operations in an
atomic/thread-safe fashion. But perhaps that can be fixed (with help
from the future standard: volatile/CAS). But of course, the hazard
pointers technique also relies on similar requirements from the standard.

3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it
however.

-Roshan

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Joe Seigh

Mar 19, 2005, 8:51:28 PM
On 19 Mar 2005 05:47:23 -0500, Roshan Naik <rosha...@yahoo.com> wrote:

> After attending Andrei's talk at SDWest yesterday on Lock Free Prog (a
> pretty good talk by the way), it left me wondering... Do we really need
> the whole hazard-pointer-list based pseudo garbage collection? The
> talk seemed to suggest that the purpose of the technique was to
> collect/free unused pointers. So why can't boost::shared_ptr be put to
> task here?
>
> I tried to think of a few reasons... but they didn't seem good enough...
>
> 1) One might say that shared_ptr does not solve the problem of cyclic
> references. But I think the same problem exists anyway with the raw
> pointers used within the hazard pointers technique too.

The cyclic references problem for reference counting is a bit overstated.
When was the last time you actually ran into that problem? You either
have to be doing heavy graph-theoretical programming or deliberately
not coding explicit dtors in collection classes.

Hazard pointers aren't shared, so you don't see them as part of a data
structure. Any cyclic reference problems have to be solved by conventional
means.

>
> 2) Shared_ptr is not thread safe?? I don't know. But it is possible that
> it is currently not implemented to perform its operations in an
> atomic/thread-safe fashion. But perhaps that can be fixed (with help
> from the future standard: volatile/CAS). But of course, the hazard
> pointers technique also relies on similar requirements from the standard.

I believe there's a parameter to make it mostly thread-safe. The
non-thread-safe part is that you have to know the refcount of any
shared_ptr you copy won't go to zero during the copy operation.
Basically, you have to "own" the shared_ptr you're copying.

I did an experimental atomically thread-safe refcounted ptr which can be
found at http://atomic-ptr-plus.sourceforge.net/ . It's "not C++
as we know it, Jim", but just C++ being used for pointer abstraction
which you can't do in C. I'm not a C++ programmer. The project is
on hold until processors get multi-cored enough that lock-free gets
taken more seriously. Assuming that all the useful applications of
lock-free aren't locked up in patents by then. Anything involving
lock-free is considered novel by the patent office. I was going to do
a write up on how to do reader lock-free implementations of any data
structure, not just linked lists or simple objects, provided certain
conditions are met, until I realized it would be instructions for
creating lock-free applications to patent.

>
> 3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it
> however.

It's possible. Thread-safe isn't the same as lock-free. It's not
a problem unless you need lock-free or async safety.

Getting back to the hazard pointers vs. refcounting issue. I recently
ported atomic_ptr to PowerPC and ran the testcase I had, which also
tested an implementation of RCU. The testcase has reader threads
traversing a linked list while writer threads update the list.
With thread scheduling on Mac OS X changed to SCHED_RR and certain
preemption settings, while the mutex version ran about 900 reads or
writes per second per thread, the RCU version ran about 100000 reads
per second per thread and about 300+ writes per second per thread.
Even considering scheduling artifacts from running on a single cpu,
that's still pretty impressive. The atomic_ptr version only did
about 25000 reads per second per thread.

--
Joe Seigh

Alexander Terekhov

Mar 19, 2005, 9:00:05 PM

Roshan Naik wrote:
[...]

> 3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it
> however.

Boost::shared_ptr can be implemented without locks. But it provides
basic thread-safety, not strong. Hazard pointers stuff is about
lockless strong thread-safety, not basic. Another difference is that
hazard pointers stuff needs hazardously expensive store-load barrier
(not cheap even on IA32).

regards,
alexander.

Andrei Alexandrescu (See Website for Email)

Mar 19, 2005, 9:00:54 PM
Roshan Naik wrote:
> After attending Andrei's talk at SDWest yesterday on Lock Free Prog (a
> pretty good talk by the way), it left me wondering... Do we really need
> the whole hazard-pointer-list based pseudo garbage collection? The
> talk seemed to suggest that the purpose of the technique was to
> collect/free unused pointers. So why can't boost::shared_ptr be put to
> task here?
>
> I tried to think of a few reasons... but they didn't seem good enough...
>
> 1) One might say that shared_ptr does not solve the problem of cyclic
> references. But I think the same problem exists anyway with the raw
> pointers used within the hazard pointers technique too.
>
> 2) Shared_ptr is not thread safe?? I don't know. But it is possible that
> it is currently not implemented to perform its operations in an
> atomic/thread-safe fashion. But perhaps that can be fixed (with help
> from the future standard: volatile/CAS). But of course, the hazard
> pointers technique also relies on similar requirements from the standard.
>
> 3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it
> however.

4 and definitive) You can't CAS an entire smart pointer object. On
systems that support CAS2 (not all!), you can CAS two adjacent words
(a pointer and a pointer to the reference count next to it) but, as
the CUJ Oct/Dec articles by Maged Michael and myself explain, that
leaves only the reading lock-free, while the writing is locked by the
readers.

By the way, the sheer thought that boost::shared_ptr uses interlocked
operations whether you need that or not makes me feel uncomfortable
performance-wise.


Andrei

Jeremy J

Mar 19, 2005, 9:02:10 PM
> 3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it however.

From boost/detail/shared_count.hpp:

void add_ref_copy()
{
#if defined(BOOST_HAS_THREADS)
    mutex_type::scoped_lock lock(mtx_);
#endif
    ++use_count_;
}

It's because of this code that I use Loki::SmartPtr instead of shared_ptr.

Jeremy Jurksztowicz

Joshua Lehrer

Mar 20, 2005, 4:02:38 AM

Roshan Naik wrote:
> 3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it
> however.


shared_ptr<> requires an OS-specific way of atomically incrementing and
decrementing a value, while simultaneously providing the new (or old)
value.

For OSes that do not have atomic_increment and atomic_decrement, this
requires a lock.

For example, here is some pseudo-code:

shared_ptr<>::~shared_ptr<>() {
    if (0 == atomic_decrement(count))
        delete ptr;
}

... where atomic_decrement returns the new value.

boost wraps up the fact that the increment and decrement are OS
specific in a class, which, IIRC, is called shared_count<>.

So, to answer your question, shared_ptr<> uses locks, or other OS
specific techniques, in order to atomically increment and decrement the
shared count.

joshua lehrer
factset research systems
NYSE:FDS

Eric Niebler

Mar 20, 2005, 4:04:37 AM
Jeremy J wrote:
>>3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it however.
>
>
> From boost/detail/shared_count.hpp:
>
> void add_ref_copy()
> {
> #if defined(BOOST_HAS_THREADS)
>     mutex_type::scoped_lock lock(mtx_);
> #endif
>     ++use_count_;
> }
>
> It's because of this code that I use Loki::SmartPtr instead of shared_ptr.
>

Two days ago, Peter Dimov made shared_ptr lock-free on Windows, with
other platforms to follow in the coming days. Check the boost CVS for
the latest. You can read Peter's announcement here:

http://aspn.activestate.com/ASPN/Mail/Message/boost/2524769


--
Eric Niebler
Boost Consulting
www.boost-consulting.com

Andrei Alexandrescu (See Website For Email)

Mar 20, 2005, 4:11:09 AM
Jeremy J wrote:
>>3) Andrei felt that boost::shared_ptr _may_ be using locks. I doubt it however.
>
>
> From boost/detail/shared_count.hpp:
>
> void add_ref_copy()
> {
> #if defined(BOOST_HAS_THREADS)
>     mutex_type::scoped_lock lock(mtx_);
> #endif
>     ++use_count_;
> }
>
> It's because of this code that I use Loki::SmartPtr instead of shared_ptr.

Peter Dimov's post as of two days ago suggests that this problem has
been fixed on Windows. Other known SP designs that cater for
multithreading have used atomic increment and decrement since time
immemorial, so I'm not sure what caused the delay.

Anyhow, for some the same preference is determined by shared_ptr's
compulsory presence of the deleter, whether or not you have a use for
it; for others it's the compulsory weak_ptr support, whether or not you
need that, etc.

I encourage you and others interested in adding a policy-based smart
pointer to boost (which in turn might influence seeing a policy-based
smart pointer in the next C++ standard), to stay tuned on boost's
mailing list. A policy-based smart pointer implemented by David B. Held
and Jonathan Turkanis is about to be reviewed. You may want to
participate in the review process.


Andrei

Roshan Naik

Mar 20, 2005, 6:28:41 PM
> Boost::shared_ptr can be implemented without locks. But it provides
> basic thread-safety, not strong. Hazard pointers stuff is about
> lockless strong thread-safety, not basic. Another difference is that
> hazard pointers stuff needs hazardously expensive store-load barrier
> (not cheap even on IA32).

I looked for definitions of basic/strong thread safety but couldn't find
anything concrete. Could you please define them or provide links?


Anyway, let me rephrase the question slightly and move away from
boost::shared_ptr... and focus on the underlying intent of my question.

Can we create a smart pointer that effectively replaces the hazard
pointers stuff? And if yes, what properties would it need to have?
Seems like strong thread safety would be one requirement from your above
statement.


On a side note... the one thing about hazard pointers that makes me
(only) slightly fussy is that the destruction of objects is lazy and
the order of destruction is not really deterministic. C++ programmers
essentially lose control over the order in which objects should be freed.

-Roshan

Joe Seigh

Mar 20, 2005, 6:40:21 PM
On 19 Mar 2005 20:51:28 -0500, Joe Seigh <jsei...@xemaps.com> wrote:

A correction here


>
> Getting back to the hazard pointers vs. refcounting issue.
> ... The atomic_ptr version only did
> about 25000 reads per second per thread.

That was an atomic_ptr proxy collector scheme. A linked list
with atomic_ptr linking each element of the list would have
run much slower, slower even than the mutex-based version, though
it still would have been lock-free.

There isn't any one best scheme. All the schemes have trade-offs.
It all depends on your particular situation.

Andrei Alexandrescu (See Website For Email)

Mar 20, 2005, 6:43:23 PM
Joshua Lehrer wrote:
> shared_ptr<> requires an OS-specific way of atomically incrementing and
> decrementing a value, while simultaneously providing the new (or old)
> value.
>
> For OSes that do not have atomic_increment and atomic_decrement, this
> requires a lock.

What OSs (I think you mean threading systems) would those be?

Andrei

Jonathan Turkanis

Mar 20, 2005, 6:52:25 PM
Andrei Alexandrescu (See Website For Email) wrote:

Hi Andrei,

> I encourage you and others interested in adding a policy-based smart
> pointer to boost (which in turn might influence seeing a policy-based
> smart pointer in the next C++ standard), to stay tuned on boost's
> mailing list. A policy-based smart pointer implemented by David B.
> Held and Jonathan Turkanis

Thanks for the credit, but I'd say "implemented by Andrei Alexandrescu and David
B. Held (and others)". My contribution was minuscule. ;-)

> is about to be reviewed. You may want to
> participate to the review process.

Jonathan

Peter Dimov

Mar 20, 2005, 6:50:13 PM
"Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<423D309D...@moderncppdesign.com>...

>
> Peter Dimov's post as of two days ago suggests that this problem has
> been fixed on Windows. Other known SP designs that cater for
> multithreading used atomic increment and decrement since time
> immemorial, so I'm not sure why this delay.

shared_ptr is unique in that in the general case it needs to maintain
two reference counts. It took some time for Alexander Terekhov to
discover a lock-free algorithm. The algorithm is overhead-free when
shared_ptr is used as an ordinary strong smart pointer (i.e. it only
touches one count in this case). The delay in incorporating his
algorithm in the Boost code base is entirely my fault, of course.

It should also be noted that many other SP implementations ignore the
need for the atomic decrement to be a memory barrier. Without a
barrier, the destructor may not see the latest object state. This
makes lock-free reference counting hard to implement portably. On
non-Windows platforms we're pretty much limited to g++'s
__exchange_and_add as a portable primitive, which is an implementation
detail of libstdc++ and nobody knows for sure what memory
synchronization semantics it's supposed to have.

> Anyhow, for others the same preference is determined by shared_ptr's
> compulsory presence of the deleter, whether or not you have a use for
> it;

Fixed.

> for others it's the compulsory weak_ptr support, whether or not you
> need that,

Cost: one word. (Aside: you need that. You just don't know it yet.)

> etc.

Cost: two words. ;-)

Andrei Alexandrescu (See Website For Email)

Mar 21, 2005, 2:00:36 AM
Roshan Naik wrote:
>>Boost::shared_ptr can be implemented without locks. But it provides
>>basic thread-safety, not strong. Hazard pointers stuff is about
>>lockless strong thread-safety, not basic. Another difference is that
>>hazard pointers stuff needs hazardously expensive store-load barrier
>>(not cheap even on IA32).
>
> I looked for definitions of basic/strong thread safety but couldn't find
> anything concrete. Could you please define them or provide links?
>
>
> Anyway, let me rephrase the question slightly and move away from
> boost::shared_ptr... and focus on the underlying intent of my question.
>
> Can we create a smart pointer that effectively replaces the hazard
> pointers stuff? And if yes, what properties would it need to have?
> Seems like strong thread safety would be one requirement from your above
> statement.
>
>
> On a side note... the one thing about hazard pointers that makes me
> (only) slightly fussy is that the destruction of objects is lazy and
> the order of destruction is not really deterministic. C++ programmers
> essentially lose control over the order in which objects should be freed.

But that should make you glad, not fussy. That pointers are not deleted
immediately is fundamental, and it is exactly the feature that gives you
the joy of lock-free operation and wait-free retiring: reader and writer
threads are humming along undisturbed, and as a natural effect, some
reads will be in flight while writers need to do their job; whenever
you update a pointer, some readers might linger reading the obsoleted data.

Any attempt to introduce compulsory retiring would necessarily introduce
one thread waiting for another, a phenomenon known as... locking :o).


Andrei

Andrei Alexandrescu (See Website For Email)

Mar 21, 2005, 2:00:56 AM
Peter Dimov wrote:
> "Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<423D309D.4080700@moderncppdesign.com>...

> shared_ptr is unique in that in the general case it needs to maintain
> two reference counts.

What does "in the general case" mean? Does it, or does it not? I didn't
know shared_ptr is compile-time configurable in any way except the
#define'd multithreading aspect.

>>Anyhow, for others the same preference is determined by shared_ptr's
>>compulsory presence of the deleter, whether or not you have a use for
>>it;
>
> Fixed.

What does "fixed" mean? Is there space allocated for a deleter for each
smart pointer? Is there runtime overhead induced by the deleter?

>>for others it's the compulsory weak_ptr support, whether or not you
>>need that,
>
> Cost: one word. (Aside: you need that. You just don't know it yet.)

No. You don't need it. You just don't know it yet :o).

>>etc.
>
> Cost: two words. ;-)

So what's the total cost in size of shared_ptr, itemized as
sizeof(shared_ptr), plus the extra memory allocated per shared_ptr instance?


Andrei

David Abrahams

Mar 21, 2005, 2:03:44 AM
Roshan Naik <rosha...@yahoo.com> writes:

>> Boost::shared_ptr can be implemented without locks. But it provides
>> basic thread-safety, not strong. Hazard pointers stuff is about
>> lockless strong thread-safety, not basic. Another difference is that
>> hazard pointers stuff needs hazardously expensive store-load barrier
>> (not cheap even on IA32).
>
> I looked for definitions of basic/strong thread safety but couldn't find
> anything concrete. Could you please define them or provide links?

Yeah, I was wondering myself!

> Anyway, let me rephrase the question slightly and move away from
> boost::shared_ptr... and focus on the underlying intent of my question.
>
> Can we create a smart pointer that effectively replaces the hazard
> pointers stuff? And if yes, what properties would it need to have?

I think I know the answer: it would need to do all the hazard pointer
stuff underneath the covers. That is, upon construction with a
raw pointer, it would have to place the raw pointer in the "in use"
list, and upon destruction (or when the reference count goes to zero
in case it's counted), it would have to place the raw pointer on the
"done" list [I hope I remembered the procedure correctly ;-)].

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

David Abrahams

Mar 22, 2005, 6:42:47 PM
"Joe Seigh" <jsei...@xemaps.com> writes:

> On 21 Mar 2005 21:03:26 -0500, David Abrahams <da...@boost-consulting.com> wrote:
>
>> "Andrei Alexandrescu (See Website for Email)"
>> <SeeWebsit...@moderncppdesign.com> writes:
>>
>>> David Abrahams wrote:


>>>> Roshan Naik <rosha...@yahoo.com> writes:
>>>>
>>>>>
>>>>> Can we create a smart pointer that effectively replaces the hazard
>>>>> pointers stuff ? And if yes, what properties would it need to have ?
>>>>
>>>>
>>>> I think I know the answer: it would need to do all the hazard pointer
>>>> stuff underneath the covers. That is, upon construction with a
>>>> raw pointer, it would have to place the raw pointer in the "in use"
>>>> list, and upon destruction (or when the reference count goes to zero
>>>> in case it's counted), it would have to place the raw pointer on the
>>>> "done" list [I hope I remembered the procedure correctly ;-)].
>>>

>>> No. That would totally defeat the whole point of lock-free, which is to
>>> keep pointers "occupied" as little as possible, not as long as possible.
>>>
>>> Readers would need to use a smart pointer that adds the raw pointer to
>>> the hazard list in the constructor, and removes it off the hazard list
>>> in the destructor.
>>
>> I think that's what I meant. I just forgot the correct procedure ;-)
>> My point was that the smart pointers involved would have to do the
>> exact same procedure required to do "lock free" with raw pointers,
>> just under-the-covers.
>
> Hazard pointers can't be shared writeable pointers. They're
> strictly thread local. So you can't do what you're thinking of
> here.

?? I wasn't proposing to share them.

Alexander Terekhov

Mar 22, 2005, 5:44:49 PM

"Andrei Alexandrescu (See Website for Email)" wrote:
[...]
> I think it's very nice that you added terminology, but it would be even
> nicer if existing terminology could be used, if it exists. (As a
> criticising aside, not referring to you, too much terminology has been
> invented by people who refused to look up existing work.) So my question
> is, is there some existing terminology in threadese that you could use
> for your notions?

Sure. Roguewave.com's MT-0, MT-1 and MT-2. ;-)

http://www.google.de/groups?selm=3F799BC5.5DE6F778%40web.de

regards,
alexander.

Andrei Alexandrescu (See Website For Email)

Mar 22, 2005, 4:28:04 AM
Peter Dimov wrote:
> "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsit...@moderncppdesign.com> wrote in message
> news:<423E21E4...@moderncppdesign.com>...

>
>>Peter Dimov wrote:
>>
>>>"Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<423D309D.4080700@moderncppdesign.com>...
>>
>>>shared_ptr is unique in that in the general case it needs to maintain
>>>two reference counts.
>>
>>What does "in the general case" mean? Does it, or does it not? I didn't
>>know shared_ptr is compile-time configurable in any way except the
>>#define'd multithreading aspect.
>
>
> In this context, "in the general case" means "when weak pointers are
> used".

But there is space and time dedicated to the weak pointers' existence,
even when they aren't actually used, isn't there?

>>>>Anyhow, for others the same preference is determined by shared_ptr's
>>>>compulsory presence of the deleter, whether or not you have a use for
>>>>it;
>>>
>>>Fixed.
>>
>>What does "fixed" mean?
>

> "Fixed" means that as part of the other changes I eliminated the
> deleter overhead when deleters aren't used.

Ok, I need a thorough explanation here. My understanding is that there
is someplace, somewhere, something stored that tells you whether or not
a deleter is used, and if it's used, there is an indirect call. Both of
these are called "overhead" in my dictionary.

This is something that mildly annoyed me when reading in the past about
shared_ptr, including the proposal for standardization. There is no
place that clearly discusses the exact cost of each feature. There's a
lot of glossing over, a lot of sail and no anchor. I'd love to see a
clear, honest discussion: this incurs this space penalty and that
runtime penalty.

Otherwise, I can't build a model of the world. If I were to believe the
docs and the peremptory "Fixed" statements, I'd be like, "wow, they are
using some sheer magic there" and leave it at that. But it's not that.
It can't be. There is a dynamically-made decision whether you have a
deleter or not, and there is space allocated for the deleter, at least
if anything other than delete is used. If so, then there is an indirect
call for destruction, which is again overhead that could be eliminated
if the deleter is statically bound.

How does "fixed" go with that?

One of the unpleasant side effects of not discussing performance is that
a magic shared_ptr makes policy-based smart pointers look bad. I mean,
if shared_ptr somehow magically is the dessert topping and floor wax
that has no overhead, then why go through the hoops of defining a
policy-based smart pointer?

>>Is there space allocated for a deleter for each
>>smart pointer?
>

> No, if you don't use the deleter constructor.

Where is the bit that makes you decide whether or not you used the
deleter constructor?

>>Is there runtime overhead induced by the deleter?
>

> No, unless you use the deleter constructor. (I'm not sure that I
> understand your question properly, because deleters never had time
> costs, only space costs.)

Where is the indirection or the "if" statement that decides whether a
deleter constructor was used or not?

I can't understand. This is not quantum decoherence. Nobody can stand
there with a straight face and say there's no overhead. If it were that
way, everybody and their sister would do it that way. It's like
perpetual motion.

You must figure out dynamically during destruction whether a deleter was
used, and if it was used, there must be a dynamic (= indirect, slow)
call to the deleter. I want the truth. (I know, I can't handle the
truth. :o))

>>So what's the total cost in size of shared_ptr, itemized as
>>sizeof(shared_ptr), plus the extra memory allocated per shared_ptr instance?
>
>

> sizeof(shared_ptr) is two pointers.
>
> A default-constructed shared_ptr does not allocate anything.
>
> A shared_ptr constructed from a pointer allocates a control block that
> contains two pointers (one of them the vtable pointer) and two
> integers.

Aha. That's what a cop would call a good hint. What does the vtable take
care of? When is that pointer inspected? When is there an indirect call
through the vtable? (And by the way, a better design is a pointer to a
function which only asks for one indirection, not two.)

I really apologize for coming off this strong, but really, this whole
"shared_ptr does it all without overhead" thing can't go on forever.

David Abrahams

Mar 22, 2005, 6:34:51 PM
"Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> writes:

>>>So what's the total cost in size of shared_ptr, itemized as
>>>sizeof(shared_ptr), plus the extra memory allocated per shared_ptr instance?
>>
>>
>> sizeof(shared_ptr) is two pointers.
>>
>> A default-constructed shared_ptr does not allocate anything.
>>
>> A shared_ptr constructed from a pointer allocates a control block that
>> contains two pointers (one of them the vtable pointer) and two
>> integers.
>
> Aha. That's what a cop would call a good hint. What does the vtable take
> care of?

Destruction of the held resource.

> When is that pointer inspected?

The vtable pointer is never inspected. The other one is passed to a
virtual function in the vtable when use_count drops to zero.

> When is there an indirect call through the vtable?

When the use_count drops to zero.

> (And by the way, a better design is a pointer to a function which
> only asks for one indirection, not two.)

Yes, that's been pointed out to Peter long ago; he made some judgement
calls about which optimizations were worthwhile for the complexity
they add to the code. IIRC his feeling was that resource release is
generally so expensive that the extra indirection didn't matter. One
could also reduce the size of the code image somewhat by avoiding
virtual functions and generating a function pointer. But I think
we're talking about intrinsic overheads in this thread, so that's
really irrelevant.

> I really apologize for coming off this strong, but really, this whole
> "shared_ptr does it all without overhead" thing can't go on forever.

Well, nobody ever claimed it did. I think (at least in this thread)
Peter has been entirely up-front about overheads. He is
distinguishing storage for the custom deleter object itself -- which
can be optimized away completely when no deleter was specified -- from
the storage for the control block, which will always contain a vtable
in this design. It's easy to see why you might be confused by the
fact that the control block invokes the deleter, but to imply that
he's being dishonest is really inappropriate here. If you have
anything to apologize for, it would be that implication, and not
"coming on strong."

The fact is that having the "deleter-invoking" virtual function in the
control block yields a few benefits of its own, even if you leave out
the custom deleter capability and always have it doing "delete px":

- Immunity to destruction in the wrong heap
- Immunity to leaving out a virtual destructor on a base class
[anything else? I may have forgotten something]

If you analyze the design carefully, *those* things are what incur the
cost of that vtable, and not the custom deleter capability. Once
you've got the vtable, the custom deleter is just "low-hanging fruit."
I covered some of these things in my SD West talk.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Philipp Bachmann

Mar 22, 2005, 5:46:39 PM
> Strong thread safety means that concurrent accesses are safe (the
> observable behavior of concurrent accesses is as if they occurred one
> after another in an unspecified order.)

This is sometimes also called "serial equivalence", just to follow Andrei's
advice to refer to well-established terminology.

Cheers,
Philipp.

SenderX

Mar 22, 2005, 6:47:22 PM
> Is a "weighted reference count" the same as that described in [lins91c]
> on http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbibL.html ?

I'm not sure. Here is the algorithm:

http://cs.nyu.edu/goldberg/pubs/gold89.pdf

look at the generational reference count as well. I guess you could argue
that Joe's atomic pointer is a little bit similar.

Ronald Landheer-Cieslak

Mar 22, 2005, 6:21:50 PM
David Abrahams wrote:
> "Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> writes:
>>Any attempt to introduce compulsive retiring would necessarily introduce
>>one thread waiting for another, phenomenon known as... locking :o).
> Seems to me all you have to do is maintain the retired objects list in
> the right order by always pushing at the front, and then do the actual
> destruction by recursing all the way to the end of the list and
> destroying on the way back out of the recursion. Ah, well, I guess
> there's a potential issue if you have to clean up the retired list
> while an older retired list is already being processed. Can that
> happen?
What can happen is that some threads may still be accessing the data
your pointers point to while you're hummingly freeing the associated
memory, thus pleasantly invalidating whatever data they're reading and
happily making those threads crash. The sound of a crashing thread isn't
quite the same as the humming they do when you use hazard pointers
(which sounds more like "The Sound of Music" ;)

Hazard pointers make sure you cannot free memory that is still being
referenced, but without reference counts or other such GC mechanisms. It
is certainly not the best way to go in all cases and may sometimes be
slower than locking solutions (as Chris Thomasson can confirm for you
from his tests with Appcore), but it is certainly better than just
putting the pointers on a stack and freeing them with no regard for
other threads that may be accessing the data they point to.

rlc

Carl Barron

Mar 22, 2005, 6:17:08 PM
In article <423FA766...@moderncppdesign.com>, See Website For
Email <SeeWebsit...@moderncppdesign.com> wrote:

> But there is space and time dedicated to the weak pointers' existence,
> even when they aren't effectively used, isn't it?

Yes, for the release version, as I see no conditional compilations
to handle no weak_ptrs, and you would need runtime overhead to handle
the case otherwise, with a larger loss than just providing it, used or
not.

Off the cuff just skimming the headers for private: shows this,

Space is one extra counter and time is manipulating it, unless you can
implement a sharing pointer in fewer than two pointers and thread
control.

Alexander Terekhov

Mar 22, 2005, 5:48:49 PM3/22/05
to

"Andrei Alexandrescu (See Website For Email)" wrote:
[...]

> > In this context, "in the general case" means "when weak pointers are
> > used".
>
> But there is space and time dedicated to the weak pointers' existence,
> even when they aren't effectively used, isn't it?

Space for extra count and time to initialize it, a branch in release(),
and extra count read/check (count::may_not_store_min decrement()) for
final release().

class sp_counted_base {
    /* ... */
    typedef refcount<std::size_t, basic> count;
    count use_count_, self_count_;
    /* ... */
public:
    /* ... */
    sp_counted_base() : use_count_(1), self_count_(1) { }

    std::size_t use_count() const throw() {
        return use_count_.get();
    }

    void add_ref() throw() {
        use_count_.increment();
    }

    bool lock() throw() {
        return use_count_.increment_if_not_min();
    }

    void weak_add_ref() throw() {
        self_count_.increment();
    }

    void weak_release() throw() {
        if (!self_count_.decrement(msync::acq, count::may_not_store_min))
            destruct();
    }

    void release() throw() {
        if (!use_count_.decrement()) {
            dispose();
            if (!self_count_.decrement(msync::rel, count::may_not_store_min))
                destruct();
        }
    }

    /* ... */

};

http://www.terekhov.de/pthread_refcount_t/experimental/refcount.cpp

regards,
alexander.

David Abrahams

Mar 22, 2005, 6:27:24 PM3/22/05
to
"Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> writes:

> David Abrahams wrote:


>> "Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> writes:
>>
>>
>>>Any attempt to introduce compulsive retiring would necessarily introduce
>>>one thread waiting for another, phenomenon known as... locking :o).
>>

>> Seems to me all you have to do is maintain the retired objects list in
>> the right order by always pushing at the front, and then do the actual
>> destruction by recursing all the way to the end of the list and
>> destroying on the way back out of the recursion. Ah, well, I guess
>> there's a potential issue if you have to clean up the retired list
>> while an older retired list is already being processed. Can that
>> happen?
>

> I can't understand your point so as to in the context of mine above, or
> in the context of the hazard pointers algorithm. At any rate, to restate
> my point above, compulsive destruction

I'm not suggesting "compulsive destruction"; just what I thought the
OP asked for: compulsive destruction _order_, at least within a single
thread.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Andrei Alexandrescu (See Website for Email)

Mar 22, 2005, 4:05:58 AM3/22/05
to
Alexander Terekhov wrote:

> David Abrahams wrote:
>
>>Roshan Naik <rosha...@yahoo.com> writes:
>>
>>
>>>>Boost::shared_ptr can be implemented without locks. But it provides
>>>>basic thread-safety, not strong. Hazard pointers stuff is about
>>>>lockless strong thread-safety, not basic. Another difference is that
>>>>hazard pointers stuff needs hazardously expensive store-load barrier
>>>>(not cheap even on IA32).
>>>
>>>I looked for definitions of basic/strong thread safety but couldnt find
>>>anything concrete. Could you please define them or provide links ?
>>
>>Yeah, I was wondering myself!
>
>
> Just like with exception-safety, there are multiple "levels" of thread-
> safety. I call it "unsafe", "basic", and "strong." For details, see
>
> http://www.google.de/groups?selm=3ECD2B7D.BBC526FC%40web.de
> (Subject: Re: shared_ptr/weak_ptr and thread-safety)

I think it's very nice that you added terminology, but it would be even
nicer if existing terminology could be used, if it exists. (As a
criticising aside, not referring to you, too much terminology has been
invented by people who refused to look up existing work.) So my question
is, is there some existing terminology in threadese that you could use
for your notions?

Andrei

Larry Evans

Mar 22, 2005, 4:27:02 AM3/22/05
to
On 03/21/2005 08:03 PM, SenderX wrote:
[snip]
> it truly atomic. There is more too it than that. You need a differential or
> weighted reference count.

Is a "weighted reference count" the same as that described in [lins91c]
on http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbibL.html ?


Larry Evans

Mar 22, 2005, 6:22:54 PM3/22/05
to
On 03/20/2005 03:11 AM, Andrei Alexandrescu (See Website For Email) wrote:
[snip]

> Anyhow, for others the same preference is determined by shared_ptr's
> compulsory presence of the deleter, whether or not you have a use for
[snip]

Last time I looked, there was also the need for an extra pointer to
handle multiply-inherited pointees. IOW, there's one pointer exposed
to the user, and another for use by the deleter. This second one
is needed in case the first one doesn't point to the start of the
allocated object, IIUC.

[snip]

> mailing list. A policy-based smart pointer implemented by David B. Held
> and Jonathan Turkanis is about to be reviewed.

Hopefully, they've found a solution to the problem since, IIRC, one
of the policies is a conversion policy. For a pointer to a vector,
no-conversion would be needed; hence, no need for a 2nd pointer for
use by the deleter.

Joe Seigh

Mar 22, 2005, 6:10:12 PM3/22/05
to
On 21 Mar 2005 21:03:26 -0500, David Abrahams <da...@boost-consulting.com> wrote:

> "Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> writes:
>
>> David Abrahams wrote:
>>> Roshan Naik <rosha...@yahoo.com> writes:
>>>
>>>>

>>>> Can we create a smart pointer that effectively replaces the hazard
>>>> pointers stuff ? And if yes, what properties would it need to have ?
>>>
>>>
>>> I think I know the answer: it would need to do all the hazard pointer
>>> stuff underneath the covers. That is, upon construction with a
>>> raw pointer, it would have to place the raw pointer in the "in use"
>>> list, and upon destruction (or when the reference count goes to zero
>>> in case it's counted), it would have to place the raw pointer on the
>>> "done" list [I hope I remembered the procedure correctly ;-)].
>>

>> No. That would totally defeat the whole point of lock-free, which is to
>> keep pointers "occupied" as little as possible, not as long as possible.
>>
>> Readers would need to use a smart pointer that adds the raw pointer to
>> the hazard list in the constructor, and removes it off the hazard list
>> in the destructor.
>
> I think that's what I meant. I just forgot the correct procedure ;-)
> My point was that the smart pointers involved would have to do the
> exact same procedure required to do "lock free" with raw pointers,
> just under-the-covers.
>

Hazard pointers can't be shared writeable pointers. They're strictly
thread local. So you can't do what you're thinking of here.

Are you sure Maged or somebody isn't proposing that you use distributed
reference counting by queueing the refcount increment and decrement ops on
thread local queues to be reconciled by a separate thread? That would
explain the CAS2 comment about a ptr to the object and ptr to the refcount.
Hazard pointers would show up if you used them for the queue head to
handle refcount increment may be pending case. Though I'd try to think
of some other scheme to avoid that expensive store/load membar that hazard
pointers have.

--
Joe Seigh

Ronald Landheer-Cieslak

Mar 22, 2005, 6:23:43 PM3/22/05
to
Joe Seigh wrote:
> Roshan Naik wrote:
>>2) Shared_ptr is not thread safe ?? I dont know. But it is possible that
>> it is currently not implemented to perform its operations in an
>>atomic/thread-safe fashion. But perhaps that can be fixed (with help
>>from the future standard..volatile/CAS). But ofcourse, hazard pointers
>>technique also relies on the similar requirements from the standard.
> I believe there's a parameter to make it mostly threadsafe. The non
> thread-safe part is you have to know the refcount of any shared_ptr's
> you copy won't go to zero during the copy operation. Basically, you
> have to "own" the shared_ptr you're copying.
Which basically conforms to the "an object can't protect against its
own destruction" mantra. Doesn't that also hold true for the atomic_ptr?
(Haven't checked yet, so really don't know).
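[Editorial sketch: Joe's "you have to own the shared_ptr you're copying" rule can be made concrete, using today's std::shared_ptr for brevity. The comments, not the code, carry the argument; the function names are hypothetical.]

```cpp
#include <cassert>
#include <memory>

std::shared_ptr<int> global_sp = std::make_shared<int>(42);

// Fine under *basic* thread-safety: the thread copies from a shared_ptr
// it owns (its by-value parameter), so the refcount cannot reach zero
// in the middle of the copy.
int read_owned(std::shared_ptr<int> mine) {
    std::shared_ptr<int> local = mine;  // safe: 'mine' can't die under us
    return *local;
}

// Needs *strong* thread-safety if another thread may concurrently
// assign or reset global_sp: the copy races with the reset, and the
// pointee can be destroyed mid-copy.
int read_shared_location() {
    std::shared_ptr<int> local = global_sp;  // racy without extra protection
    return *local;
}
```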

rlc

Ronald Landheer-Cieslak

Mar 22, 2005, 6:30:34 PM3/22/05
to
David Abrahams wrote:

> Roshan Naik <rosha...@yahoo.com> writes:
>
>
>>>Boost::shared_ptr can be implemented without locks. But it provides
>>>basic thread-safety, not strong. Hazard pointers stuff is about
>>>lockless strong thread-safety, not basic. Another difference is that
>>>hazard pointers stuff needs hazardously expensive store-load barrier
>>>(not cheap even on IA32).
>>
>>I looked for definitions of basic/strong thread safety but couldnt find
>>anything concrete. Could you please define them or provide links ?
>
>
> Yeah, I was wondering myself!
>
>
>>Anyway, let me rephrase the question slightly and move away from
>>boost::shared_ptr... and focus on the underlying intent of my question.
>>
>>Can we create a smart pointer that effectively replaces the hazard
>>pointers stuff ? And if yes, what properties would it need to have ?
> I think I know the answer: it would need to do all the hazard pointer
> stuff underneath the covers. That is, upon construction with a
> raw pointer, it would have to place the raw pointer in the "in use"
> list, and upon destruction (or when the reference count goes to zero
> in case it's counted), it would have to place the raw pointer on the
> "done" list [I hope I remembered the procedure correctly ;-)].

No: you don't have to put a pointer in your list of hazard pointers
until you have a hazardous reference to it. Basically, it works (a bit)
like this:

1 do
2 {
3     p = global_ptr;
4     hptr_register(0, p);
5 } while (p != global_ptr);
6 // use p
7 hptr_free(0);
On line three, you read the global pointer for the first time, into some
local pointer for your own use (which you'll do on line 6). On line 4,
you put whatever value you just read into one of your thread's hazard
pointers (the first, in this case). On line 5, you check that the
pointer you read in line 3 is still the pointer you want, as it may have
changed in the mean time. If that is not the case, you start over. When
you reach line 6, you know that a) you have the right value in p (the
one in global_ptr) and b) the memory p points to is as valid as the
memory global_ptr pointed to. Now, all you have to be sure of is that 1)
global_ptr always points to something valid (meaning, among other
things, that changes to global_ptr are always atomic) and 2) you keep
your hazard pointer as long as you use the value now in p. Note that
global_ptr may change at any time - that is not a problem as having the
hazard pointer assures you, through the magic of safe memory
reclamation, that p points to valid memory. It *does not* assure you
that p == global_ptr at all times.
At line 7, after doing whatever you did on line 6, you free the hazard
pointer containing the value of p, thus allowing whatever memory p
pointed to to be freed. The amount of code between lines 5 and 7 should
be as short as possible.
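[Editorial sketch: the reclamation half behind the "magic of safe memory reclamation" above can be illustrated like this. It is a single-threaded demonstration; `retire`, `scan`, and the fixed-size hazard array are hypothetical names, not taken from any particular SMR library.]

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

const int MAX_HAZARDS = 8;
void* hazard[MAX_HAZARDS] = {};  // each reader publishes its pointer here
std::vector<int*> retired;       // nodes retired but not yet reclaimed
int freed_count = 0;             // instrumentation for the demo

void retire(int* p) { retired.push_back(p); }

// Reclaim every retired node that no hazard pointer currently protects.
void scan() {
    std::vector<int*> still_protected;
    for (int* p : retired) {
        if (std::find(hazard, hazard + MAX_HAZARDS,
                      static_cast<void*>(p)) != hazard + MAX_HAZARDS) {
            still_protected.push_back(p);  // a reader may still dereference it
        } else {
            delete p;                      // provably unreachable: safe to free
            ++freed_count;
        }
    }
    retired.swap(still_protected);
}
```

In the real multi-threaded protocol, `scan` runs only once the retired list grows past a threshold, which amortizes its cost over many retirements.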

You can check out an SMR implementation at http://jail-ust.sf.net (look
for libmemory and, for a set of containers that uses the SMR
implementation, libcontain).

HTH

rlc

Larry Evans

Mar 23, 2005, 5:11:09 AM3/23/05
to
On 03/22/2005 05:47 PM, SenderX wrote:
>>Is a "weighted reference count" the same as that described in [lins91c]
>>on http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbibL.html ?
>
> I'm not sure. Here is the algorithm:
>
> http://cs.nyu.edu/goldberg/pubs/gold89.pdf
>
It is the same. The weights are kept on both the pointers and
the pointee's, and each time a pointer is copied, the weight is
divided evenly between the two pointers. The [Jones92] reference
on http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbibJ.html even
references the same papers ([3] and [17] in gold89.pdf) about
weighted refcounting.
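[Editorial sketch: for readers unfamiliar with the scheme described above, a minimal, non-thread-safe illustration of weighted reference counting. All names are hypothetical.]

```cpp
#include <cassert>

int pointee_frees = 0;  // instrumentation for the demo

struct Pointee {
    long total_weight;  // sum of the weights of all live pointers to it
    int value;
};

struct WeightedPtr {
    Pointee* obj;
    long weight;
};

const long INITIAL_WEIGHT = 1L << 16;

WeightedPtr make_ptr(int value) {
    return WeightedPtr{new Pointee{INITIAL_WEIGHT, value}, INITIAL_WEIGHT};
}

// Copying splits the source pointer's weight evenly between the two
// copies; the pointee's total is untouched, so no shared counter is
// written -- the attraction for concurrent use.
WeightedPtr copy_ptr(WeightedPtr& src) {
    long half = src.weight / 2;  // assumes weight > 1; real schemes add an
                                 // indirection cell when the weight hits 1
    src.weight -= half;
    return WeightedPtr{src.obj, half};
}

// Destroying a pointer returns its weight to the pointee's total;
// whoever returns the last weight frees the object.
void release_ptr(WeightedPtr& p) {
    p.obj->total_weight -= p.weight;
    if (p.obj->total_weight == 0) {
        delete p.obj;
        ++pointee_frees;
    }
    p.obj = nullptr;
}
```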

Andrei Alexandrescu (See Website for Email)

Mar 23, 2005, 5:09:04 AM3/23/05
to
(FYI - Dave and I have had a private email exchange since, so this post
will explain and include some of that exchange.)

David Abrahams wrote:
>>Aha. That's what a cop would call a good hint. What does the vtable take
>>care of?
>
> Destruction of the held resource.

Ok. So basically if there were no need to dynamically bind the
destruction of the held resource, that vtable could be missing. Right?

If I understand our email exchange correctly, that vtable also
dispatches to functions that take care of the weak_ptr part. Fine. That
means the overhead is shared between the weak_ptr and the
dynamically-bound deleter.

But that's a very different image of the world than what was promoted so
far, which was along the lines of:

Q: "Does weak_ptr add overhead to shared_ptr?"
A: "Not if you don't use any."
Q: "Does a dynamically-bound deleter have overhead?"
A: "Not if you don't use one."

So I'm being sent from one desk to another, and in the end nobody is
accountable for the vptr in there. See my point?

>>When is that pointer inspected?
>
> The vtable pointer is never inspected. The other one is passed to a
> virtual function in the vtable when use_count drops to zero.

That's a bummer because a comparison is cheaper than two indirections.

>>When is there an indirect call through the vtable?
>
> When the use_count drops to zero.

Great. So there *is* time overhead.

>>(And by the way, a better design is a pointer to a function which
>>only asks for one indirection, not two.)
>
>
> Yes, that's been pointed out to Peter long ago; he made some judgement
> calls about which optimizations were worthwhile for the complexity
> they add to the code. IIRC his feeling was that resource release is
> generally so expensive that the extra indirection didn't matter.

In wake of the news you sent to me today, it looks like it does matter,
and that the judgment call above was therefore revised.

>>I really apologize for coming off this strong, but really, this whole
>>"shared_ptr does it all without overhead" thing can't go on forever.
>
> Well, nobody ever claimed it did. I think (at least in this thread)
> Peter has been entirely up-front about overheads. He is
> distinguishing storage for the custom deleter object itself -- which
> can be optimized away completely when no deleter was specified -- from
> the storage for the control block, which will always contain a vtable
> in this design.

There was a claim in this thread, black on white on my screen. The deleter
overhead was "fixed" and weak_ptr had overhead only if weak_ptr were used.

I do agree with the following statement: "If the deleter is the default
deleter, and if we ignore the inherent overhead of the dynamically-bound
deleter design (the vptr), then there is no extra futile space overhead."

I also agree with the following statement: "There is time overhead
brought by the dynamically-bound deleter in the form of a virtual
function call. We thought (with or without measuring in realistic
applications) that that overhead was small enough compared to the actual
deletion in many cases."

You see, that we can understand, measure, and discuss. I can't
understand "Fixed". I can't talk about "Fixed". Similarly, I can't
understand how a design accounting for weak pointers has zero overhead
until at least one weak_ptr appears.

My thesis is that all of these overheads are measurable and, in some
cases, eminent.

> It's easy to see why you might be confused by the
> fact that the control block invokes the deleter, but to imply that
> he's being dishonest is really inappropriate here. If you have
> anything to apologize for, it would be that implication, and not
> "coming on strong."

I do apologize if I sounded like implying dishonesty. I definitely
didn't mean that, and there's nothing personal either. I'm discussing
facts. And the facts tell that the following statements are not accurate:

1. "Fixed" means that as part of the other changes I eliminated the
deleter overhead when deleters aren't used.

2. No [space is allocated for the deleter], if you don't use the deleter
constructor.

3. No [time overhead is brought about by the deleter], unless you use
the deleter constructor. (I'm not sure that I understand your question
properly, because deleters never had time costs, only space costs.)

Those are copied and pasted text. Revised versions of the statements
above that I believe are way more accurate representations of reality are:

1. "Fixed" means that there was some extra slack space allocated even
when deleters weren't used, slack space that now is eliminated; of
course the vptr that is imposed by the design must be there so that
shared_ptr's destructor figures out destruction dynamically.

2. No slack space is occupied by the default deleter beyond the vptr.

3. The time overhead of the deleter is a virtual call, which is
mandatory. That could be optimized down to a failed comparison in the
default deleter case if shared_ptr is redesigned to use a pointer to
function instead of a vtable.
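[Editorial sketch: the pointer-to-function alternative in point 3 might look roughly like this. It is an illustration of the trade-off, not Boost's actual control block; all names are hypothetical.]

```cpp
#include <cassert>

int custom_deleter_calls = 0;  // instrumentation for the demo

template <class T>
struct control_block {
    T* ptr;
    long use_count;
    void (*deleter)(T*);  // null means "plain delete": the common case
                          // costs one failed comparison, no indirection

    void release() {
        if (--use_count == 0) {
            if (deleter)
                deleter(ptr);  // one indirect call, only for custom deleters
            else
                delete ptr;
        }
    }
};

void counting_delete(int* p) {
    ++custom_deleter_calls;
    delete p;
}
```

Whether the saved indirection matters is exactly the judgment call under discussion: release cost is usually dominated by the destructor and deallocation themselves.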

> The fact is that having the "deleter-invoking" virtual function in the
> control block yields a few benefits of its own, even if you leave out
> the custom deleter capability and always have it doing "delete px":
>
> - Immunity to destruction in the wrong heap
> - Immunity to leaving out a virtual destructor on a base class
> [anything else? I may have forgotten something]
>
> If you analyze the design carefully, *those* things are what incur the
> cost of that vtable, and not the custom deleter capability. Once
> you've got the vtable, the custom deleter is just "low-hanging fruit."
> I covered some of these things in my SD West talk.

I understand the benefits of the design, and I understand how things
come together. It's a good design. No need to argue that. The point was,
it's not a *magic* design. Features have costs that must be known and
understood.


Andrei

David Abrahams

Mar 23, 2005, 5:10:40 AM3/23/05
to
Ronald Landheer-Cieslak <ron...@landheer.com> writes:

>> I think I know the answer: it would need to do all the hazard pointer
>> stuff underneath the covers. That is, upon construction with a
>> raw pointer, it would have to place the raw pointer in the "in use"
>> list, and upon destruction (or when the reference count goes to zero
>> in case it's counted), it would have to place the raw pointer on the
>> "done" list [I hope I remembered the procedure correctly ;-)].
>
> No: you don't have to put a pointer in your list of hazard pointers
> until you have a hazardous reference to it. Basically, it works (a bit)
> like this:
>
> 1 do
> 2 {
> 3 p = global_ptr;
> 4 hptr_register(0, p);
> 5 } while (p != global_ptr);
> 6 // use p
> 7 hptr_free(0);

Given all of the contradicting responses, I must've made myself really
unclear. I failed to see "shared_ptr" in the subject line, and
thought the OP was asking if a smart pointer in general could be used
to manage the hazard pointer process. Even so you can use shared_ptr
with a custom deleter, though a custom smart pointer -- what I thought
the OP was asking about -- would be much more appropriate.

I had in mind something like:

// Just a sketch:
template <class T>
struct hazard_ptr : boost::noncopyable
{
    hazard_ptr(T*& p)
    {
        do
        {
            px = p;
            hptr_register(0, px);
        }
        while (px != p);
    }

    ~hazard_ptr() { hptr_free(0); }

    T* operator->() const { return px; }
    T& operator*() const { return *px; }
private:
    T* px;
};

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com


Thorsten Ottosen

Mar 23, 2005, 12:21:19 PM3/23/05
to
"David Abrahams" <da...@boost-consulting.com> wrote in message
news:ull8f7...@boost-consulting.com...

| "Andrei Alexandrescu (See Website For Email)" <SeeWebsit...@moderncppdesign.com> writes:

| The fact is that having the "deleter-invoking" virtual function in the
| control block yields a few benefits of its own, even if you leave out
| the custom deleter capability and always have it doing "delete px":
|
| - Immunity to destruction in the wrong heap
| - Immunity to leaving out a virtual destructor on a base class
| [anything else? I may have forgotten something]
|

I get the latter, but what do you mean by the first?

Thanks

-Thorsten

Peter Dimov

Mar 23, 2005, 12:26:50 PM3/23/05
to
"Peter Dimov" <pdi...@gmail.com> wrote in message
news:abefd130.05032...@posting.google.com...

> David Abrahams <da...@boost-consulting.com> wrote in message
> news:<ur7i9i...@boost-consulting.com>...

>> Roshan Naik <rosha...@yahoo.com> writes:
>>
>>> I looked for definitions of basic/strong thread safety but couldnt find
>>> anything concrete. Could you please define them or provide links ?
>>
>> Yeah, I was wondering myself!
>
> I think I coined these terms.

Nope, I'm wrong. It was Alexander who coined them. I was using "thread
neutral" and "as thread safe as an int" for basic before that.

Dave Harris

Mar 23, 2005, 12:47:40 PM3/23/05
to
da...@boost-consulting.com (David Abrahams) wrote (abridged):

> The fact is that having the "deleter-invoking" virtual function in the
> control block yields a few benefits of its own, even if you leave out
> the custom deleter capability and always have it doing "delete px":
>
> - Immunity to destruction in the wrong heap
> - Immunity to leaving out a virtual destructor on a base class
> [anything else? I may have forgotten something]

Isn't it that bit of indirection which lets shared_ptr be so free with
incomplete types? That is the feature which is worth paying for, in my
view.

-- Dave Harris, Nottingham, UK

Peter Dimov

Mar 23, 2005, 12:52:30 PM3/23/05
to
"David Abrahams" <da...@boost-consulting.com> wrote in message
news:ull8f7...@boost-consulting.com...

>
> The fact is that having the "deleter-invoking" virtual function in the
> control block yields a few benefits of its own, even if you leave out
> the custom deleter capability and always have it doing "delete px":
>
> - Immunity to destruction in the wrong heap
> - Immunity to leaving out a virtual destructor on a base class
> [anything else? I may have forgotten something]

- ability to destroy a shared_ptr when the template argument is incomplete
- ability to destroy a shared_ptr<void>
- ability to destroy a shared_ptr<X> when ~X is inaccessible

> If you analyze the design carefully, *those* things are what incur the
> cost of that vtable, and not the custom deleter capability. Once
> you've got the vtable, the custom deleter is just "low-hanging fruit."

Exactly.

Peter Dimov

Mar 23, 2005, 12:57:55 PM3/23/05
to
"Andrei Alexandrescu (See Website For Email)"
<SeeWebsit...@moderncppdesign.com> wrote in message
news:423FA766...@moderncppdesign.com...

> Peter Dimov wrote:
>>
>> In this context, "in the general case" means "when weak pointers are
>> used".
>
> But there is space and time dedicated to the weak pointers' existence,
> even when they aren't effectively used, isn't it?

There is a space cost of one integer, we already covered that.

At runtime, when a shared_ptr is copied, the lock-free implementation never
updates the extra count. This means that the weak_ptr feature no longer
affects the copy times as it did when a lock was used instead of atomic
increment.

When the use count drops to zero, there is one extra atomic decrement and
one extra indirection. The context in which these operations occur makes them
hard to measure: the object and the control block are destroyed at the same
time. Unless you have a very, very efficient allocator and a trivial object
destructor or custom deleter.

>> "Fixed" means that as part of the other changes I eliminated the
>> deleter overhead when deleters aren't used.
>
> Ok, I need a thorough explanation here. My understanding is that there
> is someplace, somewhere, something stored that tells you whether or not
> a deleter is used, and if it's used, there is an indirect call. Both of
> these are called "overhead" in my dictionary.

OK, let's go with your definition of overhead. My point was that using a
custom deleter does not incur any extra overhead above what is already
present in the "baseline" case, when no custom deleters are used.

Yes, when the use count reaches zero, there is an indirect call through the
vtable pointer, and this indirect call invokes the deleter.

> This is something that mildly annoyed me when reading in the past about
> shared_ptr, including the proposal for standardization. There is no
> place that clearly discusses the exact cost of each feature. There's a
> lot of glossing over, a lot of sail and no anchor. I'd love to see a
> clear, honest discussion: this incurs this space penalty and that
> runtime penalty.

I'm not really sure what is there to discuss, given that the source is
available. Discussions will not change the implementation.

> Otherwise, I can't build a model of the world. If I were to believe the
> docs and the peremptory "Fixed" statements, I'd be like, "wow, they are
> using some sheer magic there" and leave it at that. But it's not that.
> It can't be. There is a dynamically-made decision whether you have a
> deleter or not, and there is space allocated for the deleter, at least
> if anything else than delete is used. If so, then there is an indirect
> call for destruction, which is again overhead that could be eliminated
> if the deleter is statically bound.
>
> How does "fixed" go with that?

"Fixed" means that in the previous implementation, the "missing" deleter
occupied space even when not used. This raised the memory footprint by one
word due to alignment. This has now been fixed.

> One of the unpleasant side effects of not discussing performance is that
> a magic shared_ptr makes policy-based smart pointers look bad. I mean,
> if shared_ptr somehow magically is the dessert topping and floor wax
> that has no overhead, then why go through the hoops of defining a
> policy-based smart pointer?

If you want to show that shared_ptr has overhead, which it does, you merely
need to point people at the implementation. Point them at the old
implementation, if you like. It has much more overhead.

You don't even need to do that. C++ programmers are incredibly sensitive to
overhead. If anything, they overestimate its significance slightly.

>>>Is there space allocated for a deleter for each
>>>smart pointer?
>>
>> No, if you don't use the deleter constructor.
>
> Where is the bit that makes you decide whether or not you used the
> deleter constructor?

The vtable pointer is different.

>>>Is there runtime overhead induced by the deleter?
>>
>> No, unless you use the deleter constructor. (I'm not sure that I
>> understand your question properly, because deleters never had time
>> costs, only space costs.)
>
> Where is the indirection or the "if" statement that decides whether a
> deleter constructor was used or not?

The vtable pointer is different.

> I can't understand. This is not quantic decoherence. Nobody can stand
> there with a straight face and say there's no overhead. If it were that
> way, everybody and their sister would do it that way. It's like
> perpetual motion.
>
> You must figure out dynamically during destruction whether a deleter was
> used, and if it was used, there must be a dynamic (= indirect, slow)
> call to the deleter. I want the truth. (I know, I can't handle the
> truth. :o))

Your problem is that you want to associate the overhead with a particular
shared_ptr feature: dynamically-bound deleters. It doesn't work that way.
There is a certain overhead in the control block, which, as I have already
said twice, is two pointers. This overhead is present regardless of whether
a custom deleter is used. This is why someone can stand with a straight face
and say that a custom deleter adds no overhead - because it adds no overhead
relative to the "baseline" case, not relative to some other smart pointer.

>>>So what's the total cost in size of shared_ptr, itemized as
>>>sizeof(shared_ptr), plus the extra memory allocated per shared_ptr
>>>instance?
>>
>>
>> sizeof(shared_ptr) is two pointers.
>>
>> A default-constructed shared_ptr does not allocate anything.
>>
>> A shared_ptr constructed from a pointer allocates a control block that
>> contains two pointers (one of them the vtable pointer) and two
>> integers.
>
> Aha. That's what a cop would call a good hint. What does the vtable take
> care of? When is that pointer inspected? When is there an indirect call
> through the vtable?

An indirect call is made through the vtable when the use count drops to
zero. It either calls "delete p" or invokes the deleter, when there is one.
The stored pointer is passed as an argument.

Another indirect call is made when all weak pointers to the object are gone.
This indirect call ("delete this") destroys the control block itself.

A third indirect call is made when get_deleter<> is used.
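[Editorial sketch: the two teardown events Peter describes can be condensed into a sketch. The layout and names are illustrative, not the Boost source; instrumentation globals are added for the demo.]

```cpp
#include <cassert>

int disposals = 0, blocks_freed = 0;  // instrumentation for the demo

// One vptr (implicit), two counts; a derived class adds the stored pointer.
struct sp_counted_base {
    long use_count = 1;   // strong references
    long weak_count = 1;  // weak references, plus one shared by all strong refs

    virtual ~sp_counted_base() { ++blocks_freed; }
    virtual void dispose() = 0;  // first indirect call: destroy the object

    void release() {
        if (--use_count == 0) {
            dispose();
            weak_release();  // drop the strong refs' collective weak ref
        }
    }
    void weak_release() {
        if (--weak_count == 0)
            delete this;     // second indirect call: destroy the block itself
    }
};

struct sp_counted_impl : sp_counted_base {
    int* px;
    explicit sp_counted_impl(int* p) : px(p) {}
    void dispose() override { delete px; ++disposals; }
};
```

Note how a lone weak_ptr outliving the last shared_ptr would keep only the control block alive: the object goes in the first event, the block in the second.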

> (And by the way, a better design is a pointer to a
> function which only asks for one indirection, not two.)
>
> I really apologize for coming off this strong, but really, this whole
> "shared_ptr does it all without overhead" thing can't go on forever.

Well, a good first step in that direction would be for you to point me at a
place where this claim has been made, so that we can put the issue to rest.

Peter Dimov

Mar 23, 2005, 1:05:32 PM3/23/05
to
"Andrei Alexandrescu (See Website for Email)"
<SeeWebsit...@moderncppdesign.com> wrote in message
news:4240BB13...@moderncppdesign.com...

> If I understand our email exchange correctly, that vtable also
> dispatches to functions that take care of the weak_ptr part. Fine. That
> means the overhead is shared between the weak_ptr and the
> dynamically-bound deleter.

Not exactly true.

weak_ptr, by itself, does not need the vtable. Its presence splits the
teardown part into two events, one that occurs when all shared_ptr instances
are gone, and a second when all weak_ptr instances are gone. As explained,
in the current design this translates to two virtual calls instead of one.
But weak_ptr doesn't need the vtable to operate.

That is, if we assume a traditional shared_ptr that doesn't use virtual
dispatch, adding weak_ptr support to it will add one integer and will not
need a vtable to be added. For most intents and purposes, if you aren't
being extremely picky, which you are, the cost of weak_ptr is one integer.

This usually comes as a surprise to most people.

> But that's a very different image of the world than what was promoted so
> far, which was along the lines of:
>
> Q: "Does weak_ptr add overhead to shared_ptr?"
> A: "Not if you don't use any."

Except the one integer.

> Q: "Does a dynamically-bound deleter have overhead?"
> A: "Not if you don't use one."

Your original Q wasn't really like this Q. This Q has "dynamically-bound" in
it. The original asked about deleters, period, something I interpreted as
"custom deleters".

Q: Does the fact that shared_ptr use dynamic dispatch have overhead?
A: Duh. Of course.

Q: Do custom deleters add any extra overhead beyond the above?
A: No.

Happy now?

> So I'm being sent from one desk to another, and in the end nobody is
> accountable for the vptr in there. See my point?

Eh, who is responsible for the vptr has been explained by Dave in that same
post you are replying to.

>>>When is there an indirect call through the vtable?
>>
>> When the use_count drops to zero.
>
> Great. So there *is* time overhead.

Yes, in theory. It is usually hidden (to the point of unmeasurable) by the
atomic decrement at one side and the delete p call at the other. You may be
able to demonstrate it with a benchmark if you try, but I don't guarantee
success.

>>>(And by the way, a better design is a pointer to a function which
>>>only asks for one indirection, not two.)
>>
>> Yes, that's been pointed out to Peter long ago; he made some judgement
>> calls about which optimizations were worthwhile for the complexity
>> they add to the code. IIRC his feeling was that resource release is
>> generally so expensive that the extra indirection didn't matter.
>
> In wake of the news you sent to me today, it looks like it does matter,
> and that the judgment call above was therefore revised.

I have no idea what you two are talking about, sorry. ;-)

David Abrahams

Mar 23, 2005, 5:43:00 PM
"Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> writes:

> (FYI - Dave and I have had a private email exchange since, so this post
> will explain and include some of that exchange.)
>
> David Abrahams wrote:
>>>Aha. That's what a cop would call a good hint. What does the vtable take
>>>care of?
>>
>> Destruction of the held resource.
>
> Ok. So basically if there were no need to dynamically bind the
> destruction of the held resource, that vtable could be missing. Right?
>
> If I understand our email exchange correctly, that vtable also
> dispatches to functions that take care of the weak_ptr part.

No. The only virtual functions are dispose() and get_deleter().

> Fine. That means the overhead is shared between the weak_ptr and the
> dynamically-bound deleter.

No, the overhead can be viewed as being shared between three features:

[View A]
- immunity from deletion in wrong heap
- immunity from delete slicing
- custom deleters


But the fact that using a custom deleter imposes additional space
overhead above the first two features listed there means that it's
also a valid viewpoint to say

[View P] that the vtable is required for the other two features and
the custom deleter is low-hanging fruit that just imposes a space
overhead when it's actually used. And that view may be understandable
in light of the way the design developed.

> But that's a very different image of the world than what was promoted so
> far, which was along the lines of:
>
> Q: "Does weak_ptr add overhead to shared_ptr?"
> A: "Not if you don't use any."

Let's be precise. Summing it up that way isn't fair to anyone. It
both makes Peter seem dishonest and glosses over the real costs.

Peter:


"shared_ptr is unique in that in the general case it needs to

maintain two reference counts. It took some time for Alexander
^^^^^^^^
Terekhov to discover a lock-free algorithm. The algorithm is
overhead-free when shared_ptr is used as an ordinary strong smart
pointer (i.e. it only touches one count in this case)."
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From this quote "maintain" obviously means "change." He's saying that
if you don't use a weak_ptr, you are going to pay for storage (but not
modification) of the weak count. I think you probably do touch the
weak count once, when use_count goes to zero. Otherwise you can't
decide whether to deallocate the control block. As a matter of fact I
think Peter could have been more careful here; he doesn't tend to
consider time costs associated with use_count going to zero, as the
cost of resource release tends to swamp other costs.

[BTW, if you saw my talk I discussed a more primitive algorithm in
which you had to change the weak count every time the regular count
was changed]
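[Editor's note: the two-event teardown described above can be sketched as follows. The names and the flags are illustrative, not Boost's code; `disposed_` stands in for "delete px" and `destroyed_` for freeing the control block.]

```cpp
#include <cassert>

// Two-phase teardown: dispose the pointee when the last shared_ptr is
// gone, free the control block when the last weak_ptr is gone. Note
// that the strong side touches weak_count_ exactly once, at the moment
// use_count_ reaches zero.
struct counted_base {
    long use_count_;
    long weak_count_;   // includes one reference held by the strong side
    bool disposed_;     // stands in for "delete px"
    bool destroyed_;    // stands in for freeing the control block

    counted_base() : use_count_(1), weak_count_(1),
                     disposed_(false), destroyed_(false) {}

    void release() {             // from ~shared_ptr
        if (--use_count_ == 0) {
            disposed_ = true;
            weak_release();      // the single touch of the weak count
        }
    }
    void weak_release() {        // from ~weak_ptr
        if (--weak_count_ == 0)
            destroyed_ = true;
    }
};
```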

You:


"What does "in the general case" mean? Does it, or does it not? I
didn't know shared_ptr is compile-time configurable in any way
except the #define'd multithreading aspect."

Apparently misinterpreting Peter, thinking that he's trying to say that
you don't even need to store that count.

Peter:
``In this context, "in the general case" means "when weak pointers
are used".''

Obviously failing to catch and correct your misinterpretation.

> Q: "Does a dynamically-bound deleter have overhead?"
> A: "Not if you don't use one."
>
> So I'm being sent from one desk to another, and in the end nobody is
> accountable for the vptr in there. See my point?

No. Peter quite clearly said:

"A shared_ptr constructed from a pointer allocates a control block
that contains two pointers (one of them the vtable pointer) and
two integers."

And it has been pointed out to you multiple times that the vtable
pointer enables other features besides custom deleters. Although I
think the exact overhead dependencies could have been spelled out more
clearly, I know you're smart enough to understand what's been said in
this thread. This just looks like willful misinterpretation on your
part.

>>>When is that pointer inspected?
>>
>> The vtable pointer is never inspected. The other one is passed to a
>> virtual function in the vtable when use_count drops to zero.
>
> That's a bummer because a comparison is cheaper than two indirections.

Actually a comparison is done first:

    ~shared_count() // nothrow
    {
        if( pi_ != 0 ) pi_->release();
    }

Or do you mean that you want to inspect the "custom deleter" function
pointer? I wouldn't do that; it would be bad for code size.

>>>When is there an indirect call through the vtable?
>>
>> When the use_count drops to zero.
>
> Great. So there *is* time overhead.

Yes, but depending on whether you take View A or P, that overhead may
or may not be associated with the deleter.

>>>(And by the way, a better design is a pointer to a function which
>>>only asks for one indirection, not two.)
>>
>>
>> Yes, that's been pointed out to Peter long ago; he made some judgement
>> calls about which optimizations were worthwhile for the complexity
>> they add to the code. IIRC his feeling was that resource release is
>> generally so expensive that the extra indirection didn't matter.
>
> In wake of the news you sent to me today, it looks like it does matter,
> and that the judgment call above was therefore revised.

But that's all irrelevant to this discussion if we're talking about
intrinsic design overheads.

>>>I really apologize for coming off this strong, but really, this whole
>>>"shared_ptr does it all without overhead" thing can't go on forever.
>>
>> Well, nobody ever claimed it did. I think (at least in this thread)
>> Peter has been entirely up-front about overheads. He is
>> distinguishing storage for the custom deleter object itself -- which
>> can be optimized away completely when no deleter was specified -- from
>> the storage for the control block, which will always contain a vtable
>> in this design.
>
> There was a claim in this thread, black on white on my screen. The deleter
> overhead was "fixed"

Ah, no, look again:

You:


> > Anyhow, for others the same preference is determined by
> > shared_ptr's compulsory presence of the deleter, whether or not

^^^^^^^^


> > you have a use for it;

Peter:
> Fixed.

It seems obvious to me that you were talking about storage at that
point -- at least you have to admit that "presence" might lead someone
to think so. You can't change the game now and claim that Peter was
saying there was no time overhead.

> and weak_ptr had overhead only if weak_ptr were used.
>
> I do agree with the following statement: "If the deleter is the default
> deleter, and if we ignore the inherent overhead of the dynamically-bound
> deleter design (the vptr), then there is no extra futile space overhead."
>
> I also agree with the following statement: "There is time overhead
> brought by the dynamically-bound deleter in the form of a virtual
> function call. We thought (with or without measuring in realistic
> applications) that that overhead was small enough compared to the actual
> deletion in many cases."
>
> You see, that we can understand, measure, and discuss. I can't
> understand "Fixed". I can't talk about "Fixed". Similarly, I can't
> understand how a design accounting for weak pointers has zero overhead
> until at least one weak_ptr appears.
>
> My thesis is that all of these overheads are measurable and, in some
> cases, eminent.

Agreed.

>> It's easy to see why you might be confused by the
>> fact that the control block invokes the deleter, but to imply that
>> he's being dishonest is really inappropriate here. If you have
>> anything to apologize for, it would be that implication, and not
>> "coming on strong."
>
> I do apologize if I sounded like implying dishonesty. I definitely
> didn't mean that, and there's nothing personal either. I'm discussing
> facts. And the facts tell that the following statements are not accurate:
>
> 1. "Fixed" means that as part of the other changes I eliminated the
> deleter overhead when deleters aren't used.

There was context. Right above that:

> >>Anyhow, for others the same preference is determined by
> >>shared_ptr's compulsory presence of the deleter, whether or not

^^^^^^^^


> >>you have a use for it;

So maybe a bit too terse.

> 2. No [space is allocated for the deleter], if you don't use the
> deleter constructor.

Absolutely true, if you take View P.

> 3. No [time overhead is brought about by the deleter], unless you use
> the deleter constructor. (I'm not sure that I understand your question
> properly, because deleters never had time costs, only space costs.)

Again, true if you take view P.

> Those are copied and pasted text. Revised versions of the statements
> above that I believe are way more accurate representations of reality are:
>
> 1. "Fixed" means that there was some extra slack space allocated even
> when deleters weren't used, slack space that now is eliminated; of
> course the vptr that is imposed by the design must be there so that
> shared_ptr's destructor figures out destruction dynamically.

That's better. I would say "so that static destruction information is
captured at the point of initialization." (even better)

> 2. No slack space is occupied by the default deleter beyond the vptr.

True.

> 3. The time overhead of the deleter is a virtual call, which is
> mandatory. That could be optimized down to a failed comparison in
> the default deleter case

It's only an optimization if the shared_ptr is NULL.

> if shared_ptr is redesigned to use a pointer to function instead of
> a vtable.

I'm not sure you've thought that one through completely, or else I'm
misunderstanding you.
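[Editor's note: for reference, the alternative being debated — a single function pointer in place of the vtable — might look like the sketch below. All names are illustrative.]

```cpp
#include <cassert>

// Function-pointer variant of the control block: reaching the deleter
// costs one indirection instead of two (vptr, then vtable slot).
struct fp_count {
    void (*dispose_)(void*);  // bound when the owning pointer is made
    void* px_;
    long use_count_;
};

inline bool fp_release(fp_count& c) {  // returns true if disposed
    if (--c.use_count_ == 0) {
        c.dispose_(c.px_);             // single indirect call
        return true;
    }
    return false;
}

// Illustrative "deleter" for demonstration: it only bumps a counter so
// the call can be observed; it does not actually free anything.
inline void count_disposals(void* p) { ++*static_cast<int*>(p); }
```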

>> The fact is that having the "deleter-invoking" virtual function in the
>> control block yields a few benefits of its own, even if you leave out
>> the custom deleter capability and always have it doing "delete px":
>>
>> - Immunity to destruction in the wrong heap
>> - Immunity to leaving out a virtual destructor on a base class
>> [anything else? I may have forgotten something]
>>
>> If you analyze the design carefully, *those* things are what incur the
>> cost of that vtable, and not the custom deleter capability. Once
>> you've got the vtable, the custom deleter is just "low-hanging fruit."
>> I covered some of these things in my SD West talk.
>
> I understand the benefits of the design, and I understand how things
> come together. It's a good design. No need to argue that.

And I wasn't. I was trying to point out that custom deleter support
itself doesn't impose the vtable unless you also leave out other
features. It's easy to imagine a design where you only pay for the
vtable when a custom deleter is used, but where you don't get the
benefits of those other two features.

> The point was, it's not a *magic* design. Features have costs that
> must be known and understood.

Yes. It would be good to have a more explicit description of the
design's inherent costs somewhere, probably in the Boost
documentation.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Peter Dimov

Mar 23, 2005, 5:47:59 PM
Andrei Alexandrescu (See Website For Email) wrote:

> Peter Dimov wrote:
>> A shared_ptr constructed from a pointer allocates a control block
>> that contains two pointers (one of them the vtable pointer) and two
>> integers.
>
> Aha. That's what a cop would call a good hint. What does the vtable
> take care of? When is that pointer inspected? When is there an
> indirect call through the vtable? (And by the way, a better design is
> a pointer to a function which only asks for one indirection, not two.)

I don't know about "better", but if you have performance in mind:

100M indirect calls to delete px, where px is new X and ~X is virtual, take
about 24.1-24.5 seconds for me. 100M virtual calls to same take 22.25-22.65
seconds. I have verified that the code generated by the compiler does, in
fact, represent what I have in mind.

No fair bringing virtual destructors into this? Be my guest. 100M indirect
calls to delete new int: 22 seconds. 100M virtual calls: 21.8 seconds. Hm.
We ought to be able to do better. Everyone knows that virtual dispatch is
slow.

One billion indirect calls to delete 0: 6.8 seconds. One billion virtual
calls: 7.75 seconds. Victory.

So... if you destroy one billion pointers that store a NULL (and keep in
mind that a default-constructed shared_ptr is not that, you'd need
shared_ptr<X> px( (X*)0 ) ), you'll be able to gain almost one second by
using a pointer to function instead of virtual dispatch.

Yeah.

Peter Dimov

Mar 23, 2005, 5:45:03 PM
"Stefan Strasser" <sstr...@systemhaus-gruppe.de> wrote in message
news:d1m05b$pm2$02$1...@news.t-online.com...
>
> why's this OS or threading system specific? of course those can provide
> functions to do this but a non-inlined function call for this isn't a
> good idea in something like boost shared_ptr anyway, so it'll probably
> use CPU-specific code.

FWIW, my experiments (on x86) have shown that an atomic operation that
synchronizes memory usually hides a virtual call. I'd be surprised if this
is not true on other platforms where the cost of synchronizing memory is
even higher.

> (atm it uses locks, at least on x86/linux)

Lock-free is coming. ;-)

Joe Seigh

Mar 23, 2005, 5:47:22 PM
On 22 Mar 2005 18:42:47 -0500, David Abrahams <da...@boost-consulting.com> wrote:

> "Joe Seigh" <jsei...@xemaps.com> writes:
>
>> On 21 Mar 2005 21:03:26 -0500, David Abrahams <da...@boost-consulting.com> wrote:
>>
>>> "Andrei Alexandrescu (See Website for Email)"
>>> <SeeWebsit...@moderncppdesign.com> writes:
>>>
>>>> Readers would need to use a smart pointer that adds the raw pointer to
>>>> the hazard list in the constructor, and removes it off the hazard list
>>>> in the destructor.
>>>
>>> I think that's what I meant. I just forgot the correct procedure ;-)
>>> My point was that the smart pointers involved would have to do the
>>> exact same procedure required to do "lock free" with raw pointers,
>>> just under-the-covers.
>>
>> Hazard pointers can't be shared writeable pointers. They're
>> strictly thread local. So you can't do what you're thinking of
>> here.
>

> ?? I wasn't proposing to share them.
>

Ah, I see. You're proposing a new smart pointer class, not redoing one
of the current smart pointer classes. So, for example (since this may
not be exactly what you're talking about), you'd have a smr_ptr<T>, which
would be the shared part, mostly as thread-safe as an int. It looks a lot like
a raw ptr except for the dtor part and some atomicity guarantees. And
you'd have a local_smr_ptr<T> which is meant to be local non-shared, which
uses the hazard pointer logic you'd posted here.

Even though smr_ptr<> looks like a raw pointer, it's not. So you can't just
start using local_smr_ptr<> on raw pointers. So to use local_smr_ptr<> with
something like shared_ptr<T>, you either have to hard code smr_ptr<T> instead
of T* in shared_ptr<>, or have shared_ptr<T> recognise pointer types and
use T instead of T* in that case, assuming that's what you wanted.

You're probably aware of all this already. I'm just making sure I know what
you're talking about. It might help if you listed the operations that you want
to be possible with these new smart pointers, e.g.

shared_ptr<T> sp1, sp2;
smr_ptr<T> smrp;
local_smr_ptr<T> lp; // must be local in all cases (local implies ownership)
T * p; // local raw pointer

sp1 = sp2; // both dest and src must be "owned"

// operations with local_smr_ptr<>
lp = sp1; // src need not be "owned"
lp = smrp; // src need not be "owned"
sp1 = lp; // dest must be owned
smrp = lp; // dest must be owned

// operations with T*
p = sp1; // src must be owned
p = smrp; // src must be owned
sp1 = p; // dest must be owned
smrp = p; // dest must be owned

That's a little brief. More elaboration of the safety guarantees would be better,
but you should get the idea.

Ronald Landheer-Cieslak

Mar 23, 2005, 5:45:48 PM
David Abrahams wrote:
> Ronald Landheer-Cieslak <ron...@landheer.com> writes:
>
>
>>>I think I know the answer: it would need to do all the hazard pointer
>>>stuff underneath the covers. That is, upon construction with a
>>>raw pointer, it would have to place the raw pointer in the "in use"
>>>list, and upon destruction (or when the reference count goes to zero
>>>in case it's counted), it would have to place the raw pointer on the
>>>"done" list [I hope I remembered the procedure correctly ;-)].
>>
>>No: you don't have to put a pointer in your list of hazard pointers
>>until you have a hazardous reference to it. Basically, it works (a but)
oops: s/but/bit/ ;)
That would work, except that I see no way for this class to know which
hazard pointer to register in (but it could just find the first NULL in
the local hazard pointers and register there, keeping the index of the
hazard pointer it stored in).

Actually, something like this would make a nice addition to libmemory.. :)

rlc

Alexander Terekhov

Mar 23, 2005, 5:56:08 PM

David Abrahams wrote:
[...]

> // Just a sketch:
> template <class T>
> struct hazard_ptr : boost::noncopyable
> {
/*
> hazard_ptr(T*& p)
*/
hazard_ptr(atomic<T*>& p)
> {
> do
> {
/*
> px = p;
*/
px = p.load(msync::naked_competing);
> hptr_register(0, px);
barrier(msync::slfence);
> }
/*
> while (px != p);
*/
while (px != p.load(msync::ddacq));

> }
>
> ~hazard_ptr() { hptr_free(0); }
>
> T* operator->() const { return px; }
> T& operator*() const { return *px; }
> private:
> T* px;
> };

regards,
alexander.

David Abrahams

Mar 23, 2005, 5:57:15 PM
"Peter Dimov" <pdi...@gmail.com> writes:

>> In wake of the news you sent to me today, it looks like it does matter,
>> and that the judgment call above was therefore revised.
>
> I have no idea what you two are talking about, sorry. ;-)


We may have misinterpreted your post in this thread:
http://tinyurl.com/47qao

(http://news.gmane.org/find-root.php?message_id=%3cADDF481D8720EE4D887949674663491801AB2EB4%40hermes.macrovisioneurope.com%3e)

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Andrei Alexandrescu (See Website for Email)

Mar 23, 2005, 5:59:51 PM
Dave Harris wrote:
> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>
>>The fact is that having the "deleter-invoking" virtual function in the
>>control block yields a few benefits of its own, even if you leave out
>>the custom deleter capability and always have it doing "delete px":
>>
>>- Immunity to destruction in the wrong heap
>>- Immunity to leaving out a virtual destructor on a base class
>> [anything else? I may have forgotten something]
>
>
> Isn't it that bit of indirection which lets shared_ptr be so free with
> incomplete types? That is the feature which is worth paying for, in my
> view.

It is - when it's being used. The key is to be able to take it out of
the package deal when performance matters to a portion of your code.

By the way, if one did not forget to define the destructor virtual in
the base class, are we looking at two virtual calls there during
destruction?


Andrei

David Abrahams

Mar 23, 2005, 5:59:06 PM
"Thorsten Ottosen" <nes...@cs.auc.dk> writes:

> "David Abrahams" <da...@boost-consulting.com> wrote in message
> news:ull8f7...@boost-consulting.com...
> | "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsit...@moderncppdesign.com> writes:
>
> | The fact is that having the "deleter-invoking" virtual function in the
> | control block yields a few benefits of its own, even if you leave out
> | the custom deleter capability and always have it doing "delete px":
> |
> | - Immunity to destruction in the wrong heap
> | - Immunity to leaving out a virtual destructor on a base class
> | [anything else? I may have forgotten something]
> |
>
> I get the latter, but what do you mean by the first?

A not-uncommon problem for windows programmers is that separate
dynamic libraries (or a dynamic library and the application) are
linked statically to the runtime. Each static runtime gets its own
heap. If you allocate memory from one heap, then pass a pointer
across the dynamic library boundary, and deallocate in the other heap,
you corrupt memory.
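[Editor's note: the failure mode can be simulated in-process. The two "heaps" below are just counters and every name is illustrative; the point is that a deleter captured at construction routes the free back to the allocating side.]

```cpp
#include <cassert>

// Simulated per-module heaps: live tracks outstanding allocations.
struct fake_heap { int live; };

inline void* alloc_from(fake_heap& h) { ++h.live; return &h; }
inline void  free_to(fake_heap& h)    { --h.live; }

// The deleter remembers which heap the block came from, so it can be
// invoked safely from any module.
struct captured_deleter {
    fake_heap* home;
    void operator()() const { free_to(*home); }
};
```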

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Ben Hutchings

Mar 23, 2005, 6:01:05 PM
Andrei Alexandrescu (See Website For Email) wrote:
> Joshua Lehrer wrote:
>> shared_ptr<> requries an OS specific way of atomically incrementing and
>> decrementing a value, while simultaneously providing the new (or old)
>> value.

It also seems to need CAS or similar for converting weak to strong
pointers, since this must never increment a strong count of 0.

>> For OS's that do not have atomic_increment and atomic_decrement, this
>> requires a lock.
>
> What OSs (I think you mean threading systems) would be those?

Pthreads for a start.

--
Ben Hutchings
Having problems with C++ templates? Your questions may be answered by
<http://womble.decadentplace.org.uk/c++/template-faq.html>.

David Abrahams

Mar 23, 2005, 6:05:41 PM
"Peter Dimov" <pdi...@gmail.com> writes:

> "Peter Dimov" <pdi...@gmail.com> wrote in message
> news:abefd130.05032...@posting.google.com...
>> David Abrahams <da...@boost-consulting.com> wrote in message
>> news:<ur7i9i...@boost-consulting.com>...
>>> Roshan Naik <rosha...@yahoo.com> writes:
>>>
>>>> I looked for definitions of basic/strong thread safety but couldnt find
>>>> anything concrete. Could you please define them or provide links ?
>>>
>>> Yeah, I was wondering myself!
>>
>> I think I coined these terms.
>
> Nope, I'm wrong. It was Alexander who coined them. I was using "thread
> neutral" and "as thread safe as an int" for basic before that.

I actually think Alexander's terms may be more useful to us as C++
programmers, because they are highly analogous to the exception-safety
guarantees of the same names... unless, of course, we want to make up
new terms for the exception guarantees driven by terms like "Serial
Equivalence" ;-)

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Jeff Flinn

Mar 23, 2005, 6:09:24 PM

"Peter Dimov" <pdi...@gmail.com> wrote in message
news:d1rnis$lhn$1...@domitilla.aioe.org...

> "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsit...@moderncppdesign.com> wrote in message
>
>> I really apologize for coming off this strong, but really, this whole
>> "shared_ptr does it all without overhead" thing can't go on forever.
>
> Well, a good first step in that direction would be for you to point me at
> a
> place where this claim has been made, so that we can put the issue to
> rest.

Perhaps it was, IIRC, a Herb Sutter C/C++UJ article, where he said something
akin to ...you might as well just use shared_ptr, and make your life
simpler..., paraphrasing of course.

Jeff Flinn

David Abrahams

Mar 23, 2005, 6:10:08 PM
bran...@cix.co.uk (Dave Harris) writes:

> da...@boost-consulting.com (David Abrahams) wrote (abridged):
>> The fact is that having the "deleter-invoking" virtual function in the
>> control block yields a few benefits of its own, even if you leave out
>> the custom deleter capability and always have it doing "delete px":
>>
>> - Immunity to destruction in the wrong heap
>> - Immunity to leaving out a virtual destructor on a base class
>> [anything else? I may have forgotten something]
>
> Isn't it that bit of indirection which lets shared_ptr be so free with
> incomplete types? That is the feature which is worth paying for, in my
> view.

Yes!! I forgot that one!

It's very important for minimizing compilation dependencies: a
shared_ptr<T> can be destroyed without having T's definition in the
translation unit.
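[Editor's note: the usual pimpl pattern illustrates this. The sketch below compresses the two translation units into one file, so it cannot truly demonstrate the separation, but the comments mark where each half would live; std::shared_ptr shares this property with boost's.]

```cpp
#include <cassert>
#include <memory>

// --- "header": impl is an incomplete type here -----------------------
struct impl;
struct widget {
    widget(int& destroyed_flag);  // defined below, where impl is complete
    std::shared_ptr<impl> p;      // ~shared_ptr<impl> compiles without
};                                // impl's definition in client code

// --- "source file": impl is now complete -----------------------------
struct impl {
    int& flag;
    explicit impl(int& f) : flag(f) {}
    ~impl() { flag = 1; }
};
widget::widget(int& f) : p(new impl(f)) {}  // destruction logic captured
                                            // here, where impl is complete
```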

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Larry Evans

Mar 23, 2005, 6:09:46 PM
On 03/21/2005 07:55 PM, Peter Dimov wrote:
[snip]

>
>>Is there space allocated for a deleter for each
>>smart pointer?
>
> No, if you don't use the deleter constructor.

>
>>Is there runtime overhead induced by the deleter?
>
> No, unless you use the deleter constructor. (I'm not sure that I

> understand your question properly, because deleters never had time
> costs, only space costs.)
>
>
>>So what's the total cost in size of shared_ptr, itemized as
>>sizeof(shared_ptr), plus the extra memory allocated per shared_ptr instance?
>
>
> sizeof(shared_ptr) is two pointers.

Looked at the code and thought more detail might clarify things a bit.
Here's the detail:

The two pointers in sizeof(shared_ptr) are:

* One pointer to the pointee, P, after possibly any conversion by
shared_ptr<Q> copy CTOR's to some supertype, Q, for example.
* One pointer to an instance of sp_counted_base or "control block"
as Peter calls it below.

sp_counted_base contains the two integers Peter mentions below:

use_count_
weak_count_

sp_counted_base has 2 subclasses:

sp_counted_impl_p //holds pointer
sp_counted_impl_pd //holds pointer and deleter

>
> A default-constructed shared_ptr does not allocate anything.
>

> A shared_ptr constructed from a pointer allocates a control block that
> contains two pointers (one of them the vtable pointer)

And the other is the original pointer passed to the shared_ptr<P>::
shared_ptr(P*) CTOR. This assures the deleter deletes the right
object. The vtable pointer is needed by shared_ptr to call the
dispose method, and, of course, so shared_ptr can delete the correct
sp_counted_base.

> and two integers.

stored in the sp_counted_base.

In this case, the "control block" is a sp_counted_impl_p
defined here:

http://cvs.sourceforge.net/viewcvs.py/boost/boost/boost/detail/sp_counted_impl.hpp?rev=1.2&view=auto

>
> A shared_ptr constructed from a pointer and a deleter has the same
> costs as above plus the size of the deleter. Currently empty deleters
> are not optimized away.

In this case, the "control block" is a sp_counted_impl_pd defined
in the same file as sp_counted_impl_p.
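[Editor's note: a heavily simplified skeleton of the classes named above; see the linked sp_counted_impl.hpp for the real definitions.]

```cpp
#include <cassert>

struct sp_counted_base {
    long use_count_;
    long weak_count_;
    sp_counted_base() : use_count_(1), weak_count_(1) {}
    virtual ~sp_counted_base() {}
    virtual void dispose() = 0;  // frees the managed object
};

template <class P>
struct sp_counted_impl_p : sp_counted_base {    // pointer only
    P* px_;
    explicit sp_counted_impl_p(P* p) : px_(p) {}
    void dispose() { delete px_; }
};

template <class P, class D>
struct sp_counted_impl_pd : sp_counted_base {   // pointer and deleter
    P* px_;
    D del_;                                     // deleter stored by value
    sp_counted_impl_pd(P* p, D d) : px_(p), del_(d) {}
    void dispose() { del_(px_); }
};

// Illustrative "deleter" for demonstration: records that it ran by
// bumping the int it is handed (it does not actually free anything).
inline void mark_disposed(int* p) { ++*p; }
```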

...
Hope I didn't miss anything, and HTH.

Regards,
Larry

Andrei Alexandrescu (See Website for Email)

Mar 23, 2005, 6:11:14 PM
Peter Dimov wrote:
> "Andrei Alexandrescu (See Website for Email)"
> <SeeWebsit...@moderncppdesign.com> wrote in message
> news:4240BB13...@moderncppdesign.com...
>
>>If I understand our email exchange correctly, that vtable also
>>dispatches to functions that take care of the weak_ptr part. Fine. That
>>means the overhead is shared between the weak_ptr and the
>>dynamically-bound deleter.
>
>
> Not exactly true.
>
> weak_ptr, by itself, does not need the vtable. Its presence splits the
> teardown part into two events, one that occurs when all shared_ptr instances
> are gone, and a second when all weak_ptr instances are gone. As explained,
> in the current design this translates to two virtual calls instead of one.
> But weak_ptr doesn't need the vtable to operate.

Great. Who needs the table then, in addition to the dynamically-bound
deleter?

> That is, if we assume a traditional shared_ptr that doesn't use virtual
> dispatch, adding weak_ptr support to it will add one integer and will not
> need a vtable to be added. For most intents and purposes, if you aren't
> being extremely picky, which you are, the cost of weak_ptr is one integer.
>
> This usually comes as a surprise to most people.

I understand (It doesn't come as a surprise to me. I've been toying with
smart pointers until I got blue in the face. Actually I've been through
all of the rainbow colors, sigh.)

One thing that I've noticed during my chromatic avatars is that with
smart pointers everything matters, because they (sp's) are so pervasive.
It's not about pickiness. One extra allocation matters. One extra branch
matters (e.g., Dave Held and my CUJ article shows that null checking
incurs significant overhead). One extra indirection matters. Two extra
indirections matter more. One extra futile interlocked operation could be
lethal. And so on.

>>But that's a very different image of the world than what was promoted so
>>far, which was along the lines of:
>>
>>Q: "Does weak_ptr add overhead to shared_ptr?"
>>A: "Not if you don't use any."
>
> Except the one integer.

Including some housekeeping for maintaining it, as Alex Terekhov clearly
showed in this thread with code and all.

>>Q: "Does a dynamically-bound deleter have overhead?"
>>A: "Not if you don't use one."
>
> Your original Q wasn't really like this Q. This Q has "dynamically-bound" in
> it. The original asked about deleters, period, something I interpreted as
> "custom deleters".

As I said in a previous post, this is a misunderstanding. I consistently
referred to the design of dynamically binding the deleter to the smart
pointer.

> Q: Does the fact that shared_ptr use dynamic dispatch have overhead?
> A: Duh. Of course.
>
> Q: Do custom deleters add any extra overhead beyond the above?
> A: No.
>
> Happy now?

Not happy. Beaming.

>>Great. So there *is* time overhead.
>
> Yes, in theory. It is usually hidden (to the point of unmeasurable) by the
> atomic decrement at one side and the delete p call at the other. You may be
> able to demonstrate it with a benchmark if you try, but I don't guarantee
> success.

Au contraire, do you have numbers supporting your conjecture that the
overhead is hidden to the point of unmeasurable?

One thing that makes our views of the world a tad different is that you
only talk about the overhead of one feature when the overhead of all
features is already there, which misses important overheads imposed by
the design that are there. For example, since your design mandates
*two* dynamic allocations for each pointee object, your baseline starts
there; my baseline starts from the most lean incarnation,
which is: no extra dynamic allocation.

Also, as you say above, you always have the interlocked operations
decided on a per-application basis, so you count the cost of the extra
virtual call when those are already in. And so on.

Now I understand why you misunderstood my references to "deleter" as
"custom deleter". Your view was that the dynamically-bound deleter is a
given, and you start counting from there. My view is that the
dynamically-bound deleter is unnecessary in a large class of
applications, so I start counting from a different watermark.

>>>Yes, that's been pointed out to Peter long ago; he made some judgement
>>>calls about which optimizations were worthwhile for the complexity
>>>they add to the code. IIRC his feeling was that resource release is
>>>generally so expensive that the extra indirection didn't matter.
>>
>>In wake of the news you sent to me today, it looks like it does matter,
>>and that the judgment call above was therefore revised.
>
> I have no idea what you two are talking about, sorry. ;-)

I'm referring to the recent changes you've made to replace the vptr to a
pointer to a regular function.


Andrei

Chris Thomasson

unread,
Mar 23, 2005, 6:10:51 PM3/23/05
to
> 1 do
> 2 {
> 3 p = global_ptr;
> 4 hptr_register(0, p);

store/load in step 4, after store to hptr[0].

> 5 } while (p != global_ptr);
> 6 // use p
> 7 hptr_free(0);


This heavy memory barrier can possibly cause some problems when used with
say, a lock-free stack pop function:


do
{ do /* expensive loop */
{ local = anchor;
*hptr = local;
(store/load)
} while ( local != anchor );

...

} while ( ! cas_release( &anchor, ... ) );


Without hazard pointers, you would just need to do this:


do
{ local = anchor;
read_barrier_depends();
...

} while ( ! dwcas_release( &anchor, ... ) );


The dependent load barriers are more lightweight than (store/load)'s...

Andrei Alexandrescu (See Website for Email)

unread,
Mar 23, 2005, 6:11:58 PM3/23/05
to
Peter Dimov wrote:
> "David Abrahams" <da...@boost-consulting.com> wrote in message
> news:ull8f7...@boost-consulting.com...
>
>>The fact is that having the "deleter-invoking" virtual function in the
>>control block yields a few benefits of its own, even if you leave out
>>the custom deleter capability and always have it doing "delete px":
>>
>>- Immunity to destruction in the wrong heap
>>- Immunity to leaving out a virtual destructor on a base class
>> [anything else? I may have forgotten soemthing]
>
>
> - ability to destroy a shared_ptr when the template argument is incomplete
> - ability to destroy a shared_ptr<void>
> - ability to destroy a shared_ptr<X> when ~X is inaccessible
>
>
>>If you analyze the design carefully, *those* things are what incur the
>>cost of that vtable, and not the custom deleter capability. Once
>>you've got the vtable, the custom deleter is just "low-hanging fruit."
>
>
> Exactly.

First, I see there's been a misunderstanding through this thread. When I
was saying "deleter" I meant "dynamically bound deleter support as
provided by design". I thought the context made that clear.

Also, when I meant "overhead" I meant "overhead brought by the design
decisions" and not "overhead compared to previous implementations of
shared_ptr".

With these corrections made, all of the above is crystal-clear to me,
and was from day one. These are all nice advantages of a design that
binds the deleter dynamically to the smart pointer.

That design decision (1) has a cost, (2) has advantages that aren't
always used. For example, intrusive reference counting
(COM/CORBA/Mozilla...) solves that problem by making the target object
responsible for it.


Andrei

Chris Thomasson

unread,
Mar 23, 2005, 7:02:51 PM3/23/05
to
Sorry for the last post, it only posted to c.p.t. ignore it.
damn outlook! ;)


> Joe Seigh wrote:
>> Roshan Naik wrote:
>>>2) Shared_ptr is not thread safe ?? I dont know. But it is possible that
>>> it is currently not implemented to perform its operations in an
>>>atomic/thread-safe fashion. But perhaps that can be fixed (with help
>>>from the future standard..volatile/CAS). But ofcourse, hazard pointers
>>>technique also relies on the similar requirements from the standard.
>> I believe there's a parameter to make it mostly threadsafe. The non
>> thread-safe part is you have to know the refcount of any shared_ptr's
>> you copy won't go to zero during the copy operation. Basically, you
>> have to "own" the shared_ptr you're copying.
> Which basically conforms to the "an object can't protect against its
> own destruction" mantra. Doesn't that also hold true for the atomic_ptr?
> (Haven't checked yet, so really don't know).


I believe he is talking about the "classic" race condition inherent in all
"basic" reference counting schemes. Here is a description of the data race:

http://groups-beta.google.com/group/comp.lang.c++.moderated/msg/b8c1e7bd6b54f541

Joe's algorithm overcomes this nasty and subtle condition. It allows for
"strong" thread safety via a real atomic pointer.


p.s.

I am leaving the SenderX alias behind. It looks like lock-free is slowly
being taken out of the "X-Files Status"...

;)


--
http://appcore.home.comcast.net/
(portable lock-free data-structures)

Andrei Alexandrescu (See Website for Email)

unread,
Mar 24, 2005, 6:48:08 AM3/24/05
to
{Would both you and David cool things a bit. Others may not understand
and think you are starting a flame war. -mod}

David Abrahams wrote:
> And it has been pointed out to you multiple times that the vtable
> pointer enables other features besides custom deleters. Although I
> think the exact overhead dependencies could have been spelled out more
> clearly, I know you're smart enough to understand what's been said in
> this thread. This just looks like willful misinterpretation on your
> part.

I'd say that's a deeply unfair manner to continue the discussion. You
put me in the situation of either saying "I'm stupid" or "I'm
dishonest". I've seen that in the past, it's a classic, and I don't
think it's nice to use it.

There's a third one: "Peter and I mistakenly thought that whenever you
say 'deleter' you meant 'custom deleter', not the general design notion
'dynamically bound deleter'".

Once and for good: When I talk about deleters, I talk about the
following: the decision on how the resource is being disposed is
fundamentally made dynamically in shared_ptr, and that has a cost in
time, and a cost in space.

In that view, and barring my misunderstanding that the vptr has
additional roles in there, the vptr in there caters 100% to the decision
of dynamically binding the deleter to the smart pointer. I understand
the good consequences of that decision.

I hope we are on the same page starting now and not leave myself to
solve the dilemma of whether I'm stupid or
"willfully-misinterpretingly". Thank you.

(Actually, let me think for a minute...) :o)

>>>The vtable pointer is never inspected. The other one is passed to a
>>>virtual function in the vtable when use_count drops to zero.
>>
>>That's a bummer because a comparison is cheaper than two indirections.
>
>
> Actually a comparison is done first:
>
> ~shared_count() // nothrow
> {
> if( pi_ != 0 ) pi_->release();
> }
>
> Or do you mean that you want to inspect the "custom deleter" function
> pointer? I wouldn't do that; it would be bad for code size.

Obviously. I think it's more efficient to:

* Not use a vptr, but instead a regular pointer to function

* Have that pointer to function be null if the default deleter is in
effect for a shared_ptr object;

* Test that pointer against zero first, and if so, inline the call
"delete whatever;"

* If the pointer is not zero, do the indirect call, passing whatever
extra information is necessary to the custom deleter (such as, a pointer
to that deleter's state).

* If people ask get_deleter() and you have a null pointer to function,
you can't give them that. But you can always create an object that does
what the default deleter does, so everybody's happy.

>>>>When is there an indirect call through the vtable?
>>>
>>>When the use_count drops to zero.
>>
>>Great. So there *is* time overhead.
>
> Yes, but depending on whether you take View A or P, that overhead may
> or may not be associated with the deleter.

Again, by "deleter" I don't mean "custom deleter". I mean
"dynamically-bound deletion of the resource". My viewpoint is: is there
overhead compared to a design that doesn't dynamically bind the deleter
to the pointer? The answer is "yes".

>>In wake of the news you sent to me today, it looks like it does matter,
>>and that the judgment call above was therefore revised.
>
> But that's all irrelevant to this discussion if we're talking about
> intrinsic design overheads.

Yes, I've reread that email now. Somehow I had thought Peter had done
what I mention above (replace the vptr with a pointer to function). We
can abandon that line of discussion.

>>>Anyhow, for others the same preference is determined by
>>>shared_ptr's compulsory presence of the deleter, whether or not
>
> ^^^^^^^^
>
>>>you have a use for it;
>
>
> Peter:
>
>>Fixed.
>
>
> It seems obvious to me that you were talking about storage at that
> point -- at least you have to admit that "presence" might lead someone
> to think so. You can't change the game now and claim that Peter was
> saying there was no time overhead.

You see, here's the same misunderstanding again. When I say "deleter", I
don't mean "custom deleter". I mean "design to bind resource deletion
dynamically". That design is compulsory in the sense that it costs you
the vptr's presence (space) and use (time) even when you actually know
statically how you dispose of the resource.

I really find it hard to explain myself better. Sorry, if I can't clear
the confusion with this post, I don't have the ability to do much more.

So (hoping that my view has been understood), "fixed" really should be
rewritten as "within the constraints of our design, which necessarily
asks for a dynamic decision on how to dispose the resource, we were
having a suboptimal implementation that wasted some extra space, and
that implementation now has been optimized to use less space".

Now I understand that Peter was consistent in his "Fixed" statement
because he was referring to his previous implementation, not to the
intrinsic tradeoffs of the design - which obviously cannot be "fixed".
It is those intrinsic tradeoffs that my entire discussion has focused
and will focus on, not whatever slack word that was shaved off.

Do I make sense?

Now, I hope you understand my annoyance when seeing "Fixed" without any
qualification. A casual reader might read that post and think, "cool,
shared_ptr somehow is just as efficient as any other design, they found
out *some* way! Cool!"

>>My thesis is that all of these overheads are measurable and, in some
>>cases, prominent.
>
> Agreed.

Awesome (I quote this just for the joy of consensus).

>>if shared_ptr is redesigned to use a pointer to function instead of
>>a vtable.
>
> I'm not sure you've thought that one through completely, or else I'm
> misunderstanding you.

I hope this post will clarify the issues. It's all about replacing the
vptr with a pointer to a regular function, thus avoiding the second
indirection. It's a simple redesign or refactoring if you wish. A clear
example of that design is in my Variant implementation.

> It's easy to imagine a design where you only pay for the
> vtable when a custom deleter is used, but where you don't get the
> benefits of those other two features.

Hmmm... what would that design look like? At a minimum, you must store
a bit somewhere saying "the default deleter is/isn't to be used".

>>The point was, it's not a *magic* design. Features have costs that
>>must be known and understood.
>
>
> Yes. It would be good to have a more explicit description of the
> design's inherent costs somewhere, probably in the Boost
> documentation.

Amen to that.


Andrei

Andrei Alexandrescu (See Website for Email)

unread,
Mar 24, 2005, 6:50:10 AM3/24/05
to
David Abrahams wrote:
> "Thorsten Ottosen" <nes...@cs.auc.dk> writes:
>
>
>>"David Abrahams" <da...@boost-consulting.com> wrote in message
>>news:ull8f7...@boost-consulting.com...
>>| "Andrei Alexandrescu (See Website For Email)"
>><SeeWebsit...@moderncppdesign.com> writes:
>>
>>| The fact is that having the "deleter-invoking" virtual function in the
>>| control block yields a few benefits of its own, even if you leave out
>>| the custom deleter capability and always have it doing "delete px":
>>|
>>| - Immunity to destruction in the wrong heap
>>| - Immunity to leaving out a virtual destructor on a base class
>>| [anything else? I may have forgotten something]
>>|
>>
>>I get the latter, but what do you mean by the first?
>
>
> A not-uncommon problem for windows programmers is that separate
> dynamic libraries (or a dynamic library and the application) are
> linked statically to the runtime. Each static runtime gets its own
> heap. If you allocate memory from one heap, then pass a pointer
> across the dynamic library boundary, and deallocate in the other heap,
> you corrupt memory.

Also, it should be noted that a not-uncommon solution for Windows
programmers is to use COM objects that obviate the problem above.

Andrei

Andrei Alexandrescu (See Website for Email)

unread,
Mar 24, 2005, 6:49:35 AM3/24/05
to
Peter Dimov wrote:
> 100M indirect calls to delete px, where px is new X and ~X is virtual, take
> about 24.1-24.5 seconds for me. 100M virtual calls to same take 22.25-22.65
> seconds. I have verified that the code generated by the compiler does, in
> fact, represent what I have in mind.
>
> No fair bringing virtual destructors into this? Be my guest. 100M indirect
> calls to delete new int: 22 seconds. 100M virtual calls: 21.8 seconds. Hm.
> We ought to be able to do better. Everyone knows that virtual dispatch is
> slow.
>
> One billion indirect calls to delete 0: 6.8 seconds. One billion virtual
> calls: 7.75 seconds. Victory.
>
> So... if you destroy one billion pointers that store a NULL (and keep in
> mind that a default-constructed shared_ptr is not that, you'd need
> shared_ptr<X> px( (X*)0 ) ), you'll be able to gain almost one second by
> using a pointer to function instead of virtual dispatch.
>
> Yeah.

Nah.

Did you make the memory the swiss cheese that it is in a normal
application, or you just allocated stuff in a loop? Did you simulate the
cost realistically, where cache misses do happen, or you just deleted
stuff in a loop, thus bringing a smile to processor's L1 cache?


Andrei

Andrei Alexandrescu (See Website for Email)

unread,
Mar 24, 2005, 6:50:32 AM3/24/05
to
David Abrahams wrote:
> bran...@cix.co.uk (Dave Harris) writes:
>
>
>>da...@boost-consulting.com (David Abrahams) wrote (abridged):
>>
>>>The fact is that having the "deleter-invoking" virtual function in the
>>>control block yields a few benefits of its own, even if you leave out
>>>the custom deleter capability and always have it doing "delete px":
>>>
>>>- Immunity to destruction in the wrong heap
>>>- Immunity to leaving out a virtual destructor on a base class
>> [anything else? I may have forgotten something]
>>
>>Isn't it that bit of indirection which lets shared_ptr be so free with
>>incomplete types? That is the feature which is worth paying for, in my
>>view.
>
>
> Yes!! I forgot that one!
>
> It's very important for minimizing compilation dependencies: a
> shared_ptr<T> can be destroyed without having T's definition in the
> translation unit.

For the record: that's indeed a nice feature. I never contended the
advantages of shared_ptr. But I'd like to clarify what the tradeoffs of
shared_ptr particular design are: advantages *and disadvantages*.

Andrei

Andrei Alexandrescu (See Website for Email)

unread,
Mar 24, 2005, 6:51:29 AM3/24/05
to
Larry Evans wrote:
> Looked at the code and thought more detail might clarify things a bit.

Awesome! This is the kind of transparency that's beneficial. I
understand now how "fixed" is self-consistent, but that's not as beneficial.

> Here's the detail:
>
> The two pointers in sizeof(shared_ptr) are:
>
> * One pointer to the pointee, P, after possibly any conversion by
> shared_ptr<Q> copy CTOR's to some supertype, Q, for example.
> * One pointer to an instance of sp_counted_base or "control block"
> as Peter calls it below.

[Counting on my fingers...]

So now we have one extra compulsory dynamic allocation for any given
shared resource, whether the resource supports intrusive refcounting
(COM, Corba, Mozilla, in-house designed objects...) or not.

> sp_counted_base contains the two integers Peter mentions below:
>
> use_count_
> weak_count_

So now we have one weak_count_ in there, and time dedicated to it, as
Alexander Terekhov very nicely explained it (thanks Alexander, didn't
have the chance to reply to your post because the "thank you" would have
been devoid of relevant content). To quote Alexander:

"Space for extra count and time to initialize it, a branch in release(),
and extra count read/check (count::may_not_store_min decrement()) for
final release()."

We also have sp_counted_base's vptr.

Good. I'm glad things are on the table.

> sp_counted_base has 2 subclasses:
>
> sp_counted_impl_p //holds pointer
> sp_counted_impl_pd //holds pointer and deleter

Yes. Oui. Igen. Ja. Da. Jag. Si. Ya.

(One of the words above does not mean "yes". Which one?)

So there is overhead to initialize the vtable pointer, often paid twice,
because most compilers initialize it once per constructor in the
hierarchy. (I wonder if they
optimize that away nowadays. They do have enough information because
they have access to sp_counted_base's constructors.)

>>A default-constructed shared_ptr does not allocate anything.
>>
>>A shared_ptr constructed from a pointer allocates a control block that
>>contains two pointers (one of them the vtable pointer)
>
> And the other is the original pointer passed to the shared_ptr<P>::
> shared_ptr(P*) CTOR. This assures the deleter deletes the right
> object. The vtable pointer is needed by shared_ptr to call the
> dispose method, and, of course, so shared_ptr can delete the correct
> sp_counted_base.

Excellent. So here are the compulsory virtual calls we were talking about.

>>A shared_ptr constructed from a pointer and a deleter has the same
>>costs as above plus the size of the deleter. Currently empty deleters
>>are not optimized away.
>
> In this case, the "control block" is a sp_counted_impl_pd defined
> in the same file as sp_counted_impl_p.

I believe it's doable to eliminate the overhead of empty deleters.
That's not inherent to shared_ptr's design, and I won't focus on it.

> Hope I didn't miss anything, and HTH.

It sure did! Thanks Larry. That makes everything crystal-clear.


Andrei

Peter Dimov

unread,
Mar 24, 2005, 6:56:55 AM3/24/05
to
David Abrahams wrote:
> "Peter Dimov" <pdi...@gmail.com> writes:
>
>>> In wake of the news you sent to me today, it looks like it does
>>> matter, and that the judgment call above was therefore revised.
>>
>> I have no idea what you two are talking about, sorry. ;-)
>
>
> We may have misinterpreted your post in this thread:
> http://tinyurl.com/47qao

Yes. As a side effect of the recent changes, the get_deleter helper virtual
function is no longer instantiated when a custom deleter is not used; a stub
is instantiated that just returns NULL as per the TR1 specification. This
takes care of the extra type_info instance that has been generated for
boost::checked_deleter<>. The vtable is still in place.

Peter Dimov

unread,
Mar 24, 2005, 6:56:34 AM3/24/05
to
Andrei Alexandrescu (See Website for Email) wrote:

> Peter Dimov wrote:
>> Yes, in theory. It is usually hidden (to the point of unmeasurable)
>> by the atomic decrement at one side and the delete p call at the
>> other. You may be able to demonstrate it with a benchmark if you
>> try, but I don't guarantee success.
>
> Au contraire, do you have numbers supporting your conjecture that the
> overhead is hidden to the point of unmeasurable?

If you insist...

100M "px = new int, if( --pn ) delete px" operations take 25.5-27.5 seconds
for me. 100M "px = new int, if( --pn ) virtual call to delete px" operations
also take 25.5 or 27.5 seconds, depending on which of the two is measured
first. The decrement isn't even atomic, and the runtime is single-threaded.
The allocator isn't exactly known for its speed, granted.

Why did I include the "new int" in the measurements, you'll ask? Because
it's necessarily a part of the lifecycle of a shared_ptr ownership group,
which includes construction from a new X, zero or more copies, and a
destruction. The indirections occur only once in this lifecycle.

If you think that this is not fair, the numbers without the 'new int' are:
5.5 seconds with the virtual call, 5.0 seconds without, for 25M operations
(25M because I needed to preallocate an array of 25M new ints beforehand,
and 100M was a bit much for the VMM). Making the decrement atomic adds about
0.2 seconds to each of these numbers.

So yes, the difference is measurable in a synthetic benchmark. If you are
destroying 25M shared_ptr<int> instances in a tight loop (~vector) you might
be able to gain half a second by removing the indirection. The same can also
be presented as "the indirection adds up to 10% to the cost of destroying a
shared_ptr<int>". ;-)

> One thing that makes our views of the world a tad different is that
> you only talk about the overhead of one feature when the overhead of
> all features is already there, which misses important overheads
> imposed by the design that are there. For example, given that your
> design mandates for *two* dynamic allocations for each pointee object
> makes your baseline start there; my baseline starts from the most
> lean incarnation, which is: no extra dynamic allocation.

It's difficult to make meaningful comparisons between the most lean
incarnation (boost::intrusive_ptr or an equivalent) and shared_ptr, because
they differ too much. shared_ptr belongs to a specific class: it is a
non-intrusive reference counted smart pointer.

David Abrahams

unread,
Mar 24, 2005, 6:57:31 AM3/24/05
to
David Abrahams <da...@boost-consulting.com> writes:

>> My thesis is that all of these overheads are measurable and, in some
>> cases, prominent.
>
> Agreed.

Actually I think I agreed too fast. What you say seems plausible, but
I have not got enough data to judge one way or another.

Dave Harris

unread,
Mar 24, 2005, 9:52:35 AM3/24/05
to
SeeWebsit...@moderncppdesign.com (Andrei Alexandrescu (See Website
for Email)) wrote (abridged):

>> Isn't it that bit of indirection which lets shared_ptr be so free with
>> incomplete types? That is the feature which is worth paying for, in my
>> view.
>
> It is - when it's being used. The key is to be able to take it out of
> the package deal when performance matters to a portion of your code.

Sure.


> By the way, if one did not forget to define the destructor virtual in
> the base class, are we looking at two virtual calls there during
> destruction?

Yes, I think so.

It'd be nice to eliminate one of them by having the managed object inherit
from the control block (which would also avoid a memory allocation).
However, I suspect the support for weak pointers rules that out, at least
as currently implemented.

We have a cluster of features which are not very orthogonal.

-- Dave Harris, Nottingham, UK

Alexander Terekhov

unread,
Mar 24, 2005, 9:49:57 AM3/24/05
to
< gack. this message supersedes 4242A785...@web.de which
supersedes 4242946...@web.de. sorry for inconvenience. >

Chris Thomasson wrote:
>
>> 1 do
>> 2 {
>> 3 p = global_ptr;
>> 4 hptr_register(0, p);
>
> store/load in step 4, after store to hptr[0].
>
>> 5 } while (p != global_ptr);

Ignoring data-dependency and branching, that store/load between (4)
and (5) doesn't affect subsequent stores caused by mutating-something
operations on pointed object (in general case, not when client's
operations on pointed object are limited to read-only accesses).

Another problem is that that constraint on load reordering preceding
validation doesn't really help to ensure proper visibility/ordering
(apart from correct validation) when the pointed object is destroyed
and reclaimed "in between" (3) and (4), and the same address, BUT of
a new object, is published in the global_ptr in the meantime before
(5). ABA, y'know. You really need to "separate" subsequent accesses
from validation (e.g. to ensure "correct value prediction"*** across
validation... ensuring its correctness across registration as result
of store/load fencing is not enough). So,

T* fresh = global_ptr.load(msync::naked_competing);
do {
hptr_register(0, p = fresh);
barrier(msync::slfence);
} while (p == (fresh = global_ptr.load(msync::ddacq)));

uhmm, or rather

p = global_ptr.load(msync::naked_competing);
do {
hptr_register(0, p);
// implied msync::slfence with p passed by reference below
} while (!global_ptr.validate(p, msync::ddacq));

(in read-only case, msync::ddacq can be "degraded" to msync::ddhlb)
to save a bit on keystrokes and make it easier for implementations
on alpha-like architectures to sink "mb"/"rmb" instruction (caused
by msync::ddacq/msync::ddhlb above) out of the loop.

Oder?

regards,
alexander.

***) http://www.cs.wisc.edu/~cain/pubs/micro01_correct_vp.pdf


David Abrahams

unread,
Mar 24, 2005, 9:59:00 AM3/24/05
to
"Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> writes:

> {Would both you and David cool things a bit. Others may not understand
> and think you are starting a flame war. -mod}

To everyone else: Andrei and I are friends (at least I still think so
;->); there's no flame war going on here.

> David Abrahams wrote:
>> And it has been pointed out to you multiple times that the vtable
>> pointer enables other features besides custom deleters. Although I
>> think the exact overhead dependencies could have been spelled out more
>> clearly, I know you're smart enough to understand what's been said in
>> this thread. This just looks like willful misinterpretation on your
>> part.
>
> I'd say that's a deeply unfair manner to continue the discussion. You
> put me in the situation of either saying "I'm stupid" or "I'm
> dishonest". I've seen that in the past, it's a classic, and I don't
> think it's nice to use it.

Sorry, it wasn't meant to be offensive, and it wasn't meant as a
tactic. I was just saying that I was trying hard to figure out how
you could have been persisting along the line that prompted it and
felt like I was left no alternative interpretations at that point.
It's a good opening for you to offer another explanation as you did
below. You're hardly forced into a choice between two ugly
admissions.

> There's a third one: "Peter and I mistakenly thought that whenever
> you say 'deleter' you meant 'custom deleter', not the general design
> notion 'dynamically bound deleter'".

It's worth noting that the Boost documentation and discussions
established a meaning for the term deleter in this context -- at the
very least, in relation to the topic at hand: shared_ptr -- long ago
and I think you could take at least a little responsibility for the
misunderstanding here. What Peter and I "mistakenly thought you
meant" could just as easily be described as what you "carelessly
called a deleter."

>>>>The vtable pointer is never inspected. The other one is passed to a
>>>>virtual function in the vtable when use_count drops to zero.
>>>
>>>That's a bummer because a comparison is cheaper than two indirections.
>>
>>
>> Actually a comparison is done first:
>>
>> ~shared_count() // nothrow
>> {
>> if( pi_ != 0 ) pi_->release();
>> }
>>
>> Or do you mean that you want to inspect the "custom deleter" function
>> pointer? I wouldn't do that; it would be bad for code size.
>
> Obviously.

Not to me.

> I think it's more efficient to:
>
> * Not use a vptr, but instead a regular pointer to function
>
> * Have that pointer to function be null if the default deleter is in
> vigor for a shared_ptr object;
>
> * Test that pointer against zero first, and if so, inline the call
> "delete whatever;"

It's not more efficient if you mostly care about code size.

Furthermore, and much more importantly, you lose the benefits of what
you're calling "dynamically bound deleters" except in the case where a
custom deleter is specified.

>>>>>When is there an indirect call through the vtable?
>>>>
>>>>When the use_count drops to zero.
>>>
>>>Great. So there *is* time overhead.
>>
>> Yes, but depending on whether you take View A or P, that overhead may
>> or may not be associated with the deleter.
>
> Again, by "deleter" I don't mean "custom deleter". I mean
> "dynamically-bound deletion of the resource".

Of course I understand what you're saying; that's essentially View A.

> My viewpoint is: is there overhead compared to a design that doesn't
> dynamically bind the deleter to the pointer? The answer is "yes".

Of course.

>>>That could be optimized down to a failed comparison in
>>> the default deleter case
>>>if shared_ptr is redesigned to use a pointer to function instead of
>>>a vtable.
>>
>> I'm not sure you've thought that one through completely, or else I'm
>> misunderstanding you.
>
> I hope this post will clarify the issues. It's all about replacing the
> vptr with a pointer to a regular function, thus avoiding the second
> indirection. It's a simple redesign or refactoring if you wish. A clear
> example of that design is in my Variant implementation.

I'm familiar with the technique; Boost.Function has been doing that
since its inception. I still don't think you've completely thought
through the potential optimization here. I'm thinking of code size
and loss of the benefits of binding deletion at the moment of
construction.

>> It's easy to imagine a design where you only pay for the
>> vtable when a custom deleter is used, but where you don't get the
>> benefits of those other two features.
>
> Hmmm... how would that design look like? At the minimum, you must store
> a bit somewhere "the default deleter is/isn't to be used".

I didn't say there was no cost whatsoever in that case; I just said
there's no vtable. But if we must, let's steal a bit from the
reference count ;-)

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

Peter Dimov

Mar 24, 2005, 10:00:32 AM

Allocated and deleted stuff in a loop. Go L1 cache! I'll leave it to others
with a vested interest to demonstrate the superiority of a pointer to
function in a realistic situation. Good luck.

Peter Dimov

Mar 24, 2005, 7:58:08 PM
Andrei Alexandrescu (See Website for Email) wrote:
> Larry Evans wrote:
>> Looked at the code and thought more detail might clarify things a
>> bit.
>
> Awesome! This is the kind of transparency that's beneficial. I
> understand now how "fixed" is self-consistent, but that's not as
> beneficial.
>
>> Here's the detail:
>>
>> The two pointers in sizeof(shared_ptr) are:
>>
>> * One pointer to the pointee, P, after possibly any conversion by
>> shared_ptr<Q> copy CTOR's to some supertype, Q, for example.
>> * One pointer to an instance of sp_counted_base or "control block"
>> as Peter calls it below.

The "control block" terminology is due to Pete Becker.

> [Counting on my fingers...]
>
> So now we have one extra compulsive dynamic allocation for any given
> shared resource, whether the resource supports intrusive refcounting
> (COM, Corba, Mozilla, in-house designed objects...) or not.

Nope, you have one extra compulsive dynamic allocation for any given shared
resource managed by a shared_ptr. Obviously if you don't need a shared_ptr
(or a weak_ptr) to that resource, you are free to use a different smart
pointer, like boost::intrusive_ptr.

Larry Evans

Mar 24, 2005, 7:57:46 PM
On 03/23/2005 05:09 PM, Larry Evans wrote:
The following includes a little *more* detail which
maybe helps more.
[snip]

> The two pointers in sizeof(shared_ptr) are:

Actually, to be more specific:

The two pointers in sizeof(shared_ptr<T>) are:

>
> * One pointer to the pointee, P, after possibly any conversion by
> shared_ptr<Q> copy CTOR's to some supertype, Q, for example.

T * px; // contained pointer

> * One pointer to an instance of sp_counted_base or "control block"
> as Peter calls it below.

detail::shared_count pn; // reference counter

and shared_count just contains:

sp_counted_base * pi_;

[snip]


> sp_counted_base has 2 subclasses:
>
> sp_counted_impl_p //holds pointer
> sp_counted_impl_pd //holds pointer and deleter
>

Actually, this is more accurate:

sp_counted_base has 2 *families* of subclasses formed from templates:

sp_counted_impl_p <X> //holds pointer
sp_counted_impl_pd<X,D> //holds pointer and deleter

where X is the type used in the:

template<class X>
shared_ptr<X>::shared_ptr(X*);

and

template<class X,class D>
shared_ptr<X>::shared_ptr(X*, D);

CTORS. The X template parameter is what enables the
sp_counted_impl_p* to "do the right thing" with respect to
deleting the pointee.

[snip]


>>A shared_ptr constructed from a pointer allocates a control block that
>>contains two pointers (one of them the vtable pointer)
>
> And the other is the original pointer passed to the shared_ptr<P>::

> shared_ptr(P*) CTOR:

And the other is the original pointer passed to the shared_ptr<X>::
shared_ptr(X*) CTOR:

The actual declaration in:

http://cvs.sourceforge.net/viewcvs.py/boost/boost/boost/detail/sp_counted_impl.hpp?rev=1.2&view=auto

is:

template<class X> class sp_counted_impl_p: public sp_counted_base
{
private:

X * px_;
...
};

[snip]


>
>>A shared_ptr constructed from a pointer and a deleter has the same
>>costs as above plus the size of the deleter.
>

> In this case, the "control block" is a sp_counted_impl_pd defined
> in the same file as sp_counted_impl_p.
>

The relevant part of sp_counted_impl_pd is:

template<class P, class D> class sp_counted_impl_pd
: public sp_counted_base
{
private:

P ptr; // copy constructor must not throw
D del; // copy constructor must not throw

...
public:
...
sp_counted_impl_pd( P p, D d ): ptr(p), del(d)
...
};


The use of P instead of P* in the above may seem non-intuitive,
but that's the way it is. It actually resolves to an X* where
X is the template arg to shared_ptr, which makes the comment
after 'P ptr' mysterious also. Peter, could you explain
the rationale behind using P instead of P*?

Andrei Alexandrescu (See Website for Email)

Mar 24, 2005, 8:06:18 PM

Of course. I was just replying to state clearly that your numbers are
irrelevant. So will the jury please disregard those numbers :o).

Andrei

Andrei Alexandrescu (See Website for Email)

Mar 24, 2005, 8:04:39 PM

As I pointed out in my previous post, these numbers are irrelevant.

Granted, it's hard to come up with salient numbers. These really offer
about zero information. Dave Held and I in our tests on smart pointers
(C++ Users Journal, April 2004) tried at least to fragment the memory
before testing anything, so we give the allocator a run for its money.

Incidentally, I've just been teaching assistant for an architecture
class (http://www.cs.washington.edu/education/courses/cse378/05wi/),
with which occasion I got up to snuff with the latest and greatest in
terms of speculative execution, branch prediction, caching etc. I can
say that virtual calls in a loop will be an easy lunch for any of
today's processors. The thing is, in a real application things don't
happen that way; many other activites happen intermingled with the
smart-ptr related stuff, and that will trip the cache and the branch
predictor.

>>One thing that makes our views of the world a tad different is that
>>you only talk about the overhead of one feature when the overhead of
>>all features is already there, which misses important overheads
>>imposed by the design that are there. For example, given that your
>>design mandates for *two* dynamic allocations for each pointee object
>>makes your baseline start there; my baseline starts from the most
>>lean incarnation, which is: no extra dynamic allocation.
>
> It's difficult to make meaningful comparisons between the most lean
> incarnation (boost::intrusive_ptr or an equivalent) and shared_ptr, because
> they differ too much. shared_ptr belongs to a specific class: it is a
> non-intrusive reference counted smart pointer.

I guess I'd say "my point exactly!" Hm, I'd actually add: "...with
dynamically-bound deletion". :o)


Andrei

Andrei Alexandrescu (See Website for Email)

Mar 25, 2005, 5:59:53 AM
David Abrahams wrote:
>>There's a third one: "Peter and I mistakenly thought that whenever
>>you say 'deleter' you meant 'custom deleter', not the general design
>>notion 'dynamically bound deleter'".
>
> It's worth noting that the Boost documentation and discussions
> established a meaning for the term deleter in this context -- at the
> very least, in relation to the topic at hand: shared_ptr -- long ago
> and I think you could take at least a little responsibility for the
> misunderstanding here. What Peter and I "mistakenly thought you
> meant" could just as easily be described as what you "carelessly
> called a deleter."

Upon thinking about it, now I understand how my misuse of "deleter"
(when probably the closest term that I should have used was "deleTION
function or functor") engendered the whole confusion that we needed to
spend some dozen posts to clarify. For that, Dave and Peter, please
accept my public apologies.

Now I understand what Peter's "Fixed" really meant: "You must be referring
to an older implementation of shared_ptr, which had some slack data even
for an empty deleter. That was fixed." (For the record, I don't stay up
to date with shared_ptr's implementation; I only focus on its design
decisions and their consequences.)

>>>It's easy to imagine a design where you only pay for the
>>>vtable when a custom deleter is used, but where you don't get the
>>>benefits of those other two features.
>>
>>Hmmm... how would that design look like? At the minimum, you must store
>>a bit somewhere "the default deleter is/isn't to be used".
>
>
> I didn't say there was no cost whatsoever in that case; I just said
> there's no vtable. But if we must, let's steal a bit from the
> reference count ;-)

That's more time to manipulate :o).

Ok, so sorry again for the long detour in aligning our terminologies.
Allow me to summarize below the disadvantages of shared_ptr. Its
advantages have been mentioned here and elsewhere.

* sizeof is double compared to a regular pointer. The extra pointer
points to a control block.

* There is one extra dynamic allocation hit per shared object (hopefully
that could be avoided as per the idea I mentioned in another thread,
idea that I see wasn't foreign to Dave Harris. He, however, opined that
weak_ptr support rules that optimization out.)

* Users cannot substitute a custom dynamic memory allocator for the
control block, even if they have one specializing in small objects of a
specific size.

* There is no way to exploit a platform-specific, framework-specific, or
application-specific (COM, CORBA, Mozilla, etc.) intrusive reference
counting, even when that exists.

* The control block maintains one pointer that caters for deletion of
the shared object. "Maintains" implies in this case two assignments
(could be reduced to one if the compiler's smart) during construction,
and one more assignment during destruction (could be reduced to zero if
the compiler's smart).

* The deletion of the shared resource costs one extra virtual call into
a vtable that's distinct from the managed object's vtable (cache miss
opportunity), in addition to the managed object's (often virtual)
destructor call. It also costs one more dynamic memory deallocation
corresponding to the extra memory allocation mentioned above.

* The control block maintains (= spends space and time on) one integer
that caters for weak pointers, even in designs that don't use weak
pointers (most).

* The code performs atomic increment and decrement of the reference
count. That uses the Windows Interlocked*** API on Windows and currently
a full-blown mutex (gasp!) on the other systems. The atomic operations
can be replaced with unsynchronized operations on a per-build basis. On
systems that do support some system-specific way of doing atomic
increment, there is no way for users to plug in custom code for the
atomic operations.

* There is no checking for null upon dereference, or there is checking
on a per-build basis.

Anything I forgot?


Andrei

Paavo Helde

Mar 25, 2005, 6:00:20 AM
David Abrahams <da...@boost-consulting.com> wrote in
news:u64zit...@boost-consulting.com:

> A not-uncommon problem for windows programmers is that separate
> dynamic libraries (or a dynamic library and the application) are
> linked statically to the runtime. Each static runtime gets its own
> heap. If you allocate memory from one heap, then pass a pointer
> across the dynamic library boundary, and deallocate in the other heap,
> you corrupt memory.

This problem can be partially fixed by using the DLL runtimes. However,
there are at least two DLL runtimes - Debug and Release. If you mix
DLLs built against different runtimes, then the problem reappears.

That's one of the reasons why in my projects allocated memory chunks always
move only together with a pointer to the releasing function. I could have
used Boost smart pointers, but I was also afraid of locking overhead - my
objects (i.e. smart pointers to them) cross thread boundaries only at
dedicated delivery points and do not need thread locking at all.

Paavo

Peter Dimov

Mar 25, 2005, 6:10:08 AM
Andrei Alexandrescu (See Website for Email) wrote:
> Peter Dimov wrote:
>> Allocated and deleted stuff in a loop. Go L1 cache! I'll leave it to
>> others with a vested interest to demonstrate the superiority of a
>> pointer to function in a realistic situation. Good luck.
>
> Of course. I was just replying to state clearly that your numbers are
> irrelevant. So will the jury please disregard those numbers :o).

Eh no, it doesn't work that way. In the absence of any other numbers, an
absence that you could easily have corrected, these are our best bet. Just
because another test _might have given_ different results doesn't mean that
it will actually give different results.

In short, facts are never irrelevant.

Peter Dimov

Mar 25, 2005, 6:10:51 AM
Larry Evans wrote:
>
> The relevant part of sp_counted_impl_pd is:
>
> template<class P, class D> class sp_counted_impl_pd
>> public sp_counted_base
> {
> private:
>
> P ptr; // copy constructor must not throw
> D del; // copy constructor must not throw
>
> ...
> public:
> ...
> sp_counted_impl_pd( P p, D d ): ptr(p), del(d)
> ...
> };
>
>
> The use of P instead of P* in the above may seem non-intuitive,
> but that's the way it is. It actually resolves to an X* where
> X is the template arg to shared_ptr, which makes the comment
> after 'P ptr' mysterious also. Peter, could you explain
> the rationale behind using P instead of P*?

This is going a bit off-topic, but the reason is that detail::shared_count
is not limited to pointers, it can reference count any resource. Nothing in
Boost makes use of this capability, though.
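The public-facing analogue of this "any resource" capability is the custom-deleter constructor: a shared_ptr can own something that is not a plain heap object, such as a C stdio FILE handle. A minimal sketch (std::shared_ptr is used here so the snippet is self-contained; boost::shared_ptr has the same constructor):

```cpp
#include <cstdio>
#include <memory>

// The deleter bound at construction knows how to release the resource,
// so client code never needs to remember that this pointee wants
// fclose rather than delete.
std::shared_ptr<std::FILE> open_file(const char* name, const char* mode)
{
    std::FILE* f = std::fopen(name, mode);
    if (!f)
        return std::shared_ptr<std::FILE>();  // empty on failure;
                                              // never bind fclose to null
    return std::shared_ptr<std::FILE>(f, &std::fclose);
}
```

Note the guard against a null FILE*: the deleter runs unconditionally when the last owner goes away, and fclose(NULL) is undefined.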

Peter Dimov

Mar 25, 2005, 6:11:34 AM
Andrei Alexandrescu (See Website for Email) wrote:
>
> As I pointed out in my previous post, these numbers are irrelevant.
>
> Granted, it's hard to come up with salient numbers. These really offer
> about zero information. Dave Held and I in our tests on smart pointers
> (C++ Users Journal, April 2004) tried at least to fragment the memory
> before testing anything, so we give the allocator a run for its money.

I disagree. I can extract enough useful information from these numbers. I
will also be able to extract useful information from numbers obtained in a
different way.

Fragmenting the memory and giving the allocator a hard time was not my
intent, because this would hide the virtual call overhead. I deliberately
used a small object with a trivial destructor. (The small object allocator
is usually non-fragmentable, BTW; I'm surprised that your results showed
otherwise. Are your tests available?)

> Incidentally, I've just been teaching assistant for an architecture
> class (http://www.cs.washington.edu/education/courses/cse378/05wi/),
> with which occasion I got up to snuff with the latest and greatest in
> terms of speculative execution, branch prediction, caching etc. I can
> say that virtual calls in a loop will be an easy lunch for any of
> today's processors. The thing is, in a real application things don't
> happen that way; many other activites happen intermingled with the
> smart-ptr related stuff, and that will trip the cache and the branch
> predictor.

I know fairly well that benchmarks don't represent what happens in a real
application, and I've been saying that for years. The difference is that you
seem to think that in real applications the overhead is more visible. This
hasn't been the case in any of the real applications I've seen.

Emil

Mar 25, 2005, 6:11:13 AM
I think this kind of discussion will not lead anywhere. I think the
first thing that needs to be established when there is a conflict of
any sort, is what is the problem.

Saying that shared_ptr has too much overhead is bogus. Too much
overhead compared to what? To my knowledge, shared_ptr is unique among
the smart pointers in that it decouples the deletion of the object from
the type of the smart pointer. I am not aware of another smart pointer
framework that supports this; therefore I tend to accept that
shared_ptr has exactly the amount of overhead necessary to implement
this feature.

So, if the problem is still "overhead", we must first find a base to
compare against, e.g. an alternative design or implementation of this
special shared_ptr feature.

If the problem is not overhead, then I think step 1 is to spell it out.

Emil

Andrei Alexandrescu (See Website For Email)

Mar 25, 2005, 6:12:16 AM
Peter Dimov wrote:
>>So now we have one extra compulsive dynamic allocation for any given
>>shared resource, whether the resource supports intrusive refcounting
>>(COM, Corba, Mozilla, in-house designed objects...) or not.
>
>
> Nope, you have one extra compulsive dynamic allocation for any given shared
> resource managed by a shared_ptr. Obviously if you don't need a shared_ptr
> (or a weak_ptr) to that resource, you are free to use a different smart
> pointer, like boost::intrusive_ptr.

Ok ok ok, I considered that implicit because I was talking about
shared_ptr's overheads, and it would be unreasonable even for someone
like me to claim that shared_ptr has an overhead even when not used :o).

By the way, what's an approximate tally of the number of smart pointers
in boost?


Andrei

Peter Dimov

Mar 25, 2005, 11:38:34 PM
Andrei Alexandrescu (See Website For Email) wrote:
> Peter Dimov wrote:
>>> So now we have one extra compulsive dynamic allocation for any given
>>> shared resource, whether the resource supports intrusive refcounting
>>> (COM, Corba, Mozilla, in-house designed objects...) or not.
>>
>> Nope, you have one extra compulsive dynamic allocation for any given
>> shared resource managed by a shared_ptr. Obviously if you don't need
>> a shared_ptr (or a weak_ptr) to that resource, you are free to use a
>> different smart pointer, like boost::intrusive_ptr.
>
> Ok ok ok, I considered that implicit because I was talking about
> shared_ptr's overheads, and it would be unreasonable even for someone
> like me to claim that shared_ptr has an overhead even when not used
> :o).

You've come pretty close to that, in my opinion.

It is, ironically, the versatility of shared_ptr that even allows you to
make the claim above that it forces an extra allocation in situations where
intrusive counting could have been used. Most other non-intrusive smart
pointers don't support such objects at all, so naturally they are immune to
the claim that they force an extra dynamic allocation in this case. So yeah,
compared to something that doesn't work, shared_ptr has extra overhead.

Intrusive vs non-intrusive counting is _the_ axis that separates the two
major classes of smart pointers. Comparisons between particular instances,
one of each side, aren't very useful, except if you are really desperate.

It'd be more interesting if you pick a non-intrusive pointer to compare
against. Since you've focused on the evilness of the extra allocation,
reference linked would be your best bet, and I guess you'd also like for us
to constrain ourselves to single-threaded cases. ;-)

So:

++ no extra dynamic allocation
-? pointer operations instead of integer increment/decrement on
copy/assign/destroy, L1 cache miss opportunity
-- *3 instead of *2 per instance
-- lack of most shared_ptr features, except
=? ability to use a statically bound, type encoded, stateless deleter
++ deleter can be inlined
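For concreteness, a minimal single-threaded sketch of the reference-linked scheme being scored above (the name linked_ptr and all details are invented for illustration): copies of a pointer splice themselves into a circular doubly-linked list, so no count is ever allocated, at the price of three words per instance.

```cpp
#include <cassert>

// Reference-linked smart pointer sketch (single-threaded, copy-only).
// All owners of one pointee form a circular doubly-linked ring; the
// last owner to leave the ring deletes the pointee.
template<class T>
class linked_ptr
{
    T* px_;
    mutable const linked_ptr* prev_;
    mutable const linked_ptr* next_;  // three words per instance

public:
    explicit linked_ptr(T* p = 0) : px_(p) { prev_ = next_ = this; }

    linked_ptr(const linked_ptr& r) : px_(r.px_)
    {
        // splice ourselves into r's ring, right after r
        prev_ = &r;
        next_ = r.next_;
        r.next_ = this;
        next_->prev_ = this;
    }

    ~linked_ptr()
    {
        if (next_ == this)          // sole owner: release the resource
            delete px_;
        else                        // otherwise just unlink from the ring
        {
            prev_->next_ = next_;
            next_->prev_ = prev_;
        }
    }

    T& operator*() const { return *px_; }
    T* operator->() const { return px_; }
    bool unique() const { return next_ == this; }

private:
    linked_ptr& operator=(const linked_ptr&);  // omitted in this sketch
};
```

Copy, destroy, and unique() all touch the neighbors' nodes, which is the "pointer operations instead of integer increment/decrement, L1 cache miss opportunity" entry in the tally above.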

> By the way, what's an approximate tally of the number of smart
> pointers in boost?

Six, I think. I only use three of them, but the rest do have their fans.
Seven, if you count std::auto_ptr among them, four, if you subscribe to the
notion that the array variants aren't separate pointers (the language now
gives us the tools to fold them into their non-array counterparts). A
reasonably complete family would include:

unique_ptr<X, D> // subsumes scoped_ptr and auto_ptr
shared_ptr<X>
weak_ptr<X>
intrusive_ptr<X>

David Abrahams

Mar 25, 2005, 11:43:31 PM
"Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> writes:

>> I didn't say there was no cost whatsoever in that case; I just said
>> there's no vtable. But if we must, let's steal a bit from the
>> reference count ;-)
>
> That's more time to manipulate :o).
>
> Ok, so sorry again for the long detour in aligning our terminologies.
> Allow me to summarize below the disadvantages of shared_ptr.

With respect to...?

...at first it seems to be intrusive_ptr you're measuring against;
...later you seem to be comparing with an idealized policy-based pointer.


> Its
> advantages have been mentioned here and elsewhere.
>
> * sizeof is double compared to a regular pointer. The extra pointer
> points to a control block.
>
> * There is one extra dynamic allocation hit per shared object (hopefully
> that could be avoided as per the idea I mentioned in another thread,
> idea that I see wasn't foreign to Dave Harris. He, however, opined that
> weak_ptr support rules that optimization out.)

We have been down that road; see other posts.

> * Users cannot substitute a custom dynamic memory allocator for the
> control block, even if they have one specializing in small objects of a
> specific size.

I think you mean that users can't provide a control-block allocator.
What you wrote sounds like you want to be able to replace ("substitute
for") the control block with an allocator.

> * There is no way to exploit a platform-specific, framework-specific, or
> application-specific (COM, CORBA, Mozilla, etc.) intrusive reference
> counting, even when that exists.

If by "exploit" you mean "use instead of paying for the shared_ptr's
reference count," that's true. You can, of course, always use a
deleter whose only job is to decrement the application-specific count.

> * The control block maintains one pointer that caters for deletion of
> the shared object. "Maintains" implies in this case two assignments
> (could be reduced to one if the compiler's smart) during
> construction,

I think you are talking about the vtable pointer initialization, but
it's not entirely clear.

> and one more assignment during destruction (could be reduced to zero if
> the compiler's smart).

You can also reduce it to one function pointer by using the
Boost.Function's "virtual-function-bloat-combatting trick."

> * The deletion of the shared resource costs one extra virtual call into
> a vtable that's distinct from the managed object's vtable (cache miss
> opportunity), in addition to the managed object's (often virtual)
> destructor call. It also costs one more dynamic memory deallocation
> corresponding to the extra memory allocation mentioned above.

Unless there are weak_ptrs hanging around, in which case that extra
deletion exists but is delayed.

> * The control block maintains (= spends space and time on) one integer
> that caters for weak pointers, even in designs that don't use weak
> pointers (most).

Yes. It's worth noting _where_ time is spent when no weak pointers
are used: upon construction and destruction.

> * The code performs atomic increment and decrement of the reference
> count. That uses the Windows Interlocked*** API on Windows and currently
> a full-blown mutex (gasp!) on the other systems.

It would probably be more appropriate to leave out the drama and just
discuss the costs.

> The atomic operations can be replaced with unsynchronized operations
> on a per-build basis. On systems that do support some
> system-specific way of doing atomic increment, there is no way for
> users to plug in custom code for the atomic operations.

I don't understand what you're trying to communicate with the last
sentence.

> * There is no checking for null upon dereference, or there is checking
> on a per-build basis.

Now you seem to be comparing with some idealized policy-based pointer.

> Anything I forgot?

Well, yeah, if you're comparing against the ultimate configurable
smart pointer that has no intrinsic overheads:

* You pay for a reference count (policy_ptr can emulate scoped_ptr,
right?)

* You pay for an internal pointer to the resource. I can imagine that
for some generic code a "smart" pointer that's always 0, and so can be
optimized away with the empty base class optimization, might be
useful.

* You pay to store the source code, understand the design, compile the
headers, and pick your way through the squabbling among experts over
what the costs are ;-)

Now I'm just being facetious with the last bullet, of course, but I
think these three make a valid point. When the ultimate configurable
smart pointer has the same advantages as shared_ptr, it has (at least)
the same intrinsic design costs. It's a little strange to call these
costs "overhead" without noting that fact. I understand "overhead" to
apply to a method of accomplishing some thing X, where you can measure
against some ideal method of accomplishing that same thing X. For
example, the mutex-based way of implementing shared_ptr has measurable
overhead when compared to the lock-free way. But you haven't said what
the pointer you're measuring against accomplishes.

Carried to an extreme, this is a (not very well-designed) policy-based
pointer with almost no costs:

struct dummy {};
template <class T = dummy, class Policy = void> struct policy_ptr
{};

It also has almost no capabilities. The user can now come along and
configure it to her heart's content!

template <class T>
struct policy_ptr<T, my_policy>
{
// ...
};

I think it's a great idea to have a (very well-designed) policy based
pointer in Boost, and maybe even in the standard. I also think
comparing its time/space costs with the costs of shared_ptr is an
exercise in silliness unless we have to choose between them -- which
we don't. And if we did have to choose just one, we'd also want to
factor in other costs like understandability, interoperability of code
using the component, and others on which policy_ptr does not fare as
well as shared_ptr.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

Peter Dimov

Mar 25, 2005, 11:46:09 PM
Andrei Alexandrescu (See Website for Email) wrote:
> Allow me to summarize below the disadvantages of shared_ptr. Its
> advantages have been mentioned here and elsewhere. [...]

Your list is factually correct, except for some minor nits. But you are
leaving out some interesting details.

Smart pointer operations belong to three major categories, performance wise:

1. access (operator* and friends)
2. copy/assign/destroy
3. initial construction/final destruction

These are ordered by their relative impact on performance. In your list you
neglect to mention the category of every overhead source. This is not
surprising since most of the bullets are (3), with one notable (2)
exception. You say

> * There is no checking for null upon dereference, or there is checking
> on a per-build basis.

without mentioning that this is a category 1 operation that can easily cause
a measurable slowdown in a real application. Even a call to an empty
function at this performance sensitive point may prove problematic if not
optimized out.

> * There is no way to exploit a platform-specific, framework-specific,
> or application-specific (COM, CORBA, Mozilla, etc.) intrusive
> reference counting, even when that exists.

This we already went over. A non-intrusive smart pointer A generated by
policy generator G will not exploit the above either. Another smart pointer
B generated by the same generator might. If there's a difference between
this example and replacing shared_ptr with intrusive_ptr, I don't see it.
You do realize that it's trivial to provide a generator such that sp<T,
shared> and sp<T, intrusive> map to shared_ptr and intrusive_ptr, right?
This doesn't change the underlying pointers. It's just syntactic sugar.
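The generator Peter describes really is a few lines; a sketch (tag names and the sp metafunction are invented here, std::shared_ptr stands in for boost::shared_ptr, and the intrusive branch is only declared since any intrusive_ptr-like type would fit):

```cpp
#include <memory>

// Tag types selecting an ownership scheme.
struct shared {};
struct intrusive {};

template<class T> class intrusive_ptr;  // not defined in this sketch

// Generator mapping (T, policy tag) onto an existing pointer type.
template<class T, class Ownership> struct sp;  // primary: no definition

template<class T> struct sp<T, shared>
{
    typedef std::shared_ptr<T> type;   // boost::shared_ptr in the original
};

template<class T> struct sp<T, intrusive>
{
    typedef intrusive_ptr<T> type;
};
```

As Peter says, this changes only the spelling: sp<T, shared>::type is the same type as the underlying shared_ptr, with identical costs.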

> * Users cannot substitute a custom dynamic memory allocator for the
> control block, even if they have one specializing in small objects of
> a specific size.

This feature is not precluded by the design (which in fact supports
per-count allocators because of the dynamic dispatch), but you are right
that it is not provided by the current Boost implementation. There's an
undocumented #define that replaces the allocator on a per-project basis.

Andrei Alexandrescu (See Website for Email)

Mar 26, 2005, 4:49:25 AM
Emil wrote:
> I think this kind of discussion will not lead anywhere. I think the
> first thing that needs to be established when there is a conflict of
> any sort, is what is the problem.
>
> Saying that shared_ptr has too much overhead is bogus. Too much
> overhead compared to what? To my knowledge, shared_ptr is unique amoung
> the smart pointers in that it decouples the deletion of the object from
> the type of the smart pointer. I am not aware of another smart pointer
> framework that supports this; therefore I tend to accept that
> shared_ptr has exactly the amount of overhead necessary to implement
> this feature.
>
> So, if the problem is still "overhead", we must first find a base to
> compare against, e.g. an alternative design or implementation of this
> special shared_ptr feature.
>
> If the problem is not overhead, then I think step 1 is to spell it out.

Excellent point! Here's my view of the problem.

Shared_ptr (without weak pointers) is in tr1, and will possibly make it
in the standard. If no other pointer will be proposed, and if we ignore
auto_ptr as not being really brainy, shared_ptr will be the *only*
choice of a smart pointer ever offered by C++0X. Users who want a smart
pointer will either have to use shared_ptr, or get or create some
nonstandard smart pointer.

Now, IMHO smart pointers are a performance-sensitive device because they
can be very pervasive and they compete with raw pointers and manual
memory management, which are tedious but fast.

That's why I think offering only shared_ptr is not a desirable state of
affairs. What I am advocating is the adoption (by boost and by the
standard) of a policy-based framework that (1) offers programmers a
broad range of canned designs obtained by mixing and matching simple
policies, (2) offers support for extensibility, giving users the option
of plugging in their own simple policies that are system-specific (such
as system-specific atomic operations, a detail that is very
performance-sensitive).

So really I'm not debating whether or not shared_ptr is a good
implementation within the confines of its design choices. It's a fine
implementation. The whole discussion started from my comment:

"Anyhow, for others the same preference [of using a policy-based smart
pointer instead of shared_ptr] is determined by shared_ptr's compulsory
presence of the deleter, whether or not you have a use for it; for
others it's the compulsory weak_ptr support, whether or not you need
that, etc. "

which, after you adjust for my misleading use of "deleter", reveals that
I was really referring to paying for shared_ptr design choices, not its
implementation's incremental drawbacks, which I am confident can be
reduced to zero.

A policy-based smart pointer is being proposed for boost and is next in
the review chain. One of the designs supported by policy_ptr is exactly
that of shared_ptr, in addition to incomparable flexibility and
extensibility. I believe it would be very good if a policy-based pointer
makes it into the standard, and credit is definitely not what I'm after
- in fact I don't deserve it, because policy_ptr as written by Dave Held
with the help of Jonathan Turkanis is miles ahead of my initial smart_ptr.

Part of making people think about, and understand, the benefits of
policy-based smart pointers is an understanding of the costs of
shared_ptr's design, something that I believe has not been discussed
enough in the past. A policy-based smart pointer in the hands of an
application architect offers much better adjustment of design to need,
and as an added benefit, excellent performance.


Andrei

Carl Barron

unread,
Mar 26, 2005, 12:34:58 PM3/26/05
to
In article <42437322...@moderncppdesign.com>, See Website For
Email <SeeWebsit...@moderncppdesign.com> wrote:

> By the way, what's an approximate tally of the number of smart pointers
> in boost?

Independent smart ptrs include:
intrusive_ptr,
shared_ptr,
shared_array,
scoped_ptr,
scoped_array, and perhaps some I missed.

also weak_ptr, which requires a shared_ptr to exist.

Andrei Alexandrescu (See Website for Email)

unread,
Mar 26, 2005, 12:37:38 PM3/26/05
to
David Abrahams wrote:
>>* The code performs atomic increment and decrement of the reference
>>count. That uses the Windows Interlocked*** API on Windows and currently
>>a full-blown mutex (gasp!) on the other systems.
>
>
> It would probably be more appropriate to leave out the drama and just
> discuss the costs.

Sigh, and I'd only started to enjoy it :o)

>>The atomic operations can be replaced with unsynchronized operations
>>on a per-build basis. On systems that do support some
>>system-specific way of doing atomic increment, there is no way for
>>users to plug in custom code for the atomic operations.
>
> I don't understand what you're trying to communicate with the last
> sentence.

If I work with shared_ptr on a platform that has a specific efficient
way of doing atomic, there's no way for the user to have shared_ptr use
that, barring surgery on shared_ptr's source.
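The kind of hook being asked for might look like the following. This is entirely hypothetical (Boost's actual implementation selects its mechanism internally, per platform, which is exactly the complaint): a traits template the library would route count updates through, and which a user could specialize for a platform-specific primitive.

```cpp
#include <atomic>

// Hypothetical customization point: the library would call these
// instead of hard-wiring Interlocked*/mutex choices internally.
template <class Counter>
struct atomic_ops {
    static long increment(Counter& c) { return c.fetch_add(1) + 1; }  // generic fallback
    static long decrement(Counter& c) { return c.fetch_sub(1) - 1; }
};

// A user's counter type, standing in for "my platform's fast primitive".
// It records how many operations went through the specialization.
struct traced_counter {
    std::atomic<long> value{0};
    long ops = 0;
};

template <>
struct atomic_ops<traced_counter> {
    static long increment(traced_counter& c) { ++c.ops; return c.value.fetch_add(1) + 1; }
    static long decrement(traced_counter& c) { ++c.ops; return c.value.fetch_sub(1) - 1; }
};
```

With a hook like this, exploiting a platform's efficient atomic increment is one specialization away rather than surgery on the library's source.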

>>* There is no checking for null upon dereference, or there is checking
>>on a per-build basis.
>
>
> Now you seem to be comparing with some idealized policy-based pointer.

Nope, with a realistic one :o).

> Now I'm just being facetious with the last bullet, of course, but I
> think these three make a valid point. When the ultimate configurable
> smart pointer has the same advantages as shared_ptr, it has (at least)
> the same intrinsic design costs. It's a little strange to call these
> costs "overhead" without noting that fact. I understand "overhead" to
> apply to a method of accomplishing some thing X, where you can measure
> against some ideal method of accomplishing the same said thing X. For
> example, the mutex-based way of implementing shared_ptr has measurable
> overhead when compared to the lock-free way. But you haven't said what
> the pointer you're measuring against accomplishes.

My point was that, if shared_ptr is all the standard offers, people will
only have the choice of using shared_ptr (and pay for all it costs) no
matter what their design asks for, or use some nonstandard smart pointer
(or use raw pointers). That's why I'm comparing shared_ptr with other
designs.

> Carried to an extreme, this is a (not very well-designed) policy-based
> pointer with almost no costs:
>
> struct dummy {};
> template <class T = dummy, class Policy = void> struct policy_ptr
> {};
>
> It also has almost no capabilities. The user can now come along and
> configure it to her heart's content!
>
> template <class T>
> struct policy_ptr<T, my_policy>
> {
> // ...
> };

(Scribbling down in my notepad...) Ok, noted. :o)

> I think it's a great idea to have a (very well-designed) policy based
> pointer in Boost, and maybe even in the standard. I also think
> comparing its time/space costs with the costs of shared_ptr is an
> exercise in silliness unless we have to choose between them -- which
> we don't. And if we did have to choose just one, we'd also want to
> factor in other costs like understandability, interoperability of code
> using the component, and others on which policy_ptr does not fare as
> well as shared_ptr.

I think it's a good thing because at a point people might have to
choose, in the following sense: "Do we need a policy_ptr, or is
shared_ptr good for everybody?". Given that indeed policy_ptr is more
complex, its advantages, which are user-configurability and efficiency,
must be understood.


Andrei

David Abrahams

unread,
Mar 27, 2005, 5:21:38 AM3/27/05
to
"Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> writes:

>> I think it's a great idea to have a (very well-designed) policy based
>> pointer in Boost, and maybe even in the standard. I also think
>> comparing its time/space costs with the costs of shared_ptr is an
>> exercise in silliness unless we have to choose between them -- which
>> we don't. And if we did have to choose just one, we'd also want to
>> factor in other costs like understandability, interoperability of code
>> using the component, and others on which policy_ptr does not fare as
>> well as shared_ptr.
>
> I think it's a good thing because at a point people might have to
> choose, in the following sense: "Do we need a policy_ptr, or is
> shared_ptr good for everybody?".

I think the question in fact should be "is it good enough for enough
people?" There are some components that are only really needed for a
user group that is so small that it doesn't make sense to standardize
them. If the answer is that it isn't good enough for enough people,
the next question should be "what do we need to put in the standard to
fix that?" Considering policy_ptr would of course be appropriate at
that point.

Anyway I hope that we are indeed faced with that question soon. But
that isn't the same as asking "which one should we choose?" I
maintain that comparing policy_ptr with shared_ptr is really a waste
of time, besides being meaningless as I was trying to demonstrate with
my empty policy_ptr "implementation". All you need to do in order to
move past the first question above is demonstrate that shared_ptr is
probably unacceptable for a substantial fraction of smart pointer
applications.

> Given that indeed policy_ptr is more complex, its advantages, which
> are user-configurabilty and efficiency, must be understood.

User-configurability is an advantage. Efficiency is not, in any
meaningful sense. policy_ptr doesn't have any inherent efficiency;
its efficiency depends almost entirely on how it is configured.
_That_ might be considered an advantage, though.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Peter Dimov

unread,
Mar 27, 2005, 9:33:30 AM3/27/05
to
"Andrei Alexandrescu (See Website for Email)" <SeeWebsit...@moderncppdesign.com> wrote in message news:<4244F647...@moderncppdesign.com>...

> If I work with shared_ptr on a platform that has a specific efficient
> way of doing atomic,

... and the shared_ptr implementation for that platform does not use
these atomic primitives, ...

> there's no way for the user to have shared_ptr use that, barring surgery
> on shared_ptr's source.

... which, once performed, can be contributed back to
boost::shared_ptr, so that the other users of the platform do not need
to reinvent the wheel.

Right?

Jeremy J

unread,
Mar 27, 2005, 8:48:04 PM3/27/05
to
> I think the question in fact should be "is it good enough for enough
> people?"

Good enough is very hard to put your finger on. I can assure you it
(shared_ptr) is not good enough for me, but that is a useless data point
(see below).

> Anyway I hope that we are indeed faced with that question soon. But
> that isn't the same as asking "which one should we choose?" I
> maintain that comparing policy_ptr with shared_ptr is really a waste
> of time, besides being meaningless as I was trying to demonstrate with
> my empty policy_ptr "implementation". All you need to do in order to
> move past the first question above is demonstrate that shared_ptr is
> probably unacceptable for a substantial fraction of smart pointer
> applications.

I maintain that C++ is a general purpose programming language and it
is unacceptable to 'anticipate' the quality of a library component
based on how many people 'might' need it. If you are trying to fill
some design hole, or provide a specific component to help implement
specific idioms, then the success of your attempt is easy to gauge. If
you want to please a 'substantial' number of programmers, then turn
back now, because there are few metrics in the world which would
confirm your 'success'.

From what I can tell boost::shared_ptr is trying to be a thread safe
reference counted pointer, which it seems to do very well. I have no
(very little) use for such pointers. A policy_ptr is trying to be a
(user configurable threading), (user configurable ownership), (user
configurable etc...) pointer, I probably will find a few uses for
such pointers, but that is my business, and does not reflect the
of the pointers. Each fulfills some role in a design, only policy_ptr
happens to be able to fulfill more roles, at the cost of simplicity.

> User-configurability is an advantage. Efficiency is not, in any
> meaningful sense. policy_ptr doesn't have any inherent efficiency;
> its efficiency depends almost entirely on how it is configured.
> _That_ might be considered an advantage, though.

Different strokes, for different folks...

Jeremy Jurksztowicz

David Abrahams

unread,
Mar 28, 2005, 6:58:22 AM3/28/05
to
arke...@myrealbox.com (Jeremy J) writes:

>> I think the question in fact should be "is it good enough for enough
>> people?"
>
> Good enough is very hard to put your finger on. I can assure you it
> (shared_ptr) is not good enough for me, but that is a useless data
> (see below).

Not completely. The best we can do is probably to gather anecdotes
and use intuition.

>> Anyway I hope that we are indeed faced with that question soon. But
>> that isn't the same as asking "which one should we choose?" I
>> maintain that comparing policy_ptr with shared_ptr is really a waste
>> of time, besides being meaningless as I was trying to demonstrate with
>> my empty policy_ptr "implementation". All you need to do in order to
>> move past the first question above is demonstrate that shared_ptr is
>> probably unacceptable for a substantial fraction of smart pointer
>> applications.
>
> I maintain that C++ is a general purpose programming language and it
> is unacceptable to 'anticipate' the quality of a library component
> based on how many people 'might' need it.

It's not about 'anticipating' 'quality'. It's about deciding whether
a component should be _standardized_ or not. There's no clear
consensus in the committee about which domains are important to cover
in the standard library, how many people in need is enough, or on most
other questions that would help to really nail this down. I can tell
you, however, that there seems to be a feeling that some good and
useful things probably shouldn't be standardized. For example, as of
a couple years ago, most people in the LWG weren't convinced a
rational number class template was of enough general utility to
warrant standardizing it. I happened to disagree, but I don't get to
decide how things go.

> If you are trying to fill some design hole, or provide a specific
> component to help implement specific idioms, then the success of
> your attempt is easy to gauge. If you want to please a 'substantial'
> number of programmers, then turn back now, because there are few
> metrics in the world which would confirm your 'success'.

Yes, that's why I wrote "probably unacceptable." In other words, you
need to be reasonably convincing. Nobody expects to be able to
quantify success here.

> From what I can tell boost::shared_ptr is trying to be a thread safe
> reference counted pointer, which it seems to do very well. I have no
> (very little) use for such pointers. A policy_ptr is trying to be a
> (user configurable threading), (user configurable ownership), (user
> confugurable etc...) pointer, I probably will find a few uses for
> such pointers, but that is my buisiness, and does not reflect the
> quality of the pointers.

Quality is not at issue. For the purposes of this discussion, we can
assume they're all excellent. If past experience is any guide, the
committee will want to be convinced that what we have already accepted
into TR1 isn't good enough for some substantial fraction of C++
programmers, and then that policy_ptr is the right way to address that
deficiency.

signing-'off'-now-with-'best'-regards-ly y'rs
dave

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

[ See http://www.gotw.ca/resources/clcm.htm for info about ]

Peter Dimov

unread,
Mar 28, 2005, 12:24:24 PM3/28/05
to
Jeremy J wrote:
> From what I can tell boost::shared_ptr is trying to be a thread safe
> reference counted pointer, which it seems to do very well.

"Trying to be thread safe" is probably not the right way to put it. If you
have thread A operating on shared_ptr p, and thread B operating on
shared_ptr q, we want this to be allowed without explicit user locks. This
is a good thing if these two threads are spawned by two libraries written by
different authors. shared_ptr tries to be as thread unsafe as possible,
while still offering the above guarantee.

Smart pointers that try to be thread safe usually allow the stronger version
of thread safety, where thread A and thread B can operate on the same
pointer p without locks.

You can turn boost::shared_ptr's thread safety off with a #define. What you
can't do is use both a thread safe shared_ptr and a thread unsafe shared_ptr
in the same program (or in some nonportable cases, translation unit.)

Some people view this as a mistake. They believe that shared_ptr should have
an additional template parameter that specifies the level of thread safety.

Emil

unread,
Mar 28, 2005, 6:51:00 PM3/28/05
to
> Shared_ptr (without weak pointers) is in tr1

Correction: weak_ptr is also part of TR1. If you're interested to read
the full text of TR1, it can be found at
http://open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1745.pdf.

> Now, IMHO smart pointers are a performance-sensitive device because they
> can be very pervasive and they compete with raw pointers and manual
> memory management, which are tedious but fast.

I think this point was also missed in much of the preceding
discussions. Smart pointers do compete with raw pointers, but I must
add that this competition is not limited to only speed. The 3 most
valuable features of raw pointers (compared to many smart pointer
classes) are:

1) The ability to work with incomplete/void types
2) The ability to destroy an object regardless of how many pointers
refer to it.
3) Speed.

The above are the 3 reasons why for me personally it was hard to
justify the use of smart pointers in my own code. The reason why I use
shared_ptr now is that it answers the 3 points above:

1) is supported directly
2) is supported through weak_ptr

As for 3), shared_ptr is fast where it matters, namely operator*. In
fact, operator* is so critical to performance that I would not consider
using a smart pointer that does any type of book-keeping or checking
work inside operator* (unless it is done for debugging purposes in
debug builds, but this is beyond the scope of this discussion or any
smart pointer interface.)
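Points (1) and (3) can be seen in a few lines (shown with `std::shared_ptr`, which adopted boost::shared_ptr's behavior in these respects; `Widget` and `make_widget` are invented for the example):

```cpp
#include <memory>
#include <string>

struct Widget;                          // incomplete type: declaration only
std::shared_ptr<Widget> make_widget();  // fine: shared_ptr<Widget> can be
                                        // declared, copied, and returned
                                        // without Widget's definition (point 1)

struct Widget { std::string name; };    // completed here (think: in the .cpp)

std::shared_ptr<Widget> make_widget() {
    return std::make_shared<Widget>(Widget{"w"});
}

// operator* and operator-> do no bookkeeping; they return the stored
// raw pointer, so a dereference costs the same as a raw pointer (point 3).
```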

> That's why I think offering only shared_ptr is not a desirable state of
> affairs. What I am advocating is the adoption (by boost and by the
> standard) of a policy-based framework that (1) offers programmers a
> broad range of canned designs obtained by mixing and matching simple
> policies, (2) offers support for extensibility, giving users the option
> of plugging in their own simple policies that are system-specific (such as
> system-specific atomic operations, a detail that is very
> performance-sensitive).

My only problem with (1) is that it comes at the cost of requiring a
complete type and also the fact that if the different systems in a code
base each uses its own type of smart pointer, they will not be
compatible with each other. Policy-based designs merely make it easy to
create these different types of smart pointers. Because code that
converts from one type of smart pointer to another needs a complete
type and it is usually a function template, a side effect of a system
like this is that it introduces the complete types in header files.

As for 2), platform-specific implementations are of no concern to user
code; they are up to the library implementer to worry about. Besides,
any attempt to specify universal framework for optimal implementations
(which is what policy-based designs try to do) limits the possible
implementations to only those that fit the framework.

> A policy-based smart pointer is being proposed for boost and is next in
> the review chain. One of the designs supported by policy_ptr is exactly
> that of shared_ptr, in addition to incomparable flexibility and
> extensibility. I believe it would be very good if a policy-based pointer
> makes it into the standard, and credit is definitely not what I'm after
> - in fact I don't deserve it, because policy_ptr as written by Dave Held
> with the help of Jonathan Turkanis is miles ahead of my initial smart_ptr.

Could you elaborate more on this topic? What policies are used that
make policy_ptr cover the shared_ptr/weak_ptr semantics?

> Part of making people think about, and understand, the benefits of
> policy-based smart pointers is an understanding of the costs of
> shared_ptr's design, something that I believe has not been discussed
> enough in the past. A policy-based smart pointer in the hands of an
> application architect offers much better adjustment of design to need,
> and as an added benefit, excellent performance.

It is misleading to talk about costs of shared_ptr without talking
about the cost of policy-based designs. True, you can customize a
policy-based implementation to address performance costs; however this
ability is not free. The cost is that 1) we require a complete type and
2) we have to worry about conversion between different pointer types.
Take away 1) and 2) and you are reinventing shared_ptr.

In my mind policy-based designs and shared_ptr/weak_ptr should not be
considered as competitive because they are different. I think
policy-based designs should be considered as a framework which
facilitates implementing smart pointers in the case when shared_ptr's
performance has been proven inadequate. The alternative of policy-based
design in this case is to write your own smart pointer class, which is
very very tricky to get right.

Emil

David B. Held

unread,
Mar 30, 2005, 3:32:47 AM3/30/05
to
Emil wrote:
> [...]

> 1) The ability to work with incomplete/void types
> 2) The ability to destroy an object regardless of how many pointers
> refer to it.
> 3) Speed.
> [...]

>>That's why I think offering only shared_ptr is not a desirable state
>>of affairs. What I am advocating is the adoption (by boost and by the
>>standard) of a policy-based framework that (1) offers programmers a
>>broad range of canned designs obtained by mixing and matching simple
>>policies, (2) offers support for extensibility, giving users the
>>option of plugging their own simple policies that are system-specific
>>(such as a system-specific atomic operations, detail that is very
>>performance-sensitive).
>
> My only problem with (1) is that it comes at the cost of requiring a
> complete type

Would you like to defend this claim? Using my private version of
the smart_ptr<> that Andrei refers to, the following is accepted as
a legal C++ program by gcc 3.4.2:

#include "boost/smart_ptr.hpp"
class foo;
void bar(boost::smart_ptr<>::to<foo> p);
int main() { }

If you mean something other than what I think you mean, please do me
the favor of elaborating with an example program and the accompanying
diagnostic.

> and also the fact that if the different systems in a code
> base each uses its own type of smart pointer, they will not be
> compatible with each other.

Not necessarily true. It depends on whether the configurations are
defined to be convertible or not. If different portions of a code
base use different containers, they may also be incompatible. And
yes, I mean the standard containers. All you need to do is provide
custom allocator policies. If interoperability is the highest
virtue, then VB must be virtuous indeed, with its default type of
Variant. Should we traffic in variant types so that all of our
code is compatible? C++ is about choice...the power to make design
tradeoffs yourself, rather than having a decision forced on you by
the language.

> Policy-based designs merely make it easy to create these different
> types of smart pointers. Because code that converts from one type
> of smart pointer to another needs a complete type and it is usually
> a function template, a side effect of a system like this is that it
> introduces the complete types in header files.

This sounds like you're speaking from experience. Do please share
your experience with us. Typically, a policy configuration will be
selected for a certain set of pointee types, and functions will
traffic in that smart pointer configuration. Gratuitous proliferation
of configurations will obviously complicate matters, but that is not
a problem peculiar to policy-based smart pointers. For the majority
of the code that traffics in a single pointer type for a given pointee
type, the complete type will not need to be known in headers. Where
there is a need to convert from one pointer type to another, that will
typically not be done in headers, nor with template functions.
Rather, it is best to traffic in concrete policy configurations
and allow the constructors to determine convertibility of pointers.

> As for 2), platform-specific implementations are of no concern to
> user code; they are up to the library implementer to worry about.
> Besides, any attempt to specify universal framework for optimal
> implementations (which is what policy-based designs try to do)
> limits the possible implementations to only those that fit the
> framework.

While this is true to some extent, many years of experience have
shown that there are a limited number of reasonable smart pointer
designs, and a policy-based framework is capable of capturing the
vast majority of them, including all of the most interesting ones.
However, if you would like to supply a design that does not fit
within the framework, I'd be glad to take a look at it.

> [...]


> Could you elaborate more on this topic? What policies are used
> that make policy_ptr cover the shared_ptr/weak_ptr semantics?

Well, it's very simple really. I simply ripped the guts out of
{shared|weak}_ptr and stuffed them into some ownership and
storage policies. Short of being able to spell such a configuration
"shared_ptr<>" or "weak_ptr<>", they seem to look, act, and smell
just like the original.

> [...]


> It is misleading to talk about costs of shared_ptr without talking
> about the cost of policy-based designs. True, you can customize a
> policy-based implementation to address performance costs; however
> this ability is not free. The cost is that 1) we require a
> complete type

Until you demonstrate code that proves this, I must dismiss this
claim as FUD.

> and 2) we have to worry about conversion between different pointer
> types.

Only if you want to. Otherwise, you can choose the shared_ptr<>
configuration, and tell all your friends to do the same, and you
get all the benefits of shared_ptr<> while retaining the option
to choose a different configuration at your leisure.

> [...]


> In my mind policy-based designs and shared_ptr/weak_ptr should not
> be considered as competitive because they are different.

You're right. They're not really direct competitors.

> I think policy-based designs should be considered as a framework
> which facilitates implementing smart pointers in the case when
> shared_ptr's performance has been proven inadequate. The alternative
> of policy-based design in this case is to write your own smart
> pointer class, which is very very tricky to get right.

Exactly. But to be more precise, a policy-based framework is a
superset of a fixed-point configuration, because such a configuration
can be emulated within a policy-based framework.

Dave

Peter Dimov

unread,
Mar 30, 2005, 6:26:36 PM3/30/05
to
Andrei Alexandrescu (See Website for Email) wrote:
> * The control block maintains (= spends space and time on) one integer
> that caters for weak pointers, even in designs that don't use weak
> pointers (most).

FWIW, it appears that most of the "time" can be eliminated in an optimal
implementation, leaving one initialization and one atomic comparison (per
ownership group lifetime). The "space" (one integer) will obviously remain.
