Rant: some of willchan's thoughts on WeakPtr, for those who care to read

506 views
Skip to first unread message

William Chan (陈智昌)

unread,
Sep 18, 2012, 4:39:51 AM9/18/12
to chromium-dev
I've been meaning to write this email for awhile, because people have been bugging me for my thoughts on WeakPtr for a long while now. Sorry guys for taking so long, but as you know, I've got a lot of things going on and been traveling a lot. Pardon me for any incoherencies in this rant, it's because eae@ poured me a ginormous long island iced tea, and I am still sobering up.

TL;DR: I generally recommend against using WeakPtr, except when it's in a constrained leaf node situation like PostTaskAndReply() or MessageLoop::current()->PostTask() with a base::Closure with WeakPtr<A> and A::Foo(), where A::Foo() is a private function. In these situations however, I very strongly recommend WeakPtr.

<rant>
Why do I feel this way? Well, in short, it's because in a large codebase, like Chromium, I feel that other uses of WeakPtr can easily lead to a situation where the C++ pointer dependency graph does not match the true ownership graph.

What do I mean by the "true" ownership graph? Well, what I mean is that when A "owns" B, which "owns" C, that A decides when B goes away, and B decides when C goes away. C may reference A, but C has no say over when A goes away.

Now, in a smaller project, I wouldn't worry so much. WeakPtr is pretty awesome. I mean, it automagically NULLs out pointers for you, isn't that awesome? I certainly think it is. It saves on so much boilerplate crap, which helps keep the code short and simple. The problem is, in large code bases, it incentivizes dangerous coding practices.

For example, let's have a dependency chain, where A owns B which owns C which owns D. Conceptually, you'd have A keep B as a member (be it directly as a member variable, or a scoped_ptr<B>), as so on until D. Now, separately in the dependency graph (which arguably you should try to force as much as possible into a tree), you have A also owning M, which owns N. Now, let's say that D references B, and so does N.

The first question to ask when you see that D references B and that N references B is "do D or N own B?". In this case, since I've described it as such, they don't. G is owned by B, and N is owned by M which is owned by A. Essentially, the graph is like:

A
|\
B M
| |
C N
|
D

Now, the interesting thing is, what happens when C also references B, but not in any "important" manner. Let's say that the author of C decides that its reference to B is not actually "important', so it wants to keep a WeakPtr<B>. This is in itself dangerous, and let me explain why. Now, it's OK for B to get deleted *without* notifying C that it should go away. That's because it's what WeakPtr<B> is intended to do - gracefully handle the disappearance of B by returning NULL. Some of the time, B may still directly delete C, but some of the time, C may now outlive B, and gracefully handle B's deletion via WeakPtr logic.

As you can see, this breaks the dependency graph. Now we have something like

A
|\
B M
? |
C N
|
D

Now, what happens if D references A? That becomes problematic. What really *should* happen, is when A goes away, it notifies B it's going away, which notifies C, and C notifies D. But, now that we've introduced this WeakPtr<B> into C, we very possibly have this issue where B goes away, but does *not* notify C. (Now, some of you might be like, why wouldn't B notify C it was going away? Well, if B did that, what's the friggin point of C keeping a WeakPtr<B>?). That problem carries forward to D, since if B doesn't notify C that B is going away, then D sure as hell ain't gonna notice. And if D references B or A, what then? If A/B get deleted, then how is D going to get notified?

Well, what happens in practice is when these dependency chains are broken, that D directly monitors A's deletion. Have you guys ever seen the PROFILE_DESTROYED notification? Let's see https://code.google.com/p/chromium/source/search?q=PROFILE_DESTROYED&origq=PROFILE_DESTROYED&btnG=Search+Trunk. Man, that's a lot of hits. Are all the objects in these hits direct children of Profile? No way. They're directly monitoring profile destruction *because* of broken dependency chains.

Now now, you may be asking, what's so bad about directly monitoring profile destruction? Well, as my theory of ownership goes, Foo owns Bar if Foo decides when Bar goes away. Now, if Foo is monitoring Bar's destruction, then Foo is deciding when Bar goes away.

How does this impact our dependency graph? Back to D monitoring A's destruction, we have a situation where the dependency chain does not match the *true* ownership chain. We essentially are reparenting D to A, instead of being parented by C. That's bad. We now have:

  A
 /|\
D B M
  ? |
  C N

Do people see why this is so bad? When D monitors A's destruction, it's not specified that D gets destroyed before or after the B/M destruction chains. That's a fundamental problem with observer paradigms. Observers are *bad* at preserving ordering. Let's say you know that between the siblings B and M, B must be constructed before M and destroyed after M (this mimics the member variable construction/destruction ordering, where earlier member variables are constructed *before* and destroyed *after* later member variables). It's *much* better when dependency chains match the actual C++ ownership structure. Can you imagine if *all* of all various subsystems (network, extensions, omnibox, autofill, rendering, etc etc) and their offspring reparented directly to the Profile? How would we sort out the ordering dependencies amongst the various subsystems and their offspring? That'd be a disaster. The true dependency graph is important, and we should make our code mirror it as best as we can.

Now, people may ask why, if D references A, we don't just have A make sure to notify B of its deletion, which notifies C, and therefore D.

Well, this is precisely the problem with *large* codebases. In most *sub*sections of the dependency chain, it's *OK* that parent node X does not immediately notify Y that it's going away. Back references aren't *everywhere*. They are common, but not *everywhere*. So what happens is we don't have a situation where A owns B which owns C which owns D which references A, we have A owns B which owns C which owns ... which owns X which references A. *Somewhere* along that dependency chain, we have someone who uses a WeakPtr<A> back reference or something. The dependency chain is long enough that, when you're authoring X, do you *really* want to understand the full dependency chain when you want to reference A? No, hell no. It's way simpler (especially due to OWNERS) to simply acquire a WeakPtr<A> and check if it's NULL before accessing it. And sometimes, X needs to know when A goes away. When that happens, do you think the author of X will try to figure out the entire dependency chain from A to X? HELL NO. That's simply masochistic, there's no incentive structure for that to happen. I mean, you'd have to read all the code to figure out the dependency chain, and then you'd have to plumb all the pointers through all the function arguments down to the spot where you need it, and get all the OWNERS approvals for all the steps along the way. The easy solution is to add a notification in A for when it goes away, and for X to monitor that notification to figure out when to begin its own teardown behavior. The natural (and more importantly, economically incentivized) outcome is that most people take the easy way out, which of course, simply exacerbates the problem.

These fundamental problems of understanding the full dependency graph vs simply taking one off fixes (like observers) lead to situations where the "true" ownership graph does not match the actual pointer structure of the codebase. And it's very easy to see why, I mean, it's a pain to think about the "true" ownership graph, not to mention plumbing the necessary pointers throughout the codebase.

I've been primarily discussing WeakPtrs here, but this general paradigm of automagically gracefully handling deletion is fundamentally dangerous. I'm going to call out a number of different patterns I see here:
* "I don't know when A is actually supposed to go away, so I'll refcount A"
* "I don't know when A is actually supposed to go away, so I'll keep a WeakPtr<A>"
* "I don't know when A is actually supposed to go away, so I'll add a notification in A for when it goes away, and then register an observer to monitor that notification."
* "I dont' know when Observer Bar is actually supposed to go away, so if Bar is gone, then notifications from Foo will automagically not notify the dead Bar, rather than have Bar unregister itself from Foo."
* "I don't know when A is actually supposed to go away, so I'll add a *global* list of A objects, and add an A::IsValidA() function as my hack instead of a WeakPtr<A>." (https://code.google.com/searchframe#OAMlx_jo-ck/src/chrome/browser/profiles/profile_manager.h&type=cs&l=87) (https://code.google.com/p/chromium/source/search?q=IsValidProfile&origq=IsValidProfile&btnG=Search+Trunk)

As I've ranted about before, there's no easy way (in our codebase) out of understanding the dependency graph. Refcounting or WeakPtr or Observers or whatever are short/simple hacks that may work as bandaids for a problem, but don't address the root cause, which will just manifest elsewhere given long enough. And later on, they will be harder to fix, because understanding the full dependency chain becomes harder as time goes on. Better componentization like the content/ refactor helps make dependency chains easier to understand, because we try to minimize back references across modules, but it's still a fundamentally hard problem since our codebase is so large and intricate.

That's why I only like WeakPtr use in very constrained situations. In particular, I like using them in leaf nodes of the dependency graph, where they are contained in the private section of a class, so the WeakPtr<Foo> can't leak outside of class Foo, and lead to corruption of the dependency graph. In smaller codebases, I think the issues of conveying dependencies via pointers is much less of a problem, because the dependency graph is way smaller, and often enough, the full graph can fit in one person's head. But in our codebase, no one person really understands the full dependency graph of Chromium, and therefore people employ these one off hacks rather than trying to truly make the code match the dependency graph.

Thanks to all of you who have managed to read this far. This is the answer to why, when people send me code reviews using SupportsWeakPtr<T>, or WeakPtr<T> outside of a class, or other mechanisms which enable graceful deletion of objects without explicit notification, that I R- their changelists. In smaller codebases, their instincts would be just fine and dandy, and indeed would be desirable, due to requiring less code. But in a code base with a dependency graph as large as ours, these shortcuts lead to incentivizing bad coding practices that hurt our codebase as a whole. I have a strong bias towards optimizing for the long-term health of the project, so that's why you see me being anal in code reviews and disagreeing with code that makes WeakPtr<T> and its ilk of paradigms more pervasive in our codebase.

Cheers, and sorry for the rant. And sorry to those who have waited for me for so long to write this up.

PS: This is just the willchan@ perspective, and not meant as any overall Chromium directive. It's meant to be more of a fuller explanation for when I R- a change due to one of these patterns. I've spent much of my Chromium career cleaning up Profile's dependency graph, primarily in the network stack, removing refcounting and other automagic references, so I'm particularly sensitive to these problems in a large codebase. Beware when you cc me on code reviews :)
</rant>

Randy Smith

unread,
Sep 18, 2012, 10:36:42 AM9/18/12
to William Chan, chromium-dev
First of all, thank you for the rant.  Thorough and thought provoking.

However, I'd like to challenge it, if for no other reason that I'll learn something from your response :-}.  I hear you as saying:
* It's frigging hard, in a project like ours, to track the full ownership/dependency graph between objects.
* Widespread use of WeakPtr<> (at least outside of class implementations) encourages people not to try.
* Therefore, we shouldn't used WeakPtr<>.

The conclusion I come to from that (simplified) summary is different: We should try to find programming patterns that really truly decouple objects, so that they don't *need* to know about each other's lifetimes.  

More specifically, I think that objects within a subsystem (which is the largest set that I think should be expected to be completely in someone's head) should have strict ownership/dependency/lifetime guarantees, and sometimes that's ok between subsystems if the relationship is easy to specify (e.g. if Profile directly owns a whole bunch of subsystems, that strikes me as simple enough that everyone could keep it in their head).  But I'd also like to have a model where subsystems at separated points in the overall architecture could interact without having to worry about lifetime issues.

Having said this, I realize that this is exactly what I've been trying to do within the Downloads system.  Within what I think of as core downloads, I'm aiming for strict ownership (DownloadManager owns DownloadItem owns DownloadFile), whereas in the relationships with other subsystems (net), I'm looking to decouple the classes so that I don't need to worry about the lifetime relationship of DownloadResourceHandler vs. DownloadFile; either can die first, and everything will still work (simplified).  I'm sure I haven't gotten this completely right, but at a high level it strikes me as a good thing to aim for.

So: Can you say more about why you don't want to try to find patterns between subsystems that allow for decoupling of lifetime dependencies between cooperating objects?  If we could find such patterns, they'd strike me as much more robust than actually figuring out the full dependency graph, for all the reasons you give below.

-- Randy


--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Wez

unread,
Sep 18, 2012, 4:41:18 PM9/18/12
to William Chan, chromium-dev
Hi Will,

Thanks for the rant!  I've actually got a CL gathering dust to update the WeakPtr comments to discourage use of SupportsWeakPtr; perhaps that can be one SupportsWeakPtr review you R+? ;)

tl;dr: OMHO, mixing ref-counted and non-ref-counted objects, and having non-ref-counted objects do work on other threads, makes managing lifetimes tricky and WeakPtr is a useful tool in addressing that.

FWIW, we're making use of weak pointers and ref-counting extensively in the Chromoting codebase and we've had a lot of pain with ambiguous object ownership and/or lifetimes, and interesting(*) interactions with threading.  The general guideline we're following in Chromoting is that the two should only be used as part of implementations, and (ideally) never in interfaces, so we generally try to follow an asynchronous interaction model:

- Each object has exactly one owner, e.g. it's held in a scoped_ptr<> somewhere, and not ref-counted.
- Each object is called on a single thread, and makes callbacks on that thread, even if it works on others.

This means that we tend to have a few distinct classes of object:

- Simple ones that can be torn down immediately via scoped_ptr<>.  These may use WeakPtr internally to orphan tasks they post to themselves.
- Ones that need to finish work to teardown, that the caller must wait for.  These have a two-phase stop-and-delete teardown, with a callback dispatched on deletion.
- Ones that need to finish work to teardown, but which the caller doesn't care about.  These usually involve a ref-counted core with a WeakPtr to the caller-owned boilerplate, deletion of which triggers teardown.  Because the object typically still needs message-loops running to process teardown tasks, we have AutoThreadTaskRunner/AutoThread which quit managed message-loops only when all potential task posters have dropped references they hold to the TaskRunner.

In all three cases the result is that callers have a clear idea of when resources that they have supplied to an object, particularly interfaces implemented by the caller or other caller-owned objects, will no longer be touched.

That contract means the calling code and object implementations can have their lifetimes reasoned about independently most of the time, which I find is a win.

Wez


(*) Of the "May you live in interesting times" ilk.

William Chan (陈智昌)

unread,
Sep 18, 2012, 6:02:03 PM9/18/12
to Randy Smith, chromium-dev
I think you're right that we should try to decouple subsystems as much as possible. The way to achieve this is better componentization of our codebase, to wrangle the dependency graph into something "reasonable". But if there is a true dependency edge in that graph, then I think it can be dangerous (for all the reasons I explained) to lose that dependency information and relying on WeakPtr<T> instead.

There is no magic bullet, no magical programming pattern that will solve this decoupling. The way to achieve better isolation between our subsystems is to refactor them into having more sensible APIs.

Randy Smith

unread,
Sep 19, 2012, 11:45:20 AM9/19/12
to William Chan (陈智昌), chromium-dev
On Tue, Sep 18, 2012 at 6:02 PM, William Chan (陈智昌) <will...@chromium.org> wrote:
I think you're right that we should try to decouple subsystems as much as possible. The way to achieve this is better componentization of our codebase, to wrangle the dependency graph into something "reasonable". But if there is a true dependency edge in that graph, then I think it can be dangerous (for all the reasons I explained) to lose that dependency information and relying on WeakPtr<T> instead.

Completely agreed.  I think this is a "do (defined dependency edge, no WeakPtr<>) or do-not (make sure the dependency doesn't exist), there is no try" situation.

There is no magic bullet, no magical programming pattern that will solve this decoupling. The way to achieve better isolation between our subsystems is to refactor them into having more sensible APIs.

Yeah ... my instinct is that we can go further than that.  What are the characteristics of two subsystems that interact but don't have the dependency edge?  What are patterns that work well for that situation?  What are the gotchas?  I don't have the answers, but it seems worth asking the questions.  

William Chan (陈智昌)

unread,
Sep 19, 2012, 1:18:28 PM9/19/12
to Wez, chromium-dev
On Tue, Sep 18, 2012 at 1:41 PM, Wez <w...@chromium.org> wrote:
Hi Will,

Thanks for the rant!  I've actually got a CL gathering dust to update the WeakPtr comments to discourage use of SupportsWeakPtr; perhaps that can be one SupportsWeakPtr review you R+? ;)

SGTM :)
 

tl;dr: OMHO, mixing ref-counted and non-ref-counted objects, and having non-ref-counted objects do work on other threads, makes managing lifetimes tricky and WeakPtr is a useful tool in addressing that.

FWIW, we're making use of weak pointers and ref-counting extensively in the Chromoting codebase and we've had a lot of pain with ambiguous object ownership and/or lifetimes, and interesting(*) interactions with threading.  The general guideline we're following in Chromoting is that the two should only be used as part of implementations, and (ideally) never in interfaces, so we generally try to follow an asynchronous interaction model:

- Each object has exactly one owner, e.g. it's held in a scoped_ptr<> somewhere, and not ref-counted.

Awesome
 
- Each object is called on a single thread, and makes callbacks on that thread, even if it works on others.

Awesome
 

This means that we tend to have a few distinct classes of object:

- Simple ones that can be torn down immediately via scoped_ptr<>.  These may use WeakPtr internally to orphan tasks they post to themselves.

Awesome
 
- Ones that need to finish work to teardown, that the caller must wait for.  These have a two-phase stop-and-delete teardown, with a callback dispatched on deletion.

Don't fully understand this, but it could just be the morning :)
 
- Ones that need to finish work to teardown, but which the caller doesn't care about.  These usually involve a ref-counted core with a WeakPtr to the caller-owned boilerplate, deletion of which triggers teardown.  Because the object typically still needs message-loops running to process teardown tasks, we have AutoThreadTaskRunner/AutoThread which quit managed message-loops only when all potential task posters have dropped references they hold to the TaskRunner.

Barring the AutoThreadTaskRunner and AutoThread stuff which I don't fully understand, this SGTM. I've seen two ways of handling this, either using a RefCountedThreadSafe<Core> object +WeakPtr<T>, which it sounds like you're doing, or using a non-refcounted Core object with an Orphan() member function. Have T keep a scoped_ptr<Core>, and in its destructor, do a if (core_->has_pending_async_operation()) core_.release()->Orphan(); I prefer the latter approach since I find scoped_ptr<Core> with an explicit memory ownership transfer via an Orphan() call clearer in terms of ownership than RefCountedThreadSafe<Core>. But I don't care too much, since this is an implementation detail of the class. The important thing to do is not to leak WeakPtr<T> references to the consumers of T.
 

In all three cases the result is that callers have a clear idea of when resources that they have supplied to an object, particularly interfaces implemented by the caller or other caller-owned objects, will no longer be touched.

That contract means the calling code and object implementations can have their lifetimes reasoned about independently most of the time, which I find is a win.

I don't fully grasp everything you've delineated here, but overall, I find that I very much agree with your guys' approach.
Reply all
Reply to author
Forward
0 new messages