I've been meaning to write this email for awhile, because people have been bugging me for my thoughts on WeakPtr for a long while now. Sorry guys for taking so long, but as you know, I've got a lot of things going on and been traveling a lot. Pardon me for any incoherencies in this rant, it's because eae@ poured me a ginormous long island iced tea, and I am still sobering up.
TL;DR: I generally recommend against using WeakPtr, except when it's in a constrained leaf node situation like PostTaskAndReply() or MessageLoop::current()->PostTask() with a base::Closure with WeakPtr<A> and A::Foo(), where A::Foo() is a private function. In these situations however, I very strongly recommend WeakPtr.
<rant>
Why do I feel this way? Well, in short, it's because in a large codebase, like Chromium, I feel that other uses of WeakPtr can easily lead to a situation where the C++ pointer dependency graph does not match the true ownership graph.
What do I mean by the "true" ownership graph? Well, what I mean is that when A "owns" B, which "owns" C, that A decides when B goes away, and B decides when C goes away. C may reference A, but C has no say over when A goes away.
Now, in a smaller project, I wouldn't worry so much. WeakPtr is pretty awesome. I mean, it automagically NULLs out pointers for you, isn't that awesome? I certainly think it is. It saves on so much boilerplate crap, which helps keep the code short and simple. The problem is, in large code bases, it incentivizes dangerous coding practices.
For example, let's have a dependency chain, where A owns B which owns C which owns D. Conceptually, you'd have A keep B as a member (be it directly as a member variable, or a scoped_ptr<B>), as so on until D. Now, separately in the dependency graph (which arguably you should try to force as much as possible into a tree), you have A also owning M, which owns N. Now, let's say that D references B, and so does N.
The first question to ask when you see that D references B and that N references B is "do D or N own B?". In this case, since I've described it as such, they don't. G is owned by B, and N is owned by M which is owned by A. Essentially, the graph is like:
A
|\
B M
| |
C N
|
D
Now, the interesting thing is, what happens when C also references B, but not in any "important" manner. Let's say that the author of C decides that its reference to B is not actually "important', so it wants to keep a WeakPtr<B>. This is in itself dangerous, and let me explain why. Now, it's OK for B to get deleted *without* notifying C that it should go away. That's because it's what WeakPtr<B> is intended to do - gracefully handle the disappearance of B by returning NULL. Some of the time, B may still directly delete C, but some of the time, C may now outlive B, and gracefully handle B's deletion via WeakPtr logic.
As you can see, this breaks the dependency graph. Now we have something like
A
|\
B M
? |
C N
|
D
Now, what happens if D references A? That becomes problematic. What really *should* happen, is when A goes away, it notifies B it's going away, which notifies C, and C notifies D. But, now that we've introduced this WeakPtr<B> into C, we very possibly have this issue where B goes away, but does *not* notify C. (Now, some of you might be like, why wouldn't B notify C it was going away? Well, if B did that, what's the friggin point of C keeping a WeakPtr<B>?). That problem carries forward to D, since if B doesn't notify C that B is going away, then D sure as hell ain't gonna notice. And if D references B or A, what then? If A/B get deleted, then how is D going to get notified?
Now now, you may be asking, what's so bad about directly monitoring profile destruction? Well, as my theory of ownership goes, Foo owns Bar if Foo decides when Bar goes away. Now, if Foo is monitoring Bar's destruction, then Foo is deciding when Bar goes away.
How does this impact our dependency graph? Back to D monitoring A's destruction, we have a situation where the dependency chain does not match the *true* ownership chain. We essentially are reparenting D to A, instead of being parented by C. That's bad. We now have:
A
/|\
D B M
? |
C N
Do people see why this is so bad? When D monitors A's destruction, it's not specified that D gets destroyed before or after the B/M destruction chains. That's a fundamental problem with observer paradigms. Observers are *bad* at preserving ordering. Let's say you know that between the siblings B and M, B must be constructed before M and destroyed after M (this mimics the member variable construction/destruction ordering, where earlier member variables are constructed *before* and destroyed *after* later member variables). It's *much* better when dependency chains match the actual C++ ownership structure. Can you imagine if *all* of all various subsystems (network, extensions, omnibox, autofill, rendering, etc etc) and their offspring reparented directly to the Profile? How would we sort out the ordering dependencies amongst the various subsystems and their offspring? That'd be a disaster. The true dependency graph is important, and we should make our code mirror it as best as we can.
Now, people may ask why, if D references A, we don't just have A make sure to notify B of its deletion, which notifies C, and therefore D.
Well, this is precisely the problem with *large* codebases. In most *sub*sections of the dependency chain, it's *OK* that parent node X does not immediately notify Y that it's going away. Back references aren't *everywhere*. They are common, but not *everywhere*. So what happens is we don't have a situation where A owns B which owns C which owns D which references A, we have A owns B which owns C which owns ... which owns X which references A. *Somewhere* along that dependency chain, we have someone who uses a WeakPtr<A> back reference or something. The dependency chain is long enough that, when you're authoring X, do you *really* want to understand the full dependency chain when you want to reference A? No, hell no. It's way simpler (especially due to OWNERS) to simply acquire a WeakPtr<A> and check if it's NULL before accessing it. And sometimes, X needs to know when A goes away. When that happens, do you think the author of X will try to figure out the entire dependency chain from A to X? HELL NO. That's simply masochistic, there's no incentive structure for that to happen. I mean, you'd have to read all the code to figure out the dependency chain, and then you'd have to plumb all the pointers through all the function arguments down to the spot where you need it, and get all the OWNERS approvals for all the steps along the way. The easy solution is to add a notification in A for when it goes away, and for X to monitor that notification to figure out when to begin its own teardown behavior. The natural (and more importantly, economically incentivized) outcome is that most people take the easy way out, which of course, simply exacerbates the problem.
These fundamental problems of understanding the full dependency graph vs simply taking one off fixes (like observers) lead to situations where the "true" ownership graph does not match the actual pointer structure of the codebase. And it's very easy to see why, I mean, it's a pain to think about the "true" ownership graph, not to mention plumbing the necessary pointers throughout the codebase.
I've been primarily discussing WeakPtrs here, but this general paradigm of automagically gracefully handling deletion is fundamentally dangerous. I'm going to call out a number of different patterns I see here:
* "I don't know when A is actually supposed to go away, so I'll refcount A"
* "I don't know when A is actually supposed to go away, so I'll keep a WeakPtr<A>"
* "I don't know when A is actually supposed to go away, so I'll add a notification in A for when it goes away, and then register an observer to monitor that notification."
* "I dont' know when Observer Bar is actually supposed to go away, so if Bar is gone, then notifications from Foo will automagically not notify the dead Bar, rather than have Bar unregister itself from Foo."
As I've ranted about before, there's no easy way (in our codebase) out of understanding the dependency graph. Refcounting or WeakPtr or Observers or whatever are short/simple hacks that may work as bandaids for a problem, but don't address the root cause, which will just manifest elsewhere given long enough. And later on, they will be harder to fix, because understanding the full dependency chain becomes harder as time goes on. Better componentization like the content/ refactor helps make dependency chains easier to understand, because we try to minimize back references across modules, but it's still a fundamentally hard problem since our codebase is so large and intricate.
That's why I only like WeakPtr use in very constrained situations. In particular, I like using them in leaf nodes of the dependency graph, where they are contained in the private section of a class, so the WeakPtr<Foo> can't leak outside of class Foo, and lead to corruption of the dependency graph. In smaller codebases, I think the issues of conveying dependencies via pointers is much less of a problem, because the dependency graph is way smaller, and often enough, the full graph can fit in one person's head. But in our codebase, no one person really understands the full dependency graph of Chromium, and therefore people employ these one off hacks rather than trying to truly make the code match the dependency graph.
Thanks to all of you who have managed to read this far. This is the answer to why, when people send me code reviews using SupportsWeakPtr<T>, or WeakPtr<T> outside of a class, or other mechanisms which enable graceful deletion of objects without explicit notification, that I R- their changelists. In smaller codebases, their instincts would be just fine and dandy, and indeed would be desirable, due to requiring less code. But in a code base with a dependency graph as large as ours, these shortcuts lead to incentivizing bad coding practices that hurt our codebase as a whole. I have a strong bias towards optimizing for the long-term health of the project, so that's why you see me being anal in code reviews and disagreeing with code that makes WeakPtr<T> and its ilk of paradigms more pervasive in our codebase.
Cheers, and sorry for the rant. And sorry to those who have waited for me for so long to write this up.
PS: This is just the willchan@ perspective, and not meant as any overall Chromium directive. It's meant to be more of a fuller explanation for when I R- a change due to one of these patterns. I've spent much of my Chromium career cleaning up Profile's dependency graph, primarily in the network stack, removing refcounting and other automagic references, so I'm particularly sensitive to these problems in a large codebase. Beware when you cc me on code reviews :)
</rant>