Ref-Counting and Multi-Threading and Browser Shutdown (Oh My)

188 views
Skip to first unread message

waf...@chromium.org

unread,
Mar 3, 2014, 5:31:40 PM3/3/14
to chromi...@chromium.org
Hi Chromium-dev,

A quick question about ref-counted objects that are multi-threaded:

Ref-counted objects - use scoped_refptr; but better yet, rethink your design.  Reference-counted objects make it difficult to understand ownership and destruction order, especially when multiple threads are involved.  There is almost always another way to design your object hierarchy to avoid refcounting.  Avoiding refcounting in multithreaded situations is usually easier if you restrict each class to operating on just one thread, and use PostTask() and the like to proxy calls to the correct thread.  base::Bind(), WeakPtr, and other tools make it possible to automatically cancel calls to such an object when it dies.  Note that too much of our existing code uses refcounting, so just because you see existing code doing it does not mean it's the right solution.  (Bonus points if you're able to clean up such cases.)

I'm a little confused about the above guidance. As an example, let's consider two designs:

(1) An instance of Class [A] owns a scoped_ptr to an instance of Class [B]. Class [A] interacts on the UI thread and is only used on the UI thread. However, it touches the filesystem and so delegates some tasks to Class [B], which only performs actions on the FILE thread (or in the blocking pool, etc, not really important). So [A] gets a call to load a file, and posts a task to call a member function of [B] on the FILE thread, passing a WeakPtr in the callback for the posted task.

(2) As above, but instead Class [A] owns a scoped_refptr to an instance of Class [B], and gives a scoped_refptr to the callback for the posted task.

It seems like (1) is the recommended design, and (2) is discouraged. But it seems to me that (1) suffers from a fatal flaw that (2) does not: if the instance of [A] is deleted (say the browser shuts down which somehow causes the instance of [A]'s deletion), then the instance of [B] is deleted, possibly while its member function is running. This can mean (among other things) that the member function of [B] is highly likely to cause a crash. The fact that the callback will not begin running if the WeakPtr is invalid doesn't make a difference, because the callback may already be midway through execution when the WeakPtr is invalidated. And it's not necessarily possible to post a task to proxy B's destructor, either, since there might not be anything to post a task to during browser shutdown.

After some discussion with my team, we've concluded that when passing callbacks between threads, if [A] can be freed during browser shutdown, ref-counting is pretty much always necessary. As far as I can tell, the owning class could be freed at any time if it is owned either directly or indirectly by g_browser_process. And once that's happening, you can't post a task to the FILE thread to destroy [B], because the threads are already shut down and posting tasks won't be useful anymore.

Am I missing some basic insight? Or some way around this problem without using ref counting? (Maybe the only solution is just to leak everything during shutdown?)

Thanks for your guidance,

- Josh

Dana Jansens

unread,
Mar 3, 2014, 5:37:19 PM3/3/14
to waf...@chromium.org, chromium-dev
On Mon, Mar 3, 2014 at 5:31 PM, <waf...@chromium.org> wrote:
Hi Chromium-dev,

A quick question about ref-counted objects that are multi-threaded:

Ref-counted objects - use scoped_refptr; but better yet, rethink your design.  Reference-counted objects make it difficult to understand ownership and destruction order, especially when multiple threads are involved.  There is almost always another way to design your object hierarchy to avoid refcounting.  Avoiding refcounting in multithreaded situations is usually easier if you restrict each class to operating on just one thread, and use PostTask() and the like to proxy calls to the correct thread.  base::Bind(), WeakPtr, and other tools make it possible to automatically cancel calls to such an object when it dies.  Note that too much of our existing code uses refcounting, so just because you see existing code doing it does not mean it's the right solution.  (Bonus points if you're able to clean up such cases.)

I'm a little confused about the above guidance. As an example, let's consider two designs:

(1) An instance of Class [A] owns a scoped_ptr to an instance of Class [B]. Class [A] interacts on the UI thread and is only used on the UI thread. However, it touches the filesystem and so delegates some tasks to Class [B], which only performs actions on the FILE thread (or in the blocking pool, etc, not really important). So [A] gets a call to load a file, and posts a task to call a member function of [B] on the FILE thread, passing a WeakPtr in the callback for the posted task.

(2) As above, but instead Class [A] owns a scoped_refptr to an instance of Class [B], and gives a scoped_refptr to the callback for the posted task.

It seems like (1) is the recommended design, and (2) is discouraged. But it seems to me that (1) suffers from a fatal flaw that (2) does not: if the instance of [A] is deleted (say the browser shuts down which somehow causes the instance of [A]'s deletion), then the instance of [B] is deleted, possibly while its member function is running. This can mean (among other things) that the member function of [B] is highly likely to cause a crash. The fact that the callback will not begin running if the WeakPtr is invalid doesn't make a difference, because the callback may already be midway through execution when the WeakPtr is invalidated. And it's not necessarily possible to post a task to proxy B's destructor, either, since there might not be anything to post a task to during browser shutdown.

For the compositor we have a similar situation (UI thread object owns a compositor thread object). The destruction of the UI thread object blocks on the compositor thread object being destroyed.

 

After some discussion with my team, we've concluded that when passing callbacks between threads, if [A] can be freed during browser shutdown, ref-counting is pretty much always necessary. As far as I can tell, the owning class could be freed at any time if it is owned either directly or indirectly by g_browser_process. And once that's happening, you can't post a task to the FILE thread to destroy [B], because the threads are already shut down and posting tasks won't be useful anymore.

Am I missing some basic insight? Or some way around this problem without using ref counting? (Maybe the only solution is just to leak everything during shutdown?)

Thanks for your guidance,

- Josh

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Joao da Silva

unread,
Mar 3, 2014, 5:43:36 PM3/3/14
to waf...@chromium.org, Chromium-dev
I've favored doing (1) in the past, and posting a DeleteSoon to release [B]. Note that any callbacks back to [A] must be posted to [A]'s loop, so either the WeakPtr is already invalidated or [A] is alive and won't be destroyed concurrently.

Note that in (2) you may end up releasing [B] in its background thread, then releasing [A] in that background thread too, and if [A] owns some other objects that should live on UI then they'll be deleted on the wrong thread. Many things can go wrong from there, and IMO this is a major reason to avoid ref counting.

I think the shutdown sequence of those classes should be clarified; you may need to get a Shutdown() call at a slightly earlier stage to make sure you can cleanup properly.



--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Peter Kasting

unread,
Mar 3, 2014, 5:47:17 PM3/3/14
to waf...@chromium.org, Chromium-dev
On Mon, Mar 3, 2014 at 2:31 PM, <waf...@chromium.org> wrote:
(1) An instance of Class [A] owns a scoped_ptr to an instance of Class [B]. Class [A] interacts on the UI thread and is only used on the UI thread. However, it touches the filesystem and so delegates some tasks to Class [B], which only performs actions on the FILE thread (or in the blocking pool, etc, not really important). So [A] gets a call to load a file, and posts a task to call a member function of [B] on the FILE thread, passing a WeakPtr in the callback for the posted task.

Ideally, classes are created, destroyed, and accessed all one a single thread.  (Optional: creation can happen on a different thread as long as it's guaranteed to complete before any further accesses occur.)  Given your description, this suggests that [A] should be UI-thread-scoped and [B] should be file-thread-scoped.

If this is the case, [A] cannot own [B] via scoped_ptr<>, since as you note, that ties [B]'s destruction to [A]'s thread, which violates the above ideal, and in this case (as in most violations) is unsafe.

(2) As above, but instead Class [A] owns a scoped_refptr to an instance of Class [B], and gives a scoped_refptr to the callback for the posted task.

This allows [B] to outlive [A].  If you were considering making [A] own [B], then probably you intend for the destruction order to be [B] first, then [A].  This violates that order.  Indeed, if [B] refs itself for some reason, it will never be destroyed.  Better to prevent that, and ensure the same destruction order in all cases.

After some discussion with my team, we've concluded that when passing callbacks between threads, if [A] can be freed during browser shutdown, ref-counting is pretty much always necessary.

By the time something is trying to free [A], you don't have options anymore; you need [B] to already be shut down, or you need to leak [B] or use refs as you describe.  This should be a signal that waiting until [A]'s destruction is problematic.
 
As far as I can tell, the owning class could be freed at any time if it is owned either directly or indirectly by g_browser_process. And once that's happening, you can't post a task to the FILE thread to destroy [B], because the threads are already shut down and posting tasks won't be useful anymore.

If the FILE thread is already shut down, [B] should have been destroyed already.  It doesn't make sense to have a live object on a dead thread.

The above two replies are the key to the issue.  The correct solution here is that before the file thread is shut down, [B] needs to be explicitly destroyed on the file thread.  [A] should hold a weak pointer to [B] that will get invalidated at this time, which will cancel any in-flight calls from [A] to [B].  (We know no calls are running on [B], since its shutdown is happening on the thread [B] lives on.)  Then the file thread can be shut down; then [A] can be deleted on the UI thread.  If the file thread is guaranteed to be deleted before the UI thread (I think so?) this also guarantees your destruction order.

The normal way to accomplish this is to explicitly destroy [B] during some cleanup stage that runs before the file thread is shut down.

PK

waf...@chromium.org

unread,
Mar 3, 2014, 7:40:37 PM3/3/14
to chromi...@chromium.org, waf...@chromium.org
Thanks for the replies.

Is it false that listening for NOTIFICATION_APP_TERMINATING is discouraged (as it increases coupling)? (Or, is there another way to detect an imminent shutdown?)

Maciej Pawlowski

unread,
Mar 4, 2014, 4:15:18 AM3/4/14
to chromi...@chromium.org
W dniu 2014-03-03 23:47, Peter Kasting pisze:
On Mon, Mar 3, 2014 at 2:31 PM, <waf...@chromium.org> wrote:
(1) An instance of Class [A] owns a scoped_ptr to an instance of Class [B]. Class [A] interacts on the UI thread and is only used on the UI thread. However, it touches the filesystem and so delegates some tasks to Class [B], which only performs actions on the FILE thread (or in the blocking pool, etc, not really important). So [A] gets a call to load a file, and posts a task to call a member function of [B] on the FILE thread, passing a WeakPtr in the callback for the posted task.

Ideally, classes are created, destroyed, and accessed all one a single thread.  (Optional: creation can happen on a different thread as long as it's guaranteed to complete before any further accesses occur.)  Given your description, this suggests that [A] should be UI-thread-scoped and [B] should be file-thread-scoped.

If this is the case, [A] cannot own [B] via scoped_ptr<>, since as you note, that ties [B]'s destruction to [A]'s thread, which violates the above ideal, and in this case (as in most violations) is unsafe.
But if [A] doesn't own [B], how does it call its methods?
PostTask requires a pointer. [A] cannot safely operate on some pointer to an object that is owned by FILE thread because it can become invalid at any moment.
-- 
BR
Maciej Pawlowski
Opera Desktop Wroclaw

William Chan (陈智昌)

unread,
Mar 4, 2014, 8:30:23 AM3/4/14
to Joshua Pawlicki, chromium-dev
It's discouraged. You shouldn't rely on this signal. Your object should be deleted when its owner tells it to die. The owner of your object is *not* the app termination signal, unless your object represents the application.

To demonstrate why this is the case, observe what happens if instead of destroying objects in accordance with the dependency tree (I call it a tree, although in reality it's probably unfortunately more of a graph with some ugly cycles, which is why we have lots of shutdown crashes), we instead attach each object to the app termination signal. In this manner, we essentially flatten the tree and treat all objects as siblings, thereby losing significant information about dependency edges amongst all the different objects. When the signal fires, the only possible sane destruction ordering are FIFO or LIFO, neither of which would match the true dependency tree orderings, thereby leading to destruction ordering bugs (use-after-frees).

Typically, the reason people consider using the app termination notification as their destruction signal is because they've given up trying to grok the relevant section of the application object dependency tree. It's because understanding large sections of the Chromium code base is indeed a hard problem. The incentives are unfortunately not well set up to encourage good behavior here. It's significantly quicker to do the easy hack of simply listening to notifications rather than understand and architect code correctly, and the costs are not immediately borne by the author of the hack:
  - builds up technical debt for maintaining correct object dependencies
  - subtle shutdown ordering issues lead to sporadic use-after-free crashes
    - these subtle UAFs are difficult to debug, wasting vast quantities of some other engineer's time
    - UAFs often lead to crashes in different objects, so some other engineer has to debug this

In short, please don't use notifications as your signal for object destruction. It's almost certainly the wrong decision and harmful to the overall project, even if you do not directly feel its pain. Bite the bullet and understand the dependencies.


--

William Chan (陈智昌)

unread,
Mar 4, 2014, 1:17:59 PM3/4/14
to Joshua Pawlicki, chromium-dev
I've got meta-comments on everyone else's comments :)

* Dana's suggested pattern uses WaitableEvent. The advantage of this is you don't have to yield back to the MessageLoop and post a continuation that has to wait in the MessageLoop queue again. The gains are code simplicity and potentially performance. The disadvantages are potential for jank (you really want to avoid blocking certain threads, like the UI thread) and deadlock (in Dana's example, the UI thread waits on the IO thread, which is indeed a pattern we do elsewhere like https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/renderer_host/render_widget_helper.cc&sq=package:chromium&q=RenderWidgetHelper::WaitForBackingStoreMsg&type=cs&l=139...but if someone does have a WaitableEvent on the IO thread waiting for signalling on the UI thread, then we have deadlock risk). The deadlock risk means that we should not generalize this mechanism across all threads, although it's perhaps safe for the specific UI=>IO case. As far as the performance vs jank consideration, there are definitely cases like the code I linked to earlier where, it's actually better overall for performance to tolerate some thread blocking in order to get the backing store IPC sooner. That said, in general, we prefer message passing to locking (http://www.chromium.org/developers/design-documents/threading#Overview) in order to reduce blocking on jank critical threads. I don't know enough about the Chromium compositor to say whether or not it makes sense in their case, but given the user experience critical nature of the compositor, I can well believe that maybe it makes sense for them. In your case though, you really don't want to block the UI thread for as long as a filesystem operation can take, because that's an excessively long period of time not to respond to UI events. Now, you're probably only doing this on shutdown, so that's a special case. That said, we also want shutdown to proceed as quickly as possible, so I would still advise against this approach here.

* Joao and Peter's suggested patterns are slight variants of each other. They're both fundamentally message passing patterns that follow our wiki's guidance on keeping most objects single threaded and passing destruction messages. This is the general solution you should use by default. Now, there are some minor points here.
  - One, there was mention of scoped_ptr. The advantage of using scoped_ptr is you document the ownership model via the scoped_ptr convention. The disadvantage is it destroys the object on the wrong thread, at least by default (you can manually avoid this in the destructor by releasing and doing DeleteSoon() yourself). A now possible solution is to use the [D]eleter template parameter to achieve the DeleteOnFooTaskRunner behavior. At the expense of more template kludge and perhaps non-obvious scoped_ptr behavior, you can have the ownership model advantage without the wrong destruction thread disadvantage.
  - Peter addressed your point about the FILE thread destruction. If your object lives on the FILE thread, it should be destroyed before the FILE thread. All the browser threads are owned by the browser process (perhaps indirectly, I'm not up to date anymore). A failure to destroy your object before the browser process destroys the FILE thread represents a failure of the code to match the dependency tree, since the FILE thread should be one of the last nodes destroyed when visiting all nodes in the dependency tree.
  - A primary difference in Joao and Peter's solutions is whether or not A needs to wait around for B to be destroyed. In Joao's pattern, in A::~A, A just fires the destruction message (DeleteTask) to the FILE thread, and revokes the WeakReferenceOwner, which effectively cancels any B=>A messages, since they all should be using a WeakPtr. The advantage of this approach is it does not delay the destruction of A. The disadvantage is, as Peter notes, the lack of assurance that B::~B runs before A::~A completes. This is a nice property to have, but is not critical. The revocation of the WeakReferenceOwner effectively means that, from A's perspective, B's destruction is complete. This is clearly a weaker model. The advantage of accepting this weaker model is shutdown can complete quickly. A good example of this weaker model in play is our treatment of DNS queries. Since getaddrinfo() can block for indeterminate periods of time, we only issue these to worker pool threads that we do not join on shutdown.

Cheers.

On Mon, Mar 3, 2014 at 2:31 PM, <waf...@chromium.org> wrote:

--

Peter Kasting

unread,
Mar 4, 2014, 3:08:07 PM3/4/14
to mpawlowski, Chromium-dev
On Tue, Mar 4, 2014 at 1:15 AM, Maciej Pawlowski <mpawl...@opera.com> wrote:
W dniu 2014-03-03 23:47, Peter Kasting pisze:
If this is the case, [A] cannot own [B] via scoped_ptr<>, since as you note, that ties [B]'s destruction to [A]'s thread, which violates the above ideal, and in this case (as in most violations) is unsafe.
But if [A] doesn't own [B], how does it call its methods?
PostTask requires a pointer. [A] cannot safely operate on some pointer to an object that is owned by FILE thread because it can become invalid at any moment. 

[A] can hold a weak pointer to [B]. I mentioned this at the end of my email.

PK

Carlos Pizano

unread,
Mar 4, 2014, 9:47:47 PM3/4/14
to chromi...@chromium.org, mpawlowski
Obligatory semi-controversial comment:

Chrome shutdown (and in general process shutdown) is a chaotic moment. End of the world by large meteorite if you will. Last thing you need to be doing is folding your clothes. So if you have a neat design don't go changing it for shutdown time. Just make sure the critical data is saved and let the objects be killed when the tread they live on is killed.

-cpu

Peter Kasting

unread,
Mar 4, 2014, 9:52:05 PM3/4/14
to Carlos Pizano, Chromium-dev, mpawlowski
On Tue, Mar 4, 2014 at 6:47 PM, Carlos Pizano <c...@chromium.org> wrote:
Chrome shutdown (and in general process shutdown) is a chaotic moment. End of the world by large meteorite if you will. Last thing you need to be doing is folding your clothes. So if you have a neat design don't go changing it for shutdown time. Just make sure the critical data is saved and let the objects be killed when the tread they live on is killed.

Unless this causes a crash during shutdown.  Or a deadlock.  Or some similarly bad consequence.

These sorts of consequences seem likely for most subsystems that use multiple classes communicating across threads, which live up through shutdown time, but which haven't been designed to handle shutdown.

So I'm not convinced this advice is very useful.  It seems likely to lead to trouble.

PK

Rachel Blum

unread,
Mar 4, 2014, 11:56:03 PM3/4/14
to Carlos Pizano, Chromium-dev, mpawlowski
Except then shutting down tests becomes a pain, and we end up with flaky tests, false positives from the memory tree, etc. Please do fold your laundry ;)



On Tue, Mar 4, 2014 at 6:47 PM, Carlos Pizano <c...@chromium.org> wrote:

John Abd-El-Malek

unread,
Mar 5, 2014, 2:40:46 AM3/5/14
to William Chan, Joshua Pawlicki, chromium-dev
On Tue, Mar 4, 2014 at 10:17 AM, William Chan (陈智昌) <will...@chromium.org> wrote:
I've got meta-comments on everyone else's comments :)

* Dana's suggested pattern uses WaitableEvent. The advantage of this is you don't have to yield back to the MessageLoop and post a continuation that has to wait in the MessageLoop queue again. The gains are code simplicity and potentially performance. The disadvantages are potential for jank (you really want to avoid blocking certain threads, like the UI thread) and deadlock (in Dana's example, the UI thread waits on the IO thread, which is indeed a pattern we do elsewhere like https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/renderer_host/render_widget_helper.cc&sq=package:chromium&q=RenderWidgetHelper::WaitForBackingStoreMsg&type=cs&l=139...but if someone does have a WaitableEvent on the IO thread waiting for signalling on the UI thread, then we have deadlock risk). The deadlock risk means that we should not generalize this mechanism across all threads, although it's perhaps safe for the specific UI=>IO case. As far as the performance vs jank consideration, there are definitely cases like the code I linked to earlier where, it's actually better overall for performance to tolerate some thread blocking in order to get the backing store IPC sooner. That said, in general, we prefer message passing to locking (http://www.chromium.org/developers/design-documents/threading#Overview) in order to reduce blocking on jank critical threads. I don't know enough about the Chromium compositor to say whether or not it makes sense in their case, but given the user experience critical nature of the compositor, I can well believe that maybe it makes sense for them. In your case though, you really don't want to block the UI thread for as long as a filesystem operation can take, because that's an excessively long period of time not to respond to UI events. Now, you're probably only doing this on shutdown, so that's a special case. That said, we also want shutdown to proceed as quickly as possible, so I would still advise against this approach here.

+1 for never allowing a thread to block on another. We have base/threading/thread_restrictions.h specifically to catch threads waiting on another, which is an anti-pattern.

The WaitForBackingStoreMsg is an exception, and it was replaced by the cc::CompletionEvent class. I didn't know that it's used to block shutdown, since the initial change that allowed it (https://chromiumcodereview.appspot.com/11085054/) sounded like it would only replace the WaitForBackingStoreMsg wait. In hindsight, we shouldn't have made cc::CompletionEvent have the ScopedAllowWait, but instead should have put ScopedAllowWait in each callsite, thereby auditing each place that it's used in.

It would be great if for shutdown, another solution is found for the compositor that doesn't involve the UI thread waiting on another.

Maciej Pawlowski

unread,
Mar 5, 2014, 4:01:51 AM3/5/14
to Chromium-dev
W dniu 2014-03-04 21:08, Peter Kasting pisze:
I believe there's still a design problem with passing weak_ptrs between threads. If [B] is not a child of [A] and is instead owned by *someone* in FILE thread, that someone needs to give a weak_ptr<B> to [A]. How? To PostTask for a "A::SetB" method, it has to have a weak_ptr<A>...

William Chan (陈智昌)

unread,
Mar 5, 2014, 6:30:36 AM3/5/14
to mpawl...@opera.com, Chromium-dev
You may need to provide a concrete example in order to discuss this more completely. Fundamentally though, if you do not have a well-defined relationship between A and B, then the object lifetimes are also not well understood, so you probably have a bug if A and B are trying to communicate. Well-defined relationship does not require direct ownership. It can be transitive ownership (i.e. ancestor/descendent relationship) or a sibling relationship (member variables of the same object, which has a well-defined construction/destruction ordering) or cousin or whatever. Now, if you have a well-defined relationship between A and B, then it is definitely possible to plumb the WeakPtr<A> via the relevant nodes in that relationship chain.


-- 
BR
Maciej Pawlowski
Opera Desktop Wroclaw

--

Maciej Pawlowski

unread,
Mar 5, 2014, 7:13:08 AM3/5/14
to chromi...@chromium.org
W dniu 2014-03-05 12:30, William Chan (陈智昌) pisze:
On Wed, Mar 5, 2014 at 1:01 AM, Maciej Pawlowski <mpawl...@opera.com> wrote:
W dniu 2014-03-04 21:08, Peter Kasting pisze:
On Tue, Mar 4, 2014 at 1:15 AM, Maciej Pawlowski <mpawl...@opera.com> wrote:
W dniu 2014-03-03 23:47, Peter Kasting pisze:
If this is the case, [A] cannot own [B] via scoped_ptr<>, since as you note, that ties [B]'s destruction to [A]'s thread, which violates the above ideal, and in this case (as in most violations) is unsafe.
But if [A] doesn't own [B], how does it call its methods?
PostTask requires a pointer. [A] cannot safely operate on some pointer to an object that is owned by FILE thread because it can become invalid at any moment. 

[A] can hold a weak pointer to [B]. I mentioned this at the end of my email.

PK
I believe there's still a design problem with passing weak_ptrs between threads. If [B] is not a child of [A] and is instead owned by *someone* in FILE thread, that someone needs to give a weak_ptr<B> to [A]. How? To PostTask for a "A::SetB" method, it has to have a weak_ptr<A>...

You may need to provide a concrete example in order to discuss this more completely. Fundamentally though, if you do not have a well-defined relationship between A and B, then the object lifetimes are also not well understood, so you probably have a bug if A and B are trying to communicate. Well-defined relationship does not require direct ownership. It can be transitive ownership (i.e. ancestor/descendent relationship) or a sibling relationship (member variables of the same object, which has a well-defined construction/destruction ordering) or cousin or whatever. Now, if you have a well-defined relationship between A and B, then it is definitely possible to plumb the WeakPtr<A> via the relevant nodes in that relationship chain.

Let's try this with the relationship hinted in the original post. Class A is a UI widget which wants to store some data on disk via class B which is its FILE delegate. Let's assume then, that [B] must write all data requested by [A] to disk before dying and that it should die more or less along with [A] (no need to keep it alive after the UI widget associated with it has died on UI thread). I intentionally don't specify whether [B] should outlive [A] or the other way around, so that there may be more ways to make this scenario work.

The most obvious solution would be to make A own B. This won't work - ownership semantics don't seem to be applicable with objects that live in different threads, [A] cannot have a scoped_ptr<B>.

Now if we want to make [B] live in FILE thread and be owned by *something* there, first we must figure out what that thing will be. Who stores the scoped_ptr<B> in FILE thread? Is there a different class hierarchy? A copy of the UI thread class hierarchy? A big fat FileThreadWorkers god class which owns all assorted utils and delegates that must live in FILE thread?

Then we must figure out how does that something pass a weak_ptr<B> to [A] in UI thread. Or perhaps [A] should somehow find the owner of [B] which lives in FILE thread and request a weak_ptr<B>? Can we make this transaction safe in absence of thread-safe weak_ptr and/or locks?

Third, how do we ensure that whatever owns [B] in FILE thread won't delete it before [A] schedules its last work? Remember that all changes the user makes in UI thread have to be written down to disk, we can't say "eh, that weak_ptr is dead now, let's just not run that callback".

Dana Jansens

unread,
Mar 5, 2014, 10:45:28 AM3/5/14
to John Abd-El-Malek, William Chan, Joshua Pawlicki, chromium-dev
On Wed, Mar 5, 2014 at 2:40 AM, John Abd-El-Malek <j...@chromium.org> wrote:



On Tue, Mar 4, 2014 at 10:17 AM, William Chan (陈智昌) <will...@chromium.org> wrote:
I've got meta-comments on everyone else's comments :)

* Dana's suggested pattern uses WaitableEvent. The advantage of this is you don't have to yield back to the MessageLoop and post a continuation that has to wait in the MessageLoop queue again. The gains are code simplicity and potentially performance. The disadvantages are potential for jank (you really want to avoid blocking certain threads, like the UI thread) and deadlock (in Dana's example, the UI thread waits on the IO thread, which is indeed a pattern we do elsewhere like https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/renderer_host/render_widget_helper.cc&sq=package:chromium&q=RenderWidgetHelper::WaitForBackingStoreMsg&type=cs&l=139...but if someone does have a WaitableEvent on the IO thread waiting for signalling on the UI thread, then we have deadlock risk). The deadlock risk means that we should not generalize this mechanism across all threads, although it's perhaps safe for the specific UI=>IO case. As far as the performance vs jank consideration, there are definitely cases like the code I linked to earlier where, it's actually better overall for performance to tolerate some thread blocking in order to get the backing store IPC sooner. That said, in general, we prefer message passing to locking (http://www.chromium.org/developers/design-documents/threading#Overview) in order to reduce blocking on jank critical threads. I don't know enough about the Chromium compositor to say whether or not it makes sense in their case, but given the user experience critical nature of the compositor, I can well believe that maybe it makes sense for them. In your case though, you really don't want to block the UI thread for as long as a filesystem operation can take, because that's an excessively long period of time not to respond to UI events. Now, you're probably only doing this on shutdown, so that's a special case. That said, we also want shutdown to proceed as quickly as possible, so I would still advise against this approach here.

+1 for never allowing a thread to block on another. We have base/threading/thread_restrictions.h specifically to catch threads waiting on another, which is an anti-pattern.

The WaitForBackingStoreMsg is an exception, and it was replaced by the cc::CompletionEvent class. I didn't know that it's used to block shutdown, since the initial change that allowed it (https://chromiumcodereview.appspot.com/11085054/) sounded like it would only replace the WaitForBackingStoreMsg wait. In hindsight, we shouldn't have made cc::CompletionEvent have the ScopedAllowWait, but instead should have put ScopedAllowWait in each callsite, thereby auditing each place that it's used in.

It would be great if for shutdown, another solution is found for the compositor that doesn't involve the UI thread waiting on another.

We are actively working to change this actually. We only use a compositor thread in the renderer process on most platforms. This will affect the ChromeOS browser, but we're working on removing the compositor thread entirely there.

William Chan (陈智昌)

unread,
Mar 5, 2014, 11:14:47 AM3/5/14
to mpawl...@opera.com, chromium-dev
On Wed, Mar 5, 2014 at 4:13 AM, Maciej Pawlowski <mpawl...@opera.com> wrote:
W dniu 2014-03-05 12:30, William Chan (陈智昌) pisze:
On Wed, Mar 5, 2014 at 1:01 AM, Maciej Pawlowski <mpawl...@opera.com> wrote:
W dniu 2014-03-04 21:08, Peter Kasting pisze:
On Tue, Mar 4, 2014 at 1:15 AM, Maciej Pawlowski <mpawl...@opera.com> wrote:
W dniu 2014-03-03 23:47, Peter Kasting pisze:
If this is the case, [A] cannot own [B] via scoped_ptr<>, since as you note, that ties [B]'s destruction to [A]'s thread, which violates the above ideal, and in this case (as in most violations) is unsafe.
But if [A] doesn't own [B], how does it call its methods?
PostTask requires a pointer. [A] cannot safely operate on some pointer to an object that is owned by FILE thread because it can become invalid at any moment. 

[A] can hold a weak pointer to [B]. I mentioned this at the end of my email.

PK
I believe there's still a design problem with passing weak_ptrs between threads. If [B] is not a child of [A] and is instead owned by *someone* in FILE thread, that someone needs to give a weak_ptr<B> to [A]. How? To PostTask for a "A::SetB" method, it has to have a weak_ptr<A>...

You may need to provide a concrete example in order to discuss this more completely. Fundamentally though, if you do not have a well-defined relationship between A and B, then the object lifetimes are also not well understood, so you probably have a bug if A and B are trying to communicate. Well-defined relationship does not require direct ownership. It can be transitive ownership (i.e. ancestor/descendent relationship) or a sibling relationship (member variables of the same object, which has a well-defined construction/destruction ordering) or cousin or whatever. Now, if you have a well-defined relationship between A and B, then it is definitely possible to plumb the WeakPtr<A> via the relevant nodes in that relationship chain.

Let's try this with the relationship hinted in the original post. Class A is a UI widget which wants to store some data on disk via class B which is its FILE delegate. Let's assume then, that [B] must write all data requested by [A] to disk before dying and that it should die more or less along with [A] (no need to keep it alive after the UI widget associated with it has died on UI thread). I intentionally don't specify whether [B] should outlive [A] or the other way around, so that there may be more ways to make this scenario work.

The most obvious solution would be to make A own B. This won't work - ownership semantics don't seem to be applicable with objects that live in different threads, [A] cannot have a scoped_ptr<B>.

A should own B. A should post a task to delete B on the FILE thread. SequencedTaskRunner::DeleteSoon() is the typical way of doing this. I also noted earlier that scoped_ptr supports a [D]eleter template parameter that could be used to do this type of deletion across threads. Does this work for you? Alternatively, you can also use PostTaskAndReply for your file operations if all B did was serve as a proxy object for file operations.
 

Now if we want to make [B] live in FILE thread and be owned by *something* there, first we must figure out what that thing will be. Who stores the scoped_ptr<B> in FILE thread? Is there a different class hierarchy? A copy of the UI thread class hierarchy? A big fat FileThreadWorkers god class which owns all assorted utils and delegates that must live in FILE thread?

Then we must figure out how does that something pass a weak_ptr<B> to [A] in UI thread. Or perhaps [A] should somehow find the owner of [B] which lives in FILE thread and request a weak_ptr<B>? Can we make this transaction safe in absence of thread-safe weak_ptr and/or locks?

Third, how do we ensure that whatever owns [B] in FILE thread won't delete it before [A] schedules its last work? Remember that all changes the user makes in UI thread have to be written down to disk, we can't say "eh, that weak_ptr is dead now, let's just not run that callback".


-- 
BR
Maciej Pawlowski
Opera Desktop Wroclaw

Chris Hopman

unread,
Mar 5, 2014, 12:16:51 PM3/5/14
to William Chan, mpawl...@opera.com, chromium-dev
So, I've found that the clearest (and imo best) approach was described, in part, by willchan. I have one question about it though.

Let me clarify the approach.

A lives on UI thread
B lives on FILE thread

---
A creates B on UI thread in A::A() and keeps a pointer to it
  if B needs to do work on the file thread for initialization it should use two-phase construction
  (B::B() is the only part of B that will run on the UI thread)
A posts-tasks to B
  callbacks back to A use a weak_ptr<A>
If B posts tasks to itself, it uses a weak_ptr<B>
A::~A() posts a task to delete B
---

One alternative is that B can post tasks back to itself without a weak_ptr<B>, and then A::~A() can just post a task telling B to delete itself when finished. I think that's not as good, but is still clear.

Doing it like this means that A is never in a state where B doesn't exist. This has worked for me very well for the case where B is basically doing work for A. I haven't had a case where B is actually making requests back to A (other than the callbacks from requests to B) so I don't know how that works, how I would approach it.

The real question that I have is, how should this handle shutdown? My understanding is that the FILE thread shuts down before the UI thread. However, we don't want B to be deleted before A (then A can be in a state where B is gone, so it should probably have a weak_ptr... but then it can't grab that weak_ptr in its constructor because that's the wrong thread... etc. etc.). So, maybe we want A to be deleted at a point where it can still post the delete task to the FILE thread for B. I don't know how to address this, so any tips would be appreciated.

William Chan (陈智昌)

unread,
Mar 5, 2014, 2:02:22 PM3/5/14
to Chris Hopman, mpawl...@opera.com, chromium-dev
On Wed, Mar 5, 2014 at 9:16 AM, Chris Hopman <cjho...@chromium.org> wrote:
So, I've found that the clearest (and imo best) approach was described, in part, by willchan. I have one question about it though.

Let me clarify the approach.

A lives on UI thread
B lives on FILE thread

---
A creates B on UI thread in A::A() and keeps a pointer to it
  if B needs to do work on the file thread for initialization it should use two-phase construction
  (B::B() is the only part of B that will run on the UI thread)
A posts-tasks to B
  callbacks back to A use a weak_ptr<A>
If B posts tasks to itself, it uses a weak_ptr<B>
A::~A() posts a task to delete B
---

One alternative is that B can post tasks back to itself without a weak_ptr<B>, and then A::~A() can just post a task telling B to delete itself when finished. I think that's not as good, but is still clear.

Doing it like this means that A is never in a state where B doesn't exist. This has worked for me very well for the case where B is basically doing work for A. I haven't had a case where B is actually making requests back to A (other than the callbacks from requests to B) so I don't know how that works, how I would approach it.

The real question that I have is, how should this handle shutdown? My understanding is that the FILE thread shuts down before the UI thread. However, we don't want B to be deleted before A (then A can be in a state where B is gone, so it should probably have a weak_ptr... but then it can't grab that weak_ptr in its constructor because that's the wrong thread... etc. etc.). So, maybe we want A to be deleted at a point where it can still post the delete task to the FILE thread for B. I don't know how to address this, so any tips would be appreciated.

Hm, I think maybe there's some confusion on how shutdown works? Let me try to explain.

On the UI thread, we stop Run()'ing the MessageLoop. After that happens, we eventually join all the BrowserThreads. Therefore, while it's true the FILE thread shuts down before the UI thread since the UI thread is the main thread, the UI thread's MessageLoop stops before any other threads' MessageLoop stops.

Chris Hopman

unread,
Mar 5, 2014, 4:18:30 PM3/5/14
to William Chan (陈智昌), mpawl...@opera.com, chromium-dev
On Wed, Mar 5, 2014 at 11:02 AM, William Chan (陈智昌) <will...@chromium.org> wrote:
On Wed, Mar 5, 2014 at 9:16 AM, Chris Hopman <cjho...@chromium.org> wrote:
So, I've found that the clearest (and imo best) approach was described, in part, by willchan. I have one question about it though.

Let me clarify the approach.

A lives on UI thread
B lives on FILE thread

---
A creates B on UI thread in A::A() and keeps a pointer to it
  if B needs to do work on the file thread for initialization it should use two-phase construction
  (B::B() is the only part of B that will run on the UI thread)
A posts-tasks to B
  callbacks back to A use a weak_ptr<A>
If B posts tasks to itself, it uses a weak_ptr<B>
A::~A() posts a task to delete B
---

One alternative is that B can post tasks back to itself without a weak_ptr<B>, and then A::~A() can just post a task telling B to delete itself when finished. I think that's not as good, but is still clear.

Doing it like this means that A is never in a state where B doesn't exist. This has worked for me very well for the case where B is basically doing work for A. I haven't had a case where B is actually making requests back to A (other than the callbacks from requests to B) so I don't know how that works, how I would approach it.

The real question that I have is, how should this handle shutdown? My understanding is that the FILE thread shuts down before the UI thread. However, we don't want B to be deleted before A (then A can be in a state where B is gone, so it should probably have a weak_ptr... but then it can't grab that weak_ptr in its constructor because that's the wrong thread... etc. etc.). So, maybe we want A to be deleted at a point where it can still post the delete task to the FILE thread for B. I don't know how to address this, so any tips would be appreciated.

Hm, I think maybe there's some confusion on how shutdown works? Let me try to explain.
Oh, there definitely is.
 
On the UI thread, we stop Run()'ing the MessageLoop. After that happens, we eventually join all the BrowserThreads. Therefore, while it's true the FILE thread shuts down before the UI thread since the UI thread is the main thread, the UI thread's MessageLoop stops before any other threads' MessageLoop stops.

Great. So then, if I understand this correctly, if A is destroyed properly by whoever owns it, then B will also be destroyed properly.

William Chan (陈智昌)

unread,
Mar 5, 2014, 6:47:58 PM3/5/14
to Chris Hopman, mpawl...@opera.com, chromium-dev
On Wed, Mar 5, 2014 at 1:18 PM, Chris Hopman <cjho...@chromium.org> wrote:
On Wed, Mar 5, 2014 at 11:02 AM, William Chan (陈智昌) <will...@chromium.org> wrote:
On Wed, Mar 5, 2014 at 9:16 AM, Chris Hopman <cjho...@chromium.org> wrote:
So, I've found that the clearest (and imo best) approach was described, in part, by willchan. I have one question about it though.

Let me clarify the approach.

A lives on UI thread
B lives on FILE thread

---
A creates B on UI thread in A::A() and keeps a pointer to it
  if B needs to do work on the file thread for initialization it should use two-phase construction
  (B::B() is the only part of B that will run on the UI thread)
A posts-tasks to B
  callbacks back to A use a weak_ptr<A>
If B posts tasks to itself, it uses a weak_ptr<B>
A::~A() posts a task to delete B
---

One alternative is that B can post tasks back to itself without a weak_ptr<B>, and then A::~A() can just post a task telling B to delete itself when finished. I think that's not as good, but is still clear.

Doing it like this means that A is never in a state where B doesn't exist. This has worked for me very well for the case where B is basically doing work for A. I haven't had a case where B is actually making requests back to A (other than the callbacks from requests to B) so I don't know how that works, how I would approach it.

The real question that I have is, how should this handle shutdown? My understanding is that the FILE thread shuts down before the UI thread. However, we don't want B to be deleted before A (then A can be in a state where B is gone, so it should probably have a weak_ptr... but then it can't grab that weak_ptr in its constructor because that's the wrong thread... etc. etc.). So, maybe we want A to be deleted at a point where it can still post the delete task to the FILE thread for B. I don't know how to address this, so any tips would be appreciated.

Hm, I think maybe there's some confusion on how shutdown works? Let me try to explain.
Oh, there definitely is.

:(
 
 
On the UI thread, we stop Run()'ing the MessageLoop. After that happens, we eventually join all the BrowserThreads. Therefore, while it's true the FILE thread shuts down before the UI thread since the UI thread is the main thread, the UI thread's MessageLoop stops before any other threads' MessageLoop stops.

Great. So then, if I understand this correctly, if A is destroyed properly by whoever owns it, then B will also be destroyed properly.

Correct.

Thiemo Nagel

unread,
Mar 12, 2014, 5:28:03 PM3/12/14
to pkas...@chromium.org, Chromium-dev
(Optional: creation can happen on a different thread as long as it's guaranteed to complete before any further accesses occur.)

This sounds dangerous to me. How do you guarantee time ordering? Do you imply using synchronization primitives?  -- Without them, the notion of "before" does not (reliably) exist across threads.

Best,
Thiemo

Chris Hopman

unread,
Mar 12, 2014, 5:52:29 PM3/12/14
to tna...@chromium.org, pkas...@chromium.org, Chromium-dev
In the approach I mentioned a couple posts above, B is created on the UI thread but destroyed and accessed on the FILE thread. The guaranteed ordering is caused by all further accesses being initiated from tasks posted from the UI thread (i.e. the access is in the task or a result of it).


--

Thiemo Nagel

unread,
Mar 12, 2014, 7:51:35 PM3/12/14
to Chris Hopman, pkas...@chromium.org, Chromium-dev
In your scenario the subsequent access is *initiated* in order after construction, but it is not guaranteed that the state of B has been transferred from the UI thread to the FILE thread before the access takes place there. The cache controller of the CPU on which the UI thread runs may be busy and thus slow to flush out the state of B to main memory in which case the CPU the FILE thread is running on loads the state of B from uninitialized memory.

Lei Zhang

unread,
Mar 12, 2014, 7:56:11 PM3/12/14
to tna...@chromium.org, Chris Hopman, pkas...@chromium.org, Chromium-dev
For the purposes of this discussion, we can safely assume the CPU has
a working cache coherency protocol.

Bernhard Bauer

unread,
Mar 12, 2014, 7:56:28 PM3/12/14
to tna...@chromium.org, Chris Hopman, pkas...@chromium.org, Chromium-dev
Posting tasks to other message loops will indeed use synchronisation primitives to ensure that (see base::MessageLoop).

Bernhard.


To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Thiemo Nagel

unread,
Mar 17, 2014, 5:05:47 PM3/17/14
to Bernhard Bauer, Chromium-dev, the...@chromium.org
On Thursday, March 13, 2014, Bernhard Bauer <bau...@google.com> wrote:
Posting tasks to other message loops will indeed use synchronisation primitives to ensure that (see base::MessageLoop).

Thanks a lot for the pointer! My understanding now is this: If initialization of an object happens on one thread which then calls PostTask() to a different thread which then accesses the object, I guess this is safe because PostTask() uses a synchronization primitive which implies a memory barrier. (Posix mandates memory barriers for mutexes and I assume that Windows has a similar requirement.)

More generally, as far as I understand now, this implies that it is safe to access the same object on multiple threads, as long as control of the object is passed along via PostTask(), either explicitly through binding a pointer to the object as parameter or implicitly by convention. (Though the latter probably doesn't help the long-term robustness of the code.)

For the sake of argument, I'm responding to Lei below:

> For the purposes of this discussion, we can safely assume the CPU has

> a working cache coherency protocol.

I can't say to which extent cache coherency mechanisms ensure that memory hierarchy is hidden from the individual threads. Yet, even if you're right, this doesn't invalidate my point, although I have to change the reasoning somewhat: In the presence of perfect cache coherency there still is the issue of memory ordering. For a class instantiation as in
    MyClass* g_object = new MyClass();
the order between initializing the memory of the MyClass instance and setting the g_object pointer is not defined. (The compiler has a free hand to determine which is the most efficient initialization order. On some architectures, including ARM, the CPU may further re-order memory accesses.) As a consequence, when accessed from a different thread and without using synchronization primitives, the other thread might see an uninitialized or partly initialized object.

William Chan (陈智昌)

unread,
Mar 17, 2014, 5:24:21 PM3/17/14
to tna...@chromium.org, Bernhard Bauer, Chromium-dev, the...@chromium.org
Reply all
Reply to author
Forward
0 new messages