What to do when posting a BLOCK_SHUTDOWN task, and the reply task *must* also run?

Luc Nguyen

unread,

Aug 17, 2023, 1:03:44 PM8/17/23

to schedu...@chromium.org

From the main/UI thread, very roughly, I do something along the lines of:

```

base::ThreadPool::PostTaskAndReplyWithResult(

FROM_HERE,

{BLOCK_SHUTDOWN},

&ProcessFoo,

&StoreProcessedFoo

);

```

The problem is that while `ProcessFoo()` is running in the background, the user might initiate browser shutdown. This results in the reply task `StoreProcessedFoo()` never being run (the BLOCK_SHUTDOWN trait only applies to the task posted to the ThreadPool, and IIUC, even if I somehow managed to post `StoreProcessedFoo()` back on the main thread with the BLOCK_SHUTDOWN trait, it would be ignored/"fizzled"). So the processed foo is lost.

Is there anything I can do to make sure `StoreProcessedFoo()` is run to prevent data loss?

---------

More context:

UMA histograms are backed by a file. In the `ProcessFoo()` above, what we actually do is remove these histograms from the file, and put them into a UMA proto. We then do some more processing like compressing.

Then, in `StoreProcessedFoo()`, we store this compressed UMA proto into Local State.

It's a problem if `StoreProcessedFoo()` is never run, because the histograms have already been removed from the file on disk, and the compressed UMA proto is not stored, resulting in data loss.

It's unfortunately not possible to only remove the histograms from disk after the compressed UMA proto is stored. I won't go into too much detail, but this could cause histograms being duplicated instead of lost.

My tentative solution to this (crrev.com/c/4780995) is to add some work in `PostDestroyThreads()` (IIUC, at this stage, all BLOCK_SHUTDOWN tasks have been finished, and all threads have been joined, so this is single-threaded). I check if there was some compressed UMA proto that was produced (but not yet stored), and if so, manually store it in Local State.

It seems to work fine, but there are some caveats. For example:

- 1) I need to ensure I do this work in all the various platform-specific PostDestroyThreads(),

- 2) On some platforms (Chromecast), it seems like Local State is already destroyed before PostDestroyThreads(). I suppose I could delay its destruction for my purposes but I don't know if that entails anything else.

Is there any better/canonical way to do this?

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

Luc Nguyen

unread,

Aug 21, 2023, 11:37:38 AM8/21/23

to Joe Mason, schedu...@chromium.org

Thanks for the heads up! :-)

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

On Mon, Aug 21, 2023 at 11:35 AM Joe Mason <joenot...@google.com> wrote:

I think most of the experts are on a plane today, heading for a conference this week. You might want to ping the thread again next week if you don't get a response. (Sorry!)

--
You received this message because you are subscribed to the Google Groups "scheduler-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheduler-de...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAORL5cPukHnX8NTpkPGjKf8LbCHnSN-PDdPYisNATto58U41vw%40mail.gmail.com.

François Doray

unread,

Aug 25, 2023, 8:10:35 AM8/25/23

to Luc Nguyen, Gabriel Charette, schedu...@chromium.org

Even if you successfully store the data in local state, you would also need to check that JsonPrefStore backing the local state deterministically commits those changes to disk (requires the ThreadPool). You might want to use components/keep_alive_registry/scoped_keep_alive.h to keep the browser alive until everything is complete (the session data deleter those that, see KeepAliveOrigin::SESSION_DATA_DELETER).

I recommend waiting for gab@'s point of view before implementing any solution.

To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CAORL5cO9rzJdYScq5Zjb0ZXQoVPgZ3zxZWHVD3%3DZYVuChaNqqg%40mail.gmail.com.

Luc Nguyen

unread,

Aug 25, 2023, 10:37:38 AM8/25/23

to François Doray, Gabriel Charette, schedu...@chromium.org

TIL about ScopedKeepAlive...! Thanks for that. Will be waiting for gab@'s follow up.

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

Gabriel Charette

unread,

Aug 25, 2023, 2:02:35 PM8/25/23

to Luc Nguyen, schedu...@chromium.org

This is tricky indeed.

(typing this response offline on mobile, apologies for any inaccuracies)

PostDestroyThreads is a good timing, another one would be in the destructor of a scoped object bound to the reply, but both have the caveat that local_state() might no longer be around (BrowserMainParts, content-layer, doesn't have it in its ordering contract as it's a chrome-layer concept)

We could solve this by explicitly defining in ChromeBrowserMainParts when local_state() should go away (or at least until which phase it's guaranteed to exist) and fixing platforms that are misaligned.

Re. Having to duplicate the code for each platform: There is a ChromeBrowserMainParts::PostDestroyThreads() for generic code that applies to each, no? And if not, we should be able to add the unused override? ChromeBrowserExtraPartsMetrics::PostDestroyThreads is likely an even better override.

That seems like the best way to me to ensure that this is happening and well ordered with the rest of shutdown (profile and local state destruction).

--

Gabriel Charette

unread,

Aug 25, 2023, 2:29:03 PM8/25/23

to Luc Nguyen, scheduler-dev

local_state()'s final flush is synchronous, if it's alive, the write will make it before end of shutdown

Luc Nguyen

unread,

Aug 25, 2023, 2:58:51 PM8/25/23

to Gabriel Charette, scheduler-dev

Thanks gab@.

Re: PostDestroyThreads() -- there is indeed `ChromeBrowserMainParts::PostDestroyThreads()` in `//chrome`, but there are also ones specific for iOS and one for Chromecast.

I'm curious about your opinion on `ScopedKeepAlive` that fdoray@ pointed out -- would that also work? Any caveats I'm missing? It seems to me that this would simplify the implementation quite a bit; just create a `ScopedKeepAlive` before posting `ProcessFoo()` to a background thread, and then destroying it in `StoreProcessedFoo()`.

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

Gabriel Charette

unread,

Aug 25, 2023, 3:38:28 PM8/25/23

to Luc Nguyen, scheduler-dev

Hmm, that's interesting, I've never seen a ScopedKeepAlive used this way but I think it'd work. Either is fine IMO.

Luc Nguyen

unread,

Nov 6, 2023, 1:23:43 PM11/6/23

to Gabriel Charette, scheduler-dev

Wanted to follow up on this.

This mostly works. Except I am getting crash reports when trying to create the "ScopedKeepAlive" object, with a check failing: "CHECK(!is_shutting_down_);". Bug link: crbug.com/1490916.

Which is odd, because I am creating the ScopedKeepAlive object in a task that is posted to the main thread. I was assuming that if browser shutdown was already initiated, then the task simply wouldn't run.

To be clear, the code looks a bit like:

```

void CreateFooEvery30Minutes() {

CreateScopedKeepAlive(); // Sometimes, this is creating crash reports due to a CHECK.

base::ThreadPool::PostTaskAndReplyWithResult(

FROM_HERE,

{BLOCK_SHUTDOWN},

&ProcessFoo,

&StoreProcessedFooAndDestroyScopedKeepAlive

);

}

```

Where `CreateFooEvery30Minutes()` is posted to the main task runner every 30 minutes using some timer.

Looking deeper into it, it seems like `is_shutting_down_` isn't exactly when browser shutdown has been initiated, but rather when Unpin() has been called, which means that browser shutdown is about to be initiated (but hasn't yet), and tasks on the main task runner are still being executed.

So it seems like even if a task is posted to the main thread and is running, there's no guarantee that we're not already "shutting down" (at least, according to KeepAliveRegistry).

(Disclaimer: I am not very familiar with this code so I could be wrong. Feel free to correct me :-).)

Is there some standard solution here?

For now, what I'm thinking is that in `CreateFooEvery30Minutes()`, if `is_shutting_down_` is true, then I call `ProcessFoo` and `StoreProcessedFoo` synchronously instead of thread hopping and doing the work on a background thread.

I think that would work, assuming that ScopedKeepAlive/KeepAliveRegistry are only interacted with on the main thread so that there's no race condition. (I don't see any DCHECKs/CHECKs or comments stating this though).

WDYT?

Thanks in advance!

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

Luc Nguyen

unread,

Nov 13, 2023, 11:08:45 AM11/13/23

to Gabriel Charette, scheduler-dev

Friendly bump on this :-)

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

Gabriel Charette

unread,

Nov 22, 2023, 5:18:05 PM11/22/23

to Luc Nguyen, scheduler-dev

Doing it synchronously if is_shutting_down sounds reasonable. There isn't a "standard solution" but this is special enough to have a custom solution IMO.

On Mon, Nov 6, 2023 at 1:23 PM Luc Nguyen <lucn...@google.com> wrote:

Luc Nguyen

unread,

Nov 22, 2023, 5:27:36 PM11/22/23

to Gabriel Charette, scheduler-dev

Awesome! Thanks a bunch!

Build for everyone - Google

Luc Nguyen

Software Developer

careers.google.com

Learn more about our candidate privacy policy.

Reply all

Reply to author

Forward