Using mojo with TaskRunner in disk cache

57 views
Skip to first unread message

Yutaka Hirano

unread,
Feb 10, 2022, 6:11:28 AM2/10/22
to scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto
Hi scheduler-dev@,

I'm working to sandbox the HTTP cache in the network service. The basic idea is to broker certain file operations with mojo (here is the design doc).

Recently I found a problem: mojo needs SequencedTaskRunner where part of the disk cache runs on a TaskRunner created by base::ThreadPool::CreateTaskRunner.

Each task posted to the task runner is independent, so I'm wondering if it's reasonable to create a SequencedTaskRunner before posting each task, instead of posting a task to the TaskRunner. Does this have performance implications? Has anyone experienced a similar problem?

Thanks,

Kentaro Hara

unread,
Feb 10, 2022, 6:42:55 AM2/10/22
to Yutaka Hirano, scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto
Each task posted to the task runner is independent, so I'm wondering if it's reasonable to create a SequencedTaskRunner before posting each task, instead of posting a task to the TaskRunner. Does this have performance implications? Has anyone experienced a similar problem?

Assuming each file operation is heavy, I think it makes sense to post tasks to separate SequencedTaskRunners. Then the heavy file operations can run concurrently (i.e., without being constrained by the FIFO order guaranteed by each SequencedTaskRunner).

Maybe can we just use base::ThreadPool::PostTask() (which internally posts a task as a one-off sequence)?


--
You received this message because you are subscribed to the Google Groups "scheduler-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scheduler-de...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/scheduler-dev/CABihn6G4Q9_iROaqJbGsGdnBdW_%2BxKJf2PG2V9L%2B%3D3%3D-sXt6jw%40mail.gmail.com.


--
Kentaro Hara, Tokyo

Kouhei Ueno

unread,
Feb 10, 2022, 7:54:29 AM2/10/22
to Kentaro Hara, Yutaka Hirano, scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto

Etienne Pierre-doray

unread,
Feb 10, 2022, 10:03:24 AM2/10/22
to Kouhei Ueno, Kentaro Hara, Yutaka Hirano, scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto
Each task posted to the task runner is independent, so I'm wondering if it's reasonable to create a SequencedTaskRunner before posting each task, instead of posting a task to the TaskRunner. Does this have performance implications?
Semantically, it sounds like this doesn't need to be a SequencedTaskRunner (unless there are other related use cases that do need SequencedTaskRunner), and it could be ThreadPool::PostTask().
Otherwise, it's fine to use SequencedTaskRunner as well as it's cheap enough.

 Maybe can we just use base::ThreadPool::PostTask() (which internally posts a task as a one-off sequence)?
That's true, the current implementation of ThreadPool::PostTask is very similar to creating SequencedTaskRunner + SequencedTaskRunner::PostTask. It's nice to use ThreadPool::PostTask because it gives more potential for optimization, but that's very hypothetical.

François Doray

unread,
Feb 10, 2022, 10:37:58 AM2/10/22
to Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, Yutaka Hirano, scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto
ThreadPool::PostTask posts a one-off task which doesn't have any concurrency or ordering constraints with other tasks.

A Mojo remote can cause various tasks to be scheduled (e.g. disconnect handler, reply callback). It is generally not desirable to execute these tasks concurrently with tasks that bind or use the Mojo remote. By requiring the Mojo remote to be bound and used from a single sequence, it becomes obvious that all callbacks scheduled by the Mojo remote will also run on that sequence.

The cost of base::ThreadPool::CreateSequencedTaskRunner(...)->PostTask(...) is roughly the same as base::ThreadPool::PostTask(), so you can simply use the former to post tasks that will bind/use a mojo::Remote. You don't even have to keep a reference to the SequencedTaskRunner returned by CreateSequencedTaskRunner.

Yutaka Hirano

unread,
Feb 10, 2022, 11:45:03 AM2/10/22
to François Doray, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto
Thanks, let me share a simplified example.

Here is a hypothetical class that is used to read a file.

class CacheFile {
public:
  ...
  void Read(const base::FilePath& path, Callback callback) {
    task_runner_->PostTaskAndReply(base::BindOnce(ReadInternal, path), std::move(callback));
  }
private:
  // Created by base::ThreadPool::CreateTaskRunner.
  scoped_refptr<TaskRunner> task_runner_;
};

std::string ReadInternal(const base::FilePath& path) {
  base::File file(path);
  std::string result;
  // Read the file contents.
  ...
  return result;
}

I want to broker the file open operation in ReadInternal with mojo. I'm going to define a mojo interface:

interface FileOpener {
  [Sync] Open(FilePath path) => (File file);
};

and give a pending_remote<FileOpener> to ReadInternal.

class CacheFile {
public:
  ...
  void Read(const base::FilePath& path, pending_remote<FileOpener> opener, Callback callback) {
    task_runner_->PostTaskAndReply(base::BindOnce(ReadInternal, path, std::move(opener)), std::move(callback));
  }
private:
  // Created by base::ThreadPool::CreateTaskRunner, and this instance is shared by many operations/instances.
  scoped_refptr<TaskRunner> task_runner_;
};

std::string ReadInternal(const base::FilePath& path, pending_remote<FileOpener> pending_opener) {
  base::File file;
  mojo::Remote<FileOpener> opener(std::move(pending_opener));
  opener->Open(path, &file);

  std::string result;
  // Read the file contents.
  ...
  return result;
}

Now mojo::Remote<FileOpener> requires a SequencedTaskRunner, and I'm looking for a way to run ReadInternal on a SequencedTaskRunner.
Your responses sound like I can create a SequenceTaskRunner for each Read operation without much performance loss, right?

  void Read(const base::FilePath& path, pending_remote<FileOpener> opener, Callback callback) {
    auto task_runner = base::ThreadPool::CreateSequencedTaskRunner(...);
    task_runner->PostTaskAndReply(base::BindOnce(ReadInternal, path, std::move(opener)), std::move(callback));
  }


François Doray

unread,
Feb 10, 2022, 12:12:31 PM2/10/22
to Yutaka Hirano, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, rse...@chromium.org, Yoichi Osato, Daisuke Enomoto
Correct, the code snippet at the end of your last message has roughly the same cost as a direct call to base::ThreadPool::PostTask (there is an extra memory allocation but this is not something you should worry about compared to the cost of scheduling a task).

Joe Mason

unread,
Feb 10, 2022, 12:45:34 PM2/10/22
to Yutaka Hirano, François Doray, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, Robert Sesek, Yoichi Osato, Daisuke Enomoto
A side issue: the use of [Sync] in the mojo interface worries me. That's a common source of performance problems. Since Read is already an async function, is there any reason not to rewrite this as something like:

  void Read(const base::FilePath& path, pending_remote<FileOpener> opener, Callback callback) {
    // Runs `callback` on the current sequence.
    auto reply_callback = base::BindPostTask(base::SequencedTaskRunnerHandle::Get(), std::move(callback));
    task_runner->PostTask(base::BindOnce(ReadInternal, path, std::move(opener), std::move(reply_callback));
  }

  void ReadInternal(const base::FilePath& path, pending_remote<FileOpener> pending_opener, Callback reply_callback) {
    auto read_file_callback = base::BindOnce([](Callback reply_callback, base::File file) {
      std::string result;
      // Read the file contents
      std::move(reply_callback).Run(result);
    }, std::move(reply_callback));
    // When the async mojo call completes, `read_file_callback` will be called with `file`.
    opener->Open(path, std::move(read_file_callback));
  }

Another way to structure this would be to change ReadInternal to take a base::File object instead of a FilePath, and have an OpenFileWithMojo function:

  void OpenFileWithMojo(const base::FilePath& path, pending_remote<FileOpener> opener, base::OnceCallback<void(base::File)> callback) {
    base::File file;
    mojo::Remote<FileOpener> opener(std::move(opener));
    // Mojo runs `callback` on the current task runner, which is the one created in Read.
    opener->Open(path, std::move(callback));
  }
  
  void Read(const base::FilePath& path, pending_remote<FileOpener> opener, Callback callback) {
    auto reply_callback = base::BindPostTask(base::SequencedTaskRunnerHandle::Get(), std::move(callback));
    auto task_runner = base::ThreadPool::CreateSequencedTaskRunner(...);
    task_runner->PostTask(base::BindOnce(OpenFileWithMojo, std::move(opener),
        base::BindOnce(ReadInternal).Then(std::move(reply_callback))));
  }

  std::string ReadInternal(base::File file) { std::string result; /* read contents */ return result; }
    



Yutaka Hirano

unread,
Feb 14, 2022, 5:55:49 AM2/14/22
to Joe Mason, François Doray, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, Robert Sesek, Yoichi Osato, Daisuke Enomoto
Thank you.

Regarding the blocking code, how important is it, given we already have blocking function calls (e.g., base::File::File, base::File::Read in ReadInternal()), and some will remain with this effort (e.g., base::File::Read in ReadInternal())?

Joe Mason

unread,
Feb 14, 2022, 12:23:52 PM2/14/22
to Yutaka Hirano, François Doray, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, Robert Sesek, Yoichi Osato, Daisuke Enomoto
I'm not sure if there are any performance issues for mojo [Sync], or technical reasons to avoid them in this specific design. But in general it's better to avoid Sync mojo calls unless absolutely necessary because:

- It's easier to reason about. Like I said, I'm not sure if there are any issues with [Sync], even though I'm fairly knowledgeable about mojo, because I've never had to look into how it's implemented. Mojo's properties can be confusing enough without adding [Sync].
- Keeping sync functions as local as possible makes the code easier to refactor later. For example this design binds a new mojo remote to open each file - if it turns out that has performance problems, it wouldn't be hard to switch to a single persistent FileOpener interface, and call open(filename) on it multiple times. But if you do that you'd also need to refactor to remove [Sync] from the interface. It's easier to design it without Sync in the first place.

In this case I don't think my suggestion to avoid Sync leads to more complicated code (in fact I find the last version, with ReadInternal taking a file handle, to be the clearest) so I'd recommend against Sync unless you have an argument in favour of it that I've missed.

Daniel Cheng

unread,
Feb 14, 2022, 1:29:07 PM2/14/22
to Joe Mason, Yutaka Hirano, François Doray, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, Robert Sesek, Yoichi Osato, Daisuke Enomoto
There are several reasons to avoid [Sync] if possible.

1. It can cause jank because the sequence waiting on the sync call can't do anything else. This can be mitigated by not making sync calls on important threads (e.g. the main thread in a renderer process), but in the end, it still costs a sequence to wait.

2. It can cause reordering of IPCs due to the sync IPC processing model (https://chromium.googlesource.com/chromium/src/+/HEAD/mojo/public/cpp/bindings/README.md#re_entrancy).


Hope that provides some helpful context.

Daniel

Joe Mason

unread,
Feb 14, 2022, 3:44:25 PM2/14/22
to Daniel Cheng, Yutaka Hirano, François Doray, Etienne Pierre-doray, Kouhei Ueno, Kentaro Hara, scheduler-dev, morl...@chromium.org, Robert Sesek, Yoichi Osato, Daisuke Enomoto
Thanks, the reordering of IPC's is the non-obvious Mojo implementation detail I was vaguely remembering.

I THINK that none of those apply to this specific patch, because it's making a separate SequencedTaskRunner & mojo connection for each call, so there are no other tasks on the sequence to block and no other IPC's on the mojo connection to re-order. But I'd need to think about it a lot to convince myself that's correct, and it's easier to not use [Sync] than to go through that work... (And, it would be really easy to change one of those assumptions during refactoring without realizing that it makes the mojo [Sync] dangerous!)
Reply all
Reply to author
Forward
0 new messages