Transferable ArrayBuffer support on MessagePort


Hiroki Nakagawa

Aug 31, 2015, 12:51:32 AM
to site-isol...@chromium.org, stora...@chromium.org, Kinuko Yasuda, jsb...@chromium.org, dch...@chromium.org, m...@chromium.org, dmu...@chromium.org
Hi everyone,

TL;DR: I'm writing a design doc about transferable ArrayBuffer support on MessagePort. I'd appreciate any feedback.


In the current implementation, MessagePort does not yet support sending an ArrayBuffer as a transferable object. The biggest obstacle to supporting this feature is that our MessagePort implementation involves cross-process operations, so we need to choose a way to transfer buffers across processes as efficiently as possible. In the doc, I propose four options and discuss their pros and cons, and then discuss how to make transfers even more efficient in certain cases.
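
For readers less familiar with transferables, here is a minimal sketch of the behavior script would be expected to see once this is supported. It assumes a runtime that exposes a global MessageChannel and already implements ArrayBuffer transfer on it; `transferOverPort` is a hypothetical helper name, not something from the design doc.

```typescript
// Sketch only: shows the sender-side effect of transferring an ArrayBuffer.
// Assumes a global MessageChannel (browsers, recent Node.js).
function transferOverPort(byteLength: number): { before: number; after: number } {
  const { port1, port2 } = new MessageChannel();
  const buffer = new ArrayBuffer(byteLength);
  const before = buffer.byteLength;

  // The second argument is the transfer list: ownership of `buffer` moves
  // to the receiving side instead of its contents being cloned.
  port1.postMessage(buffer, [buffer]);

  // After a transfer, the sender's buffer is detached ("neutered"),
  // so its byteLength reads as 0.
  const after = buffer.byteLength;

  port1.close();
  port2.close();
  return { before, after };
}

const result = transferOverPort(1024);
console.log(result.before, result.after);
```

The point of the sketch is only the sender-side contract: once the buffer is in the transfer list, the sender can no longer read it, regardless of how the bytes actually reach the other side.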

In the short term, I plan to adopt the simpler option of sending buffers over Chrome IPC (option #2 in the doc). In the long term, I'll attempt to transfer buffers using shared memory (option #3 in the doc) and to bypass process hops when the sender and receiver ports are in the same process (e.g. DedicatedWorker, iframe) (discussion #2).

Thanks,
Hiroki

Kinuko Yasuda

Sep 1, 2015, 4:44:39 AM
to Hiroki Nakagawa, site-isol...@chromium.org, stora...@chromium.org, Joshua Bell, Daniel Cheng, m...@chromium.org, Daniel Murphy
Thanks Hiroki for sending this!

I feel that developers' general interests would be:
1) optimize the same-process case in general (discussion #2)
2) make it work in general, as it doesn't work at all (option #1 or #2)
3) optimize the cross-process case (option #3)

Meanwhile, the necessary engineering cost would be (items further right cost more):
option #1 < option #2 << option #3 <<< discussion #2

I'm not really sure if we want to invest in 3) (option #3) yet. Do people in site-isolation-dev@ have any ideas about, or interest in, the priorities here? If not, I think starting with option #2 for now sounds like a reasonable plan.

Would we be able to know how much performance gain we could get by doing option #2 (or #3) over #1? Say, would we be able to locally measure how long it takes to serialize and copy a buffer of size X? If doing some preliminary measurement is not that hard, it'd be helpful to have some actual numbers.
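
A rough local measurement along these lines could look like the sketch below. It only times a plain in-memory copy from script, not Chrome's actual serialization or IPC path, so it gives a lower bound at best. `measureCopyMs` and the chosen sizes are assumptions for illustration, and no numbers are implied here.

```typescript
// Hypothetical helper: average time in milliseconds to copy a buffer of the
// given size. This models only the memory copy, not serialization or IPC.
function measureCopyMs(byteLength: number, iterations: number): number {
  const source = new Uint8Array(byteLength).fill(1);
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    // slice() allocates a new buffer and copies the bytes into it.
    const copy = source.slice();
    if (copy.length !== byteLength) throw new Error("unexpected copy length");
  }
  return (performance.now() - start) / iterations;
}

// Sizes are arbitrary sample points: 64 KiB, 1 MiB, 8 MiB.
const sizes = [64 * 1024, 1024 * 1024, 8 * 1024 * 1024];
const timings = sizes.map((size) => ({ size, ms: measureCopyMs(size, 5) }));
for (const { size, ms } of timings) {
  console.log(`${size} bytes: ${ms.toFixed(3)} ms per copy`);
}
```

Real numbers would need to come from instrumenting the actual pipeline, since serialization overhead and process hops are not captured by a script-level copy.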


Joshua Bell

Sep 1, 2015, 12:49:15 PM
to Kinuko Yasuda, Hiroki Nakagawa, site-isol...@chromium.org, stora...@chromium.org, Daniel Cheng, m...@chromium.org, Daniel Murphy
Great write-up, Hiroki!

I would not want us to ship optimized same-process if the behavior observed by script differs from cross-process. That implies we should do option #1 first, so we have the correct semantics and can get tests running. (Unless I'm missing something, that basically just means we neuter any buffers in the transferables array, and otherwise leave the pipeline as-is.) Everything after that can be data-driven optimization.
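
To make the "neuter and otherwise leave the pipeline as-is" idea concrete, here is a minimal sketch of those semantics modeled in script. It is not the Chrome implementation: `sendByCopy` is a hypothetical name, the existing pipeline is modeled as a plain byte copy, and `structuredClone` is used only as a convenient way to detach a buffer.

```typescript
// Sketch of option #1 semantics: copy the bytes as the pipeline does today,
// then detach every buffer named in the transfer list on the sender side.
function sendByCopy(buffer: ArrayBuffer, transferList: ArrayBuffer[]): ArrayBuffer {
  // Step 1: the existing copy-based pipeline, modeled as a byte copy.
  const wireCopy = buffer.slice(0);

  // Step 2: neuter each listed buffer. Transferring it into a throwaway
  // clone detaches the original; the clone itself is discarded.
  for (const item of transferList) {
    structuredClone(item, { transfer: [item] });
  }
  return wireCopy;
}

const payload = new ArrayBuffer(4);
new Uint8Array(payload).set([1, 2, 3, 4]);
const received = sendByCopy(payload, [payload]);
console.log(payload.byteLength, received.byteLength);
```

From script's point of view this is indistinguishable from a true transfer, apart from speed, which is what makes it a useful baseline for tests.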

Otherwise, I agree with Kinuko: the same-process case seems to be where we'll get the most developer benefit. The use case I hear about most often is games and other perf-critical single-page/multi-worker apps, but additional data on what scenarios are important would be welcome.

Possible follow-up questions: How does FF optimize these cases? Would these proposed approaches play well with the WebAsm thinking?

As an aside, we may want to work with devrel to document any implicit limits (e.g. 128MB IPCs). It comes up for IDB every so often.

Daniel Cheng

Sep 3, 2015, 5:23:49 AM
to Joshua Bell, Kinuko Yasuda, Hiroki Nakagawa, site-isol...@chromium.org, stora...@chromium.org, m...@chromium.org, Daniel Murphy
Pardon my ignorance: which of these proposed alternatives will have observable differences in behavior (other than speed of transfer) between same-process vs cross-process?

In general, I agree with the sentiment of focusing on optimizing the same-process case and just getting the cross-process case to a working state. It would probably be useful to add UMA metrics to see how often we do need to serialize these cross-process (and how long it takes?).

Daniel

Hiroki Nakagawa

Sep 3, 2015, 11:46:17 AM
to Daniel Cheng, Joshua Bell, Kinuko Yasuda, site-isol...@chromium.org, stora...@chromium.org, m...@chromium.org, Daniel Murphy
Thank you for your comments!

Based on the feedback, I will:

  1. implement option #1 as a baseline and add UMAs to record buffer size etc. for further optimizations.
  2. measure the cost of memory copies and serialization, and implement option #2 if it turns out to be worth doing.
  3. work on the same-process case (discussion #2).


2015-09-03 18:23 GMT+09:00 Daniel Cheng <dch...@chromium.org>:
Pardon my ignorance: which of these proposed alternatives will have observable differences in behavior (other than speed of transfer) between same-process vs cross-process?

I think there are no differences other than speed of transfer.
 
In general, I agree with the sentiment of focusing on optimizing the same-process case and just getting the cross-process case to a working state. It would probably be useful to add UMA metrics to see how often we do need to serialize these cross-process (and how long it takes?).

Sounds good. I'll add UMA metrics to record buffer size, time to serialize and transfer type (same-process vs cross-process).
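
As a rough illustration of what those three metrics might capture, here is a hypothetical in-memory stand-in. It does not use Chrome's real UMA API; the type names, the `TransferMetrics` class, and the power-of-two bucket layout are all assumptions made for the sketch.

```typescript
// Hypothetical stand-in for the metrics above; not Chrome's UMA API.
type TransferType = "same-process" | "cross-process";

interface TransferSample {
  byteLength: number;
  serializeMs: number;
  transferType: TransferType;
}

// Power-of-two bucketing: bucket b holds sizes in (2^(b-1), 2^b],
// and bucket 0 holds sizes 0 and 1.
function sizeBucket(byteLength: number): number {
  let bucket = 0;
  let upperBound = 1;
  while (upperBound < byteLength) {
    upperBound *= 2;
    bucket++;
  }
  return bucket;
}

class TransferMetrics {
  private readonly samples: TransferSample[] = [];

  record(sample: TransferSample): void {
    this.samples.push(sample);
  }

  // Number of samples per size bucket for one transfer type.
  histogram(transferType: TransferType): Map<number, number> {
    const counts = new Map<number, number>();
    for (const s of this.samples) {
      if (s.transferType !== transferType) continue;
      const bucket = sizeBucket(s.byteLength);
      counts.set(bucket, (counts.get(bucket) ?? 0) + 1);
    }
    return counts;
  }
}

// Sample values below are made up purely to exercise the code.
const metrics = new TransferMetrics();
metrics.record({ byteLength: 1024, serializeMs: 0.2, transferType: "cross-process" });
metrics.record({ byteLength: 900, serializeMs: 0.1, transferType: "cross-process" });
metrics.record({ byteLength: 4096, serializeMs: 0.3, transferType: "same-process" });
const crossProcess = metrics.histogram("cross-process");
const sameProcess = metrics.histogram("same-process");
```

Splitting the histogram by transfer type is what would show how often the cross-process path is hit and at what sizes.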
 
Daniel

On Tue, Sep 1, 2015 at 9:49 AM Joshua Bell <jsb...@chromium.org> wrote:
Great write-up, Hiroki!

I would not want us to ship optimized same-process if the behavior observed by script differs from cross-process.

I think there are no differences other than speed of transfer.
 
That implies we should do option #1 first, so we have the correct semantics and can get tests running. (Unless I'm missing something, that basically just means we neuter any buffers in the transferables array, and otherwise leave the pipeline as-is.)

That's right. I wrote a rough patch to do this.
 
Everything after that can be data-driven optimization.

Otherwise, I agree with Kinuko: the same-process case seems to be where we'll get the most developer benefit. The use case I hear about most often is games and other perf-critical single-page/multi-worker apps, but additional data on what scenarios are important would be welcome.

Possible follow-up questions: How does FF optimize these cases? Would these proposed approaches play well with the WebAsm thinking?

Thank you. I'll add use cases and these questions to the doc.

Regarding FF, it currently uses a single-process architecture, so the cross-process case isn't an issue for it yet. However, FF seems to be moving to a multi-process architecture, at which point it will face the same problem.

Regarding WebASM, I'm not sure about their plans, but I guess our approaches should work with it as long as we don't change the behavior observed by scripts. In any case, I'll look through their issues.
 
As an aside, we may want to work with devrel to document any implicit limits (e.g. 128MB IPCs). It comes up for IDB every so often.
 
Agreed. I tried to send a large buffer (> 128MB) before and a renderer silently crashed...
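
Until such limits are documented, one defensive measure on the application side would be a size check before posting. The sketch below assumes a limit of 128 MiB based on the figure mentioned in this thread; the real limit and whether it is inclusive may differ, and `fitsInSingleMessage` is a hypothetical helper.

```typescript
// Assumed limit, taken from the "128MB IPCs" figure in this thread.
// The actual limit and its exact definition may differ.
const ASSUMED_IPC_LIMIT_BYTES = 128 * 1024 * 1024;

// Hypothetical pre-flight check to run before calling postMessage.
function fitsInSingleMessage(
  byteLength: number,
  limit: number = ASSUMED_IPC_LIMIT_BYTES,
): boolean {
  return byteLength <= limit;
}

const smallOk = fitsInSingleMessage(1024);
const tooLarge = fitsInSingleMessage(ASSUMED_IPC_LIMIT_BYTES + 1);
console.log(smallOk, tooLarge);
```

A check like this at least turns a silent crash into something the page can detect and report.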


On Tue, Sep 1, 2015 at 1:44 AM, Kinuko Yasuda <kin...@chromium.org> wrote:
Thanks Hiroki for sending this!

I feel that developers' general interests would be:
1) optimize the same-process case in general (discussion #2)
2) make it work in general, as it doesn't work at all (option #1 or #2)
3) optimize the cross-process case (option #3)

Meanwhile, the necessary engineering cost would be (items further right cost more):
option #1 < option #2 << option #3 <<< discussion #2

I have the same feeling as you.
 
I'm not really sure if we want to invest in 3) (option #3) yet. Do people in site-isolation-dev@ have any ideas about, or interest in, the priorities here? If not, I think starting with option #2 for now sounds like a reasonable plan.

Would we be able to know how much performance gain we could get by doing option #2 (or #3) over #1? Say, would we be able to locally measure how long it takes to serialize and copy a buffer of size X? If doing some preliminary measurement is not that hard, it'd be helpful to have some actual numbers.

Sounds reasonable. I'll conduct a performance evaluation and update the doc.