Canvas.toBlob: perf experiments and conclusions on two different implementations


Olivia Lai

Dec 3, 2015, 12:54:54 PM
to pain...@chromium.org, no...@chromium.org, ju...@chromium.org, schedu...@chromium.org

Hi all,


This is a discussion thread about the internal implementation of canvas.toBlob. Basically, we've implemented two different versions of image encoding within the toBlob function: 

1) threaded implementation: spins up a separate thread that performs the asynchronous image encoding; the thread runs continuously and is shared among all canvases

2) idle-periods implementation: performs the image encoding incrementally, fitting the work into idle periods on the main thread (a sketch of both strategies follows below)
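
To make the contrast concrete, here is a minimal standalone sketch of the two strategies. This is illustrative C++ only; every name in it (SharedEncoderThread, EncodeSomeRows, and so on) is hypothetical, not the actual Blink code:

#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

using Clock = std::chrono::steady_clock;

// Placeholder for "encode a few more rows of the image"; returns true
// once the whole image has been encoded.
bool EncodeSomeRows() { return true; }

// (1) Threaded: one long-lived thread, shared by all canvases, pulls
//     queued encode jobs and runs each to completion off the main
//     thread. It is intentionally never joined or destroyed.
class SharedEncoderThread {
 public:
  SharedEncoderThread() : thread_(&SharedEncoderThread::Run, this) {}

  void Post(std::function<void()> encode_job) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      jobs_.push(std::move(encode_job));
    }
    ready_.notify_one();
  }

 private:
  void Run() {
    for (;;) {
      std::unique_lock<std::mutex> lock(mutex_);
      ready_.wait(lock, [this] { return !jobs_.empty(); });
      std::function<void()> job = std::move(jobs_.front());
      jobs_.pop();
      lock.unlock();
      job();  // the full encode runs here; the main thread never blocks
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::queue<std::function<void()>> jobs_;
  std::thread thread_;  // declared last so the queue exists before Run()
};

// (2) Idle-periods: encode incrementally on the main thread, working
//     only until the deadline the scheduler granted for this idle
//     period, then re-post the remainder as another idle task.
void EncodeDuringIdlePeriod(Clock::time_point deadline,
                            const std::function<void()>& repost_idle_task) {
  while (Clock::now() < deadline) {
    if (EncodeSomeRows())
      return;  // finished within this idle period
  }
  repost_idle_task();  // out of idle time; continue in the next period
}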


I ran repeated experiments with both the idle-periods implementation and the threaded implementation for PNG image encoding on a variety of perf trybots (data available here: https://docs.google.com/a/google.com/spreadsheets/d/18LKHE3pV262hd4DCEw4k4MtMowwyaUXCxhkOCOXt1_0/edit?usp=sharing). In summary, we measured both the toBlob duration and the impact of toBlob on the smoothness of animation (i.e. whether it causes jank).


Experimentally, we find that:

1) The threaded version of toBlob is significantly faster than the idle-periods version; the difference can reach hundreds of milliseconds for a canvas of 4k*4k size.
2) There is no consistent, obvious evidence that the threaded version has a more negative impact on the smoothness of animation frames than the idle-periods version. Even when aggressive invalidation happens on every animation cycle, the mean frame times of the two implementations are still very close, with a difference of less than 0.1 ms. The difference cannot be observed even on low-end devices.
Also, in theory, tasks scheduled during idle periods run the risk of being delayed indefinitely, especially when the main thread is very busy; the threaded implementation does not have this risk.

Based on these results, we conclude that the threaded implementation has two advantages over the idle-periods implementation: a) it is faster and b) it runs no risk of being delayed indefinitely. At the same time, the one advantage we hoped to find in the idle-periods implementation, namely less negative impact on smoothness, is not observable in the experimental data. Therefore, we have decided to ship canvas.toBlob with the threaded implementation and delete the code for the idle-periods implementation.



Regards,
Olivia

Jeremy Roman

Dec 3, 2015, 1:05:23 PM
to Olivia Lai, paint-dev, Noel Gordon, Justin Novosad, schedu...@chromium.org
Is there a plan to move to a thread pool or similar approach long term? I think Chromium uses thread pools for similar cases, and I'm not sure we'll want to have an unterminating thread for each off-thread feature we add in the future.


Justin Novosad

Dec 3, 2015, 1:09:55 PM
to Jeremy Roman, Olivia Lai, paint-dev, Noel Gordon, scheduler-dev
On Thu, Dec 3, 2015 at 1:05 PM, Jeremy Roman <jbr...@chromium.org> wrote:
Is there a plan to move to a thread pool or similar approach long term? I think Chromium uses thread pools for similar cases, and I'm not sure we'll want to have an unterminating thread for each off-thread feature we add in the future.

Absolutely. As soon as a shared thread pool becomes available, we should migrate to it.
scheduler team: is this in the plans?

Elliott Sprehn

Dec 3, 2015, 2:18:58 PM
to Justin Novosad, Olivia Lai, schedu...@chromium.org, Noel Gordon, paint-dev, Jeremy Roman

Using a thread means you're contending with all the other threads, in a real app that includes the parser thread, v8 compiler, GC threads, workers, service workers, compositor, raster threads, and more.

I believe it looks faster in small benchmarks, but I bet you're just causing descheds in busy apps to win the race. Oilpan did this too, at first they just used more threads to look amazing on some benchmarks. :)

Ex. Raster tasks are likely more important than these tasks, but you probably bump them off the cores.

Please don't add a new unterminating thread.

Victor Miura

Dec 3, 2015, 2:33:48 PM
to Elliott Sprehn, Justin Novosad, Olivia Lai, schedu...@chromium.org, Noel Gordon, paint-dev, Jeremy Roman
The performance difference looks small on many of the devices, and negative for the threaded version on Android One.  We need to take care on low-core devices, especially 2-core Android devices.  Even if we're not on the main thread, we may need to moderate the time ourselves and not leave it all to the OS scheduler.

xlai

Dec 3, 2015, 3:14:48 PM
to Victor Miura, Elliott Sprehn, Justin Novosad, schedu...@chromium.org, Noel Gordon, paint-dev, Jeremy Roman
First of all, if we ideally had a shared thread pool for Blink, the threaded implementation would be the most suitable one; but we don't have one now, and we cannot keep waiting for the shared thread pool to be ready and let that block shipping this important canvas function.

Secondly, Oilpan and canvas.toBlob have different priorities: whilst it is okay for garbage collection tasks to be delayed for a while when the main thread is very busy, users who invoke canvas.toBlob expect to see the task completed within a reasonable time.

Nevertheless, there are a few compromise approaches to address your concerns:

1) Like junov@ said, as soon as a shared pool is available, we migrate to it.

2) We can ship both implementations, with the idle-periods implementation set as the default. Then we set up a Finch experiment that lets 1% of users try the threaded implementation. We can then watch the metrics from the Finch experiment and detect whether those 1% of users encounter smoothness problems. Elliott mentioned that small benchmarks in perf tests may not tell the real picture; if so, we should gather real-world data from a Finch experiment. If we don't ship canvas.toBlob and let users try it, we can hardly gather that real-world data.
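
For illustration, the gate for such an experiment might look roughly like this. The trial and group names are made up; only base::FieldTrialList::FindFullName is the existing field-trial lookup:

#include "base/metrics/field_trial.h"

// Hypothetical sketch: pick the toBlob implementation based on a Finch
// field trial. "CanvasToBlobThreaded" / "Enabled" are invented names.
bool UseThreadedToBlobImplementation() {
  return base::FieldTrialList::FindFullName("CanvasToBlobThreaded") ==
         "Enabled";
}

The default (idle-periods) path would run whenever the trial group doesn't match, so users outside the experiment are unaffected.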





Sami Kyostila

Dec 4, 2015, 12:08:54 PM
to xlai, Victor Miura, Elliott Sprehn, Justin Novosad, scheduler-dev, Noel Gordon, paint-dev, Jeremy Roman
2015-12-03 20:14 GMT+00:00 xlai <xl...@chromium.org>:
First of all, if we ideally had a shared thread pool for Blink, the threaded implementation would be the most suitable one; but we don't have one now, and we cannot keep waiting for the shared thread pool to be ready and let that block shipping this important canvas function.

If we do decide to go the threaded route, I don't think exposing a shared thread pool to Blink would necessarily be all that complicated. The simplest way would be to have a function like blink::Platform::backgroundThreadPoolTaskRunner() which routes tasks into base::WorkerPool (like we already do for v8).
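
As a rough sketch of the wiring (this is my assumption of how such a hook could forward into the existing base::WorkerPool::PostTask, not an actual patch):

#include "base/callback.h"
#include "base/location.h"
#include "base/threading/worker_pool.h"

// Hypothetical glue behind a backgroundThreadPoolTaskRunner()-style
// hook: forward a Blink background task into Chromium's shared worker
// pool instead of a dedicated unterminating thread.
void PostBackgroundTask(const tracked_objects::Location& from_here,
                        const base::Closure& task) {
  // task_is_slow=true, since an image encode can run for hundreds of ms.
  base::WorkerPool::PostTask(from_here, task, /*task_is_slow=*/true);
}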
 

- Sami

Justin Novosad

Dec 4, 2015, 12:45:31 PM
to Sami Kyostila, xlai, Victor Miura, Elliott Sprehn, scheduler-dev, Noel Gordon, paint-dev, Jeremy Roman
On Fri, Dec 4, 2015 at 12:08 PM, Sami Kyostila <skyo...@google.com> wrote:


2015-12-03 20:14 GMT+00:00 xlai <xl...@chromium.org>:
First of all, if we ideally had a shared thread pool for Blink, the threaded implementation would be the most suitable one; but we don't have one now, and we cannot keep waiting for the shared thread pool to be ready and let that block shipping this important canvas function.

If we do decide to go the threaded route, I don't think exposing a shared thread pool to Blink would necessarily be all that complicated. The simplest way would be to have a function like blink::Platform::backgroundThreadPoolTaskRunner() which routes tasks into base::WorkerPool (like we already do for v8).

Nice. That sounds very reasonable IMHO.

Noel Gordon

Dec 7, 2015, 1:16:38 AM
to Justin Novosad, Sami Kyostila, xlai, Victor Miura, Elliott Sprehn, scheduler-dev, paint-dev, Jeremy Roman
Yes, sounds interesting.  If the user was doing back-to-back toBlob calls on the same <canvas>, say, processing their video frames maybe, this would queue their toBlob calls in FIFO order on a single thread pool? Is that the idea?

What happens to all that (assumed) queued background toBlob work when the user navigates away from the page?

~noel

Justin Novosad

Dec 7, 2015, 4:50:03 PM
to Noel Gordon, Sami Kyostila, xlai, Victor Miura, Elliott Sprehn, scheduler-dev, paint-dev, Jeremy Roman
On Mon, Dec 7, 2015 at 1:16 AM, Noel Gordon <no...@chromium.org> wrote:
Yes, sounds interesting.  If the user was doing back-to-back toBlob calls on the same <canvas>, say, processing their video frames maybe, this would queue their toBlob calls in FIFO order on a single thread pool? Is that the idea?

What happens to all that (assumed) queued background toBlob work when the user navigates away from the page?

The task should be dropped, since there is no script environment left to receive the completion callback (or resolved promise).
This can be achieved by pulling the task from the queue (if such a thing is possible), and/or there could be an early-exit condition based on a state machine that is shared between the threads.

Right now there is no such check, so the tasks run to completion (which was not the case for the idle-task implementation).
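
A minimal sketch of what that shared early-exit state could look like (all names here are hypothetical, not the actual Blink code):

#include <atomic>
#include <memory>

// Hypothetical shared state between the main thread and the encoder
// thread: the page sets `cancelled` on teardown, and the worker checks
// it before doing more work or posting the completion callback.
struct ToBlobJobState {
  std::atomic<bool> cancelled{false};
};

void EncodeRow(int /*row*/) { /* placeholder: encode one row */ }
void PostCompletionCallbackToMainThread() { /* placeholder */ }

void RunEncodeJob(std::shared_ptr<ToBlobJobState> state, int num_rows) {
  for (int row = 0; row < num_rows; ++row) {
    if (state->cancelled.load())
      return;  // page navigated away: drop the remaining work silently
    EncodeRow(row);
  }
  // A real implementation would also have to handle teardown racing
  // this final check, which is exactly the concern below.
  if (!state->cancelled.load())
    PostCompletionCallbackToMainThread();
}

// Called on the main thread when the execution context is torn down.
void OnContextDestroyed(const std::shared_ptr<ToBlobJobState>& state) {
  state->cancelled.store(true);
}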

Olivia: I think we need a test that covers this case. I am concerned there might be an issue with posting a callback task back to the main thread after the script execution context has been torn down.
 



Tim Ansell

Dec 10, 2015, 2:03:35 AM
to Elliott Sprehn, Justin Novosad, Olivia Lai, scheduler-dev, Noel Gordon, paint-dev, Jeremy Roman
Hi Justin and Olivia,

It is my understanding that toBlob is an async, background API? You request that a blob is created and at some point in the future the blob is available? While a blob is being generated, it does not block the webpage rendering? Is that all correct?

I believe this makes toBlob generation actually lower priority than garbage collection. toBlob doesn't have any potential to block the rendering pipeline and hence is a lower priority than things which can. If we never run GC, we will potentially be required to do GC in the middle of rendering and thus could cause jank. Is there something I'm missing here?

Could you also explain your testing procedure further?  It looks like all testing was done on the "tools/perf/page_sets/tough_canvas_cases/canvas_toBlob.html" page? That seems to be an extremely simple canvas test which draws a couple of rectangles?

I would like to reiterate what Elliott mentioned:
Using a thread means you're contending with all the other threads, in a real app that includes the parser thread, v8 compiler, GC threads, workers, service workers, compositor, raster threads, and more.

Have you tried to see what happens with toBlob with more complicated canvas tests? What about using toBlob on a complicated HTML page which contains a small canvas?

The likelihood of jank occurring because the wrong thread is scheduled is proportional to the number of running threads on the whole system. It would be interesting to see what happens if you had 5 copies of this test running *at the same time*, or if you otherwise put the system under load. Sadly, I don't believe the perf testing infrastructure has any support for doing these kinds of things?

This area is *extremely* hard to make good tests for (it's something that everyone, including myself, struggles with). We should be very careful about assuming that synthetic benchmarks demonstrate real-world behavior.

Hope that helps!

Tim 'mithro' Ansell

Justin Novosad

Dec 10, 2015, 10:54:08 AM
to Tim Ansell, Elliott Sprehn, Olivia Lai, scheduler-dev, Noel Gordon, paint-dev, Jeremy Roman
Thanks Tim, 

Olivia and I discussed this, and we think exposing the shared thread pool to Blink and using it for toBlob is worth trying out and comparing. That being said, I am not sure this fully addresses the concerns you are raising because, to my knowledge, the shared thread pool does not yield to the compositor (and other high-priority things), or does it?
Also, we'd like to improve our tests by making them truly tough. I agree with your concern that the test we were using may have been too easy (not threaded enough and/or not busy enough on the main thread).  I am not sure how one would create a telemetry test that has multiple animated pages running at once. Do you know of scheduling tests we could use as inspiration?

Tim Ansell

Dec 10, 2015, 8:31:55 PM
to Justin Novosad, Sami Kyostila, Elliott Sprehn, Olivia Lai, scheduler-dev, Noel Gordon, paint-dev, Jeremy Roman, Gabriel Charette
Adding Sami explicitly to see if he knows the answer to some of these questions.

I wanted to say thank you for taking the time to work with us on this! You are a little unlucky in coming to this space at a time when things are in a large state of flux and we are still trying to get both the guidelines and the tooling here solid. It is a little unfair that we are asking more of the toBlob implementation than we have of other things in the past.

On 11 December 2015 at 02:54, Justin Novosad <ju...@chromium.org> wrote:
Thanks Tim, 

Olivia and I discussed this, and we think exposing the shared thread pool to Blink and using it for toBlob is worth trying out and comparing.

Do you think it is possible to make toBlob independent of where it ends up running? (i.e. on a shared thread pool, during idle time, or on its own thread...)

If so, I think we (the scheduler team) should then look at providing the interface needed so you don't have to care about how you end up running, and scheduling can then choose the best solution as needs change.

Sami, as Blink scheduler lead, what are your thoughts here?

That being said, I am not sure this fully addresses the concerns you are raising because, to my knowledge, the shared thread pool does not yield to the compositor (and other high-priority things), or does it?

I don't think the shared thread pool actually exists yet? Sami, was your shared pool a suggestion or has it been created?

We can lower the priority of threads at the OS level (and we have made some attempts to), but that only provides guarantees over long time periods (on the order of many seconds). A low-priority thread can still cause a high-priority one to miss work at the frame timescale (on the order of 10 milliseconds).

Also, we'd like to improve our tests by making them truly tough. I agree with your concern that the test we were using may have been too easy (not threaded enough and/or not busy enough on the main thread).  I am not sure how one would create a telemetry test that has multiple animated pages running at once. Do you know of scheduling tests we could use as inspiration?

I don't know of any tests which do this at the moment (but I'm not up to date on all the recent changes). Sami, do you know of anything? 

The best place to look might actually be the startup tests that are driving some of the work by gab@ and team in Montreal.

Sadly, not having any tests for things like this is partly how we ended up with such a mess of threads in the first place.

Tim 'mithro' Ansell
PS: I didn't see a response regarding my API question?

Sami Kyostila

Jan 4, 2016, 12:51:40 PM
to Tim Ansell, Justin Novosad, Elliott Sprehn, Olivia Lai, scheduler-dev, Noel Gordon, paint-dev, Jeremy Roman, Gabriel Charette
To tie up the loose ends here a bit, here's the interface that was added: https://codereview.chromium.org/1519863002

It's currently hooked up to the shared worker pool, but given the semantic knowledge that these are lower-priority background tasks, we could potentially do something smarter in the future.

- Sami

xlai

Jan 27, 2016, 1:31:12 PM
to paint-dev, scheduler-dev, Tim Ansell, Justin Novosad, Elliott Sprehn, Noel Gordon, Jeremy Roman, Gabriel Charette, Sami Kyostila
A recent regression alert on a perf bot after canvas.toBlob shipped (or, more specifically, after it changed from experimental mode to release mode) led to the discovery that the --enable-experimental-canvas-features flag was not enabled in the smoothness.tough_canvas_cases perf tests. This means that a major portion of the previously collected experimental data, as well as the conclusion derived from it (to use the threaded implementation instead of the idle-tasks implementation), was not valid.

I reverted all the idle-tasks implementation that we had deleted and ran the comparison on perf trybots again, and found that the idle-tasks implementation results in a lower mean frame time for animation on the main thread than the threaded implementation (even when the Blink shared thread pool is used). This time, the difference in mean frame time is obvious and significant: 1ms on my multi-core Linux machine, 2ms on Mac Retina, 5ms on Mac 10.11, 4ms on Mac HDD (dual-core), and 6.5ms on an Android S5 device and a Nexus 5.

By putting the idle-tasks implementation back, we would expect toBlob to run slower in terms of total duration (the previous data from the other perf test still holds) but to cause less jank for activity on the main thread. This is a desirable outcome, as we expect toBlob to be a low-priority, asynchronous background API.


Elliott Sprehn

Jan 27, 2016, 4:41:46 PM
to Olivia Lai, paint-dev, Tim Ansell, Sami Kyostila, Gabriel Charette, scheduler-dev, Justin Novosad, Noel Gordon, Jeremy Roman

Great, that matches my expectations as well. :)
