Cooperative preemption of image decoding


Tim Ansell

Oct 13, 2015, 2:00:18 AM
to graphics-dev, no...@chromium.org, scheduler-dev
Hi Graphics Dev! 
(Plus ccing scheduler-dev and Noel)

At the Chrome Scheduling Summit the issue of image decoding came up a couple of times. I made a couple of claims and wanted to follow up with some more information on them -- hopefully this is useful to someone working in the area.

Image decoders can be made to be cooperatively scheduled by incrementally feeding data to the decoder. The basic approach is that you feed the decoder just enough bytes for it to continue decoding, so that it ends up yielding control to you frequently. You then check the time and stop if decoding has taken too long. Some pseudo code might look like:

    import time

    # Decode the image until our timeslice is up.
    start = time.time()
    while time.time() - start < timeslice and image_decoding_task.is_not_finished():
      # Feed the decoder only the next small chunk so it returns to us quickly.
      chunk = image_decoding_task.get_chunk()
      image_decoding_task.feed_bytes(chunk)

    # If we didn't finish, schedule ourselves to do more
    # decoding in the future.
    if image_decoding_task.is_not_finished():
      post_task(image_decoding_task)
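
The same loop can be written as a self-contained, runnable sketch. Everything named here (`FakeDecodeTask`, `run_timeslice`, the `post_task` callback, the chunk size) is hypothetical and only for illustration; the "decoder" merely counts bytes and is not a Chrome or decoder-library API.

```python
import time
from collections import deque


class FakeDecodeTask:
    """Stand-in for an incremental image decoder (hypothetical)."""

    def __init__(self, encoded: bytes, chunk_size: int = 4096):
        self.encoded = encoded
        self.chunk_size = chunk_size
        self.offset = 0         # encoded bytes handed to the decoder so far
        self.bytes_decoded = 0  # pretend output progress

    def is_finished(self) -> bool:
        return self.offset >= len(self.encoded)

    def get_chunk(self) -> bytes:
        chunk = self.encoded[self.offset:self.offset + self.chunk_size]
        self.offset += len(chunk)
        return chunk

    def feed_bytes(self, chunk: bytes) -> None:
        # A real decoder would decode what it can from the chunk, then return.
        self.bytes_decoded += len(chunk)


def run_timeslice(task, timeslice_s, post_task, clock=time.monotonic):
    """Decode until the timeslice is used up, then re-post if unfinished."""
    start = clock()
    while not task.is_finished() and clock() - start < timeslice_s:
        task.feed_bytes(task.get_chunk())
    if not task.is_finished():
        post_task(task)


# Usage: a trivial run loop that keeps running whatever gets posted.
queue = deque()
task = FakeDecodeTask(b"\x00" * 100_000)
queue.append(task)
while queue:
    run_timeslice(queue.popleft(), timeslice_s=0.001, post_task=queue.append)
```

The clock is passed in as a parameter so the slicing behaviour can be checked with a fake clock rather than real time.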

This approach is one of the ways that Firefox does image decoding without significant jank (they also have the option for decoding images on a dedicated thread).

Using this approach we could do image decoding during idle time without issue, and it actually fits very nicely into the idle approach that was added for garbage collection tasks.

There is still a problem of what to do when we need to draw and an image has yet to be decoded (i.e. it is in the critical path).

Doing this kind of incremental feeding of the system should also work for making image encoders be cooperatively scheduled too.

For JPEG (and for WebP) decoding is approximately the same speed as doing a memcpy of the image! This means that making a copy of an image is almost always the wrong thing to be doing. PNG and GIF are quite a bit slower but even there doing memcpy of the images is still generally a bad idea. (Noel tried to help me find the benchmarks on the perf dashboard but we couldn't figure out how to get a good graph from it.)

Feel free to ask any questions if you have them. I'll do my best to answer but I'm no expert in this area (but can also bug Noel for the answer ;-).

Tim 'mithro' Ansell

Stojiljkovic, Aleksandar

Oct 13, 2015, 2:50:57 AM
to Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev
Hello,
>For JPEG (and for WebP) decoding is approximately the same speed as doing a memcpy of the image!
Is more data available for this, for example on behavior on different platforms and with different images?
Which decoder libraries are used?
It would be especially interesting to get numbers for decoding tiles and downscaled versions of images.

Thanks.
Kind Regards,
Aleksandar

Tim Ansell

Oct 13, 2015, 3:19:45 AM
to Stojiljkovic, Aleksandar, graphics-dev, no...@chromium.org, scheduler-dev
We have benchmarks for the image decoders which do appear in the perf dashboard at https://chromeperf.appspot.com/ (but it currently appears to be down).

I think that scaling images is a very expensive operation compared to image decode but have no data to back that claim up (so do actually test before making a decision based on it).

Tim 'mithro' Ansell

Sami Kyostila

Oct 13, 2015, 10:27:27 AM
to Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, ju...@chromium.org
[+junov, +xlai]

Thanks Tim. The main thing I'm wondering is how do we decide whether cooperative decoding is better than doing it on a thread? The overhead from yielding and other things running on the same thread probably means that the overall decoding time is a bit longer, but then again the decoding won't be stealing CPU time from more critical work and could avoid having to spin up another core.

- Sami


Ross McIlroy

Oct 13, 2015, 11:16:38 AM
to Sami Kyostila, Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
Sami and I were just chatting about this and were thinking that it might be useful to expose some kind of idle tasks mechanism to background threads. These idle tasks could be much more permissive than those on the main thread (since they are not necessarily blocking UI operations), but would allow Chrome to limit the amount of contention due to work on background threads. E.g., the image-decoding / encoding tasks could be built as IdleTasks (using a similar pattern as Tim suggests above), and could then be run either:

 - on the main thread, with cooperative preemption and short idle period deadlines to avoid jank,
 - on a background thread, with Chrome able to limit the task's deadline to preempt the task and avoid causing contention on devices with a low core count,
 - or on a background thread with the deadline being effectively infinite (which would be equivalent to just posting the task to a background thread currently) if there are plenty of cores to go around.
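
As a rough sketch of how one task body could serve all three modes, the only thing that changes below is the deadline handed to it. `run_idle_task`, `make_row_decoder`, and the three deadline constants are assumptions made up for this example, not an existing Chrome interface.

```python
import math
import time


def run_idle_task(step, deadline_s, clock=time.monotonic):
    """Call step() until it reports completion or the deadline passes.

    step() does one small unit of work and returns True once all work is done.
    Returns True if the task finished, False if it should be posted again.
    """
    end = clock() + deadline_s
    while True:
        if step():
            return True
        if clock() >= end:
            return False


def make_row_decoder(total_rows):
    """Toy unit of work: 'decodes' one row per step (hypothetical)."""
    state = {"row": 0}

    def step():
        state["row"] += 1
        return state["row"] >= total_rows

    return step, state


# The three modes differ only in the deadline (values are assumptions).
MAIN_THREAD_IDLE_S = 0.005            # short idle period on the main thread
CONTENDED_BACKGROUND_S = 0.050        # background thread, limited by Chrome
UNCONTENDED_BACKGROUND_S = math.inf   # effectively "run to completion"

step, state = make_row_decoder(total_rows=1000)
finished = run_idle_task(step, UNCONTENDED_BACKGROUND_S)
```

The loop always performs at least one step before checking the deadline, so a task makes progress even when it is given a very short idle period.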

Sami Kyostila

Oct 13, 2015, 11:29:05 AM
to Ross McIlroy, Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
Nat mentioned off-thread that being able to cancel decode tasks might also be interesting. If they were written in this chunked idle work style, then cancelling now-off-screen work would also become easier.

- Sami

Chris Blume

Oct 13, 2015, 11:41:58 AM
to Sami Kyostila, Ross McIlroy, Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
I like the idea of decoding in chunks.
Do our decoding libraries allow this, though? Do they expect the whole, completed input?



Vladimir Levin

Oct 13, 2015, 2:13:01 PM
to Chris Blume, Sami Kyostila, Ross McIlroy, Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
+1 to being able to cancel image decodes in the middle of decoding. I think we should keep the decodes running off thread, but when priorities change (like when we fling/scroll), we want to be able to cancel tasks. Currently, if an image decode takes something like 100-200ms, it can't be cancelled even though it's running on a separate thread, so it essentially occupies that worker thread until the decode is done.

Justin Novosad

Oct 13, 2015, 2:28:06 PM
to Vladimir Levin, Chris Blume, Sami Kyostila, Ross McIlroy, Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org
FWIW, Olivia Lai is actively working on this for the canvas.toBlob API. The JPEG and PNG libraries directly allow encoding to be done in chunks. For WebP, it is not as obvious. There is an API in the WebP encoder for updating subregions, so we could separate the work into tiles, but it is not clear whether this API will have the desired perf characteristics. We'll find out soon enough.

The current preliminary implementation uses a dedicated native thread and does the encoding in one big task. We do not intend to ship it like that. Olivia is currently working on modifying the encoder integrations in Blink to expose chunk-by-chunk encoding.

At the scheduling summit brainstorming session, it was suggested that we schedule this work on the shared worker thread pool rather than using idle tasks (where the job may starve), and to split jobs into <4ms chunks. Each chunk of work would post the next chunk onto the queue. So to make the job cancellable we can just have a signal that can be posted cross-thread to tell the encoder job to abort (exit the scanline loop, and do not post a task for the next chunk).
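
A minimal sketch of that scheme, using Python's standard thread pool in place of Chrome's shared worker pool. `ChunkedEncodeJob`, `rows_per_chunk`, and the empty `_encode_row` placeholder are hypothetical; a real integration would size `rows_per_chunk` so that one chunk stays under the roughly 4ms target.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ChunkedEncodeJob:
    """Encodes a few scanlines per pool task; each chunk posts the next one."""

    def __init__(self, pool, total_rows, rows_per_chunk):
        self.pool = pool
        self.total_rows = total_rows
        self.rows_per_chunk = rows_per_chunk
        self.next_row = 0
        self.cancelled = threading.Event()  # may be set from any thread
        self.done = threading.Event()       # set when the job stops, either way

    def start(self):
        self.pool.submit(self._run_chunk)

    def cancel(self):
        self.cancelled.set()

    def _run_chunk(self):
        end = min(self.next_row + self.rows_per_chunk, self.total_rows)
        while self.next_row < end:
            if self.cancelled.is_set():
                self.done.set()
                return  # exit the scanline loop and post nothing further
            self._encode_row(self.next_row)
            self.next_row += 1
        if self.next_row < self.total_rows:
            self.pool.submit(self._run_chunk)  # post the next chunk
        else:
            self.done.set()

    def _encode_row(self, row):
        pass  # placeholder for the real per-scanline encoder call


# Usage: wait for completion before the pool shuts down, since a chunk
# cannot post its successor to a pool that is already shutting down.
with ThreadPoolExecutor(max_workers=2) as pool:
    job = ChunkedEncodeJob(pool, total_rows=1000, rows_per_chunk=16)
    job.start()
    job.done.wait()
```

Because other work can be queued between two chunks of the same job, the pool stays responsive, and a cancel takes effect within at most one scanline.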

Leon Scroggins

Oct 13, 2015, 2:29:43 PM
to Chris Blume, Sami Kyostila, Ross McIlroy, Tim Ansell, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
On Tue, Oct 13, 2015 at 11:41 AM, 'Chris Blume' via Graphics-dev <graphi...@chromium.org> wrote:
I like the idea of decoding in chunks.
Do our decoding libraries allow this, though?

Yes, this is supported, and our decoders take advantage of it to show partial images before all of the data has arrived.



--
Leon Scroggins III
scr...@google.com

Matt Sarett

Oct 13, 2015, 3:37:53 PM
to Graphics-dev, mit...@mithis.com, no...@chromium.org, schedu...@chromium.org
I did some work to run performance benchmarks for JPEG decoding as compared to memcpy. I'm seeing that even for the largest images, the JPEG decodes appear to be about 5x slower on z620 and at least 10x slower on Nexus 6. Intuitively, this makes sense to me, given that a JPEG decode is a pretty complex, multi-step process (entropy decoding, IDCT, upsampling, color conversion). I'm sharing a doc with my procedure and results in case anyone wants to look closer.
Jpeg Decode vs memcpy Comparison

If anyone has any suggestions on the design of the benchmark or has any performance results that show a different conclusion, I would definitely be interested to hear their thoughts!

Matt

Victor Miura

Oct 13, 2015, 4:52:11 PM
to Matt Sarett, Graphics-dev, mit...@mithis.com, no...@chromium.org, schedu...@chromium.org
Image decoding is very slow relative to other rasterization work.  The attached trace shows for example, a 1590x1716 WEBP in Google+ stream taking over 300ms to decode on Nexus 5.

This makes images completely impractical to decode without a) separate thread, or b) chunking approach.  Would be great to have someone look at this.

How would feeding data piece-by-piece to image decoders work with animated images?


gplus-threaded-gpu-rasterization-020615.json.zip

David Reveman

Oct 13, 2015, 5:35:47 PM
to Victor Miura, Matt Sarett, Graphics-dev, Tim Ansell, no...@chromium.org, schedu...@chromium.org
Splitting each image decode into N number of dependent tasks, where N is output-size-in-bytes / kMaxBytesToDecodePerTask sounds interesting. That would be a relatively simple change in the compositor today and worth experimenting with if the image decoders already have support for this.
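
To make the arithmetic concrete, here is a small sketch that splits a decode into row ranges. The 1 MiB value for `K_MAX_BYTES_TO_DECODE_PER_TASK`, the 4 bytes per pixel, and the function name are assumptions for illustration only; no such constant exists in the compositor today.

```python
import math

K_MAX_BYTES_TO_DECODE_PER_TASK = 1 << 20  # assumed value: 1 MiB per task


def decode_task_ranges(width, height, bytes_per_pixel=4,
                       max_bytes=K_MAX_BYTES_TO_DECODE_PER_TASK):
    """Split a decode into N row ranges, N = ceil(output bytes / max_bytes).

    Returns a list of (first_row, end_row) pairs, end_row exclusive. Each
    range depends on the previous one, so they have to run in order.
    """
    if width <= 0 or height <= 0:
        return []
    output_bytes = width * bytes_per_pixel * height
    n_tasks = max(1, math.ceil(output_bytes / max_bytes))
    rows_per_task = math.ceil(height / n_tasks)
    return [(start, min(start + rows_per_task, height))
            for start in range(0, height, rows_per_task)]


# The 1590x1716 image mentioned earlier in the thread, decoded to 32-bit pixels.
ranges = decode_task_ranges(1590, 1716)
```

With these assumed values the 1590x1716 image needs 10,913,760 output bytes and is split into 11 tasks of 156 rows each.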

David

Tim Ansell

Oct 13, 2015, 9:00:07 PM
to Sami Kyostila, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, ju...@chromium.org
Your assumption that "The overhead from yielding and other things running on the same thread probably means that the overall decoding time is a bit longer" is not quite correct (assuming the same amount of work needs to be performed in both cases).

Cooperatively multitasked "thread switches" are generally cheaper than OS level "thread switches". Assuming the same amount of work is done and there are the same number of context switches, the cooperatively multitasked system should win.

See the further explanation below;
----

On a single core system;
  • A cooperatively threaded system can be much faster than using multiple threads. The reason is that OS level "thread switches" are significantly more expensive than cooperatively multitasked "thread switches".
  • An OS level thread switch has no knowledge about what CPU registers and other resources are in use, so the OS is forced to save everything.
    • The exception is when the thread switch happens at a syscall boundary, where the cost is much closer to a normal function call.
    • As a syscall context switch is significantly faster than a non-syscall context switch, most OSs are optimized to take advantage of this.
    • The syscall context switch being cheap doesn't help with CPU bound workloads which don't end up making syscalls -- like image decode.
  • The OS also has to change from user space to kernel space, which can be expensive, but since it is so important, hardware vendors squeeze as much optimization out of it as possible.
  • In a cooperatively multitasked system the compiler knows what resources it has allocated and thus only needs to save a very small amount of information.

Often the cooperatively multitasked systems do more context switches because they are cheaper.

The reason this is important is that low end Android systems are effectively single core systems.

----

On a multi core system;
  • A cooperatively threaded system (with a thread pool) can be slower than a non-cooperatively threaded system. This usually comes down to implementation details rather than properties of how the CPU / OS works: the OS generally has better knowledge of the system architecture and is highly optimized to take advantage of things like cache and NUMA to better locate work.

In theory, if the systems had the same level of knowledge, the cooperatively threaded version (with a thread pool) *should* perform better for the same reason it does on the single core system but practical problems generally interfere with the theory :)

----

Sorry for the computer architecture lesson, but it is important to know how these things perform in theory so we have an upper bound on what is possible.

Does that make sense?

Tim 'mithro' Ansell

Tim Ansell

Oct 13, 2015, 9:02:52 PM
to Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
On 14 October 2015 at 02:41, Chris Blume <cbl...@google.com> wrote:
> I like the idea of decoding in chunks.

The image decoding libraries are already designed to enable this for a couple of reasons;
  • It allows some of the image to be displayed while the rest of the image is still downloading off the network. Very important for large images on slow networks.
  • It allows decoding only part of the image if the whole image isn't needed.

> Do our decoding libraries allow this, though?

If I understand correctly, yes.

> Do they expect the whole, completed input?

If I understand correctly, no.


Tim 'mithro' Ansell


Tim Ansell

Oct 13, 2015, 10:28:28 PM
to Vladimir Levin, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
"Cancelling an image decode" is different from "preempting an image decode". I think you are actually talking about preempting and resuming it at a later time here, right?

The only time we want to partially decode an image and discard the result is under memory pressure (or when we leave the page). Otherwise we are doing the image decode work twice.

Tim 'mithro' Ansell

Vladimir Levin

Oct 14, 2015, 2:03:01 PM
to Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
We'd also need to consider the possibility that we've started a decode but won't ever come back to it. This can happen if you just fling by an image: we partially decode it, but we keep flinging and never visit that image again.

I think there should be something in place to also just cancel the decode. After all, if the image gets far enough away from visible, finishing the decode is also likely wasted work. 

David Reveman

Oct 14, 2015, 3:18:44 PM
to Vladimir Levin, Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
On Wed, Oct 14, 2015 at 2:02 PM, Vladimir Levin <vmp...@chromium.org> wrote:
We'd also need to consider the possibility that we've started a decode but won't ever come back to it. This can happen if you just fling by an image: we partially decode it, but we keep flinging and never visit that image again.

I think there should be something in place to also just cancel the decode. After all, if the image gets far enough away from visible, finishing the decode is also likely wasted work. 

If we just split up the decode into multiple dependent tasks as I'm suggesting, then all decode tasks that have not yet started running will be cancelled using the same mechanism by which full image decode tasks are cancelled today when not needed, i.e. tasks not in the currently scheduled task graph will be cancelled.

Vladimir Levin

Oct 15, 2015, 1:12:02 AM
to David Reveman, Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
On Wed, Oct 14, 2015 at 12:18 PM, David Reveman <rev...@google.com> wrote:


If we just split up the decode into multiple dependent tasks as I'm suggesting, then all decode tasks that have not yet started running will be cancelled using the same mechanism by which full image decode tasks are cancelled today when not needed, i.e. tasks not in the currently scheduled task graph will be cancelled.

If we split up one image decode over several tasks, then presumably each task would have some sort of a handle or a way of letting the decoding system know that it is resuming a previous decode instead of starting a new one. Basically, some state will be kept in the decoder waiting for the next thing to tell it to continue. All I meant above is that if the tasks are canceled, then there needs to be a process of informing the decoder to purge its state. Either that, or we somehow ensure that the next time we try to decode this image (if it becomes a priority again), it resumes from wherever it left off.

This is a bit of an implementation detail I think, since who knows what this task might look like. Maybe each task can hold on to all the necessary state, so the purge isn't necessary.

David Reveman

Oct 15, 2015, 10:48:40 AM
to Vladimir Levin, Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xl...@chromium.org, Justin Novosad
On Thu, Oct 15, 2015 at 1:11 AM, Vladimir Levin <vmp...@chromium.org> wrote:

If we split up one image decode over several tasks, then presumably each task would have some sort of a handle or a way of letting the decoding system know that it is resuming a previous decode instead of starting a new one. Basically, some state will be kept in the decoder waiting for the next thing to tell it to continue. All I meant above is that if the tasks are canceled, then there needs to be a process of informing the decoder to purge its state. Either that, or we somehow ensure that the next time we try to decode this image (if it becomes a priority again), it resumes from wherever it left off.

Yes, with discardable memory as the output for the decode, there's no need to clean up any memory when a decode drops in priority. Just leave it as is, and if the priority of the image becomes high again we might be lucky and still have the memory from the previous decode work available, effectively making some decode tasks no-ops.
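
A toy model of that behaviour, to show where the no-op and the restart come from. `DiscardableBuffer` and `ResumableDecode` are invented for this sketch and only imitate the lock/unlock/purge idea; they are not Chrome's discardable memory classes.

```python
class DiscardableBuffer:
    """Toy discardable memory: contents may vanish while unlocked."""

    def __init__(self, size):
        self.size = size
        self._data = bytearray(size)
        self._locked = False

    def lock(self):
        """Return True if the old contents survived, False if purged."""
        survived = self._data is not None
        if not survived:
            self._data = bytearray(self.size)  # fresh, empty backing store
        self._locked = True
        return survived

    def unlock(self):
        self._locked = False

    def purge(self):
        """Simulate the system reclaiming the memory under pressure."""
        if not self._locked:
            self._data = None


class ResumableDecode:
    """Decode split over several tasks, writing into discardable memory."""

    def __init__(self, rows, row_bytes):
        self.rows = rows
        self.buffer = DiscardableBuffer(rows * row_bytes)
        self.rows_done = 0

    def run_task(self, max_rows):
        """Decode up to max_rows more rows; returns how many were decoded."""
        if not self.buffer.lock():
            self.rows_done = 0  # earlier output was purged: start over
        todo = min(max_rows, self.rows - self.rows_done)
        self.rows_done += todo  # placeholder for the real row decoding
        self.buffer.unlock()
        return todo


decode = ResumableDecode(rows=100, row_bytes=4)
first = decode.run_task(60)    # decodes rows 0..59
second = decode.run_task(60)   # decodes the remaining 40 rows
third = decode.run_task(60)    # memory still there: nothing left to do
decode.buffer.purge()          # memory pressure while the image is low priority
fourth = decode.run_task(60)   # contents lost, so decoding restarts at row 0
```

No explicit cleanup step is needed when a decode drops in priority: the buffer is simply left unlocked, and the next task finds out on `lock()` whether it can resume.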

David

Leon Scroggins

Oct 15, 2015, 11:19:47 AM
to David Reveman, Vladimir Levin, Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xlai, Justin Novosad
Don't we only use discardable memory if all the data has already been received? (We recently discussed this here [1] and that was the conclusion we came to.) A partially decoded image will use the memory stored in the decoder (in the ImageFrame's SkBitmap's SkMallocPixelRef, allocated by malloc). (Maybe I'm too concerned with the current flow here - you *could* decode a partial image to discardable memory, but we do not today. I'm guessing the motivation is that you do not want to partially decode to discardable memory, then throw away that memory due to memory pressure, then try to resume the decode but actually have to start over.)  ImageDecoder does have a way for you to tell it to clear that memory though (which will also happen when the ImageDecoder is deleted). 


David Reveman

Oct 15, 2015, 2:51:52 PM
to Leon Scroggins, Vladimir Levin, Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, graphics-dev, no...@chromium.org, scheduler-dev, xlai, Justin Novosad
On Thu, Oct 15, 2015 at 11:19 AM, Leon Scroggins <scr...@google.com> wrote:
Don't we only use discardable memory if all the data has already been received? (We recently discussed this here [1] and that was the conclusion we came to.) A partially decoded image will use the memory stored in the decoder (in the ImageFrame's SkBitmap's SkMallocPixelRef, allocated by malloc). (Maybe I'm too concerned with the current flow here - you *could* decode a partial image to discardable memory, but we do not today. I'm guessing the motivation is that you do not want to partially decode to discardable memory, then throw away that memory due to memory pressure, then try to resume the decode but actually have to start over.)  ImageDecoder does have a way for you to tell it to clear that memory though (which will also happen when the ImageDecoder is deleted). 

Correct, currently we only decode to discardable if we have all the data available but we definitely want to fix that. Not only because it would allow us to split decode into multiple tasks but also to improve the memory usage of chrome as it is today. We've seen a few cases where we end up with huge amounts of memory usage in the decoders and OOM crash. That would not happen if we used discardable memory. If we're under memory pressure, then I think it's best to allow the discardable memory system to purge partial decodes and just start over.

David

Daniel Bratell

Oct 16, 2015, 4:39:54 AM
to Graphics-dev, Matt Sarett, mit...@mithis.com, no...@chromium.org, schedu...@chromium.org
On Tue, 13 Oct 2015 21:37:52 +0200, Matt Sarett <msa...@google.com> wrote:

I did some work to run performance benchmarks for JPEG decoding as compared to memcpy. I'm seeing that even for the largest images, the JPEG decodes appear to be about 5x slower on z620 and at least 10x slower on Nexus 6.

This made the assumption that the source data was instantly available. Large chunks of memory might have been paged to disk, and then it's almost always faster to fetch the source data and decode it than to fetch the much larger decoded memory chunk. Decoding an image should not be done in a loop (like every frame), but I think discarding decoded images can be done more aggressively than today without losing anything valuable.

/Daniel

--
/* Opera Software, Linköping, Sweden: CEST (UTC+2) */

Stojiljkovic, Aleksandar

Oct 16, 2015, 4:52:29 AM
to Daniel Bratell, Graphics-dev, Matt Sarett, mit...@mithis.com, no...@chromium.org, schedu...@chromium.org
Hello,
>Decoding an image should not be done in a loop (like every frame), but I think discarding decoded images can be done more aggressively than today without losing anything valuable.

Not sure if this is the same issue, but the current approach on Android, SkOneShotDiscardablePixelRef -> ChildDiscardableSharedMemoryManager -> ashmem_... (base/memory/discardable_shared_memory.cc), seems quite aggressive. For larger images, when the device is tight on memory, it behaves like you said:
"Decoding ... done in a loop (like every frame)".
Maybe it is too aggressive: when memory is tight, this happens even with smaller images and when the "cache" quota is not full. Or it is just being a good citizen.
Kind Regards,
Aleksandar

Daniel Bratell

Oct 16, 2015, 4:57:41 AM
to Leon Scroggins, 'David Reveman' via Graphics-dev, David Reveman, Vladimir Levin, Tim Ansell, Chris Blume, Sami Kyostila, Ross McIlroy, no...@chromium.org, scheduler-dev, xlai, Justin Novosad
On Thu, 15 Oct 2015 20:51:51 +0200, 'David Reveman' via Graphics-dev <graphi...@chromium.org> wrote:

If we're under memory pressure, then I think it's best to allow the discardable memory system to purge partial decodes and just start over.

I think you have to be very careful with that. I'm leaning on Presto experience here where this could effectively kill a web application on a slow device. Most likely partially decoded images that are needed in every frame should be the very last thing to discard.

David Reveman

Oct 16, 2015, 11:53:16 AM
to Daniel Bratell, Leon Scroggins, 'David Reveman' via Graphics-dev, Chris Blume, Justin Novosad, Ross McIlroy, Sami Kyostila, Tim Ansell, Vladimir Levin, no...@chromium.org, scheduler-dev, xlai


On Fri, Oct 16, 2015, 4:57 AM Daniel Bratell <bra...@opera.com> wrote:

On Thu, 15 Oct 2015 20:51:51 +0200, 'David Reveman' via Graphics-dev <graphi...@chromium.org> wrote:

If we're under memory pressure, then I think it's best to allow the discardable memory system to purge partial decodes and just start over.

I think you have to be very careful with that. I'm leaning on Presto experience here where this could effectively kill a web application on a slow device. Most likely partially decoded images that are needed in every frame should be the very last thing to discard.



The plan is that the compositor would keep some set of images pinned that are in or close to the current viewport. A failure to resume a decode that we started earlier would only happen if the priority of that image was relatively low sometime between the initial decode and the resume.

David

Stojiljkovic, Aleksandar

Oct 23, 2015, 11:51:15 AM
to David Reveman, Daniel Bratell, Leon Scroggins, 'David Reveman' via Graphics-dev, Chris Blume, Justin Novosad, Ross McIlroy, Sami Kyostila, Tim Ansell, Vladimir Levin, no...@chromium.org, scheduler-dev, xlai, re...@google.com
Hello,
As Scroggo pointed out, this might be related to issue 501043 <https://code.google.com/p/chromium/issues/detail?id=501043#c13>.
There is some ongoing work on decoding images directly to the resolution needed for display (downsampling during decoding), and optionally decoding to RGB565 <https://code.google.com/p/chromium/issues/detail?id=438323#c40>. Reed is defining the API for that in SkImageGenerator.

How it works there (with the patches) for JPEG is like this:
- while data is not yet fully available (and this is the state before the patch), JPEGImageDecoder continues (resumes) decoding several times until it hits this check at line 1023 of JPEGImageDecoder.cpp <https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/platform/image-decoders/jpeg/JPEGImageDecoder.cpp&l=1023>:

    if (jpeg_read_scanlines(info, &row, 1) != 1)
        return false;

It then returns a partially decoded image (which needs to be N32 (RGBA), since the alpha channel is used for the undecoded part).
This is invoked from skia ->...-> ImageFrameGenerator.

- once data is available, the image gets downsampled while being decoded to the size corresponding to the display, optionally to RGB565.

I don't know yet whether issue 501043 <https://code.google.com/p/chromium/issues/detail?id=501043#c13> is related to partial decoding or to sampling on large textures (there seems to be mipmapping involved), but related to this topic it makes sense to prototype the following (summarizing what was already identified in this thread):

- downsampled (and optionally RGB565) decoding for PNG,
- partial decoding could advance in slices, so that it is possible to orchestrate advancing relative to the viewport during scrolling. ImageFrameGenerator knows whether the full data is available, so if it is partial, maybe advance a bit (not until hitting a read failure) and give some time to another decoder. It could be prototyped on the same thread and measured before multithreading it,
- partial decoding could also be downsampled,
- more control over decoding and unpinning from discardable memory in relation to the viewport and user actions (e.g. scrolling, zooming),
- partial decoding should go to Skia discardable memory (if it does not already; once it advances to the end, it just stays there).
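
One way the "advance in slices, relative to the viewport" item could be prototyped on a single thread is sketched below. All names (`PartialDecode`, `advance_one_round`, `slice_rows`, `budget_slices`) and the distance-based ordering are assumptions for this example, not something ImageFrameGenerator provides.

```python
from dataclasses import dataclass


@dataclass
class PartialDecode:
    name: str
    top: int          # y position of the image in the page, in pixels
    height: int       # displayed height, in pixels
    total_rows: int   # rows the decoder has to produce
    rows_done: int = 0

    def distance_to(self, viewport_top, viewport_bottom):
        """0 if the image intersects the viewport, else the gap in pixels."""
        bottom = self.top + self.height
        if bottom < viewport_top:
            return viewport_top - bottom
        if self.top > viewport_bottom:
            return self.top - viewport_bottom
        return 0

    def advance(self, rows):
        # Placeholder for resuming the real decoder for a few rows.
        self.rows_done = min(self.total_rows, self.rows_done + rows)


def advance_one_round(decodes, viewport_top, viewport_bottom,
                      slice_rows, budget_slices):
    """Give one slice each to the unfinished decodes nearest the viewport."""
    pending = [d for d in decodes if d.rows_done < d.total_rows]
    pending.sort(key=lambda d: d.distance_to(viewport_top, viewport_bottom))
    chosen = pending[:budget_slices]
    for d in chosen:
        d.advance(slice_rows)
    return [d.name for d in chosen]


images = [
    PartialDecode("a", top=0, height=100, total_rows=100),
    PartialDecode("b", top=5000, height=300, total_rows=300),
    PartialDecode("c", top=900, height=300, total_rows=300),
]
order = advance_one_round(images, viewport_top=800, viewport_bottom=1600,
                          slice_rows=16, budget_slices=2)
```

Calling `advance_one_round` again after each scroll update re-sorts the pending decodes, so work shifts toward whatever is currently near the viewport.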

This makes sense for this bug, and maybe for some other bugs, but it is difficult to tell in general whether it is related to this topic or if I'm just hijacking it.
Kind Regards,
Aleksandar