Re: RasterInterface latencies in different context?


Zhenyao Mo

Nov 3, 2022, 1:44:52 PM
to Eugene Zemtsov, Sunny Sachanandani, graphics-dev
There is an extra cost of creating a new context if you use
RendererBlinkPlatformImpl::CreateOffscreenGraphicsContext3DProvider(),
but I don't know why there is extra overhead after the context
creation.

CC graphics-dev for more eyes

On Wed, Oct 26, 2022 at 2:44 PM Eugene Zemtsov <ezem...@google.com> wrote:
>
> Hi,
>
> On my windows machine I observe that RasterInterface::ReadbackImagePixels() has much lower latency if graphics context is obtained from RendererBlinkPlatformImpl::CreateSharedOffscreenGraphicsContext3DProvider()
> and much larger latency if graphics context is obtained from RendererBlinkPlatformImpl::CreateOffscreenGraphicsContext3DProvider().
>
> Is it expected? Can I do something to make my graphics context faster?
>
> --
> Thanks,
> Eugene Zemtsov.

Sunny Sachanandani

Nov 3, 2022, 1:58:55 PM
to Zhenyao Mo, Eugene Zemtsov, graphics-dev, Vasiliy Telezhnikov
Initially I thought it might be because one uses OOP-R while the other uses the GL command buffer, but it seems they should set the same value for enable_oop_rasterization. The only other thing I can think of is that the viz display compositor has a dependency on the shared context, so it keeps getting prioritized by the display scheduler and hence has lower latency, while the one-off context gets deprioritized because viz doesn't depend on it. I'm assuming that you're caching the context from CreateOffscreenGraphicsContext3DProvider(), since it creates a new one on every call, while CreateSharedOffscreenGraphicsContext3DProvider() returns a cached one from RenderThreadImpl::SharedMainThreadContextProvider().
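
To make the caching point concrete, here is a minimal sketch of the difference between a factory that builds a new context on every call and one that hands back a cached instance. All of the types below (FakeContextProvider, FakeContextFactory, ReadbackClient) are hypothetical stand-ins invented for illustration; they are not Chromium classes and do not model any real GPU work.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a graphics context provider; not a Chromium type.
struct FakeContextProvider {
  int id;
};

// Toy factory that counts how many contexts it has created.
class FakeContextFactory {
 public:
  // Mirrors the "new context on every call" behaviour described above.
  std::unique_ptr<FakeContextProvider> CreateOffscreen() {
    return std::make_unique<FakeContextProvider>(
        FakeContextProvider{++created_});
  }

  // Mirrors the "return the cached shared context" behaviour.
  FakeContextProvider* GetShared() {
    if (!shared_)
      shared_ = CreateOffscreen();
    return shared_.get();
  }

  int created() const { return created_; }

 private:
  int created_ = 0;
  std::unique_ptr<FakeContextProvider> shared_;
};

// Caller-side cache: create the one-off context once, then reuse it.
class ReadbackClient {
 public:
  explicit ReadbackClient(FakeContextFactory* factory) : factory_(factory) {
    assert(factory_ != nullptr);
  }

  FakeContextProvider* Context() {
    if (!context_)
      context_ = factory_->CreateOffscreen();
    return context_.get();
  }

 private:
  FakeContextFactory* factory_;
  std::unique_ptr<FakeContextProvider> context_;
};
```

With a caller-side cache like ReadbackClient, repeated readbacks pay the creation cost only once, so creation overhead alone would not explain a persistent latency gap.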

Sunny Sachanandani

Nov 3, 2022, 4:16:59 PM
to Eugene Zemtsov, Zhenyao Mo, graphics-dev, Vasiliy Telezhnikov
Assuming you're doing a synchronous readback, i.e. ReadbackImagePixels and not ReadbackARGBPixelsAsync, we should be raising the priority of the context due to the WaitForGetOffsetInRange that's called as part of the Finish call (see gpu::Scheduler::RaisePriorityForClientWait), so it's most likely not that.
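
As a rough illustration of why a client wait should neutralize the deprioritization theory, here is a toy model of a priority-based scheduler. It is only a sketch: ToyScheduler, its priority levels, and the sequence names are assumptions made up for this example, and the real gpu::Scheduler is considerably more involved.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: names and priority levels are illustrative, not Chromium's.
enum class Priority { kLow = 0, kNormal = 1, kHigh = 2 };

class ToyScheduler {
 public:
  void AddSequence(const std::string& name, Priority priority) {
    priorities_[name] = priority;
  }

  // A client blocking on this sequence bumps it to the highest priority.
  void RaisePriorityForClientWait(const std::string& name) {
    auto it = priorities_.find(name);
    assert(it != priorities_.end());
    it->second = Priority::kHigh;
  }

  // Returns the sequence that would run next: highest priority wins,
  // ties are broken by name order to keep the toy deterministic.
  std::string PickNext() const {
    assert(!priorities_.empty());
    auto best = priorities_.begin();
    for (auto it = priorities_.begin(); it != priorities_.end(); ++it) {
      if (it->second > best->second)
        best = it;
    }
    return best->first;
  }

 private:
  std::map<std::string, Priority> priorities_;
};
```

In this model, a one-off context that starts at a lower priority than the shared one is still picked first once a client wait raises it, which is the behaviour the synchronous Finish path is expected to trigger.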

What I said above about there being no difference related to enable_oop_rasterization between CreateSharedOffscreenGraphicsContext3DProvider() and CreateOffscreenGraphicsContext3DProvider() was wrong - for CreateOffscreenGraphicsContext3DProvider() whether we use OOP-R depends on web_attributes.enable_raster_interface - can you check what's the value you're passing for this?



On Thu, Nov 3, 2022 at 11:59 AM Eugene Zemtsov <ezem...@google.com> wrote:
> Yes, I'm caching the context.
>
> > The only other thing I can think of is that the shared context has a dependency on it from viz display compositor so it keeps getting prioritized by the display scheduler and hence has lower latency while the one-off context gets deprioritized because viz doesn't depend on it.
> Yeah, I suspect something like this too :(
>
> --
> Thanks,
> Eugene Zemtsov.

Eugene Zemtsov

Nov 3, 2022, 4:41:35 PM
to Sunny Sachanandani, Zhenyao Mo, graphics-dev, Vasiliy Telezhnikov
Yes, I'm caching the context. 

> The only other thing I can think of is that the shared context has a dependency on it from viz display compositor so it keeps getting prioritized by the display scheduler and hence has lower latency while the one-off context gets deprioritized because viz doesn't depend on it.
Yeah, I suspect something like this too :( 


--
Thanks,
Eugene Zemtsov.

Eugene Zemtsov

Nov 3, 2022, 4:43:33 PM
to Sunny Sachanandani, Zhenyao Mo, graphics-dev, Vasiliy Telezhnikov
> OOP-R depends on web_attributes.enable_raster_interface - can you check what's the value you're passing for this?

  Platform::ContextAttributes attributes;
  attributes.enable_raster_interface = true;
  attributes.support_grcontext = true;
  attributes.prefer_low_power_gpu = true;

--
Thanks,
Eugene Zemtsov.

Sunny Sachanandani

Nov 4, 2022, 2:45:41 PM
to Eugene Zemtsov, Zhenyao Mo, graphics-dev, Vasiliy Telezhnikov
Hmmm, can you confirm 1) that you're using ReadbackImagePixels() and not ReadbackARGBPixelsAsync(), and 2) whether you have OOP-C, i.e. "Canvas out-of-process rasterization", enabled in about:gpu? If OOP-C is disabled, the CreateSharedOffscreenGraphicsContext3DProvider() context goes through the GL command buffer readPixels path, which seems to be somehow faster than the OOP-R readback path. If OOP-C is enabled, can you attach Chrome traces for both cases with the rendering categories enabled?

Eugene Zemtsov

Nov 29, 2022, 6:07:51 PM
to Sunny Sachanandani, Zhenyao Mo, graphics-dev, Vasiliy Telezhnikov
Hi Sunny,

Sorry for the late reply. Holidays :)

I mostly care about the case where OOP-R is enabled because, as I understand it, that's how everything will work in the near future.
Currently I'm looking at RI::ReadbackImagePixels() called on a separate thread with a separate context.

I narrowed the inconsistent readback performance down to GrGpu::readPixels(): on a separate context it sometimes takes around 3-5 ms, but more often around 12-13 ms.
On the regular context it's much more stable, in the 3-5 ms range.

For example, in the GPU process of the attached trace:

High latency call
GrGpu::readPixels
Category disabled-by-default-skia.gpu
Start: 103,357.199 ms
Wall Duration: 13.102 ms
CPU Duration: 1.499 ms

Low latency call
GrGpu::readPixels
Category disabled-by-default-skia.gpu
Start: 105,328.148 ms
Wall Duration: 3.987 ms
CPU Duration: 1.403 ms

I don't really know how to dig deeper into Skia's GrGpu::readPixels() to figure out what makes it inconsistent and slower.
I'll appreciate any help in debugging what's going on.
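
One thing the two slices above already suggest, taking all four durations as milliseconds: the CPU durations are nearly identical (1.499 vs 1.403), so almost all of the extra wall time in the slow call is spent off-CPU, i.e. blocked or descheduled, not computing. The trace numbers alone don't say what it is waiting on. A small sketch of that arithmetic, where SliceTiming and OffCpuMs are names made up for this example:

```cpp
#include <cassert>

// Timing of one trace slice, in milliseconds.
struct SliceTiming {
  double wall_ms;
  double cpu_ms;
};

// Time the slice spent off-CPU (blocked or descheduled).
double OffCpuMs(const SliceTiming& t) {
  assert(t.wall_ms >= t.cpu_ms);
  return t.wall_ms - t.cpu_ms;
}

// Values copied from the two GrGpu::readPixels slices in the trace.
const SliceTiming kHighLatency{13.102, 1.499};
const SliceTiming kLowLatency{3.987, 1.403};
```

OffCpuMs(kHighLatency) comes to about 11.6 ms and OffCpuMs(kLowLatency) to about 2.6 ms, so the roughly 9 ms gap between the two calls is waiting time.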

This is the CL I am working on.
I'm using this perf test to gauge performance: videoFrame-copyTo-canvas.html

--
Thanks,
Eugene Zemtsov.
trace_RI_ReadbackImagePixels_performance.json.gz