Re: GPU rendering possibility.

Jeremy Roman

Oct 18, 2016, 1:54:11 PM

to Rogovin, Kevin, graphi...@chromium.org, skia-d...@googlegroups.com

+skia-discuss

On Tue, Oct 18, 2016 at 1:40 PM, Rogovin, Kevin <kevin....@intel.com> wrote:

Hi,

I was directed here by a Google employee as this is a possible place to advertise using a new GPU 2D renderer in Chrome. The renderer is still in development at:

https://github.com/01org/fastuidraw

I gave a talk at the X Developer Conference this year:

https://www.x.org/wiki/Events/XDC2016/Program/rogovin_fast_ui_draw/

The renderer still requires work to support all the features needed to be a complete solution; it also needs productization.

However, I would like to explore using the renderer in Chrome, possibly with Canvas as the first place to use it.

At this point in time I am hesitant to make a SKIA backend from the renderer; the main reason is related to clipping. The FastUIDraw renderer is set up so that clipping is really implemented almost entirely by the GPU with very, very little CPU load, even under icky things like clipping against paths, rotated rectangles and so on. However, it only supports clipIn and clipOut (which is exactly what GraphicsContext needs), whereas SKIA right now tracks the clipping region on CPU and implements other clip combine modes. In addition, the renderer is setup strongly to avoid CPU re-computation load as well which is tricky to fit into the SKIA backend interface.

I made a benchmark that does lots of rotation and lots of clipping, and ported it to different canvas-style toolkits (SKIA, Qt's QPainter and Cairo). For that load, FastUIDraw was several times faster (against the SKIA GL backend it was 5 times faster). The benchmark and its ports are in the git repo linked above under the branch with_ports_of_painter-cells.

At any rate, I am hoping to explore getting FastUIDraw, or at least the ideas within it, into Chrome. I am also happy to change and tweak its interface to make it more usable by Chrome (I am right now considering changing the brush interface to match SKIA's, where shaders can be chained together instead of a fixed set of functionality).

Looking forward to collaboration.

Best Regards,
-Kevin Rogovin

Brian Salomon

Oct 18, 2016, 2:20:08 PM

to Justin Novosad, Jeremy Roman, Rogovin, Kevin, graphi...@chromium.org, skia-d...@googlegroups.com

Hi Kevin,

We are currently working on modifying Skia to store its clip without transforming into device space. We are attempting to remove everything but clip-in and clip-out (or intersect and difference in Skia lingo). In the meantime we were planning to punt clips that use other ops to a slower code path that continues to work but isn't engineered for performance, so that we could focus on the simpler clip-in/clip-out-only cases. You may have seen SkRasterClip, which is used by the software backend and rasterizes the clips as they come into SkCanvas. It does exist when running in GPU mode but is set to a conservative mode where it really is just tracking clip bounds and doesn't sw-rasterize complicated clips. SkDevice subclasses get to opt into this conservative behavior.

What did you mean by "the renderer is setup strongly to avoid CPU re-computation load as well which is tricky to fit into the SKIA backend interface"? Is that related to clipping?

Perhaps we should set up a VC to discuss taking advantage of your work in Chrome/Skia?

Brian

On Tue, Oct 18, 2016 at 2:02 PM Justin Novosad <ju...@chromium.org> wrote:

Hi Rogovin, I am the canvas team lead. A skia back-end really would be the path of least resistance for integrating a new renderer because SkDevice is actually designed to be an abstraction layer. When blink was forked from WebKit a few years ago, we removed the graphics abstractions that existed in WebKit. Now, all the graphics code in blink is tightly coupled with skia. Going around skia is probably not very realistic at this point unless you are willing to invest tremendous time and effort.

If your library only supports clip-in and clip-out, that is good enough for blink's uses of skia, and it is definitely sufficient for canvas, so you could get away with implementing a skia back end that does not support other combine modes. FYI, we recently removed uses of the replace op in blink.

To get started with canvas, you'd want to modify Canvas2DLayerBridge to make it instantiate an SkSurface that uses your new backend.

Brian Salomon

Oct 19, 2016, 1:07:15 PM

to Rogovin, Kevin, Justin Novosad, Jeremy Roman, graphi...@chromium.org, skia-d...@googlegroups.com

Hi,

Thanks for the clarification about that. Sounds like we probably have a lot to discuss. We have done some related thinking/exploration. VC = video conference :) Sure, next week would be great. Feel free to propose a time. The Skia team is mostly on the US East Coast.

Brian

On Tue, Oct 18, 2016 at 5:52 PM Rogovin, Kevin <kevin....@intel.com> wrote:

Hi,

Thank you very much for the very fast replies!

To answer the question about “the renderer is setup strongly to avoid CPU re-computation load as well which is tricky to fit into the SKIA backend interface”: it is only tangentially related to clipping. The renderer strongly distinguishes between “what” to draw and “how” to draw. The “what” to draw would be a sequence of glyphs, a path and so on. The “how” to draw is what brush, transformation and clipping are applied. The what is embodied essentially by attribute and index data. The how is embodied by a few numbers (clipping is more complicated to describe). The renderer makes use of an uber-shader, and the how is represented as numbers copied to a buffer read by the uber-shader (via a TBO, a UBO or, later when I implement the last fallback, an SSBO). One thing I have noticed is that regenerating attribute and index data has a non-trivial CPU cost, and the goal of FastUIDraw is to reduce the CPU load to be able to draw more and more stuff. Issues also come up from this approach (namely the need to do FAST culling). It is possible to choose to regenerate that data at every draw, but I think that is inefficient. The interface for path rendering (be it stroking or filling) is designed so that the data is generated lazily, on demand, and fetched from the path to stroke or fill.
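The what/how split described above can be illustrated with a small sketch. This is not FastUIDraw's API: the names `Path`, `Painter`, `fill_data`, `fill_path` and `how_buffer` are hypothetical, and the triangle-fan "tessellation" only works for convex polygons. It shows only the idea that geometry data is built once, lazily, while each draw adds just a few numbers to a buffer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Path:
    """The "what": geometry whose attribute/index data is built once, on demand."""
    points: List[Tuple[float, float]]
    _fill_data: Optional[Tuple[list, list]] = field(default=None, repr=False)
    builds: int = 0  # counts how often the expensive CPU step actually ran

    def fill_data(self):
        # Generate attribute and index data lazily and cache it on the path.
        if self._fill_data is None:
            self.builds += 1
            attributes = list(self.points)
            # Triangle-fan indices; valid for convex polygons only (illustration).
            indices = [i for k in range(1, len(self.points) - 1)
                       for i in (0, k, k + 1)]
            self._fill_data = (attributes, indices)
        return self._fill_data


@dataclass
class Painter:
    """Records draws; the "how" is a handful of numbers appended per draw."""
    how_buffer: List[float] = field(default_factory=list)
    draws: List[Tuple[int, int]] = field(default_factory=list)

    def fill_path(self, path, transform, color):
        _attributes, indices = path.fill_data()  # cached after the first call
        offset = len(self.how_buffer)
        self.how_buffer.extend(transform)  # 6 numbers: 2x3 affine matrix
        self.how_buffer.extend(color)      # 4 numbers: RGBA
        # A real renderer would hand the shader this offset into the buffer.
        self.draws.append((offset, len(indices)))


square = Path([(0, 0), (1, 0), (1, 1), (0, 1)])
painter = Painter()
painter.fill_path(square, (1, 0, 0, 1, 0, 0), (1, 0, 0, 1))
painter.fill_path(square, (0, 1, -1, 0, 5, 5), (0, 1, 0, 1))
```

Drawing the same path twice with different transforms and colors builds the geometry once and grows only the small per-draw buffer.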

Clipping is more subtle: it is done via a combination of using the depth buffer to occlude together with hardware clip planes. This means the CPU has almost nothing to compute as clipping changes, and (for most GPUs) clipping can improve performance (!). I am happy to give more details on it.
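As a rough illustration of the general idea only (the message does not spell out the actual scheme), here is a tiny software simulation. It assumes that a convex clip-in can be expressed as half-plane tests, the way hardware clip planes would, and that a clip-out writes an occluder into a depth buffer so later draws fail the depth test there. The class `TinyFramebuffer` and all of its methods are hypothetical.

```python
class TinyFramebuffer:
    """Software stand-in for a GPU target with a depth buffer and clip planes."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.color = [[0] * width for _ in range(height)]
        self.depth = [[0] * width for _ in range(height)]  # 1 = occluded
        self.clip_planes = []  # (a, b, c): keep where a*x + b*y + c >= 0

    def _visible(self, px, py):
        x, y = px + 0.5, py + 0.5  # test at the pixel center
        inside = all(a * x + b * y + c >= 0 for a, b, c in self.clip_planes)
        return inside and self.depth[py][px] == 0

    def _pixels(self, x0, y0, x1, y1):
        for py in range(max(y0, 0), min(y1, self.height)):
            for px in range(max(x0, 0), min(x1, self.width)):
                yield px, py

    def clip_in_rect(self, x0, y0, x1, y1):
        # Convex clip-in: four half-planes, no per-pixel CPU work.
        self.clip_planes += [(1, 0, -x0), (-1, 0, x1), (0, 1, -y0), (0, -1, y1)]

    def clip_out_rect(self, x0, y0, x1, y1):
        # Clip-out: "draw" an occluder into the depth buffer only.
        for px, py in self._pixels(x0, y0, x1, y1):
            self.depth[py][px] = 1

    def fill_rect(self, x0, y0, x1, y1, value):
        for px, py in self._pixels(x0, y0, x1, y1):
            if self._visible(px, py):
                self.color[py][px] = value


fb = TinyFramebuffer(8, 8)
fb.clip_in_rect(2, 2, 6, 6)   # keep only the 4x4 block in the middle
fb.clip_out_rect(3, 3, 5, 5)  # punch a 2x2 hole out of it
fb.fill_rect(0, 0, 8, 8, 1)   # full-screen draw
covered = sum(sum(row) for row in fb.color)
```

In this toy version the loops run on the CPU, but on a GPU the plane tests and the depth test would both be done by hardware, which is why changing the clip would cost the CPU very little.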

What is a VC? Virtual Conference? If so I am happy to do so, but can we do it next week? This week is difficult to schedule anything reliably.

Best Regards,

-Kevin Rogovin
