I've been thinking some more about the differences between the Canvas Command Buffer (streaming) model, as the one I prototyped, and a closer-to-Salamander (deferred) model. To summarize what I think the shape of these looks like:

Streaming
- As with the current Ganesh model, we use the R-tree to cull and play back display lists in the impl-side raster tasks.
- The paint ops that survive culling are serialized to the GPU side via a ring buffer.
- For paint ops with images, paths, text blobs, and other cacheable objects, we push those objects to the GPU side via a resource cache.
- Caching happens in LRU order based on the paint order after culling, and display list objects that are culled are never serialized to the GPU.

Deferred
- We transfer display lists that intersect a tile and run playback for a tile in the GPU service.
- We need to do culling on the GPU side, so we need to serialize the R-tree or build it up on the service side.
- We would either
  a) have the paint ops already in a mem-copyable buffer, along with references to other objects saved on the side, or
  b) iterate through all paint ops and serialize them and their objects similar to the Streaming model, or
  c) some combination.
- All cacheable objects in the display lists would be pushed to the GPU before drawing a tile.
- All images needed for a tile would be pushed before drawing the tile.
- For bad memory cases, we may need an Image Decode Service so the GPU side can pull images during playback.

It seems to me that shipping the Deferred option depends on completing most of the Salamander data model, and possibly DisplayList deltas and more incremental R-tree computation.

I still feel we should implement the Streaming model first, while building towards the Deferred one: keep playback on the impl side, but do the streaming "serialization" and "resource cache" parts in a way that is mostly reusable for the deferred case. wdyt?
Re: performance. You rightly bring up the difference here: shipping the entire display list is good for the "invalidate the world" approach and less good for the blinking-cursor case. There are trade-offs here, and they haven't been measured. Long term, I think we want incremental updates to the display list, and a model where we're sending the entire display list (or sub display lists) fits that future approach a lot better than shipping something short term for the sake of shipping it.
Personally, I don't think that custom display list transport is going to ship any time soon. There's a lot of work that's still left to be designed or is in flight: discardable GPU memory, transport caching, fonts (entirely unknown), fuzzing (none yet), devirtualizing PaintFlags, cleaning up shaders/loopers, sorting out GPU scheduling and backpressure, etc. I think this custom display list transport is something we should do and get done, but I'm concerned about doing it right.
I think the most important thing is to get something (anything) working behind a flag, so that we can start flushing out the unknowns of scheduling and fonts and caching and performance. However, I don't think the set of things remaining to be designed are deeply affected by this decision of how to transport the custom display list. I subjectively think sending the whole thing is less code to write and will get done faster. In any case, we should measure, understand performance, and do that in light of how important it is to ship.
We could build a subset of the display list via the R-tree (for the whole set of raster tasks we're scheduling, instead of culling for each individual task?) and ship that over IPC instead of sending the entire display list.
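The batch-subsetting idea above could look something like this sketch. A flat bounds scan stands in for the real R-tree query, and `PaintOp`, `SubsetForTasks`, and the rect layout are illustrative assumptions; the point is culling once against the union of scheduled task rects, in paint order, rather than per task.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of subsetting a display list for a batch of scheduled
// raster tasks before shipping it over IPC.
struct Rect {
  int left, top, right, bottom;
  bool Intersects(const Rect& o) const {
    return left < o.right && o.left < right &&
           top < o.bottom && o.top < bottom;
  }
};

struct PaintOp {
  Rect bounds;
  // ... op payload elided ...
};

// Returns indices (preserving paint order) of ops that touch any task rect.
// A real implementation would query the R-tree instead of scanning linearly.
std::vector<size_t> SubsetForTasks(const std::vector<PaintOp>& list,
                                   const std::vector<Rect>& task_rects) {
  std::vector<size_t> kept;
  for (size_t i = 0; i < list.size(); ++i) {
    for (const Rect& r : task_rects) {
      if (list[i].bounds.Intersects(r)) {
        kept.push_back(i);
        break;  // keep each op at most once
      }
    }
  }
  return kept;
}
```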
It seems to me that the argument for using a command buffer is pipelining: if there's significant overhead on the client side to package up what it wants to send over IPC, we'd want the raster task to start before the client is done. Is this true?
In the Salamander world the split is at LTHI, so I think we won't do an approach like analyzing tiles and subsetting the display list on the Renderer side. For that future I think we'll need the display list deltas approach.

A nice thing about the command buffer approach is that the Compositor code looks identical to Ganesh today. We just swapped what the display list plays into, from the Ganesh SkCanvas to an SkCommandBufferCanvas with fairly few commands (~36 I think?), built on existing command buffer infrastructure that can be fuzzed by our GPU fuzzer. It took a weekend, essentially.
The paint->raster->paint flow has led us to do things like:
a) build a display list,
b) raster it into an SkPicture,
c) add that SkPicture into another display list.
When really all you're looking to do is change some metadata about how the ops in the original list work.
We're still doing those passes with an SkCanvas today but will want to do it with a PaintCanvas once we start customizing things.
Playback to PaintCanvas provides a consistent way to iterate ops from multiple sources, and maintain the clip/layer stack and CTM, which the passes need.
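A minimal sketch of the state those passes rely on during playback: a save/restore stack carrying the CTM. For brevity the "matrix" here is reduced to a 2D translation, and the names `PlaybackState`/`State` are mine; a real PaintCanvas tracks a full matrix plus the clip and layer stack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: the per-playback state a metadata-rewriting pass
// needs, maintained across Save/Restore as ops are iterated.
struct State {
  int tx = 0;  // stand-in for the full CTM
  int ty = 0;
};

class PlaybackState {
 public:
  PlaybackState() : stack_(1) {}  // base state at depth 1

  void Save() { stack_.push_back(stack_.back()); }
  void Restore() {
    if (stack_.size() > 1) stack_.pop_back();  // never pop the base state
  }
  void Translate(int dx, int dy) {
    stack_.back().tx += dx;
    stack_.back().ty += dy;
  }
  const State& Current() const { return stack_.back(); }
  size_t depth() const { return stack_.size(); }

 private:
  std::vector<State> stack_;
};
```

Driving this from op iteration, instead of rastering into an SkPicture and re-wrapping it, is the cheaper shape the paragraph above is after.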