Hi,
I noticed that Chrome's Android WebView uses an Android GraphicBuffer as the backing buffer for GL textures. GraphicBuffer is allocated by the Android ION memory allocator, which is a HAL module implemented by the SoC vendor. I have tried this before, but found some compatibility issues:
1, Some devices limit the number of GraphicBuffers (Galaxy Nexus)
2, Some devices cannot lock/unlock a GraphicBuffer for writing while it is bound to a texture (Tegra 2/3/4, Adreno 2xx...). That means the texture has to be unbound after the GL call, but the bind/unbind operation may be slow
3, On some devices the GraphicBuffer lock/unlock operation is very slow, maybe 5 ms or more (Nexus 10)
Does that mean SoC vendors need to provide a new ION driver to solve the above issues for Android 4.4?
Unify code paths with chrome and get more consistent perf characteristics across devices.
Hi Bo, do you mean that WebView will use idle upload (PixelBufferRasterWorkerPool) in the future?
On Sat, Sep 13, 2014 at 9:39 AM, willy yu <jiaw...@gmail.com> wrote:
Hi Bo, do you mean that WebView will use idle upload (PixelBufferRasterWorkerPool) in the future?
It's already the case since crrev.com/289252
We actually found that locking/unlocking a graphic buffer is slow on some vendors' devices. By definition, a graphic buffer can be locked/unlocked on a non-main thread, so maybe we can operate on the graphic buffer on the raster thread.
On 15.09.2014 02:30, David Reveman wrote:
This is alright for now but we should try moving to the 1-copy
rasterizer once https://codereview.chromium.org/562833004/ lands. 1-copy
rasterizer can use gralloc or shared memory. We can use gralloc when
more efficient and we don't have to worry about file descriptor limits
in this case as gralloc buffers are only used as temporary staging
buffers and we can control how many are created.
I'm hoping that we can remove PixelBufferRasterWorkerPool and async
uploads soon.
David,
Would it be possible to elaborate a bit on the Chrome plans wrt this (and WebView, if it differs)? What does the one copy rasterizer concretely mean?
What is the vision or goal for the mechanism to transfer the sw rasterized bitmaps to the compositor textures?
Would it be preferable to have only one codepath, or do you see it as acceptable to have a few different but "first class" code paths, maybe one for gralloc and one for texture upload?
Or is the vision that all HW should use gralloc, and if that's suboptimal for compositing, so be it?
The reason I'm asking is that on platforms that benefit from texture swizzling, the gralloc API is a bit counter-intuitive. What appears to be called "zero copy" will end up being rather many copies. These copies are either copy operations or copied data.
The limiting factor of the gralloc API, as far as I understand, is the contract that once you lock the buffer, the buffer needs to contain the expected bits, e.g. the texture data that was in the texture. This is an expensive operation for swizzled textures. From the point of view of rasterization, doing this work serves no purpose, because the data will be overwritten. For these platforms, gralloc will be quite inferior to texture upload.
Locking "write only" might be a solution: a hint to the implementation that it may provide zeroed data in the buffer instead of a readback or cached copy. As far as I can guess from the API definition, this is not the semantics of the flag, though? I read the API so that if one wants to read the bits (as in rasterizer blending), one must have R+W flags.
For platforms that benefit from swizzling, it is quite hard to get a more optimal texture update than texture upload. This is code that runs very often, and thus is expected to get a fair deal of attention. The code structure that threads and GL contexts force is of course cumbersome compared to the more relaxed gralloc... (barring a sw rasterizer that swizzles as part of the rasterization process, at least)
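To make the copy-counting argument above concrete, here is a minimal C++ sketch. It is a toy model and not the gralloc API: SwizzledTexture, LockMode, the 4x4 tile size and the pass counter are all hypothetical names chosen for illustration. It assumes a lock with read access must de-swizzle the current texels, that a write-only lock could skip that pass, and that a plain upload swizzles once from the caller's memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not the gralloc API: a texture kept in a tiled
// ("swizzled") layout that the CPU cannot address linearly.
constexpr std::size_t kTile = 4;  // assumed tile edge in pixels

// Maps linear (x, y) to an index in a layout made of kTile x kTile tiles.
std::size_t TiledIndex(std::size_t x, std::size_t y, std::size_t width) {
  const std::size_t tiles_per_row = width / kTile;
  const std::size_t tile = (y / kTile) * tiles_per_row + (x / kTile);
  return tile * kTile * kTile + (y % kTile) * kTile + (x % kTile);
}

enum class LockMode { kReadWrite, kWriteOnly };

class SwizzledTexture {
 public:
  SwizzledTexture(std::size_t width, std::size_t height)
      : width_(width), height_(height),
        tiled_(width * height, 0), linear_(width * height, 0) {
    assert(width % kTile == 0 && height % kTile == 0);
  }

  // A read/write lock must expose the current texels, so it de-swizzles.
  // A write-only lock (the semantics wished for above) skips that pass.
  std::uint32_t* Lock(LockMode mode) {
    if (mode == LockMode::kReadWrite) {
      for (std::size_t y = 0; y < height_; ++y)
        for (std::size_t x = 0; x < width_; ++x)
          linear_[y * width_ + x] = tiled_[TiledIndex(x, y, width_)];
      ++full_passes_;
    }
    return linear_.data();
  }

  // Unlock always re-swizzles the CPU-visible copy into the texture.
  void Unlock() {
    for (std::size_t y = 0; y < height_; ++y)
      for (std::size_t x = 0; x < width_; ++x)
        tiled_[TiledIndex(x, y, width_)] = linear_[y * width_ + x];
    ++full_passes_;
  }

  // Texture-upload path: one swizzling pass straight from caller memory.
  void Upload(const std::vector<std::uint32_t>& pixels) {
    assert(pixels.size() == tiled_.size());
    for (std::size_t y = 0; y < height_; ++y)
      for (std::size_t x = 0; x < width_; ++x)
        tiled_[TiledIndex(x, y, width_)] = pixels[y * width_ + x];
    ++full_passes_;
  }

  int full_passes() const { return full_passes_; }
  std::uint32_t TexelAt(std::size_t x, std::size_t y) const {
    return tiled_[TiledIndex(x, y, width_)];
  }

 private:
  std::size_t width_;
  std::size_t height_;
  std::vector<std::uint32_t> tiled_;   // GPU-side storage
  std::vector<std::uint32_t> linear_;  // CPU-visible staging copy
  int full_passes_ = 0;                // whole-surface copies performed
};
```

In this model a read/write lock followed by unlock costs two full passes over the surface, while a write-only lock or a plain upload costs one. Real drivers may cache or tile differently, so the counts only illustrate the reasoning and are not measurements.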
Would it be preferable to have only one codepath, or do you see it as acceptable to have a few different but "first class" code paths, maybe one for gralloc and one for texture upload?
There are already n paths.
I am a little confused about how many upload paths there are in Chromium and how they differ. Here is what I know:
1, zero-copy upload: uses a buffer that can be shared between CPU and GPU as the tile's buffer, such as GraphicBuffer on Android
2, one-copy upload: uses a buffer that can be shared between CPU and GPU, but only temporarily; it needs to be copied to the tile's buffer once (by the GPU?)
3, async upload: uses a normal bitmap and uploads it with glTexImage2D on the CPU when the compositor thread is idle
On Mon, Sep 15, 2014 at 7:43 PM, Roger Yi <roge...@gmail.com> wrote:
I am a little confused about how many upload paths there are in Chromium and how they differ. Here is what I know:
I'm not really the authority on this area, but here goes...
1, zero-copy upload: uses a buffer that can be shared between CPU and GPU as the tile's buffer, such as GraphicBuffer on Android
2, one-copy upload: uses a buffer that can be shared between CPU and GPU, but only temporarily; it needs to be copied to the tile's buffer once (by the GPU?)
As David said, GpuMemoryBuffer can be backed by gralloc on Android, or by shared memory. I think this applies to both 1 and 2.
3, async upload: uses a normal bitmap and uploads it with glTexImage2D on the CPU when the compositor thread is idle
For async upload, there's threaded upload (AsyncPixelTransferManagerEGL) and idle upload (AsyncPixelTransferManagerIdle).
Am I right? And does Chromium have more upload paths not listed above?
---
And BTW, even if the lock/unlock of the GraphicBuffer can be put on the raster thread (I have actually tried this before), on some devices the lock/unlock is extremely slow, which makes each tile's rasterization time exceed 10 ms. When you scroll the page fast enough, the screen stays empty for a long period...
On 16.09.2014 02:56, Eric Penner wrote:
I figure at this point we should just keep the best performing
solution we have for each platform, while preparing for Ganesh that
has different performance characteristics.
Ok, good to know. Thanks.
Kimmo, what would be the recommended technique on NVidia?
(Talking from the mobile perspective, as this was related to the
Android gralloc work)
The old devices would benefit a bit from the gralloc code path.
However, with new hw such as the GPU in the "K1", current
thinking is that normal texture upload with glTexImage2D would be the fastest and probably also the most "asynchronous" way to update the pixels. There's some memcpying done to achieve asynchronous upload, but if I understand correctly, that should be quite fast compared to cross-thread synchronisation. Whether uploading in an aux thread rather than in the main compositor helps prevent jank is still not entirely clear, to me at least, so I'd need to experiment with AsyncPixelTransferManagerEGL. Before that, I can't give any good recommendation, apart from suggesting that preserving a non-gralloc code path would be great :)
Our current generation mobile stuff probably wouldn't benefit from switching to PBOs, due to the hw not having full cache coherency.
What I think would still warrant its own code-path is if NVidia can
provide an extension that allows for persistently mapping a PBO in
another process.
How would that work with sandboxing / command buffer? Would references to particular PBOs be sent cross-process as file descriptors and then maybe mmapped in the raster process? I guess that could work, though I'm no driver expert or spec writer. Sounds a bit tricky to specify.. Probably at this point, it's not worth the complication, since you optimize away only a memcpy, and there's the cache issue..
On Mon, Sep 15, 2014 at 12:28 AM, Kimmo Kinnunen <kkin...@nvidia.com> wrote:
Or is the vision that all HW should use gralloc, and if that's suboptimal for compositing, so be it?
Gralloc is a private API on Android, so Chrome can't use it (without hacking around the NDK). For WebView, I'd like to reduce differences from Chrome. What's fast enough for Chrome should be fast enough for WebView as well. So I'd push for not using gralloc.
On Mon, Sep 15, 2014 at 12:29 PM, Bo Liu <bo...@chromium.org> wrote:
On Mon, Sep 15, 2014 at 12:28 AM, Kimmo Kinnunen <kkin...@nvidia.com> wrote:
Would it be possible to elaborate a bit on the Chrome plans wrt this (and WebView, if it differs)? What does the one copy rasterizer concretely mean?
I created a document that describes the different mechanisms for updating textures in Chromium that I'm hoping will be sufficient:
https://docs.google.com/a/chromium.org/document/d/1J4lpHqVw9CmIiM3BeVCRT-SIzDcy-EWfUGBjP0yR_S8/edit?usp=sharing
Thanks for sharing. I have a question about the following:
TexImage2D/TexSubImage2D
Standard OpenGL mechanism to initialize or update a texture. This will copy the provided data into the command buffer and perform a matching texture upload on the GPU process side. Essential for WebGL support and sufficient in many use cases.
Does that mean that when GpuMemoryBuffer is not used, we actually need to copy twice: first when putting the command into the command buffer, and second when flushing the command buffer?
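One way to picture the two data movements that the quoted description implies is the toy C++ model below. It is only a sketch under assumptions: TransferBuffer, UploadCommand, Client and Service are hypothetical stand-ins and not Chromium types, and the model assumes that a flush only hands queued commands to the GPU process without moving pixel data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; none of these are Chromium classes.
struct TransferBuffer {  // models memory visible to both processes
  std::vector<std::uint8_t> bytes;
};

struct UploadCommand {  // models one queued TexImage2D command
  std::size_t offset;
  std::size_t size;
};

struct Client {  // renderer side
  TransferBuffer* shm = nullptr;
  std::vector<UploadCommand> pending;
  int copies = 0;

  // Copy 1: caller's pixels -> shared memory. The command itself only
  // records where the data was placed.
  void TexImage2D(const std::uint8_t* pixels, std::size_t size) {
    const std::size_t offset = shm->bytes.size();
    shm->bytes.insert(shm->bytes.end(), pixels, pixels + size);
    pending.push_back({offset, size});
    ++copies;
  }

  // Assumption of this model: flushing publishes the queued commands and
  // does not touch the pixel data.
  std::vector<UploadCommand> Flush() {
    std::vector<UploadCommand> out;
    out.swap(pending);
    return out;
  }
};

struct Service {  // GPU-process side
  const TransferBuffer* shm = nullptr;
  std::vector<std::uint8_t> texture;  // stands in for GPU texture storage
  int copies = 0;

  // Copy 2: shared memory -> texture storage (the matching upload).
  void Execute(const std::vector<UploadCommand>& commands) {
    for (const UploadCommand& c : commands) {
      const std::uint8_t* src = shm->bytes.data() + c.offset;
      texture.assign(src, src + c.size);
      ++copies;
    }
  }
};
```

Under these assumptions the two copies are the client-side copy into shared memory and the upload performed on the GPU process side, with the flush acting as a handoff in between. Whether the real command buffer behaves exactly this way is not confirmed in this thread.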
On 16.09.2014 22:56, 'David Reveman' via Graphics-dev wrote:
On Mon, Sep 15, 2014 at 12:28 AM, Kimmo Kinnunen <kkin...@nvidia.com> wrote:
David, Would it be possible to elaborate a bit on the Chrome plans
wrt this (and WebView, if it differs)? What does the one copy
rasterizer concretely mean?
I created a document that describes the different mechanisms for
updating textures in Chromium that I'm hoping will be sufficient:
https://docs.google.com/a/chromium.org/document/d/1J4lpHqVw9CmIiM3BeVCRT-SIzDcy-EWfUGBjP0yR_S8/edit?usp=sharing
It's pretty high level and it doesn't yet describe how 1-copy can be
used without gralloc/SurfaceTexture support. But as Eric mentioned,
1-copy updates without a native GpuMemoryBuffer implementation is
similar to using our old CHROMIUM_map_sub extension.
If I'm understanding correctly, the "write only" access flag of the
CHROMIUM_map_sub MapTexSubImage2DCHROMIUM call tries to say that
the memory area returned by the function is readable, but the initial
contents are undefined. If this is preserved, I think GpuMemoryBuffer
could be implemented in terms of shared memory and glTexImage2D for platforms that benefit from texture upload instead of direct access to the data.
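A rough C++ sketch of that idea follows. It is not Chromium's GpuMemoryBuffer interface: UploadBackedBuffer and UploadFn are hypothetical names, a std::vector stands in for shared memory, and the callback stands in for a glTexImage2D call. The point it tries to show is that with "initial contents undefined" semantics, Map needs no readback and the only copy happens at Unmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical: stands in for a glTexImage2D call issued on unmap.
using UploadFn =
    std::function<void(const std::uint8_t* data, std::size_t size)>;

class UploadBackedBuffer {
 public:
  UploadBackedBuffer(std::size_t size, UploadFn upload)
      : staging_(size), upload_(std::move(upload)) {}

  // Write-only map: the caller may read back what it has written, but the
  // initial contents carry no meaning, so no texture readback happens here.
  std::uint8_t* Map() {
    assert(!mapped_);
    mapped_ = true;
    return staging_.data();
  }

  // Unmap performs the single copy into the texture via the upload hook.
  void Unmap() {
    assert(mapped_);
    mapped_ = false;
    upload_(staging_.data(), staging_.size());
  }

 private:
  std::vector<std::uint8_t> staging_;  // stands in for shared memory
  UploadFn upload_;
  bool mapped_ = false;
};
```

Note that in this sketch Unmap does real work, so a design that expects unmap to be nearly free would not be satisfied by it.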
I'm looking at the doc and the GpuMemoryBuffer implementation. Both kind
of give the implication that this design is for hardware that can do cpu
access to gpu accessible memory. The design doc says things
like "update textures without having to perform a texture upload". It's
worth noting that even if you have these direct mapping APIs (gralloc,
etc) and they kind of work, it does not really mean that you necessarily
map the memory directly or that it's particularly efficient. Thus, if
the design explicitly *requires* actual direct access, it'd be nice to
know. E.g. the requirements section could be as specific as you can make it:
- unmap needs to be equivalent of a no-op (unmap is planned to be
called from a realtime thread ?)
- no extra copies are designed to be made (memory bw
optimization reason?)
- etc
As it stands currently, the design doc is a bit scary for HW that does
not benefit from direct mapping (or does benefit from storing the textures
in gpu-friendly formats or memory). If support for direct mapping is not an explicit goal, it'd be great to have a comment in GpuMemoryBuffer::Map regarding the buffer contents (maybe another variant of the call if the data is sometimes needed).
From the docs and discussions above, can I assume that the most efficient and stable way to upload tile textures on Android is to use GpuMemoryBuffer as a staging buffer with the 1-copy mechanism, and:
1, use gralloc as backing when the map/unmap (lock/unlock) operation is fast, and use glCopyTexSubImage2D to do the GPU upload;
2, use ashmem as backing when gralloc is slow or not stable, and use normal glTexImage2D to do the CPU upload?
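The rule proposed in this question could be written down roughly as the C++ sketch below. It encodes the question as asked, not a confirmed Chromium policy. DeviceProfile, UploadPlan, ChooseUploadPlan and the 1 ms threshold are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-device measurements; field names are illustrative.
struct DeviceProfile {
  double lock_unlock_ms;  // measured gralloc lock + unlock cost
  bool gralloc_stable;    // false if buffer limits or bind/lock quirks apply
};

enum class Backing { kGralloc, kAshmem };

struct UploadPlan {
  Backing staging;          // what backs the staging GpuMemoryBuffer
  std::string upload_call;  // GL call used to fill the tile texture
};

// Assumed budget for lock/unlock; a real value would have to be measured.
constexpr double kMaxLockUnlockMs = 1.0;

UploadPlan ChooseUploadPlan(const DeviceProfile& device) {
  if (device.gralloc_stable && device.lock_unlock_ms <= kMaxLockUnlockMs) {
    // Fast, reliable gralloc: stage in gralloc, copy on the GPU.
    return {Backing::kGralloc, "glCopyTexSubImage2D"};
  }
  // Slow or unreliable gralloc: stage in ashmem, upload from the CPU.
  return {Backing::kAshmem, "glTexImage2D"};
}
```

For example, a device with the roughly 5 ms lock/unlock cost mentioned earlier in the thread would fall into the ashmem branch under this threshold.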