Use glDiscardFramebuffer to save memory bandwidth (issue 651863002 by dongseong.hwang@intel.com)

216 views
Skip to first unread message

dongseo...@intel.com

unread,
Oct 13, 2014, 6:11:37 AM10/13/14
to z...@chromium.org, k...@chromium.org, hongb...@intel.com, chromium...@chromium.org, piman...@chromium.org
Reviewers: Zhenyao Mo, Ken Russell, Hongbo Min,

Message:
Could you review this optimization? It is the same optimization of WebGL for
gles2 decoder: https://codereview.chromium.org/614883002

Description:
Use glDiscardFramebuffer to save memory bandwidth

Leverage glDiscardFramebuffer for tiled-based GPU to improve performance of
offscreen GL contexnt with preserved target buffer and also save memory
bandwidth.

BUG=418272

Please review this at https://codereview.chromium.org/651863002/

SVN Base: https://chromium.googlesource.com/chromium/src.git@master

Affected files (+8, -0 lines):
M gpu/command_buffer/service/gles2_cmd_decoder.cc


Index: gpu/command_buffer/service/gles2_cmd_decoder.cc
diff --git a/gpu/command_buffer/service/gles2_cmd_decoder.cc
b/gpu/command_buffer/service/gles2_cmd_decoder.cc
index
c55d9e387b11b81eb2f4448a4670fde2cade8a66..8c38e8a0765ac275a321b29005c2d7dffa5583e4
100644
--- a/gpu/command_buffer/service/gles2_cmd_decoder.cc
+++ b/gpu/command_buffer/service/gles2_cmd_decoder.cc
@@ -9369,6 +9369,14 @@ void GLES2DecoderImpl::DoSwapBuffers() {

offscreen_saved_color_texture_.swap(offscreen_target_color_texture_);
offscreen_target_frame_buffer_->AttachRenderTexture(
offscreen_target_color_texture_.get());
+
+ // Explicitly discard framebuffer to save GPU memory bandwidth for
+ // tile-based GPU arch.
+ if (feature_info_->feature_flags().ext_discard_framebuffer) {
+ const GLenum attachments[3] = {
+ GL_COLOR_ATTACHMENT0, GL_DEPTH_ATTACHMENT,
GL_STENCIL_ATTACHMENT};
+ glDiscardFramebufferEXT(GL_FRAMEBUFFER, 3, attachments);
+ }
}

// Ensure the side effects of the copy are visible to the parent


hongb...@intel.com

unread,
Oct 13, 2014, 10:27:50 PM10/13/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, chromium...@chromium.org, piman...@chromium.org

https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc
File gpu/command_buffer/service/gles2_cmd_decoder.cc (right):

https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc#newcode9375
gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if
(feature_info_->feature_flags().ext_discard_framebuffer) {
According to my previous work, an attempt to clear a framebuffer on
tiled GPU is time-consuming, if glDiscardFramebuffer is used, it gives a
performance hint and allow GPU driver to set the pixel vale to whatever
it wishes, and the glClear could be skipped (I have verified it on ASUS
Nexus series, it works well without glClear if glDiscardFramebuffer is
used).

So my question is, where is glClear called to clear
|offscreen_target_frame_buffer_| and prepare a new frame? and can we
skip that glClear if have?

https://codereview.chromium.org/651863002/

dongseo...@intel.com

unread,
Oct 14, 2014, 8:09:31 AM10/14/14
to z...@chromium.org, k...@chromium.org, hongb...@intel.com, chromium...@chromium.org, piman...@chromium.org
Hongbo, Thank you for review.


https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc
File gpu/command_buffer/service/gles2_cmd_decoder.cc (right):

https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc#newcode9375
gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if
(feature_info_->feature_flags().ext_discard_framebuffer) {
On 2014/10/14 02:27:48, Hongbo Min wrote:
> According to my previous work, an attempt to clear a framebuffer on
tiled GPU
> is time-consuming, if glDiscardFramebuffer is used, it gives a
performance hint
> and allow GPU driver to set the pixel vale to whatever it wishes, and
the
> glClear could be skipped (I have verified it on ASUS Nexus series, it
works well
> without glClear if glDiscardFramebuffer is used).

As far as I understand, your CL didn't affect whether glClear is skipped
or not. WebGLRenderingContextBase::clearIfComposited still do glClear
every frame.

Let me explain why your CL speed up in tiled-based GPU as I understand.
When binding another fbo, tiled-based GPU must stores tiled cache to gpu
memory (texture, stencil buffer and depth buffer). glDiscardFramebuffer
hints GPU to not read back tiled cache to gpu memory because the
software don't need anymore the changed fbo information. It saves huge
gpu bandwidth.

Let's go back to your CL. The fbo lifecycle is as follows:
1. WebGL replaces color texture to front fbo. At that time, you hint
glDiscardFramebuffer because DrawingBuffer doesn't update fbo so you
don't want to copy tiled cache to color texture redundantly. It wastes
bandwidth.
2. Something later, WebGLRenderingContextBase::clearIfComposited
performs glClear on the fbo.

My CL is the same to your WebGL CL. This code just bind fbo and replace
color texture, so we don't want to copy back tiled cache to color
texture redundantly. glDiscardFramebuffer can make it.


> So my question is, where is glClear called to clear
> |offscreen_target_frame_buffer_| and prepare a new frame? and can we
skip that
> glClear if have?

As I explain above, glClear is not related to glDiscardFramebuffer.

https://codereview.chromium.org/651863002/

hongb...@intel.com

unread,
Oct 14, 2014, 10:04:00 PM10/14/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, chromium...@chromium.org, piman...@chromium.org
Very elaborate description. Thanks.

On 2014/10/14 12:09:30, dshwang wrote:
> Hongbo, Thank you for review.


https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc
> File gpu/command_buffer/service/gles2_cmd_decoder.cc (right):


https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc#newcode9375
> gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if
> (feature_info_->feature_flags().ext_discard_framebuffer) {
> On 2014/10/14 02:27:48, Hongbo Min wrote:
> > According to my previous work, an attempt to clear a framebuffer on
> tiled
GPU
> > is time-consuming, if glDiscardFramebuffer is used, it gives a
> performance
> hint
> > and allow GPU driver to set the pixel vale to whatever it wishes, and
> the
> > glClear could be skipped (I have verified it on ASUS Nexus series, it
> works
> well
> > without glClear if glDiscardFramebuffer is used).

> As far as I understand, your CL didn't affect whether glClear is skipped
> or
not.
> WebGLRenderingContextBase::clearIfComposited still do glClear every frame.

Indeed, my CL for webgl explicitly skips glClear in
WebGLRenderingContextBase::clearIfComposited which would return false early
if
the DrawingBuffer supports discarding framebuffer by checking
DrawingBuffer::discardFramebufferSupported. See
https://codereview.chromium.org/614883002/patch/40001/50001.
According to my experiments, on ASUS Nexus 7 device, only if glClear is
skipped,
the performance could get improved ~30% when using glDiscardFramebuffer.
Otherwise, glDiscardFramebuffer won't have an impact on fps improvement. I
have
no much idea about why glClear calling on framebuffer is so time-consuming.
Maybe a driver bug?



https://codereview.chromium.org/651863002/

hongb...@intel.com

unread,
Oct 15, 2014, 1:06:12 AM10/15/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com

dongseo...@intel.com

unread,
Oct 15, 2014, 6:06:32 AM10/15/14
to z...@chromium.org, k...@chromium.org, hongb...@intel.com, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
On 2014/10/15 02:03:59, Hongbo Min wrote:
> Very elaborate description. Thanks.

> On 2014/10/14 12:09:30, dshwang wrote:
> > As far as I understand, your CL didn't affect whether glClear is
> skipped or
> not.
> > WebGLRenderingContextBase::clearIfComposited still do glClear every
> frame.

> Indeed, my CL for webgl explicitly skips glClear in
> WebGLRenderingContextBase::clearIfComposited which would return false
> early if
> the DrawingBuffer supports discarding framebuffer by checking
> DrawingBuffer::discardFramebufferSupported. See
> https://codereview.chromium.org/614883002/patch/40001/50001.

Ah, I didn't notice this change. Thank you for correcting it.
However, I found a bug thanks to your explanation. glDiscardFramebuffer()
doesn't guarantee clearing FBO, although ASUS Nexus series clear FBO.
I'll fix this bug in https://codereview.chromium.org/656053002/

> > As I explain above, glClear is not related to glDiscardFramebuffer.

> According to my experiments, on ASUS Nexus 7 device, only if glClear is
skipped,
> the performance could get improved ~30% when using glDiscardFramebuffer.
> Otherwise, glDiscardFramebuffer won't have an impact on fps improvement. I
have
> no much idea about why glClear calling on framebuffer is so
> time-consuming.
> Maybe a driver bug?

You explain that skipping glClear mostly contribute on speed-up. Which test
cases do speed-up ~30%? If WebGL draws few triangles, it might be possible,
but
huge tests like Aquarium would speed-up little. Am I right?
Ah.. After the above fix, the speed-up will be rollback, unfortunately.
GLES driver may already have optimization like glDiscardFramebuffer. The
code in
which you add glDiscardFramebuffer doesn't update buffer of FBO, so driver
may
don't read back tiled cache to gpu memory.


https://codereview.chromium.org/651863002/

hongb...@intel.com

unread,
Oct 15, 2014, 6:11:15 AM10/15/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
The benchmarks I am working on is HexGL[1], and PlayCanvas[2] in Ludei demo
list.
[1] https://github.com/hmin/HexGL/tree/ffos-local
[2] https://cocoonjsservice.ludei.com/cocoonjslaunchersvr/demo-list/ (grep
PlayCanvas)

I strongly suggest you to try PlayCanvas, it gets a very big performance
boost.

https://codereview.chromium.org/651863002/

k...@chromium.org

unread,
Oct 15, 2014, 10:09:59 PM10/15/14
to dongseo...@intel.com, z...@chromium.org, hongb...@intel.com, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
I'm deferring to sievers@, aelias@ and vmiura@ since this would principally
affect Chrome on Android. I'm not sure any more how this part of the command
buffer relates to the compositor's on-screen display path. What sorts of
measurements have been done of this change? Does it measurably improve
performance?


https://codereview.chromium.org/651863002/

k...@chromium.org

unread,
Oct 15, 2014, 10:12:29 PM10/15/14
to dongseo...@intel.com, z...@chromium.org, hongb...@intel.com, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
Sorry. I didn't read the CL description carefully. Please file a new bug
about
applying this optimization to PNaCl instead of referring to the
already-closed
bug about WebGL. Thanks.


https://codereview.chromium.org/651863002/

hongb...@intel.com

unread,
Oct 16, 2014, 2:26:09 AM10/16/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com

hongb...@intel.com

unread,
Oct 16, 2014, 2:27:16 AM10/16/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
On 2014/10/16 02:09:58, Ken Russell wrote:
> I'm deferring to sievers@, aelias@ and vmiura@ since this would
> principally
> affect Chrome on Android. I'm not sure any more how this part of the
> command
> buffer relates to the compositor's on-screen display path. What sorts of
> measurements have been done of this change? Does it measurably improve
> performance?

According to my performance measurement, glDiscardFramebufferEXT may save
memory
bandwidth, but have no performance improvement if glClear is not skipped.

https://codereview.chromium.org/651863002/

dongseo...@intel.com

unread,
Oct 16, 2014, 10:08:38 AM10/16/14
to z...@chromium.org, k...@chromium.org, hongb...@intel.com, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
On 2014/10/16 02:12:28, Ken Russell wrote:
> Sorry. I didn't read the CL description carefully. Please file a new bug
> about
> applying this optimization to PNaCl instead of referring to the
> already-closed
> bug about WebGL. Thanks.

Thanks. I filed new bug and changed the description.

sievers@, could you review?
Unfortunately, I also didn't see meaningful speed-up. nevertheless, I submit
this CL because of recommended in theory.

https://codereview.chromium.org/651863002/

vmi...@chromium.org

unread,
Oct 16, 2014, 1:10:13 PM10/16/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, hongb...@intel.com, sie...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
Due to Chromium security needs, glDiscardFramebufferEXT imposes a forced
glClear
in GPU command buffer.

glDiscardFramebufferEXT marks framebuffer attachments as invalid (needs
clear),
which forces a glClear
the next time the buffer is used. Therefore replacing a glClear with
glDiscardFramebufferEXT generally
leads to no improvement, and actually adds some overhead.

There should be a win when glDiscardFramebufferEXT is used right at the |
end| of
a rendering pass when
there are unneeded attachments, e.g. usually DEPTH & STENCIL buffers can be
discarded to reduce
bandwidth on tiling GPUs.

In this case I think you could try using glDiscardFramebufferEXT with
GL_DEPTH_ATTACHMENT & GL_STENCIL_ATTACHMENT,
|before| doing the swap.

https://codereview.chromium.org/651863002/

dongseo...@intel.com

unread,
Oct 16, 2014, 4:17:37 PM10/16/14
to z...@chromium.org, k...@chromium.org, hongb...@intel.com, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
On 2014/10/16 17:10:12, vmiura wrote:

Thank you for good review.

> Due to Chromium security needs, glDiscardFramebufferEXT imposes a forced
glClear
> in GPU command buffer.

> glDiscardFramebufferEXT marks framebuffer attachments as invalid (needs
clear),
> which forces a glClear
> the next time the buffer is used. Therefore replacing a glClear with
> glDiscardFramebufferEXT generally
> leads to no improvement, and actually adds some overhead.

PNaCl spec said that
"The contents of ancillary buffers are always undefined after calling
SwapBuffers. The contents of the color buffer are undefined if the value of
the
PP_GRAPHICS3DATTRIB_SWAP_BEHAVIOR attribute of context is not
PP_GRAPHICS3DATTRIB_BUFFER_PRESERVED."
https://code.google.com/p/chromium/codesearch#chromium/src/ppapi/cpp/graphics_3d.h&l=156
https://developer.chrome.com/native-client/pepper_dev/c/struct_p_p_b___graphics3_d__1__0#a293c6941c0da084267ffba3954793497

So it's intended to make the front buffer (i.e.
|offscreen_target_frame_buffer_|) undefined. Currently, the front buffer on
PNaCl keeps 1 frame ago back buffer (i.e. |offscreen_saved_color_texture_|).
After this CL, it's really undefined. Is it ACCEPTABLE in terms of Chromium
security???

FYI, in my experiences about glDiscardFramebufferEXT,
Adreno GPU: clear fbo
Intel GPU with mesa driver: do nothing

> There should be a win when glDiscardFramebufferEXT is used right at the |
> end|
of
> a rendering pass when
> there are unneeded attachments, e.g. usually DEPTH & STENCIL buffers can
> be
> discarded to reduce
> bandwidth on tiling GPUs.

> In this case I think you could try using glDiscardFramebufferEXT with
> GL_DEPTH_ATTACHMENT & GL_STENCIL_ATTACHMENT,
> |before| doing the swap.

After re-thinking, we can use glDiscardFramebufferEXT(GL_DEPTH_ATTACHMENT &
GL_STENCIL_ATTACHMENT) with preserve-fbo because the spec said "The
contents of
ancillary buffers are always undefined"

In more detail, let me explain how to composite the back buffer on browser.
After SwapBuffer(), browser generates mailbox by the new back buffer and
draws
the mailbox on the browser.

https://codereview.chromium.org/651863002/

sie...@chromium.org

unread,
Oct 16, 2014, 7:04:19 PM10/16/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, hongb...@intel.com, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
On 2014/10/16 20:17:36, dshwang wrote:
> On 2014/10/16 17:10:12, vmiura wrote:

> PNaCl spec said that
> "The contents of ancillary buffers are always undefined after calling
> SwapBuffers. The contents of the color buffer are undefined if the value
> of
the
> PP_GRAPHICS3DATTRIB_SWAP_BEHAVIOR attribute of context is not
> PP_GRAPHICS3DATTRIB_BUFFER_PRESERVED."

https://code.google.com/p/chromium/codesearch#chromium/src/ppapi/cpp/graphics_3d.h&l=156

https://developer.chrome.com/native-client/pepper_dev/c/struct_p_p_b___graphics3_d__1__0#a293c6941c0da084267ffba3954793497

> So it's intended to make the front buffer (i.e.
> |offscreen_target_frame_buffer_|) undefined. Currently, the front buffer
> on
> PNaCl keeps 1 frame ago back buffer (i.e. |offscreen_saved_color_texture_|
> ).
> After this CL, it's really undefined. Is it ACCEPTABLE in terms of
> Chromium
> security???


That's a functional argument about not having to preserve the contents.
The other argument is a security argument about not exposing some video
memory
region the client shouldn't have access to, if the client copies out or
reads
back the discarded buffer. We usually flag the buffer with needs_clear, see
HandleDiscardBackbufferCHROMIUM(). The clear is done implicitly by the
decoder
every time you try to access the framebuffer (including draw since we don't
know
if the draw will cover the whole buffer), and Victor is saying that such an
implied clear might destroy the perf gains you otherwise get from the
discard.

@Victor: I thought that tilers are usually able to optimize Clear + full
Draw,
so that the Clear does not generate actual pixels being written.







https://codereview.chromium.org/651863002/

vmi...@chromium.org

unread,
Oct 16, 2014, 8:59:47 PM10/16/14
to dongseo...@intel.com, z...@chromium.org, k...@chromium.org, hongb...@intel.com, sie...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
> @Victor: I thought that tilers are usually able to optimize Clear + full
> Draw,
> so that the Clear does not generate actual pixels being written.

For tilers usually it's free or cheap, but on hybrid tilers like Adreno it's
more like drawing a full frame quad. Though, drawing a full frame quad is
still
cheaper than loading previous frame contents.

Where it's a loss is if we do glDiscardFramebufferEXT instead of glClear
because
GLES2Decoder will just force the glClear anyway, just in a more long winded
way.
i.e:

[Good case]

Client 1:
glClear(...);
draw stuff...

Service 1:
glClear(...);
draw stuff

[Bad case]

Client 2:
glDiscardFramebuffer(...)
draw stuff

Service 2:
glDiscardFramebuffer(...)
Invalidate all attachments. Mark framebuffers as potentially invalid.
// Force clear to black before "draw stuff"
glClearColor(0,0,0,0)
glColorMask(1,1,1,1)
glClearDepth(0)
etc.
glClear(...)
glClearColor(client state)
glClearMask(client state)
glClearDepth(client state)
etc.
draw stuff

https://codereview.chromium.org/651863002/

dongseo...@intel.com

unread,
Oct 17, 2014, 9:14:37 AM10/17/14
to z...@chromium.org, k...@chromium.org, hongb...@intel.com, sie...@chromium.org, vmi...@chromium.org, ael...@chromium.org, chromium...@chromium.org, piman...@chromium.org, shouq...@intel.com
@vmlura, @sievers, thank you for good explanation.
As a result, this CL regresses perf also. I close it as Invalid.

https://codereview.chromium.org/651863002/
Reply all
Reply to author
Forward
0 new messages