Multiple instances of AwTestContainerView hang on the UI thread inside HardwareView logic

Eager Soul

Apr 19, 2015, 1:24:49 AM
to chromi...@chromium.org
Hi,

I have started playing with the Chromium code for Android and was able(!) to compile, test, and run the android_webview shell application from "android_webview/test/shell".
At this point I copied AwTestContainerView into a separate project to start playing with it, and it worked fine(!). Getting more confident with my experiment, I ventured into using multiple AwTestContainerView instances in the same application.
Unfortunately I hit some roadblocks and would like some help running multiple instances of AwTestContainerView with hardware acceleration turned on.

Thanks a lot in advance!

I am building Chromium 42.0.2311.40 and its dependencies, using Java 7 compiled bytecode on an Android L device.

Changes that I am playing with:
  • To begin with, AwTestContainerView creates only one HardwareView for the whole application, so I had to change that to have one HardwareView for each instance of AwTestContainerView. I use the rest of the code as-is.
  • In my logic I create multiple instances of AwTestContainerView, but only one of them is visible at any time. It's like multiple tabs in the Chrome browser.
  • Everything works fine when I use just one instance, but as soon as I create a new instance, remove the first instance from the parent ViewGroup, and then add the second instance to the parent ViewGroup, I get a deadlock on the UI thread.
  • The deadlock happens inside AwTestContainerView.HardwareView.requestRender() at line 172. (https://chromium.googlesource.com/chromium/src.git/+/42.0.2311.40/android_webview/test/shell/src/org/chromium/android_webview/test/AwTestContainerView.java#172)
  • While debugging, what I have observed is that:
    • On removing the first AwTestContainerView instance from the parent ViewGroup, AwTestContainerView.onDetachedFromWindow() is called. Internally it calls AwContents.onDetachedFromWindow(), which eventually triggers NativeGLDelegate.requestDrawGl(null /* = canvas */, true /* = waitForCompletion */, containerView).
    • This calls HardwareView.requestRender(), and because waitForCompletion is true, it blocks in mSyncLock.wait() until it is notified.
    • But even though super.requestRender() is called from inside HardwareView.requestRender(), HardwareView.Renderer's onDrawFrame() is never called. Because of this, mSyncLock is never notified and the wait never returns, resulting in an ANR on the UI thread.
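
The handshake described in the bullets above can be reproduced in plain Java, outside Android. The following is only a minimal sketch: RenderSync and its method names are hypothetical and are not the AwTestContainerView code. It models a caller that waits on a lock until a frame callback notifies it, and it bounds the wait so that a frame that is never drawn shows up as a false return value instead of a hang.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the mSyncLock handshake; not the Chromium code. */
final class RenderSync {
    private final Object syncLock = new Object();
    private boolean frameDone = false;

    /** Called by the render thread once a frame has been drawn. */
    void onDrawFrame() {
        synchronized (syncLock) {
            frameDone = true;
            syncLock.notifyAll();
        }
    }

    /**
     * Waits for the next frame, giving up after timeoutMillis.
     * Returns true if a frame was drawn, false on timeout or interrupt.
     */
    boolean awaitFrame(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (syncLock) {
            try {
                while (!frameDone) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return false; // nobody called onDrawFrame() in time
                    }
                    // Round up so we never call wait(0), which would wait forever.
                    syncLock.wait(remainingNanos / 1_000_000 + 1);
                }
                frameDone = false; // consume the frame
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With an unbounded wait() in place of the timed one, the first case (no onDrawFrame() call at all) is exactly the hang reported here. A timeout only makes the failure visible; it does not explain why the frame callback is missing.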

A couple of observations/changes that I am confused by:

Bo Liu

Apr 19, 2015, 5:07:25 AM
to eagersoula...@gmail.com, chromium-dev
Hardware acceleration in the production Android WebView requires private APIs that are not in the SDK; you can read about the Android render thread in L if you are interested. Hardware acceleration in the test shell emulates those private APIs with GLSurfaceView. The issue is that each GLSurfaceView creates its own thread to run GL, but the production code is only designed to handle a single render thread per process. So the workaround is to create only one hardware-accelerated view in the shell. It's not a big deal since it's only a test shell.
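
To illustrate the "single render thread per process" constraint in isolation, here is a rough plain-Java sketch. SharedRenderThread, drawOnRenderThread, and the thread name are hypothetical names invented for the illustration. It is not a drop-in fix for GLSurfaceView, which owns its own GL thread; it only shows the threading model in which every view funnels its work through one shared thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical model: all views share one process-wide render thread. */
final class SharedRenderThread {
    // Single daemon thread, so every submitted draw runs on the same thread.
    private static final ExecutorService RENDER =
            Executors.newSingleThreadExecutor(runnable -> {
                Thread thread = new Thread(runnable, "shared-render");
                thread.setDaemon(true);
                return thread;
            });

    private SharedRenderThread() {}

    /** Runs a (fake) draw for viewName and reports which thread executed it. */
    static String drawOnRenderThread(String viewName) {
        Future<String> result =
                RENDER.submit(() -> viewName + "@" + Thread.currentThread().getName());
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for draw", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("draw task failed", e);
        }
    }
}
```

Calling drawOnRenderThread("viewA") and drawOnRenderThread("viewB") reports the same thread for both views, which is the property the production code is said to rely on.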

If you want multiple hardware accelerated views, start with content shell, or at least content shell's rendering pipeline.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Sriram Mouli

Apr 19, 2015, 2:53:12 PM
to chromi...@chromium.org
Hi Bo,

I forked an old Chromium tree a while back and have been using the Android content shell "30.0.1599.38".
I observe a similar issue: there is a deadlock in the main thread when I try to use multiple instances on certain flavors of 5.0.1.
I am still not sure where it is getting stuck (it seems to be a deadlock in layerhost[impl] and thread_proxy), but the steps are the same as described above.

The code is very old and a lot has changed, but I wanted to check which areas might be causing this, so that I can figure out a workaround.


Thanks,
Sriram

Bo Liu

Apr 19, 2015, 6:50:01 PM
to srim...@gmail.com, chromium-dev
I have no idea.

Eager Soul

Apr 20, 2015, 3:22:41 AM
to bo...@chromium.org, chromium-dev

Thanks Bo.

I will look into your second suggestion and try to understand content shell's rendering logic.

I am quite new to the Chromium source code, so let me read some docs on this and then the code. Any pointers that you can give would be good and would give me a jump start.

Eager Soul

Apr 21, 2015, 1:46:44 PM
to bo...@chromium.org, chromium-dev

Can someone please point me to documentation that a content embedder should read? I guess the starting point would be to read the content shell code, but it would be better if there are some docs to read as well.

Also, there seems to be an issue with android_webview_apk: if you open it, put it in the background by clicking back, and then bring it back again using the recents activity bar or the launcher icon, you will see the deadlock that I mentioned earlier. From an initial look, it seems that the SurfaceView is destroyed before onDetachedFromWindow() is called in AwTestContainerView. Because of that, requestRender() is a no-op, so the wait on mSyncLock never returns.
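
The ordering problem described above (surface destroyed first, blocking draw requested afterwards) suggests tracking the surface state and skipping the blocking wait when no frame can be drawn. A minimal sketch of that bookkeeping in plain Java follows. SurfaceGuard and all of its methods are hypothetical, and whether it is actually safe for AwContents to skip the draw on detach is not established in this thread.

```java
/** Hypothetical guard: only allow a blocking render request while a surface exists. */
final class SurfaceGuard {
    private boolean surfaceAlive = false;
    private int framesRequested = 0;

    /** To be called from the surface-created callback. */
    synchronized void surfaceCreated() {
        surfaceAlive = true;
    }

    /** To be called from the surface-destroyed callback. */
    synchronized void surfaceDestroyed() {
        surfaceAlive = false;
    }

    /**
     * Records a render request. Returns true only if the caller may block
     * waiting for completion; returns false when the surface is gone (no
     * frame will ever be drawn) or when no wait was requested.
     */
    synchronized boolean requestRender(boolean waitForCompletion) {
        if (!surfaceAlive) {
            return false; // request would be a no-op, so waiting would hang
        }
        framesRequested++;
        return waitForCompletion;
    }

    /** Number of requests that were actually forwarded to a live surface. */
    synchronized int framesRequested() {
        return framesRequested;
    }
}
```

A caller would enter the mSyncLock-style wait only when requestRender(true) returns true.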

Sriram Mouli

Apr 22, 2015, 7:37:30 AM
to chromi...@chromium.org, srim...@gmail.com, bo...@chromium.org
Hi Bo,

Sorry for the generic question. I spent some time debugging the issue. The version that I have gets stuck at:

"context->getShaderiv(shader, GL_COMPILE_STATUS, &compiled)" in "program_binding.cc"

Even though the code versions are different, the steps are exactly the same as those mentioned by Eager Soul, including the BG/FG issue.

Would you know why this call can wait?

@EagerSoul - Is it possible for you to compile and run content_shell to see if the issue exists? Please try if you get some time, and also let us know the details of the device that you are testing with.

Re: documents, I started with https://www.chromium.org/developers/design-documents, and most of my time was spent reading commit logs of the content_shell directories to understand the intent better :-(

Bo Liu

Apr 22, 2015, 8:30:01 AM
to Sriram Mouli, chromium-dev
On Wed, Apr 22, 2015 at 12:37 PM, Sriram Mouli <srim...@gmail.com> wrote:
Hi Bo,

Sorry for the generic question. I spent some time debugging the issue. The version that I have gets stuck at:

"context->getShaderiv(shader, GL_COMPILE_STATUS, &compiled)" in "program_binding.cc"

Even though the code versions are different, the steps are exactly the same as those mentioned by Eager Soul, including the BG/FG issue.

Would you know why this call can wait?

DeferredGpuCommandService is not scheduling tasks, which may be expected depending on the circumstance.

Sriram Mouli

Apr 22, 2015, 9:12:47 AM
to chromi...@chromium.org, srim...@gmail.com, bo...@chromium.org

For clarity, I am running the content shell as-is on an actual device. Is this expected even then, and if so, is there a way to prevent it? Also, can you please provide a little more detail on your answer?
Please let me know if I should read or post in some specific group.

Thanks
Sriram

Bo Liu

Apr 22, 2015, 10:01:31 AM
to Sriram Mouli, chromium-dev
On Wed, Apr 22, 2015 at 2:12 PM, Sriram Mouli <srim...@gmail.com> wrote:

For clarity, I am running the content shell as-is on an actual device. Is this expected even then, and if so, is there a way to prevent it? Also, can you please provide a little more detail on your answer?

If it's an unmodified content shell, then I have no idea. That should not happen.
 
Please let me know if I should read or post in some specific group.


Webview overrides the scheduling of the command service with DeferredGpuCommandService, so it's very easy to get into deadlocks if you are not careful. That shouldn't be the case for content shell though.
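
As a generic illustration of why deferred scheduling makes deadlocks easy, consider a queue whose tasks run only when some owner explicitly drains it. This is a plain-Java sketch under stated assumptions: DeferredQueue, runPending, and scheduleAndWait are hypothetical names, and nothing here reflects the actual DeferredGpuCommandService implementation. If a caller blocks on a task's completion and nothing drains the queue, the caller waits forever; the timeout below turns that into a visible false.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical deferred task queue; tasks run only when runPending() is called. */
final class DeferredQueue {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    /** Queues a task without running it. */
    synchronized void schedule(Runnable task) {
        pending.add(task);
    }

    /** Runs every task queued so far on the calling thread; returns how many ran. */
    int runPending() {
        int ran = 0;
        while (true) {
            Runnable task;
            synchronized (this) {
                task = pending.poll();
            }
            if (task == null) {
                return ran;
            }
            task.run();
            ran++;
        }
    }

    /**
     * Queues a task and blocks for up to timeoutMillis until it has run.
     * Returns false if nobody drained the queue in time.
     */
    boolean scheduleAndWait(Runnable task, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        schedule(() -> {
            try {
                task.run();
            } finally {
                done.countDown();
            }
        });
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this model, a thread that calls scheduleAndWait() and is also the only thread that would ever call runPending() can never make progress, which is the general shape of the hazard being described.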

Sriram Mouli

Apr 26, 2015, 1:25:13 AM
to chromi...@chromium.org, bo...@chromium.org, srim...@gmail.com
Hi Bo,

I was able to get the latest code base and run the content_shell. I can confirm that there is NO ANR when opening a second shell instance and the rendering works.
I am trying to walk through the changes and see what can be the difference with our version of code.

Meanwhile I saw some changes related to Adreno 420/Nexus 6. Namely the following commits attempt to provide a workaround for the Nexus 6 devices,

Since our issue is also observed on the same device, I tried porting these, but it made little difference. We managed to run GDB on our version of content shell, and the stack trace shows the thread waiting inside the Adreno native library.
Could you go through the stack trace if you have some time and maybe provide some hints on what might be wrong?

TIA,
Sriram.

// START EXTRACT STACKTRACE

#0  0xb6ece8fc in __ioctl () from /tmp/administrator-adb-gdb-libs/system/lib/libc.so
#1  0xb6ee6778 in ioctl () from /tmp/administrator-adb-gdb-libs/system/lib/libc.so
#2  0xab7757c4 in gsl_ldd_control () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/libgsl.so
#3  0xab775d3a in ioctl_kgsl_cmdstream_waittimestampevent () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/libgsl.so
#4  0xab7742aa in ?? () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/libgsl.so
#5  0xab75fca0 in gsl_command_waittimestamp () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/libgsl.so
#6  0xab68b850 in EsxCmdMgr::WaitForTimestamp(EsxTimestamp const*) () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#7  0xab68b89c in EsxCmdMgr::WaitForTimestamp(EsxTimestamp const*) () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#8  0xab696efc in EsxGfxMem::Destroy(EsxContext const*) () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#9  0xab622fb8 in EsxResource::SetGfxMem(EsxContext*, unsigned int, EsxGfxMem*) ()
   from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#10 0xab622fe4 in EsxResource::FreeSubResource(EsxContext*, EsxSubResource*) ()
   from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#11 0xab623094 in EsxResource::SetSubResource(EsxContext*, unsigned int, EsxSubResource*) ()
   from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#12 0xab6230d2 in EsxResource::Destroy(EsxContext*) () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#13 0xab628d38 in EsxTextureObject::Destroy(EsxContext*) () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#14 0xab611e22 in EsxGlObject::DecRefCount(EsxContext*) () from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#15 0xab5d83aa in EsxContext::GlBindTexture(unsigned int, unsigned int) ()
   from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so
#16 0xab6098c2 in EsxGlApiParamValidate::GlBindTexture(EsxDispatch*, unsigned int, unsigned int) ()
   from /tmp/administrator-adb-gdb-libs/system/vendor/lib/egl/libGLESv2_adreno.so



// END EXTRACT STACKTRACE
stack_trace.txt

Bo Liu

Apr 26, 2015, 12:12:09 PM
to Sriram Mouli, chromium-dev
On Sat, Apr 25, 2015 at 10:25 PM, Sriram Mouli <srim...@gmail.com> wrote:
Hi Bo,

I was able to get the latest code base and run the content_shell. I can confirm that there is NO ANR when opening a second shell instance and the rendering works.
I am trying to walk through the changes and see what can be the difference with our version of code.

Meanwhile I saw some changes related to Adreno 420/Nexus 6. Namely the following commits attempt to provide a workaround for the Nexus 6 devices,

These were workarounds for rendering glitches. They would not cause lock ups.

Sriram Mouli

Apr 29, 2015, 6:53:16 AM
to bo...@chromium.org, chromium-dev
Hi Bo,

I traced the cause in my version of the code to an Android workaround that was added for Qualcomm GPUs in the older code base. Removing it fixes my issue.

Thanks for the pointers and sorry for connecting two unrelated issues.

--
Rgds
Sriram