Chromium Android Vitals


Ros

Jan 6, 2021, 3:42:08 AM
to Chromium-dev

Hey Chromium devs,

I am working on a Chromium fork for Android, and over time we have gotten the feeling that some Chromium versions are a bit unstable. We see increasing ANR and crash rates after some updates, and judging from the backtraces, most of them are not caused by the code that we added. Unfortunately, the backtraces don't always contain enough information to be sure where the crashes are coming from, and it is very time-consuming to try to reproduce them. Often this is not possible at all.

My question is: is there a way to better understand or even discuss these kinds of crashes with Chromium devs? Does it make sense to report the critical issues to crbug.com?

Thanks!

Torne (Richard Coles)

Jan 6, 2021, 11:21:32 AM
to Jens Jensen, Chromium-dev
On Wed, 6 Jan 2021 at 03:43, Ros <wangenh...@gmail.com> wrote:
> Hey Chromium devs,
>
> I am working on a Chromium fork for Android, and over time we have gotten the feeling that some Chromium versions are a bit unstable. We see increasing ANR and crash rates after some updates, and judging from the backtraces, most of them are not caused by the code that we added. Unfortunately, the backtraces don't always contain enough information to be sure where the crashes are coming from, and it is very time-consuming to try to reproduce them. Often this is not possible at all.

How are you collecting the backtraces and ANRs? Are you using Crashpad like Chrome itself does, or something else? Can you give an example of what the backtraces you get look like, in "good" and "bad" cases?

Making sure crashes get reported in an actionable way is difficult on a large and complex codebase like Chromium, especially on Android where there are many additional complications, so the first thing I'd suggest is to make sure that you're not "missing out" on any of the work we've already done to capture useful crash information, rather than looking into specific instances of crashes.
 
> My question is: is there a way to better understand or even discuss these kinds of crashes with Chromium devs? Does it make sense to report the critical issues to crbug.com?

I'm not sure if we have a specific policy on reporting issues from forks (maybe someone else can comment). In instances where you can find a way to reproduce them it's worth checking if the same reproduction triggers a crash in Chrome (in which case you should definitely file a bug), but I totally understand that often that's not practical; we have to deal with many crashes without being able to reproduce them too.
If you have a symbolized stack for a particular crash then it's possible (though sometimes pretty fiddly) for us to look for similar/identical crash stacks in Chrome's crash report data which might point at existing bugs or just further information that you don't have, but I'm not sure that filing these on crbug.com is the best way to do that.

Ros

Jan 6, 2021, 12:00:07 PM
to Chromium-dev, to...@chromium.org, Chromium-dev, Ros
Hey Torne,

we are collecting ANRs and crashes with the Google Play console and Firebase. We currently see two major issues, which are happening exclusively on Samsung devices: a native crash on 3 different device types running Android 8, and a Java exception on a bunch of Samsung Galaxy devices running Android 10. The latter is the more critical one, as it affects many more users. This is the backtrace we get on Google Play:

android.view.ViewRootImpl$CalledFromWrongThreadException:
at android.view.ViewRootImpl.checkThread (ViewRootImpl.java:9873)
at android.view.ViewRootImpl.requestLayout (ViewRootImpl.java:1871)
at android.view.View.requestLayout (View.java:26335)
at com.android.internal.policy.DecorView.drawableChanged (DecorView.java:2102)
at com.android.internal.policy.DecorView.onConfigurationChanged (DecorView.java:2681)
at android.view.View.dispatchConfigurationChanged (View.java:14987)
at android.view.ViewGroup.dispatchConfigurationChanged (ViewGroup.java:1619)
at android.view.ViewRootImpl.updateConfiguration (ViewRootImpl.java:5075)
at android.app.ActivityThread.handleActivityConfigurationChanged (ActivityThread.java:6411)
at android.app.ActivityThread$ActivityClientRecord.lambda$init$0$ActivityThread$ActivityClientRecord (ActivityThread.java:666)
at android.app.-$$Lambda$ActivityThread$ActivityClientRecord$HOrG1qglSjSUHSjKBn2rXtX0gGg.onConfigurationChanged (Unknown Source:2)
at android.view.ViewRootImpl.performConfigurationChange (ViewRootImpl.java:5035)
at android.view.ViewRootImpl.performTraversals (ViewRootImpl.java:2828)
at android.view.ViewRootImpl.doTraversal (ViewRootImpl.java:2225)
at android.view.ViewRootImpl$TraversalRunnable.run (ViewRootImpl.java:9126)
at android.view.Choreographer$CallbackRecord.run (Choreographer.java:999)
at android.view.Choreographer.doCallbacks (Choreographer.java:797)
at android.view.Choreographer.doFrame (Choreographer.java:732)
at android.view.Choreographer$FrameDisplayEventReceiver.run (Choreographer.java:984)
at android.os.Handler.handleCallback (Handler.java:883)
at android.os.Handler.dispatchMessage (Handler.java:100)
at android.os.Looper.loop (Looper.java:237)
at android.app.ActivityThread.main (ActivityThread.java:8167)
at java.lang.reflect.Method.invoke (Native Method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run (RuntimeInit.java:496)
at com.android.internal.os.ZygoteInit.main (ZygoteInit.java:1100)


The native crash discloses even less helpful information. It affects only a handful of users but is triggered 150+ times per day:

#00 pc 000000000000d67c /system/lib64/libutils.so (android::RefBase::incStrong(void const*) const+8)

We have not used Crashpad yet. Do you think that we could get more useful information with it, even for crashes that happen in production on our users' devices? At first glance it looks like quite some effort to set it up and maintain it (compared to Firebase).

Thanks,
Ros

Torne (Richard Coles)

Jan 6, 2021, 2:54:57 PM
to Reaz Hawlader Reaz Hawlader, Jens Jensen, Chromium-dev
On Wed, 6 Jan 2021 at 13:29, Reaz Hawlader Reaz Hawlader <reaz...@gmail.com> wrote:

On Wed, 6 Jan 2021, 11:03 pm Ros, <wangenh...@gmail.com> wrote:
> Hey Torne,
>
> we are collecting ANRs and crashes with the Google Play console and Firebase.

The Play console's crash reports for native crashes are based on Android's debuggerd output, which for many builds of Chromium is basically useless. debuggerd relies on being able to unwind the stack on-device using the binary's unwind tables, but this information significantly inflates Chromium's binary size and so isn't always included. This is controlled by the GN arg `exclude_unwind_tables`, which defaults to true if `is_official_build=true`.

Setting `exclude_unwind_tables=false` will allow Android to unwind the stack successfully in most cases, but will make your binary/APK significantly larger, especially on arm64, x86, and x86_64 (the effect is somewhat less bad on 32-bit arm). With that you would get a stack backtrace with many more stack frames (but just as hex offsets into the native library; you still have to look the symbols up yourself).
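As a rough illustration of how the setting resolves, here is a minimal Python sketch. It models only the default described above (`exclude_unwind_tables` follows `is_official_build` unless it is set explicitly); the real GN logic may take other conditions into account. The `args.gn` contents and the helper names `parse_gn_bools` and `effective_exclude_unwind_tables` are made up for the example.

```python
import re

# Hypothetical args.gn contents for a release build of a fork.
SAMPLE_ARGS_GN = """
target_os = "android"
target_cpu = "arm64"
is_official_build = true
exclude_unwind_tables = false  # keep unwind tables so debuggerd can unwind
"""


def parse_gn_bools(text):
    """Collect simple `name = true|false` assignments, ignoring # comments."""
    values = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.fullmatch(r"(\w+)\s*=\s*(true|false)", line)
        if match:
            values[match.group(1)] = match.group(2) == "true"
    return values


def effective_exclude_unwind_tables(text):
    """Explicit setting wins; otherwise fall back to is_official_build.

    This mirrors only the default as described in this thread, not the
    full GN logic in the Chromium build files.
    """
    values = parse_gn_bools(text)
    if "exclude_unwind_tables" in values:
        return values["exclude_unwind_tables"]
    return values.get("is_official_build", False)
```

With the sample above the function returns `False`, i.e. the unwind tables stay in the binary even though the build is an official one.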

I suspect that Firebase's handling of native crashes has the same limitation, but we haven't used this in Chrome/WebView so have no direct experience with it.

The CalledFromWrongThreadException you posted appears to be a known crash on some Samsung devices on particular OS versions, which showed up in significant numbers from 87.0.4280.49 in our crash db and is tracked by crbug.com/1144660 (unfortunately only viewable to users with bug-editing permissions). Chromium makes a Dialog from a non-main-looper thread and some Samsung-specific code in the framework handles this incorrectly. It should be fixed by this revert: https://chromium-review.googlesource.com/c/chromium/src/+/2538417 (which was cherrypicked to 88).
 
> The native crash discloses even less helpful information. It affects only a handful of users but is triggered 150+ times per day:
>
> #00 pc 000000000000d67c /system/lib64/libutils.so (android::RefBase::incStrong(void const*) const+8)

Does it really only contain one stack frame, or is there a "#01" line as well that just has unhelpful-seeming information?
Even without the unwind tables being present in Chromium (as I discussed above), the system should be able to unwind through its *own* libraries like libutils at least to the point of the first stack frame that's in Chromium. If it's not even making it that far then there's nothing anyone can really do to diagnose it with this information.
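To check how far a given backtrace actually got, a small script can split each frame into its parts and look for the first frame outside the system libraries. The following is only a sketch: the regular expression is written against the single frame format quoted above, and the list of system path prefixes is an assumption, not an exhaustive list.

```python
import re

# Matches frames such as:
#   #00 pc 000000000000d67c /system/lib64/libutils.so (android::RefBase::incStrong(void const*) const+8)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+pc\s+(?P<pc>[0-9a-fA-F]+)\s+(?P<library>\S+)"
    r"(?:\s+\((?P<symbol>.*)\))?\s*$"
)


def parse_frames(backtrace):
    """Return one dict per recognisable frame line, in order of appearance."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append({
                "index": int(match.group("index")),
                "pc": int(match.group("pc"), 16),
                "library": match.group("library"),
                "symbol": match.group("symbol"),  # None if debuggerd printed none
            })
    return frames


def first_app_frame(frames, system_prefixes=("/system/", "/apex/", "/vendor/")):
    """Return the first frame outside the (assumed) system paths, or None."""
    for frame in frames:
        if not frame["library"].startswith(system_prefixes):
            return frame
    return None
```

If `first_app_frame` returns `None`, the unwinder never reached a frame in the app's own library, which is the situation described above where the report cannot be diagnosed further.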
 
> We have not used Crashpad yet. Do you think that we could get more useful information with it, even for crashes that happen in production on our users' devices?

Crashpad works by creating its own "minidump" when a crash occurs, independently of the crash dumping facilities in the Android OS, and then uploading that minidump to a server. The minidump contains all the raw data from the thread's stack, which allows the server to unwind the stack using debug information saved from the time the binary was built. The dump also includes various Chromium-specific debugging information ("crash keys" and other data) which can help narrow down the causes of crashes and do various kinds of statistical analyses beyond the capabilities of the Play Store UI.

The advantage of this is that we can usually generate a valid stack backtrace on the server without having to ship the large unwind table data in the APK. The disadvantage is that it can't always unwind through the *non-chromium* parts of the stack correctly (e.g. android system libraries) as the server doesn't have access to the unwind table data for the other libraries on the device.

We rely almost entirely on Crashpad reports, rather than the Android OS's crash reports, for Chrome and WebView, so that we can avoid shipping the unwind tables and can have more control over the crash dumping process.
 
> At first glance it looks like quite some effort to set it up and maintain it (compared to Firebase).

Yes. You have to run your own server, process the data yourself, and so on. Chrome already has all of this infrastructure for handling crashes on Windows/Mac/Linux/ChromeOS, so using it for Android as well costs us relatively little, but if you aren't already using Crashpad then the setup involved here would be significant (and I'm not an expert on what is involved; I'm fairly familiar with how the actual crash dumping process on the device works, but not with any of the backend parts).

So, you'd have to make your own call for your use whether you want to:

1) Adopt Crashpad so that you get the same kind of crash handling we use for Chrome/WebView, at the cost of having to set up all your own infrastructure for it.

2) Build and ship your binaries with `exclude_unwind_tables=false` so that debuggerd can generate more useful crash information and feed it into the Play Store's crash reporting system (and probably Firebase as well), at the cost of your APK being larger.

3) Just accept that you can't really do anything to debug crashes in native code unless you can reproduce them locally.


For uncaught Java exceptions the normal Android mechanisms for reporting are generally fine; you normally get the entire stack as you showed above. We also report Java exceptions via Crashpad, but mostly just so that we have uniform/consistent handling and avoid having to check multiple different systems, rather than because there's any technical issue with the "normal" way of handling it.
 



Ros

Jan 7, 2021, 8:58:44 AM
to Chromium-dev, to...@chromium.org, Ros, Chromium-dev
Hey Torne,

Thanks for the detailed answer and the helpful information you shared. We will run some tests with `exclude_unwind_tables=false` in our alpha channel to see how much more helpful information we can get from backtraces. We can tolerate the additional file size in this channel if it helps us to develop a more robust version for our stable channel.
I think at this point setting up Crashpad is not an option, but it's something we will look into for the future. Regarding the two backtraces I've shared: for the first one, I checked the revert you linked, but it turned out that the Chromium version we use (86) does not contain the changes that would be reverted, so that change cannot be the cause of the crashes in our case. We found another commit which we hope will help with the CalledFromWrongThreadException crashes on Samsung devices: https://chromium-review.googlesource.com/c/chromium/src/+/2545773

The native crash does not have a "#01" line but a second one with "#00". We already tried to symbolicate it with the help of third_party/android_platform/development/scripts/stack but this does not give us additional information. Maybe setting exclude_unwind_tables=false in our next alpha release will help us with this crash.

Of course we understand that it's impossible to have 0 crashes with a complex and huge code base like Chromium and a platform like Android with its device fragmentation. But we always try to stay below the bad behavior threshold to make sure our Play Store ranking will not be penalized by Google. Can you share some information about how this is handled for Chrome? Do you announce fixes for severe issues (not only in terms of security) in a place we (and other Chromium forks) could observe to be aware of issues and fixes ahead of time and not only notice them after we released a new version? Usually we are 2-3 versions behind the latest Chromium, so in theory that could be possible.

Best regards,
Ros

Torne (Richard Coles)

Jan 7, 2021, 10:01:37 AM
to Ros, Chromium-dev
On Thu, 7 Jan 2021 at 08:58, Ros <wangenh...@gmail.com> wrote:
> Hey Torne,
>
> Thanks for the detailed answer and the helpful information you shared. We will run some tests with `exclude_unwind_tables=false` in our alpha channel to see how much more helpful information we can get from backtraces. We can tolerate the additional file size in this channel if it helps us to develop a more robust version for our stable channel.
> I think at this point setting up Crashpad is not an option, but it's something we will look into for the future. Regarding the two backtraces I've shared: for the first one, I checked the revert you linked, but it turned out that the Chromium version we use (86) does not contain the changes that would be reverted, so that change cannot be the cause of the crashes in our case. We found another commit which we hope will help with the CalledFromWrongThreadException crashes on Samsung devices: https://chromium-review.googlesource.com/c/chromium/src/+/2545773

Yeah, this is another related CL. From the history, I think things changed back and forth here a bit, so there might be slightly different combinations of things enabled on different versions. I'm not familiar with this code; I was just trying to briefly summarise the bug I found linked to a similar crash stack.

> The native crash does not have a "#01" line but a second one with "#00". We already tried to symbolicate it with the help of third_party/android_platform/development/scripts/stack but this does not give us additional information. Maybe setting exclude_unwind_tables=false in our next alpha release will help us with this crash.

All the stack script does is take addresses in the binary and tell you what the address corresponds to, so if the stack backtrace doesn't contain the addresses of the stack frames it can't do anything to help you.
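To make that concrete, the core of any symbolizer is an address lookup along these lines. This is only a sketch of the general idea, not the actual implementation of the stack script: the symbol table is hypothetical, and a real one would be read from the unstripped binary.

```python
import bisect

# Hypothetical symbol table: (start offset in the library, function name),
# sorted by start offset. A real table would come from the unstripped binary.
SYMBOLS = [
    (0x1000, "content::Foo::Start()"),
    (0x1480, "content::Foo::Stop()"),
    (0x2200, "base::Bar::Run()"),
]
STARTS = [start for start, _ in SYMBOLS]


def symbolize(offset):
    """Map an offset to 'function+delta', or None if it precedes every symbol.

    The table has no symbol sizes, so any offset past the last start is
    attributed to the last symbol.
    """
    position = bisect.bisect_right(STARTS, offset) - 1
    if position < 0:
        return None
    start, name = SYMBOLS[position]
    return f"{name}+{offset - start}"
```

For example, `symbolize(0x1488)` gives `content::Foo::Stop()+8`. The input is always an address: if the report has no frame addresses inside the library, there is nothing to feed into a lookup like this.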
 
> Of course we understand that it's impossible to have 0 crashes with a complex and huge code base like Chromium and a platform like Android with its device fragmentation. But we always try to stay below the bad behavior threshold to make sure our Play Store ranking will not be penalized by Google. Can you share some information about how this is handled for Chrome?

We have a lot of people and automated alerting monitoring bug reports, crash data, UMA metrics, and so on, and the release team prioritise fixing issues that are having a disproportionate impact on users, including respinning stable releases when necessary. I assume people do also keep an eye on the Android vitals metrics collected through Play, but I don't know the process; we primarily rely on Chrome's own monitoring, not Android's, for most things.
 
> Do you announce fixes for severe issues (not only in terms of security) in a place we (and other Chromium forks) could observe to be aware of issues and fixes ahead of time and not only notice them after we released a new version?

As far as I know we don't generally announce bug fixes anywhere, but you could consider monitoring the changes that go into the stable branch after the initial stable release - that's likely to be a set of things we felt were very important, as the bar to cherrypick changes to stable is high.
Unfortunately for crash-related issues a lot of the corresponding crbugs are restricted to only be visible to project members because some crashes may turn out to be exploitable security bugs, and so by default we don't make bugs that contain full crash data/stacks/etc public. Individual bugs can be made public if it's unlikely they have a security impact but it's not always the case that anyone spends time to determine that, especially after a fix has already been landed.
Ideally you shouldn't rely just on announcements to know what security issues exist either - lots of bugfixes can ultimately be relevant for security, even if no CVE exists or even if the author of the fix didn't realise it could be security-related at the time.
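One way to watch the stable branch is to list the commits on the release branch that are not reachable from a given release tag. The Python sketch below only builds the `git log` command for that. It rests on two assumptions about the local checkout that should be verified first: release tags are named after the version string, and the branch heads have been fetched into `refs/remotes/branch-heads/<BUILD>`. The helper names are made up for the example.

```python
def branch_number(version):
    """Return the BUILD component of a MAJOR.MINOR.BUILD.PATCH version string."""
    parts = version.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"not a MAJOR.MINOR.BUILD.PATCH version: {version!r}")
    return parts[2]


def stable_branch_log_command(release_version):
    """Build a `git log` command for commits on the release branch that are
    not reachable from the tag of `release_version`.

    Assumptions (not verified here): tags are named after the version, and
    branch heads live under refs/remotes/branch-heads/<BUILD>.
    """
    branch_ref = f"refs/remotes/branch-heads/{branch_number(release_version)}"
    return ["git", "log", "--oneline", f"{release_version}..{branch_ref}"]
```

The resulting list can be passed to `subprocess.run` inside a Chromium checkout. Passing the version of the first stable release of a milestone would list what was merged to that branch afterwards.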

Ros

Jan 8, 2021, 10:11:49 AM
to Chromium-dev, to...@chromium.org, Chromium-dev, Ros
Thanks Torne,

Especially the last part of your answer is helpful. From now on we will also closely monitor the stable branch after the releases. I'm also looking forward to seeing the results of using exclude_unwind_tables=false in alpha releases. Let me say again that I appreciate your fast and informative responses to my questions. Have a nice weekend.

Pradeep H

Feb 27, 2023, 2:24:18 AM
to Chromium-dev, Ros, to...@chromium.org
Hi Ros,

We are currently trying to integrate Firebase Crashlytics into Chromium v100.0.4896.127.
We are able to get Java crash reports in the Firebase console, but native crashes are not being recorded in Firebase, and in the terminal we can see the following error:
libcrashlytics could not be loaded. This APK may not have been compiled for this device's architecture. NDK crashes will not be reported to Crashlytics.

Can you suggest how this issue can be fixed?

Thanks,
Pradeep
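Since the error message points at the device's architecture, a possible first check is whether the library is packaged for every ABI in the APK. The sketch below relies on an APK being a zip archive with native libraries under `lib/<abi>/`. The file name `libcrashlytics.so` is an assumption derived from the error message, and the helper names are made up for the example; a Chromium build may package its libraries differently, so treat this as a starting point only.

```python
import zipfile


def native_libs_by_abi(apk):
    """Group the .so files in an APK by ABI directory (lib/<abi>/<name>.so)."""
    libs = {}
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            if len(parts) == 3 and parts[0] == "lib" and parts[2].endswith(".so"):
                libs.setdefault(parts[1], set()).add(parts[2])
    return libs


def abis_missing(apk, library="libcrashlytics.so"):
    """Return the ABIs present in the APK that do not contain `library`."""
    return sorted(
        abi for abi, names in native_libs_by_abi(apk).items()
        if library not in names
    )
```

`abis_missing` takes a path to the APK (or an open binary file object). Any ABI it returns ships other native libraries but not the one named in the error.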
