Issue 8688 in angleproject: modern_combat_5 consistently fails on Pixel 6 perf

abdol… via monorail

May 6, 2024, 5:40:47 PM
to angleproj...@googlegroups.com
Status: Available
Owner: ----
Priority: Medium
Type: Defect

New issue 8688 by abdolr...@google.com: modern_combat_5 consistently fails on Pixel 6 perf
https://bugs.chromium.org/p/angleproject/issues/detail?id=8688

The trace `modern_combat_5` appears to fail regularly, though not on every run, during `angle_trace_perf_vulkan_tests` on the Pixel 6 perf bot.

Example: https://ci.chromium.org/ui/p/angle/builders/ci/android-pixel6-perf/2646/overview

The issue has not been reproduced yet. In the failing builds, however, the test appears to pass on the first run and crash on the next two.

abdol… via monorail

May 7, 2024, 5:19:15 PM
to angleproj...@googlegroups.com

Comment #1 on issue 8688 by abdolr...@google.com: modern_combat_5 consistently fails on Pixel 6 perf
https://bugs.chromium.org/p/angleproject/issues/detail?id=8688#c1

After some runs on the bot and looking at the crash traces, the crash seems to be due to a segfault from UpdateClientBufferData().

Prior to the crash, some glMapBufferRange() calls were also seen returning null. Logging the GL error in MapBufferRange() yields 0x507, which is GL_CONTEXT_LOST. This comes from GetValidGlobalContext() returning null in `GL_MapBufferRange()`.
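
For illustration only, here is a minimal sketch of how a caller could guard against a null mapping instead of writing through it. It assumes the segfault is a write through the null pointer returned by the map call, which the thread does not confirm. `MapFn`, `GetErrorFn`, `UpdateResult`, and `GuardedUpdate` are hypothetical names standing in for glMapBufferRange(), glGetError(), and the replay code; they are not ANGLE APIs.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>

// Value of GL_CONTEXT_LOST (0x0507), repeated here so the sketch
// builds without the GL headers.
constexpr unsigned int kGLContextLost = 0x0507;

// Hypothetical stand-ins for glMapBufferRange() and glGetError().
using MapFn      = std::function<void *(std::size_t offset, std::size_t length)>;
using GetErrorFn = std::function<unsigned int()>;

enum class UpdateResult { Ok, ContextLost, MapFailed };

// Hypothetical guarded update: copies `size` bytes into the mapped range
// only when the map call returned a non-null pointer.
UpdateResult GuardedUpdate(const MapFn &mapRange,
                           const GetErrorFn &getError,
                           const void *src,
                           std::size_t offset,
                           std::size_t size)
{
    void *dst = mapRange(offset, size);
    if (dst == nullptr)
    {
        // An unguarded memcpy here would dereference null and segfault.
        return getError() == kGLContextLost ? UpdateResult::ContextLost
                                            : UpdateResult::MapFailed;
    }
    std::memcpy(dst, src, size);
    return UpdateResult::Ok;
}
```

A guard like this would only turn the segfault into a reported failure; it would not address whatever causes the context loss.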

abdol… via monorail

May 13, 2024, 4:40:15 PM
to angleproj...@googlegroups.com
Updates:
Cc: rom...@google.com

Comment #2 on issue 8688 by abdolr...@google.com: modern_combat_5 consistently fails on Pixel 6 perf
https://bugs.chromium.org/p/angleproject/issues/detail?id=8688#c2

Update:
* I was able to reproduce this on the `main` branch on a local Pixel 6 device running the old build (the one used by some of the bots). Reproducing it may require locking the clocks.
* The crash also occurs when the device is not hot; it is unclear whether the crash rate increases at high temperature.

* The context-lost error originates in vkQueueSubmit(), where a Vulkan device-lost error occurs.
* I tried calling vkQueueWaitIdle() after the submission to catch the device-lost error, but the crash no longer reproduced with this change.

* A fence-related message similar to the one below is logged in logcat before the crash, possibly as a result of the driver crashing:
```
05-09 15:37:43.659 22982 23014 E Fence : waitForever: Throttling EGL Production: fence 132 didn't signal in 3000 ms
05-09 15:37:43.659 22982 23014 I Fence : waitForever: fence(mali-mali.timeline254955-262) status(0)
05-09 15:37:43.659 22982 23014 I Fence : waitForever: sync point: timeline(mali.timeline) drv(mali) status(0) timestamp(0.000000)
```

* The submissions that show the error seem to occur during the call glDrawElements(..., 1368, ...), and each seems to be the first submission of its frame (e.g., Frame 170). The submission before it happens between frames, during EGL_SwapBuffers().
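
When scanning long logcat dumps for the fence timeout line quoted above, a small matcher can help. The following is a rough sketch, assuming the message always has the form "fence N didn't signal in M ms" as in the excerpt. `FenceTimeout` and `ParseFenceTimeout` are hypothetical helper names, not part of ANGLE or Android.

```cpp
#include <regex>
#include <string>

// Fields extracted from a "didn't signal" fence timeout line.
struct FenceTimeout
{
    int fenceId;
    int timeoutMs;
};

// Returns true and fills `out` if `line` contains a fence timeout message
// in the format seen in the logcat excerpt; returns false otherwise.
bool ParseFenceTimeout(const std::string &line, FenceTimeout *out)
{
    static const std::regex kPattern(R"(fence (\d+) didn't signal in (\d+) ms)");
    std::smatch match;
    if (!std::regex_search(line, match, kPattern))
    {
        return false;
    }
    out->fenceId   = std::stoi(match[1].str());
    out->timeoutMs = std::stoi(match[2].str());
    return true;
}
```

Only the first line of the excerpt matches; the two informational lines that follow it use a different format and are ignored.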

rom… via monorail

May 14, 2024, 10:38:23 AM
to angleproj...@googlegroups.com
Updates:
Owner: rom...@google.com

Comment #3 on issue 8688 by rom...@google.com: modern_combat_5 consistently fails on Pixel 6 perf
https://bugs.chromium.org/p/angleproject/issues/detail?id=8688#c3

(No comment was entered for this change.)

rom… via monorail

May 14, 2024, 10:49:52 AM
to angleproj...@googlegroups.com

Comment #4 on issue 8688 by rom...@google.com: modern_combat_5 consistently fails on Pixel 6 perf
https://bugs.chromium.org/p/angleproject/issues/detail?id=8688#c4

Amirali's CL for running a single test on the bot and dumping logcat: https://crrev.com/c/5514930