Help tracking down recent Mac regression causing flakiness


Ken Russell

Feb 5, 2024, 3:52:34 PM
to Chromium-dev
Hi Chromium developers,

Could I please ask Mac experts to help triage the root cause of a recent regression on macOS which is leading to severe flakiness in random tests?

Please see https://issues.chromium.org/issues/323927831 - here's the stack trace:

Received signal 10 BUS_ADRERR 00010de78000
0   Chromium Framework                  0x000000012c15b0b2 base::debug::CollectStackTrace(void const**, unsigned long) + 18
1   Chromium Framework                  0x000000012c148df3 base::debug::StackTrace::StackTrace() + 19
2   Chromium Framework                  0x000000012c15b001 base::debug::(anonymous namespace)::StackDumpSignalHandler(int, __siginfo*, void*) + 1329
3   libsystem_platform.dylib            0x00007ff80191f5ed _sigtramp + 29
4   ???                                 0x0000000000000000 0x0 + 0
5   AppKit                              0x00007ff805467b12 -[NSPersistentUIFileManager(Snapshot) writeWindowSnapshot:length:width:height:bytesPerRow:toFile:encryptingWithKey:uuid:checksum:fd:] + 3521
6   AppKit                              0x00007ff8054680d3 __172-[NSPersistentUIFileManager(Snapshot) writeWindowSnapshot:length:width:height:bytesPerRow:encryptingWithKey:uuid:checkChecksum:forWindowID:synchronously:completionHandler:]_block_invoke + 356
7   AppKit                              0x00007ff8054683c1 ___NSPersistentUIDispatchQueueAsync_block_invoke + 47
8   libdispatch.dylib                   0x00007ff80174fd91 _dispatch_call_block_and_release + 12
9   libdispatch.dylib                   0x00007ff801751033 _dispatch_client_callout + 8
10  libdispatch.dylib                   0x00007ff801757200 _dispatch_lane_serial_drain + 769
11  libdispatch.dylib                   0x00007ff801757d39 _dispatch_lane_invoke + 366
12  libdispatch.dylib                   0x00007ff8017623fc _dispatch_workloop_worker_thread + 765
13  libsystem_pthread.dylib             0x00007ff8018eec55 _pthread_wqthread + 327
14  libsystem_pthread.dylib             0x00007ff8018edbbf start_wqthread + 15
[end of stack trace]

This is extremely urgent: it is significantly affecting the CQ. I'd appreciate your help.

Thanks,

-Ken

Ken Russell

Feb 6, 2024, 1:20:32 PM
to Chromium-dev
Hi again Chromium developers,

Could PartitionAlloc experts please look at this bug too? Something changed recently to cause crashes on this Grand Central Dispatch thread, and it is critical that we track down why: the Mac GPU bots on Chromium's CQ have become inexplicably, unacceptably flaky. PartitionAlloc is implicated in at least one of the stack traces.

Thank you,

-Ken


Ken Russell

Feb 8, 2024, 6:27:02 PM
to Chromium-dev, Brian Sheedy
Hi again Chromium colleagues,

FYI, Brian Sheedy landed a workaround for this issue: the test harness now retries any tests which fail in the first browser launch in the given shard. See https://chromium-review.googlesource.com/c/chromium/src/+/5277621 .
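
For readers who want a picture of the retry idea, here is a minimal Python sketch. It is not the code from the change linked above. The class `ShardRunner` and the callables `launch_browser` and `run_test` are hypothetical names chosen for illustration, and the sketch assumes a single retry after one relaunch is enough.

```python
from typing import Callable, Dict, Iterable


class ShardRunner:
    """Hypothetical shard runner: retries failures seen on the first browser launch."""

    def __init__(self,
                 launch_browser: Callable[[], None],
                 run_test: Callable[[str], bool]) -> None:
        # Both callables are assumptions standing in for the real harness hooks.
        self._launch_browser = launch_browser
        self._run_test = run_test
        self._launch_count = 0

    def _launch(self) -> None:
        self._launch_count += 1
        self._launch_browser()

    def run(self, tests: Iterable[str]) -> Dict[str, bool]:
        results: Dict[str, bool] = {}
        self._launch()
        for name in tests:
            passed = self._run_test(name)
            if not passed and self._launch_count == 1:
                # The failure happened while the shard was still on its first
                # browser launch: relaunch once and retry the same test.
                self._launch()
                passed = self._run_test(name)
            results[name] = passed
        return results
```

In this sketch, failures after the relaunch are reported as-is, so genuine test failures later in the shard still surface.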

Perhaps this issue principally affects our automated testing machines, but I hope more folks can comment on https://issues.chromium.org/issues/323927831 with ideas on root causes.

Thanks,

-Ken

