We have discovered an interesting but unfortunate interaction between Chromium's zombification of Objective-C objects on macOS and our product, Squish GUI Tester.
For context: Squish simulates user input on a GUI application and validates the on-screen results. For that purpose, it reacts to user input events that go through the GUI toolkit, and it has far-reaching introspection capabilities. However, it must never interfere with the tested application in any way, for example by keeping strong references to GUI control instances. This makes it somewhat prone to dangling pointers, as we are seeing with Chromium/CEF.
Both Chromium and Squish swizzle -dealloc to track object destruction. Squish swizzles -dealloc (and -alloc) as early as possible to track object lifetime in the application under test. Chromium then swizzles -dealloc again via ObjcEvilDoers::ZombieEnable during its initialization:
https://source.chromium.org/chromium/chromium/src/+/refs/tags/137.0.7119.1:components/crash/core/common/objc_zombie.h
What happens at runtime is that Squish stores references to instances of classes that cannot otherwise be looked up, such as NSMenu (among others). When an NSMenu instance is created, a pointer to it is stored in a list in Squish. The pointer is removed again when we see the [NSMenu dealloc] invocation.
With zombification enabled, our swizzled -dealloc method doesn't seem to get called immediately, if at all. So when Chromium overwrites the instance memory and Squish still finds it in its cache, we dereference a dangling pointer. As far as we understand the zombie code, zombies are cleaned up after a certain number of them have accumulated.
When recording user interactions, Squish iterates over its known objects in order to find the name of the object that was interacted with. These look-ups happen at unpredictable times, so they may run into zombified objects that Chromium has not yet removed from the treadmill. This makes the issue hard to reproduce reliably.
We see that ZombieDealloc skips the -dealloc call if zombification is enabled:
https://source.chromium.org/chromium/chromium/src/+/refs/tags/137.0.7119.1:components/crash/core/common/objc_zombie.mm;l=123
Skipping the dealloc invocation in ZombieDealloc causes Squish's custom -dealloc to be skipped as well, which in turn creates the dangling pointers.
We can confirm that the problem goes away if we do a custom build and disable zombification in chrome_main_delegate.cc where ObjcEvilDoers::ZombieEnable(true, ...) is hardcoded.
If Squish swizzled -dealloc after Chromium, the problem would not occur because Squish's -dealloc implementation would be called before Chromium's. Alas, Squish must swizzle as early as possible to catch all created objects. Additionally, the application under test could initialize Chromium at a later time, and Squish cannot determine when that happens.
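To illustrate the ordering problem, here is a small C++ model of the swizzle chain as we understand it. It is only a sketch and not actual Chromium or Squish code: the Model struct, the SwizzleSquish/SwizzleZombie hooks and LeavesDanglingEntry are hypothetical names, and the "zombies disabled" branch is a simplification on our part.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Simplified model of a class's -dealloc slot. Swizzling replaces the
// current implementation and lets the new hook call the previous one.
using Dealloc = std::function<void(int /*object id*/)>;

struct Model {
    std::set<int> squish_cache;    // stands in for Squish's list of pointers
    std::vector<std::string> log;  // order in which implementations ran
    Dealloc dealloc;               // the currently installed -dealloc

    Model() {
        dealloc = [this](int) { log.push_back("original"); };
    }
    Model(const Model&) = delete;  // the lambdas capture `this`
    Model& operator=(const Model&) = delete;

    // Squish-style hook (hypothetical): forget the object, then chain.
    void SwizzleSquish() {
        Dealloc previous = dealloc;
        dealloc = [this, previous](int id) {
            log.push_back("squish");
            squish_cache.erase(id);
            previous(id);
        };
    }

    // Zombie-style hook (hypothetical): with zombies enabled, the object is
    // turned into a zombie and the previous -dealloc is NOT called.
    void SwizzleZombie(bool zombies_enabled) {
        Dealloc previous = dealloc;
        dealloc = [this, previous, zombies_enabled](int id) {
            log.push_back("zombie");
            if (!zombies_enabled) {
                previous(id);  // assumption: plain pass-through when disabled
            }
        };
    }
};

// Returns true if object 1 is still in the cache after its dealloc ran,
// i.e. the cache now holds a dangling entry.
bool LeavesDanglingEntry(bool squish_first, bool zombies_enabled) {
    Model m;
    if (squish_first) {
        m.SwizzleSquish();
        m.SwizzleZombie(zombies_enabled);
    } else {
        m.SwizzleZombie(zombies_enabled);
        m.SwizzleSquish();
    }
    m.squish_cache.insert(1);
    m.dealloc(1);
    return m.squish_cache.count(1) == 1;
}
```

In this model, LeavesDanglingEntry(true, true) corresponds to our situation: Squish hooks first, the zombie hook ends up outermost and never chains, so the cache entry stays behind. With the opposite order, or with zombies disabled, the entry is removed.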
Do you see any options to make this configurable at run time to avoid the need for a custom-built Chromium/CEF library?
From an outsider's perspective, one option would be to control ZombieEnable via an environment variable. Alternatively, the ZombieEnable/ZombieDisable functions could be exported, though that appears to be less convenient to use. Or is there perhaps another option that we have not thought of?
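To make the environment variable idea a bit more concrete, this is roughly what we have in mind. It is a sketch only: the variable name CHROME_DISABLE_OBJC_ZOMBIES and the helper ShouldEnableObjcZombies are made up by us and do not exist in Chromium as far as we know.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical variable name, chosen for illustration only.
constexpr char kDisableZombiesEnvVar[] = "CHROME_DISABLE_OBJC_ZOMBIES";

// Returns false only if the variable is set to something other than "" or
// "0", so the current default (zombies enabled) stays unchanged.
bool ShouldEnableObjcZombies() {
    const char* value = std::getenv(kDisableZombiesEnvVar);
    return value == nullptr || value[0] == '\0' ||
           std::strcmp(value, "0") == 0;
}

// The idea would be to guard the existing hardcoded call, e.g.:
//   if (ShouldEnableObjcZombies())
//       ObjcEvilDoers::ZombieEnable(true, ...);
```

A test harness like Squish could then set the variable when launching the application under test, and nothing would change for regular users.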