Attention is currently required from: Joshua Peraza, Ben Hamilton, Justin Cohen, Robert Sesek, Mark Mentovai.
Patch set 5:Commit-Queue +1
1 comment:
Patchset:
This is weird -- casting a wide net of reviewers to see if anyone can identify if this fix makes sense, and more importantly, if it implies a larger problem with the iOS handler and not simply the test fixture.
Right now on iOS the best way to test crashes is via XCUITests, which involve a test runner process and the application host process. I just added some logic to the application host to log stderr to a file via freopen so the test runner can ask the host for the previous apps previous run's stderr messages -- and then verify whether some condition was met.
This introduced a lot of flake which I am able to reproduce locally when running the tests repeatedly (usually it takes between 5 and 20 runs to reproduce, but sometimes it never reproduces). Some debugging showed the flake is because the intermediate dump file the handler produces has stderr logging embedded in it. Specifically:
2022-03-10 21:50:41.174 ios_crash_xcuitests[80537:44420872] The EDOHostService (0x600000d0c5b0) is created and listening on 12345
<followed by the intermediate dump binary format>
That's weird.
The order of events prior to this fix CL would be:
freopen(/path/to/test/stderr.log, "a", stderr)
Crashpad client initialization:
Cache an intermediate dump writer via:
fd_ = open_dprotected_np(/path/to/crashpad/database/cached.dump.file, NSFileProtectionNone, see: https://support.apple.com/guide/security/data-protection-classes-secb010e978a/web)
EDO initialization:
An NSLog("The EDOHostService (0xFF) is created..")
Note: NSLog will also write to STDERR when it's redirected away (see: https://developer.apple.com/documentation/foundation/1395074-nslogv?language=objc)
Crash:
SIGABRT is handled by Crashpad, which writes the intermediate dump to the fd_ created from open_dprotected_np in Crashpad.
Result:
Sometimes, every 20 or so runs locally, it appears that the NSLog write goes to the cached intermediate dump writer's fd_, making the intermediate dump invalid, failing the test.
Solution:
Initialize Crashpad first before using freopen. I'm unable to reproduce the failure on the bots or locally after this change.
But I also don't understand how this can happen.
Thank you for reading!
To view, visit change 3517773. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Joshua Peraza, Ben Hamilton, Justin Cohen, Robert Sesek.
1 comment:
File test/ios/host/cptest_application_delegate.mm:
Patch Set #5, Line 152: CHECK(freopen(GetStderrOutputFile().value().c_str(), "a", stderr) != nullptr);
Bad idea:
`freopen` is, in effect, `fclose` followed by `fopen`, recycling the `FILE*`. That means that, for your use, `stderr` stays “alive” but winds up pointed to your new file. But `fclose` will do a `close` on the underlying file descriptor, in this case, `close(STDERR_FILENO)` or `close(2)`. Usually, the `open` will follow quickly enough that the new file to be associated with `stderr` winds up at the same file descriptor, but there’s no guarantee of this. Really, it’s a race.
```
mark@arm-and-hammer zsh% cat t_freopen_stderr.cc
// clang++ -std=c++11 -Wall -Werror -Os t_freopen_stderr.cc -o t_freopen_stderr
#include <err.h>
#include <fcntl.h>
#include <paths.h>
#include <stdio.h>
#include <unistd.h>
#include <thread>
int main(int argc, char * argv[]) {
std::thread thread([]() {
while (true) {
int fd = open(_PATH_DEVNULL, O_RDONLY);
if (fd < 0) {
err(3, "open");
}
close(fd);
}
});size_t attempt = 0;
int fd;
do {
if (!freopen(_PATH_DEVNULL, "a", stderr)) {
err(2, "freopen");
}
++attempt;
} while ((fd = fileno(stderr)) == STDERR_FILENO);
printf("%d ≠ %d, attempt %zu\n", fd, STDERR_FILENO, attempt);return EXIT_FAILURE;
}
mark@arm-and-hammer zsh% clang++ -std=c++11 -Wall -Werror -Os t_freopen_stderr.cc -o t_freopen_stderr
mark@arm-and-hammer zsh% ./t_freopen_stderr
3 ≠ 2, attempt 9
zsh: abort ./t_freopen_stderr
```
The `FILE*` is recycled, but the file descriptor may not be.
12.0.1 https://github.com/apple-oss-distributions/Libc/blob/Libc-1506.40.4/stdio/FreeBSD/freopen.c#L147 (in modern use we always see __DARWIN_UNIX03 semantics).
Not everything that wants to write to stderr will do so via the stdio interface. There are enough things that expect to be able to write directly to `STDERR_FILENO` that it must be considered reserved. `NSLog` is likely one of those things. If you really don’t want stderr, the smart move is to not just close it, but to dup /dev/null over `STDERR_FILENO`: that keeps `STDERR_FILENO` occupied, so that nothing else will open at that FD, and writes from various other code (like `NSLog`) won’t wind up going to some arbitrary file. The same applies to the other two standard streams, stdin and stdout.
Since `freopen(…, stderr)` implies `close(STDERR_FILENO)`, you can’t do that either.
To view, visit change 3517773. To unsubscribe, or for help writing mail filters, visit settings.