ios: Use atomic renames instead of flock() in CrashReportDatabaseMac [crashpad/crashpad : main]

14 views
Skip to first unread message

Justin Cohen (Gerrit)

unread,
May 13, 2026, 3:30:42 PM (3 days ago) May 13
to Mark Mentovai, crashpa...@luci-project-accounts.iam.gserviceaccount.com, crashp...@chromium.org
Attention needed from Mark Mentovai

Justin Cohen voted and added 1 comment

Votes added by Justin Cohen

Commit-Queue+1

1 comment

Patchset-level comments
File-level comment, Patchset 23 (Latest):
Justin Cohen . resolved

OK -- this one dates back to a comment made by Ben Hamilton that has been living rent free in my head since 2022 (see https://chromium-review.git.corp.google.com/c/crashpad/crashpad/+/3868711/comments/529290ef_f69f163d)

What do you think of this change? Chrome (and all the other iOS crashpad clients) see quite a few watchdog kills for holding locks in shared containers.

Open in Gerrit

Related details

Attention is currently required from:
  • Mark Mentovai
Submit Requirements:
  • requirement satisfiedCode-Owners
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedReview-Enforcement
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: crashpad/crashpad
Gerrit-Branch: main
Gerrit-Change-Id: Ia6f906b5ea6b50ed449ee841aac06ceaf34df602
Gerrit-Change-Number: 7815316
Gerrit-PatchSet: 23
Gerrit-Owner: Justin Cohen <justi...@google.com>
Gerrit-Reviewer: Justin Cohen <justi...@google.com>
Gerrit-Reviewer: Mark Mentovai <ma...@chromium.org>
Gerrit-Attention: Mark Mentovai <ma...@chromium.org>
Gerrit-Comment-Date: Wed, 13 May 2026 19:30:40 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
satisfied_requirement
unsatisfied_requirement
open
diffy

Mark Mentovai (Gerrit)

unread,
May 13, 2026, 3:35:29 PM (3 days ago) May 13
to Justin Cohen, crashpa...@luci-project-accounts.iam.gserviceaccount.com, crashp...@chromium.org
Attention needed from Justin Cohen

Mark Mentovai added 2 comments

Patchset-level comments
Justin Cohen . resolved

OK -- this one dates back to a comment made by Ben Hamilton that has been living rent free in my head since 2022 (see https://chromium-review.git.corp.google.com/c/crashpad/crashpad/+/3868711/comments/529290ef_f69f163d)

What do you think of this change? Chrome (and all the other iOS crashpad clients) see quite a few watchdog kills for holding locks in shared containers.

Mark Mentovai

OK -- this one dates back to a comment made by Ben Hamilton that has been living rent free in my head since 2022 (see https://chromium-review.git.corp.google.com/c/crashpad/crashpad/+/3868711/comments/529290ef_f69f163d)

Time to evict it?

Line 60, Patchset 23 (Latest):constexpr char kProcessingDirectory[] = "processing";
Mark Mentovai . unresolved

This introduces a brand new state, which would require a very careful review of several interfaces here, but I’m not going to do this because I think that this is actually infeasible.

How do things migrate out of the “processing” state if the uploader crashes (or the system loses power, etc.) while a report is in that state?

Open in Gerrit

Related details

Attention is currently required from:
  • Justin Cohen
Submit Requirements:
    • requirement satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: crashpad/crashpad
    Gerrit-Branch: main
    Gerrit-Change-Id: Ia6f906b5ea6b50ed449ee841aac06ceaf34df602
    Gerrit-Change-Number: 7815316
    Gerrit-PatchSet: 23
    Gerrit-Owner: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Mark Mentovai <ma...@chromium.org>
    Gerrit-Attention: Justin Cohen <justi...@google.com>
    Gerrit-Comment-Date: Wed, 13 May 2026 19:35:27 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Justin Cohen <justi...@google.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Justin Cohen (Gerrit)

    unread,
    May 13, 2026, 3:43:59 PM (3 days ago) May 13
    to Mark Mentovai, crashpa...@luci-project-accounts.iam.gserviceaccount.com, crashp...@chromium.org
    Attention needed from Mark Mentovai

    Justin Cohen added 1 comment

    Line 60, Patchset 23 (Latest):constexpr char kProcessingDirectory[] = "processing";
    Mark Mentovai . unresolved

    This introduces a brand new state, which would require a very careful review of several interfaces here, but I’m not going to do this because I think that this is actually infeasible.

    How do things migrate out of the “processing” state if the uploader crashes (or the system loses power, etc.) while a report is in that state?

    Justin Cohen

    Thank you -- I appreciate that this would require a detailed review.

    Reports migrate out of the processing state via `CleanDatabase(time_t lockfile_ttl)`

    When `CleanDatabase()` runs, it scans the processing directory. Any report whose file modification time is older than `lockfile_ttl` is recovered by renaming it back to the pending directory. This mirrors the existing timeout-based recovery
    mechanism used for expired lockfiles (client/crash_report_database_generic.cc lines 750–766) and stale new reports (client/crash_report_database_mac.mm lines 733–750.

    In both `handler/prune_crash_reports_thread.cc` and `client/ios_handler/prune_intermediate_dumps_and_crash_reports_thread.cc` lockfile_ttl is set to 3 days.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Mark Mentovai
    Submit Requirements:
    • requirement satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: crashpad/crashpad
    Gerrit-Branch: main
    Gerrit-Change-Id: Ia6f906b5ea6b50ed449ee841aac06ceaf34df602
    Gerrit-Change-Number: 7815316
    Gerrit-PatchSet: 23
    Gerrit-Owner: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Mark Mentovai <ma...@chromium.org>
    Gerrit-Attention: Mark Mentovai <ma...@chromium.org>
    Gerrit-Comment-Date: Wed, 13 May 2026 19:43:56 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mark Mentovai <ma...@chromium.org>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mark Mentovai (Gerrit)

    unread,
    May 13, 2026, 4:29:18 PM (3 days ago) May 13
    to Justin Cohen, crashpa...@luci-project-accounts.iam.gserviceaccount.com, crashp...@chromium.org
    Attention needed from Justin Cohen

    Mark Mentovai added 1 comment

    Line 60, Patchset 23 (Latest):constexpr char kProcessingDirectory[] = "processing";
    Mark Mentovai . unresolved

    This introduces a brand new state, which would require a very careful review of several interfaces here, but I’m not going to do this because I think that this is actually infeasible.

    How do things migrate out of the “processing” state if the uploader crashes (or the system loses power, etc.) while a report is in that state?

    Justin Cohen

    Thank you -- I appreciate that this would require a detailed review.

    Reports migrate out of the processing state via `CleanDatabase(time_t lockfile_ttl)`

    When `CleanDatabase()` runs, it scans the processing directory. Any report whose file modification time is older than `lockfile_ttl` is recovered by renaming it back to the pending directory. This mirrors the existing timeout-based recovery
    mechanism used for expired lockfiles (client/crash_report_database_generic.cc lines 750–766) and stale new reports (client/crash_report_database_mac.mm lines 733–750.

    In both `handler/prune_crash_reports_thread.cc` and `client/ios_handler/prune_intermediate_dumps_and_crash_reports_thread.cc` lockfile_ttl is set to 3 days.

    Mark Mentovai

    Thank you -- I appreciate that this would require a detailed review.

    Reports migrate out of the processing state via `CleanDatabase(time_t lockfile_ttl)`

    When `CleanDatabase()` runs, it scans the processing directory. Any report whose file modification time is older than `lockfile_ttl` is recovered by renaming it back to the pending directory. This mirrors the existing timeout-based recovery
    mechanism used for expired lockfiles (client/crash_report_database_generic.cc lines 750–766) and stale new reports (client/crash_report_database_mac.mm lines 733–750.

    In both `handler/prune_crash_reports_thread.cc` and `client/ios_handler/prune_intermediate_dumps_and_crash_reports_thread.cc` lockfile_ttl is set to 3 days.

    I don’t love it. `flock` gives us immediate recovery and retry.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Justin Cohen
    Submit Requirements:
    • requirement satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: crashpad/crashpad
    Gerrit-Branch: main
    Gerrit-Change-Id: Ia6f906b5ea6b50ed449ee841aac06ceaf34df602
    Gerrit-Change-Number: 7815316
    Gerrit-PatchSet: 23
    Gerrit-Owner: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Mark Mentovai <ma...@chromium.org>
    Gerrit-Attention: Justin Cohen <justi...@google.com>
    Gerrit-Comment-Date: Wed, 13 May 2026 20:29:16 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mark Mentovai <ma...@chromium.org>
    Comment-In-Reply-To: Justin Cohen <justi...@google.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Justin Cohen (Gerrit)

    unread,
    May 14, 2026, 10:03:41 AM (2 days ago) May 14
    to Mark Mentovai, crashpa...@luci-project-accounts.iam.gserviceaccount.com, crashp...@chromium.org
    Attention needed from Mark Mentovai

    Justin Cohen added 1 comment

    Line 60, Patchset 23:constexpr char kProcessingDirectory[] = "processing";
    Mark Mentovai . unresolved

    This introduces a brand new state, which would require a very careful review of several interfaces here, but I’m not going to do this because I think that this is actually infeasible.

    How do things migrate out of the “processing” state if the uploader crashes (or the system loses power, etc.) while a report is in that state?

    Justin Cohen

    Thank you -- I appreciate that this would require a detailed review.

    Reports migrate out of the processing state via `CleanDatabase(time_t lockfile_ttl)`

    When `CleanDatabase()` runs, it scans the processing directory. Any report whose file modification time is older than `lockfile_ttl` is recovered by renaming it back to the pending directory. This mirrors the existing timeout-based recovery
    mechanism used for expired lockfiles (client/crash_report_database_generic.cc lines 750–766) and stale new reports (client/crash_report_database_mac.mm lines 733–750.

    In both `handler/prune_crash_reports_thread.cc` and `client/ios_handler/prune_intermediate_dumps_and_crash_reports_thread.cc` lockfile_ttl is set to 3 days.

    Mark Mentovai

    Thank you -- I appreciate that this would require a detailed review.

    Reports migrate out of the processing state via `CleanDatabase(time_t lockfile_ttl)`

    When `CleanDatabase()` runs, it scans the processing directory. Any report whose file modification time is older than `lockfile_ttl` is recovered by renaming it back to the pending directory. This mirrors the existing timeout-based recovery
    mechanism used for expired lockfiles (client/crash_report_database_generic.cc lines 750–766) and stale new reports (client/crash_report_database_mac.mm lines 733–750.

    In both `handler/prune_crash_reports_thread.cc` and `client/ios_handler/prune_intermediate_dumps_and_crash_reports_thread.cc` lockfile_ttl is set to 3 days.

    I don’t love it. `flock` gives us immediate recovery and retry.

    Justin Cohen

    How about ten minutes? In client/settings.h we set the kUploadReportTimeoutSeconds to 20 seconds iOS and 60 seconds on macOS.

    I can recover any "processing" reports older than 10 minutes back to pending.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Mark Mentovai
    Submit Requirements:
    • requirement satisfiedCode-Owners
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    • requirement is not satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: crashpad/crashpad
    Gerrit-Branch: main
    Gerrit-Change-Id: Ia6f906b5ea6b50ed449ee841aac06ceaf34df602
    Gerrit-Change-Number: 7815316
    Gerrit-PatchSet: 24
    Gerrit-Owner: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Justin Cohen <justi...@google.com>
    Gerrit-Reviewer: Mark Mentovai <ma...@chromium.org>
    Gerrit-Attention: Mark Mentovai <ma...@chromium.org>
    Gerrit-Comment-Date: Thu, 14 May 2026 14:03:38 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy
    Reply all
    Reply to author
    Forward
    0 new messages