grt@
Could you please review this?
I believe this is in line with [your recommendation](https://chromium-review.googlesource.com/c/chromium/src/+/6865893/comments/5aa4564c_08ae4256).
Thank you.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Commit-Queue | +1 |
Hi joenotcharles@.
I'm currently facing an issue with slow database checkpointing in the HTTP cache that uses the SQL disk cache. To address this, I've created this CL that utilizes MatchingScenarioObserver, which I believe you implemented.
I was hoping you could take a look at my approach. Do you think this is an appropriate use of MatchingScenarioObserver for this purpose? Also, I would be grateful for any advice on potential pitfalls or things I should be particularly careful about with this implementation.
Thank you for your time and expertise.
observer_list->AddMatchingObserver(this);
can you use a `base::ScopedObservation` for this so that the observer is removed automatically in `~StoragePartitionImpl`?
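For illustration, the RAII pattern that `base::ScopedObservation` provides can be modeled in a self-contained sketch. The `Source`/`Observer` types below are simplified stand-ins, not the real Chromium classes:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified stand-ins for an observer list and its observer type.
struct Observer {};

struct Source {
  std::vector<Observer*> observers;
  void AddObserver(Observer* o) { observers.push_back(o); }
  void RemoveObserver(Observer* o) {
    observers.erase(std::remove(observers.begin(), observers.end(), o),
                    observers.end());
  }
};

// Models base::ScopedObservation: registers in Observe() and un-registers in
// the destructor, so the owning object's destructor cleans up automatically
// and no explicit RemoveObserver call is needed in ~StoragePartitionImpl.
class ScopedObservation {
 public:
  explicit ScopedObservation(Observer* observer) : observer_(observer) {}
  ~ScopedObservation() {
    if (source_) source_->RemoveObserver(observer_);
  }
  void Observe(Source* source) {
    source_ = source;
    source_->AddObserver(observer_);
  }

 private:
  Observer* observer_;
  Source* source_ = nullptr;
};
```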
NET_EXPORT extern const base::FeatureParam<int>
kSqlDiskCacheForceCheckpointThreshold;
// If the number of pages recorded in the WAL file of the SQL disk cache's DB
// exceeds this value and the browser is idle, a checkpoint is executed.
NET_EXPORT extern const base::FeatureParam<int>
kSqlDiskCacheIdleCheckpointThreshold;
since these are accessed on every commit to the db, shall we use `BASE_DECLARE_FEATURE_PARAM` and `BASE_FEATURE_PARAM` for them (see base/feature_list.h)?
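For context, the cached-accessor idea behind `BASE_FEATURE_PARAM` can be sketched in a self-contained model (this is not the actual macro expansion; the lookup function and the 1000 default are illustrative stand-ins):

```cpp
#include <cassert>

// Stand-in for an expensive field-trial param lookup; counts invocations so
// the caching behavior is observable.
static int g_lookup_count = 0;
static int ExpensiveParamLookup() {
  ++g_lookup_count;
  return 1000;  // Hypothetical default for the idle-checkpoint threshold.
}

// Models the accessor the macro generates: the parse/lookup runs once, on
// first use, and every later call (e.g. on each DB commit) returns the
// cached value instead of re-reading the param string.
int GetIdleCheckpointThreshold() {
  static const int cached = ExpensiveParamLookup();
  return cached;
}
```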
&kDiskCacheBackendExperiment, "SqlDiskCacheIdleCheckpointThreshold", 10000};
this is 10x the SQLite default, yes? is there an advantage to letting the WAL grow so much? this will mean that each checkpoint is more expensive, no?
.set_wal_commit_callback(base::BindRepeating(
am i correct in thinking that this whole class is only used if the client is in the DiskCacheBackendExperiment?
const bool force_checkpoint =
one side effect here is that if the system never reaches the idle state, we end up with a jankier experience than before this code was introduced, since we will checkpoint doubly-sized WALs synchronously.
📍 Job win-10-perf/system_health.common_desktop complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/15d14139910000
📍 Job win-10-perf/system_health.common_desktop complete.
See results at: https://pinpoint-dot-chromeperf.appspot.com/job/1673faa4510000
observer_list->AddMatchingObserver(this);
can you use a `base::ScopedObservation` for this so that the observer is removed automatically in `~StoragePartitionImpl`?
Done
NET_EXPORT extern const base::FeatureParam<int>
kSqlDiskCacheForceCheckpointThreshold;
// If the number of pages recorded in the WAL file of the SQL disk cache's DB
// exceeds this value and the browser is idle, a checkpoint is executed.
NET_EXPORT extern const base::FeatureParam<int>
kSqlDiskCacheIdleCheckpointThreshold;
since these are accessed on every commit to the db, shall we use `BASE_DECLARE_FEATURE_PARAM` and `BASE_FEATURE_PARAM` for them (see base/feature_list.h)?
Done
&kDiskCacheBackendExperiment, "SqlDiskCacheIdleCheckpointThreshold", 10000};
this is 10x the SQLite default, yes? is there an advantage to letting the WAL grow so much? this will mean that each checkpoint is more expensive, no?
Ah, I think this value should be 1000, the same as SQLite's default. I've fixed it.
.set_wal_commit_callback(base::BindRepeating(
am i correct in thinking that this whole class is only used if the client is in the DiskCacheBackendExperiment?
Yes.
This is used only when the DiskCacheBackendExperiment feature is enabled with the backend=sql param.
const bool force_checkpoint =
one side effect here is that if the system never reaches the idle state, we end up with a jankier experience than before this code was introduced, since we will checkpoint doubly-sized WALs synchronously.
I've set the default value for SqlDiskCacheForceCheckpointThreshold to be 20 times the default value in SQLite. This is because, during my testing on an Android device, I observed that the number of pages could accumulate to around 10,000 before the system became idle.
I plan to conduct further experiments using various thresholds and monitor the UMA data to determine a more appropriate, smaller threshold.
Code-Review | +1 |
Overall seems good to me. Consider checking with @oliv...@chromium.org, who may have some intuition about thresholds and such.
const bool force_checkpoint =
Tsuyoshi Horo: one side effect here is that if the system never reaches the idle state, we end up with a jankier experience than before this code was introduced, since we will checkpoint doubly-sized WALs synchronously.
I've set the default value for SqlDiskCacheForceCheckpointThreshold to be 20 times the default value in SQLite. This is because, during my testing on an Android device, I observed that the number of pages could accumulate to around 10,000 before the system became idle.
I plan to conduct further experiments using various thresholds and monitor the UMA data to determine a more appropriate, smaller threshold.
Do you think that we're reaching 10,000 pages in the test because it's a particularly heavy-weight test, or do you think that'll be the norm?
Perhaps we should consider lowering the limit for idle detection since a given checkpoint will be cheaper if fewer pages have accumulated.
It seems to me that these are exactly the same tradeoffs that V8 must make for timing GCs. Is there anything we can learn from the policies used there?
// Notifies the network context that the browser is idle. This can be used
@joenot...@google.com: based on [this comment](https://source.chromium.org/chromium/chromium/src/+/main:components/performance_manager/scenario_api/performance_scenario_memory.cc;drc=3f8932533ccf9426786d1e0416d2cad13f1c991d;l=66), i have the impression that one day we should be able to directly observe the idle scenario in the network service. how far off is that today?
@ho...@chromium.org: if the above is correct, would that let us move the observation out of `StoragePartitionImpl` and into `NetworkContext` or directly into `SqlPersistentStoreImpl` (or elsewhere, depending on threading requirements and where we want to add the dependency on the performance manager's scenario API)? if true, let's leave a TODO(crbug.com/365586676) to clean this up once the scenario API is available in the network service.
const bool force_checkpoint =
Tsuyoshi Horo: one side effect here is that if the system never reaches the idle state, we end up with a jankier experience than before this code was introduced, since we will checkpoint doubly-sized WALs synchronously.
Greg Thompson: I've set the default value for SqlDiskCacheForceCheckpointThreshold to be 20 times the default value in SQLite. This is because, during my testing on an Android device, I observed that the number of pages could accumulate to around 10,000 before the system became idle.
I plan to conduct further experiments using various thresholds and monitor the UMA data to determine a more appropriate, smaller threshold.
Do you think that we're reaching 10,000 pages in the test because it's a particularly heavy-weight test, or do you think that'll be the norm?
Perhaps we should consider lowering the limit for idle detection since a given checkpoint will be cheaper if fewer pages have accumulated.
It seems to me that these are exactly the same tradeoffs that V8 must make for timing GCs. Is there anything we can learn from the policies used there?
I confirmed that after navigating through about three consecutive news site pages, a checkpoint of about 10,000 pages was executed when the browser became idle. Under normal user behavior, I anticipate an idle state would occur after each page transition, so a threshold of a few thousand pages might be sufficient. I plan to investigate the actual UMA data.
Lowering the threshold for the idle state could be a good idea. However, it's known that in certain environments, the disk synchronization process that [occurs](https://source.chromium.org/chromium/chromium/src/+/main:third_party/sqlite/src/src/wal.c;l=2260;drc=af3f9228a9de368fed823968f2dec49ae5558e12) during a checkpoint can be extremely slow (specifically [FlushFileBuffers](https://source.chromium.org/chromium/chromium/src/+/main:third_party/sqlite/src/src/os_win.c;l=3338;drc=d7cb1ea7ba2c0d168762db7a6814e34ac6524cf0) on Windows). In those situations, it might be better for overall performance to accumulate as many pages as possible before executing a checkpoint. I will examine the UMA data closely regarding this matter as well.
// Notifies the network context that the browser is idle. This can be used
@joenot...@google.com: based on [this comment](https://source.chromium.org/chromium/chromium/src/+/main:components/performance_manager/scenario_api/performance_scenario_memory.cc;drc=3f8932533ccf9426786d1e0416d2cad13f1c991d;l=66), i have the impression that one day we should be able to directly observe the idle scenario in the network service. how far off is that today?
@ho...@chromium.org: if the above is correct, would that let us move the observation out of `StoragePartitionImpl` and into `NetworkContext` or directly into `SqlPersistentStoreImpl` (or elsewhere, depending on threading requirements and where we want to add the dependency on the performance manager's scenario API)? if true, let's leave a TODO(crbug.com/365586676) to clean this up once the scenario API is available in the network service.
The browser's state (scenario) is shared between processes via shared memory. My understanding is that, at the time this comment was [written](https://chromium-review.googlesource.com/c/chromium/src/+/5846016) in September 2024, the state was only shared between the browser and renderer processes, and the comment noted that it should be made visible to utility processes as well. However, since [CL 6018369](https://chromium-review.googlesource.com/c/chromium/src/+/6018369) landed in November 2024, this shared memory has been made readable by all child processes, including the network service process. Therefore, I believe this comment is no longer accurate.
According to [this design document](https://docs.google.com/document/d/1Q2mK4kBKAUI5AgOUSANM6WQDgKEoLvmrVB2ySROqzCQ/edit?tab=t.0#heading=h.bcnk63ipv3e8), sending an IPC from the browser process to the renderer process to signal a scenario change was not part of the original plan, and I believe it has not been implemented to date.
Is my understanding correct?
nidhijaju@
Could you please review histograms.xml?
Thank you.
Code-Review | +1 |
histograms lgtm
<variants name="SqlPersistentStoreCheckpointType">
It's not very clear what these variants mean. Would it be worth adding a summary that we use in the histogram summary?
It's not very clear what these variants mean. Would it be worth adding a summary that we use in the histogram summary?
Done
Code-Review | +1 |
Code-Review | +1 |
// the network context is available, call NotifyBrowserIdle.
Nit: at first I thought this could miss checkpoints, if the browser becomes idle just before the `network_context` is assigned. In that case as long as the browser remains idle, the observer wouldn't be notified again so OnNetworkIdle would never be called.
But I see now that the idle state is also tested whenever new data is committed, so OnCommitCallback should eventually run the checkpoint if that happens. This could use a comment explaining that.
// Notifies the network context that the browser is idle. This can be used
Tsuyoshi Horo: @joenot...@google.com: based on [this comment](https://source.chromium.org/chromium/chromium/src/+/main:components/performance_manager/scenario_api/performance_scenario_memory.cc;drc=3f8932533ccf9426786d1e0416d2cad13f1c991d;l=66), i have the impression that one day we should be able to directly observe the idle scenario in the network service. how far off is that today?
@ho...@chromium.org: if the above is correct, would that let us move the observation out of `StoragePartitionImpl` and into `NetworkContext` or directly into `SqlPersistentStoreImpl` (or elsewhere, depending on threading requirements and where we want to add the dependency on the performance manager's scenario API)? if true, let's leave a TODO(crbug.com/365586676) to clean this up once the scenario API is available in the network service.
The browser's state (scenario) is shared between processes via shared memory. My understanding is that, at the time this comment was [written](https://chromium-review.googlesource.com/c/chromium/src/+/5846016) in September 2024, the state was only shared between the browser and renderer processes, and the comment noted that it should be made visible to utility processes as well. However, since [CL 6018369](https://chromium-review.googlesource.com/c/chromium/src/+/6018369) landed in November 2024, this shared memory has been made readable by all child processes, including the network service process. Therefore, I believe this comment is no longer accurate.
According to [this design document](https://docs.google.com/document/d/1Q2mK4kBKAUI5AgOUSANM6WQDgKEoLvmrVB2ySROqzCQ/edit?tab=t.0#heading=h.bcnk63ipv3e8), sending an IPC from the browser process to the renderer process to signal a scenario change was not part of the original plan, and I believe it has not been implemented to date.
Is my understanding correct?
Yes. The current state can be checked from the network process, but the PerformanceScenarioObserver is only notified on changes in the browser process.
One way to do everything in the network process right now would be to add a timer that polls to see if the state's become idle every few seconds as long as `wal_pages_` is above the threshold. This shouldn't add much overhead (eg. extra wakeups) since the timer would only fire repeatedly when the browser's already non-idle.
However, with my Mojo reviewer hat on, this IPC LGTM as is.
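The polling alternative above could be sketched as a pure decision function evaluated on each timer tick (the threshold value and all names here are hypothetical, not from the CL):

```cpp
#include <cassert>

// Hypothetical value; the real one would come from a feature param.
constexpr int kIdleCheckpointThreshold = 1000;

enum class PollAction { kStopTimer, kKeepPolling, kCheckpoint };

// Models the suggested policy: the repeating timer runs only while enough
// WAL pages have accumulated to make a checkpoint worthwhile, checkpoints
// as soon as the shared-memory scenario state reads as idle, and otherwise
// keeps polling. The timer never wakes the process when there is no work.
PollAction OnPollTick(int wal_pages, bool scenario_is_idle) {
  if (wal_pages <= kIdleCheckpointThreshold) return PollAction::kStopTimer;
  if (scenario_is_idle) return PollAction::kCheckpoint;
  return PollAction::kKeepPolling;
}
```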
// the network context is available, call NotifyBrowserIdle.
Nit: at first I thought this could miss checkpoints, if the browser becomes idle just before the `network_context` is assigned. In that case as long as the browser remains idle, the observer wouldn't be notified again so OnNetworkIdle would never be called.
But I see now that the idle state is also tested whenever new data is committed, so OnCommitCallback should eventually run the checkpoint if that happens. This could use a comment explaining that.
18 is the latest approved patch-set.
The change was submitted with unreviewed changes in the following files:
```
The name of the file: content/browser/storage_partition_impl.cc
Insertions: 6, Deletions: 2.
@@ -3869,8 +3869,12 @@
void StoragePartitionImpl::OnScenarioMatchChanged(
performance_scenarios::ScenarioScope scope,
bool matches_pattern) {
- // If the scenario matches performance_scenarios::kDefaultIdleScenarios and
- // the network context is available, call NotifyBrowserIdle.
+ // If the scenario matches `performance_scenarios::kDefaultIdleScenarios` and
+ // the network context is available, call `NotifyBrowserIdle()`. It's
+ // possible to miss an idle notification if the browser becomes idle before
+ // the network context is initialized. This is not a problem because the idle
+ // state is also checked when data is committed to the cache, so a checkpoint
+ // will eventually be triggered.
if (matches_pattern && network_context_owner_->network_context.get()) {
network_context_owner_->network_context->NotifyBrowserIdle();
}
```
net: Defer disk cache checkpointing until idle
This CL modifies the SQL-based disk cache backend to defer WAL
checkpointing, primarily until the browser is idle. This helps to avoid
blocking the main database sequence during active periods, improving
performance.
A `PerformanceScenarioObserver` is added to `StoragePartitionImpl` to
detect when the browser becomes idle. When idleness is detected, it
notifies the `NetworkContext` via a new Mojo method
`NotifyBrowserIdle()`.
Checkpointing is now triggered in three scenarios:
1. Immediately after a commit if the WAL file size exceeds a high-water
mark (`kSqlDiskCacheForceCheckpointThreshold`).
2. Immediately after a commit if the browser is already idle and the WAL
file size exceeds a lower threshold
(`kSqlDiskCacheIdleCheckpointThreshold`).
3. When the browser transitions to an idle state, if the WAL file size
is above the idle threshold. This is triggered by
`NotifyBrowserIdle()`.
This logic is controlled by two new feature parameters for the thresholds.
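The three triggers above can be modeled as two small decision functions. This is a rough sketch, not the CL's code: the threshold constants are hypothetical stand-ins for the two feature params, and counts are in WAL pages:

```cpp
#include <cassert>

// Hypothetical defaults standing in for the two feature params.
constexpr int kForceCheckpointThreshold = 20000;  // high-water mark
constexpr int kIdleCheckpointThreshold = 1000;    // idle threshold

// Applied after each commit (triggers 1 and 2 from the description above).
bool ShouldCheckpointAfterCommit(int wal_pages, bool browser_idle) {
  if (wal_pages >= kForceCheckpointThreshold) return true;  // trigger 1
  if (browser_idle && wal_pages >= kIdleCheckpointThreshold) {
    return true;  // trigger 2
  }
  return false;
}

// Applied when NotifyBrowserIdle() arrives (trigger 3).
bool ShouldCheckpointOnIdle(int wal_pages) {
  return wal_pages >= kIdleCheckpointThreshold;
}
```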