Commit-Queue | +1 |
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Code-Review | +1 |
Seems exceptionally reasonable. Thank you. LGTM.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Martin, would you mind reviewing site_engagement_service.cc from WebsiteSettings POV? Do you suggest anything better than the DCHECK approach?
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I added some suggestions about how we can validate various invariants.
It's on my wishlist (and I've filed a bug for this), that HostContentSettingsMap should have more ergonomic APIs to work with origin-only settings, so that one doesn't have to make (and DCHECK) all these assumptions. Unfortunately, we still don't have it - so we have to call the clumsier APIs that work with ContentSettingsPattern and then DCHECK that the patterns are as we expect...
bool PatternMatchesUrlSet(const ContentSettingsPattern& pattern,
"PatternMatches" sounds like it refers to `ContentSettingsPattern::Matches()` which however checks {scheme, host, subdomain, port, path (file urls only)} for exact or wildcard match.
This only matches the scheme. How about "PatternSchemeMatchesUrlSet" or so?
We're also not expecting wildcard schemes in the pattern - DCHECK that the pattern `MatchesSingleOrigin()`?
ContentSettingsForOneType sites =
Why the rename?
Site engagement really stores origins, and this seems useful for the readability of the code below.
(Especially since in other places in Chrome, we use the term "site" to refer to something like eTLD+1. Though of course, having this entire concept named site engagement doesn't help with clarity.)
// provider other than PrefProvider writes to this database.
We can also DCHECK this.
`ContentSettingsForOneType` is just a vector of `ContentSettingsPatternSource`, where the `source` attribute tells you which provider the setting came from.
And I think technically there is one entry from the default provdier.
So we can do `DHECK(std::find_if_not(sites.begin(), sites.end(), [](const auto& entry) { return entry.source == kPrefProvider || entry.source == kDefaultProvider; }) == sites.end()`.
absl::flat_hash_set<std::string> patterns_to_dedupe_;
style nit: not an attribute, so no need to end it with `_`.
std::string pattern(site.primary_pattern.ToString());
Since we only care about primary, let's DCHECK that secondary is not used (i.e. it's wildcard)?
if (DCHECK_IS_ON() || site.incognito) {
auto [_, inserted] = patterns_to_dedupe_.insert(pattern);
if (!inserted) {
DCHECK(map->IsOffTheRecord());
continue;
}
}
Hmm, I'm not sure I understand what this is doing.
In production, when DCHECK is off, you'll only run this code if the exception comes from Incognito. But that also means you'll never have collision in the set, because you can't have duplicates between exceptions that are from Incognito. Only between regular and Incognito ones. So you'll never call `continue`, and you'll fall through to the rest of the code below, i.e. you'll process all exceptions. Thus never actually doing any de-duplication?
Because `patterns_to_dedupe_` is only here to check for the duplicates by failing to insert, right? The actual de-duplication is decided by continuing the rest of the operations below or not.
And conversely, when DCHECK() is on, we change how we iterate through the exceptions, which should not happen. DCHECK should just check an invariant, not affect the control flow.
If you implement the DCHECK that I mentioned in another comment, to check that exceptions only come from the default provider (which only has one - the global wildcard) and pref provider (which we know can only have duplicates between regular and Incognito), perhaps that's enough of validation and we don't need this logic here.
Or if we want to keep this validation, I think we'll need to split the logic for what to DCHECK and when to `continue`?
GURL url(std::move(pattern));
Note that pattern->String->URL is a tricky operation.
The pattern ["https", "example.com", port wildcard] will translate to string "https://example.com" which will translate to the URL "https://example.com:443", because the GURL() constructor will automatically add the default port.
However, in this case, it's safe, since we're not expecting port wildcards. Hence it may be useful to `DCHECK(pattern.MatchesSingleOrigin())` as I mentioned earlier.
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I added some suggestions about how we can validate various invariants.
It's on my wishlist (and I've filed a bug for this), that HostContentSettingsMap should have more ergonomic APIs to work with origin-only settings, so that one doesn't have to make (and DCHECK) all these assumptions. Unfortunately, we still don't have it - so we have to call the clumsier APIs that work with ContentSettingsPattern and then DCHECK that the patterns are as we expect...
Thank you for the review Martin. Very much appreciated.
bool PatternMatchesUrlSet(const ContentSettingsPattern& pattern,
"PatternMatches" sounds like it refers to `ContentSettingsPattern::Matches()` which however checks {scheme, host, subdomain, port, path (file urls only)} for exact or wildcard match.
This only matches the scheme. How about "PatternSchemeMatchesUrlSet" or so?
We're also not expecting wildcard schemes in the pattern - DCHECK that the pattern `MatchesSingleOrigin()`?
Done
Why the rename?
Site engagement really stores origins, and this seems useful for the readability of the code below.
(Especially since in other places in Chrome, we use the term "site" to refer to something like eTLD+1. Though of course, having this entire concept named site engagement doesn't help with clarity.)
I didn't mean it as a rename since the code came from `GetEngagementOriginsFromContentSettings` which uses `site` to refer to these objects, but I agree it isn't correct, good flag.
I renamed to `pattern_source` and `pattern_sources`, since I feel like `origin` is not correct to refer to the pattern source either.
// provider other than PrefProvider writes to this database.
We can also DCHECK this.
`ContentSettingsForOneType` is just a vector of `ContentSettingsPatternSource`, where the `source` attribute tells you which provider the setting came from.
And I think technically there is one entry from the default provdier.
So we can do `DHECK(std::find_if_not(sites.begin(), sites.end(), [](const auto& entry) { return entry.source == kPrefProvider || entry.source == kDefaultProvider; }) == sites.end()`.
Beautiful 🙏. Done.
style nit: not an attribute, so no need to end it with `_`.
Done
Since we only care about primary, let's DCHECK that secondary is not used (i.e. it's wildcard)?
Done
if (DCHECK_IS_ON() || site.incognito) {
auto [_, inserted] = patterns_to_dedupe_.insert(pattern);
if (!inserted) {
DCHECK(map->IsOffTheRecord());
continue;
}
}
Hmm, I'm not sure I understand what this is doing.
In production, when DCHECK is off, you'll only run this code if the exception comes from Incognito. But that also means you'll never have collision in the set, because you can't have duplicates between exceptions that are from Incognito. Only between regular and Incognito ones. So you'll never call `continue`, and you'll fall through to the rest of the code below, i.e. you'll process all exceptions. Thus never actually doing any de-duplication?
Because `patterns_to_dedupe_` is only here to check for the duplicates by failing to insert, right? The actual de-duplication is decided by continuing the rest of the operations below or not.
And conversely, when DCHECK() is on, we change how we iterate through the exceptions, which should not happen. DCHECK should just check an invariant, not affect the control flow.
If you implement the DCHECK that I mentioned in another comment, to check that exceptions only come from the default provider (which only has one - the global wildcard) and pref provider (which we know can only have duplicates between regular and Incognito), perhaps that's enough of validation and we don't need this logic here.
Or if we want to keep this validation, I think we'll need to split the logic for what to DCHECK and when to `continue`?
Oops you're totally right this is a no-op in production code. I was intending on checking the map for non-incognito rules after this check but didn't check that code in :P
In any case, I realized this relies on the implementation detail that incognito patterns are returned first from `GetSettingsForOneType` and that's probably not great, so in the current PS I just dedupe after the fact for incognito.
The DCHECK here is replaced with a "no duplicates" DCHECK one-liner at the end.
GURL url(std::move(pattern));
Note that pattern->String->URL is a tricky operation.
The pattern ["https", "example.com", port wildcard] will translate to string "https://example.com" which will translate to the URL "https://example.com:443", because the GURL() constructor will automatically add the default port.
However, in this case, it's safe, since we're not expecting port wildcards. Hence it may be useful to `DCHECK(pattern.MatchesSingleOrigin())` as I mentioned earlier.
Yes, I agree this is probably not great. However it is the current behavior which I didn't want to change. Added the DCHECK.
Is there a recommended approach from content settings code for this? I am happy to add a TODO to use e.g. ToRepresentativeUrl().
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Martin: looks like the DCHECK is failing in a browser test that overrides the default provider. I'm not familiar enough with this code to know if that's a test-only issue or a problem with the actual code assumptions.
If it's a problem with the code's assumptions I guess I could do the deduping unconditionally :/
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Martin: looks like the DCHECK is failing in a browser test that overrides the default provider. I'm not familiar enough with this code to know if that's a test-only issue or a problem with the actual code assumptions.
If it's a problem with the code's assumptions I guess I could do the deduping unconditionally :/
Since it's specifically tests that are fiddling with content settings providers failing, I'm sure it's a test issue. Otherwise this would trip up a lot more browsertests.
Though I admit I don't immediately see it. Those tests are overriding the supervised provider, and it seems they only do so for camera & mic. It doesn't look like the MockProvider has any values to return for site engagement, but I guess it does?
I can help debug if you show me what the return value of `GetContentSettingsForOneType()` is in these browsertests (or I can patch in the CL when I have time later).
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Martin ŠrámekMartin: looks like the DCHECK is failing in a browser test that overrides the default provider. I'm not familiar enough with this code to know if that's a test-only issue or a problem with the actual code assumptions.
If it's a problem with the code's assumptions I guess I could do the deduping unconditionally :/
Since it's specifically tests that are fiddling with content settings providers failing, I'm sure it's a test issue. Otherwise this would trip up a lot more browsertests.
Though I admit I don't immediately see it. Those tests are overriding the supervised provider, and it seems they only do so for camera & mic. It doesn't look like the MockProvider has any values to return for site engagement, but I guess it does?
I can help debug if you show me what the return value of `GetContentSettingsForOneType()` is in these browsertests (or I can patch in the CL when I have time later).
It contains a single entry:
```
[(https://127.0.0.1:43141, *) source=3 value={
"lastEngagementTime": 1.3402030661875894e+16,
"lastShortcutLaunchTime": 0.0,
"pointsAddedToday": 3.0,
"rawScore": 3.0
}
]
```
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |