Attention is currently required from: Jeremy Roman.
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Liviu Tinta.
9 comments:
Patchset:
+domenic for the discussion around decoding search parameters
File net/base/url_search_params.h:
Patch Set #8, Line 28: void sort();
here and elsewhere: Google style requires that this be called "Sort" with an uppercase S (etc)
https://google.github.io/styleguide/cppguide.html#Function_Names
Patch Set #8, Line 34: const std::vector<std::pair<base::StringPiece, std::string>>& Params() const;
This, on the other hand, can be an inline accessor and so can be named `params`.
https://google.github.io/styleguide/cppguide.html#Function_Names
Patch Set #8, Line 39: std::vector<std::pair<base::StringPiece, std::string>> params_;
I think it's unintuitive that these string pieces point into the original URL, and that's a potential lifetime issue. I'd just use std::string for the keys here; in practice a lot of query parameters will fit inside the small string optimization anyhow.
File net/base/url_search_params.cc:
Patch Set #8, Line 23: params_.emplace_back(it.GetKey(), it.GetUnescapedValue());
The URL spec runs the application/x-www-form-urlencoded parser (https://url.spec.whatwg.org/#concept-urlencoded-parser) on the input, which replaces 0x2B (+) with 0x20 (space) and then does percent-decoding and UTF-8 decoding on both the key and the value.
This is ultimately something that should be figured out on the spec level (I think Domenic is planning on doing that), but my gut is that if NVS is specified, we should be able to assume that is an acceptable way of interpreting the query string. (To do otherwise might be possible but is finicky.)
i.e., A=1&%41=2 becomes «("A", "1"), ("A", "2")»
If so, this should become something like:
```
namespace {
// The notion of unescaping used in the application/x-www-form-urlencoded parser.
// https://url.spec.whatwg.org/#concept-urlencoded-parser
std::string UnescapeUrlencoded(base::StringPiece input) {
std::string result = input;
// Replace any 0x2B (+) with 0x20 (SP).
for (char& c : result) {
if (c == '+')
c = ' ';
}
// Run UTF-8 decoding without BOM on the percent-decoding.
url::RawCanonOutputT<char> canon_output;
url::DecodeURLEscapeSequences(
input_without_plus.data(),
input_without_plus.size(),
url::DecodeURLMode::kUTF8,
&canon_output);
result.assign(canon_output.data(), canon_output.length());
return result;
}
} // namespace
params_.emplace_back(UnescapeUrlencoded(it.GetKey()),
UnescapeUrlencoded(it.GetValue()));
```
Annoyingly, we seem to have different unescaping functions in base/, net/ and url/, all with different quirks. url::DecodeURLEscapeSequences seems most correct on some inspection, except that it doesn't replace +.
If so, some unit tests for these edge cases wouldn't hurt.
Patch Set #8, Line 44: params_.erase(
`base::EraseIf` (from base/containers/cxx20_erase_map.h) is handy here.
```
base::EraseIf(params_, [&](const auto& pair) { return names.contains(pair.first); });
```
File net/http/http_no_vary_search_data.h:
Patch Set #8, Line 43: const HttpNoVarySearchData& no_vary_search);
nit: Might as well just make this a member function, rather than static?
nit: make it clearer what the sense of the result is, either with a comment or with a name that implies it (UrlsAreEquivalent or similar)?
nit: elsewhere you're using "Url" in names instead of "URL" for consistency with Google style; do so here too?
File net/http/http_no_vary_search_data.cc:
Patch Set #8, Line 44: GURL request_gurl(request_url);
nit: "url" vs "gurl" doesn't clearly distinguish how these are different, and in fact they aren't until the call to ReplaceComponents. We could have fresh variables for the "without parameters" versions, but it might be more concise to just do it in the if statement itself.
```
// Check URLs without query and reference (fragment) for equality first.
GURL::Replacements replacements;
replacements.ClearRef();
replacements.ClearQuery();
if (request_url.ReplaceComponents(replacements) !=
cached_url.ReplaceComponents(replacements)) {
return false;
}
```
Patch Set #8, Line 64: std::vector<base::StringPiece> keys_to_remove;
This seems overcomplicated; we could just have UrlSearchParams have a "DeleteAllExcept" function which does this more efficiently by having the same implementation as DeleteAllWithNames except with a single ! before the call to contains.
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Jeremy Roman.
8 comments:
File net/base/url_search_params.h:
Patch Set #8, Line 28: void sort();
here and elsewhere: Google style requires that this be called "Sort" with an uppercase S (etc) […]
Done
Patch Set #8, Line 34: const std::vector<std::pair<base::StringPiece, std::string>>& Params() const;
This, on the other hand, can be an inline accessor and so can be named `params`. […]
Done
Patch Set #8, Line 39: std::vector<std::pair<base::StringPiece, std::string>> params_;
I think it's unintuitive that these string pieces point into the original URL, and that's a potentia […]
Done
File net/base/url_search_params.cc:
Patch Set #8, Line 23: params_.emplace_back(it.GetKey(), it.GetUnescapedValue());
The URL spec runs the application/x-www-form-urlencoded parser (https://url.spec.whatwg. […]
GURL already is UTF8 and percent encoded.
Seems to me that, with GURL already in UTF8 and percent encoded, using the same approach as for `QueryIterator::GetUnescapedValue()` to unescape keys should be enough. I've added `QueryIterator::GetUnescapedKey()` which has similar implementation to `QueryIterator::GetUnescapedValue()`. This implementation replaces + with space as well. I've also added tests to cover the cases you mentioned above (A=1&%41=2).
Patch Set #8, Line 44: params_.erase(
`base::EraseIf` (from base/containers/cxx20_erase_map.h) is handy here. […]
Done
File net/http/http_no_vary_search_data.h:
Patch Set #8, Line 43: const HttpNoVarySearchData& no_vary_search);
nit: Might as well just make this a member function, rather than static? […]
Done
File net/http/http_no_vary_search_data.cc:
Patch Set #8, Line 44: GURL request_gurl(request_url);
nit: "url" vs "gurl" doesn't clearly distinguish how these are different, and in fact they aren't un […]
Done
Patch Set #8, Line 64: std::vector<base::StringPiece> keys_to_remove;
This seems overcomplicated; we could just have UrlSearchParams have a "DeleteAllExcept" function whi […]
Done
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Liviu Tinta.
1 comment:
File net/base/url_search_params.cc:
Patch Set #8, Line 23: params_.emplace_back(it.GetKey(), it.GetUnescapedValue());
GURL already is UTF8 and percent encoded. […]
It's the percent _decoding_ that I have questions about. GURL gives valid ASCII, but that doesn't mean that the percent-decoding a substring of it is valid ASCII or even valid UTF-8.
For example, the string %C3%A9=foo is valid: after percent-decoding and UTF-8 decoding it gives the key-value pair ("é", "foo"). But the string %C3=foo has an incomplete UTF-8 sequence only after percent-decoding -- even though the underlying encoded URL component is valid ASCII. It should be decoded to ("\ufffd", "foo"), using a UTF-8 replacement character -- at least if we're following the application/x-www-form-urlencoded parser.
By inspecting the code, this does not seem to be base::UnescapeURLComponent does.
(I think the URL is always valid ASCII; if this weren't the case we'd also have to look at UTF-8 sequences which are only partially percent-encoded. But I think that cannot happen because every UTF-8 sequence is either an ASCII character or consists only of non-ASCII characters.)
There's also this long list of things which are never unescaped by base::UnescapeURLComponent, presumably because when displayed in the UI they are visually confusing. https://source.chromium.org/chromium/chromium/src/+/main:base/strings/escape.cc;l=280-383;drc=7473aada23e43186bbbd4491f8f7f5630e34e109
There's no counterpart to this in the application/x-www-form-urlencoded spec.
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Jeremy Roman.
1 comment:
File net/base/url_search_params.cc:
Patch Set #8, Line 23: params_.emplace_back(it.GetKey(), it.GetUnescapedValue());
It's the percent _decoding_ that I have questions about. […]
Done
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Liviu Tinta.
Patch set 14:Code-Review +1
6 comments:
File net/base/url_search_params.cc:
Patch Set #14, Line 46: // E.g. a space character might be encoded as '+' or as "%20". A digit might
nit: digit -> character (or possibly "code point")?
In the context of text encoding, I would have expected "digit" to refer to ASCII '0' to '9' (or possibly the extended Unicode digit class).
File net/base/url_search_params_unittest.cc:
nit: maybe worth an invalid UTF-8 sequence to check the U+FFFD REPLACEMENT CHARACTER behavior, even though it's incredibly silly and niche?
File net/http/http_no_vary_search_data.cc:
Patch Set #14, Line 44: request_url.ReplaceComponents(replacements);
Delete this and the next line. This operation doesn't do anything in-place but simply returns a value which is discarded here and on the next line. (The if statement uses them correctly already.)
Patch Set #14, Line 47: cached_url.ReplaceComponents(replacements))
nit: braces {} are required since the if statement as a whole spans more than two lines
https://google.github.io/styleguide/cppguide.html#Conditionals
Patch Set #14, Line 57: no_vary_params().end());
This is exactly a copy of no_vary_params(). We can just pass that in without copying.
```
request_search_params.DeleteAllWithNames(no_vary_params());
cached_search_params.DeleteAllWithNames(no_vary_params());
```
ditto below
File net/http/http_no_vary_search_data_unittest.cc:
Patch Set #14, Line 642: "No-Vary-Search: key-order\r\n\r\n",
Could we have a test case checking that we did use a stable sort (that preserves relative order of params with the same key)? e.g., ?a=1&a=2&b=3 vs ?a=2&b=3&a=1
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola.
6 comments:
File net/base/url_search_params.cc:
Patch Set #14, Line 46: // E.g. a space character might be encoded as '+' or as "%20". A digit might
nit: digit -> character (or possibly "code point")? […]
Done
File net/base/url_search_params_unittest.cc:
nit: maybe worth an invalid UTF-8 sequence to check the U+FFFD REPLACEMENT CHARACTER behavior, even […]
Done
File net/http/http_no_vary_search_data.cc:
Patch Set #14, Line 44: request_url.ReplaceComponents(replacements);
Delete this and the next line. […]
Done
Patch Set #14, Line 47: cached_url.ReplaceComponents(replacements))
nit: braces {} are required since the if statement as a whole spans more than two lines […]
Done
Patch Set #14, Line 57: no_vary_params().end());
This is exactly a copy of no_vary_params(). We can just pass that in without copying. […]
Done
File net/http/http_no_vary_search_data_unittest.cc:
Patch Set #14, Line 642: "No-Vary-Search: key-order\r\n\r\n",
Could we have a test case checking that we did use a stable sort (that preserves relative order of p […]
Done
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Adam Rice, Domenic Denicola.
1 comment:
Patchset:
Adam, ptal.
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Liviu Tinta.
16 comments:
File net/base/url_search_params.h:
Patch Set #17, Line 18: // Class that exposes functionality similar to blink::UrlSearchParams for
I would prefer to explain what this class does independently without making reference to Blink at all.
Patch Set #17, Line 24: // Runs a stable sort by key of all of the query search params.
Please add copy constructor and assignment operators (with = delete if you don't need this class to be copyable).
Patch Set #17, Line 33: void DeleteAll();
DeleteAll() doesn't seem to be used outside tests. Can you remove it?
File net/base/url_search_params.cc:
Patch Set #17, Line 41: } // namespace
Blank line after "}" please.
Patch Set #17, Line 48: // representation (e.g. ?%63=2 should be the same as ?c=2). Codepoints that
This comment seems to be incorrect. DecodeURLEscapeSequences() converts all valid escapes to UTF-16, regardless of whether they can be encoded as ASCII.
File net/base/url_search_params_unittest.cc:
Patch Set #17, Line 17: UrlSearchParams search_params(GURL("https://a.com/index.html?a=1&b=2&c=3"));
Although it makes no difference here, it's good practice to use non-existent domains in tests, like `https://a.test/`
Patch Set #17, Line 19: EXPECT_EQ(search_params.params()[0].first, "a");
These expectations are quite verbose and hard to read. I suggest you change them to
```
EXPECT_THAT(search_params.params(),
ElementsAre(Pair("a", "1), Pair("b", "2"), Pair("c", "3")));
```
Patch Set #17, Line 107: EXPECT_EQ(search_params.params()[0].first, "é");
It would be good to have a test that an encoded é and an unencoded é are treated the same, ie. that the sort is still stable if you mix them.
It would be good to have tests with encoded "&" and "=" characters in the keys or values.
It would also be good to test keys with no values, ie. https://a.test/?foo&bar.
It would be good to have an invalid escape test, eg. https://a.test/?a=%2&b=%3
It would be good to test empty keys, eg. https://a.test?=5&=1
File net/http/http_no_vary_search_data.h:
Patch Set #17, Line 40: bool UrlsAreEquivalent(const GURL& request_url, const GURL& cached_url) const;
Optionally, you could call this method just "AreEquivalent", as it is obvious from the argument that it is about URLs.
To make it clearer that the method is not sensitive to the order of the arguments, you could name them just "a" and "b" instead of "request_url" and "cached_url".
File net/http/http_no_vary_search_data_unittest.cc:
Patch Set #17, Line 597: GURL request_url;
I suggest putting const on these members. I think it should compile.
Patch Set #17, Line 610: std::string headers =
These local variables could be const too.
Patch Set #17, Line 617: EXPECT_EQ(no_vary_search_data.UrlsAreEquivalent(test_data.request_url,
Optionally, you could add some information that would make it easier to debug test failures, ie.
```
EXPECT_EQ(...) << "request_url=" << request_url
<< " cached_url=" << cached_url
<< " headers=" << raw_headers
<< " match=" << expected_match;
```
Patch Set #17, Line 622: NoVarySearchCompareTestData no_vary_search_compare_tests[] = {
I suggest making this array const to make it obvious that are not going to change it.
Patch Set #17, Line 624: // are not equivalent.
Just to be extra careful, it would be good to have additional tests for when the scheme, username, password, or path are different.
I think we should test that % encoding of characters works correctly here as well, since HttpNoVarySearchData could in principle be re-implemented without UrlSearchParams.
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Adam Rice, Domenic Denicola.
16 comments:
File net/base/url_search_params.h:
Patch Set #17, Line 18: // Class that exposes functionality similar to blink::UrlSearchParams for
I would prefer to explain what this class does independently without making reference to Blink at al […]
Done
Patch Set #17, Line 24: // Runs a stable sort by key of all of the query search params.
Please add copy constructor and assignment operators (with = delete if you don't need this class to […]
Done
Patch Set #17, Line 33: void DeleteAll();
DeleteAll() doesn't seem to be used outside tests. […]
Done
File net/base/url_search_params.cc:
Patch Set #17, Line 41: } // namespace
Blank line after "}" please.
Done
Patch Set #17, Line 48: // representation (e.g. ?%63=2 should be the same as ?c=2). Codepoints that
This comment seems to be incorrect. […]
Done
File net/base/url_search_params_unittest.cc:
Patch Set #17, Line 17: UrlSearchParams search_params(GURL("https://a.com/index.html?a=1&b=2&c=3"));
Although it makes no difference here, it's good practice to use non-existent domains in tests, like […]
Done
Patch Set #17, Line 19: EXPECT_EQ(search_params.params()[0].first, "a");
These expectations are quite verbose and hard to read. I suggest you change them to […]
Done
Patch Set #17, Line 107: EXPECT_EQ(search_params.params()[0].first, "é");
It would be good to have a test that an encoded é and an unencoded é are treated the same, ie. […]
Done
It would be good to have tests with encoded "&" and "=" characters in the keys or values. […]
Done
File net/http/http_no_vary_search_data.h:
Patch Set #17, Line 40: bool UrlsAreEquivalent(const GURL& request_url, const GURL& cached_url) const;
Optionally, you could call this method just "AreEquivalent", as it is obvious from the argument that […]
Done
File net/http/http_no_vary_search_data_unittest.cc:
Patch Set #17, Line 597: GURL request_url;
I suggest putting const on these members. I think it should compile.
Done
Patch Set #17, Line 610: std::string headers =
These local variables could be const too.
Done
Patch Set #17, Line 617: EXPECT_EQ(no_vary_search_data.UrlsAreEquivalent(test_data.request_url,
Optionally, you could add some information that would make it easier to debug test failures, ie. […]
Done
Patch Set #17, Line 622: NoVarySearchCompareTestData no_vary_search_compare_tests[] = {
I suggest making this array const to make it obvious that are not going to change it.
Done
Patch Set #17, Line 624: // are not equivalent.
Just to be extra careful, it would be good to have additional tests for when the scheme, username, p […]
Done
I think we should test that % encoding of characters works correctly here as well, since HttpNoVaryS […]
Done
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Liviu Tinta.
Patch set 20:Code-Review +1
1 comment:
Patchset:
lgtm
To view, visit change 4027122. To unsubscribe, or for help writing mail filters, visit settings.
Attention is currently required from: Domenic Denicola, Liviu Tinta.
Patch set 20:Commit-Queue +2
Chromium LUCI CQ submitted this change.
[No-Vary-Search] Add logic to check two URLs for equivalence
Add UrlSearchParams class in net/base to be able to sort search
parameters by key, remove search parameters by key, list search
parameters with their corresponding values to check for equality.
Add logic to check two URLs for equivalence in the presence of
No-Vary-Search conditions.
Bug: 1378075, 1378072
Change-Id: Icde99c8b30451d52cb68970d5eb3aee868c9175c
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4027122
Reviewed-by: Adam Rice <ri...@chromium.org>
Commit-Queue: Liviu Tinta <liviu...@chromium.org>
Reviewed-by: Jeremy Roman <jbr...@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1076209}
---
M net/BUILD.gn
A net/base/url_search_params.cc
A net/base/url_search_params.h
A net/base/url_search_params_unittest.cc
M net/http/http_no_vary_search_data.cc
M net/http/http_no_vary_search_data.h
M net/http/http_no_vary_search_data_unittest.cc
7 files changed, 636 insertions(+), 9 deletions(-)