How can I canonicalize a URL when url::Canonicalize isn't canonical enough?
```
#include <string>

#include "base/logging.h"
#include "url/url_canon_stdstring.h"
#include "url/url_util.h"

void canonicalize(const std::string& src) {
  constexpr bool trim_path_end = false;
  std::string dst;
  url::StdStringCanonOutput o(&dst);
  url::Parsed p;
  url::Canonicalize(src.data(), src.size(), trim_path_end,
                    /*charset_converter=*/nullptr, &o, &p);
  o.Complete();  // Flushes the canonical form into |dst|.
  LOG(ERROR) << dst;
}
// etc
canonicalize("http://a.example.com/x-y=z");
canonicalize("http://b.example.com/x%2Dy%3Dz");
```
prints
```
http://a.example.com/x-y=z
http://b.example.com/x-y%3Dz
```
So "%2D" gets decoded to "-" but "%3D" stays escaped. I'm guessing that's
because "=" is a reserved character and "-" is not:
https://en.wikipedia.org/wiki/URL_encoding#Types_of_URI_characters
It looks deliberate, although it's not obvious why:
https://source.chromium.org/chromium/chromium/src/+/main:url/url_canon_unittest.cc;l=1301;drc=ed519e442491476fbf09e2e419efb27716a94bed
None of "url/url_canon.h", "url/url_util.h" or
https://source.chromium.org/chromium/chromium/src/+/main:url/README.md
explain exactly what "URL canonicalization" means.
My problem is that I have URLs whose path contains what looks like a
base-64 encoded something, and base-64 uses "=" for padding. Sometimes
my URLs have "=" and other times they have "%3D", and I'd like to
canonicalize them so I can compare them for equality.
Converting a std::string to a GURL and then calling path() doesn't
help, since that basically just invokes url::Canonicalize().
This is for ChromiumOS Fusebox
(https://source.chromium.org/chromium/chromium/src/+/main:chrome/browser/ash/fusebox/README.md)
drag-and-drop, where third-party JS can offer filesystem URLs as
drag-and-drop data sources. I want to compare these URLs (and their
prefixes) against an allow-list, and thought that canonicalization would
make that easier, but url::Canonicalize doesn't collapse "=" and "%3D"
to the same thing.
I'm writing C++ (not JS), so I don't have access to JS's
decodeURIComponent() or decodeURI().
"url/url_util.h" does offer a DecodeURLEscapeSequences C++ function,
but I'm hesitant to use it because, IIUC, it's not idempotent. Given
"%2541" as input, one pass of DecodeURLEscapeSequences gives "%41",
but a second pass gives "A".
Do I have to roll my own URL canonicalization, separate from
url::Canonicalize from "url/url_util.h"?