> return url.spec().substr(0, kMaxURLLength);

Sun Yueru: I don't think we want to truncate the URL. I think what we want is to report the TLD instead. You can probably get the eTLD+1 directly from the URL:
```
net::registry_controlled_domains::GetDomainAndRegistry(
url,
net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES);
```
One thing that Stephan mentioned yesterday is that we probably still want to check that the eTLD+1 itself is not longer than our limit. However, this shouldn't be the case per https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.4.
Maybe worth commenting on that or even DCHECKing?
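For reference, the RFC 1035 limits mentioned above (each label at most 63 octets, the whole name at most 255 octets on the wire, conventionally 253 in presentation form) are what make the eTLD+1 short by construction. A minimal standalone sketch of that check (the helper name is hypothetical, not a Chromium API):

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper illustrating the RFC 1035 length limits: each
// dot-separated label is at most 63 octets, and the full presentation-form
// name is at most 253 octets. Any eTLD+1 satisfying these is far below a
// 2048-character URL limit.
bool IsWithinDnsLengthLimits(std::string_view host) {
  if (host.empty() || host.size() > 253) return false;
  size_t label_start = 0;
  for (size_t i = 0; i <= host.size(); ++i) {
    if (i == host.size() || host[i] == '.') {
      const size_t label_len = i - label_start;
      if (label_len == 0 || label_len > 63) return false;
      label_start = i + 1;
    }
  }
  return true;
}
```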
Alicja Opalinska: I actually think that truncation is a good approach. The part that makes some URLs excessively long is the query params, so truncating only affects that part, and we will still be able to derive all the other forms server-side (i.e. eTLD+1, origin, url_without_query). That is a net gain over only reporting the eTLD+1.

In this case, we probably want to make it explicit and unambiguous that a truncation happened, e.g. set an `is_truncated` bit or even just append `[...]` to the end of the URL.
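A minimal sketch of the proposal (names are illustrative, not the actual CL): keep the first `kMaxURLLength` characters and make truncation explicit by appending a marker, so a truncated URL can be told apart from one that is legitimately exactly `kMaxURLLength` characters long.

```cpp
#include <cassert>
#include <string>

// Illustrative constants and helper; not the actual CL code.
constexpr size_t kMaxURLLength = 2048;
constexpr char kTruncatedIndicator[] = "[...]";

// Returns the spec unchanged if it fits, otherwise the truncated prefix
// with an explicit marker appended.
std::string MaybeTruncateSpec(const std::string& spec) {
  if (spec.size() <= kMaxURLLength) return spec;
  return spec.substr(0, kMaxURLLength) + kTruncatedIndicator;
}
```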
Is it good now, with the `[...]` at the end? Or is it better to have a way to set `is_truncated`?
On the server side, we need to look for the same truncation indicator to distinguish URLs that were truncated from URLs that were exactly 2048 characters long.

Only in the former case, since they don't represent the original "full_url" as seen by the users, can these be replaced by "URLTooLong" as before; their other variants like `origin` can then be derived and recorded in the logs.

The server-side handling should land before this client-side change does.

I think the alternative of setting an `is_truncated` bit may be a cleaner signal, but off the top of my head I can't think of any real use cases for it. To count what proportion of full URLs were truncated, counting `URLTooLong` would yield the same answer as `is_truncated`.
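The server-side check described above could look roughly like this (a hedged sketch with hypothetical names, assuming the client appends `[...]` as discussed): a reported URL ending with the indicator cannot be the full URL the user saw, so it is replaced by the `URLTooLong` placeholder.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Illustrative server-side handling; names are hypothetical.
constexpr std::string_view kTruncatedIndicator = "[...]";

// True if the reported URL carries the client's truncation marker.
bool WasTruncated(std::string_view reported_url) {
  return reported_url.size() >= kTruncatedIndicator.size() &&
         reported_url.substr(reported_url.size() -
                             kTruncatedIndicator.size()) == kTruncatedIndicator;
}

// Truncated URLs don't represent the real "full_url", so replace them with
// the placeholder, as untruncated-but-overlong URLs were handled before.
std::string NormalizeForLogs(std::string_view reported_url) {
  return WasTruncated(reported_url) ? "URLTooLong" : std::string(reported_url);
}
```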
Ohh, I see now, thank you for explaining that, Sun!

> Only for the former case, since they don't represent the original "full_url" as seen by the users, these can be replaced by "URLTooLong" as before, but their other variants like origin can then be derived and recorded in the logs.

How will we determine whether the URL was cut off in the query and not in the path itself (i.e. the `url_without_query` length is itself bigger than 2048)?

I'm also not a huge fan of the `[...]` suffix in the URL. It seems very special-casey, and we will always wonder what happens if it accidentally appears at the end of the string. It also means scanning up to 2 kB every time we want to check whether the string was truncated (assuming we use a zero-terminated string). I think it would be significantly better to have an `is_truncated` flag.

Additionally, if we go with an extra flag, we can even solve the problem of where the URL was truncated by calling it `truncated_at` and giving it the values `{ not_truncated, at_query, at_path, at_origin }`. The logic will be slightly more complex but also slightly more complete.
WDYT?
> const std::string kTruncatedIndicator = "[...]";

Probably could be constexpr?
> How will we determine whether the URL was cut off in the query and not in the path itself (i.e. `url_without_query` length is itself bigger than 2048)?
It seems extremely unlikely that the `url_without_query` part would exceed this length. If truncation does happen there (we would see the URL missing the `?` symbol), the truncated `url_without_query` likely does not represent a real URL of the site, so it wouldn't be logged by the server either (due to the server-side check).
Checking the logs... This does happen! https://paste.googleplex.com/5202623947997184
> Additionally, if we go with an extra flag, we can even solve the problem of where the URL was truncated by calling it `truncated_at`.
While I think `[...]` is sufficiently unambiguous (if a real URL contained this substring, it would be escaped, e.g. `[` becomes `%5B` in the `full_url` field), I agree with you on using `truncated_at` for better clarity. We can keep it as simple as a `uint32` field for now, currently set to 2048. If we ever want to bump up `kMaxURLLength`, this field tracks that status. WDYT?
Hm, but a `truncated_at` that represents the number of characters will not be able to answer the question of whether the truncation happened within the query or the path.

If we went with the enum `{ not_truncated, at_query, at_path, at_origin }`, we would be able to tell where it was truncated, and we can still derive the number of characters it was truncated at from the data.
Can you take a look again, Sun? I applied afie's idea, basically with `is_truncated` being an enum, which will tell the server which URLs to replace with "URLTooLong".
> EXPECT_EQ(path_string.substr(0, 2048 - 5) + "[...]",

Hm, where are those `[...]` coming from in this version of the code?
> if (kMaxURLLength > query_start) {

Nitty nit: I would probably spell it out in reverse, `query_start < kMaxURLLength` (here and below). The reason is that, at least from my perspective, we are thinking from left to right, and we actually want to check whether `query_start` is before the limit.

I guess this is also off by one: if `query_start == kMaxURLLength`, that means we cut off exactly the whole QUERY but still didn't touch the PATH, no? So it should be `query_start <= kMaxURLLength` and `path_start <= kMaxURLLength`.
```
// Verify whether the kMaxURLLength-character cutoff occurred after the QUERY,
// after the PATH, or within the ORIGIN.
const url::Parsed& parsed = url.parsed_for_possibly_invalid_spec();
int query_start = parsed.CountCharactersBefore(url::Parsed::QUERY, true);
int path_start = parsed.CountCharactersBefore(url::Parsed::PATH, true);
UrlInfo::TruncationStatus status;
if (kMaxURLLength > query_start) {
  status = UrlInfo::AT_QUERY;
} else if (kMaxURLLength > path_start) {
  status = UrlInfo::AT_PATH;
} else {
  status = UrlInfo::AT_ORIGIN;
}
```

Nit: this block looks like a great candidate to extract into a separate function, something like `std::pair<std::string, TruncationStatus> truncatedUrlSpec(const GURL& url, int32_t limit)`. WDYT?
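The suggested extraction could look roughly like the following. Since `GURL` and `url::Parsed` are Chromium types, this standalone sketch approximates `query_start`/`path_start` with plain string scans, and uses the `<=` boundary comparisons suggested above; all names and the parsing are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <utility>

enum class TruncationStatus { kNotTruncated, kAtQuery, kAtPath, kAtOrigin };

// Standalone approximation of the suggested helper. Real code would take a
// GURL and use url::Parsed::CountCharactersBefore(); here query_start and
// path_start come from simple string scans so the sketch is self-contained.
std::pair<std::string, TruncationStatus> TruncatedUrlSpec(
    const std::string& spec, size_t limit) {
  if (spec.size() <= limit) return {spec, TruncationStatus::kNotTruncated};
  const size_t scheme_end = spec.find("://");
  // First '/' after the authority, i.e. where the path starts.
  const size_t path_start = scheme_end == std::string::npos
                                ? std::string::npos
                                : spec.find('/', scheme_end + 3);
  const size_t query_start = spec.find('?');
  TruncationStatus status;
  if (query_start != std::string::npos && query_start <= limit) {
    status = TruncationStatus::kAtQuery;   // Cut fell within (or at) the query.
  } else if (path_start != std::string::npos && path_start <= limit) {
    status = TruncationStatus::kAtPath;    // Cut fell within the path.
  } else {
    status = TruncationStatus::kAtOrigin;  // Even the origin didn't fit.
  }
  return {spec.substr(0, limit), status};
}
```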
> void SetShortenedURL(const GURL& url, UrlInfo* url_info) {

Nit: this could use a new name; "truncated" would be more descriptive than "shortened" now. Maybe `SetUrlInfoWithTruncationStatus`?
> return url.spec().substr(0, kMaxURLLength);

I see, fair enough, let's go with this then.
> // This file was exported from google3, do not edit manually.

The proto change should be made in google3 first, then synced to Chromium.
> AT_QUERY = 1;

Can we align these names with those in UKM instead of the GURL library? E.g. `TRUNCATED_AT_FULL_PATH = 1`, `TRUNCATED_AT_URL_WITHOUT_QUERY = 2`, `TRUNCATED_AT_ORIGIN = 3`.
> TruncationStatus is_truncated = 3;

Let's call this `truncation_status` too; by convention, `is_*` is used for boolean fields.