Add BCP47 initial extension support to LanguageCode [chromium/src : main]

0 views
Skip to first unread message

Manish Goregaokar (Gerrit)

unread,
May 28, 2026, 4:54:23 PM (4 days ago) May 28
to Danilo Françoso Tedeschi, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
Attention needed from Danilo Françoso Tedeschi and Greg Thompson

Manish Goregaokar added 1 comment

Patchset-level comments
File-level comment, Patchset 5 (Latest):
Manish Goregaokar . resolved

ICU4X has APIs for this but I guess you don't want to have to reparse

Open in Gerrit

Related details

Attention is currently required from:
  • Danilo Françoso Tedeschi
  • Greg Thompson
Submit Requirements:
  • requirement satisfiedCode-Coverage
  • requirement satisfiedCode-Owners
  • requirement is not satisfiedCode-Review
  • requirement is not satisfiedReview-Enforcement
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
Gerrit-Change-Number: 7883222
Gerrit-PatchSet: 5
Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
Gerrit-Attention: Danilo Françoso Tedeschi <da...@google.com>
Gerrit-Attention: Greg Thompson <g...@chromium.org>
Gerrit-Comment-Date: Thu, 28 May 2026 20:54:12 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
satisfied_requirement
unsatisfied_requirement
open
diffy

Manish Goregaokar (Gerrit)

unread,
May 28, 2026, 5:21:57 PM (4 days ago) May 28
to Danilo Françoso Tedeschi, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
Attention needed from Danilo Françoso Tedeschi and Greg Thompson

Manish Goregaokar voted Code-Review+1

Code-Review+1
Open in Gerrit

Related details

Attention is currently required from:
  • Danilo Françoso Tedeschi
  • Greg Thompson
Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement satisfiedCode-Owners
    • requirement satisfiedCode-Review
    • requirement satisfiedReview-Enforcement
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
    Gerrit-Change-Number: 7883222
    Gerrit-PatchSet: 5
    Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
    Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
    Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
    Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
    Gerrit-Attention: Danilo Françoso Tedeschi <da...@google.com>
    Gerrit-Attention: Greg Thompson <g...@chromium.org>
    Gerrit-Comment-Date: Thu, 28 May 2026 21:21:45 +0000
    Gerrit-HasComments: No
    Gerrit-Has-Labels: Yes
    satisfied_requirement
    open
    diffy

    Greg Thompson (Gerrit)

    unread,
    May 29, 2026, 6:13:11 AM (3 days ago) May 29
    to Danilo Françoso Tedeschi, Manish Goregaokar, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
    Attention needed from Danilo Françoso Tedeschi

    Greg Thompson added 16 comments

    File base/i18n/bcp47_extensions.h
    Line 85, Patchset 6 (Latest): static_assert(c >= 'a' && c <= 'z', "Invalid BCP47 extension identifier");
    Greg Thompson . unresolved

    Gemini says: BCP47 allows singletons to be digits (`0-9`) in addition to letters.
    This `static_assert` should be updated to allow digit singletons:
    ```cpp
    static_assert((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'), "Invalid BCP47 extension identifier");
    ```

    Greg says: this appears to be correct as per https://www.rfc-editor.org/info/rfc5646/#section-2.1

    Line 29, Patchset 6 (Latest): std::string_view AsStringView() const { return value_; }
    Greg Thompson . unresolved

    how about just `value()`?

    Line 27, Patchset 6 (Latest): explicit Extension(base::PassKey<base::LanguageCode>, std::string value)
    Greg Thompson . unresolved

    nit: omit `explicit` for multi-arg ctors

    File base/i18n/language_code.h
    Line 89, Patchset 6 (Latest): // LanguageCodeBuilder::GetInstance().FromString("en-US-u-ca-gregory");
    Greg Thompson . resolved

    that's me! :-)

    File base/i18n/language_code.cc
    Line 15, Patchset 6 (Latest):std::string_view FindExtension(std::string_view code, char ext_id) {
    Greg Thompson . unresolved

    nit: blank line after namespace open

    Line 16, Patchset 6 (Latest): std::string needle = StrCat({"-", std::string_view(&ext_id, 1), "-"});
    Greg Thompson . unresolved

    Gemini says: Using the address of a local `char` to create a `std::string_view` is fragile and generally considered an anti-pattern, even if it's technically memory-safe here because it's consumed immediately.

    Consider using `std::string` directly:
    ```cpp
    std::string needle = base::StrCat({"-", std::string(1, ext_id), "-"});
    ```
    Or simply:
    ```cpp
    std::string needle = std::string("-") + ext_id + "-";
    ```

    Greg says: I had a similar thought when i read this, so i think it's a good idea to change it.

    Line 20, Patchset 6 (Latest): return "";
    Greg Thompson . unresolved

    optional nit: it amounts to the same thing, but how about `{}` here to mean "empty string view"

    Line 23, Patchset 6 (Latest): for (size_t i = pos + 1; i + 2 < code.size(); i++) {
    Greg Thompson . unresolved

    nit: i think it's slightly more readable to do `code = code.substr(pos);` or `std::string_view extension = code.substr(pos);` just before this and adjust the code below accordingly.

    Line 23, Patchset 6 (Latest): for (size_t i = pos + 1; i + 2 < code.size(); i++) {
    Greg Thompson . unresolved

    start at index `3`? we don't need to look at any of the characters matched by `needle`.

    Line 25, Patchset 6 (Latest): if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&
    Greg Thompson . unresolved

    Gemini says: BCP47 extension singletons can be digits (`0-9`) as per RFC 5646 section 2.1.
    Using `base::IsAsciiAlpha` will fail to recognize digit singletons (e.g., `-1-`), causing it to incorrectly include the next extension as part of the current one.

    You should use `base::IsAsciiAlphaNumeric` (or `absl::ascii_isalnum`) instead.

    Line 25, Patchset 6 (Latest): if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&
    code[i + 2] == '-') {
    Greg Thompson . unresolved

    nit: swap the order of these last two terms so that the most expensive check is last.

    Line 27, Patchset 6 (Latest): return code.substr(pos, i - pos);
    Greg Thompson . unresolved

    Gemini says (and Greg has not validated, so please verify its claim): Please fix this WARNING reported by autoreview issue finding: There's a critical issue here regarding the private use extension (`x`).

    Unlike regular extensions, `x` allows subtags of length 1 (e.g., `en-US-x-a-b`).
    If `FindExtension` is looking for `x`, this loop will incorrectly stop at `-a-` and truncate the private use value.
    Furthermore, if you are searching for `a`, `code.find(needle)` might accidentally find `-a-` *inside* the private use extension!

    To fix both issues, you can take advantage of the fact that `x` is always the last extension:
    ```cpp
    size_t x_pos = code.find("-x-");
    std::string_view search_space = code;
    if (x_pos != std::string_view::npos && ext_id != 'x') {
    // Ignore everything inside the private use extension.
    search_space = code.substr(0, x_pos);
    }
      std::string needle = std::string("-") + ext_id + "-";
    size_t pos = search_space.find(needle);
      if (pos == std::string_view::npos) {
    return "";
    }
      // 'x' is always the last extension, so it spans to the end of the string.
    if (ext_id == 'x') {
    return search_space.substr(pos);
    }
      for (size_t i = pos + 1; i + 2 < search_space.size(); i++) {
    if (search_space[i] == '-' && base::IsAsciiAlphaNumeric(search_space[i + 1]) &&
    search_space[i + 2] == '-') {
    return search_space.substr(pos, i - pos);
    }
    }
      return search_space.substr(pos);
    ```
    Line 123, Patchset 6 (Latest): if (extension.size() < 3) {
    Greg Thompson . unresolved

    nit: "3" happens a lot. consider something like `kExtensionPrefixLength` or something in an unnamed namespace at the top of the file.

    Line 136, Patchset 6 (Latest): return R(base::PassKey<LanguageCode>(), std::move(*value));
    Greg Thompson . unresolved

    `*std::move(value)` as per go/totw/181#moving-from-the-value-of-an-abslstatusor

    File base/i18n/language_code_unittest.cc
    File-level comment, Patchset 6 (Latest):
    Greg Thompson . unresolved

    Could you add some tests for malformed inputs?

    Line 355, Patchset 6 (Latest): "en-US-u-ca-gregory-x-private-a-test");
    Greg Thompson . unresolved

    Gemini says (and, again, Greg has not validated): This test implies a misunderstanding of BCP47.

    Because the private use extension (`x`) is always the last subtag sequence and it allows 1-character subtags, `-a-test` is actually a continuation of the `x` extension, NOT a new extension `a`.

    The value of `x` here should be `private-a-test`, and `ext<'a'>()` should return `std::nullopt`.

    (This test currently passes only because of the buggy parsing logic in `FindExtension` which incorrectly stops at 1-character subtags inside the `x` extension).

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Danilo Françoso Tedeschi
    Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Owners
      • requirement satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      • requirement satisfiedReview-Enforcement
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
      Gerrit-Change-Number: 7883222
      Gerrit-PatchSet: 6
      Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
      Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
      Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
      Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
      Gerrit-Attention: Danilo Françoso Tedeschi <da...@google.com>
      Gerrit-Comment-Date: Fri, 29 May 2026 10:12:55 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Danilo Françoso Tedeschi (Gerrit)

      unread,
      May 29, 2026, 4:04:27 PM (3 days ago) May 29
      to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
      Attention needed from Greg Thompson and Manish Goregaokar

      Danilo Françoso Tedeschi added 15 comments

      File base/i18n/bcp47_extensions.h
      Line 85, Patchset 6: static_assert(c >= 'a' && c <= 'z', "Invalid BCP47 extension identifier");
      Greg Thompson . resolved

      Gemini says: BCP47 allows singletons to be digits (`0-9`) in addition to letters.
      This `static_assert` should be updated to allow digit singletons:
      ```cpp
      static_assert((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'), "Invalid BCP47 extension identifier");
      ```

      Greg says: this appears to be correct as per https://www.rfc-editor.org/info/rfc5646/#section-2.1

      Danilo Françoso Tedeschi

      Done

      Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }
      Greg Thompson . unresolved

      how about just `value()`?

      Danilo Françoso Tedeschi

      dunno, for it .value() is too much associated with optionality, don't you think?

      Line 27, Patchset 6: explicit Extension(base::PassKey<base::LanguageCode>, std::string value)
      Greg Thompson . resolved

      nit: omit `explicit` for multi-arg ctors

      Danilo Françoso Tedeschi

      Done

      File base/i18n/language_code.cc
      Line 15, Patchset 6:std::string_view FindExtension(std::string_view code, char ext_id) {
      Greg Thompson . resolved

      nit: blank line after namespace open

      Danilo Françoso Tedeschi

      Done

      Line 16, Patchset 6: std::string needle = StrCat({"-", std::string_view(&ext_id, 1), "-"});
      Greg Thompson . resolved

      Gemini says: Using the address of a local `char` to create a `std::string_view` is fragile and generally considered an anti-pattern, even if it's technically memory-safe here because it's consumed immediately.

      Consider using `std::string` directly:
      ```cpp
      std::string needle = base::StrCat({"-", std::string(1, ext_id), "-"});
      ```
      Or simply:
      ```cpp
      std::string needle = std::string("-") + ext_id + "-";
      ```

      Greg says: I had a similar thought when i read this, so i think it's a good idea to change it.

      Danilo Françoso Tedeschi

      Done

      Line 20, Patchset 6: return "";
      Greg Thompson . resolved

      optional nit: it amounts to the same thing, but how about `{}` here to mean "empty string view"

      Danilo Françoso Tedeschi

      Done

      Line 23, Patchset 6: for (size_t i = pos + 1; i + 2 < code.size(); i++) {
      Greg Thompson . resolved

      nit: i think it's slightly more readable to do `code = code.substr(pos);` or `std::string_view extension = code.substr(pos);` just before this and adjust the code below accordingly.

      Danilo Françoso Tedeschi

      Done

      Line 23, Patchset 6: for (size_t i = pos + 1; i + 2 < code.size(); i++) {
      Greg Thompson . resolved

      start at index `3`? we don't need to look at any of the characters matched by `needle`.

      Danilo Françoso Tedeschi

      Done

      Line 25, Patchset 6: if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&
      Greg Thompson . resolved

      Gemini says: BCP47 extension singletons can be digits (`0-9`) as per RFC 5646 section 2.1.
      Using `base::IsAsciiAlpha` will fail to recognize digit singletons (e.g., `-1-`), causing it to incorrectly include the next extension as part of the current one.

      You should use `base::IsAsciiAlphaNumeric` (or `absl::ascii_isalnum`) instead.

      Danilo Françoso Tedeschi

      Done

      Line 25, Patchset 6: if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&

      code[i + 2] == '-') {
      Greg Thompson . resolved

      nit: swap the order of these last two terms so that the most expensive check is last.

      Danilo Françoso Tedeschi

      Done

      Line 27, Patchset 6: return code.substr(pos, i - pos);
      Greg Thompson . resolved

      Gemini says (and Greg has not validated, so please verify its claim): Please fix this WARNING reported by autoreview issue finding: There's a critical issue here regarding the private use extension (`x`).

      Unlike regular extensions, `x` allows subtags of length 1 (e.g., `en-US-x-a-b`).
      If `FindExtension` is looking for `x`, this loop will incorrectly stop at `-a-` and truncate the private use value.
      Furthermore, if you are searching for `a`, `code.find(needle)` might accidentally find `-a-` *inside* the private use extension!

      To fix both issues, you can take advantage of the fact that `x` is always the last extension:
      ```cpp
      size_t x_pos = code.find("-x-");
      std::string_view search_space = code;
      if (x_pos != std::string_view::npos && ext_id != 'x') {
      // Ignore everything inside the private use extension.
      search_space = code.substr(0, x_pos);
      }
        std::string needle = std::string("-") + ext_id + "-";
      size_t pos = search_space.find(needle);
        if (pos == std::string_view::npos) {
      return "";
      }
        // 'x' is always the last extension, so it spans to the end of the string.
      if (ext_id == 'x') {
      return search_space.substr(pos);
      }
        for (size_t i = pos + 1; i + 2 < search_space.size(); i++) {
      if (search_space[i] == '-' && base::IsAsciiAlphaNumeric(search_space[i + 1]) &&
      search_space[i + 2] == '-') {
      return search_space.substr(pos, i - pos);
      }
      }
        return search_space.substr(pos);
      ```
      Danilo Françoso Tedeschi

      very nice catch. I rewrote it a little differently, let me know what you think.

      Line 123, Patchset 6: if (extension.size() < 3) {
      Greg Thompson . resolved

      nit: "3" happens a lot. consider something like `kExtensionPrefixLength` or something in an unnamed namespace at the top of the file.

      Danilo Françoso Tedeschi

      Done

      Line 136, Patchset 6: return R(base::PassKey<LanguageCode>(), std::move(*value));
      Greg Thompson . resolved

      `*std::move(value)` as per go/totw/181#moving-from-the-value-of-an-abslstatusor

      Danilo Françoso Tedeschi

      Done

      File base/i18n/language_code_unittest.cc
      File-level comment, Patchset 6:
      Greg Thompson . resolved

      Could you add some tests for malformed inputs?

      Danilo Françoso Tedeschi

      Done

      Line 355, Patchset 6: "en-US-u-ca-gregory-x-private-a-test");
      Greg Thompson . resolved

      Gemini says (and, again, Greg has not validated): This test implies a misunderstanding of BCP47.

      Because the private use extension (`x`) is always the last subtag sequence and it allows 1-character subtags, `-a-test` is actually a continuation of the `x` extension, NOT a new extension `a`.

      The value of `x` here should be `private-a-test`, and `ext<'a'>()` should return `std::nullopt`.

      (This test currently passes only because of the buggy parsing logic in `FindExtension` which incorrectly stops at 1-character subtags inside the `x` extension).

      Danilo Françoso Tedeschi

      Done

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Greg Thompson
      • Manish Goregaokar
      Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
        Gerrit-Change-Number: 7883222
        Gerrit-PatchSet: 10
        Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
        Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Greg Thompson <g...@chromium.org>
        Gerrit-Comment-Date: Fri, 29 May 2026 20:04:13 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: No
        Comment-In-Reply-To: Greg Thompson <g...@chromium.org>
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        Greg Thompson (Gerrit)

        unread,
        4:08 AM (13 hours ago) 4:08 AM
        to Danilo Françoso Tedeschi, Manish Goregaokar, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
        Attention needed from Danilo Françoso Tedeschi and Manish Goregaokar

        Greg Thompson added 14 comments

        Commit Message
        Line 15, Patchset 10 (Latest):- New API: Adds LanguageCode::GetExtension<T>(traits) to retrieve specific extension types.
        Greg Thompson . unresolved
        File base/i18n/bcp47_extensions.h
        Line 41, Patchset 10 (Latest):// Represents a Unicode BCP47 extension ('u-').
        Greg Thompson . unresolved

        ...and this holds the subtags for the Unicode extension in a language tag.

        Line 22, Patchset 10 (Latest):// Represents a generic BCP47 extension (e.g., 'x-' for private use).
        Greg Thompson . unresolved

        to be precise: an instance of this holds an extension's subtags in a language tag, right?

        Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }
        Greg Thompson . unresolved

        how about just `value()`?

        Danilo Françoso Tedeschi

        dunno, for it .value() is too much associated with optionality, don't you think?

        Greg Thompson

        let's look at bcp47 to see what it calls this... https://www.rfc-editor.org/info/rfc5646/#section-2.2.6 calls the stuff that follows the single-character extension identifier the "extension subtags" (see items 5 and 6 -- 5 says "Each extension subtag MUST be from two to eight characters long..." and 6 says "Each singleton MUST be followed by at least one extension subtag."). so how about renaming `value_` to `subtags_` and call this getter `subtags()`? this seems most consistent with the language of the spec.

        File base/i18n/language_code.h
        Line 78, Patchset 10 (Latest): // Retrieves a BCP47 extension subtag from the language code.
        Greg Thompson . unresolved

        nit: the blob after a singleton is one or more subtags, so maybe say "Retrieves the subtag(s) for an extension to a BCP47 language tag."

        Line 39, Patchset 10 (Latest):class BASE_I18N_EXPORT LanguageCode {
        Greg Thompson . unresolved

        now that i've been scrutinizing BCP47, i think this class is more aligned with what the spec calls a "Language Tag". the spec uses the term "code" to refer to the actual value of any subtag in a language tag (e.g., "en" is the language code for English; "US" is the region code for the US, etc). maybe too late to rename the class now, but perhaps it's worth clarifying in the doc comments.

        File base/i18n/language_code.cc
        Line 10, Patchset 10 (Latest):#include "base/strings/strcat.h"
        Greg Thompson . unresolved

        This `#include` doesn't seem to be used in this file and can be safely removed.

        Line 18, Patchset 10 (Latest):// Finds the position of start of the first extension.
        Greg Thompson . unresolved

        nit: remove extra space

        Line 33, Patchset 10 (Latest):// Returns the extension keyed by the singleton <ext_id>. It only returns the
        Greg Thompson . unresolved

        maybe rephrase to "Returns the subtags for the extension identified by the singleton `ext_id`." (using backticks rather than angle brackets around the argument name as per https://google.github.io/styleguide/cppguide.html#Function_Comments.)

        Line 35, Patchset 10 (Latest):std::string_view GetExtensionValue(std::string_view code, char ext_id) {
        Greg Thompson . unresolved

        i wonder if a good way to structure this is to have a primitive similar to `FindFirstExtension` that operates on a string view and returns either the offset of the first singleton found (i.e., the middle character in the discovered '-S-' sequence) or npos (no singleton found). maybe call this `FindNextSingleton`. then this function can use it repeatedly until it finds the singleton that equals `ext_id`; aborting if it finds "x" or an empty view. when it finds a match, it calls `FindNextSingleton` one more time to find the start of the next singleton. if this isn't npos and is > this_ext + 2, we return the substr from this_ext + 2 to next_ext. or something like that. i don't think we need `needle` in this case -- we just walk one by one through each singleton until we find the one we want, and return everything up to the next singleton (with special handling for "x"). wdyt? would this simplify things at all?

        (i don't think we'd need/want the "Skip the first two characters..." part of the fn above.)

        Line 36, Patchset 10 (Latest): std::string needle = ext_id + std::string("-");
        Greg Thompson . unresolved
        You can avoid the string heap allocation here by using a `std::string_view` over a local char array:
        ```cpp
        const char needle_chars[] = {ext_id, '-'};
        std::string_view needle(needle_chars, 2u);
        ```
        Line 53, Patchset 10 (Latest): CHECK_GT(next_extension_pos, 0u)
        Greg Thompson . unresolved

        crashing on invalid inputs seems problematic. should we not simply reject/ignore them?

        it might be nice to reference the spec (e.g., "BCP47 2.2.6. says that each extension subtag MUST be from two to eight characters long")

        Line 155, Patchset 10 (Latest): std::string_view extension_value = GetExtensionValue(code_.AsString(), key);
        Greg Thompson . unresolved

        maybe `GetExtensionSubtags` is more aligned with the language of the spec.

        File base/i18n/language_code_unittest.cc
        Line 130, Patchset 10 (Latest): ... EXPECT_THAT(
        Greg Thompson . unresolved

        remove "... "

        Open in Gerrit

        Related details

        Attention is currently required from:
        • Danilo Françoso Tedeschi
        • Manish Goregaokar
        Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
        Gerrit-Change-Number: 7883222
        Gerrit-PatchSet: 10
        Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
        Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Comment-Date: Mon, 01 Jun 2026 08:07:43 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: No
        Comment-In-Reply-To: Danilo Françoso Tedeschi <da...@google.com>
        Comment-In-Reply-To: Greg Thompson <g...@chromium.org>
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        Danilo Françoso Tedeschi (Gerrit)

        unread,
        11:44 AM (5 hours ago) 11:44 AM
        to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
        Attention needed from Greg Thompson and Manish Goregaokar

        Danilo Françoso Tedeschi added 7 comments

        Commit Message
        Line 15, Patchset 10:- New API: Adds LanguageCode::GetExtension<T>(traits) to retrieve specific extension types.
        Greg Thompson . resolved
        Danilo Françoso Tedeschi

        formated it.

        File base/i18n/bcp47_extensions.h
        Line 41, Patchset 10:// Represents a Unicode BCP47 extension ('u-').
        Greg Thompson . resolved

        ...and this holds the subtags for the Unicode extension in a language tag.

        Danilo Françoso Tedeschi

        Done

        Line 22, Patchset 10:// Represents a generic BCP47 extension (e.g., 'x-' for private use).
        Greg Thompson . resolved

        to be precise: an instance of this holds an extension's subtags in a language tag, right?

        Danilo Françoso Tedeschi

        Added a method for returning the singleton identifier for the extension. This way we get a proper extension representation from the `Extension` class.

        Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }
        Greg Thompson . resolved

        how about just `value()`?

        Danilo Françoso Tedeschi

        dunno, for it .value() is too much associated with optionality, don't you think?

        Greg Thompson

        let's look at bcp47 to see what it calls this... https://www.rfc-editor.org/info/rfc5646/#section-2.2.6 calls the stuff that follows the single-character extension identifier the "extension subtags" (see items 5 and 6 -- 5 says "Each extension subtag MUST be from two to eight characters long..." and 6 says "Each singleton MUST be followed by at least one extension subtag."). so how about renaming `value_` to `subtags_` and call this getter `subtags()`? this seems most consistent with the language of the spec.

        Danilo Françoso Tedeschi

        Done

        Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }
        Greg Thompson . resolved

        how about just `value()`?

        Danilo Françoso Tedeschi

        dunno, for it .value() is too much associated with optionality, don't you think?

        Greg Thompson

        let's look at bcp47 to see what it calls this... https://www.rfc-editor.org/info/rfc5646/#section-2.2.6 calls the stuff that follows the single-character extension identifier the "extension subtags" (see items 5 and 6 -- 5 says "Each extension subtag MUST be from two to eight characters long..." and 6 says "Each singleton MUST be followed by at least one extension subtag."). so how about renaming `value_` to `subtags_` and call this getter `subtags()`? this seems most consistent with the language of the spec.

        Danilo Françoso Tedeschi

        Done

        File base/i18n/language_code_unittest.cc
        Line 124, Patchset 11 (Latest):
        // Singletons can be digits (except 'x', 'i').
        EXPECT_THAT(LanguageCodeBuilder::GetInstance().FromString("en-1-myext"),
        Optional(Property(&LanguageCode::ToString, Eq("en-1-myext"))));
        Danilo Françoso Tedeschi . resolved

        REmoving the possibility of digits as singletons as they are not supported from ICU4x now and are not even considered by the standards.

        Line 130, Patchset 10: ... EXPECT_THAT(
        Greg Thompson . resolved

        remove "... "

        Danilo Françoso Tedeschi

        Done

        Open in Gerrit

        Related details

        Attention is currently required from:
        • Greg Thompson
        • Manish Goregaokar
        Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
        Gerrit-Change-Number: 7883222
        Gerrit-PatchSet: 11
        Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
        Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Greg Thompson <g...@chromium.org>
        Gerrit-Comment-Date: Mon, 01 Jun 2026 15:44:29 +0000
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        Danilo Françoso Tedeschi (Gerrit)

        unread,
        1:03 PM (4 hours ago) 1:03 PM
        to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
        Attention needed from Greg Thompson and Manish Goregaokar

        Danilo Françoso Tedeschi added 10 comments

        File base/i18n/bcp47_extensions.h
        Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }
        Greg Thompson . resolved

        how about just `value()`?

        Danilo Françoso Tedeschi

        dunno, for it .value() is too much associated with optionality, don't you think?

        Danilo Françoso Tedeschi

        Done

        File base/i18n/language_code.h
        Line 78, Patchset 10: // Retrieves a BCP47 extension subtag from the language code.
        Greg Thompson . resolved

        nit: the blob after a singleton is one or more subtags, so maybe say "Retrieves the subtag(s) for an extension to a BCP47 language tag."

        Danilo Françoso Tedeschi

        Done

        Line 39, Patchset 10:class BASE_I18N_EXPORT LanguageCode {
        Greg Thompson . unresolved

        now that i've been scrutinizing BCP47, i think this class is more aligned with what the spec calls a "Language Tag". the spec uses the term "code" to refer to the actual value of any subtag in a language tag (e.g., "en" is the language code for English; "US" is the region code for the US, etc). maybe too late to rename the class now, but perhaps it's worth clarifying in the doc comments.

        Danilo Françoso Tedeschi

        I do not think it is too late to rename it. I pick the name just because in google3 that is what they ended up choosing. I am ok with changing it.

        the other option would be to call it `Locale` as well, but I also agree `LanguageTag` also sounds more aligned with BCP47.

        File base/i18n/language_code.cc
        Line 10, Patchset 10:#include "base/strings/strcat.h"
        Greg Thompson . resolved

        This `#include` doesn't seem to be used in this file and can be safely removed.

        Danilo Françoso Tedeschi

        Done

        Line 18, Patchset 10:// Finds the position of start of the first extension.
        Greg Thompson . resolved

        nit: remove extra space

        Danilo Françoso Tedeschi

        Done

        Line 33, Patchset 10:// Returns the extension keyed by the singleton <ext_id>. It only returns the
        Greg Thompson . resolved

        maybe rephrase to "Returns the subtags for the extension identified by the singleton `ext_id`." (using backticks rather than angle brackets around the argument name as per https://google.github.io/styleguide/cppguide.html#Function_Comments.)

        Danilo Françoso Tedeschi

        Done

        Line 35, Patchset 10:std::string_view GetExtensionValue(std::string_view code, char ext_id) {
        Greg Thompson . resolved

        i wonder if a good way to structure this is to have a primitive similar to `FindFirstExtension` that operates on a string view and returns either the offset of the first singleton found (i.e., the middle character in the discovered '-S-' sequence) or npos (no singleton found). maybe call this `FindNextSingleton`. then this function can use it repeatedly until it finds the singleton that equals `ext_id`; aborting if it finds "x" or an empty view. when it finds a match, it calls `FindNextSingleton` one more time to find the start of the next singleton. if this isn't npos and is > this_ext + 2, we return the substr from this_ext + 2 to next_ext. or something like that. i don't think we need `needle` in this case -- we just walk one by one through each singleton until we find the one we want, and return everything up to the next singleton (with special handling for "x"). wdyt? would this simplify things at all?

        (i don't think we'd need/want the "Skip the first two characters..." part of the fn above.)

        Danilo Françoso Tedeschi

        Let me know what you think of the approach I implemented.

        Line 36, Patchset 10: std::string needle = ext_id + std::string("-");
        Greg Thompson . resolved
        You can avoid the string heap allocation here by using a `std::string_view` over a local char array:
        ```cpp
        const char needle_chars[] = {ext_id, '-'};
        std::string_view needle(needle_chars, 2u);
        ```
        Danilo Françoso Tedeschi

        Done

        Line 53, Patchset 10: CHECK_GT(next_extension_pos, 0u)
        Greg Thompson . resolved

        crashing on invalid inputs seems problematic. should we not simply reject/ignore them?

        it might be nice to reference the spec (e.g., "BCP47 2.2.6. says that each extension subtag MUST be from two to eight characters long")

        Danilo Françoso Tedeschi

        Done

        Line 155, Patchset 10: std::string_view extension_value = GetExtensionValue(code_.AsString(), key);
        Greg Thompson . resolved

        maybe `GetExtensionSubtags` is more aligned with the language of the spec.

        Danilo Françoso Tedeschi

        I called it GetExtensionString as I changed the code to include the singleton in the output of it.

        Open in Gerrit

        Related details

        Attention is currently required from:
        • Greg Thompson
        • Manish Goregaokar
        Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
        Gerrit-Change-Number: 7883222
        Gerrit-PatchSet: 13
        Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
        Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Greg Thompson <g...@chromium.org>
        Gerrit-Comment-Date: Mon, 01 Jun 2026 17:03:04 +0000
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy

        Danilo Françoso Tedeschi (Gerrit)

        unread,
        1:11 PM (4 hours ago) 1:11 PM
        to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org
        Attention needed from Greg Thompson and Manish Goregaokar

        Danilo Françoso Tedeschi added 1 comment

        Danilo Françoso Tedeschi . resolved

        Created a bug for ICU4x not accepting digits as extension singletons: https://github.com/unicode-org/icu4x/issues/8018

        Open in Gerrit

        Related details

        Attention is currently required from:
        • Greg Thompson
        • Manish Goregaokar
        Submit Requirements:
        • requirement satisfiedCode-Coverage
        • requirement satisfiedCode-Owners
        • requirement is not satisfiedCode-Review
        • requirement is not satisfiedNo-Unresolved-Comments
        • requirement is not satisfiedReview-Enforcement
        Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
        Gerrit-MessageType: comment
        Gerrit-Project: chromium/src
        Gerrit-Branch: main
        Gerrit-Change-Id: I09c7b4df56fe985b46888f83c46877ff0593af00
        Gerrit-Change-Number: 7883222
        Gerrit-PatchSet: 14
        Gerrit-Owner: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Danilo Françoso Tedeschi <da...@google.com>
        Gerrit-Reviewer: Greg Thompson <g...@chromium.org>
        Gerrit-Reviewer: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Manish Goregaokar <manis...@google.com>
        Gerrit-Attention: Greg Thompson <g...@chromium.org>
        Gerrit-Comment-Date: Mon, 01 Jun 2026 17:11:05 +0000
        Gerrit-HasComments: Yes
        Gerrit-Has-Labels: No
        satisfied_requirement
        unsatisfied_requirement
        open
        diffy
        Reply all
        Reply to author
        Forward
        0 new messages