Manish Goregaokar (Gerrit)

unread,

May 28, 2026, 4:54:23 PM (4 days ago) May 28

to Danilo Françoso Tedeschi, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Danilo Françoso Tedeschi and Greg Thompson

Manish Goregaokar added 1 comment

Patchset-level comments

File-level comment, Patchset 5 (Latest):

Manish Goregaokar . resolved

ICU4X has APIs for this but I guess you don't want to have to reparse

Open in Gerrit

Related details

Attention is currently required from:

Danilo Françoso Tedeschi
Greg Thompson

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Manish Goregaokar (Gerrit)

unread,

May 28, 2026, 5:21:57 PM (4 days ago) May 28

to Danilo Françoso Tedeschi, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Danilo Françoso Tedeschi and Greg Thompson

Manish Goregaokar voted Code-Review+1

Code-Review

+1

Open in Gerrit

Related details

Attention is currently required from:

Danilo Françoso Tedeschi
Greg Thompson

Submit Requirements:

Code-Coverage
Code-Owners

Code-Review
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

open

diffy

Greg Thompson (Gerrit)

unread,

May 29, 2026, 6:13:11 AM (3 days ago) May 29

to Danilo Françoso Tedeschi, Manish Goregaokar, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Danilo Françoso Tedeschi

Greg Thompson added 16 comments

File base/i18n/bcp47_extensions.h

Line 85, Patchset 6 (Latest): static_assert(c >= 'a' && c <= 'z', "Invalid BCP47 extension identifier");

Greg Thompson . unresolved

Gemini says: BCP47 allows singletons to be digits (`0-9`) in addition to letters.
This `static_assert` should be updated to allow digit singletons:
```cpp
static_assert((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'), "Invalid BCP47 extension identifier");
```

Greg says: this appears to be correct as per https://www.rfc-editor.org/info/rfc5646/#section-2.1

Line 29, Patchset 6 (Latest): std::string_view AsStringView() const { return value_; }

Greg Thompson . unresolved

how about just `value()`?

Line 27, Patchset 6 (Latest): explicit Extension(base::PassKey<base::LanguageCode>, std::string value)

Greg Thompson . unresolved

nit: omit `explicit` for multi-arg ctors

File base/i18n/language_code.h

Line 89, Patchset 6 (Latest): // LanguageCodeBuilder::GetInstance().FromString("en-US-u-ca-gregory");

Greg Thompson . resolved

that's me! :-)

File base/i18n/language_code.cc

Line 15, Patchset 6 (Latest):std::string_view FindExtension(std::string_view code, char ext_id) {

Greg Thompson . unresolved

nit: blank line after namespace open

Line 16, Patchset 6 (Latest): std::string needle = StrCat({"-", std::string_view(&ext_id, 1), "-"});

Greg Thompson . unresolved

Gemini says: Using the address of a local `char` to create a `std::string_view` is fragile and generally considered an anti-pattern, even if it's technically memory-safe here because it's consumed immediately.

Consider using `std::string` directly:
```cpp
std::string needle = base::StrCat({"-", std::string(1, ext_id), "-"});
```
Or simply:
```cpp
std::string needle = std::string("-") + ext_id + "-";
```

Greg says: I had a similar thought when i read this, so i think it's a good idea to change it.

Line 20, Patchset 6 (Latest): return "";

Greg Thompson . unresolved

optional nit: it amounts to the same thing, but how about `{}` here to mean "empty string view"

Line 23, Patchset 6 (Latest): for (size_t i = pos + 1; i + 2 < code.size(); i++) {

Greg Thompson . unresolved

nit: i think it's slightly more readable to do `code = code.substr(pos);` or `std::string_view extension = code.substr(pos);` just before this and adjust the code below accordingly.

Line 23, Patchset 6 (Latest): for (size_t i = pos + 1; i + 2 < code.size(); i++) {

Greg Thompson . unresolved

start at index `3`? we don't need to look at any of the characters matched by `needle`.

Line 25, Patchset 6 (Latest): if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&

Greg Thompson . unresolved

Gemini says: BCP47 extension singletons can be digits (`0-9`) as per RFC 5646 section 2.1.
Using `base::IsAsciiAlpha` will fail to recognize digit singletons (e.g., `-1-`), causing it to incorrectly include the next extension as part of the current one.

You should use `base::IsAsciiAlphaNumeric` (or `absl::ascii_isalnum`) instead.

Line 25, Patchset 6 (Latest):

    if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&
        code[i + 2] == '-') {

Greg Thompson . unresolved

nit: swap the order of these last two terms so that the most expensive check is last.

Line 27, Patchset 6 (Latest): return code.substr(pos, i - pos);

Greg Thompson . unresolved

Gemini says (and Greg has not validated, so please verify its claim): Please fix this WARNING reported by autoreview issue finding: There's a critical issue here regarding the private use extension (`x`).

Unlike regular extensions, `x` allows subtags of length 1 (e.g., `en-US-x-a-b`).
If `FindExtension` is looking for `x`, this loop will incorrectly stop at `-a-` and truncate the private use value.
Furthermore, if you are searching for `a`, `code.find(needle)` might accidentally find `-a-` *inside* the private use extension!

To fix both issues, you can take advantage of the fact that `x` is always the last extension:
```cpp
  size_t x_pos = code.find("-x-");
  std::string_view search_space = code;
  if (x_pos != std::string_view::npos && ext_id != 'x') {
    // Ignore everything inside the private use extension.
    search_space = code.substr(0, x_pos);
  }

  std::string needle = std::string("-") + ext_id + "-";
  size_t pos = search_space.find(needle);

  if (pos == std::string_view::npos) {
    return "";
  }

  // 'x' is always the last extension, so it spans to the end of the string.
  if (ext_id == 'x') {
    return search_space.substr(pos);
  }

  for (size_t i = pos + 1; i + 2 < search_space.size(); i++) {
    if (search_space[i] == '-' && base::IsAsciiAlphaNumeric(search_space[i + 1]) &&
        search_space[i + 2] == '-') {
      return search_space.substr(pos, i - pos);
    }
  }

  return search_space.substr(pos);
```

Line 123, Patchset 6 (Latest): if (extension.size() < 3) {

Greg Thompson . unresolved

nit: "3" happens a lot. consider something like `kExtensionPrefixLength` or something in an unnamed namespace at the top of the file.

Line 136, Patchset 6 (Latest): return R(base::PassKey<LanguageCode>(), std::move(*value));

Greg Thompson . unresolved

`*std::move(value)` as per go/totw/181#moving-from-the-value-of-an-abslstatusor

File base/i18n/language_code_unittest.cc

File-level comment, Patchset 6 (Latest):

Greg Thompson . unresolved

Could you add some tests for malformed inputs?

Line 355, Patchset 6 (Latest): "en-US-u-ca-gregory-x-private-a-test");

Greg Thompson . unresolved

Gemini says (and, again, Greg has not validated): This test implies a misunderstanding of BCP47.

Because the private use extension (`x`) is always the last subtag sequence and it allows 1-character subtags, `-a-test` is actually a continuation of the `x` extension, NOT a new extension `a`.

The value of `x` here should be `private-a-test`, and `ext<'a'>()` should return `std::nullopt`.

(This test currently passes only because of the buggy parsing logic in `FindExtension` which incorrectly stops at 1-character subtags inside the `x` extension).

Open in Gerrit

Related details

Attention is currently required from:

Danilo Françoso Tedeschi

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review

No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Danilo Françoso Tedeschi (Gerrit)

unread,

May 29, 2026, 4:04:27 PM (3 days ago) May 29

to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Greg Thompson and Manish Goregaokar

Danilo Françoso Tedeschi added 15 comments

File base/i18n/bcp47_extensions.h

Line 85, Patchset 6: static_assert(c >= 'a' && c <= 'z', "Invalid BCP47 extension identifier");

Greg Thompson . resolved

Gemini says: BCP47 allows singletons to be digits (`0-9`) in addition to letters.
This `static_assert` should be updated to allow digit singletons:
```cpp
static_assert((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'), "Invalid BCP47 extension identifier");
```
Greg says: this appears to be correct as per https://www.rfc-editor.org/info/rfc5646/#section-2.1

Danilo Françoso Tedeschi

Done

Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }

Greg Thompson . unresolved

how about just `value()`?

Danilo Françoso Tedeschi

dunno, for it .value() is too much associated with optionality, don't you think?

Line 27, Patchset 6: explicit Extension(base::PassKey<base::LanguageCode>, std::string value)

Greg Thompson . resolved

nit: omit `explicit` for multi-arg ctors

Danilo Françoso Tedeschi

Done

File base/i18n/language_code.cc

Line 15, Patchset 6:std::string_view FindExtension(std::string_view code, char ext_id) {

Greg Thompson . resolved

nit: blank line after namespace open

Danilo Françoso Tedeschi

Done

Line 16, Patchset 6: std::string needle = StrCat({"-", std::string_view(&ext_id, 1), "-"});

Greg Thompson . resolved

Gemini says: Using the address of a local `char` to create a `std::string_view` is fragile and generally considered an anti-pattern, even if it's technically memory-safe here because it's consumed immediately.
Consider using `std::string` directly:
```cpp
std::string needle = base::StrCat({"-", std::string(1, ext_id), "-"});
```
Or simply:
```cpp
std::string needle = std::string("-") + ext_id + "-";
```
Greg says: I had a similar thought when i read this, so i think it's a good idea to change it.

Danilo Françoso Tedeschi

Done

Line 20, Patchset 6: return "";

Greg Thompson . resolved

optional nit: it amounts to the same thing, but how about `{}` here to mean "empty string view"

Danilo Françoso Tedeschi

Done

Line 23, Patchset 6: for (size_t i = pos + 1; i + 2 < code.size(); i++) {

Greg Thompson . resolved

nit: i think it's slightly more readable to do `code = code.substr(pos);` or `std::string_view extension = code.substr(pos);` just before this and adjust the code below accordingly.

Danilo Françoso Tedeschi

Done

Line 23, Patchset 6: for (size_t i = pos + 1; i + 2 < code.size(); i++) {

Greg Thompson . resolved

start at index `3`? we don't need to look at any of the characters matched by `needle`.

Danilo Françoso Tedeschi

Done

Line 25, Patchset 6: if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&

Greg Thompson . resolved

Gemini says: BCP47 extension singletons can be digits (`0-9`) as per RFC 5646 section 2.1.
Using `base::IsAsciiAlpha` will fail to recognize digit singletons (e.g., `-1-`), causing it to incorrectly include the next extension as part of the current one.
You should use `base::IsAsciiAlphaNumeric` (or `absl::ascii_isalnum`) instead.

Danilo Françoso Tedeschi

Done

Line 25, Patchset 6: if (code[i] == '-' && base::IsAsciiAlpha(code[i + 1]) &&


        code[i + 2] == '-') {

Greg Thompson . resolved

nit: swap the order of these last two terms so that the most expensive check is last.

Danilo Françoso Tedeschi

Done

Line 27, Patchset 6: return code.substr(pos, i - pos);

Greg Thompson . resolved

Gemini says (and Greg has not validated, so please verify its claim): Please fix this WARNING reported by autoreview issue finding: There's a critical issue here regarding the private use extension (`x`).
Unlike regular extensions, `x` allows subtags of length 1 (e.g., `en-US-x-a-b`).
If `FindExtension` is looking for `x`, this loop will incorrectly stop at `-a-` and truncate the private use value.
Furthermore, if you are searching for `a`, `code.find(needle)` might accidentally find `-a-` *inside* the private use extension!
To fix both issues, you can take advantage of the fact that `x` is always the last extension:
```cpp
  size_t x_pos = code.find("-x-");
  std::string_view search_space = code;
  if (x_pos != std::string_view::npos && ext_id != 'x') {
    // Ignore everything inside the private use extension.
    search_space = code.substr(0, x_pos);
  }
  std::string needle = std::string("-") + ext_id + "-";
  size_t pos = search_space.find(needle);
  if (pos == std::string_view::npos) {
    return "";
  }
  // 'x' is always the last extension, so it spans to the end of the string.
  if (ext_id == 'x') {
    return search_space.substr(pos);
  }
  for (size_t i = pos + 1; i + 2 < search_space.size(); i++) {
    if (search_space[i] == '-' && base::IsAsciiAlphaNumeric(search_space[i + 1]) &&
        search_space[i + 2] == '-') {
      return search_space.substr(pos, i - pos);
    }
  }
  return search_space.substr(pos);
```

Danilo Françoso Tedeschi

very nice catch. I rewrote it a little differently, let me know what you think.

Line 123, Patchset 6: if (extension.size() < 3) {

Greg Thompson . resolved

nit: "3" happens a lot. consider something like `kExtensionPrefixLength` or something in an unnamed namespace at the top of the file.

Danilo Françoso Tedeschi

Done

Line 136, Patchset 6: return R(base::PassKey<LanguageCode>(), std::move(*value));

Greg Thompson . resolved

`*std::move(value)` as per go/totw/181#moving-from-the-value-of-an-abslstatusor

Danilo Françoso Tedeschi

Done

File base/i18n/language_code_unittest.cc

File-level comment, Patchset 6:

Greg Thompson . resolved

Could you add some tests for malformed inputs?

Danilo Françoso Tedeschi

Done

Line 355, Patchset 6: "en-US-u-ca-gregory-x-private-a-test");

Greg Thompson . resolved

Gemini says (and, again, Greg has not validated): This test implies a misunderstanding of BCP47.
Because the private use extension (`x`) is always the last subtag sequence and it allows 1-character subtags, `-a-test` is actually a continuation of the `x` extension, NOT a new extension `a`.
The value of `x` here should be `private-a-test`, and `ext<'a'>()` should return `std::nullopt`.
(This test currently passes only because of the buggy parsing logic in `FindExtension` which incorrectly stops at 1-character subtags inside the `x` extension).

Danilo Françoso Tedeschi

Done

Open in Gerrit

Related details

Attention is currently required from:

Greg Thompson
Manish Goregaokar

Submit Requirements:

Code-Coverage
Code-Owners

Code-Review

No-Unresolved-Comments

Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Greg Thompson (Gerrit)

unread,

4:08 AM (13 hours ago) 4:08 AM

to Danilo Françoso Tedeschi, Manish Goregaokar, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Danilo Françoso Tedeschi and Manish Goregaokar

Greg Thompson added 14 comments

Commit Message

Line 15, Patchset 10 (Latest):- New API: Adds LanguageCode::GetExtension<T>(traits) to retrieve specific extension types.

Greg Thompson . unresolved

nit: please wrap cl descriptions at 72 cols as per https://chromium.googlesource.com/chromium/src/+/main/docs/contributing.md#uploading-a-change-for-review

File base/i18n/bcp47_extensions.h

Line 41, Patchset 10 (Latest):// Represents a Unicode BCP47 extension ('u-').

Greg Thompson . unresolved

...and this holds the subtags for the Unicode extension in a language tag.

Line 22, Patchset 10 (Latest):// Represents a generic BCP47 extension (e.g., 'x-' for private use).

Greg Thompson . unresolved

to be precise: an instance of this holds an extension's subtags in a language tag, right?

Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }

Greg Thompson . unresolved

how about just `value()`?

Danilo Françoso Tedeschi

dunno, for it .value() is too much associated with optionality, don't you think?

Greg Thompson

let's look at bcp47 to see what it calls this... https://www.rfc-editor.org/info/rfc5646/#section-2.2.6 calls the stuff that follows the single-character extension identifier the "extension subtags" (see items 5 and 6 -- 5 says "Each extension subtag MUST be from two to eight characters long..." and 6 says "Each singleton MUST be followed by at least one extension subtag."). so how about renaming `value_` to `subtags_` and call this getter `subtags()`? this seems most consistent with the language of the spec.

File base/i18n/language_code.h

Line 78, Patchset 10 (Latest): // Retrieves a BCP47 extension subtag from the language code.

Greg Thompson . unresolved

nit: the blob after a singleton is one or more subtags, so maybe say "Retrieves the subtag(s) for an extension to a BCP47 language tag."

Line 39, Patchset 10 (Latest):class BASE_I18N_EXPORT LanguageCode {

Greg Thompson . unresolved

now that i've been scrutinizing BCP47, i think this class is more aligned with what the spec calls a "Language Tag". the spec uses the term "code" to refer to the actual value of any subtag in a language tag (e.g., "en" is the language code for English; "US" is the region code for the US, etc). maybe too late to rename the class now, but perhaps it's worth clarifying in the doc comments.

File base/i18n/language_code.cc

Line 10, Patchset 10 (Latest):#include "base/strings/strcat.h"

Greg Thompson . unresolved

This `#include` doesn't seem to be used in this file and can be safely removed.

Line 18, Patchset 10 (Latest):// Finds the position of start of the first extension.

Greg Thompson . unresolved

nit: remove extra space

Line 33, Patchset 10 (Latest):// Returns the extension keyed by the singleton <ext_id>. It only returns the

Greg Thompson . unresolved

maybe rephrase to "Returns the subtags for the extension identified by the singleton `ext_id`." (using backticks rather than angle brackets around the argument name as per https://google.github.io/styleguide/cppguide.html#Function_Comments.)

Line 35, Patchset 10 (Latest):std::string_view GetExtensionValue(std::string_view code, char ext_id) {

Greg Thompson . unresolved

i wonder if a good way to structure this is to have a primitive similar to `FindFirstExtension` that operates on a string view and returns either the offset of the first singleton found (i.e., the middle character in the discovered '-S-' sequence) or npos (no singleton found). maybe call this `FindNextSingleton`. then this function can use it repeatedly until it finds the singleton that equals `ext_id`; aborting if it finds "x" or an empty view. when it finds a match, it calls `FindNextSingleton` one more time to find the start of the next singleton. if this isn't npos and is > this_ext + 2, we return the substr from this_ext + 2 to next_ext. or something like that. i don't think we need `needle` in this case -- we just walk one by one through each singleton until we find the one we want, and return everything up to the next singleton (with special handling for "x"). wdyt? would this simplify things at all?

(i don't think we'd need/want the "Skip the first two characters..." part of the fn above.)

Line 36, Patchset 10 (Latest): std::string needle = ext_id + std::string("-");

Greg Thompson . unresolved

You can avoid the string heap allocation here by using a `std::string_view` over a local char array:
```cpp
  const char needle_chars[] = {ext_id, '-'};
  std::string_view needle(needle_chars, 2u);
```

Line 53, Patchset 10 (Latest): CHECK_GT(next_extension_pos, 0u)

Greg Thompson . unresolved

crashing on invalid inputs seems problematic. should we not simply reject/ignore them?

it might be nice to reference the spec (e.g., "BCP47 2.2.6. says that each extension subtag MUST be from two to eight characters long")

Line 155, Patchset 10 (Latest): std::string_view extension_value = GetExtensionValue(code_.AsString(), key);

Greg Thompson . unresolved

maybe `GetExtensionSubtags` is more aligned with the language of the spec.

File base/i18n/language_code_unittest.cc

Line 130, Patchset 10 (Latest): ... EXPECT_THAT(

Greg Thompson . unresolved

remove "... "

Open in Gerrit

Related details

Attention is currently required from:

Danilo Françoso Tedeschi
Manish Goregaokar

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Danilo Françoso Tedeschi (Gerrit)

unread,

11:44 AM (5 hours ago) 11:44 AM

to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Greg Thompson and Manish Goregaokar

Danilo Françoso Tedeschi added 7 comments

Commit Message

Line 15, Patchset 10:- New API: Adds LanguageCode::GetExtension<T>(traits) to retrieve specific extension types.

Greg Thompson . resolved

nit: please wrap cl descriptions at 72 cols as per https://chromium.googlesource.com/chromium/src/+/main/docs/contributing.md#uploading-a-change-for-review

Danilo Françoso Tedeschi

formated it.

File base/i18n/bcp47_extensions.h

Line 41, Patchset 10:// Represents a Unicode BCP47 extension ('u-').

Greg Thompson . resolved

...and this holds the subtags for the Unicode extension in a language tag.

Danilo Françoso Tedeschi

Done

Line 22, Patchset 10:// Represents a generic BCP47 extension (e.g., 'x-' for private use).

Greg Thompson . resolved

to be precise: an instance of this holds an extension's subtags in a language tag, right?

Danilo Françoso Tedeschi

Added a method for returning the singleton identifier for the extension. This way we get a proper extension representation from the `Extension` class.

Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }

Greg Thompson . resolved

how about just `value()`?

Danilo Françoso Tedeschi

dunno, for it .value() is too much associated with optionality, don't you think?

Greg Thompson

let's look at bcp47 to see what it calls this... https://www.rfc-editor.org/info/rfc5646/#section-2.2.6 calls the stuff that follows the single-character extension identifier the "extension subtags" (see items 5 and 6 -- 5 says "Each extension subtag MUST be from two to eight characters long..." and 6 says "Each singleton MUST be followed by at least one extension subtag."). so how about renaming `value_` to `subtags_` and call this getter `subtags()`? this seems most consistent with the language of the spec.

Danilo Françoso Tedeschi

Done

Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }

Greg Thompson . resolved

how about just `value()`?

Danilo Françoso Tedeschi

dunno, for it .value() is too much associated with optionality, don't you think?

Greg Thompson

let's look at bcp47 to see what it calls this... https://www.rfc-editor.org/info/rfc5646/#section-2.2.6 calls the stuff that follows the single-character extension identifier the "extension subtags" (see items 5 and 6 -- 5 says "Each extension subtag MUST be from two to eight characters long..." and 6 says "Each singleton MUST be followed by at least one extension subtag."). so how about renaming `value_` to `subtags_` and call this getter `subtags()`? this seems most consistent with the language of the spec.

Danilo Françoso Tedeschi

Done

File base/i18n/language_code_unittest.cc

Line 124, Patchset 11 (Latest):


  // Singletons can be digits (except 'x', 'i').
  EXPECT_THAT(LanguageCodeBuilder::GetInstance().FromString("en-1-myext"),
              Optional(Property(&LanguageCode::ToString, Eq("en-1-myext"))));

Danilo Françoso Tedeschi . resolved

REmoving the possibility of digits as singletons as they are not supported from ICU4x now and are not even considered by the standards.

Line 130, Patchset 10: ... EXPECT_THAT(

Greg Thompson . resolved

remove "... "

Danilo Françoso Tedeschi

Done

Open in Gerrit

Related details

Attention is currently required from:

Greg Thompson
Manish Goregaokar

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Danilo Françoso Tedeschi (Gerrit)

unread,

1:03 PM (4 hours ago) 1:03 PM

to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Greg Thompson and Manish Goregaokar

Danilo Françoso Tedeschi added 10 comments

File base/i18n/bcp47_extensions.h

Line 29, Patchset 6: std::string_view AsStringView() const { return value_; }

Greg Thompson . resolved

how about just `value()`?

Danilo Françoso Tedeschi

dunno, for it .value() is too much associated with optionality, don't you think?

Danilo Françoso Tedeschi

Done

File base/i18n/language_code.h

Line 78, Patchset 10: // Retrieves a BCP47 extension subtag from the language code.

Greg Thompson . resolved

nit: the blob after a singleton is one or more subtags, so maybe say "Retrieves the subtag(s) for an extension to a BCP47 language tag."

Danilo Françoso Tedeschi

Done

Line 39, Patchset 10:class BASE_I18N_EXPORT LanguageCode {

Greg Thompson . unresolved

now that i've been scrutinizing BCP47, i think this class is more aligned with what the spec calls a "Language Tag". the spec uses the term "code" to refer to the actual value of any subtag in a language tag (e.g., "en" is the language code for English; "US" is the region code for the US, etc). maybe too late to rename the class now, but perhaps it's worth clarifying in the doc comments.

Danilo Françoso Tedeschi

I do not think it is too late to rename it. I pick the name just because in google3 that is what they ended up choosing. I am ok with changing it.

the other option would be to call it `Locale` as well, but I also agree `LanguageTag` also sounds more aligned with BCP47.

File base/i18n/language_code.cc

Line 10, Patchset 10:#include "base/strings/strcat.h"

Greg Thompson . resolved

This `#include` doesn't seem to be used in this file and can be safely removed.

Danilo Françoso Tedeschi

Done

Line 18, Patchset 10:// Finds the position of start of the first extension.

Greg Thompson . resolved

nit: remove extra space

Danilo Françoso Tedeschi

Done

Line 33, Patchset 10:// Returns the extension keyed by the singleton <ext_id>. It only returns the

Greg Thompson . resolved

maybe rephrase to "Returns the subtags for the extension identified by the singleton `ext_id`." (using backticks rather than angle brackets around the argument name as per https://google.github.io/styleguide/cppguide.html#Function_Comments.)

Danilo Françoso Tedeschi

Done

Line 35, Patchset 10:std::string_view GetExtensionValue(std::string_view code, char ext_id) {

Greg Thompson . resolved

i wonder if a good way to structure this is to have a primitive similar to `FindFirstExtension` that operates on a string view and returns either the offset of the first singleton found (i.e., the middle character in the discovered '-S-' sequence) or npos (no singleton found). maybe call this `FindNextSingleton`. then this function can use it repeatedly until it finds the singleton that equals `ext_id`; aborting if it finds "x" or an empty view. when it finds a match, it calls `FindNextSingleton` one more time to find the start of the next singleton. if this isn't npos and is > this_ext + 2, we return the substr from this_ext + 2 to next_ext. or something like that. i don't think we need `needle` in this case -- we just walk one by one through each singleton until we find the one we want, and return everything up to the next singleton (with special handling for "x"). wdyt? would this simplify things at all?
(i don't think we'd need/want the "Skip the first two characters..." part of the fn above.)

Danilo Françoso Tedeschi

Let me know what you think of the approach I implemented.

Line 36, Patchset 10: std::string needle = ext_id + std::string("-");

Greg Thompson . resolved

You can avoid the string heap allocation here by using a `std::string_view` over a local char array:
```cpp
  const char needle_chars[] = {ext_id, '-'};
  std::string_view needle(needle_chars, 2u);
```

Danilo Françoso Tedeschi

Done

Line 53, Patchset 10: CHECK_GT(next_extension_pos, 0u)

Greg Thompson . resolved

crashing on invalid inputs seems problematic. should we not simply reject/ignore them?

it might be nice to reference the spec (e.g., "BCP47 2.2.6. says that each extension subtag MUST be from two to eight characters long")

Danilo Françoso Tedeschi

Done

Line 155, Patchset 10: std::string_view extension_value = GetExtensionValue(code_.AsString(), key);

Greg Thompson . resolved

maybe `GetExtensionSubtags` is more aligned with the language of the spec.

Danilo Françoso Tedeschi

I called it GetExtensionString as I changed the code to include the singleton in the output of it.

Open in Gerrit

Related details

Attention is currently required from:

Greg Thompson
Manish Goregaokar

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Danilo Françoso Tedeschi (Gerrit)

unread,

1:11 PM (4 hours ago) 1:11 PM

to Manish Goregaokar, Greg Thompson, Chromium LUCI CQ, chromium...@chromium.org, chromium-a...@chromium.org, extension...@chromium.org, jshin...@chromium.org

Attention needed from Greg Thompson and Manish Goregaokar

Danilo Françoso Tedeschi added 1 comment

Patchset-level comments

File-level comment, Patchset 14 (Latest):

Danilo Françoso Tedeschi . resolved

Created a bug for ICU4x not accepting digits as extension singletons: https://github.com/unicode-org/icu4x/issues/8018

Open in Gerrit

Related details

Attention is currently required from:

Greg Thompson
Manish Goregaokar

Submit Requirements:

Code-Coverage
Code-Owners
Code-Review
No-Unresolved-Comments
Review-Enforcement

Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings.

Gerrit

satisfied_requirement

unsatisfied_requirement

open

diffy

Reply all

Reply to author

Forward

Add BCP47 initial extension support to LanguageCode [chromium/src : main]

Manish Goregaokar (Gerrit)

Manish Goregaokar added 1 comment

Related details

Manish Goregaokar (Gerrit)

Manish Goregaokar voted Code-Review+1

Related details

Greg Thompson (Gerrit)

Greg Thompson added 16 comments

Related details

Danilo Françoso Tedeschi (Gerrit)

Danilo Françoso Tedeschi added 15 comments

Related details

Greg Thompson (Gerrit)

Greg Thompson added 14 comments

Related details

Danilo Françoso Tedeschi (Gerrit)

Danilo Françoso Tedeschi added 7 comments

Related details

Danilo Françoso Tedeschi (Gerrit)

Danilo Françoso Tedeschi added 10 comments

Related details

Danilo Françoso Tedeschi (Gerrit)

Danilo Françoso Tedeschi added 1 comment

Related details