WTF::String and spans / std::string_views


Daniel Cheng

Jun 5, 2025, 8:53:11 PM
to platform-architecture-dev, TAMURA, Kent
I've been reviewing a lot of unsafe buffer changes, and I noticed WTF::String often uses spans of characters instead of string_views. Does anyone know why we do this? My guess (dtapuska@ had the same intuition) is that it's because WTF::String's 8-bit representation is Latin1 rather than ASCII (and therefore is not UTF-8), while std::string_view is implicitly UTF-8.
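To make the Latin1 vs. UTF-8 distinction concrete, here is a minimal standalone sketch in plain C++. Latin1ToUtf8 is a hypothetical helper written for this illustration, not a WTF or base API. It shows that any Latin1 byte at or above 0x80 is not valid UTF-8 on its own and has to be transcoded into two bytes:

```cpp
#include <string>
#include <string_view>

// Hypothetical helper for illustration only (not a Chromium API).
// Each Latin1 byte maps to the Unicode code point with the same value, so
// bytes below 0x80 are copied as-is and bytes >= 0x80 become a two-byte
// UTF-8 sequence.
std::string Latin1ToUtf8(std::string_view latin1) {
  std::string utf8;
  utf8.reserve(latin1.size());
  for (char ch : latin1) {
    const unsigned char c = static_cast<unsigned char>(ch);
    if (c < 0x80) {
      utf8.push_back(static_cast<char>(c));
    } else {
      utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
      utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
    }
  }
  return utf8;
}
```

For instance, the single Latin1 byte 0xE9 (é) becomes the two UTF-8 bytes 0xC3 0xA9, while pure ASCII input comes back unchanged.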

However, a lot of code elsewhere assumes that spans of chars and std::string_views are freely convertible, and span.h itself assumes that spans of chars are printable as UTF-8. For example, this snippet has to pass string through base::as_bytes() before it can be compared against Span8():

  if (it != StaticStrings().end()) {
    DCHECK_EQ(it->value->Span8(), base::as_bytes(string));
    return it->value;
  }


It also makes APIs harder to use. For example, constructing WTF::String from literals is very common, and using base::span means we have:
  explicit String(base::span<const char> latin1_data)
      : String(base::as_bytes(latin1_data)) {}
  explicit String(const std::string& s) : String(base::as_byte_span(s)) {}
  String(const char* characters)  // NOLINT(google-explicit-constructor)
      : String(characters ? base::span(std::string_view(characters))
                          : base::span<const char>()) {}

But all of these could be replaced with a single WTF::String(std::string_view) overload.
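As a rough sketch of what that could look like, assuming a simplified stand-in class (MiniString and ViewOrEmpty are hypothetical names, not WTF code), a single std::string_view constructor already accepts literals, std::string, and std::string_view:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for WTF::String; it just copies the bytes.
class MiniString {
 public:
  // One overload: string literals, std::string, and std::string_view all
  // convert to std::string_view implicitly.
  MiniString(std::string_view latin1_data)  // NOLINT(google-explicit-constructor)
      : data_(latin1_data) {}

  std::size_t length() const { return data_.size(); }
  bool IsEmpty() const { return data_.empty(); }

 private:
  std::string data_;
};

// Hypothetical helper. Constructing a std::string_view from a null
// const char* is undefined behavior, so callers that may hold a null pointer
// (which the existing const char* constructor tolerates) need a guard.
inline std::string_view ViewOrEmpty(const char* characters) {
  return characters ? std::string_view(characters) : std::string_view();
}
```

One caveat with this sketch: the null-pointer case handled by the current const char* constructor would have to move to call sites or a helper like ViewOrEmpty, since std::string_view itself cannot represent "constructed from null" safely.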

There are some trickier cases with embedded NULs, but those don't need a span either: we have MakeStringViewWithNulChars() to create a std::string_view with NULs (this is error-prone, but we can and should add a clang-tidy check for constructing string_views from literals with embedded NULs).
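The pitfall with embedded NULs can be shown in a few lines of standard C++. LiteralViewWithNuls below is a hypothetical helper written for this illustration; it is not the implementation of MakeStringViewWithNulChars(), only a sketch of the same idea (take the length from the array type instead of scanning for the first NUL):

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper for illustration. N counts the literal's implicit
// terminating NUL, so the view covers the first N - 1 characters.
template <std::size_t N>
constexpr std::string_view LiteralViewWithNuls(const char (&literal)[N]) {
  return std::string_view(literal, N - 1);
}

// The const char* constructor stops at the first NUL...
static_assert(std::string_view("a\0b").size() == 1);
// ...while the array-based helper keeps the embedded NUL.
static_assert(LiteralViewWithNuls("a\0b").size() == 3);
```

The silent truncation in the first static_assert is the kind of mistake a clang-tidy check could catch.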

Using spans also introduces more conversions with other code that *doesn't* use spans, e.g. url_parse uses std::string_view, not spans.

1. Should we be using std::string_view / std::u16string_view more instead of spans?
2. If we're concerned about ASCII vs Latin1 vs UTF8, should we add checks / encourage more factory functions, e.g. WTF::String::FromAscii / WTF::String::FromLatin1 / WTF::String::FromUTF8?
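To illustrate option 2, here is a minimal sketch of encoding-named factory functions with a validity check. Everything in it is an assumption for illustration: EncodedString is a hypothetical stand-in, not WTF::String, and returning std::optional on bad input is just one possible policy (real code might DCHECK or CHECK instead). FromUTF8 is omitted because it would need a full UTF-8 decoder.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Returns true if every byte is in the 7-bit ASCII range.
inline bool IsAllAscii(std::string_view s) {
  return std::all_of(s.begin(), s.end(), [](char c) {
    return static_cast<unsigned char>(c) < 0x80;
  });
}

// Hypothetical stand-in class; stores 8-bit data as Latin1 bytes.
class EncodedString {
 public:
  // Rejects input that is not pure ASCII.
  static std::optional<EncodedString> FromAscii(std::string_view ascii) {
    if (!IsAllAscii(ascii)) {
      return std::nullopt;
    }
    return EncodedString(std::string(ascii));
  }

  // Every byte value is valid Latin1, so there is nothing to validate.
  static EncodedString FromLatin1(std::string_view latin1) {
    return EncodedString(std::string(latin1));
  }

  const std::string& bytes() const { return bytes_; }

 private:
  explicit EncodedString(std::string bytes) : bytes_(std::move(bytes)) {}
  std::string bytes_;
};
```

With named factories, the call site states which encoding the caller believes the bytes are in, and the check runs in one place.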

Daniel

TAMURA, Kent

Jun 11, 2025, 11:43:17 PM
to Daniel Cheng, platform-architecture-dev
If using std::*string_view simplifies the source code without negatively impacting the efficiency of the compiled code, then I think it would be a good idea to proceed with that.

The current code's focus on base::span is a direct result of simply replacing pointer+size arguments with base::span.
 
> 2. If we're concerned about ASCII vs Latin1 vs UTF8, should we add checks / encourage more factory functions, e.g. WTF::String::FromAscii / WTF::String::FromLatin1 / WTF::String::FromUTF8?

+1 to the idea of encouraging factory functions, though I'd like to keep the implicit conversion from string literals.
 


--
TAMURA Kent
Software Engineer, Google

