I've been reviewing a lot of unsafe buffer changes, and I noticed that WTF::String often uses spans of characters instead of string_views. Does anyone know why we do this? My guess (dtapuska@ had the same intuition) is that it's because WTF::String's 8-bit representation is Latin1, not ASCII (and therefore not UTF-8), while std::string_view is implicitly assumed to be UTF-8.
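To make the encoding concern concrete, here's a tiny standalone sketch (not Blink code) showing that the same code point has different bytes in the two encodings:

  #include <cstdio>
  #include <string_view>

  int main() {
    // U+00E9 ('é') is the single byte 0xE9 in Latin1...
    constexpr unsigned char kLatin1[] = {0xE9};
    // ...but the two bytes 0xC3 0xA9 in UTF-8, so the byte sequences are not
    // interchangeable even though they name the same character.
    constexpr std::string_view kUtf8 = "\xC3\xA9";
    std::printf("latin1: %zu byte(s), utf8: %zu byte(s)\n",
                sizeof(kLatin1), kUtf8.size());  // prints 1 and 2
  }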
However, elsewhere there's a lot of assumption that spans of chars and std::string_views are freely convertible, and span.h itself assumes that spans of chars are printable as UTF-8. So, for example, this code snippet already relies on that interchangeability:
  if (it != StaticStrings().end()) {
    DCHECK_EQ(it->value->Span8(), base::as_bytes(string));
    return it->value;
  }
Using spans also makes APIs harder to use. For example, constructing a WTF::String from literals is very common, and using base::span means we have:
  explicit String(base::span<const char> latin1_data)
      : String(base::as_bytes(latin1_data)) {}

  explicit String(const std::string& s) : String(base::as_byte_span(s)) {}

  String(const char* characters)  // NOLINT(google-explicit-constructor)
      : String(characters ? base::span(std::string_view(characters))
                          : base::span<const char>()) {}
But all of these could be replaced with a single WTF::String(std::string_view) overload.
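Rough sketch of what I mean (the name is illustrative, and it assumes the byte-span constructor stays as the internal entry point and that base::as_byte_span accepts a string_view here):

  String(std::string_view latin1_data)  // NOLINT(google-explicit-constructor)
      : String(base::as_byte_span(latin1_data)) {}

Call sites passing literals would go through the implicit const char* -> std::string_view conversion; the current nullptr-tolerant path is the one thing that would need an explicit decision, since a string_view can't be built from a null pointer.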
There are some trickier cases with embedded NULs, but those don't need a span either: we have MakeStringViewWithNulChars() to create a std::string_view that keeps NUL characters. (This is error-prone, but we can and should add a clang-tidy check for constructing string_views from literals with embedded NULs.)
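For example (the exact size semantics of the helper are from memory, so treat the second result as an assumption):

  // std::string_view's const char* constructor silently stops at the first
  // NUL; this is exactly what a clang-tidy check could flag:
  std::string_view truncated = "foo\0bar";             // size() == 3
  // The helper keeps the embedded NUL (trailing terminator excluded):
  auto full = MakeStringViewWithNulChars("foo\0bar");  // size() == 7 (assumed)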
Using spans also introduces more conversions at the boundaries with other code that *doesn't* use spans, e.g. url_parse uses std::string_view, not spans.
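i.e. every span-based call site ends up doing a manual hop like this (ConsumeUrlPiece is just a hypothetical stand-in for a string_view-taking API):

  void ConsumeUrlPiece(std::string_view piece);  // hypothetical consumer

  void Example(base::span<const char> chars) {
    // Extra conversion just to cross the API boundary:
    ConsumeUrlPiece(std::string_view(chars.data(), chars.size()));
  }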
1. Should we be using std::string_view / std::u16string_view more instead of spans?
2. If we're concerned about ASCII vs Latin1 vs UTF8, should we add checks / encourage more factory functions, e.g. WTF::String::FromAscii / WTF::String::FromLatin1 / WTF::String::FromUTF8?
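To make (2) concrete, roughly what I have in mind (FromUTF8 already exists; the other spellings are hypothetical, and base::IsStringASCII is the kind of existing helper the ASCII variant could DCHECK with):

  static String FromAscii(std::string_view ascii) {
    DCHECK(base::IsStringASCII(ascii));          // reject non-ASCII input early
    return String(base::as_byte_span(ascii));    // ASCII is a subset of Latin1
  }
  static String FromLatin1(std::string_view latin1) {
    return String(base::as_byte_span(latin1));
  }
  static String FromUTF8(std::string_view utf8);  // already exists today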
Daniel