I've been reviewing a lot of unsafe buffer changes, and I noticed that WTF::String often uses spans of characters instead of string_views. Does anyone know why we do this? My guess (dtapuska@ had the same intuition) is that it's because WTF::String's 8-bit representation is Latin1, not ASCII (and therefore not UTF-8), while std::string_view is implicitly assumed to be UTF-8.
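To make the encoding concern concrete, here's a tiny standalone sketch (not Blink code) showing that the same code point has different bytes in the two encodings:

  #include <cstdio>
  #include <string_view>

  int main() {
    // U+00E9 ('é') is the single byte 0xE9 in Latin1...
    constexpr unsigned char kLatin1[] = {0xE9};
    // ...but the two bytes 0xC3 0xA9 in UTF-8, so the byte sequences are not
    // interchangeable even though they name the same character.
    constexpr std::string_view kUtf8 = "\xC3\xA9";
    std::printf("latin1: %zu byte(s), utf8: %zu byte(s)\n",
                sizeof(kLatin1), kUtf8.size());  // prints 1 and 2
  }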
However, elsewhere there's a lot of assumption that spans of chars and std::string_views are freely convertible, and span.h itself assumes that spans of chars are printable as UTF-8. So, for example, this code snippet already relies on that interchangeability:
  if (it != StaticStrings().end()) {
    DCHECK_EQ(it->value->Span8(), base::as_bytes(string));
    return it->value;
  }
Using spans also makes APIs harder to use. For example, constructing a WTF::String from literals is very common, and using base::span means we have:
  explicit String(base::span<const char> latin1_data)
      : String(base::as_bytes(latin1_data)) {}

  explicit String(const std::string& s) : String(base::as_byte_span(s)) {}

  String(const char* characters)  // NOLINT(google-explicit-constructor)
      : String(characters ? base::span(std::string_view(characters))
                          : base::span<const char>()) {}
But all of these could be replaced with a single WTF::String(std::string_view) overload.
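Rough sketch of what I mean (the name is illustrative, and it assumes the byte-span constructor stays as the internal entry point and that base::as_byte_span accepts a string_view here):

  String(std::string_view latin1_data)  // NOLINT(google-explicit-constructor)
      : String(base::as_byte_span(latin1_data)) {}

Call sites passing literals would go through the implicit const char* -> std::string_view conversion; the current nullptr-tolerant path is the one thing that would need an explicit decision, since a string_view can't be built from a null pointer.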
There are some trickier cases with embedded NULs, but those don't need a span either: we have MakeStringViewWithNulChars() to create a std::string_view that keeps NUL characters. (This is error-prone, but we can and should add a clang-tidy check for constructing string_views from literals with embedded NULs.)
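For example (the exact size semantics of the helper are from memory, so treat the second result as an assumption):

  // std::string_view's const char* constructor silently stops at the first
  // NUL; this is exactly what a clang-tidy check could flag:
  std::string_view truncated = "foo\0bar";             // size() == 3
  // The helper keeps the embedded NUL (trailing terminator excluded):
  auto full = MakeStringViewWithNulChars("foo\0bar");  // size() == 7 (assumed)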
Using spans also introduces more conversions at the boundaries with other code that *doesn't* use spans, e.g. url_parse uses std::string_view, not spans.
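i.e. every span-based call site ends up doing a manual hop like this (ConsumeUrlPiece is just a hypothetical stand-in for a string_view-taking API):

  void ConsumeUrlPiece(std::string_view piece);  // hypothetical consumer

  void Example(base::span<const char> chars) {
    // Extra conversion just to cross the API boundary:
    ConsumeUrlPiece(std::string_view(chars.data(), chars.size()));
  }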
1. Should we be using std::string_view / std::u16string_view more instead of spans?
2. If we're concerned about ASCII vs Latin1 vs UTF8, should we add checks / encourage more factory functions, e.g. WTF::String::FromAscii / WTF::String::FromLatin1 / WTF::String::FromUTF8?
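To make (2) concrete, roughly what I have in mind (FromUTF8 already exists; the other spellings are hypothetical, and base::IsStringASCII is the kind of existing helper the ASCII variant could DCHECK with):

  static String FromAscii(std::string_view ascii) {
    DCHECK(base::IsStringASCII(ascii));          // reject non-ASCII input early
    return String(base::as_byte_span(ascii));    // ASCII is a subset of Latin1
  }
  static String FromLatin1(std::string_view latin1) {
    return String(base::as_byte_span(latin1));
  }
  static String FromUTF8(std::string_view utf8);  // already exists today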
Daniel