string_view::is_null()


abolz...@gmail.com

Dec 27, 2013, 7:52:12 AM
to std-pr...@isocpp.org

Is there a reason why string_view is not nullable?

Nullable here means default-constructed or constructed from a nullptr
(and a zero length). An is_null() (or null()) member function could
then be added to distinguish such a string_view from an empty string.

This should not affect data() which returns a non-null pointer.

Marshall Clow

Dec 28, 2013, 12:01:34 AM
to std-pr...@isocpp.org
On Dec 27, 2013, at 4:52 AM, abolz...@gmail.com wrote:

Is there a reason why string_view is not nullable?

Nullable here means default constructed or constructed from a nullptr
(and a zero length). Then adding a is_null() (or null()) member
function which can then be used to distinguish the string_view from an
empty string.

I think that attempting to distinguish between an empty string_view that “used to point at something”
and an empty string view “that never pointed at anything” is a use case to be discouraged, rather than
something that should be enshrined in the standard.

This should not affect data() which returns a non-null pointer.

I disagree here with the proposal as currently written.
I believe that the requirement that data() always return a non-null pointer is misguided.

If sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

An empty string view need not point *anywhere*, or have any data behind it.

— Marshall

P.S. In my implementation, I’m tempted to have a default-constructed string_view return a value of 1 from data().

Jeffrey Yasskin

Dec 28, 2013, 12:59:37 PM
to std-pr...@isocpp.org
On Fri, Dec 27, 2013 at 9:01 PM, Marshall Clow <mclow...@gmail.com> wrote:
On Dec 27, 2013, at 4:52 AM, abolz...@gmail.com wrote:

Is there a reason why string_view is not nullable?

Nullable here means default constructed or constructed from a nullptr
(and a zero length). Then adding a is_null() (or null()) member
function which can then be used to distinguish the string_view from an
empty string.

I think that attempting to distinguish between an empty string_view that “used to point at something”
and an empty string view “that never pointed at anything” is a use case to be discouraged, rather than
something that should be enshrined in the standard.

This should not affect data() which returns a non-null pointer.

I disagree here with the proposal as currently written.
I believe that the requirement that data() always return a non-null pointer is misguided. 

if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

The thing is, if nullptr is a possible return value for data(), people do start using it to convey information. If they get a null out exactly when they put a null in, it is telling them something useful. The only way to "discourage" it is to make it never happen. If we let string_view::data() pass through null, we're enshrining the use you don't like in the standard.
 

An empty string view need not point *anywhere*, or have any data behind it.

— Marshall

P.S. In my implementation, I’m tempted to have a default-constructed string_view return a value of 1 from data().

That's totally fine on all the platforms I've ever used. There are a few I've heard of where it could be a problem, but no need to worry about those until your implementation gets ported to them.

Peter Bigot

Dec 28, 2013, 2:21:12 PM
to std-pr...@isocpp.org

On Friday, December 27, 2013 11:01:34 PM UTC-6, Marshall wrote:
On Dec 27, 2013, at 4:52 AM, abolz...@gmail.com wrote:

Is there a reason why string_view is not nullable?

Nullable here means default constructed or constructed from a nullptr
(and a zero length). Then adding a is_null() (or null()) member
function which can then be used to distinguish the string_view from an
empty string.

I think that attempting to distinguish between an empty string_view that “used to point at something”
and an empty string view “that never pointed at anything” is a use case to be discouraged, rather than
something that should be enshrined in the standard.

-1 ; I think it's an obvious and valuable feature.

This should not affect data() which returns a non-null pointer.

I disagree here with the proposal as currently written.
I believe that the requirement that data() always return a non-null pointer is misguided.

if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

My understanding (which I'd like confirmed) is that sv.data() is still useful as it provides a valid position within the referenced object, which might be referenced by other string_view instances or in its native form.

Thus, for example, a function object conforming to the delimiters interface documented at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3593.html#delimiters might return a zero-length string to denote a zero-length delimiter (e.g., the position following a period in an English sentence).  In this case it is the value of sv.data() that tells the caller where the delimited sequence ends and the next sequence begins.

 

An empty string view need not point *anywhere*, or have any data behind it.

— Marshall

P.S. In my implementation, I’m tempted to have a default-constructed string_view return a value of 1 from data().


As I understand it there is no expression that can legitimately dereference that pointer, so it sounds like a clean solution to the problem.

Peter

Peter Bigot

Dec 28, 2013, 2:27:52 PM
to std-pr...@isocpp.org


On Saturday, December 28, 2013 1:21:12 PM UTC-6, Peter Bigot wrote:

On Friday, December 27, 2013 11:01:34 PM UTC-6, Marshall wrote:

if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

My understanding (which I'd like confirmed) is that sv.data() is still useful as it provides a valid position within the referenced object, which might be referenced by other string_view instances or in its native form.

Thus, for example, a function object conforming to the delimiters interface documented at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3593.html#delimiters might return a zero-length string to denote a zero-length delimiter (e.g., the position following a period in an English sentence).  In this case it is the value of sv.data() that tells the caller where the delimited sequence ends and the next sequence begins.


Sorry; I misread the details of that proposal: it explicitly uses an empty delimiter to indicate "not found" (too bad; zero-length delimiters would be useful).  It does, though, specify "not found" as an empty string_view where sv.data() is the end of the referenced string, which avoids the need to special-case the end of the last token.

Peter

Marshall Clow

Dec 29, 2013, 12:35:17 PM
to std-pr...@isocpp.org

On Dec 28, 2013, at 11:21 AM, Peter Bigot <big...@acm.org> wrote:


On Friday, December 27, 2013 11:01:34 PM UTC-6, Marshall wrote:
On Dec 27, 2013, at 4:52 AM, abolz...@gmail.com wrote:

Is there a reason why string_view is not nullable?

Nullable here means default constructed or constructed from a nullptr
(and a zero length). Then adding a is_null() (or null()) member
function which can then be used to distinguish the string_view from an
empty string.

I think that attempting to distinguish between an empty string_view that “used to point at something”
and an empty string view “that never pointed at anything” is a use case to be discouraged, rather than
something that should be enshrined in the standard.

-1 ; I think it's an obvious and valuable feature.

This should not affect data() which returns a non-null pointer.

I disagree here with the proposal as currently written.
I believe that the requirement that data() always return a non-null pointer is misguided.

if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

My understanding (which I'd like confirmed) is that sv.data() is still useful as it provides a valid position within the referenced object, which might be referenced by other string_view instances or in its native form.

A string view refers to a half-open range (just like an iterator pair).
sv.data() is a pointer to the beginning of that range, just like sv.begin() is an iterator to the beginning of that range.

If sv.size() == 0, then dereferencing sv.data() or sv.begin() is undefined behavior.

Also, I can’t find anywhere in the specification that guarantees that the following code works:

const char *foo = "abc";
string_view sv1 ( foo, 3 ); // size() == 3
sv1.remove_suffix (3); // size() == 0
assert ( sv1.size() == 0);
assert ( sv1.data() == foo ); // will probably work - but might not.

— Marshall

Peter Bigot

Dec 29, 2013, 1:15:50 PM
to std-pr...@isocpp.org
On Sun, Dec 29, 2013 at 11:35 AM, Marshall Clow <mclow...@gmail.com> wrote:

On Dec 28, 2013, at 11:21 AM, Peter Bigot <big...@acm.org> wrote:


On Friday, December 27, 2013 11:01:34 PM UTC-6, Marshall wrote:
On Dec 27, 2013, at 4:52 AM, abolz...@gmail.com wrote:

This should not affect data() which returns a non-null pointer.

I disagree here with the proposal as currently written.
I believe that the requirement that data() always return a non-null pointer is misguided.

if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

My understanding (which I'd like confirmed) is that sv.data() is still useful as it provides a valid position within the referenced object, which might be referenced by other string_view instances or in its native form.

A string view refers to a half-open range (just like an iterator pair).
sv.data() is a pointer to the beginning of that range, just like sv.begin() is an iterator to the beginning of that range.

Agreed.
 

If sv.size() == 0, then dereferencing sv.data() or sv.begin() is undefined behavior.

Agreed.

Also, per [string.view.access] the requirements disallow expression sv[0] in this case (unlike str[0] if str is a std::string, where the expression is permitted but the referenced object may not be modified).  A subtlety worth keeping in mind (libstdc++ missed that one).
 

Also, I can’t find anywhere in the specification that guarantees that the following code works:

const char *foo = "abc";
string_view sv1 ( foo, 3 ); // size() == 3

The definition of the constructor in section x.1 [string.view.cons] guarantees sv1.data() == foo.
 
sv1.remove_suffix (3); // size() == 0

Per [string.view.modifiers] this is equivalent to sv1 = sv1.substr(0, sv1.size() - 3).

Per [string.view.ops] the definition of substr makes the RHS equivalent to string_view(sv1.data() + 0, sv1.size() - 3).

Same constructor as above, so if that temporary object is tv, tv.data() == sv1.data()+0 == foo+0 == foo.

 
assert ( sv1.size() == 0);
assert ( sv1.data() == foo ); // will probably work - but might not.

AFAICT this must be true.

If my reasoning is wrong, string_view becomes significantly less valuable to me, so do please point out my error.

FWIW my first reading of N3593 was probably correct and zero-length string_view values returned from a delimiter find()  may point into the middle of an input range; they only indicate "not found" when the data() member references the end position of the input.
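
The derivation above can be checked with a minimal sketch. It assumes the `std::string_view` that later shipped in C++17 rather than the draft under discussion, and the helper names `keeps_data_after_remove_suffix` and `keeps_data_after_substr` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: shrink a view to zero length with remove_suffix and
// report whether data() still points at the start of the original buffer.
inline bool keeps_data_after_remove_suffix(const char* buf, std::size_t len) {
    std::string_view sv(buf, len);
    sv.remove_suffix(len);  // size() becomes 0
    return sv.empty() && sv.data() == buf;
}

// Same check via substr, following the "equivalent to" wording quoted above.
inline bool keeps_data_after_substr(const char* buf, std::size_t len) {
    std::string_view sv(buf, len);
    std::string_view tv = sv.substr(0, sv.size() - len);  // empty prefix
    return tv.empty() && tv.data() == buf;
}
```

Calling either helper with `"abc"` and `3` should yield `true` on a conforming C++17 library, which matches the reasoning that both asserts must pass.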
 
Peter

abolz...@gmail.com

Dec 29, 2013, 2:15:07 PM
to std-pr...@isocpp.org
Yes.
When used as a replacement for "string const&" one should already test for
non-null pointers, I think. The precondition for the constructors just makes
it impossible to use string_view as a pair (char const*, length).

Jeffrey Yasskin

Dec 29, 2013, 11:07:21 PM
to std-pr...@isocpp.org
Yeah, because this all makes sv1 equivalent to string_view(foo, 0),
and that guarantees that data()==foo, the assert()s have to pass.
Hopefully the new version at
https://github.com/google/cxx-std-draft/blob/string-ref-paper/string_view.html,
which describes things in terms of exposition-only data_ and size_
members, makes this clearer.

Jeffrey

Olaf van der Spek

Jan 7, 2014, 6:23:38 AM
to std-pr...@isocpp.org
On Monday, December 30, 2013 5:07:21 AM UTC+1, Jeffrey Yasskin wrote:

Hopefully the new version at
https://github.com/google/cxx-std-draft/blob/string-ref-paper/string_view.html,
which describes things in terms of exposition-only data_ and size_
members, makes this clearer.


Is an HTML view of that document available?

Jeffrey Yasskin

Jan 8, 2014, 2:14:24 PM
to std-pr...@isocpp.org

Olaf van der Spek

Jan 15, 2014, 7:20:11 AM
to std-pr...@isocpp.org
On Saturday, December 28, 2013 6:59:37 PM UTC+1, Jeffrey Yasskin wrote:
I think that attempting to distinguish between an empty string_view that “used to point at something”
and an empty string view “that never pointed at anything” is a use case to be discouraged, rather than
something that should be enshrined in the standard.

This should not affect data() which returns a non-null pointer.

I disagree here with the proposal as currently written.
I believe that the requirement that data() always return a non-null pointer is misguided. 

if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

The thing is, if nullptr is a possible return value for data(), people do start using it to convey information. If they get a null out exactly when they put a null in, it is telling them something useful. The only way to "discourage" it is to make it never happen. If we let string_view::data() pass through null, we're enshrining the use you don't like in the standard.
 
In some cases being able to distinguish between null and empty is required. string_view could easily support this distinction; what's the reason it should be discouraged?

optional<string_view> is an obvious suggestion but that increases code complexity at the call site when the distinction is not required.

Olaf van der Spek

Jan 15, 2014, 7:28:39 AM
to std-pr...@isocpp.org
On Saturday, December 28, 2013 6:01:34 AM UTC+1, Marshall wrote:
if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

Are you sure?
IMO an empty string_view could still point into a string, and I should be able to use that pointer to, for example, construct a new string_view.

Thiago Macieira

Jan 15, 2014, 12:22:22 PM
to std-pr...@isocpp.org
On quarta-feira, 15 de janeiro de 2014 04:20:11, Olaf van der Spek wrote:
> In some cases being able to distinguish between null and empty is required.
> string_view could easily support this distinction, what's the reason it
> should be discouraged?

Just FYI

Once std::string_view came into focus, I looked into QStringRef to see if it
could be updated for Qt 6 and support the same semantics. Whereas
std::string_view consists of two iterators (begin and end), QStringRef
actually contains a pointer to the original QString, plus an offset and length.
That's what I tried to change.

There were two reasons why that didn't work: one was specifically the isNull()
method, since the "nullness" is a property of the QString which a pair of
iterators cannot represent. That is, a null QString and an empty-but-not-null
QString can have the same data pointer.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Jeffrey Yasskin

Jan 15, 2014, 1:26:14 PM
to std-pr...@isocpp.org
On Wed, Jan 15, 2014 at 9:22 AM, Thiago Macieira <thi...@macieira.org> wrote:
> On quarta-feira, 15 de janeiro de 2014 04:20:11, Olaf van der Spek wrote:
>> In some cases being able to distinguish between null and empty is required.
>> string_view could easily support this distinction, what's the reason it
>> should be discouraged?
>
> Just FYI
>
> Once std::string_view came into focus, I looked into QStringRef to see if it
> could be updated for Qt 6 and support the same semantics. Whereas
> std::string_view consists of two iterators (begin and end), QStringRef
> actually contains a pointer to the original QString, plus an offset and length.
> That's what I tried to change.
>
> There were two reasons why that didn't work: one was specifically the isNull()
> method, since the "nullness" is a property of the QString which a pair of
> iterators cannot represent. That is, a null QString and an empty-but-not-null
> QString can have the same data pointer.

Can you look over some of the uses of isNull() in Qt and see what
fraction are ones where both empty and null are possible values, and
the distinction is useful rather than just forcing the API to document
which one is used?

Thiago Macieira

Jan 15, 2014, 2:45:46 PM
to std-pr...@isocpp.org
On quarta-feira, 15 de janeiro de 2014 10:26:14, Jeffrey Yasskin wrote:
> Can you look over some of the uses of isNull() in Qt and see what
> fraction are ones where both empty and null are possible values, and
> the distinction is useful rather than just forcing the API to document
> which one is used?

In the vast majority of cases, people just need the empty case. That's what we
recommend. And unless some API in specific documents the distinction, using
isEmpty() is the right thing to do.

The null option comes in handy when you need to distinguish a field that might
be present but empty from a field that isn't present. QUrl makes use of that
and the equivalent std::networking::uri proposal just uses
std::optional<String> (String is a template). For example:

QUrl url1("foo:/"), url2("foo://@/?#");
QString query1 = url1.query(), query2 = url2.query();

Both query1 and query2 are empty, but only query1 is null, indicating that the
query was not present. The same applies to userInfo(), host(), and fragment(),
and would apply to authority() if the "@" weren't present.

And you can do:

url1.setQuery("");
url2.setQuery(QString());

to invert the situation.

The other common case for using nulls is in QVariant and that comes from the
QtSql module: all entries are returned as QVariants and they need to support
database tables that don't contain "NOT NULL" (that is, are nullable). So
QVariant can contain a null int that is different from a zero:

QVariant v1, v2{0};
v1.convert(QVariant::Int);
// v1.isNull() == true; v2.isNull() == false

We don't recommend relying on the nullness of a string. We only condone it in the
cases I described above, but I wouldn't be surprised to find more uses. And,
trust me, there's quite a lot of headache involved in keeping the nullness of
certain types across transformations:

QString().toUtf8().isNull() == true;
QString("").toUtf8().isNull() == false;

And then there are weird questions like:
- does a null QString compare equal to an empty one? (yes)
- does a null QString startsWith() an empty one? Does the opposite?
- same for endsWith(), contains(), indexOf()
- is QString().left(1) null? How about QString().left(0)?
- and what about QString("hello").left(0)?
- if left, right and mid can return null, can leftRef, rightRef and midRef
(which return QStringRef)?

I don't know the answer to most of those questions, which means I would
recommend no one rely on a specific behaviour.

Most of it is unit-tested so we don't break it:
http://code.woboq.org/qt5/qtbase/tests/auto/corelib/tools/qstring/tst_qstring.cpp.html#_ZN11tst_QString10startsWithEv
http://code.woboq.org/qt5/qtbase/tests/auto/corelib/tools/qstring/tst_qstring.cpp.html#_ZN11tst_QString8endsWithEv
http://code.woboq.org/qt5/qtbase/tests/auto/corelib/tools/qstring/tst_qstring.cpp.html#_ZN11tst_QString4leftEv
http://code.woboq.org/qt5/qtbase/tests/auto/corelib/tools/qstring/tst_qstring.cpp.html#_ZN11tst_QString7leftRefEv

and what isn't tested I'd feel free to change behaviour at any time.

Nevin Liber

Jan 15, 2014, 2:52:17 PM
to std-pr...@isocpp.org
On 15 January 2014 13:45, Thiago Macieira <thi...@macieira.org> wrote:

The null option comes in handy when you need to distinguish a field that might
be present but empty from a field that isn't present. QUrl makes use of that
and the equivalent std::networking::uri proposal just uses
std::optional<String> (String is a template).

That looks like a better solution to me.  null and empty are different conceptually and should not be merged together.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404

abolz...@gmail.com

Jan 15, 2014, 3:27:05 PM
to std-pr...@isocpp.org
In general I would say keep the nullness and otherwise treat null and empty strings the same.

But sometimes it is convenient to distinguish between a null and an empty string.

I'd be fine with the current proposal if a named constructor string_view::null() and a member
function string_view::is_null() were added.

Jeffrey Yasskin

Jan 15, 2014, 3:42:20 PM
to std-pr...@isocpp.org
Thanks for the detailed discussion of Qt's use. I agree there are some
times when "not present" is different from "empty", but the fact that
people just need the empty case in the vast majority of cases
makes me think that the string_view proposal makes the right choice.
It's true that this means existing classes like QStringRef can't just
become typedefs, but my goal here has been to learn from existing
practice so that the standard can avoid making the same mistakes.

optional<> is always available for cases like SQL and URLs, and using
it means that the unusual case is called out rather than hiding inside
the same type used for the usual case.

Thanks again for trying it out,
Jeffrey

P.S. Other people need to make a stronger argument than "sometimes it
is convenient". Give concrete examples like Thiago did, or I can't do
anything with your email.

abolz...@gmail.com

Jan 17, 2014, 1:59:51 AM
to std-pr...@isocpp.org
Well, I can't make a stronger argument than "convenience" since there is
optional<>, so it's always possible to extend string_view with a null value.

In the past I used a string_view-like class to split strings into key-value
pairs. Using the nullness I could easily distinguish "key=" (empty value)
from "key" (null value).
I have also used this for string splitting: A function returning the next
delimiter might return an empty string (an empty delimiter found) or a
null string (no delimiter found).
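
For comparison, the same key/value distinction can be sketched with `optional<>` instead of a null view. This is only an illustration assuming C++17 `std::optional` and `std::string_view`; the names `KeyValue` and `split_key_value` are hypothetical and do not come from any proposal.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical result type: the value is absent for "key" and present
// (possibly empty) for "key=".
using KeyValue = std::pair<std::string_view, std::optional<std::string_view>>;

inline KeyValue split_key_value(std::string_view item) {
    const std::string_view::size_type eq = item.find('=');
    if (eq == std::string_view::npos) {
        // No '=' at all: the value is not present.
        return KeyValue{item, std::nullopt};
    }
    // '=' found: the value is present, but may be empty.
    return KeyValue{item.substr(0, eq), item.substr(eq + 1)};
}
```

With this shape, `split_key_value("key=")` carries an engaged but empty value, while `split_key_value("key")` carries no value at all. The cost is that every caller has to unwrap the optional, which is the call-site complexity mentioned earlier in the thread.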

I just looked into the string_view-like classes mentioned in the proposal
[1,2,3] and the boost implementation [4]. They all allow data() to
return null and pass through the pointer used to construct the string_view.
The original string_ref proposal did exactly the same.

However, N3512 changed this (without mentioning why, I think; maybe I
missed something...) and the current proposal states that programmers
used this to signal conditions that differed from empty() and that this was
a source of confusion in interfaces and so is not allowed. Could you
give an example where null-strings resulted in such a confusion? I really
want to avoid making the same mistakes!

Thanks
Alex

Marshall Clow

Jan 17, 2014, 11:26:50 AM
to std-pr...@isocpp.org
A string view references a half-open contiguous range of characters; [ data(), data()+size())

If size() == 0, then the range of characters referenced by the string view is empty, and accessing them (either via dereferencing begin() or data()) is (or should be) undefined behavior (the same as for any other container).

If in your program, you “know” that the pointer returned by data() points to something valid, well, that’s on you to determine.

Code example:

char deref ( string_view sv ) { return *sv.data(); }

string_view sv1;
string_view sv2 { "ABCDE" };
sv2.remove_prefix(5);
assert ( sv1 == sv2 );
deref ( sv1 ); // may or may not blow up in your face
deref ( sv2 ); // this will probably “work” ; but I don’t know what it will return.
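
The two empty views in this example can be inspected with a small sketch. It assumes the C++17 `std::string_view`, which (unlike the draft discussed here) specifies that a default-constructed view has `data() == nullptr`; the `EmptyViews` struct and `make_empty_views` function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string_view>

// Two empty views: one default-constructed, one shrunk from a literal.
struct EmptyViews {
    std::string_view never_pointed;  // default-constructed
    std::string_view used_to_point;  // "ABCDE" with all five characters removed
};

inline EmptyViews make_empty_views() {
    EmptyViews v;
    v.used_to_point = "ABCDE";         // string literal has static storage
    v.used_to_point.remove_prefix(5);  // size() becomes 0, data() advances by 5
    return v;
}
```

The two views compare equal because `operator==` looks only at the (empty) contents, yet their `data()` values differ. Reading the terminating `'\0'` through `used_to_point.data()` relies on knowing the underlying literal, not on anything `string_view` promises about empty views.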

— Marshall

Peter Bigot

Jan 17, 2014, 1:34:37 PM
to std-pr...@isocpp.org

While the half-open range argument is true in isolation, it's missing the point that the value of data() is well-known based on how the string_view was created and what operations have been performed on it. Adjusting the string_view to the point where size() is zero does not invalidate data().

Based on how sv2 is constructed in that example, I believe it's guaranteed that deref(sv2) will return the '\0' that terminates the string literal used to initialize sv2, and that doing so does not invoke undefined behavior.  This follows from how sv2 is created and the definition of remove_prefix.

Here's a more relevant example.  There is no need for scare quotes when stating that the programmer knows delim.data() always points to something valid in the test code.

Peter

string_view
comma_delimiter (string_view sv)
{
  string_view::size_type p{sv.find(',')};
  if (string_view::npos == p) {
    return sv.substr(sv.size());
  }
  return sv.substr(p, 1);
}

string_view
case_change_delimiter (string_view sv)
{
  for (string_view::size_type i = 1; i < sv.size(); ++i) {
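    /* Cast to unsigned char: passing a negative char to the <cctype> functions is undefined */
    unsigned char c1 = static_cast<unsigned char>(sv[i-1]);
    unsigned char c2 = static_cast<unsigned char>(sv[i]);
    /* islower() may return any nonzero value, so compare the results as booleans */
    if (isalpha(c1) && isalpha(c2) && (!islower(c1) != !islower(c2))) {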
      return sv.substr(i, 0);
    }
  }
  return sv.substr(sv.size());
}

TEST(Delimiter, Example)
{
  const std::string base("A,Bcde");
  string_view sv(base);
  ASSERT_EQ(base.data(), sv.data());

  /* Locate the comma as a single-character delimiter */
  string_view delim(comma_delimiter(sv));
  ASSERT_EQ(1, delim.size());
  ASSERT_EQ(1, (delim.data() - sv.data()));
  ASSERT_EQ(",", delim);

  /* Locate the case change as zero-width delimiter */
  delim = case_change_delimiter(sv);
  ASSERT_EQ(0, delim.size());
  ASSERT_EQ(3, (delim.data() - sv.data()));
  ASSERT_EQ("cde", sv.substr(delim.data()-sv.data()));
}
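
A caller might consume such a zero-width delimiter as in the following sketch, which assumes C++17 `std::string_view` and repeats a version of the case-change delimiter so the block stands on its own. The `split_at` helper is a hypothetical name, and it assumes the delimiter view points into the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <utility>

// Zero-width delimiter at the first lower/upper case change, or an empty
// view at the end of the input when there is none.
inline std::string_view case_change_delimiter(std::string_view sv) {
    for (std::size_t i = 1; i < sv.size(); ++i) {
        const unsigned char c1 = static_cast<unsigned char>(sv[i - 1]);
        const unsigned char c2 = static_cast<unsigned char>(sv[i]);
        if (std::isalpha(c1) && std::isalpha(c2) &&
            ((std::islower(c1) != 0) != (std::islower(c2) != 0))) {
            return sv.substr(i, 0);
        }
    }
    return sv.substr(sv.size());
}

// Hypothetical helper: split the input around a delimiter view that must
// point into `sv`. data() is what locates it, even when its size is zero.
inline std::pair<std::string_view, std::string_view>
split_at(std::string_view sv, std::string_view delim) {
    const std::size_t pos = static_cast<std::size_t>(delim.data() - sv.data());
    return {sv.substr(0, pos), sv.substr(pos + delim.size())};
}
```

For the input `"A,Bcde"` this splits into `"A,B"` and `"cde"`. When no case change exists, the delimiter sits at the end of the input and the second part is empty, so the last token needs no special case.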

Nevin Liber

Jan 17, 2014, 2:58:37 PM
to std-pr...@isocpp.org
On 17 January 2014 12:34, Peter Bigot <big...@acm.org> wrote:

While the half-open range argument is true in isolation, it's missing the point that the value of data() is well-known based on how the string_view was created and what operations have been performed on it.

How does the callee know how the string_view was created?  If the callee has more preconditions than those enforced by string_view, you might be better off using a separate type.  If the callee "knows" how it was created, why use a string_view at all?
 
Adjusting the string_view to the point where size() is zero does not invalidate data().

Based on how sv2 is constructed in that example, I believe it's guaranteed that deref(sv2) will return the '\0' that terminates the string literal used to initialize sv2, and that doing so does not invoke undefined behavior. 

What compilers are allowed to assume from the specification of the standard library is an interesting question... :-)

Take for example:

std::vector<std::string> v(1);
auto p1 = v.data();
v.pop_back();

Can I legally do a placement new of a std::string into the space pointed to by p1?  Beats me.

Jeffrey Yasskin

Jan 17, 2014, 3:11:33 PM
to std-pr...@isocpp.org
The string_view spec guarantees the value data() returns after
substr() and remove_prefix() operations, so the 'sv2' case is actually
guaranteed to return the '\0' at the end of "ABCDE". However,
*sv2.begin() has undefined behavior, because it goes through the
iterator type. The use I intended for this was the ability to write
things like:

const char str[] = "ABCDE";
string_view sv{str};
consume_letters(sv);
if (sv.data() == std::end(str) - 1)  // std::end(str) is one past the '\0'
  finished();

but I don't see a way to enable that without letting people also
dereference data() when the string_view is empty but they know
something about the value. (This assumes that knowing ptr1==ptr2 means
that *ptr1 and *ptr2 are equally valid, which isn't the case for
things like realloc(), but I don't want to put any such corner cases
in the string_view spec if I can help it.)
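
A minimal sketch of that usage, assuming C++17 `std::string_view`: `consume_letters` here is a hypothetical stand-in (the thread does not define it), and `consumed_all` is likewise made up. Note that `std::end(str)` points one past the terminating `'\0'` of the array, so the sketch compares against `std::end(str) - 1`.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string_view>

// Hypothetical stand-in for consume_letters(): drop leading alphabetic
// characters from the front of the view.
inline void consume_letters(std::string_view& sv) {
    while (!sv.empty() && std::isalpha(static_cast<unsigned char>(sv.front()))) {
        sv.remove_prefix(1);
    }
}

// True when the view has been consumed up to the array's terminating '\0'.
// std::end(str) is one past the terminator, hence the "- 1".
template <std::size_t N>
bool consumed_all(const char (&str)[N], std::string_view sv) {
    return sv.empty() && sv.data() == std::end(str) - 1;
}
```

The check only works because `data()` of the now-empty view is still a meaningful position inside `str`, which is exactly the property being debated.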

I do see that this is a somewhat awkward middle ground, so I'm
definitely going to ask for a straw-poll in Issaquah about whether the
LWG wants data()==nullptr to be allowed. If it is, I intend to also
remove the precondition on the string_view(data, size) to let people
explicitly put nullptr in. I'm not sure if string_view() should be
guaranteed to produce data()==nullptr if that change happens.

It'll also be possible to change this after the first version of the
TS ships with string_view inside, if field experience indicates that
it's the wrong choice.

Nevin Liber

Jan 17, 2014, 3:18:44 PM
to std-pr...@isocpp.org
On 17 January 2014 14:11, Jeffrey Yasskin <jyas...@google.com> wrote:
If it is, I intend to also
remove the precondition on the string_view(data, size) to let people
explicitly put nullptr in.

I'd like to, mainly because it makes it easy to create a string_view from a vector<char> by using v.data() and v.size().  Having to special case a check for nullptr (such as in string::string(const char*, size_t)) for what is otherwise a perfectly valid 0-sized range is just annoying.
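
As a sketch of the convenience being asked for, assuming C++17 `std::string_view`, whose pointer-and-length constructor only requires `[str, str + len)` to be a valid range (which, as far as I understand, an empty range starting at a null pointer satisfies). The helper name `view_of` is hypothetical.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical helper: view the contents of a vector<char> without a
// special case for the empty vector, whose data() may or may not be null.
inline std::string_view view_of(const std::vector<char>& v) {
    return std::string_view(v.data(), v.size());
}
```

Under the precondition discussed in this thread (no null pointer allowed), the same helper would need an explicit branch for `v.empty()`.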

Jeffrey Yasskin

Jan 17, 2014, 3:22:21 PM
to std-pr...@isocpp.org
On Fri, Jan 17, 2014 at 12:18 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 17 January 2014 14:11, Jeffrey Yasskin <jyas...@google.com> wrote:
>>
>> If it is, I intend to also
>> remove the precondition on the string_view(data, size) to let people
>> explicitly put nullptr in.
>
>
> I'd like to, mainly because it makes it easy to create a string_view from a
> vector<char> by using v.data() and v.size(). Having to special case a check
> for nullptr (such as in string::string(const char*, size_t)) for what is
> otherwise a perfectly valid 0-sized range is just annoying.

Ah, vector.data() doesn't guarantee non-null results. :( That's a good argument.

Peter Bigot

Jan 17, 2014, 4:24:19 PM
to std-pr...@isocpp.org
On Fri, Jan 17, 2014 at 1:58 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
On 17 January 2014 12:34, Peter Bigot <big...@acm.org> wrote:

While the half-open range argument is true in isolation, it's missing the point that the value of data() is well-known based on how the string_view was created and what operations have been performed on it.

How does the callee know how the string_view was created?  If the callee has more preconditions than those enforced by string_view, you might be better off using a separate type.  If the callee "knows" how it was created, why use a string_view at all?

I don't understand this argument.  Where does a callee enter the picture?  Both Marshall's example and mine are self-contained: we can see how the instances were created, we have N3762 which specifies what data() means throughout the operations that were invoked, and from that we know that data() points within a range that is still valid in the "caller" context where it's used.
 
Based on how sv2 is constructed in that example, I believe it's guaranteed that deref(sv2) will return the '\0' that terminates the string literal used to initialize sv2, and that doing so does not invoke undefined behavior. 

What compilers are allowed to assume from the specification of the standard library is an interesting question... :-)

Take for example:

std::vector<std::string> v(1);
auto p1 = v.data();
v.pop_back();

Can I legally do a placement new of a std::string into the space pointed to by p1?  Beats me.

I'd guess that'd depend in part on whether pop_back() might invalidate data(), which I'm not motivated to research.  For string_view I think it's a lot simpler:

When a string_view is first created, it references a valid range [data(), data()+size()) that itself is within an external object that the string_view references.  I'm unaware of any legal string_view operation (except clear()) that can move data() outside that initially valid range: every mutating operation leaves the string_view instance's range unchanged or reduces it within existing bounds.  The validity of the original range derives from an object other than the string view, so as long as that range is not invalidated (which in turn I think would necessarily involve a data race), it should be possible to continue to use the pointers that lie within it, right?

Peter

Peter Bigot

Jan 17, 2014, 4:25:00 PM
to std-pr...@isocpp.org
First, is that true?  [data(), data()+size()) must be a valid range, but I've been unable to find a definition of "valid range" in the standard so I could understand whether an empty valid range permits its base to be a null pointer.

Second, even if so std::string.data() does guarantee a non-null result, doesn't it?  In that case it's also required that data()+i == &operator[](i) for i in [0, size()] which led me to believe data() could not be null, since the equality must hold for i==0 and operator[] must return a reference to an object (hence the underlying pointer cannot be null).

I've been swayed by the argument that there's no such thing as a null std::string, so there should be no such thing as a null std::string_view.  This is easily assured by disallowing std::string_view(s, n) for a null s just as std::string(s, n) is disallowed for null s.

I don't see it as unreasonable to require a check for that situation before creating a std::string_view in situations where a char* pointer comes from a source that might produce a null pointer.  As Nevin points out, you have to do it for std::string too.

Personally, I'd find it more annoying to have to constantly check in every utility function that operates on string_view instances whether an input sv.data() is null in addition to whether sv.size() is zero.  I understood std::string_view to serve the role of a reference type compatible with both character arrays and std::strings, so just as with other reference types it shouldn't be possible for the underlying pointer to be null.

My opinion, at any rate.

Peter

Marshall Clow

Jan 17, 2014, 4:48:25 PM
to std-pr...@isocpp.org

On Jan 17, 2014, at 10:34 AM, Peter Bigot <big...@acm.org> wrote:



On Friday, January 17, 2014 10:26:50 AM UTC-6, Marshall wrote:
On Jan 15, 2014, at 4:28 AM, Olaf van der Spek <olafv...@gmail.com> wrote:

On Saturday, December 28, 2013 6:01:34 AM UTC+1, Marshall wrote:
if sv.size () == 0 or sv.empty () [same thing], there is nothing that you can do with sv.data()
(except compare it to null, and that doesn’t tell you anything useful).

Are you sure?
IMO an empty string_view could still point into a string and I should be able to use that pointer to for example construct a new string_view.

A string view references a half-open contiguous range of characters; [ data(), data()+size())

if size() == 0, then the range of characters referenced by the string view is empty, and accessing them (either via dereferencing begin() or data()) is (or should be) undefined behavior. (the same as any other container)

If in your program, you “know” that the pointer returned by data() points to something valid, well, that’s on you to determine.

Code example:

char deref ( string_view sv ) { return *sv.data(); }

string_view sv1;
string_view sv2 { "ABCDE" };
sv2.remove_prefix(5);
assert ( sv1 == sv2 );
deref ( sv1 ); // may or may not blow up in your face
deref ( sv2 ); // this will probably "work"; but I don't know what it will return.

— Marshall


While the half-open range argument is true in isolation, it's missing the point that the value of data() is well-known based on how the string_view was created and what operations have been performed on it. Adjusting the string_view to the point where size() is zero does not invalidate data().

Peter —

I think you’re trying to add additional meaning to string_view. 
It’s a reference to a sequence of characters.
When the size == 0, then it’s a reference to NO characters. 

Saying “ah, yes, but we *know* what it points to, and so, really, it refers to a character” is widening the definition of string_view.
Unnecessarily so, in my belief.

— Marshall

Marshall Clow

Jan 17, 2014, 4:55:41 PM
to std-pr...@isocpp.org
On Jan 17, 2014, at 1:25 PM, Peter Bigot <big...@acm.org> wrote:

On Fri, Jan 17, 2014 at 2:22 PM, Jeffrey Yasskin <jyas...@google.com> wrote:
On Fri, Jan 17, 2014 at 12:18 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 17 January 2014 14:11, Jeffrey Yasskin <jyas...@google.com> wrote:
>>
>> If it is, I intend to also
>> remove the precondition on the string_view(data, size) to let people
>> explicitly put nullptr in.
>
>
> I'd like to, mainly because it makes it easy to create a string_view from a
> vector<char> by using v.data() and v.size().  Having to special case a check
> for nullptr (such as in string::string(const char*, size_t)) for what is
> otherwise a perfectly valid 0-sized range is just annoying.

Ah, vector.data() doesn't guarantee non-null results. :( That's a good argument.

First, is that true?  [data(), data()+size()) must be a valid range, but I've been unable to find a definition of "valid range" in the standard so I could understand whether an empty valid range permits its base to be a null pointer.

Second, even if so std::string.data() does guarantee a non-null result, doesn't it?  In that case it's also required that data()+i == &operator[](i) for i in [0, size()] which led me to believe data() could not be null, since the equality must hold for i==0 and operator[] must return a reference to an object (hence the underlying pointer cannot be null).

I think you mean, [0,i), not [0,i]

I've been swayed by the argument that there's no such thing as a null std::string, so there should be no such thing as a null std::string_view.  This is easily assured by disallowing std::string_view(s, n) for a null s just as std::string(s, n) is disallowed for null s.

The difference here is that string manages its own storage, while string_view does not.

I don't see it as unreasonable to require a check for that situation before creating a std::string_view in situations where a char* pointer comes from a source that might produce a null pointer.  As Nevin points out, you have to do it for std::string too.

Maybe I missed something earlier in the discussion, but I believe that
std::string foo { nullptr, 0 };

is a perfectly legal way of creating an empty string.

Section 21.4.2 says only:
basic_string(const charT* s, size_type n,
             const Allocator& a = Allocator());
Requires: s points to an array of at least n elements of charT.


Personally, I'd find it more annoying to have to constantly check in every utility function that operates on string_view instances whether an input sv.data() is null in addition to whether sv.size() is zero. 

I’m afraid I don’t see why you would do this. Can you give an example, please?

— Marshall

Daniel Krügler

Jan 17, 2014, 5:04:51 PM
to std-pr...@isocpp.org
2014/1/17 Marshall Clow <mclow...@gmail.com>:
>
> Maybe I missed something earlier in the discussion, but I believe that
> std::string foo { nullptr, 0 };
>
> is a perfectly legal way of creating an empty string.

No, it is not; it is undefined behaviour.

> Section 21.4.2 says only:
> basic_string(const charT* s, size_type n,
> const Allocator& a = Allocator());
> Requires: s points to an array of at least n elements of charT.

Exactly this requirement makes the above code undefined. There exists an
explicit rule in [res.on.arguments] p1 b2 for this kind of wording:

"If a function argument is described as being an array, the pointer
actually passed to the function shall
have a value such that all address computations and accesses to
objects (that would be valid if the
pointer did point to the first element of such an array) are in fact valid."

- Daniel
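
Under the wording quoted above, a caller holding a possibly-null (pointer, length) pair would need a guard before constructing a std::string. A minimal sketch of such a guard follows; make_string is a hypothetical helper name, not part of any proposal in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::string from a (pointer, length) pair that
// may be (nullptr, 0), without handing a null pointer to the string constructor.
inline std::string make_string(const char* s, std::size_t n) {
    if (s == nullptr) {
        assert(n == 0);        // a null pointer can only describe an empty range
        return std::string();  // default construction needs no source array
    }
    return std::string(s, n);
}
```

This is the special-case check that Nevin describes as annoying for an otherwise valid zero-sized range.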

Jeffrey Yasskin

Jan 17, 2014, 5:10:41 PM
to std-pr...@isocpp.org
Agreed. To mitigate this, if string_view::data() could return nullptr, I believe we could still create strings using std::string(null_sv.begin(), null_sv.end()).
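
A sketch of the iterator-pair route, assuming C++17's std::string_view as a stand-in for the proposed class (its default-constructed state has a null data()). copy_out is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Copy the viewed characters into a std::string via the iterator-pair
// constructor, which never receives the raw data() pointer as an "array" argument.
inline std::string copy_out(std::string_view sv) {
    std::string result(sv.begin(), sv.end());
    assert(result.size() == sv.size());
    return result;
}
```

For an empty view begin() == end(), so the constructor sees an empty iterator range and no element is read.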

Billy O'Neal

Jan 17, 2014, 5:17:53 PM
to std-proposals
>"If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid."

So to clarify, if it were an array of size 0, you would be allowed to increment the pointer by 1. (You would not be allowed to dereference the pointer) You are never allowed to increment a null pointer, so data() returning nullptr does not meet this requirement.
On Fri, Jan 17, 2014 at 2:04 PM, Daniel Krügler <daniel....@gmail.com> wrote:


Peter Bigot

Jan 17, 2014, 5:43:09 PM
to std-pr...@isocpp.org


On Friday, January 17, 2014 3:55:41 PM UTC-6, Marshall wrote:

On Jan 17, 2014, at 1:25 PM, Peter Bigot <big...@acm.org> wrote:
On Fri, Jan 17, 2014 at 2:22 PM, Jeffrey Yasskin <jyas...@google.com> wrote:
On Fri, Jan 17, 2014 at 12:18 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 17 January 2014 14:11, Jeffrey Yasskin <jyas...@google.com> wrote:
>>
>> If it is, I intend to also
>> remove the precondition on the string_view(data, size) to let people
>> explicitly put nullptr in.
>
>
> I'd like to, mainly because it makes it easy to create a string_view from a
> vector<char> by using v.data() and v.size().  Having to special case a check
> for nullptr (such as in string::string(const char*, size_t)) for what is
> otherwise a perfectly valid 0-sized range is just annoying.

Ah, vector.data() doesn't guarantee non-null results. :( That's a good argument.

First, is that true?  [data(), data()+size()) must be a valid range, but I've been unable to find a definition of "valid range" in the standard so I could understand whether an empty valid range permits its base to be a null pointer.

Second, even if so std::string.data() does guarantee a non-null result, doesn't it?  In that case it's also required that data()+i == &operator[](i) for i in [0, size()] which led me to believe data() could not be null, since the equality must hold for i==0 and operator[] must return a reference to an object (hence the underlying pointer cannot be null).

I think you mean, [0,i), not [0,i]


No, I mean i in [0,size()] which in this case is [0,0].  21.4.7.1 paragraph 1.

I've been swayed by the argument that there's no such thing as a null std::string, so there should be no such thing as a null std::string_view.  This is easily assured by disallowing std::string_view(s, n) for a null s just as std::string(s, n) is disallowed for null s.

The difference here is that string manages its own storage, while string_view does not.

As others have confirmed, std::string still doesn't permit a null pointer for data(), so this isn't compelling to me.

Peter

gorni...@gmail.com

Jan 17, 2014, 6:10:41 PM
to std-pr...@isocpp.org
What if string_view provides a specialization for optional<string_view<>>?

That way optional<string_view> can be implemented very efficiently by checking if sv.begin() == nullptr and does not consume any extra storage. An extra benefit is that we have clear distinction between nullable types (optional<T>) and regular types, int, vector, string_view, etc.

Will this solution satisfy the original poster?

Marshall Clow

Jan 17, 2014, 6:51:34 PM
to std-pr...@isocpp.org
On Jan 17, 2014, at 2:10 PM, Jeffrey Yasskin <jyas...@google.com> wrote:

Agreed. To mitigate this, if string_view::data() could return nullptr, I believe we could still create strings using std::string(null_sv.begin(), null_sv.end()).

I thought that’s why we had to_string().

— Marshall


Jeffrey Yasskin

Jan 17, 2014, 7:06:18 PM
to std-pr...@isocpp.org
We have to_string mostly for convenience, I think, although it is
another way to get from a null string_view to a std::string.

Bengt Gustafsson

Jan 17, 2014, 8:00:12 PM
to std-pr...@isocpp.org
I have not followed this entire thread, instead I hacked up a little implementation which I used in implementing something for another thread on an improved "printf" functionality. What I noticed was:

- I had use for a C array of string_view to keep the parts of the original string that fell between each data inserted. This means that I must be able to default construct string_view as there is no other way to construct an array element.
- It is logical to do this default construction by setting m_begin and m_end to nullptr, or m_begin to nullptr and m_size to 0.
- It must be allowed to take a default constructed string_view as the range of a range based for statement __or__ to check if the range is in a default-constructed state using something like is_null().

- both of these methods are impossible if the presumptions stated above "begin() should never return nullptr" and "we don't need a special is_null()". Well, not entirely. Of course you could create some bogus object and use its address instead of nullptr.

Someone stated above that the \0 at the end of the string should be accessible even if the string_view is empty. This is clearly impossible as a string_view may be constructed as a part of a string which continues on. Thus there is no \0 at *end() anytime.

This said I think that the very simple solution is to just set the pointer(s) to nullptr in the default ctor, don't implement any special is_null but return "null" iterators from begin and end in default constructed string_view, which of course results in zero length if they are subtracted. I don't really see a problem here; isn't this how all containers do it, for instance a newly default constructed vector? Is the issue whether you can rely on this to know whether the string_view (or vector) has been non-empty before? If so, follow the example of vector and give the same promises on the standard level!
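
As a rough illustration of the range-based for case, here is a sketch that uses C++17's std::string_view as a stand-in (an assumption; its default constructor leaves data() null). count_chars is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Count characters by iterating; for an empty view begin() == end(),
// so the loop body never runs and no pointer is dereferenced.
inline std::size_t count_chars(std::string_view sv) {
    std::size_t n = 0;
    for (char c : sv) {
        (void)c;  // the character itself is not needed here
        ++n;
    }
    assert(n == sv.size());
    return n;
}
```

Nothing in the loop depends on whether the iterators of an empty view are "null" or point at some placeholder object.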

Bengt Gustafsson

Jan 17, 2014, 8:12:23 PM
to std-pr...@isocpp.org
Someone said that constructing a string from (nullptr, 0) is not allowed, and that is used as an argument here. How about looking into why this is not allowed first? I can't see any practical reason for string nor for string_view.

Someone else claimed that if nullptr could be returned from data() then you'd have to test for this in _addition_ to testing for size() == 0. I can't understand why, as if size() is 0 there are exactly 0 offsets at which it is OK to dereference the pointer returned from data(). This is more information than you get from testing whether the returned pointer is nullptr, i.e. even if data() returned non-null there may be no elements to access, but if size() returns 0 there can be _no_ elements whether the pointer is null or not.

Miro Knejp

Jan 17, 2014, 9:13:10 PM
to std-pr...@isocpp.org
>
> - both of these methods are impossible if the presumptions stated
> above "begin() should never return nullptr" and "we don't need a
> special is_null()". Well, not entirely. Of course you could create
> some bogus object and use its address instead of nullptr.
>
Why not set the string_view to "" in the default constructor?


Peter Bigot

Jan 17, 2014, 9:15:43 PM
to std-pr...@isocpp.org
On Friday, January 17, 2014 7:00:12 PM UTC-6, Bengt Gustafsson wrote:
I have not followed this entire thread, instead I hacked up a little implementation which I used in implementing something for another thread on an improved "printf" functionality. What I noticed was:

- I had use for a C array of string_view to keep the parts of the original string that fell between each data inserted. This means that I must be able to default construct string_view as there is no other way to construct an array element.

Sure, that's one approach.
 
- It is logical to do this default construction by setting m_begin and m_end to nullptr, or m_begin to nullptr and m_size to 0.

Logical, perhaps; necessary, no.  It's equally or more logical to treat it the same as a default-constructed std::string.  It was tentatively agreed much earlier in this or another thread that doing so doesn't even require allocating any memory to hold the "referenced" empty string, while still preserving the requirement that data() have a value distinct from nullptr.
 
- It must be allowed to take a default constructed string_view as the range of a range based for statement

Sure.
 
__or__ to check if the range is in a default-constructed state using something like is_null().

Why?  Is there something specific to ranges that makes a default-constructed range significant?


- both of these methods are impossible if the presumptions stated above "begin() should never return nullptr" and "we don't need a special is_null()". Well, not entirely. Of course you could create some bogus object and use its address instead of nullptr.

Someone stated above that the \0 at the end of the string should be accessible even if the string_view is empty. This is clearly impossible as a string_view may be constructed as a part of a string which continues on. Thus there is no \0 at *end() anytime.

This may be referring to what I said about deref(sv2) in Marshall's example, but if so it isn't an accurate summary.  What I said is that if a string_view is constructed from a string literal (which is necessarily nul-terminated) and the default size calculated from traits::length(str), then even though accessing the terminating nul from the string_view is not legal the nul is still there and still legally accessible by using a char* pointer which is known to point into that string literal.

Here's another example which avoids the nul and any assumptions about persistence of the string literal:

  const std::string base("ABCDE");
  string_view sv(base);
  ASSERT_EQ(base.data(), sv.data());
  sv.remove_prefix(2);
  ASSERT_EQ(base.data()+2, sv.data());
  sv.remove_suffix(3);
  ASSERT_EQ(0, sv.size());
  ASSERT_EQ(base.data()+2, sv.data());
  ASSERT_EQ('C', *sv.data());

sv in isolation only guarantees that [sv.data(), sv.data()+0) is a valid range and specifically does not sanction the expression *sv.data().

But sv.data() is value- and type-equivalent to base.data()+2 by the specification for string_view.

And accessing *(base.data()+2) is perfectly legitimate.

So why can't I substitute a value- and type-equivalent subexpression into that dereference expression?

I'm hoping that somebody will explain why this is an invalid use of std::string_view as currently described in https://rawgithub.com/google/cxx-std-draft/string-ref-paper/string_view.html.

As for why I think allowing null sv.data() creates a new requirement to check for null sv.data():  If you accept that sv.data() can be useful even though sv.size() is zero---perhaps for nothing more than to calculate an offset into the original string with sv.data()-base.data()---then it becomes necessary to know that sv.data() is not null regardless of whether sv.size() is zero so you don't perform pointer arithmetic on a null pointer.  Not always, but certainly if you're using sv.data() this way (which I have found very convenient).
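
The offset calculation mentioned above might look something like this sketch. It assumes C++17's std::string_view as a stand-in for the proposed class; offset_in is a hypothetical helper, and the assert marks the check that a nullable data() would make necessary.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: offset of a sub-view within the string it was taken from.
// Precondition (not checkable in general): sv was derived from base, so both
// pointers refer into the same array and the subtraction is well defined.
inline std::size_t offset_in(const std::string& base, std::string_view sv) {
    assert(sv.data() != nullptr);  // the extra check a nullable data() would force
    return static_cast<std::size_t>(sv.data() - base.data());
}
```

The helper stays meaningful for an empty sv as long as the view still points into base.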

My preference is to impose the non-null requirement on the string_view type just as it is imposed on std::string, rather than have to check it manually.   (AFAIK std::string has never permitted its data() function to return a null pointer, so I don't see a lot of value in reviewing that decision.)

Peter

Marshall Clow

Jan 17, 2014, 11:19:07 PM
to std-pr...@isocpp.org
On Jan 17, 2014, at 6:13 PM, Miro Knejp <mi...@knejp.de> wrote:

>>
>> - both of these methods are impossible if the presumptions stated above "begin() should never return nullptr" and "we don't need a special is_null()". Well, not entirely. Of course you could create some bogus object and use its address instead of nullptr.
>>
> Why not set the string_view to "" in the default constructor?

I believe that this is the current proposal.

However, this requires creating a global variable (which some implementations will put in the code segment) for each default constructed string_view (yes, some implementations will merge them together in the same translation unit).

— Marshall


abolz...@gmail.com

Jan 18, 2014, 10:30:17 AM
to std-pr...@isocpp.org, gorni...@gmail.com
An (unspecialized) optional<string_view> is not always an option. E.g. I can't simply change a function like

void f(char const* s) {}

to

void f(optional<string_view> s) {}

because I can no longer write f("hello"). So the specialization should be implicitly constructible from a char array.
And then an optional<string_view> constructed from a nullptr should be an uninitialized optional since there
are no preconditions on f, I should be able to write f(getenv("not-an-env-var")).
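
For comparison, the null-to-disengaged mapping can also be written as a free function in front of an unspecialized optional. The sketch below assumes C++17's std::optional and std::string_view as stand-ins for the proposed types; from_c_str and length_or_zero are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: map a possibly-null C string to an optional view.
// A null pointer becomes a disengaged optional; anything else is viewed as-is.
inline std::optional<std::string_view> from_c_str(const char* s) {
    if (s == nullptr) return std::nullopt;
    std::string_view view(s);
    assert(view.data() == s);  // the view refers to the caller's characters, no copy
    return view;
}

// Example consumer: length of the text, or 0 when no value is present.
inline std::size_t length_or_zero(std::optional<std::string_view> s) {
    return s ? s->size() : 0;
}
```

A call such as length_or_zero(from_c_str(getenv("not-an-env-var"))) then works, at the cost of naming the conversion at each call site, which is the inconvenience described above.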

abolz...@gmail.com

Jan 18, 2014, 10:36:17 AM
to std-pr...@isocpp.org
For the current proposal this should work. Since you can't put a nullptr in, you can't get a nullptr out, and sv.data() - base.data() is defined.
If a string_view allowed a null data() (assuming a string_view is constructible from a nullptr, too), this would be OK as well, since no
operation on a string_view changes the data pointer and subtracting a nullptr from a nullptr is OK too.
If a string_view were allowed to be constructed from a nullptr, but data() always returned non-null, then one would actually have to check for base.data() != nullptr.

Peter Bigot

Jan 19, 2014, 10:55:32 AM
to std-pr...@isocpp.org

I thought somebody had proposed a "will-probably-work" solution involving casts of non-zero values to a pointer to avoid the global variable, but here's another solution I believe is safe and well-defined:

Nothing in the current spec requires that the data() function return the same value for distinct default-constructed string_view instances.  So use the following data members:

  const charT * m_ptr;
  union {
     size_t m_len;
     charT m_nul;
  };

and have the default constructor set m_ptr to &m_nul and m_len to 0.  The result is a (unique) empty string reference.

I've tested this by modifying Boost's implementation and it works fine.  Note that only the m_len data member is actually used and is always zero for the default-constructed value.  There's no issue about accessing the other union member because when size() is zero you can't legitimately dereference data() unless you know from construction that it's pointing into a non-empty range (and in this case it doesn't).

Peter

Magnus Fromreide

Jan 19, 2014, 12:44:34 PM
to std-pr...@isocpp.org
On Sun, Jan 19, 2014 at 07:55:32AM -0800, Peter Bigot wrote:
> On Friday, January 17, 2014 10:19:07 PM UTC-6, Marshall wrote:
> >
> > On Jan 17, 2014, at 6:13 PM, Miro Knejp <mi...@knejp.de>
I have thought in a similar direction, but the problem is if you have two
string_views, s1 and s2. Assume that s1 is empty; does the statement
s2 = s1; imply that s2.data() == s1.data()?

Then what happens if s1 is deallocated and its memory is returned to the
system? Won't s2.m_ptr then hold an illegal pointer value, one of those
where even loading it could trigger a hardware trap on some architectures?

/MF

Peter Bigot

Jan 19, 2014, 1:31:19 PM
to std-pr...@isocpp.org
In my approach I detect in the assignment operator and copy constructor whether the RHS is default-constructed, and if so invoke clear() on the LHS/new instance.  So in all cases either the reference is to a sequence outside the instance, or it's an empty default-constructed (cleared) instance where data() is a non-dereferenceable pointer to an internal charT object: no cross-object pointers.

Now that you point it out, this does result in behavior non-conformant with the current draft specification for the copy constructor and operator= which are specified as =default.

On further reflection it'd be better to store a null pointer in m_ptr to represent a default-constructed value but to return &m_nul from data() when m_ptr is null.  Then the default implementations are retained.  This still means that data() will always be non-null, but sv1.data() will not compare equal to sv2.data() if one or both of the instances have been cleared/default-constructed.  (sv1 == sv2 will still hold, of course.)

Does that address the objection?
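
A minimal sketch of the revised layout described above, for illustration only. tiny_view is a hypothetical name, the class is fixed to char rather than templated on charT, and it is not the Boost modification mentioned earlier.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative only; tiny_view is a hypothetical name, not the proposed class.
class tiny_view {
public:
    tiny_view() noexcept : m_ptr(nullptr), m_len(0) {}
    tiny_view(const char* s, std::size_t n) noexcept : m_ptr(s), m_len(n) {
        assert(s != nullptr);  // mirrors the non-null precondition under discussion
    }
    // Copy operations stay defaulted: a copied null m_ptr never points into
    // another instance, so no cross-object pointer can dangle.

    // Never returns null: a default-constructed view reports the address of its
    // own union member. That address must not be dereferenced (size() is 0).
    const char* data() const noexcept { return m_ptr ? m_ptr : &m_nul; }
    std::size_t size() const noexcept { return m_len; }
    bool empty() const noexcept { return m_len == 0; }

private:
    const char* m_ptr;
    union {
        std::size_t m_len;  // the only member ever read
        char m_nul;         // only its address is used
    };
};
```

Two default-constructed instances report different data() values, and a copy of one reports the address inside the copy rather than inside the original.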

Peter

gorni...@gmail.com

Jan 19, 2014, 1:31:58 PM
to std-pr...@isocpp.org, gorni...@gmail.com
I did not propose using unspecialized optional. I suggested to add specialization of optional<basic_string_view<...>> to string_view header. Thus any use of optional<string_view> will pick up specialized version that can have constructors that you desire.

I think it is beneficial not to roll nullability into string_view. optional is The Way to make a type nullable and, where possible, one can use a specialized optional to add additional constructors and/or provide a more efficient implementation than the default optional.

Paul A. Tessier

Jan 19, 2014, 2:00:43 PM
to std-pr...@isocpp.org
If s1 and s2 are empty, only access to the metadata is rational. Using
operator[], front(), back(), etc. will produce undefined behavior.
s1.data() and s2.data() are irrelevant because each points to the
beginning of a range of zero length. One could just as easily set
data() to null or some random value when the string_view becomes empty,
but such actions are unneeded. The equality comparison is based on the
equality of the two ordered sets of characters from s1 and s2. All
empty views are equally empty, and that does not imply ( s2.data() ==
s1.data() ), just as ( s1 == s2 ) does not imply that ( &s1 == &s2 ).
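
A small sketch of that point, assuming C++17's std::string_view as a stand-in for the proposed class; same_contents is a hypothetical helper name.

```cpp
#include <cassert>
#include <string_view>

// Compare two views by their characters only; where each view points
// plays no part in the result.
inline bool same_contents(std::string_view a, std::string_view b) {
    const bool equal = (a == b);
    if (a.empty() && b.empty()) assert(equal);  // all empty views compare equal
    return equal;
}
```

Two empty views taken at different positions of the same buffer compare equal here even though their data() values differ.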

If &s1 is deallocated and falls into a protected segment, any access to
it will trap, just like any other object. If both s1 and s2 were
referencing a string that was deallocated before they both became empty,
then both would trap, unless data() is randomized or set to some safe
location as I mentioned above, but this would be pointless as accessing
the referenced string of an empty view is undefined behavior.


To me the prevention of null being returned by data() is just an
artifact inherited from std::string. It seems like an early and
incomplete error check. You could also argue that all unreachable
segments be restricted.

If string_view were only a range of a std::string, forwarding all
std::string's warts through the interface would not be completely
unexpected, especially when the wrapper is such a lightweight class.
Yet string_view serves a wider purpose; therefore, those warts should be
ignored. I argue that std::string's interface should be relaxed, and not
that string_view's be restricted.

Consider std::array< char, 0 >. Its data() will return null, and the
rest of its members behave just as rationally. I see no reason for
string_view not to behave in a similar fashion. While std::string may
never return null for its data(), that is inconsequential as string_view
may be constructed from other sources.

Having an is_null() seems excessive. data() should be allowed to be null,
as this seems the simplest solution. Following the behavior of
std::array< char, 0 > seems to solve many of the other problems in the
interface being unable to handle data() and begin() being null.

While string_view is modeled after std::string's interface, the newer
std::array interface seems more flexible and also solves the default
empty initialization problem. Besides, any object that behaves
rationally when zeroed out, say by memset, is a plus.


Magnus Fromreide

Jan 19, 2014, 2:22:40 PM
to std-pr...@isocpp.org
The check at copy time seems to solve the problem.

The use of null as a flag value seems to interfere with the "allow null
string_views" and "have an optimized optional<string_view>" proposals
as both allow null's for other purposes.

/MF

Alexander Bolz

Jan 19, 2014, 3:31:45 PM
to std-pr...@isocpp.org, gorni...@gmail.com
Am Sonntag, 19. Januar 2014 19:31:58 UTC+1 schrieb gorni...@gmail.com:
I did not propose using unspecialized optional. I suggested to add specialization of optional<basic_string_view<...>> to string_view header. Thus any use of optional<string_view> will pick up specialized version that can have constructors that you desire.

Ok. But an unspecialized optional<string_view> is very different from a nullable string_view.
And a specialized optional<string_view> would be very different from an optional<T>.

I need to make sure that the specialized version is constructible from a nullptr,
and that doing so constructs a disengaged optional. Then what should

template <class... Args> constexpr explicit optional(in_place_t, Args&&... args);
Postconditions: *this is engaged.

do?

What should the relational operators of this specialization look like? I would like to be
able to write something like

void f(optional<string_view> s)
{
  if (s == "abc") {}
  if (s < "abc") {}
}

The default relational operators explicitly handle the nullness of the optional. I don't want
that. I'd expect a disengaged optional<string_view> is like an empty string. So I write specialized
versions for these relational operators which do exactly that. That would be very confusing for
general algorithms which just compare optional<T>'s.

Alexander Bolz

Jan 19, 2014, 3:37:59 PM
to std-pr...@isocpp.org
This is not correct. I just looked it up in N3797:
23.3.2.8. says "begin() == end() == unique value. The return value of data() is unspecified"

Nevin Liber

Jan 19, 2014, 3:37:28 PM
to std-pr...@isocpp.org
On 19 January 2014 14:31, Alexander Bolz <abolz...@gmail.com> wrote:
Ok. But an unspecialized optional<string_view> is very different from a nullable string_view.
And a specialized optional<string_view> would be very different from an optional<T>.

Which would be enough for me to vote strongly against this.  Learn from vector<bool>... (bad name for a useful class)

--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404

Miro Knejp

Jan 19, 2014, 4:08:39 PM
to std-pr...@isocpp.org

Ok. But an unspecialized optional<string_view> is very different from a nullable string_view.
And a specialized optional<string_view> would be very different from an optional<T>.
What is the difference between a "nullable string_view" and an "optional<string_view>"? Not in terms of syntax but semantics. That's what must be the same. Both give you means of marking the string_view as *null* and both cause errors/UB if accessing such a string.

If done properly the only difference between optional<T> and optional<string_view> is sizeof() by optimizing away a bool member and use .data() == nullptr for determining whether the optional is engaged or not. As long as string_view does not accept null this distinction can be made.

I need to make sure that the specialized version is constructible from a nullptr,
and that doing so constructs a disengaged optional.
Do the null check before constructing the optional. If the string_view constructor does not accept nullptr neither does optional<string_view>(in_place_t, ...). Just like for any other T.

What should the relational operators of this specialization look like? I would like to be
able to write something like

void f(optional<string_view> s)
{
  if (s == "abc") {}
  if (s < "abc") {}
}
Since it's an optional<T> you still have to use it like an optional<T>. If "s" is disengaged both comparisons return false, which is correct and I would expect the same for a null string_view. Just like with any other T.

Instead of using "if(s.data())" use "if(s)" or "if(!s)" to check for null.

I don't see why there should be a difference between specialized and unspecialized optional<string_view>. I also don't see how it is different from a "null string_view". As far as I can see the semantics are equivalent. As long as string_view does not accept nullptr the specialization is nothing but a sizeof() optimizing implementation detail.
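
As a rough sketch of the comparison semantics being discussed, the two checks can be tried with the C++17 types std::optional and std::string_view. Using them as stand-ins for the proposed types is an assumption, since both postdate this thread, and the helper names equals_abc and less_than_abc are made up for illustration. With these types a disengaged optional compares unequal to every value but orders before every value, so the second check yields true rather than false.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helpers mirroring the two checks in f() above.
// They use the mixed optional<T>-versus-value comparison operators.
inline bool equals_abc(std::optional<std::string_view> s) {
    return s == "abc";  // disengaged: false
}

inline bool less_than_abc(std::optional<std::string_view> s) {
    return s < "abc";   // disengaged: true, "no value" sorts first
}
```

Code that wants "null behaves like empty" would have to say so explicitly, for example by comparing s.value_or(std::string_view()) instead of s.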

Nevin Liber

Jan 19, 2014, 4:15:07 PM
to std-pr...@isocpp.org
On 19 January 2014 15:08, Miro Knejp <mi...@knejp.de> wrote:

I don't see why there should be a difference between specialized and unspecialized optional<string_view>. I also don't see how it is different from a "null string_view". As far as I can see the semantics are equivalent. As long as string_view does not accept nullptr the specialization is nothing but a sizeof() optimizing implementation detail.

In which case it is just a Quality of Implementation issue and need not be addressed by the Standard.

Miro Knejp

Jan 19, 2014, 4:23:17 PM
to std-pr...@isocpp.org

On 19.01.2014 22:15, Nevin Liber wrote:
On 19 January 2014 15:08, Miro Knejp <mi...@knejp.de> wrote:

I don't see why there should be a difference between specialized and unspecialized optional<string_view>. I also don't see how it is different from a "null string_view". As far as I can see the semantics are equivalent. As long as string_view does not accept nullptr the specialization is nothing but a sizeof() optimizing implementation detail.

In which case it is just a Quality of Implementation issue and need not be addressed by the Standard.
Yes, that was basically my point.

Paul A. Tessier

Jan 19, 2014, 4:52:29 PM
to std-pr...@isocpp.org

Ah, that is correct, of which null is a unique value.  My original point still stands though.  Preventing null as an allowable value only adds complexity, and allowing null can be handled in a rational fashion as is demonstrated by new interfaces.

Alexander Bolz

Jan 19, 2014, 4:56:45 PM
to std-pr...@isocpp.org
2014/1/19 Miro Knejp <mi...@knejp.de>

Ok. But an unspecialized optional<string_view> is very different from a nullable string_view.
And a specialized optional<string_view> would be very different from an optional<T>.
What is the difference between a "nullable string_view" and an "optional<string_view>"? Not in terms of syntax but semantics. That's what must be the same. Both give you means of marking the string_view as *null* and both cause errors/UB if accessing such a string.

Yes, and that's almost the only thing both have in common. With "nullable" I mean that it
should be possible to construct a string_view from a nullptr and -- for consistency -- data()
should be allowed to return nullptr. Any operation on string_view would assume that a null
string_view is equal to an empty string_view.

The current proposal requires data != null. Ok. But IMHO there is no reason for this. It just
restricts the cases where a string_view would be useful.

Now the caller has to do a null check even if a function could handle null pointers.
 

If done properly the only difference between optional<T> and optional<string_view> is sizeof() by optimizing away a bool member and use .data() == nullptr for determining whether the optional is engaged or not. As long as string_view does not accept null this distinction can be made.

If data() == nullptr ever returns true, string_view is already nullable.

 

I need to make sure that the specialized version is constructible from a nullptr and
then constructs a disengaged optional.
Do the null check before constructing the optional. If the string_view constructor does not accept nullptr neither does optional<string_view>(in_place_t, ...). Just like for any other T.

I just don't want to do that. I have a function like

void f(char const* s) {
  if (s == nullptr)
    dothis();
  else
    dothat();
}

and I want to replace char const* with string_view to make it accept a std::string so I can save some std::string.c_str()
calls.
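
For illustration only, here is how that migration might look with std::string_view as it was later standardized in C++17, where a default-constructed view has data() == nullptr. Treating the C++17 type as a stand-in for the draft under discussion is an assumption; Branch and view_or_null are hypothetical names, and Branch merely stands in for the dothis()/dothat() calls.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for the dothis()/dothat() branches.
enum class Branch { DoThis, DoThat };

// string_view version of f: the null check moves from the raw pointer
// to data(). Callers holding a std::string no longer need c_str().
inline Branch f(std::string_view s) {
    return s.data() == nullptr ? Branch::DoThis : Branch::DoThat;
}

// Hypothetical adapter for legacy callers with a possibly null C string.
// string_view(const char*) requires a non-null argument, so null is
// mapped to a default-constructed view, whose data() is nullptr.
inline std::string_view view_or_null(const char* s) {
    return s ? std::string_view(s) : std::string_view();
}
```

The distinction only survives as long as nothing in between normalizes the view, which is the fragility the other side of the discussion objects to.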

 

What should the relational operators of this specialization look like? I would like to be
able to write something like

void f(optional<string_view> s)
{
  if (s == "abc") {}
  if (s < "abc") {}
}
Since it's an optional<T> you still have to use it like an optional<T>. If "s" is disengaged both comparisons return false, which is correct and I would expect the same for a null string_view. Just like with any other T.

Instead of using "if(s.data())" use "if(s)" or "if(!s)" to check for null.

I don't see why there should be a difference between specialized and unspecialized optional<string_view>. I also don't see how it is different from a "null string_view". As far as I can see the semantics are equivalent. As long as string_view does not accept nullptr the specialization is nothing but a sizeof() optimizing implementation detail.

--
Alex

Miro Knejp

Jan 19, 2014, 5:53:53 PM
to std-pr...@isocpp.org


I just don't want to do that. I have a function like

void f(char const* s) {
  if (s == nullptr)
    dothis();
  else
    dothat();
}

and I want to replace char const* with string_view to make it accept a std::string so I can save some std::string.c_str()
calls.

Saving some string.c_str() calls isn't a very convincing argument. Not for me at least. When I have to choose between safety/robustness and convenience the former wins. Knowing a std::string_view cannot be null just as std::string cannot be is a valuable quality of life improvement. I would assume the number of times one needs a nullable string is only a fraction of the actual uses of string_view.

string_view has as far as I can remember always been described as a drop-in replacement for string. Discussing the possibility of a nullable std::string_view should equally raise a discussion about the possibility of a nullable std::string. If it is not possible to replace functions taking (const string&) with (string_view) and have the same semantics it's far less useful and only works against the trend of making things safer and more intuitive. (const char*) methods should always ring alarm bells.

gorni...@gmail.com

Jan 19, 2014, 5:55:24 PM
to std-pr...@isocpp.org
Alexander:

The current proposal requires data != null. Ok. But IMHO there is no reason for this. It just
restricts the cases where a string_view would be useful.

I see, would replacing

constexpr basic_string_view(const charT* str, size_type len);

Requires: str is not a null pointer and [str,str + len) is a valid range.

with

constexpr basic_string_view(const charT* str, size_type len);

Requires: str is not a null pointer and [str,str + len) is a valid range
or str == nullptr and len == 0.

address your concerns?
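
A minimal sketch of what the relaxed precondition would permit, written against a hypothetical tiny_view class rather than the proposed basic_string_view (the class, its members, and the use of assert to check the precondition are all assumptions made for illustration):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal view type; not the proposed basic_string_view.
class tiny_view {
public:
    tiny_view(const char* str, std::size_t len) : ptr_(str), len_(len) {
        // Relaxed precondition: a null pointer is accepted only together
        // with a zero length. (nullptr, n) with n != 0 is still rejected.
        assert(str != nullptr || len == 0);
    }

    const char* data() const { return ptr_; }   // may be null when empty
    std::size_t size() const { return len_; }
    bool empty() const { return len_ == 0; }

private:
    const char* ptr_;
    std::size_t len_;
};
```

Under this reading tiny_view(nullptr, 0) is simply one more empty view; whether data() then reports the null pointer back is the separate question debated in the rest of the thread.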



Olaf van der Spek

Jan 19, 2014, 7:11:32 PM
to std-pr...@isocpp.org
On Sunday, 19 January 2014 19:31:19 UTC+1, Peter Bigot wrote:
On further reflection it'd better to store a null pointer in m_ptr to represent a default-constructed value but to return &m_nul from data() when m_ptr is null.  

Having such conditionals in data() doesn't seem like a good idea. data() should be as simple (and as fast) as possible.

constexpr basic_string_view(const charT* str, size_type len);
> Requires: str is not a null pointer and [str,str + len) is a valid range. 

What's the rationale for the not-null requirement? I don't like it, one should be able to construct a string_view from a NULL, 0 pair.
vector::data() could return NULL (AFAIK), so that'd mean I couldn't construct a string_view from v.data(), v.size()
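
To make the vector case concrete, here is a small sketch using std::string_view as standardized in C++17, whose (pointer, length) constructor only requires a valid range. Applying the C++17 rules to the draft under discussion is an assumption, and view_of is a hypothetical helper name.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// View the contents of a character buffer without copying.
// For an empty vector, data() may or may not be null; either way
// [data(), data() + 0) is a valid empty range.
inline std::string_view view_of(const std::vector<char>& v) {
    return std::string_view(v.data(), v.size());
}
```

With a strict not-null requirement the helper would need a branch for the empty case just to substitute some other pointer.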

Alexander Bolz

Jan 19, 2014, 7:35:11 PM
to std-pr...@isocpp.org
For a function like

void print(string const& s) {
}

there is an implicit precondition: a string can not be constructed from a nullptr.
So you have to check this before calling. When replacing string const& with string_view,
you just need to make the precondition explicit. Then in both cases data() != null.

If you use string.data(), you can't simply replace func(string const&) with func(string_view)
even if data() returns non-null, since data() might not be null terminated.

void print(string const& s) {
  printf("%s", s.data());
}

If you don't use data() then, well... it's not important if data() returns nullptr or not.
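
One way to keep a printf-style print working after switching to a view is to pass the length explicitly. The sketch below uses the C++17 std::string_view as a stand-in for the proposed type (an assumption); print_view is a hypothetical name, and the early return for empty views is a defensive choice so that a possibly null data() is never handed to fprintf.

```cpp
#include <cassert>
#include <cstdio>
#include <string_view>

// Print a view that is not necessarily null-terminated.
// "%.*s" writes at most `precision` characters, so the terminator is
// not needed. Output still stops early at an embedded '\0'.
// Assumes sv.size() fits in an int.
inline int print_view(std::FILE* out, std::string_view sv) {
    if (sv.empty()) return 0;  // never pass a possibly null pointer on
    return std::fprintf(out, "%.*s",
                        static_cast<int>(sv.size()), sv.data());
}
```

The return value is whatever fprintf reports, i.e. the number of characters written on success.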

Miro Knejp

Jan 19, 2014, 8:36:40 PM
to std-pr...@isocpp.org


For a function like

void print(string const& s) {
}

there is an implicit precondition: a string can not be constructed from a nullptr.
So you have to check this before calling. When replacing string const& with string_view,
you just need to make the precondition explicit. Then in both cases data() != null.
Yes, I as author of print have the implicit precondition enforced by std::string that it's always called with a valid string. That is something I can rely on. That is a good thing. Changing the signature to accept (nullable) string_view suddenly breaks this guarantee and I either have to add asserts or some other kinds of checks and document it somewhere and needlessly complexify every method relying on this guarantee. That is a bad thing. And suddenly you need to create new test cases for print even though it was working perfectly. That is also a bad thing.


If you use string.data(), you can't simply replace func(string const&) with func(string_view)
even if data() returns non-null, since data() might not be null terminated.

void print(string const& s) {
  printf("%s", s.data());
}

If you don't use data() then, well... it's not important if data() returns nullptr or not.
If one used .data() to pass it to (const char*) methods (which should ring a bell and make you pay *very good attention* to what you're doing) before C++11 then it was already calling for disaster as .data() was not guaranteed to be null terminated. Then there are the methods which take a pointer and a length. There it doesn't matter but they might crash if passed in null (even if the length is 0). Seen it happen more often than I like. I always tell people to use .c_str() because it is more explicitly stating intentions and it is backwards compatible.

null always makes things more complicated than necessary. Just notice how the library is evolving to gradually remove the word and value "null" from our everyday usage. That's a good thing. Think in abstractions, not null. Throwing null at everyone only to enable a few edge cases seems wrong. One can always come up with situations where a nullable string_view might be useful, as is the case with many other things deemed "bad", but in the end it matters how common that scenario is compared to where a non-null guarantee makes code safer, simpler and more robust.

Peter Bigot

Jan 19, 2014, 8:54:32 PM
to std-pr...@isocpp.org

Yes.  I am not in favor of string_view supporting null values as well as empty values; absence of a value is a distinct concept that should be treated independently as with std::optional.  I think string_view should support the intersection, not the union, of the information provided by its underlying concepts (a) std::string and (b) a pair (const charT * ptr, size_t len).  Only the latter is capable of expressing absence-of-a-value as distinct from empty-string.  (In fact it expresses multiple absence-of-a-value as you could wish to encode information in a non-zero size() paired with a null data().  Madness that way lies.  Just say no to null data().)

I do think that string_view is more a reference type than a collection type (though it is both), so there may be an argument in support of being able to detect whether an instance holds a valid reference (i.e. does not have a default-constructed value, including a value equivalent to default-construction due to invocation of clear()).  If this is desirable, it can be detected easily by adding a member function has_reference() that returns !!m_ptr.
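
A rough sketch of that idea, on a hypothetical ref_view class: the names m_ptr and has_reference() come from the paragraph above, while the class itself and its remaining members are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical view type; only m_ptr and has_reference() are taken
// from the discussion, the rest is assumed.
class ref_view {
public:
    ref_view() : m_ptr(nullptr), m_len(0) {}
    ref_view(const char* p, std::size_t n) : m_ptr(p), m_len(n) {}

    // True unless the view is in its default-constructed state.
    bool has_reference() const { return m_ptr != nullptr; }

    bool empty() const { return m_len == 0; }
    std::size_t size() const { return m_len; }

    // Returns the view to the default-constructed state.
    void clear() { m_ptr = nullptr; m_len = 0; }

private:
    const char* m_ptr;
    std::size_t m_len;
};
```

Here an empty view over a real buffer and a default-constructed view are both empty(), but only the former reports has_reference().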
 
and "have an optimized optional<string_view>" proposals
as both allow null's for other purposes.

I can't speak to that question.

Peter

Paul Tessier

Jan 19, 2014, 10:39:14 PM
to std-pr...@isocpp.org


On Sunday, January 19, 2014 8:36:40 PM UTC-5, Miro Knejp wrote:


For a function like

void print(string const& s) {
}

there is an implicit precondition: a string can not be constructed from a nullptr.
So you have to check this before calling. When replacing string const& with string_view,
you just need to make the precondition explicit. Then in both cases data() != null.
Yes, I as author of print have the implicit precondition enforced by std::string that it's always called with a valid string. That is something I can rely on. That is a good thing. Changing the signature to accept (nullable) string_view suddenly breaks this guarantee and I either have to add asserts or some other kinds of checks and document it somewhere and needlessly complexify every method relying on this guarantee. That is a bad thing. And suddenly you need to create new test cases for print even though it was working perfectly. That is also a bad thing.

How is that true?  If a string_view is null, how is it not valid?  All null views are empty views.  No additional complexity or coding is required.  Calling other members with an empty view is for the most part undefined behaviour.  Whether data(), begin(), etc. return null or not is irrelevant, and requiring non-null is an arbitrary restriction.

If you use string.data(), you can't simply replace func(string const&) with func(string_view)
even if data() returns non-null, since data() might not be null terminated.

void print(string const& s) {
  printf("%s", s.data());
}

If you don't use data() then, well... it's not important if data() returns nullptr or not.
If one used .data() to pass it to (const char*) methods (which should ring a bell and make you pay *very good attention* to what you're doing) before C++11 then it was already calling for disaster as .data() was not guaranteed to be null terminated. Then there are the methods which take a pointer and a length. There it doesn't matter but they might crash if passed in null (even if the length is 0). Seen it happen more often than I like. I always tell people to use .c_str() because it is more explicitly stating intentions and it is backwards compatible.
 
null always makes things more complicated than necessary. Just notice how the library is evolving to gradually remove the word and value "null" from our everyday usage. That's a good thing. Think in abstractions, not null. Throwing null at everyone only to enable a few edge cases seems wrong. One can always come up with situations where a nullable string_view might be useful, as is the case with many other things deemed "bad", but in the end it matters how common that scenario is compared to where a non-null guarantee makes code safer, simpler and more robust.

A legacy interface's poorly handled parameters should not be used as an excuse for a design argument.

Paul Tessier

Jan 19, 2014, 11:07:45 PM
to std-pr...@isocpp.org

Why would ( nullptr, 3 ) be considered an empty view?  It's garbage in almost all use cases but, so is ( (char*)1, 3 ).  Defending against one single bad pointer and ignoring all the rest, solves one out of a myriad of similar problems.  A null view ( null, 0 ) is no different than an empty view ( (char*)rand(), 0 ).  Although, is_null seems pointless if it can be guaranteed that the following be true.

For any empty string_view N constructed as ( string_view N{ P, 0 } ), that N.data(), N.begin(), N.end(), etc. all return P.

Currently P is banned from being null.  That seems to solve one corner case while creating problems for other, otherwise logical, use cases.
 

I do think that string_view is more a reference type than a collection type (though it is both), so there may be an argument in support of being able to detect whether an instance holds a valid reference (i.e. does not have a default-constructed value, including a value equivalent to default-construction due to invocation of clear()).  If this is desirable, it can be detected easily by adding a member function has_reference() that returns !!m_ptr.
 

All references can become dangling by one way or another.  What you described is a smart pointer with a string interface, which if that's what you want, I can argue against.  It may solve many use cases but, seems overkill at least considering the current stated goal of string_view.
 

Nevin Liber

Jan 20, 2014, 1:01:16 AM
to std-pr...@isocpp.org
On 19 January 2014 22:07, Paul Tessier <pher...@gmail.com> wrote:
Why would ( nullptr, 3 ) be considered an empty view?

It would be undefined behavior.

My personal preference for all this is that string_view(nullptr, 0) be allowed, and it is a Quality of Implementation issue whether or not sv.data() can ever return a nullptr; i.e., it would be allowed to return any legal pointer it wants for an empty string_view.
-- 

Jeffrey Yasskin

Jan 20, 2014, 1:12:55 AM
to std-pr...@isocpp.org
On Sun, Jan 19, 2014 at 10:01 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 19 January 2014 22:07, Paul Tessier <pher...@gmail.com> wrote:
>>
>> Why would ( nullptr, 3 ) be considered an empty view?
>
>
> It would be undefined behavior.
>
> My personal preference for all this is that string_view(nullptr, 0) be
> allowed, and it is a Quality of Implementation issue whether or not
> sv.data() can ever return a nullptr; i.e., it would be allowed to return any
> legal pointer it wants for an empty string_view.

That's interesting. It's easier to both specify and implement to just
say that string_view(x, y).data()==x and string_view(x, y).size()==y,
assuming [x, x+y) is a valid range. Why would you special-case data()
to return arbitrary things when y==0?

Thanks,
Jeffrey (who's adding the data()==nullptr option to the paper as we speak)
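
The pass-through rule described above can be written down as a small check. This sketch relies on std::string_view as standardized in C++17, where the (pointer, length) constructor stores both values unchanged; reading that back into the draft being edited here is an assumption, and round_trips is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns true if the view reports exactly the pointer and length it
// was built from. The caller must pass a valid range [x, x + y).
inline bool round_trips(const char* x, std::size_t y) {
    std::string_view sv(x, y);
    return sv.data() == x && sv.size() == y;
}
```

Under this rule there is no special case for y == 0, including when x is a null pointer.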

Nevin Liber

Jan 20, 2014, 1:25:58 AM
to std-pr...@isocpp.org
On 20 January 2014 00:12, Jeffrey Yasskin <jyas...@google.com> wrote:
> My personal preference for all this is that string_view(nullptr, 0) be
> allowed, and it is a Quality of Implementation issue whether or not
> sv.data() can ever return a nullptr; i.e., it would be allowed to return any
> legal pointer it wants for an empty string_view.

That's interesting. It's easier to both specify and implement to just
say that string_view(x, y).data()==x and string_view(x, y).size()==y,
assuming [x, x+y) is a valid range. Why would you special-case data()
to return arbitrary things when y==0?

A debugging implementation may want to ensure that developers aren't dependent on "knowing" that a nullptr was passed in.

Jeffrey Yasskin

Jan 20, 2014, 2:36:00 AM
to std-pr...@isocpp.org
Thanks. I think I don't personally want that, but I've added your
reason to the "if we want data()==nullptr" section in the attached
draft. I plan to send this to Clark tomorrow (Monday) morning.
string_view.html

Alexander Bolz

Jan 20, 2014, 7:15:16 AM
to std-pr...@isocpp.org
Please let me summarize why I think string_view should be constructible
from a nullptr (and a zero length) and data() should return the
pointer passed to the constructor. Please, if I got anything wrong, let me
know! I really don't understand the reason for data() != null.

The only reason presented here that data() should always return non-null is
that "nullptrs are bad". I'm not convinced. If there are other arguments,
I really would like to hear them.

Since it's not null terminated, string_view's data() is not a replacement
for string's data() and since it might contain null characters, string's
data() is not a replacement for C-strings. When using data() I have to
be careful anyway.

data() == null (which implies size() == 0) doesn't force the user
to write additional null checks. The standard _explicitly_ allows
nullptr + 0 and nullptr - nullptr. So if I use any standard algorithm
like find, I can write

find(sv.data(), sv.data() + sv.size(), 'x')

or

find(sv.begin(), sv.end(), 'x').

If a function f takes an iterator and a length I could also write

f(sv.begin(), sv.end() - sv.begin()).

For all these algorithms using iterators, a null string_view is exactly
the same as an empty string_view.
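
As a sketch of that claim, the pointer-based find can be wrapped in a helper and called with a null view. The C++17 std::string_view, whose default-constructed value has data() == nullptr, is used as a stand-in for the proposed type (an assumption), and contains is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// Search a view through raw pointers. For a null view this becomes
// find(nullptr, nullptr + 0, c): an empty range, so no element is
// dereferenced and no null check is needed.
inline bool contains(std::string_view sv, char c) {
    const char* first = sv.data();
    const char* last = sv.data() + sv.size();
    return std::find(first, last, c) != last;
}
```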

Actually, allowing data() == null frees the user from performing null checks.

So I think allowing data() == null makes working with C-strings more safe
and working with std::string's more efficient. And of course I can directly
use any other contiguous range of characters as a (sub-)string.

And if there are situations where I need/want to know whether a string_view
was constructed from a nullptr, I can get this information, too.

This would not (directly) be possible if string_view were constructible from
a nullptr, but the return value of data() is unspecified. It actually
would introduce UB if I don't pay extra attention to this.

void f(char const* ptr, size_t len)
{
    auto s = string_view(ptr, len);
    // ...
    g(ptr, s.data() - ptr);
}

Summary: I pay for what I don't use: a non-null ptr.

Peter Bigot

Jan 20, 2014, 8:13:11 AM
to std-pr...@isocpp.org
 
I believe your alternative section also needs something like:

In subclause "x.1 basic_string_view constructors and assignment operators [string.view.cons]"

    basic_string_view(const charT* str);
    Requires: [str,str + traits::length(str)) is a valid range, or that str is a null pointer value

along with setting the size() to zero in the added case, if I'm correct in my understanding that traits::length(str) is undefined when str is null.

Peter

Peter Bigot

Jan 20, 2014, 8:30:32 AM
to std-pr...@isocpp.org

Thanks; that's a clear statement.  In return, here's my reasoning for the other perspective:

Besides the fact that it expresses something std::string cannot express, my main objection to allowing string_view(p) where p is a null pointer value is the assumption that this should be implicitly treated as identical to construction from an empty string.  I believe emptiness comes from size()==0, not from data()==nullptr in conjunction with an inference that therefore size() must also be zero because that's necessary to allow data() to be the basis of a valid range.

E.g., when getenv(3) returns a null pointer it means something very different from when it returns a non-null pointer to an empty string.  The argument that allowing null frees you from null checks only works when those two pointers signify the same thing and all uses of data() involve range operations.  It's equally true that disallowing null frees you from null checks because if you have a string_view at all you know data() cannot be null, and this becomes important when data() is being used as an iterator in its own right.
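
One possible way to keep the two getenv outcomes apart is to put the absence of a value into the type. The sketch below uses the C++17 std::optional and std::string_view as stand-ins for the proposed types (an assumption); from_c_str and env are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string_view>

// Hypothetical helper: keep "no value" separate from "empty value".
inline std::optional<std::string_view> from_c_str(const char* p) {
    if (p == nullptr) return std::nullopt;  // variable not set
    return std::string_view(p);             // set, possibly to ""
}

// Hypothetical wrapper around std::getenv built on the helper above.
inline std::optional<std::string_view> env(const char* name) {
    return from_c_str(std::getenv(name));
}
```

A caller that does not care about the difference can collapse it with env(name).value_or(std::string_view()), while one that does can test the optional first.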

My strongest objection is to any line of reasoning that leads to:

  const std::string base("ABCDE");
  string_view sv(base);
  ASSERT_EQ(base.data(), sv.data());
  sv.remove_prefix(2);
  ASSERT_EQ(base.data()+2, sv.data());
  sv.remove_suffix(3);
  ASSERT_EQ(0, sv.size());
  ASSERT_EQ(nullptr, sv.data());

passing in a conforming implementation due to an expectation that all empty ranges are equivalent.
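
For comparison, here is the same sequence checked against std::string_view as standardized in C++17, where remove_prefix advances the pointer and remove_suffix only reduces the size. Carrying those rules over to the draft under discussion is an assumption, and empty_view_keeps_position is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Shrink a view over `base` to empty and report whether data() still
// points into the original buffer rather than collapsing to null.
inline bool empty_view_keeps_position(const std::string& base) {
    std::string_view sv(base);
    if (sv.size() < 2) return false;  // sketch assumes at least 2 chars
    sv.remove_prefix(2);              // data() moves forward by 2
    sv.remove_suffix(sv.size());      // size() drops to 0, data() unchanged
    return sv.empty() && sv.data() == base.data() + 2;
}
```

With those rules the final assertion in the sequence above would fail, because the emptied view still reports base.data() + 2.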

Would it be acceptable to everybody if language were added to the proposal to the effect that all mutating string_view operations ensure that [data(), data()+size()) is a valid subrange of [bp, bp+bn) where [bp, bp+bn) identifies the range last assigned to the string_view (on construction, through assignment, or through clear())?

Is that even necessary?  I had thought it was the original intent and that the necessary language is already there, but I'm not convinced everybody else sees it that way, and that part of the reason is a belief in equivalence of empty ranges.

Or maybe I'm mistaken in my impression that some people don't accept the use of data() as an iterator outside the range of the string_view that returned it, a key capability that I believe the reference semantics of string_view must permit.

Peter

Paul Tessier

Jan 20, 2014, 12:18:00 PM
to std-pr...@isocpp.org

A simple implementation would be much better.  Allowing the pointer and size to pass through directly would allow one to track down their origins, if that is a cause for concern.  Preventing null solves nothing; a mishandled view will still be mishandled.  Mutating a null input is equally pointless from a debugging perspective, since a logical input has been changed into a different input that will be harder to trace back to its origin.

None of the arguments against have been about preventing common programming pitfalls.  Just because std::string will never produce or accept a range beginning at null does not mean that restriction is logical or reasonable in all cases. [ P, P+N ) is a valid range for any non-negative N, therefore [ P, P ) is a valid range, for any P, even when P is in a protected segment.  Validity of a range only describes if it is rational.  It does not describe if it is rational to use it in all contexts.  Any P in a protected segment will trap when dereferenced.  There is no way to protect against this.  Dereferencing an empty range is almost always a mistake, no matter where it is located.

I would also argue against string_view( char* ) doing runtime null checks and creating an empty view, as this only defends against ( P == null ) and not P being in protected memory, of which null falls into.  Asserts on null, would be more reasonable.  Although in this case it can be argued the other way as it is a common practice to signal an empty C string with a null.  Empty C strings should be "" and not null but, there is no changing the past. string_view( null ) being [ null, null ) is not my preference but, something that I cannot argue against as being unreasonable.  ( string_view( null ) : string_view( rand(), 0 ) {} ) on the other hand is not reasonable.

If string_view cannot accept C string valid ranges, but only the ranges allowed by std::string, its C string constructors and support should be deleted.  It should work as any novice would expect.  A range passed in will be the same range returned.  Nothing more complicated than a thin wrapper around a valid range with string helper methods.




Marshall Clow

Jan 20, 2014, 3:14:03 PM
to std-pr...@isocpp.org
On Jan 20, 2014, at 9:18 AM, Paul Tessier <pher...@gmail.com> wrote:

I would also argue against string_view( char* ) doing runtime null checks and creating an empty view, as this only defends against ( P == null ) and not P being in protected memory, of which null falls into. 

I agree. 
string_view(char *) should require that the char * point to a null terminated string.

— Marshall

