string_view for fstream::open()

Julian Watzinger

Aug 8, 2017, 1:46:06 PM
to ISO C++ Standard - Future Proposals
I'd like to propose a simple addition to std::basic_fstream:

Currently, there are two overloads for std::basic_fstream::open.

void open( const char *filename,
    ios_base::openmode mode = ios_base::in|ios_base::out );

void open( const std::string &filename,
    ios_base::openmode mode = ios_base::in|ios_base::out );

That means that when making extensive use of string_view, I'd have to create a temporary std::string to open a filestream in a safe way. I suggest adding a third overload:

void open( const std::string_view &filename,
    ios_base::openmode mode = ios_base::in|ios_base::out );

plus an additional overload for this in the constructor. It would require the same addition to std::basic_filebuf::open. From what I can tell, it looks feasible. Any objections/further suggestions?

Bo Persson

Aug 8, 2017, 1:52:02 PM
to std-pr...@isocpp.org
On 2017-08-08 19:46, Julian Watzinger wrote:
> I'd like to propose a simple addition to std::basic_fstream:
>
> Currently, there's two overloads for std::basic_fstream::open.
>
> void open( const char *filename,
>     ios_base::openmode mode = ios_base::in|ios_base::out );
>
> void open( const std::string &filename,
>     ios_base::openmode mode = ios_base::in|ios_base::out );
>
> That means that when making extensive use of string_view, I'd have to
> create a temporary std::string to open a filestream in a safe way. I
> suggest adding a third overload:
>
> void open( const std::string_view &filename,
>     ios_base::openmode mode = ios_base::in|ios_base::out );
>
> plus an additional overload for this in the constructor. It would require
> the same addition to std::basic_filebuf::open. From what I can tell, it
> looks feasible. Any objections/further suggestions?
>

One objection is that the file name is supposed to be a nul-terminated
string. The string_view is not required to be that.


Bo Persson


galax...@gmx.at

Aug 8, 2017, 1:59:44 PM
to ISO C++ Standard - Future Proposals, b...@gmb.dk
On Tuesday, August 8, 2017 at 19:52:02 UTC+2, Bo Persson wrote:
> One objection is that the file name is supposed to be a nul-terminated
> string. The string_view is not required to be that.
>
> Bo Persson


I realize that this is the current state, which is why I said that I'd have to create a temporary std::string from a std::string_view if I want to open a file-stream safely; otherwise I could just call the "const char* filename" overload from std::string_view::data().

Is there any technical reason why the filename would have to be nul-terminated? Otherwise one could simply make an implementation for open() that does not require a nul-terminated string but rather a (const char* filename, size_t size) pair.

Nicol Bolas

Aug 8, 2017, 2:05:25 PM
to ISO C++ Standard - Future Proposals, b...@gmb.dk

Yes, there are technical reasons. Pretty much every low-level file API uses NUL-terminated strings. So either the stream itself has to make (or worse allocate) a NUL-terminated copy, or we make you create a NUL-terminated copy. I'd say it's better to do the latter than the former. No need to promise something we can't deliver.
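
For readers following along, the "make the caller create the NUL-terminated copy" option can be sketched as below. This is only an illustration under assumptions: `open_from_view` is a hypothetical helper name, not an existing or proposed standard function. It shows that the copy into a `std::string` is what supplies the terminator the underlying file API expects.

```cpp
#include <fstream>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the standard library): opens `stream`
// on the file named by `name`, which need not be NUL-terminated.
inline bool open_from_view(std::fstream& stream, std::string_view name,
                           std::ios_base::openmode mode =
                               std::ios_base::in | std::ios_base::out)
{
    // std::string always stores a trailing NUL, so this temporary is the
    // NUL-terminated copy that the caller has to pay for.
    const std::string terminated(name);
    stream.open(terminated, mode);
    return stream.is_open();
}
```

The allocation happens at the call site, where the caller can decide whether it matters, rather than being hidden inside the stream.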

Julian Watzinger

Aug 8, 2017, 2:16:24 PM
to ISO C++ Standard - Future Proposals, b...@gmb.dk
Ah, I see. I definitely agree with the notion that in this case, it's better that I create a copy myself.

Is there any chance that low-level APIs will be moving away from nul-terminated strings in the future? Having a lot of low-level functions require a nul-terminated string kind of dampens the overall usefulness of string_view IMHO.
At least from what I can tell after a few days' worth of porting to string_view - at first I thought it would be a direct replacement for most places where "const std::string&" was being used, but now it seems there's a lot more consideration to be done, not to end up with more temporaries/allocations than before.

Nicol Bolas

Aug 8, 2017, 2:54:53 PM
to ISO C++ Standard - Future Proposals, b...@gmb.dk
On Tuesday, August 8, 2017 at 2:16:24 PM UTC-4, Julian Watzinger wrote:
> Ah, I see. I definitely agree with the notion that in this case, it's better that I create a copy myself.
>
> Is there any chance that low-level APIs will be moving away from nul-terminated strings in the future?

I rather doubt it. It's hardly high on most people's priority lists.

> Having a lot of low-level functions require a nul-terminated string kind of dampens the overall usefulness of string_view IMHO.

Which is why I wrote my own `zstring_view` class, which explicitly represents a view of a NUL-terminated string (it treats the NUL-terminator the same way `std::string` does; you can look at it, but it's not explicitly part of the `begin/end` range). It has some of the operations of `string_view` (thanks to inheritance), but it removes or alters the ones that subdivide from the end. For obvious reasons.

> At least from what I can tell after a few days' worth of porting to string_view - at first I thought it would be a direct replacement for most places where "const std::string&" was being used, but now it seems there's a lot more consideration to be done, not to end up with more temporaries/allocations than before.

You have exactly expressed why I wrote `zstring_view`.
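
A rough idea of what such a type could look like is sketched below. This is not Nicol Bolas's actual class (the thread does not show it, and he mentions inheritance where this sketch uses composition); every member name here is an assumption made for illustration. The point is that trimming from the front keeps the terminator in place, while trimming from the back is simply not offered.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical sketch of a view over a NUL-terminated string.
class zstring_view {
public:
    // O(n): the length is measured once, up front.
    zstring_view(const char* s) : view_(s) {}
    // std::string guarantees a terminator after its last character.
    zstring_view(const std::string& s) : view_(s) {}

    // Safe to hand to C APIs: data()[size()] is always '\0'.
    const char* c_str() const noexcept { return view_.data(); }
    std::size_t size() const noexcept { return view_.size(); }
    const char* begin() const noexcept { return view_.data(); }
    const char* end() const noexcept { return view_.data() + view_.size(); }

    // Dropping characters from the front leaves the terminator where it is.
    void remove_prefix(std::size_t n) noexcept { view_.remove_prefix(n); }

    // Deliberately no remove_suffix(): the new end would not be terminated.

    // Converting to a plain string_view only forgets the guarantee.
    operator std::string_view() const noexcept { return view_; }

private:
    std::string_view view_;
};
```

As with `string_view`, the sketch does not own its data, so the viewed string has to outlive the view.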

Thiago Macieira

Aug 8, 2017, 5:52:52 PM
to std-pr...@isocpp.org
On Tuesday, 8 August 2017 11:16:24 PDT Julian Watzinger wrote:
> Is there any chance that low-level APIs will be moving away from
> nul-terminated strings in the future?

Zero.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

olafv...@gmail.com

Aug 9, 2017, 5:35:12 AM
to ISO C++ Standard - Future Proposals, b...@gmb.dk

On Tuesday, August 8, 2017 at 20:54:53 UTC+2, Nicol Bolas wrote:
Any plans to propose it for standardization? 

Julian Watzinger

Aug 9, 2017, 6:13:51 AM
to ISO C++ Standard - Future Proposals, b...@gmb.dk
On Tuesday, August 8, 2017 at 20:54:53 UTC+2, Nicol Bolas wrote:
> I rather doubt it. It's hardly high on most people's priority lists.

Quite unfortunate :( It's one thing to have a C-style API, which I personally dislike when compared to modern C++; but to have the API restrict the input data in such a way... I'll still hope that at some point, now that we actually have string_view, people will start to see this as a necessity :/

> Which is why I wrote my own `zstring_view` class, which explicitly represents a view of a NUL-terminated string (it treats the NUL-terminator the same way `std::string` does; you can look at it, but it's not explicitly part of the `begin/end` range). It has some of the operations of `string_view` (thanks to inheritance), but it removes or alters the ones that subdivide from the end. For obvious reasons.

Huh, I ended up writing a similar class, though I ended up ditching it, as it didn't work quite as expected for my codebase - i.e. I ended up having to convert from string_view to cstring_view so often that it became unbearable pretty quickly.

I actually ended up with a hack of sorts:

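#include <array>
#include <cstddef>
#include <string_view>

// Copies the view into a fixed-size, NUL-terminated buffer.
// Input longer than Size - 1 characters is truncated.
template<std::size_t Size, typename Type>
std::array<Type, Size> toAPI(std::basic_string_view<Type> view)
{
    static_assert(Size > 0, "need room for the terminator");
    std::array<Type, Size> vArray;

    // copy() returns the number of characters actually copied;
    // one slot is reserved for the terminator
    const std::size_t copied = view.copy(vArray.data(), Size - 1);
    vArray[copied] = Type{};

    return vArray;
}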

At least when calling fileIO/WinAPI functions, there's usually a limit to how many characters can be in a string (i.e. MAX_PATH = 260), so I'd call toAPI<MAX_PATH>(view), which doesn't invoke dynamic allocations and should thus be faster than creating a string (untested, hopefully the array gets affected by RVO/copy elision).

Thiago Macieira

Aug 9, 2017, 11:04:17 AM
to std-pr...@isocpp.org
On Wednesday, 9 August 2017 03:13:50 PDT Julian Watzinger wrote:
> Quite unfortunate. It's one thing to have a C-style API, which I
> personally dislike when compared to modern C++; but to have the API
> restrict the input data in such a way... I'll still hope that at some
> point, now that we actually have string_view, people will start to see this
> as a necessity

Do you mean those C developers developing C API? Including people actively
hostile to C++, like Linus Torvalds?

When do you think they will see std::string_view as a necessity?

Nicol Bolas

Aug 9, 2017, 11:27:10 AM
to ISO C++ Standard - Future Proposals

On Wednesday, August 9, 2017 at 11:04:17 AM UTC-4, Thiago Macieira wrote:
> On Wednesday, 9 August 2017 03:13:50 PDT Julian Watzinger wrote:
> > Quite unfortunate. It's one thing to have a C-style API, which I
> > personally dislike when compared to modern C++; but to have the API
> > restrict the input data in such a way... I'll still hope that at some
> > point, now that we actually have string_view, people will start to see this
> > as a necessity
>
> Do you mean those C developers developing C API? Including people actively
> hostile to C++, like Linus Torvalds?
>
> When do you think they will see std::string_view as a necessity?

It's not about making them adopt `string_view` specifically. It's more about them taking string+length. Right now, those APIs require NUL-termination of strings.

But there are some C APIs that take a string+length. Lua, for example. OpenGL's shader APIs are another example.
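
To illustrate why string+length interfaces pair well with `string_view`, here is a minimal sketch. `count_separators_c` is a made-up C-style function standing in for the kind of API Lua and OpenGL offer; it is not taken from either library. Because the length travels with the pointer, a sub-view can be passed without copying or terminating it.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical C-style API: takes pointer + length, never looks for a NUL.
inline std::size_t count_separators_c(const char* path, std::size_t length)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (path[i] == '/') {
            ++count;
        }
    }
    return count;
}

// Thin C++ wrapper: a string_view maps onto (pointer, length) with no copy.
inline std::size_t count_separators(std::string_view path)
{
    return count_separators_c(path.data(), path.size());
}
```

Calling `count_separators(full_path.substr(0, 3))` on a longer path works directly, which is exactly what a NUL-terminated interface cannot offer.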

Nicol Bolas

Aug 9, 2017, 11:35:04 AM
to ISO C++ Standard - Future Proposals, b...@gmb.dk, olafv...@gmail.com

Not really. As useful as `zstring_view` certainly is, I do not like the idea of proliferating `string_view` types. The other problem is that there aren't very many APIs in the C++ parts of the standard that specifically need NUL-termination.

Now, that being said, P0645 includes a `cstring_view` type, which is just a `const char*` for a NUL-terminated string. That is, it has no size, so `size` is linear. Personally, I think this is a horrible type and I hope it gets revised during standardization into something more like `zstring_view`. And that the system gets changed to one that can work based on an actual `string_view`.

There's no excuse for making C++ APIs that depend on NUL-termination.

Jeffrey Yasskin

Aug 9, 2017, 12:16:23 PM
to std-pr...@isocpp.org, b...@gmb.dk
On Tue, Aug 8, 2017 at 11:16 AM, Julian Watzinger <galax...@gmx.at> wrote:
> Ah, I see. I definitely agree with the notion that in this case, it's better
> that I create a copy myself.
>
> Is there any chance that low-level APIs will be moving away from
> nul-terminated strings in the future?

As folks have said, these are largely system calls, for which the cost
of migration is probably too high for the benefit that going to
pointer+length would provide.

However, being system calls, they also cost significantly more than
the cost of copying a path-length string, so don't be too afraid of
just doing that.

Jeffrey

Niall Douglas

Aug 9, 2017, 12:19:34 PM
to ISO C++ Standard - Future Proposals, b...@gmb.dk, galax...@gmx.at

> At least when calling fileIO/WinAPI functions, there's usually a limit to how many characters can be in a string (i.e. MAX_PATH = 260), so I'd call toAPI<MAX_PATH>(view), which doesn't invoke dynamic allocations and should thus be faster than creating a string (untested, hopefully the array gets affected by RVO/copy elision).

Recent versions of Windows allow paths of up to 32K characters if a process opts into it.

You should look at https://ned14.github.io/afio/classafio__v2__xxx_1_1path__view.html and specifically https://ned14.github.io/afio/structafio__v2__xxx_1_1path__view_1_1c__str.html. It is the correct solution to this problem and works very well, despite slightly touching on UB. If the C++ standard could be improved to avoid that slight UB, that would be more valuable than your original request.

Niall

Thiago Macieira

Aug 9, 2017, 3:41:37 PM
to std-pr...@isocpp.org
On Wednesday, 9 August 2017 08:35:04 PDT Nicol Bolas wrote:
> Now, that being said, P0645 includes a `cstring_view` type
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0645r0.html#StringView>,
> which is just a `const char*` for a NUL-terminated string. That is,
> it has no size, so `size` is linear. Personally, I think this is a horrible
> type and I hope it gets revised during standardization into something more
> like `zstring_view`. And that the system gets changed to one that can work
> based on an actual `string_view`.

Either way, it's useful for an alternate main() function that could take C++
types. Transforming

int main(int argc, char **argv)

into

int main(std::initializer_list<std::cstring_view>)
(or zstring_view or stringz_view or whatever you call it)

can be achieved with one instruction added by the compiler into the main
function (which is special anyway).
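
Until something like that exists, a user-space approximation is possible. The sketch below is an assumption-laden stand-in, not the compiler-generated form described above: `make_args` is a hypothetical helper, it uses `std::string_view` because no standard `cstring_view` exists, and it pays an O(n) length scan per argument plus one vector allocation.

```cpp
#include <string_view>
#include <vector>

// Hypothetical helper: wraps the classic (argc, argv) pair in C++ types.
// Each string_view is built from a NUL-terminated argv entry, so every
// element still points at a terminated string even though the type
// itself does not record that fact.
inline std::vector<std::string_view> make_args(int argc, char** argv)
{
    return std::vector<std::string_view>(argv, argv + argc);
}
```

A program would call `make_args(argc, argv)` as the first statement of a conventional `main`.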

Thiago Macieira

Aug 9, 2017, 3:46:02 PM
to std-pr...@isocpp.org
On Wednesday, 9 August 2017 09:19:34 PDT Niall Douglas wrote:
> You should look at
> https://ned14.github.io/afio/classafio__v2__xxx_1_1path__view.html and
> specifically
> https://ned14.github.io/afio/structafio__v2__xxx_1_1path__view_1_1c__str.html.
> It is the correct solution to this problem and works very well, despite
> slightly touching on UB. If the C++ standard could be improved to avoid
> that slight UB, that would be very valuable instead of your original
> request.

What UB? The links don't talk about it.

Thiago Macieira

Aug 9, 2017, 3:49:42 PM
to std-pr...@isocpp.org
On Wednesday, 9 August 2017 09:15:59 PDT 'Jeffrey Yasskin' via ISO C++
Standard - Future Proposals wrote:
> However, being system calls, they also cost significantly more than
> the cost of copying a path-length string, so don't be too afraid of
> just doing that.

The problem is copying an input of arbitrary length in an exception-free
and thread-safe manner.

You can't do malloc(), since that might fail. You can't use new[] because it
might throw. You can't lock a mutex because that is not async signal safe.

Olaf van der Spek

Aug 9, 2017, 3:52:47 PM
to ISO C++ Standard - Future Proposals
2017-08-09 21:49 GMT+02:00 Thiago Macieira <thi...@macieira.org>:
> On Wednesday, 9 August 2017 09:15:59 PDT 'Jeffrey Yasskin' via ISO C++
> Standard - Future Proposals wrote:
>> However, being system calls, they also cost significantly more than
>> the cost of copying a path-length string, so don't be too afraid of
>> just doing that.
>
> The problem is copying an input of arbitrary length in an exception-free
> and thread-safe manner.

Exception-free or error-free?
Why does it have to be error-free? Generally the (system) call can fail too.



--
Olaf

Thiago Macieira

Aug 9, 2017, 4:09:22 PM
to std-pr...@isocpp.org
Exception-free. And you can't call malloc (not async signal safe), you can't
lock a mutex or even a spinlock.

Niall Douglas

Aug 9, 2017, 6:39:49 PM
to ISO C++ Standard - Future Proposals

> What UB? The links don't talk about it.

It's only a hint of UB, not actual UB. I've improved the docs with a red warning sign saying "the byte after the view must be readable".

Niall

Jeffrey Yasskin

Aug 9, 2017, 7:22:01 PM
to std-pr...@isocpp.org
On Wed, Aug 9, 2017 at 12:49 PM, Thiago Macieira <thi...@macieira.org> wrote:
> On Wednesday, 9 August 2017 09:15:59 PDT 'Jeffrey Yasskin' via ISO C++
> Standard - Future Proposals wrote:
>> However, being system calls, they also cost significantly more than
>> the cost of copying a path-length string, so don't be too afraid of
>> just doing that.
>
> The problem is copying an input of arbitrary length in an exception-free
> and thread-safe manner.
>
> You can't do malloc(), since that might fail. You can't use new[] because it
> might throw. You can't lock a mutex because that is not async signal safe.

Which are the low-level APIs that take null-terminated strings and are
also async-signal-safe and failure-free? (If I needed
exception-freedom I'd use new(nothrow) or catch the exception and turn
it into a failure.)

Thanks,
Jeffrey

galax...@gmx.at

Aug 9, 2017, 7:35:48 PM
to ISO C++ Standard - Future Proposals
On Wednesday, August 9, 2017 at 17:04:17 UTC+2, Thiago Macieira wrote:
> Do you mean those C developers developing C API? Including people actively
> hostile to C++, like Linus Torvalds?
>
> When do you think they will see std::string_view as a necessity?

Well I said "hope", not "think" :>  It still astounds me how anyone can actively oppose C++ and support C at the same time, but... no, I have nothing.

> It's not about making them adopt `string_view` specifically. It's more about them taking string+length. Right now, those APIs require NUL-termination of strings.

Aye, that's the main point. It's one thing to have an API take (void*, size_t) for some array manipulation, but to have (void*) and require the pointed-to data to have a specific value at the end would be considered ludicrous, outside of strings of course (yeah, there's the historical reason, but meh...).

> However, being system calls, they also cost significantly more than
> the cost of copying a path-length string, so don't be too afraid of
> just doing that.

It's not just that, but when I'm thinking about integrating std::string_view, and now I'm running into situations where I actually need to make more unnecessary copies than if I had just stuck to "const std::string&", that's at least something to consider. That's literally the first modern C++ feature where I noticed something like it - it's as if, say, std::unique_ptr sometimes forced you to create a temporary copy of the owned object to be interoperable with some API...

> Recent Windows allow max path to reach 32K characters if a process opts into it.

Eh, I know why I hate developing with WinAPI :/

Nicol Bolas

Aug 9, 2017, 7:44:41 PM
to ISO C++ Standard - Future Proposals, galax...@gmx.at
On Wednesday, August 9, 2017 at 7:35:48 PM UTC-4, galax...@gmx.at wrote:
> On Wednesday, August 9, 2017 at 17:04:17 UTC+2, Thiago Macieira wrote:
> > Do you mean those C developers developing C API? Including people actively
> > hostile to C++, like Linus Torvalds?
> >
> > When do you think they will see std::string_view as a necessity?
>
> Well I said "hope", not "think" :>  It still astounds me how anyone can actively oppose C++ and support C at the same time, but... no, I have nothing.
>
> > It's not about making them adopt `string_view` specifically. It's more about them taking string+length. Right now, those APIs require NUL-termination of strings.
>
> Aye, that's the main point. It's one thing to have an API take (void*, size_t) for some array manipulation, but to have (void*) and require the pointed-to data to have a specific value at the end would be considered ludicrous, outside of strings of course (yeah, there's the historical reason, but meh...).

It's funny that you mention that, because I actually know of a few APIs that do that. Namely... Lua (with luaL_Reg) and OpenGL (wgl/glXCreateContextAttribsARB). Yes, the same ones that don't require NUL-termination for their strings.

Arrays with sentinel values are not as uncommon as you think. They're used primarily on APIs where there is a genuine expectation that the receiving function is only going to look at each item in turn, and only going to look at it once. What they are not is the default case. Or the common case.
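
For anyone who has not met the pattern, a sentinel-terminated array looks something like the sketch below. `Entry` and `count_entries` are invented for illustration and only loosely modelled on the shape of tables such as `luaL_Reg`; they are not taken from Lua or OpenGL.

```cpp
#include <cstddef>

// Hypothetical table entry: a name plus some payload.
struct Entry {
    const char* name;   // nullptr marks the end of the table
    int         value;
};

// Walks the table once, front to back, stopping at the sentinel entry.
// No size parameter is needed because the consumer never indexes into it.
inline std::size_t count_entries(const Entry* table)
{
    std::size_t count = 0;
    for (; table->name != nullptr; ++table) {
        ++count;
    }
    return count;
}
```

A table is then written as `{{"a", 1}, {"b", 2}, {nullptr, 0}}`, with the last element serving only as the terminator.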

galax...@gmx.at

Aug 9, 2017, 7:54:14 PM
to ISO C++ Standard - Future Proposals, galax...@gmx.at

On Thursday, August 10, 2017 at 01:44:41 UTC+2, Nicol Bolas wrote:
> It's funny that you mention that, because I actually know of a few APIs that do that. Namely... Lua (with luaL_Reg) and OpenGL (wgl/glXCreateContextAttribsARB). Yes, the same ones that don't require NUL-termination for their strings.
>
> Arrays with sentinel values are not as uncommon as you think. They're used primarily on APIs where there is a genuine expectation that the receiving function is only going to look at each item in turn, and only going to look at it once. What they are not is the default case. Or the common case.

Uh, at least it's good to know those aren't default/common I suppose. Come to think of it, I know of exactly one example where this is the case, DX11 (D3DCompile, where the pDefines parameter requires a {nullptr, nullptr} terminator), but that's literally the only time I saw it, and you can imagine how weirded out I was by that.

olafv...@gmail.com

Aug 10, 2017, 3:10:40 AM
to ISO C++ Standard - Future Proposals
On Thursday, August 10, 2017 at 00:39:49 UTC+2, Niall Douglas wrote:
> > What UB? The links don't talk about it.
>
> It's only a hint of UB, not actual UB. I've improved the docs with a red warning sign saying "the byte after the view must be readable".

How is that not UB?
 

Niall Douglas

Aug 10, 2017, 9:44:46 AM
to ISO C++ Standard - Future Proposals, olafv...@gmail.com
The requirement that the character after the view must be readable, which is what makes the behaviour defined, is publicly documented. Failure by the end user to ensure this leads to UB, specifically, a segfault.

Niall

Matthew Woehlke

Aug 10, 2017, 9:49:07 AM
to std-pr...@isocpp.org, galax...@gmx.at
On 2017-08-09 19:54, galax...@gmx.at wrote:
> On Thursday, August 10, 2017 at 01:44:41 UTC+2, Nicol Bolas wrote:
>> Arrays with sentinel values are not as uncommon as you think. They're used
>> primarily on APIs where there is a genuine expectation that the receiving
>> function is only going to look at each item in turn, and only going to look
>> at it *once*. What they are *not* is the default case. Or the common case.
>
> Uh, at least it's good to know those aren't default/common I suppose. Come
> to think of it, I know of exactly one example where this is the case, DX11
> (D3DCompile
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd607324(v=vs.85).aspx>,
> where the pDefines parameter requires {nullptr, nullptr}), but that's
> literally the only time I saw it, and you can imagine how weirded out I was
> by that.

There are some examples also in PROJ.4. Actually, it's not that uncommon
(at least in *my* experience) when you have a static list of things to
have the last item be a sentinel value rather than writing the somewhat
messy logic to get the size of the list. (Again, this tends to apply
when the usual use of the list is to walk it, not index into it.) Even
in C++, I've used this pattern myself sometimes.

https://github.com/Kitware/vivia/blob/master/Libraries/VvVtkWidgets/vvTrackInfo.cxx#L51
- This is an example I wrote...

--
Matthew

Thiago Macieira

Aug 10, 2017, 12:00:04 PM
to std-pr...@isocpp.org
On Wednesday, 9 August 2017 16:21:37 PDT 'Jeffrey Yasskin' via ISO C++
Standard - Future Proposals wrote:
> > You can't do malloc(), since that might fail. You can't use new[] because
> > it might throw. You can't lock a mutex because that is not async signal
> > safe.
> Which are the low-level APIs that take null-terminated strings and are
> also async-signal-safe and failure-free? (If I needed
> exception-freedom I'd use new(nothrow) or catch the exception and turn
> it into a failure.)

open(), readlink(), stat(), and their *at() versions, exec*, etc.

See http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
for the full list (http://man7.org/linux/man-pages/man7/signal-safety.7.html
too).

I don't see a solution other than a thread-specific buffer of PATH_MAX size.
For Windows, that would be 64 kB, per thread.

Jeffrey Yasskin

Aug 10, 2017, 5:28:41 PM
to std-pr...@isocpp.org
On Thu, Aug 10, 2017 at 8:59 AM, Thiago Macieira <thi...@macieira.org> wrote:
> On Wednesday, 9 August 2017 16:21:37 PDT 'Jeffrey Yasskin' via ISO C++
> Standard - Future Proposals wrote:
>> > You can't do malloc(), since that might fail. You can't use new[] because
>> > it might throw. You can't lock a mutex because that is not async signal
>> > safe.
>> Which are the low-level APIs that take null-terminated strings and are
>> also async-signal-safe and failure-free? (If I needed
>> exception-freedom I'd use new(nothrow) or catch the exception and turn
>> it into a failure.)
>
> open(), readlink(), stat(), and their *at() versions, exec*, etc.
>
> See http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
> for the full list (http://man7.org/linux/man-pages/man7/signal-safety.7.html
> too).

All of those have long lists of failure conditions, often including
ENOMEM. Adding "couldn't allocate user-space memory" doesn't seem like
the end of the world.

Jeffrey

Jeffrey Yasskin

Aug 10, 2017, 5:32:51 PM
to std-pr...@isocpp.org
Ah, but you'll fall back to just wanting to write a signal-safe
function, even if it can fail. In that case, you probably do need to
operate on char*s and avoid almost the entire C++ library. We haven't
even said that std::string_view is signal-safe, even though we might
be able to say that.

Jeffrey

Thiago Macieira

Aug 10, 2017, 10:31:15 PM
to std-pr...@isocpp.org
On Thursday, 10 August 2017 14:32:28 PDT 'Jeffrey Yasskin' via ISO C++
Standard - Future Proposals wrote:
> > All of those have long lists of failure conditions, often including
> > ENOMEM. Adding "couldn't allocate user-space memory" doesn't seem like
> > the end of the world.
>
> Ah, but you'll fall back to just wanting to write a signal-safe
> function, even if it can fail. In that case, you probably do need to
> operate on char*s and avoid almost the entire C++ library. We haven't
> even said that std::string_view is signal-safe, even though we might
> be able to say that.

We may also want to pay some attention to the POSIX asynchronous cancellation
points and how they relate to exceptions.

Jeffrey Yasskin

Aug 10, 2017, 11:42:10 PM
to std-pr...@isocpp.org
On Thu, Aug 10, 2017 at 7:31 PM, Thiago Macieira <thi...@macieira.org> wrote:
>
> On Thursday, 10 August 2017 14:32:28 PDT 'Jeffrey Yasskin' via ISO C++
> Standard - Future Proposals wrote:
> > > All of those have long lists of failure conditions, often including
> > > ENOMEM. Adding "couldn't allocate user-space memory" doesn't seem like
> > > the end of the world.
> >
> > Ah, but you'll fall back to just wanting to write a signal-safe
> > function, even if it can fail. In that case, you probably do need to
> > operate on char*s and avoid almost the entire C++ library. We haven't
> > even said that std::string_view is signal-safe, even though we might
> > be able to say that.
>
> We may also want to pay some attention to the POSIX asynchronous cancellation
> points and how they relate to exceptions.

SG1 spent a lot of time on that when designing the C++11 threading
library, and couldn't find a way to make POSIX cancellation fit into
C++. You're welcome to spend more of your own time thinking about it,
but there's unlikely to be a fit.

Jeffrey

Thiago Macieira

Aug 11, 2017, 12:26:08 AM
to std-pr...@isocpp.org
On Thursday, 10 August 2017 20:41:46 PDT 'Jeffrey Yasskin' via ISO C++
Standard - Future Proposals wrote:
> SG1 spent a lot of time on that when designing the C++11 threading
> library, and couldn't find a way to make POSIX cancellation fit into
> C++. You're welcome to spend more of your own time thinking about it,
> but there's unlikely to be a fit.

Sorry, I misspoke. We were talking about async-signal safety and I wrote
"asynchronous cancellation". Asynchronous cancellation is really hard and
really incompatible with proper exception-safety and unwinding.

I meant the *synchronous* cancellation, which can only happen in functions
that are clearly specified to be cancellation points.

Olaf van der Spek

Aug 11, 2017, 5:16:22 AM
to Niall Douglas, ISO C++ Standard - Future Proposals
2017-08-10 15:44 GMT+02:00 Niall Douglas <nialldo...@gmail.com>:
> On Thursday, August 10, 2017 at 8:10:40 AM UTC+1, olafv...@gmail.com wrote:
>>
>> On Thursday, August 10, 2017 at 00:39:49 UTC+2, Niall Douglas wrote:
>>>>
>>>>
>>>> What UB? The links don't talk about it.
>>>
>>>
>>> It's only a hint of UB, not actual UB. I've improved the docs with a red
>>> warning sign saying "the byte after the view must be readable".
>>
>>
>> How is that not UB?
>>
> The requirement in order to provide defined behaviour that the character
> after the view must be readable is publicly documented.

Ouch. IMO that's not a reasonable requirement / interface.

> Failure by the end
> user to ensure this leads to UB, specifically, a segfault.

What could the committee do about this?


--
Olaf

Olaf van der Spek

Aug 11, 2017, 5:20:32 AM
to ISO C++ Standard - Future Proposals
2017-08-10 23:28 GMT+02:00 'Jeffrey Yasskin' via ISO C++ Standard -
Future Proposals <std-pr...@isocpp.org>:
What's the proper design for such interfaces though?
zstring_view seems best but doesn't exist.
const char* wouldn't do any unnecessary allocations.
Both string_view and string might do unnecessary allocations

What do kernels use internally? If they use ptr/size there might be a
chance for a ptr/size interface, if they don't we're stuck with
termination.


--
Olaf

galax...@gmx.at

Aug 11, 2017, 12:27:49 PM
to ISO C++ Standard - Future Proposals, galax...@gmx.at

Okay, apparently it's not that uncommon. But it seems to be mostly for high-levelish work that could be worked around if necessary. In the case of file I/O, it's pretty low-level and basic, and imposes a restriction that cannot be worked around (= rebuilding the functionality manually) in any way. It's kind of as if memcpy, memcmp etc. required the end of the data pointers passed in to contain a certain escape sequence.

I'll rest my case though and hope that some other solution like the ones mentioned is proposed - though as I said I personally didn't feel that CStringView/ZStringView worked out quite as well as expected, it just pushed the burden of constructing the temporary string further up to the user of the abstraction instead of the abstraction itself. Might revisit it once I'm done porting the whole codebase to StringView though.

Thiago Macieira

Aug 11, 2017, 12:34:34 PM
to std-pr...@isocpp.org
On Friday, 11 August 2017 02:16:19 PDT Olaf van der Spek wrote:
> > The requirement in order to provide defined behaviour that the character
> > after the view must be readable is publicly documented.
>
> Ouch. IMO that's not a reasonable requirement / interface.
>
> > Failure by the end
> > user to ensure this leads to UB, specifically, a segfault.
>
> What could the committee do about this?

Why should the committee do something about it?

void *ptr = mmap(nullptr, 4096, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
std::string_view(static_cast<char *>(ptr), 4096);

The character after the end of the string is not readable. Why do you want me
to have 100% overhead?

Thiago Macieira

Aug 11, 2017, 12:39:28 PM
to std-pr...@isocpp.org
On Friday, 11 August 2017 02:20:30 PDT Olaf van der Spek wrote:
> What's the proper design for such interfaces though?
> zstring_view seems best but doesn't exist.
> const char* wouldn't do any unnecessary allocations.
> Both string_view and string might do unnecessary allocations

zstring_view would be a nice wrapper for const char*. It basically *is* a
const char*, with a nice string_view-like API on top.

> What do kernels use internally? If they use ptr/size there might be a
> chance for a ptr/size interface, if they don't we're stuck with
> termination.

Null-terminated strings. Both the Linux and Darwin/FreeBSD kernels are written
in C, so that use is very widespread. I don't know what the Windows kernel is
written in, but considering its age I doubt it's anything besides C.

Win32 may have some ptr/size string APIs, like MultiByteToWideChar, but I
don't remember ever seeing one such where file paths are concerned.

Nicol Bolas

Aug 11, 2017, 1:01:28 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 12:39:28 PM UTC-4, Thiago Macieira wrote:
On sexta-feira, 11 de agosto de 2017 02:20:30 PDT Olaf van der Spek wrote:
> What's the proper design for such interfaces though?
> zstring_view seems best but doesn't exist.
> const char* wouldn't do any unnecessary allocations.
> Both string_view and string might do unnecessary allocations

zstring_view would be a nice wrapper for const char*. It basically *is* a
const char*, with a nice string_view-like API on top.

At present, there are essentially two designs for such a type, which I will name differently based on actual code I've seen.

`zstring_view` is conceptually a `string_view`. This means that it is a pointer+size, so `size()` is always O(1). But this also means that if you try to put a literal in a `zstring_view`, it will have to invoke `char_traits::length` to get the length. This is good if you're actually going to use begin/end iterator pairs. But it's painful if you're just doing one-time forward iteration.

`cstring_view` is just a `const char*` with a nice wrapper. It is just a pointer, which means `size()` is always O(n). But this also means that sticking a literal in it costs nothing. But if you try to use it with begin/end iterator pairs, you provoke an O(n) operation.

At the same time, with the new Range TS paradigm of "iterator/sentinel"s, it becomes possible to do forward iteration on a `cstring_view` without that extra cost. The sentinel would simply dereference the iterator when it does a comparison.

In the absence of the Range TS, I think `zstring_view` is the better design. You can even mitigate the O(n) length for literals by providing a `zsv` UDL for it (which I have in my implementation). But in the Range TS world, where `cstring_view::end` can return a sentinel rather than an iterator, `cstring_view` seems more performance-friendly in the common cases of forward iteration.

The one thing I absolutely do not want is to have both of them standardized. In the domain of NUL-terminated-string-views, we only need one answer. And neither answer is so performance-unfriendly that the other one needs to be standardized to fix it.

We could of course cop out, and simply say that when it invokes `length` is implementation defined. In that case, `size()` may or may not be O(1). But I really hate that idea.
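
To make the trade-off concrete, here is a minimal sketch of the two designs. Neither type exists in the standard; the class layouts, the `c_str()` accessor and the `_zsv` literal suffix are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical pointer+size design: size() is O(1), but constructing from a
// bare pointer has to scan for the terminator once.
class zstring_view {
public:
    zstring_view(const char* s)
        : ptr_(s), size_(std::char_traits<char>::length(s)) {}
    // The caller promises that s[n] is the terminator.
    zstring_view(const char* s, std::size_t n) : ptr_(s), size_(n) {
        assert(s[n] == '\0');
    }
    const char* c_str() const { return ptr_; }
    std::size_t size() const { return size_; }  // O(1)
private:
    const char* ptr_;
    std::size_t size_;
};

// Hypothetical pointer-only design: construction is free, size() scans.
class cstring_view {
public:
    cstring_view(const char* s) : ptr_(s) { assert(s != nullptr); }
    const char* c_str() const { return ptr_; }
    std::size_t size() const {                  // O(n)
        return std::char_traits<char>::length(ptr_);
    }
private:
    const char* ptr_;
};

// Literal suffix: the compiler supplies the length, so no scan is needed.
inline zstring_view operator""_zsv(const char* s, std::size_t n) {
    return zstring_view(s, n);
}
```

With this sketch, `"hello"_zsv` gets its length from the compiler, while `cstring_view("hello")` stores only the pointer and pays for the scan each time `size()` is called.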

Niall Douglas

Aug 11, 2017, 1:01:57 PM8/11/17
to ISO C++ Standard - Future Proposals

SG1 spent a lot of time on that when designing the C++11 threading
library, and couldn't find a way to make POSIX cancellation fit into
C++. You're welcome to spend more of your own time thinking about it,
but there's unlikely to be a fit.

Boost.Thread is probably as close as is possible to implement a thread cancelling capable C++ 11 threading library. It has a lot of imperfections and flaws. I think WG21 was right to leave out thread cancellation from the standard threading library. If people think they really need thread cancellation, they can go swap in Boost.Thread, it's mostly API compatible.

Niall

Niall Douglas

Aug 11, 2017, 1:10:49 PM8/11/17
to ISO C++ Standard - Future Proposals, nialldo...@gmail.com, olafv...@gmail.com
> The requirement in order to provide defined behaviour that the character
> after the view must be readable is publicly documented.

Ouch. IMO that's not a reasonable requirement / interface.

Maybe not for string_view.

For a path_view, definitely yes. When have you ever sent a range of bytes to a filesystem path API where accessing the character off the end of the last non-zero character isn't legal?

I've been working on AFIO for a very long time now. I've never seen a case in all my years.
 

> Failure by the end
> user to ensure this leads to UB, specifically, a segfault.

What could the committee do about this?

I've always felt that string_view could store whether the input at construction is known for a fact to be zero terminated. Top bit of the length would make sense. Maximum view size would then be SIZE_T_MAX>>1.

There is also a case that most of the string_view constructors could be allowed to probe for zero termination after the view end, but at least one constructor would be guaranteed to never do so.

Lots of options there. Finally, you could just leave string_view alone, and adopt the afio::path_view instead. Filesystem paths are not like other zero terminated strings sent to the kernel. They are treated as blobs of undifferentiated bytes apart from zero and '/'.
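
A rough sketch of the "top bit of the length" idea, to show what it would cost. The name `flagged_view` and its interface are made up for illustration; this is not how `std::string_view` or AFIO is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical view that records "known to be zero terminated" in the top
// bit of its length word, leaving SIZE_MAX >> 1 as the largest view size.
class flagged_view {
    static constexpr std::size_t flag_bit = ~(SIZE_MAX >> 1);  // top bit only
public:
    flagged_view(std::string_view sv, bool zero_terminated)
        : ptr_(sv.data()),
          bits_(sv.size() | (zero_terminated ? flag_bit : 0)) {
        assert(sv.size() <= (SIZE_MAX >> 1));
    }
    // Every size query has to mask the flag off again.
    std::size_t size() const { return bits_ & ~flag_bit; }
    bool zero_terminated() const { return (bits_ & flag_bit) != 0; }
    std::string_view view() const { return {ptr_, size()}; }
private:
    const char* ptr_;
    std::size_t bits_;
};
```

The object stays two words wide, at the price of a mask on each `size()` call and a type whose terminated-or-not state is only known at run time.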

Niall

Niall Douglas

Aug 11, 2017, 1:17:32 PM8/11/17
to ISO C++ Standard - Future Proposals

> What do kernels use internally? If they use ptr/size there might be a
> chance for a ptr/size interface, if they don't we're stuck with
> termination.

Null-terminated strings. Both the Linux and Darwin/FreeBSD kernels are written
in C, so that use is very widespread. I don't know what the Windows kernel is
written in, but considering its age I doubt it's anything besides C.

The NT kernel works entirely in wchar_t with sized strings which cannot exceed 65536 bytes. There are Length and MaximumLength fields; the latter by tradition includes a zero terminator even though the kernel doesn't use zero termination.

It is legal to supply strings containing one or many zero characters; they form a legal filename. This causes much fun with Win32, and is a classic way for virus authors to create undeletable files.

Niall

Niall Douglas

Aug 11, 2017, 1:21:25 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 6:01:28 PM UTC+1, Nicol Bolas wrote:
On Friday, August 11, 2017 at 12:39:28 PM UTC-4, Thiago Macieira wrote:
On sexta-feira, 11 de agosto de 2017 02:20:30 PDT Olaf van der Spek wrote:
> What's the proper design for such interfaces though?


Let me ask a thought provoking question.

If you exclude filesystem paths, what remaining use cases are there for zero terminated string views?

I'd argue not enough to bother standardising them, but maybe I've missed something.

As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.

Niall
 

Olaf van der Spek

Aug 11, 2017, 1:26:36 PM8/11/17
to ISO C++ Standard - Future Proposals
2017-08-11 18:34 GMT+02:00 Thiago Macieira <thi...@macieira.org>:
> On sexta-feira, 11 de agosto de 2017 02:16:19 PDT Olaf van der Spek wrote:
>> > The requirement in order to provide defined behaviour that the character
>> > after the view must be readable is publicly documented.
>>
>> Ouch. IMO that's not a reasonable requirement / interface.
>>
>> > Failure by the end
>> > user to ensure this leads to UB, specifically, a segfault.
>>
>> What could the committee do about this?
>
> Why should the committee do something about it?

I don't know..
Niall suggested the committee could do something about it so I'm
wondering what it is they could do.

> void *ptr = mmap(nullptr, 4096, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
> std::string_view(static_cast<char *>(ptr), 4096);
>
> The character after the end of the string is not readable. Why do you want me
> to have 100% overhead?

I don't
--
Olaf

Nicol Bolas

Aug 11, 2017, 1:38:14 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 1:21:25 PM UTC-4, Niall Douglas wrote:
On Friday, August 11, 2017 at 6:01:28 PM UTC+1, Nicol Bolas wrote:
On Friday, August 11, 2017 at 12:39:28 PM UTC-4, Thiago Macieira wrote:
On sexta-feira, 11 de agosto de 2017 02:20:30 PDT Olaf van der Spek wrote:
> What's the proper design for such interfaces though?


Let me ask a thought provoking question.

If you exclude filesystem paths, what remaining use cases are there for zero terminated string views?

Literally any API not under your direct control which exclusively traffics in NUL-terminated strings. It turns out there are a lot of those. You can claim that maybe they shouldn't exist, but since those APIs are out of your direct control, your desires mean squat.

The APIs are what they are, and you have to deal with them. And having a single tool to deal with them is not a bad idea.

I'd argue not enough to bother standardising them, but maybe I've missed something.

As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.

Um... why? We already have a path type. And that path type is just a string (of an implementation-defined type). So you can already get an appropriate view for it. What's the point of having yet another view of a contiguous sequence of constant characters?

Jeffrey Yasskin

Aug 11, 2017, 2:02:37 PM8/11/17
to std-pr...@isocpp.org
On Fri, Aug 11, 2017 at 10:38 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> On Friday, August 11, 2017 at 1:21:25 PM UTC-4, Niall Douglas wrote:
>>
>> On Friday, August 11, 2017 at 6:01:28 PM UTC+1, Nicol Bolas wrote:
>>>
>>> On Friday, August 11, 2017 at 12:39:28 PM UTC-4, Thiago Macieira wrote:
>>>>
>>>> On sexta-feira, 11 de agosto de 2017 02:20:30 PDT Olaf van der Spek
>>>> wrote:
>>>> > What's the proper design for such interfaces though?
>>>>
>>
>> Let me ask a thought provoking question.
>>
>> If you exclude filesystem paths, what remaining use cases are there for
>> zero terminated string views?
>
>
> Literally any API not under your direct control which exclusively traffics
> in NUL-terminated strings. It turns out there are a lot of those. You can
> claim that maybe they shouldn't exist, but since those APIs are out of your
> direct control, your desires mean squat.
>
> The APIs are what they are, and you have to deal with them. And having a
> single tool to deal with them is not a bad idea.

The standard's choices have an effect on how APIs evolve. If we think
folks should be migrating away from NUL-termination, omitting a
standard mechanism to deal with those APIs creates pressure to fix
them.

Jeffrey

Nevin Liber

Aug 11, 2017, 2:07:24 PM8/11/17
to std-pr...@isocpp.org
On Fri, Aug 11, 2017 at 1:02 PM, 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
The standard's choices have an effect on how APIs evolve. If we think
folks should be migrating away from NUL-termination, omitting a
standard mechanism to deal with those APIs creates pressure to fix
them.

Many of those APIs are C, not C++, so I don't think our choices have much weight with them.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  +1-847-691-1404

Nicol Bolas

Aug 11, 2017, 2:11:15 PM8/11/17
to ISO C++ Standard - Future Proposals

We have to deal with the reality we have, not the reality we want. `string_view` exists primarily because it has become abundantly clear that the C++ world will not adopt a single, unified string type that everyone will use. So instead of pretending that we can force everyone to adopt `basic_string`, we instead shift towards allowing them to use their own string types, with a decent intermediary view type that can be used for intercommunication.

The C world is not going to adopt a string+size interface. Not uniformly. WG14 is not going to create a bunch of `printf` alternatives that take string+size rather than a NUL-terminated string. So we can either keep hoping that they will eventually see the light, or we can bend to reality and work with what we've got.

I prefer accepting reality.

Thiago Macieira

Aug 11, 2017, 2:12:42 PM8/11/17
to std-pr...@isocpp.org
On sexta-feira, 11 de agosto de 2017 10:01:28 PDT Nicol Bolas wrote:
> `zstring_view` is conceptually a `string_view`. This means that it is a
> pointer+size, so `size()` is always O(1). But this also means that if you
> try to put a literal in a `zstring_view`, it will have to invoke
> `char_traits::length` to get the length. This is good if you're actually
> going to use begin/end iterator pairs. But it's painful if you're just
> doing one-time forward iteration.
>
> `cstring_view` is just a `const char*` with a nice wrapper. It is just a
> pointer, which means `size()` is always O(n). But this also means that
> sticking a literal in it costs nothing. But if you try to use it with
> begin/end iterator pairs, you provoke an O(n) operation.

int main(std::initializer_list<std::cstring_view>)

can be accomplished with a simple instruction inserted by the compiler at the
beginning of main(), whereas

int main(std::initializer_list<std::zstring_view>)

would be O(n) and require O(n) storage for n == argc.

Thiago Macieira

Aug 11, 2017, 2:13:53 PM8/11/17
to std-pr...@isocpp.org
On sexta-feira, 11 de agosto de 2017 11:02:14 PDT 'Jeffrey Yasskin' via ISO C++
Standard - Future Proposals wrote:
> The standard's choices have an effect on how APIs evolve. If we think
> folks should be migrating away from NUL-termination, omitting a
> standard mechanism to deal with those APIs creates pressure to fix
> them.

The standard's choices mean squat for those who are not using the standard.
Like all the C developers.

(for some of them, it could be even reverse psychology)

Olaf van der Spek

Aug 11, 2017, 2:14:02 PM8/11/17
to Niall Douglas, ISO C++ Standard - Future Proposals
2017-08-11 19:10 GMT+02:00 Niall Douglas <nialldo...@gmail.com>:
>> > The requirement in order to provide defined behaviour that the character
>> > after the view must be readable is publicly documented.
>>
>> Ouch. IMO that's not a reasonable requirement / interface.
>
>
> Maybe not for string_view.
>
> For a path_view, definitely yes. When have you ever sent a range of bytes to
> a filesystem path API where accessing the character off the end of the last
> non-zero character isn't legal?

I don't think I have, but if the argument is stored in a container
that's not using sentinels.. then I think it'd be trivially possible
for that situation to occur.

>> What could the committee do about this?
>>
> I've always felt that string_view could store whether the input at
> construction is known for a fact to be zero terminated. Top bit of the
> length would make sense. Maximum view size would then be SIZE_T_MAX>>1.
>
> There is also a case that most of the string_view constructors could be
> allowed to probe for zero termination after the view end, but at least one
> constructor would be guaranteed to never do so.
>
> Lots of options there. Finally, you could just leave string_view alone, and
> adopt the afio::path_view instead. Filesystem paths are not like other zero
> terminated strings sent to the kernel. They are treated as blobs of
> undifferentiated bytes apart from zero and '/'.

But then you'd have the unreasonable requirement..


--
Olaf

Thiago Macieira

Aug 11, 2017, 2:19:50 PM8/11/17
to std-pr...@isocpp.org
On sexta-feira, 11 de agosto de 2017 10:17:32 PDT Niall Douglas wrote:
> It is legal to supply strings containing one or many zero characters, they
> are a legal filename. This causes much fun with Win32, and is a classic way
> for virus authors to create undeleteable files.

The transition between zero-terminated and explicit length is a source of many
bugs and security issues. I can think of more than one such attack in the last
10 years.

Like X.509 certificates for "google.com\000.attacker.com". Or URLs with %00 in
them.
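
A small sketch of how that class of bug arises: the same bytes read once with an explicit length and once up to the first NUL. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// The two lengths a single name can have, depending on who reads it.
struct two_readings {
    std::size_t counted;     // length carried alongside the pointer
    std::size_t terminated;  // length found by scanning for '\0'
};

inline two_readings read_both_ways(std::string_view name) {
    // Copy into std::string so that c_str() is guaranteed to be terminated.
    std::string owned(name);
    return {owned.size(), std::strlen(owned.c_str())};
}

// A validator on the explicit-length side can reject such names up front.
inline bool has_embedded_nul(std::string_view name) {
    return name.find('\0') != std::string_view::npos;
}
```

For the certificate name above, the counted reading sees all 24 bytes while the terminated reading stops after "google.com". Code that validates with one reading and then acts on the other is where the attack fits in.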

Ville Voutilainen

Aug 11, 2017, 2:21:27 PM8/11/17
to ISO C++ Standard - Future Proposals
On 11 August 2017 at 21:13, Thiago Macieira <thi...@macieira.org> wrote:
> On sexta-feira, 11 de agosto de 2017 11:02:14 PDT 'Jeffrey Yasskin' via ISO C++
> Standard - Future Proposals wrote:
>> The standard's choices have an effect on how APIs evolve. If we think
>> folks should be migrating away from NUL-termination, omitting a
>> standard mechanism to deal with those APIs creates pressure to fix
>> them.
>
> The standard's choices mean squat for those who are not using the standard.
> Like all the C developers.

It might be plausible for some C developers to add ptr+length
functions in order to play nice
with a string_view, or to play nice with non-terminated substrings.
Whether that set of developers
includes WG14 or any particular group is another matter. But having
said that, "playing nice with string_view"
is a poor reason not to add a view-type that requires zero-termination.

> (for some of them, it could be even reverse psychology)

Well, yes, in case we'd want to say "we are never going to add a
termination-requiring view, please amend
all your interfaces". The rest of it may still end up being reverse
psychology, but we cannot hold ourselves
hostage to people who have already decided not to listen to anything we say. :)

Jeffrey Yasskin

Aug 11, 2017, 2:22:10 PM8/11/17
to std-pr...@isocpp.org
On Fri, Aug 11, 2017 at 11:06 AM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On Fri, Aug 11, 2017 at 1:02 PM, 'Jeffrey Yasskin' via ISO C++ Standard -
> Future Proposals <std-pr...@isocpp.org> wrote:
>>
>> The standard's choices have an effect on how APIs evolve. If we think
>> folks should be migrating away from NUL-termination, omitting a
>> standard mechanism to deal with those APIs creates pressure to fix
>> them.
>
>
> Many of those APIs are C, not C++, so I don't think our choices have much
> weight with them.

It's possible that the folks designing C APIs are all actively hostile
to C++ users, but I think a lot of them do it for the ABI stability,
and actually want to accommodate callers from other languages.

But I don't have veto power over what goes into C++. If folks want to
propose a {z,c}string_view, we'll see how the committee feels about
it.

Jeffrey

Ville Voutilainen

Aug 11, 2017, 2:28:09 PM8/11/17
to ISO C++ Standard - Future Proposals
On 11 August 2017 at 21:21, 'Jeffrey Yasskin' via ISO C++ Standard -
Future Proposals <std-pr...@isocpp.org> wrote:
> It's possible that the folks designing C APIs are all actively hostile

Probably not. Some of them are too busy to think all that much about
how to wrap their APIs into a higher-level language
of any kind, some just don't care for other reasons, some might not
really know how to do it optimally. None of those categories
are actively hostile.

> to C++ users, but I think a lot of them do it for the ABI stability,
> and actually want to accommodate callers from other languages.

Yep.

> But I don't have veto power over what goes into C++. If folks want to
> propose a {z,c}string_view, we'll see how the committee feels about
> it.


It's a facility that's obviously useful, but whether it's useful
enough to pass the committee scrutiny is hard to predict.
And that also depends on what that type does, exactly, like whether
and where/how it really requires zero-termination.

Nicol Bolas

Aug 11, 2017, 3:09:18 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 2:12:42 PM UTC-4, Thiago Macieira wrote:
On sexta-feira, 11 de agosto de 2017 10:01:28 PDT Nicol Bolas wrote:
> `zstring_view` is conceptually a `string_view`. This means that it is a
> pointer+size, so `size()` is always O(1). But this also means that if you
> try to put a literal in a `zstring_view`, it will have to invoke
> `char_traits::length` to get the length. This is good if you're actually
> going to use begin/end iterator pairs. But it's painful if you're just
> doing one-time forward iteration.
>
> `cstring_view` is just a `const char*` with a nice wrapper. It is just a
> pointer, which means `size()` is always O(n). But this also means that
> sticking a literal in it costs nothing. But if you try to use it with
> begin/end iterator pairs, you provoke an O(n) operation.

int main(std::initializer_list<std::cstring_view>)

can be accomplished with a simple instruction inserted by the compiler at the
beginning of main(),

How, exactly? `cstring_view` would not be required to be implemented as identical to `const char*`, after all. So there's no way you could guarantee that the `argv` array would be equivalent to that.

Also, it's not exactly a motivational use case for the type.

Thiago Macieira

Aug 11, 2017, 3:20:56 PM8/11/17
to std-pr...@isocpp.org
On sexta-feira, 11 de agosto de 2017 12:09:18 PDT Nicol Bolas wrote:
> > int main(std::initializer_list<std::cstring_view>)
> >
> > can be accomplished with a simple instruction inserted by the compiler at
> > the
> > beginning of main(),
>
> How, exactly? `cstring_view` would not be *required* to be implemented as
> identical to `const char*`, after all. So there's no way you could guarantee
> that the `argv` array would be equivalent to that.

"can be accomplished" depends of course on the implementation details. If it
is implemented as the simplest possibility (a simple const char*), then the
main function can be accomplished with a single instruction added.

> Also, it's not exactly a motivational use case for the type.

Indeed. There's also the point that main takes "char **" for a reason: both
the pointers and the arguments are modifiable. You can't do that with a view or
std::initializer_list.

Nevin Liber

Aug 11, 2017, 3:47:24 PM8/11/17
to std-pr...@isocpp.org
On Fri, Aug 11, 2017 at 12:01 PM, Nicol Bolas <jmck...@gmail.com> wrote:
`zstring_view` is conceptually a `string_view`. This means that it is a pointer+size, so `size()` is always O(1). But this also means that if you try to put a literal in a `zstring_view`, it will have to invoke `char_traits::length` to get the length. This is good if you're actually going to use begin/end iterator pairs. But it's painful if you're just doing one-time forward iteration.

Is it really that painful in practice?  I would expect a compiler to optimize the length call away for a zstring_view<char> taking a C-string literal.

Non-literals where you don't know the size are likely more of a problem, in that you might be turning a one-pass algorithm into a two-pass algorithm.

Nevin Liber

Aug 11, 2017, 4:00:22 PM8/11/17
to std-pr...@isocpp.org
On Fri, Aug 11, 2017 at 12:10 PM, Niall Douglas <nialldo...@gmail.com> wrote:
I've always felt that string_view could store whether the input at construction is known for a fact to be zero terminated. Top bit of the length would make sense. Maximum view size would then be SIZE_T_MAX>>1.

Ugh.  I've found that types which model "may or may not" (may or may not own, may or may not be zero terminated, etc.) are very clunky and error prone to use.  This is, for instance, one of the problems with raw pointers, as you don't know whether or not some cleanup has to be performed after you are done with it.

If you want to model something different than string_view, it should be a different type.

Thiago Macieira

Aug 11, 2017, 4:36:36 PM8/11/17
to std-pr...@isocpp.org
On sexta-feira, 11 de agosto de 2017 10:01:28 PDT Nicol Bolas wrote:
> `cstring_view` is just a `const char*` with a nice wrapper. It is just a
> pointer, which means `size()` is always O(n). But this also means that
> sticking a literal in it costs nothing. But if you try to use it with
> begin/end iterator pairs, you provoke an O(n) operation.

Actually, no. The end iterator can be a sentinel, such that the call to
cstring_view::end() is O(1).

It's std::distance(begin, end) that is O(n). And rbegin().
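
A minimal sketch of the sentinel approach, assuming C++17, where range-based for accepts different types for begin and end. `cstring_view` and `nul_sentinel` are hypothetical names, not proposed wording.

```cpp
#include <cassert>
#include <cstddef>

// Empty tag type standing in for "the end of the string".
struct nul_sentinel {};

// Hypothetical pointer-only view whose end() never scans the string.
class cstring_view {
public:
    explicit cstring_view(const char* s) : ptr_(s) { assert(s != nullptr); }
    const char* begin() const { return ptr_; }
    nul_sentinel end() const { return {}; }  // O(1)
private:
    const char* ptr_;
};

// "Have we reached the end?" is answered by looking at the current character.
inline bool operator==(const char* it, nul_sentinel) { return *it == '\0'; }
inline bool operator!=(const char* it, nul_sentinel) { return *it != '\0'; }

// One-pass forward iteration, with no separate length computation.
inline std::size_t count_char(cstring_view s, char c) {
    std::size_t n = 0;
    for (char ch : s) {
        if (ch == c) ++n;
    }
    return n;
}
```

Classic algorithms that require begin and end to have the same type will not accept this pair, so anything like `std::distance` or reverse iteration still needs the O(n) scan first.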

Nicol Bolas

Aug 11, 2017, 4:54:59 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 4:36:36 PM UTC-4, Thiago Macieira wrote:
On sexta-feira, 11 de agosto de 2017 10:01:28 PDT Nicol Bolas wrote:
> `cstring_view` is just a `const char*` with a nice wrapper. It is just a
> pointer, which means `size()` is always O(n). But this also means that
> sticking a literal in it costs nothing. But if you try to use it with
> begin/end iterator pairs, you provoke an O(n) operation.

Actually, no. The end iterator can be a sentinel, such that the call to
cstring_view::end() is O(1).

A fact I mentioned later in that post, but one that relies on Ranges TS and its range algorithms. In a pre-Ranges TS world, that's not an acceptable implementation, since all of our current algorithms are based on iterator pairs, not iterator/sentinels.

Niall Douglas

Aug 11, 2017, 5:05:45 PM8/11/17
to ISO C++ Standard - Future Proposals

Let me ask a thought provoking question.

If you exclude filesystem paths, what remaining use cases are there for zero terminated string views?

Literally any API not under your direct control which exclusively traffics in NUL-terminated strings. It turns out there are a lot of those. You can claim that maybe they shouldn't exist, but since those APIs are out of your direct control, your desires mean squat.

I don't get how a zero terminated string view gives you any benefit there.

Just pass around null terminated const char * as the end consumer can only consume that anyway, and it won't accept lengths supplied by you. Knowing the valid extent gains you nothing over using a plain string_view.
 

I'd argue not enough to bother standardising them, but maybe I've missed something.

As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.

Um... why? We already have a path type. And that path type is just a string (of an implementation-defined type). So you can already get an appropriate view for it. What's the point of having yet another view of a contiguous sequence of constant characters?

Oh, lots and lots.

Back during the Boost.Filesystem peer reviews, the correct design for path was contentious. One of the best designs not adopted was one where the underlying storage is type erased and a UTF-8 appearance is presented publicly. This came with the big benefit that portable Filesystem code doesn't need #ifdefing string literals for non-ASCII characters. The current design where path wraps some string implementation which varies was the one which won out in the end, and given all the competing factors and tradeoffs, I agree it was the least worst choice.

However a path view has a very different set of tradeoffs, and the one which swung in favour of the Filesystem TS path design over the "Plan B" design - mutability - is no longer a problem. So we can adopt the "Plan B" underlying storage erasing design from the Boost peer review as the only "mutable" functions available (a subset of filesystem::path's) merely return new views of the original, retaining the type erasure of the underlying representation.

This enables path_view to very nicely complement and extend "Plan A" path with many of the conveniences that dropping the "Plan B" path design cost. So the dimorphic designs give the best of both worlds, and tick both sets of boxes. I'm very pleased with it, it took me a year of trying ideas out to decide on it as it was one of the hardest of the post-Boost-peer-review issues to tackle, but now it's decided I'm very sure it's the right choice.

The way you prepare a path_view for being fed to a syscall, in case you didn't read the API docs, is that you construct a path_view::c_str object like this:

path_view pv;
path_view::c_str zpath(pv);
open(zpath.buffer, ...);

The path_view::c_str structure is a PATH_MAX sized structure with additional buffer and length values. If the source view is zero terminated, that is used directly. If it is not, it is copied into the c_str and zero terminated. If on Windows and the underlying view source is UTF-8, a conversion to UTF-16 is done. The overhead is minimal, with no malloc anywhere, and it can be used by end users identically on Windows and POSIX, no #ifdefing.
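
For readers who haven't seen the docs, here is a heavily simplified sketch of the "use directly or copy and terminate" step. It is not AFIO's code: `c_str_buffer`, its `Capacity` parameter and the `known_terminated` flag are assumptions made for illustration, and the Windows UTF-16 conversion is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for the helper described above.
template <std::size_t Capacity = 4096>
struct c_str_buffer {
    const char* buffer;  // what gets handed to the syscall
    std::size_t length;

    // known_terminated: the caller asserts that sv.data()[sv.size()] is
    // readable and equals '\0'. (The real design probes that byte itself.)
    c_str_buffer(std::string_view sv, bool known_terminated)
        : buffer(sv.data()), length(sv.size()) {
        if (known_terminated) {
            assert(sv.data()[sv.size()] == '\0');
            return;  // zero copy: point straight at the source
        }
        if (sv.size() >= Capacity) {
            throw std::length_error("path too long for the fixed buffer");
        }
        if (!sv.empty()) {
            std::memcpy(storage_, sv.data(), sv.size());
        }
        storage_[sv.size()] = '\0';
        buffer = storage_;  // no heap allocation on this path either
    }

    // buffer may point into storage_, so copying would leave it dangling.
    c_str_buffer(const c_str_buffer&) = delete;
    c_str_buffer& operator=(const c_str_buffer&) = delete;

private:
    char storage_[Capacity];
};
```

The storage lives inside the object, so the stack cost is fixed by `Capacity` whether or not the copy path is taken.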

Niall

Thiago Macieira

Aug 11, 2017, 5:25:12 PM8/11/17
to std-pr...@isocpp.org
On sexta-feira, 11 de agosto de 2017 14:05:45 PDT Niall Douglas wrote:
> path_view pv;
> path_view::c_str zpath(pv);
> open(zpath.buffer, ...);
>
> The path_view::c_str structure is a PATH_MAX sized structure with
> additional buffer and length values. If the source view is zero terminated,
> that is used directly. If it is not, it is copied into the c_str and zero
> terminated. If on Windows and the underlying view source is UTF-8, a
> conversion to UTF-16 is done. It is perfectly minimum overhead, no malloc
> anywhere, and can be used by end users identically on Windows and POSIX,
> no #ifdefing.

So you're saying that it uses 64kB of stack on Windows?

Nicol Bolas

Aug 11, 2017, 5:29:37 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 5:05:45 PM UTC-4, Niall Douglas wrote:

Let me ask a thought provoking question.

If you exclude filesystem paths, what remaining use cases are there for zero terminated string views?

Literally any API not under your direct control which exclusively traffics in NUL-terminated strings. It turns out there are a lot of those. You can claim that maybe they shouldn't exist, but since those APIs are out of your direct control, your desires mean squat.

I don't get how a zero terminated string view gives you any benefit there.

It provides me the ability to write APIs that propagate the requirements I have to the caller. Remember: most string types really are NUL-terminated. Consider this case:

void some_func_fixed(const fixed_string<32> &str)
{
  needs_nul_terminated(str.data());
}

void some_func_view(string_view sv)
{
  // The view may not be NUL-terminated, so a terminated copy is needed.
  std::string str(sv);
  needs_nul_terminated(str.data());
}

fixed_string<32> str;
some_func_fixed(str);
some_func_view(str);

As you can see, `fixed_string` is a NUL-terminated string, just like most C++ string types. And yet, if you try to call `some_func_view`, it makes a copy. Pointlessly.

`some_func_view` is flexible, since it can take any string you can get a string_view out of. But it is bad because it makes a copy from a source that is probably NUL-terminated.

`z/cstring_view` would give us the flexibility without the bad part.
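
As a sketch of what that would look like, assuming a hypothetical `zstring_view` that can only be built from sources known to be terminated (here just `const char*` and `std::string`), and a stand-in `needs_nul_terminated` in place of a real third-party API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical terminated view. There is deliberately no constructor from
// std::string_view, so unterminated sources are rejected at compile time.
class zstring_view {
public:
    zstring_view(const char* s) : ptr_(s), size_(0) {
        assert(s != nullptr);
        size_ = std::strlen(s);
    }
    zstring_view(const std::string& s) : ptr_(s.c_str()), size_(s.size()) {}
    const char* c_str() const { return ptr_; }
    std::size_t size() const { return size_; }
private:
    const char* ptr_;
    std::size_t size_;
};

// Stand-in for an API that only accepts NUL-terminated strings.
inline std::size_t needs_nul_terminated(const char* s) { return std::strlen(s); }

// The signature tells the caller what is required, and no temporary
// std::string is built on the way through.
inline std::size_t some_func_zview(zstring_view zv) {
    return needs_nul_terminated(zv.c_str());
}
```

A caller holding only a `std::string_view` then has to make the terminated copy itself, which puts the cost where the missing guarantee is.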

Just pass around null terminated const char * as the end consumer can only consume that anyway, and it won't accept lengths supplied by you. Knowing the valid extent gains you nothing over using a plain string_view.
 

I'd argue not enough to bother standardising them, but maybe I've missed something.

As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.

Um... why? We already have a path type. And that path type is just a string (of an implementation-defined type). So you can already get an appropriate view for it. What's the point of having yet another view of a contiguous sequence of constant characters?

Oh, lots and lots.

Back during the Boost.Filesystem peer reviews, the correct design for path was contentious. One of the best designs not adopted was one where the underlying storage is type erased and a UTF-8 appearance is presented publicly. This came with the big benefit that portable Filesystem code doesn't need #ifdefing string literals for non-ASCII characters. The current design where path wraps some string implementation which varies was the one which won out in the end, and given all the competing factors and tradeoffs, I agree it was the least worst choice.

However a path view has a very different set of tradeoffs, and the one which swung in favour of the Filesystem TS path design over the "Plan B" design - mutability - is no longer a problem. So we can adopt the "Plan B" underlying storage erasing design from the Boost peer review as the only "mutable" functions available (a subset of filesystem::path's) merely return new views of the original, retaining the type erasure of the underlying representation.

Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.

Thiago Macieira

Aug 11, 2017, 5:56:08 PM8/11/17
to std-pr...@isocpp.org
On Friday, 11 August 2017 14:29:37 PDT Nicol Bolas wrote:
> As you can see, `fixed_string` is a NUL-terminated string, just like most
> C++ string types. And yet, if you try to call `some_func_view`, it makes a
> copy. Pointlessly.
>
> `some_func_view` is flexible, since it can take any string you can get a
> string_view out of. But it is bad because it makes a copy from a source
> that is probably NUL-terminated.

Hence Niall's requirement in his API that the next byte be readable. That way,
the code can check if it is NUL-terminated and avoid the allocation.

It's a requirement that very rarely fails.
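
A minimal sketch of that check, assuming a hypothetical helper named `zero_terminated` (it is not taken from AFIO or from any library). It borrows the source pointer when the byte after the view is already NUL and copies otherwise. Reading that byte is only valid under the stated precondition, which the type system does not enforce:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper for illustration; not part of AFIO or the standard.
class zero_terminated {
public:
    // Precondition (documented, not enforced by the type system):
    // sv.data() is non-null and the byte at sv.data()[sv.size()] is readable.
    explicit zero_terminated(std::string_view sv) {
        assert(sv.data() != nullptr);
        if (sv.data()[sv.size()] == '\0') {
            ptr_ = sv.data();                    // already terminated: borrow
        } else {
            copy_.assign(sv.data(), sv.size());  // not terminated: copy
            ptr_ = copy_.c_str();
            copied_ = true;
        }
    }
    // Non-copyable, because ptr_ may point into copy_.
    zero_terminated(const zero_terminated&) = delete;
    zero_terminated& operator=(const zero_terminated&) = delete;

    const char* c_str() const { return ptr_; }
    bool copied() const { return copied_; }

private:
    std::string copy_;
    const char* ptr_ = nullptr;
    bool copied_ = false;
};
```

A view over a whole `std::string` takes the borrowing branch; a view over a prefix of it takes the copying branch.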

And if you know the ABI, you can also check if the byte load can cause a fault
or not. I use that in my update to Qt's QString/QByteArray hasher, so it may
load up to 15 bytes after the end of your string (it's x86-specific).

Niall Douglas

Aug 11, 2017, 6:26:12 PM8/11/17
to ISO C++ Standard - Future Proposals

> The path_view::c_str structure is a PATH_MAX sized structure with
> additional buffer and length values. If the source view is zero terminated,
> that is used directly. If it is not, it is copied into the c_str and zero
> terminated. If on Windows and the underlying view source is UTF-8, a
> conversion to UTF-16 is done. It is perfectly minimum overhead, no malloc
> anywhere, and can be used by end users identically on Windows and POSIX,
> no #ifdefing.

So you're saying that it uses 64kB of stack on Windows?

For now, yes. And 32KB on Linux and 1KB on FreeBSD, as those are their PATH_MAX.

If it turns out to be a problem down the line, we could shrink to 4KB and use malloc if the input path is longer than that. For now I haven't encountered a problem; generally, if you're calling a path consuming syscall, this is not an overhead.
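
A rough sketch of that fallback idea, under loud assumptions: `c_str_sketch` is a hypothetical name, the 4096-byte default is only the figure floated above, and `std::unique_ptr<char[]>` stands in for malloc. It is not AFIO's actual `path_view::c_str`, and it leaves out the UTF-8 to UTF-16 conversion on Windows. As with the design described, it assumes the byte just past the view is readable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>

// Hypothetical illustration only; not AFIO's implementation.
template <std::size_t N = 4096>
struct c_str_sketch {
    const char* buffer = nullptr;  // always NUL-terminated after construction
    std::size_t length = 0;

    // Precondition: sv.data() is non-null and sv.data()[sv.size()] is readable.
    explicit c_str_sketch(std::string_view sv) : length(sv.size()) {
        assert(sv.data() != nullptr);
        if (sv.data()[sv.size()] == '\0') {
            buffer = sv.data();                      // use the source directly
        } else if (sv.size() < N) {
            std::memcpy(inline_, sv.data(), sv.size());
            inline_[sv.size()] = '\0';               // copy into the stack buffer
            buffer = inline_;
        } else {
            heap_ = std::make_unique<char[]>(sv.size() + 1);
            std::memcpy(heap_.get(), sv.data(), sv.size());
            heap_[sv.size()] = '\0';                 // too long: heap fallback
            buffer = heap_.get();
        }
    }
    // Non-copyable, because buffer may point at inline_.
    c_str_sketch(const c_str_sketch&) = delete;
    c_str_sketch& operator=(const c_str_sketch&) = delete;

    bool on_heap() const { return heap_ != nullptr; }
    bool borrowed() const { return buffer != inline_ && !heap_; }

private:
    char inline_[N];
    std::unique_ptr<char[]> heap_;
};
```

The stack buffer is left uninitialised, so the common already-terminated case touches none of it.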

Niall

Niall Douglas

Aug 11, 2017, 6:29:23 PM8/11/17
to ISO C++ Standard - Future Proposals

Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.

As already covered in great detail by now, if fed a string_view then one is mandated to memcpy the lot so it can be zero terminated before sending it to the file path consuming syscall.

As much as open() is not a fast syscall, copying 32KB needlessly makes a statistically noticeable difference. path_view eliminates that problem.

Niall
 

Niall Douglas

Aug 11, 2017, 6:38:14 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 10:56:08 PM UTC+1, Thiago Macieira wrote:
On Friday, 11 August 2017 14:29:37 PDT Nicol Bolas wrote:
> As you can see, `fixed_string` is a NUL-terminated string, just like most
> C++ string types. And yet, if you try to call `some_func_view`, it makes a
> copy. Pointlessly.
>
> `some_func_view` is flexible, since it can take any string you can get a
> string_view out of. But it is bad because it makes a copy from a source
> that is probably NUL-terminated.

Hence Niall's requirement in his API that the next byte be readable. That way,
the code can check if it is NUL-terminated and avoid the allocation.

It's a requirement that very rarely fails.

string_view already had to be extremely conservative because a string is so overloaded with meaning in C++. We store all sorts of things into a char array, often using std::string when std::vector<char> really would be more appropriate. It's entirely legitimate to do mmap(4KB) and pass that through to other code via string_view. A span would be more appropriate, but that's not in the standard yet.

Filesystem paths on the other hand are much more specific. You'll never do mmap(4KB) and wrap that into a path_view, or at least, if you did, then wrap 4KB-1 into a path_view instead and you'll be fine. But we can impose that requirement precisely and only because we know the content. With string_view you can't know what the content will mean, or what form it will come in.

With your zstring_view I appreciate your pain, and I indeed did argue for null terminator support in string_view with Marshall back in the day. But that ship has sailed, and the value add of a special zstring_view seems very low to me when more specialised view-of-something-particular is the obvious path forwards.

BTW people have tried to persuade WG14 to replace zero terminated strings for as long as I can remember. Nobody has ever achieved it. Same goes for POSIX. It's not for a lack of sympathy to the problem; it's that those bodies only standardise existing practice where possible, and no implementation apart from Windows NT, ironically enough, has ever broken with null terminated strings. So it's a catch-22.

Niall

Nicol Bolas

Aug 11, 2017, 6:48:14 PM8/11/17
to ISO C++ Standard - Future Proposals
On Friday, August 11, 2017 at 6:29:23 PM UTC-4, Niall Douglas wrote:

Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.

As already covered in great detail by now, if fed a string_view then one is mandated to memcpy the lot so it can be zero terminated before sending it to the file path consuming syscall.

A problem easily solved with the types we're talking about. We don't need a replacement for `filesystem::path`. We certainly don't need some 32KB type...

Nicol Bolas

Aug 11, 2017, 6:50:24 PM8/11/17
to ISO C++ Standard - Future Proposals


On Friday, August 11, 2017 at 5:56:08 PM UTC-4, Thiago Macieira wrote:
On Friday, 11 August 2017 14:29:37 PDT Nicol Bolas wrote:
> As you can see, `fixed_string` is a NUL-terminated string, just like most
> C++ string types. And yet, if you try to call `some_func_view`, it makes a
> copy. Pointlessly.
>
> `some_func_view` is flexible, since it can take any string you can get a
> string_view out of. But it is bad because it makes a copy from a source
> that is probably NUL-terminated.

Hence Niall's requirement in his API that the next byte be readable. That way,
the code can check if it is NUL-terminated and avoid the allocation.

It's a requirement that very rarely fails.

It's also not a requirement that's part of the type system, which makes it even more fragile than an interface that just takes a naked `const char*`. At least there, there is a long-standing tradition of expecting them to be NUL-terminated.

Niall Douglas

Aug 12, 2017, 11:23:44 AM8/12/17
to ISO C++ Standard - Future Proposals
I hate to be slightly rude, but you really don't know what you're talking about on this.

Try doing some benchmarking of the actual cost of a potentially unused 64KB stack allocation in a routine which calls any path consuming syscall, and compare it to any other solution. Then feel free to retract your claim above.

Furthermore, you once again misrepresent what I said. Nobody is replacing filesystem::path. afio::path_view complements filesystem::path with a usefully orthogonal API design. AFIO itself switches between both path_view and path as appropriate to the use case. So will any other code. That's why the design is so correct: path_view does not replace path, it augments it.

Niall
 

Nicol Bolas

Aug 12, 2017, 12:54:28 PM8/12/17
to ISO C++ Standard - Future Proposals
On Saturday, August 12, 2017 at 11:23:44 AM UTC-4, Niall Douglas wrote:
On Friday, August 11, 2017 at 11:48:14 PM UTC+1, Nicol Bolas wrote:
On Friday, August 11, 2017 at 6:29:23 PM UTC-4, Niall Douglas wrote:

Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.

As already covered in great detail by now, if fed a string_view then one is mandated to memcpy the lot so it can be zero terminated before sending it to the file path consuming syscall.

A problem easily solved with the types we're talking about. We don't need a replacement for `filesystem::path`. We certainly don't need some 32KB type...

I hate to be slightly rude, but you really don't know what you're talking about on this.

Try doing some benchmarking of the actual cost of a potentially unused 64KB stack allocation in a routine which calls any path consuming syscall, and compare it to any other solution.

I'm not sure how you can benchmark "free". Because `z/cstring_view` is exactly that. It costs nothing on the stack (OK, it costs one or two pointers), and it costs nothing at runtime.

Your example code was:


path_view pv;
path_view::c_str zpath(pv);
open(zpath.buffer, ...);

Where `zpath` is apparently a MAX_PATH-sized object. My counter-example is:

path pv;
auto zpath = pv.zstring_view();
open(zpath.data(), ...);

You can't get cheaper than free, so I'd say it compares quite well with a 64KB stack allocation.

And considering that `path_view` is useless to non-pathing APIs (string formatting, talking to other NUL-terminating APIs, etc), while `z/cstring_view` would be useful for them, we have two options:

1) Create a narrow solution that takes up a lot of stack space.
2) Create a broad solution that takes up virtually no stack space.

If `path_view` has a purpose beyond solving this problem, that's fine. But we should not disregard the general utility of `z/cstring_view` just because `path_view` is around.

To the degree that it actually will be around, of course.
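
For concreteness, here is one way such a type could look. This is only a sketch: `cstring_view` is not a standard type, and the member names (`c_str`, `suffix_from`) are assumptions made for illustration. The point it shows is that the NUL-termination guarantee lives in the type, since the only public ways in are sources that are already terminated:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical sketch; no such type exists in the standard library.
class cstring_view {
public:
    // Only sources that are already NUL-terminated are accepted.
    cstring_view(const char* s) : data_(s), size_(std::strlen(s)) {}
    cstring_view(const std::string& s) : data_(s.c_str()), size_(s.size()) {}

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }

    // Dropping a prefix keeps the terminator, so the result is still valid.
    // Dropping a suffix would not, which is why no such member is offered.
    cstring_view suffix_from(std::size_t pos) const {
        assert(pos <= size_);
        return cstring_view(data_ + pos, size_ - pos);
    }

    // Converting to a plain string_view is always safe.
    operator std::string_view() const { return std::string_view(data_, size_); }

private:
    cstring_view(const char* s, std::size_t n) : data_(s), size_(n) {}
    const char* data_;
    std::size_t size_;
};

// An arbitrary string_view cannot be turned into a cstring_view.
static_assert(!std::is_constructible_v<cstring_view, std::string_view>);
```

The object is two words wide and `c_str()` is a plain load, which is the sense in which it is "free". The final `static_assert` is also the limitation: a caller holding only a `std::string_view` cannot use an API that takes this type.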

Thiago Macieira

Aug 12, 2017, 1:15:27 PM8/12/17
to std-pr...@isocpp.org
On Saturday, 12 August 2017 09:54:28 PDT Nicol Bolas wrote:
> path pv;
> auto zpath = pv.zstring_view();
> open(zpath.data(), ...);
>
> You can't get cheaper than *free*, so I'd say it compares quite well with a
> 64KB stack allocation.

The problem is that you can't call this function with an input coming from
std::string_view. You need to modify the upper API to use std::zstring_view.

Niall Douglas

Aug 12, 2017, 6:59:51 PM8/12/17
to ISO C++ Standard - Future Proposals
On Saturday, August 12, 2017 at 6:15:27 PM UTC+1, Thiago Macieira wrote:
On Saturday, 12 August 2017 09:54:28 PDT Nicol Bolas wrote:
> path pv;
> auto zpath = pv.zstring_view();
> open(zpath.data(), ...);
>
> You can't get cheaper than *free*, so I'd say it compares quite well with a
> 64KB stack allocation.

The problem is that you can't call this function with an input coming from
std::string_view. You need to modify the upper API to use std::zstring_view.

Spot on.

One very common operation which AFIO does is to split a path into leafname and root path. This is needed to implement race-free filesystem operations on POSIX which, unlike Windows, doesn't let you rename or unlink via fd. So you need to fetch the fd's current path, split it into two views, open its containing directory, look up the leafname, and compare st_dev and st_ino so you know that the fd to the containing directory is the right one to use as a base for subsequent operations.
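
The splitting step alone can be sketched with plain `std::string_view`. `split_leaf` is a hypothetical name, `'/'` is assumed as the only separator, and the directory open and the st_dev/st_ino comparison are not shown:

```cpp
#include <cassert>
#include <string_view>
#include <utility>

// Hypothetical helper: returns {root path, leafname} as views into `path`.
// No bytes are copied; both views alias the caller's storage.
inline std::pair<std::string_view, std::string_view>
split_leaf(std::string_view path) {
    const auto pos = path.rfind('/');
    if (pos == std::string_view::npos) {
        return {std::string_view{}, path};  // no separator: leafname only
    }
    assert(pos < path.size());
    // Keep the single "/" when the leaf sits directly under the root.
    return {path.substr(0, pos == 0 ? 1 : pos), path.substr(pos + 1)};
}
```

The root view returned here ends at a `'/'`, not at a NUL, so any consumer needing a zero-terminated string has to copy it.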

Because the filesystem can permute randomly at any time, this operation is in a loop, and with a potential 32KB long path one wants to avoid copying it frequently. Yet the root path, if working exclusively with string_view, would never be zero terminated, and thus you'd always have to copy it. What we can do here is to poke a zero character in between the root path and the leafname, so that afio::path_view finds a null termination and skips the copy.
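
A sketch of the poke-a-zero trick, assuming the path lives in a mutable buffer the caller owns. `split_and_terminate` and `split_in_place` are hypothetical names, and this is not AFIO's code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

struct split_in_place {
    std::string_view root;
    std::string_view leaf;
};

// Hypothetical helper: overwrites the last '/' in buf with '\0' so that BOTH
// resulting views are followed by a NUL byte. buf must outlive the views.
inline split_in_place split_and_terminate(std::string& buf) {
    const std::size_t pos = buf.rfind('/');
    // Sketch only: requires a separator that is not the leading one.
    assert(pos != std::string::npos && pos != 0);
    buf[pos] = '\0';
    return {std::string_view(buf.data(), pos),
            std::string_view(buf.data() + pos + 1, buf.size() - pos - 1)};
}
```

After the call, a consumer that checks the byte just past the root view finds a NUL there and can pass `root.data()` straight to the syscall without copying.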

You surely will say "just use zstring_view", but then I'd need to duplicate my public APIs with internal ones just so I can call them with a zstring_view, which is silly. Much better that my internal code uses the exact same public API as the end users do.

Niall
