void open( const char *filename,
ios_base::openmode mode = ios_base::in|ios_base::out );
void open( const std::string &filename,
ios_base::openmode mode = ios_base::in|ios_base::out );
void open( const std::string_view &filename,
ios_base::openmode mode = ios_base::in|ios_base::out );
One objection is that the file name is supposed to be a nul-terminated
string. The string_view is not required to be that.
Bo Persson
Ah, I see. I definitely agree that in this case it's better to create a copy myself.
Is there any chance that low-level APIs will be moving away from nul-terminated strings in the future?
Having so many low-level functions require a NUL-terminated string dampens the overall usefulness of string_view, IMHO.
At least, that's what I can tell after a few days' worth of porting to string_view. At first I thought it would be a direct replacement for most places where "const std::string&" was being used, but now it seems much more consideration is needed to avoid ending up with more temporaries and allocations than before.
I rather doubt it. It's hardly high on most people's priority lists.
Which is why I wrote my own `zstring_view` class, which explicitly represents a view of a NUL-terminated string (it treats the NUL-terminator the same way `std::string` does; you can look at it, but it's not explicitly part of the `begin/end` range). It has some of the operations of `string_view` (thanks to inheritance), but it removes or alters the ones that subdivide from the end. For obvious reasons.
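A minimal sketch of such a class, assuming C++17. The members shown are illustrative, not the poster's actual code: it privately inherits from `std::string_view`, re-exports the read-only members plus `remove_prefix`, offers only a one-argument `substr`, and leaves out `remove_suffix`. Conversion back to a plain `std::string_view` goes through a named `view()` member, because a conversion operator to a private base class would never be used.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical view of a NUL-terminated string. Invariant: data()[size()]
// is '\0'; the terminator is readable but lies outside [begin(), end()).
class zstring_view : private std::string_view {
public:
    // Re-expose read-only std::string_view operations.
    using std::string_view::begin;
    using std::string_view::end;
    using std::string_view::size;
    using std::string_view::empty;
    using std::string_view::data;
    using std::string_view::operator[];
    // Dropping characters from the front keeps the terminator in place.
    using std::string_view::remove_prefix;

    constexpr zstring_view() noexcept : std::string_view("") {}
    // Measures the length once, as std::string_view(const char*) does.
    constexpr zstring_view(const char* s) : std::string_view(s) {}
    // std::string always stores a '\0' after its last character.
    zstring_view(const std::string& s) noexcept : std::string_view(s) {}

    constexpr const char* c_str() const noexcept { return data(); }

    // Suffix-only substr; there is deliberately no remove_suffix and no
    // substr(pos, count), since either could cut off the terminator.
    constexpr zstring_view substr(std::size_t pos) const {
        if (pos > size())
            throw std::out_of_range("zstring_view::substr");
        zstring_view r = *this;
        r.remove_prefix(pos);
        return r;
    }

    constexpr std::string_view view() const noexcept {
        return std::string_view(data(), size());
    }
};
```

Private inheritance keeps `remove_suffix` and the two-argument `substr` out of the public interface while reusing the base's storage and accessors.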
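#include <array>
#include <cstddef>
#include <stdexcept>
#include <string_view>

template<std::size_t Size, typename Type>
std::array<Type, Size> toAPI(std::basic_string_view<Type> view)
{
    // Size includes the terminating NUL, so the view must be strictly shorter.
    if (view.size() >= Size)
        throw std::length_error("toAPI: string too long for buffer");
    std::array<Type, Size> vArray;
    view.copy(vArray.data(), view.size());
    vArray[view.size()] = Type{};  // NUL-terminate for the C API
    return vArray;
}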
On Wednesday, 9 August 2017 03:13:50 PDT Julian Watzinger wrote:
> Quite unfortunate. It's one thing to have a C-style API, which I
> personally dislike when compared to modern C++; but to have the API
> restrict the input data in such a way... I'll still hope that at some
> point, now that we actually have string_view, people will start to see this
> as a necessity
Do you mean those C developers developing C API? Including people actively
hostile to C++, like Linus Torvalds?
When do you think they will see std::string_view as a necessity?
At least when calling file I/O or WinAPI functions, there's usually a limit on how many characters a string can contain (e.g. MAX_PATH = 260), so I'd call toAPI<MAX_PATH>(view), which doesn't invoke dynamic allocation and should thus be faster than creating a string (untested; hopefully the array benefits from RVO/copy elision).
> Do you mean those C developers developing C API? Including people actively
> hostile to C++, like Linus Torvalds?
> When do you think they will see std::string_view as a necessity?
It's not about making them adopt `string_view` specifically. It's more about them taking string+length. Right now, those APIs require NUL-termination of strings.
However, being system calls, they also cost significantly more than
copying a path-length string, so don't be too afraid of
just doing that.
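For illustration, a sketch of "just doing that" with `std::ifstream`, whose `open` accepts `const char*` or `const std::string&` but not `std::string_view`. The helper `open_from_view` is a hypothetical name; the single `std::string` copy is what supplies the terminator.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <string_view>

// Hypothetical helper: make one owning, NUL-terminated copy of the view,
// then hand it to an API that requires NUL termination.
bool open_from_view(std::ifstream& in, std::string_view filename,
                    std::ios_base::openmode mode = std::ios_base::in)
{
    const std::string terminated(filename);  // one copy (may fit in SSO)
    in.open(terminated, mode);
    return in.is_open();
}
```

The view may point into the middle of a larger buffer with no NUL after it; the copy makes that irrelevant, at the price of one short-lived string per call.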
Recent versions of Windows allow paths of up to 32K characters if a process opts in.
On Wednesday, 9 August 2017 17:04:17 UTC+2, Thiago Macieira wrote:
> Do you mean those C developers developing C API? Including people actively
> hostile to C++, like Linus Torvalds?
> When do you think they will see std::string_view as a necessity?
Well, I said "hope", not "think" :> It still astounds me how anyone can actively oppose C++ and support C at the same time, but... no, I have nothing.
> It's not about making them adopt `string_view` specifically. It's more about them taking string+length. Right now, those APIs require NUL-termination of strings.
Aye, that's the main point. It's one thing to have an API take (void*, size_t) for some array manipulation, but to have it take (void*) and require the data to end with a specific value would be considered ludicrous, outside of strings of course (yes, there's the historical reason, but meh...).
It's funny that you mention that, because I actually know of a few APIs that do that. Namely... Lua (with luaL_Reg) and OpenGL (wgl/glXCreateContextAttribsARB). Yes, the same ones that don't require NUL-termination for their strings.
Arrays with sentinel values are not as uncommon as you think. They're used primarily on APIs where there is a genuine expectation that the receiving function is only going to look at each item in turn, and only going to look at it once. What they are not is the default case. Or the common case.
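A sketch of that access pattern, modelled loosely on the WGL/GLX context-attribute lists mentioned above, which are arrays of key/value `int` pairs ended by a 0 key. The helper name is hypothetical, and the numbers in any usage are placeholders rather than real OpenGL constants: the loop inspects each key once, never needs a length, and stops at the sentinel.

```cpp
#include <cstddef>

// Hypothetical helper: counts key/value pairs in a 0-terminated attribute
// list. Each key is inspected once; values are skipped without being read.
std::size_t count_attribute_pairs(const int* attribs)
{
    std::size_t pairs = 0;
    if (attribs == nullptr)
        return 0;
    while (*attribs != 0) {  // a 0 key is the sentinel
        attribs += 2;        // step over the key and its value
        ++pairs;
    }
    return pairs;
}
```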
What UB? The links don't talk about it.
It's only a hint of UB, not actual UB. I've improved the docs with a red warning sign saying "the byte after the view must be readable".
On Friday, 11 August 2017 02:20:30 PDT, Olaf van der Spek wrote:
> What's the proper design for such interfaces though?
> zstring_view seems best but doesn't exist.
> const char* wouldn't do any unnecessary allocations.
> Both string_view and string might do unnecessary allocations
zstring_view would be a nice wrapper for const char*. It basically *is* a
const char*, with a nice string_view-like API on top.
SG1 spent a lot of time on that when designing the C++11 threading
library, and couldn't find a way to make POSIX cancellation fit into
C++. You're welcome to spend more of your own time thinking about it,
but there's unlikely to be a fit.
> The requirement in order to provide defined behaviour that the character
> after the view must be readable is publicly documented.
Ouch. IMO that's not a reasonable requirement / interface.
> Failure by the end
> user to ensure this leads to UB, specifically, a segfault.
What could the committee do about this?
> What do kernels use internally? If they use ptr/size there might be a
> chance for a ptr/size interface, if they don't we're stuck with
> termination.
Null-terminated strings. Both the Linux and Darwin/FreeBSD kernels are written
in C, so that use is very widespread. I don't know what the Windows kernel is
written in, but considering its age I doubt it's anything besides C.
On Friday, August 11, 2017 at 12:39:28 PM UTC-4, Thiago Macieira wrote:
On Friday, 11 August 2017 02:20:30 PDT, Olaf van der Spek wrote:
> What's the proper design for such interfaces though?
On Friday, August 11, 2017 at 6:01:28 PM UTC+1, Nicol Bolas wrote:
On Friday, August 11, 2017 at 12:39:28 PM UTC-4, Thiago Macieira wrote:
On Friday, 11 August 2017 02:20:30 PDT, Olaf van der Spek wrote:
> What's the proper design for such interfaces though?
Let me ask a thought-provoking question. If you exclude filesystem paths, what remaining use cases are there for zero-terminated string views?
I'd argue not enough to bother standardising them, but maybe I've missed something. As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.
The standard's choices have an effect on how APIs evolve. If we think
folks should be migrating away from NUL-termination, omitting a
standard mechanism to deal with those APIs creates pressure to fix
them.
On Friday, 11 August 2017 10:01:28 PDT, Nicol Bolas wrote:
> `zstring_view` is conceptually a `string_view`. This means that it is a
> pointer+size, so `size()` is always O(1). But this also means that if you
> try to put a literal in a `zstring_view`, it will have to invoke
> `char_traits::length` to get the length. This is good if you're actually
> going to use begin/end iterator pairs. But it's painful if you're just
> doing one-time forward iteration.
>
> `cstring_view` is just a `const char*` with a nice wrapper. It is just a
> pointer, which means `size()` is always O(n). But this also means that
> sticking a literal in it costs nothing. But if you try to use it with
> begin/end iterator pairs, you provoke an O(n) operation.
int main(std::initializer_list<std::cstring_view>)
can be accomplished with a simple instruction inserted by the compiler at the
beginning of main(),
I've always felt that string_view could store whether its input is known, at construction, to be zero-terminated. The top bit of the length would make sense for this; the maximum view size would then be SIZE_T_MAX>>1.
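A minimal sketch of that idea, using a hypothetical `flagged_string_view` type rather than `std::string_view`, whose layout the standard leaves to implementations. The top bit of the stored length records whether the constructor knew the data to be NUL-terminated, at the cost of halving the maximum size.

```cpp
#include <cstddef>
#include <cstring>
#include <limits>
#include <string>
#include <string_view>

class flagged_string_view {
    static constexpr std::size_t kFlag =
        std::size_t{1} << (std::numeric_limits<std::size_t>::digits - 1);
    const char* ptr_;
    std::size_t len_;  // low bits: length; top bit: "known NUL-terminated"

public:
    // Arbitrary slice: termination unknown. Precondition: n <= max_size().
    constexpr flagged_string_view(const char* p, std::size_t n) noexcept
        : ptr_(p), len_(n) {}
    // C string: strlen stops at a NUL, so termination is known.
    flagged_string_view(const char* s) noexcept
        : ptr_(s), len_(std::strlen(s) | kFlag) {}
    // std::string always keeps a NUL after its last character.
    flagged_string_view(const std::string& s) noexcept
        : ptr_(s.c_str()), len_(s.size() | kFlag) {}

    constexpr const char* data() const noexcept { return ptr_; }
    constexpr std::size_t size() const noexcept { return len_ & ~kFlag; }
    constexpr std::size_t max_size() const noexcept { return kFlag - 1; }
    constexpr bool known_nul_terminated() const noexcept {
        return (len_ & kFlag) != 0;
    }

    // Shrinking from the back conservatively forfeits the guarantee.
    constexpr flagged_string_view prefix(std::size_t n) const noexcept {
        return flagged_string_view(ptr_, n < size() ? n : size());
    }

    operator std::string_view() const noexcept { return {ptr_, size()}; }
};
```

A consumer needing a C string could then test `known_nul_terminated()` and pass `data()` straight through, copying only when the flag is clear.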
On Friday, 11 August 2017 10:01:28 PDT, Nicol Bolas wrote:
> `cstring_view` is just a `const char*` with a nice wrapper. It is just a
> pointer, which means `size()` is always O(n). But this also means that
> sticking a literal in it costs nothing. But if you try to use it with
> begin/end iterator pairs, you provoke an O(n) operation.
Actually, no. The end iterator can be a sentinel, such that the call to
cstring_view::end() is O(1).
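A sketch of that, assuming C++17, where a range-based `for` may use different types for `begin()` and `end()`. `cstring_view` and `nul_sentinel` are hypothetical names: `end()` returns an empty sentinel in O(1), and the search for the NUL happens incrementally as the loop compares the iterator against it. Only `size()` still costs O(n). The iterator is deliberately minimal, not a full standard iterator.

```cpp
#include <cstddef>
#include <cstring>

// Stands for "the position of the terminating NUL" without knowing where it is.
struct nul_sentinel {};

class cstring_view {
    const char* s_;

public:
    class iterator {
        const char* p_;
    public:
        explicit constexpr iterator(const char* p) noexcept : p_(p) {}
        constexpr char operator*() const noexcept { return *p_; }
        constexpr iterator& operator++() noexcept { ++p_; return *this; }
        // The iterator "equals" the sentinel when it points at the NUL.
        friend constexpr bool operator==(iterator it, nul_sentinel) noexcept {
            return *it.p_ == '\0';
        }
        friend constexpr bool operator!=(iterator it, nul_sentinel s) noexcept {
            return !(it == s);
        }
    };

    constexpr cstring_view(const char* s) noexcept : s_(s) {}     // no strlen
    constexpr const char* c_str() const noexcept { return s_; }
    constexpr iterator begin() const noexcept { return iterator(s_); }
    constexpr nul_sentinel end() const noexcept { return {}; }    // O(1)
    std::size_t size() const noexcept { return std::strlen(s_); } // O(n)
};
```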
> Let me ask a thought-provoking question. If you exclude filesystem paths, what remaining use cases are there for zero-terminated string views?
Literally any API not under your direct control which exclusively traffics in NUL-terminated strings. It turns out there are a lot of those. You can claim that maybe they shouldn't exist, but since those APIs are out of your direct control, your desires mean squat.
> I'd argue not enough to bother standardising them, but maybe I've missed something. As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.
Um... why? We already have a path type. And that path type is just a string (of an implementation-defined type). So you can already get an appropriate view for it. What's the point of having yet another view of a contiguous sequence of constant characters?
> > Let me ask a thought-provoking question. If you exclude filesystem paths, what remaining use cases are there for zero-terminated string views?
> Literally any API not under your direct control which exclusively traffics in NUL-terminated strings. It turns out there are a lot of those. You can claim that maybe they shouldn't exist, but since those APIs are out of your direct control, your desires mean squat.
I don't get how a zero-terminated string view gives you any benefit there.
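// fixed_string<N> and needs_nul_terminated() are placeholders, not std facilities.
void some_func_fixed(const fixed_string<32> &str)
{
    needs_nul_terminated(str.data());  // already NUL-terminated: no copy
}

void some_func_view(std::string_view sv)
{
    std::string str(sv);               // copies only to append the NUL
    needs_nul_terminated(str.data());
}

fixed_string<32> str;
some_func_fixed(str);
some_func_view(str);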
Just pass around NUL-terminated const char* values, as the end consumer can only consume that anyway, and it won't accept lengths supplied by you. Knowing the valid extent gains you nothing over using a plain string_view.
> I'd argue not enough to bother standardising them, but maybe I've missed something. As for filesystem paths, they really are best implemented via a design identical to afio::path_view. That design is optimally efficient on all platforms.
> Um... why? We already have a path type. And that path type is just a string (of an implementation-defined type). So you can already get an appropriate view for it. What's the point of having yet another view of a contiguous sequence of constant characters?
Oh, lots and lots.
Back during the Boost.Filesystem peer reviews, the correct design for path was contentious. One of the best designs not adopted was one where the underlying storage is type-erased and a UTF-8 appearance is presented publicly. This came with the big benefit that portable Filesystem code doesn't need #ifdefing string literals for non-ASCII characters. The current design, where path wraps some string implementation which varies by platform, was the one which won out in the end, and given all the competing factors and tradeoffs, I agree it was the least-worst choice.
However, a path view has a very different set of tradeoffs, and the one which swung in favour of the Filesystem TS path design over the "Plan B" design (mutability) is no longer a problem. So we can adopt the "Plan B" storage-erasing design from the Boost peer review, as the only "mutable" functions available (a subset of filesystem::path's) merely return new views of the original, retaining the type erasure of the underlying representation.
> The path_view::c_str structure is a PATH_MAX sized structure with
> additional buffer and length values. If the source view is zero terminated,
> that is used directly. If it is not, it is copied into the c_str and zero
> terminated. If on Windows and the underlying view source is UTF-8, a
> conversion to UTF-16 is done. It is perfectly minimum overhead, no malloc
> anywhere, and can be used by end users identically on Windows and POSIX,
> no #ifdefing.
So you're saying that it uses 64kB of stack on Windows?
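A simplified sketch of the strategy in the quoted description, assuming POSIX-style `char` paths only: no UTF-8 to UTF-16 conversion is attempted, and the 4096-byte capacity is an assumed stand-in for `PATH_MAX`. The name `path_cstr` is hypothetical, and this is not AFIO's actual `path_view::c_str`. A bare `std::string_view` cannot say whether its source was NUL-terminated, so here the caller passes that knowledge in.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for a PATH_MAX-sized "c_str" helper.
class path_cstr {
    static constexpr std::size_t kCapacity = 4096;  // assumed PATH_MAX
    std::array<char, kCapacity> storage_;
    const char* buffer_;
    std::size_t length_;

public:
    // If the caller knows view.data()[view.size()] == '\0', reuse the source
    // directly; otherwise copy into the stack buffer and terminate it there.
    path_cstr(std::string_view view, bool source_is_nul_terminated)
        : length_(view.size())
    {
        if (source_is_nul_terminated) {
            buffer_ = view.data();
            return;
        }
        if (view.size() >= kCapacity)
            throw std::length_error("path_cstr: path too long");
        view.copy(storage_.data(), view.size());
        storage_[view.size()] = '\0';
        buffer_ = storage_.data();
    }

    path_cstr(const path_cstr&) = delete;  // buffer_ may point into storage_
    path_cstr& operator=(const path_cstr&) = delete;

    const char* c_str() const noexcept { return buffer_; }
    std::size_t size() const noexcept { return length_; }
};
```

The whole `storage_` array sits on the stack even when the zero-copy branch is taken; scaled up to Windows' 32K-character limit with 2-byte `wchar_t`, the same layout is the 64 kB asked about just above.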
Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.
On Friday, 11 August 2017 14:29:37 PDT, Nicol Bolas wrote:
> As you can see, `fixed_string` is a NUL-terminated string, just like most
> C++ string types. And yet, if you try to call `some_func_view`, it makes a
> copy. Pointlessly.
>
> `some_func_view` is flexible, since it can take any string you can get a
> string_view out of. But it is bad because it makes a copy from a source
> that is probably NUL-terminated.
Hence Niall's requirement in his API that the next byte be readable. That way,
the code can check if it is NUL-terminated and avoid the allocation.
It's a requirement that very rarely fails.
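A sketch of that check, assuming the same documented precondition as the API under discussion: the byte at `data()[size()]` must be readable. The function cannot verify this itself; reading past a view that ends at an unmapped page, or dereferencing the null `data()` of a default-constructed view, is undefined behaviour. `with_c_str` and its callback parameter are hypothetical names.

```cpp
#include <string>
#include <string_view>
#include <utility>

// Precondition: sv.data()[sv.size()] is readable memory.
// Calls f with a NUL-terminated const char* holding the contents of sv,
// copying only when the byte after the view is not already '\0'.
template <typename F>
decltype(auto) with_c_str(std::string_view sv, F&& f)
{
    if (sv.data()[sv.size()] == '\0')
        return std::forward<F>(f)(sv.data());  // zero-copy path
    const std::string copy(sv);                // slow path: terminated copy
    return std::forward<F>(f)(copy.c_str());
}
```

Views over a whole `std::string` or a string literal take the zero-copy branch; only genuine sub-views pay for the allocation.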
> Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.
As already covered in great detail by now, if fed a string_view then one is mandated to memcpy the lot so it can be zero-terminated before sending it to the file path consuming syscall.
On Friday, August 11, 2017 at 11:48:14 PM UTC+1, Nicol Bolas wrote:
> On Friday, August 11, 2017 at 6:29:23 PM UTC-4, Niall Douglas wrote:
> > > Or we can just have a single `filesystem::path` type, which you can get views of with existing types. I prefer that option.
> > As already covered in great detail by now, if fed a string_view then one is mandated to memcpy the lot so it can be zero-terminated before sending it to the file path consuming syscall.
> A problem easily solved with the types we're talking about. We don't need a replacement for `filesystem::path`. We certainly don't need some 32KB type...
I hate to be slightly rude, but you really don't know what you're talking about on this. Try doing some benchmarking of the actual cost of a potentially unused 64KB stack allocation in a routine which calls any path-consuming syscall, and compare it to any other solution.
path_view pv;
path_view::c_str zpath(pv);
open(zpath.buffer, ...);
path pv;
auto zpath = pv.zstring_view();
open(zpath.data(), ...);
On Saturday, 12 August 2017 09:54:28 PDT, Nicol Bolas wrote:
> path pv;
> auto zpath = pv.zstring_view();
> open(zpath.data(), ...);
>
> You can't get cheaper than *free*, so I'd say it compares quite well with a
> 64Kb stack allocation.
The problem is that you can't call this function with an input coming from
std::string_view. You need to modify the upper API to use std::zstring_view.