string_view: Conversions to null terminated char*


Matthew Fioravante

Dec 9, 2014, 5:26:48 PM
to std-pr...@isocpp.org
string_view is not compatible with legacy C APIs because it's not null terminated. The solution is to make a copy either into a std::string or a fixed size buffer and then call the C function with that.
 
It would be nice to have a to_c_str() helper method which would handle the copying and null terminating correctly. This makes adding C compatibility a bit easier when we don't want to pay for a memory allocation with a std::string.
 
char* string_view::to_c_str(array_view<char> buf) {
  auto l = std::min(length(), buf.size()-1);
  std::memcpy(buf.data(), data(), l);
  buf[l] = '\0';
  return buf.data();
}
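
For readers who want to try the idea, here is a minimal free-function sketch of the same truncating copy. It assumes C++17 std::string_view in place of the proposed member function and a plain pointer plus size in place of array_view<char>; the signature is illustrative only and is not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Copies at most buf_size - 1 characters of sv into buf and null terminates.
// Truncates silently when sv does not fit. Precondition: buf_size > 0.
inline char* to_c_str(std::string_view sv, char* buf, std::size_t buf_size) {
    assert(buf_size > 0 && "need room for at least the terminator");
    // string_view::copy copies min(count, sv.size()) characters and
    // returns how many it actually copied.
    const std::size_t n = sv.copy(buf, buf_size - 1);
    buf[n] = '\0';
    return buf;
}
```

Because the return value is just the buffer pointer, a call such as to_c_str(sv, buf, sizeof(buf)) can be passed straight to a C function that expects a const char*.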

Sean Middleditch

Dec 9, 2014, 7:57:59 PM
to std-pr...@isocpp.org
On Tuesday, December 9, 2014 2:26:48 PM UTC-8, Matthew Fioravante wrote:
string_view is not compatible with legacy C APIs because it's not null terminated. The solution is to make a copy either into a std::string or a fixed size buffer and then call the C function with that.

That is usually a bummer. I've filed bugs against C libraries that don't have an explicitly-sized string interface. Naked calls to system APIs (POSIX, Win32, etc.) are rare enough in my experience to not be worth over-engineering solutions for them.
 
 
It would be nice to have a to_c_str() helper method which would handle the copying and null terminating correctly. This makes adding C compatibility a bit easier when we don't want to pay for a memory allocation with a std::string.

My concern is that this returns an owning raw pointer. In a time when we're trying to tell everyone to stop doing that and to use unique_ptr or whatnot instead.

Matthew Fioravante

Dec 9, 2014, 8:40:54 PM
to std-pr...@isocpp.org


On Tuesday, December 9, 2014 7:57:59 PM UTC-5, Sean Middleditch wrote:
On Tuesday, December 9, 2014 2:26:48 PM UTC-8, Matthew Fioravante wrote:
string_view is not compatible with legacy C APIs because it's not null terminated. The solution is to make a copy either into a std::string or a fixed size buffer and then call the C function with that.

That is usually a bummer. I've filed bugs against C libraries that don't have an explicitly-sized string interface. Naked calls to system APIs (POSIX, Win32, etc.) are rare enough in my experience to not be worth over-engineering solutions for them.
 

It depends on what C APIs you're working with. POSIX is one example, fopen() from cstdio is another. I doubt either will be convinced to adopt sized interfaces anytime soon, if ever. C compatibility will at best be an issue for a long time and at worst be an issue forever (not unlikely at all).
 
 
It would be nice to have a to_c_str() helper method which would handle the copying and null terminating correctly. This makes adding C compatibility a bit easier when we don't want to pay for a memory allocation with a std::string.

My concern is that this returns an owning raw pointer. In a time when we're trying to tell everyone to stop doing that and to use unique_ptr or whatnot instead.

The returned pointer is not an owning raw pointer, and actually the return value is superfluous. It's only there because there isn't really anything else useful to return and returning a char* means that this method could be composed with C API functions directly.

This design is actually agnostic to the ownership of the buffer being written to. It can be on the stack (most likely use case), heap allocated, global, etc... All that's required is that the user construct an array_view to represent the memory region to be written to. The array_view parameter is also a non-owning pointer and length.

sque...@gmail.com

Dec 9, 2014, 11:12:20 PM
to std-pr...@isocpp.org
On Wednesday, December 10, 2014 12:40:54 PM UTC+11, Matthew Fioravante wrote:
All that's required is that the user construct an array_view to represent the memory region to be written to.

If the user has to do some work anyway, they might as well just create a string from the string_view, e.g.:
void call_c_function(const std::string_view& sv) {
    std::string s = sv.to_string();
    c_function(s.c_str());
}

Or maybe even just:
c_function(sv.to_string().c_str());

(Assuming the C function doesn't keep the pointer after it returns)
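
As a self-contained sketch of the one-liner above: the temporary std::string lives until the end of the full expression, so the pointer stays valid for the duration of the call. This assumes C++17 std::string_view (constructing the string directly rather than calling to_string()), and c_function here is a hypothetical stand-in that just measures its argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical stand-in for a legacy C API that needs a null-terminated string.
inline std::size_t c_function(const char* s) { return std::strlen(s); }

inline std::size_t call_c_function(std::string_view sv) {
    // The temporary std::string is destroyed only after c_function returns.
    // This may allocate, which is the cost the original post wants to avoid.
    return c_function(std::string(sv).c_str());
}

inline void example() {
    const std::string_view whole = "hello world";
    // The view covers only "hello"; the copy is terminated right after it.
    assert(call_c_function(whole.substr(0, 5)) == 5);
}
```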

Matthew Fioravante

Dec 10, 2014, 10:27:06 AM
to std-pr...@isocpp.org, sque...@gmail.com


On Tuesday, December 9, 2014 11:12:20 PM UTC-5, sque...@gmail.com wrote:
On Wednesday, December 10, 2014 12:40:54 PM UTC+11, Matthew Fioravante wrote:
All that's required is that the user construct an array_view to represent the memory region to be written to.

If the user has to do some work anyway, they might as well just create a string from the string_view, e.g.:

I'm not sure people are actually understanding what I am proposing. Creating a std::string requires a memory allocation which may be unacceptable for performance, particularly when the string has a known upper bound on its length.

Here is one use case we have today where allocating memory for a std::string is totally unnecessary and would have a large impact on performance.

float strtof(string_view s, string_view& tail) {
  char buf[256]; // Can be as small as the longest string representation of a float + 1
  s.to_c_str(buf); // copy into buf and null terminate
  char* e = buf;
  auto f = strtof(buf, &e); // the C library overload
  tail = s;
  tail.remove_prefix(e - buf); // everything strtof did not consume
  return f;
}
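
A compilable sketch of this wrapper, assuming C++17 std::string_view and using string_view::copy in place of the proposed to_c_str(); the name parse_float is hypothetical and chosen only to avoid clashing with the C function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string_view>

// Parses a leading float from s without a heap allocation and sets tail to
// the part of s that was not consumed. Input longer than the buffer is
// truncated before parsing.
inline float parse_float(std::string_view s, std::string_view& tail) {
    char buf[256];
    const std::size_t n = s.copy(buf, sizeof(buf) - 1);  // copies min(255, s.size())
    buf[n] = '\0';
    char* end = buf;
    const float f = std::strtof(buf, &end);
    assert(end >= buf);
    tail = s;
    tail.remove_prefix(static_cast<std::size_t>(end - buf));
    return f;
}
```

The view does not need to be null terminated itself: parsing the first five characters of "12.25678" stops at the end of the view, not at the end of the underlying array.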

Here is another example, where you can save an allocation if your string is "small":

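//From C API
int c_function(const char* s);

//string_view wrapper
int c_function(string_view s) {
  std::string str;
  char buf[4096];
  const char* p; // const: std::string::c_str() returns const char*
  if(s.length() < sizeof(buf)-1) {
    p = s.to_c_str(buf); // small string: copy to the stack
  } else {
    str = s.to_string(); // large string: fall back to the heap
    p = str.c_str();
  }
  return c_function(p);
}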
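
The stack-or-heap pattern can also be factored into a reusable helper. The following is only a sketch under assumptions: it uses C++17 std::string_view, a hypothetical name with_c_str, and an arbitrary 64-byte stack buffer, and it hands the null-terminated pointer to a callback so the pointer cannot outlive the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Calls f with a null-terminated copy of s. Short strings are copied to the
// stack; longer ones fall back to a heap-allocated std::string.
template <class F>
auto with_c_str(std::string_view s, F&& f) {
    char buf[64];
    if (s.size() < sizeof(buf)) {  // leaves room for the terminator
        const std::size_t n = s.copy(buf, s.size());
        buf[n] = '\0';
        return f(static_cast<const char*>(buf));
    }
    const std::string heap(s);
    return f(heap.c_str());
}

inline std::size_t length_via_c(std::string_view s) {
    return with_c_str(s, [](const char* p) { return std::strlen(p); });
}
```

Nothing is truncated in either branch, so the caller does not have to reason about whether the C function saw the whole string.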




 

sque...@gmail.com

Dec 10, 2014, 4:43:22 PM
to std-pr...@isocpp.org, sque...@gmail.com
On Thursday, December 11, 2014 2:27:06 AM UTC+11, Matthew Fioravante wrote:
On Tuesday, December 9, 2014 11:12:20 PM UTC-5, sque...@gmail.com wrote:
On Wednesday, December 10, 2014 12:40:54 PM UTC+11, Matthew Fioravante wrote:
All that's required is that the user construct an array_view to represent the memory region to be written to.

If the user has to do some work anyway, they might as well just create a string from the string_view, e.g.:

I'm not sure people are actually understanding what I am proposing. Creating a std::string requires a memory allocation which may be unacceptable for performance, particularly when the string has a known upper bound on its length.

Sorry, I missed your "when we don't want to pay for a memory allocation with a std::string" in your original post. Please ignore me.

Nevin Liber

Dec 10, 2014, 6:57:28 PM
to std-pr...@isocpp.org
On 10 December 2014 at 09:27, Matthew Fioravante <fmatth...@gmail.com> wrote:
I'm not sure people are actually understanding what I am proposing. Creating a std::string requires a memory allocation which may be unacceptable for performance, particularly when the string has a known upper bound on its length.

That isn't sufficient.  You also have to know that you can write a '\0' into a byte both legally (i.e., the string_view has modifiable space underneath it) and semantically (doing so won't, say, incorrectly truncate the string for the purposes of calling a C function on it).

This micro optimization is fragile and error-prone.

If you really need this kind of performance, why not just create your own string_view type of class to do it, and have that class require the above preconditions?  I don't see why it needs to be standardized, let alone modifying string_view to support that functionality.  string_view doesn't allow you to modify bytes in the string it refers to; that's a feature, not a bug.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404

Sean Middleditch

Dec 10, 2014, 7:58:38 PM
to std-pr...@isocpp.org, sque...@gmail.com

On Wednesday, December 10, 2014 7:27:06 AM UTC-8, Matthew Fioravante wrote:


On Tuesday, December 9, 2014 11:12:20 PM UTC-5, sque...@gmail.com wrote:
On Wednesday, December 10, 2014 12:40:54 PM UTC+11, Matthew Fioravante wrote:
All that's required is that the user construct an array_view to represent the memory region to be written to.

If the user has to do some work anyway, they might as well just create a string from the string_view, e.g.:

I'm not sure people are actually understanding what I am proposing. Creating a std::string requires a memory allocation which may be unacceptable for performance, particularly when the string has a known upper bound on its length.

Indeed. My brain read that `memcpy` in your first post as a `strdup` and ignored the whole purpose of the array_view. Stupid brain. Sorry.

We have an almost identical API to your proposal in our libraries, except it's a free function and called copy_string instead of to_c_str (well, in our company's convention, it's Strings::CopyInto).

I'm of the opinion that yes, this is /just/ useful enough (and small/simple enough) to have in the library, but only just. C APIs should be fixed and those that can't be are mostly all system APIs doing some kind of I/O or whatnot where the allocation of a std::string is thoroughly trumped by the cost of the syscall.

An alternative design might throw an exception (*shudder* I can't believe I just suggested that) if the buffer is too small if one takes the "silently truncating strings is a potential gaping security hole" mindset.
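
A sketch of what that throwing alternative might look like, assuming C++17 std::string_view, a pointer plus size for the destination, and std::length_error as the exception type. The name to_c_str_checked and the choice of exception are assumptions for illustration, not something settled in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Copies sv into buf and null terminates, or throws instead of truncating.
inline char* to_c_str_checked(std::string_view sv, char* buf, std::size_t buf_size) {
    // ">=" also rejects a zero-length buffer, which cannot hold a terminator.
    if (sv.size() >= buf_size) {
        throw std::length_error("to_c_str_checked: destination buffer too small");
    }
    const std::size_t n = sv.copy(buf, sv.size());
    buf[n] = '\0';
    return buf;
}

inline void example() {
    char buf[4];
    assert(std::string_view(to_c_str_checked("abc", buf, sizeof(buf))) == "abc");
    bool threw = false;
    try {
        to_c_str_checked("abcd", buf, sizeof(buf));  // needs 5 bytes, has 4
    } catch (const std::length_error&) {
        threw = true;
    }
    assert(threw);
}
```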

Matthew Fioravante

Dec 10, 2014, 8:04:19 PM
to std-pr...@isocpp.org


On Wednesday, December 10, 2014 6:57:28 PM UTC-5, Nevin ":-)" Liber wrote:
On 10 December 2014 at 09:27, Matthew Fioravante <fmatth...@gmail.com> wrote:
I'm not sure people are actually understanding what I am proposing. Creating a std::string requires a memory allocation which may be unacceptable for performance, particularly when the string has a known upper bound on its length.

That isn't sufficient.  You also have to know that you can write a '\0' into a byte both legally (i.e., the string_view has modifiable space underneath it)

We aren't writing to the bytes pointed to by the string_view. We're copying the data from the string_view to another buffer, null terminating it, and passing it to the c function. We cannot write to the data pointed to by the view because first the view is const and second we cannot write a 0 to the byte after the view. Take a look at the examples again.
 
and semantically (doing so won't, say, incorrectly truncate the string for the purposes of calling a C function on it).

If you want to avoid a heap allocation you are forced to accept truncation. In some cases truncation is ok because you have an upper bound on the possible inputs the function accepts (strtof()). The second example shows how to get a fast copy on the stack for small enough strings and falling back to heap allocation for larger ones.


This micro optimization is fragile and error-prone.

Perhaps, but it's more error-prone to have to write the copy and null terminate logic yourself every time. Granted, this use case is small enough that it's probably OK to just tell people to deal with it themselves.

Nevin Liber

Dec 10, 2014, 9:38:48 PM
to std-pr...@isocpp.org
On 10 December 2014 at 19:04, Matthew Fioravante <fmatth...@gmail.com> wrote:
We aren't writing to the bytes pointed to by the string_view. We're copying the data from the string_view to another buffer, null terminating it, and passing it to the c function. We cannot write to the data pointed to by the view because first the view is const and second we cannot write a 0 to the byte after the view. Take a look at the examples again.

Your implementation has something writing into the data referred to in an array_view, which isn't allowed last I checked.  Allowing modification of contents through an array_view is a bad idea for exactly the same reasons it is a bad idea to allow it in string_view.

The next problem with it is the std::min call, which makes it very difficult to reason about whether or not the resulting string is truncated.  This alone makes it almost as bad to reason about as using strncpy.

It also doesn't handle zero-length destination buffers, which is problematic if you want to fill up runtime sized buffers.

Generally, we prefer to return things and take advantage of RVO rather than pass them in and replace their contents.  This could be accomplished with, say, a templated size and return a std::array of that size (or throw if it isn't large enough), but I just don't find this kind of micro optimization necessary often enough to be worthwhile as a standard feature.
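
One way to read that suggestion, as a sketch only: a function template parameterized on the buffer size that returns a std::array by value and throws if the input does not fit. It assumes C++17 std::string_view; the name to_c_array and the use of std::length_error are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Returns a null-terminated copy of sv in a fixed-size array of N chars.
template <std::size_t N>
std::array<char, N> to_c_array(std::string_view sv) {
    static_assert(N > 0, "need room for at least the terminator");
    if (sv.size() >= N) {
        throw std::length_error("to_c_array: input does not fit");
    }
    std::array<char, N> out{};  // value-initialized: every byte starts as '\0'
    sv.copy(out.data(), sv.size());
    return out;
}

inline void example() {
    const auto a = to_c_array<8>("abc");
    assert(std::string_view(a.data()) == "abc");
}
```

The array is returned by value, so the caller owns the storage and the copy out of the function can be elided.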

Matthew Fioravante

Dec 10, 2014, 10:11:18 PM
to std-pr...@isocpp.org


On Wednesday, December 10, 2014 9:38:48 PM UTC-5, Nevin ":-)" Liber wrote:

On 10 December 2014 at 19:04, Matthew Fioravante <fmatth...@gmail.com> wrote:
We aren't writing to the bytes pointed to by the string_view. We're copying the data from the string_view to another buffer, null terminating it, and passing it to the c function. We cannot write to the data pointed to by the view because first the view is const and second we cannot write a 0 to the byte after the view. Take a look at the examples again.

Your implementation has something writing into the data referred to in an array_view, which isn't allowed last I checked.

array_view is not a const view like string_view and should not be. While you can have an array_view<const T> (or carray_view<T>), you don't always need or want it. An array_view essentially just encapsulates a pair of pointers, or if you like a pointer and a size. Instead of passing around 2 objects you pass around one. The use case I have here is a common idiom: you use the array_view to specify a window of memory to write to. This is much safer and easier to read / maintain than juggling 2 pointers or a pointer and a length. It's also more efficient than passing a std::vector by reference.

I get the feeling people really haven't tried playing with array_view very much in their own code to understand just how amazingly awesome it is. If you mostly code with contiguous data structures (i.e. prefer std::vector over almost everything else) then array_view is useful all over the place.
 
  Allowing modification of contents through an array_view is a bad idea for exactly the same reasons it is a bad idea to allow it in string_view.

 

The next problem with it is the std::min call, which makes it very difficult to reason about whether or not the resulting string is truncated.  This alone makes it almost as bad to reason about as using strncpy.

Truncation is part of the contract of the function. If you don't want truncation, check the length first or use std::string.
 

It also doesn't handle zero-length destination buffers, which is problematic if you want to fill up runtime sized buffers.

I suppose this is a bug, although I'd be tempted to change the definition of the method to be "undefined behavior" if a zero length buffer is used because a zero length buffer can never represent a null terminated string. 

Generally, we prefer to return things and take advantage of RVO rather than pass them in and replace their contents.  This could be accomplished with, say, a templated size and return a std::array of that size (or throw if it isn't large enough),

We aren't creating a new object here; instead we're filling up a pre-existing one. Using an array_view in/out parameter provides a lot more flexibility because you can write to any block of memory you like, whether it's a whole array or a window into a subset of the array.
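
To illustrate the "window into a subset of the array" point, here is a small sketch that writes two strings into the two halves of one buffer. The char_window struct is a hypothetical, minimal stand-in for array_view<char> (just a pointer and a size), and copy_terminated is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for array_view<char>: a non-owning pointer and length.
struct char_window {
    char* data;
    std::size_t size;
};

// Truncating, null-terminating copy into whatever memory the window covers.
inline char* copy_terminated(std::string_view sv, char_window w) {
    assert(w.size > 0 && "window must have room for the terminator");
    const std::size_t n = sv.copy(w.data, w.size - 1);
    w.data[n] = '\0';
    return w.data;
}

inline void example() {
    char storage[16];
    // Two independent 8-byte windows carved out of one stack array.
    copy_terminated("key", char_window{storage, 8});
    copy_terminated("value", char_window{storage + 8, 8});
    assert(std::string_view(storage) == "key");
    assert(std::string_view(storage + 8) == "value");
}
```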

 
but I just don't find this kind of micro optimization necessary often enough to be worthwhile as a standard feature.

Fair enough
 

Olaf van der Spek

Dec 12, 2014, 8:57:52 AM
to std-pr...@isocpp.org
On Thursday, December 11, 2014 4:11:18 AM UTC+1, Matthew Fioravante wrote:


On Wednesday, December 10, 2014 9:38:48 PM UTC-5, Nevin ":-)" Liber wrote:

On 10 December 2014 at 19:04, Matthew Fioravante <fmatth...@gmail.com> wrote:
We aren't writing to the bytes pointed to by the string_view. We're copying the data from the string_view to another buffer, null terminating it, and passing it to the c function. We cannot write to the data pointed to by the view because first the view is const and second we cannot write a 0 to the byte after the view. Take a look at the examples again.

Your implementation has something writing into the data referred to in an array_view, which isn't allowed last I checked.

array_view is not a const view like string_view and should not be.

Wasn't string_view recently made non-const?

Truncation is part of the contract of the function. If you don't want truncation, check the length first or use std::string.

Is actual truncation ever not a bug?
IMO you should abort / throw instead of truncate.

Olaf van der Spek

Dec 12, 2014, 8:59:34 AM
to std-pr...@isocpp.org, sque...@gmail.com
On Wednesday, December 10, 2014 4:27:06 PM UTC+1, Matthew Fioravante wrote:
Here is one use case we have today where allocating memory for a std::string is totally unnecessary and would have a large impact on performance.

Right, though I'm expecting variants of these functions taking string_view. 

Ville Voutilainen

Dec 12, 2014, 9:02:45 AM
to std-pr...@isocpp.org
On 12 December 2014 at 15:57, Olaf van der Spek <olafv...@gmail.com> wrote:
> Wasn't string_view recently made non-const?

I wonder what makes people think that's the case. In N4335 it
certainly isn't mutable.

Olaf van der Spek

Dec 12, 2014, 9:05:55 AM
to std-pr...@isocpp.org

Ville Voutilainen

Dec 12, 2014, 9:08:43 AM
to std-pr...@isocpp.org
On 12 December 2014 at 16:05, Olaf van der Spek <olafv...@gmail.com> wrote:
>> > Wasn't string_view recently made non-const?
>> I wonder what makes people think that's the case. In N4335 it
>> certainly isn't mutable.
> http://channel9.msdn.com/Shows/C9-GoingNative/GoingNative-32-Sneak-Preview-of-C17#time=12m04s


Yes, I suggest you read the comments.

Olaf van der Spek

Dec 12, 2014, 9:11:20 AM
to std-pr...@isocpp.org
Ah, your comment wasn't there when I first viewed this.

I was about to make a comment about it myself but either didn't or it didn't pass moderation.



--
Olaf