Use cases for extended string_view types


Matthew Fioravante

May 14, 2015, 9:32:38 PM
to std-pr...@isocpp.org
I've found some cases where having a library of string_view types allows efficient text processing, parsing, and storage in a strongly type-safe manner. The potential downside is a large number of string types.

mstring_view: a mutable string_view


array<mstring_view,kMaxSplits+1> split_store;

auto p = parser(filename);
array_view<mstring_view> s = p.splitNextLine(split_store,',');

In the above example, p reads the next line of the file and splits it on ',' into split_store, returning a view of the first std::min(split_store.size(), number of splits + 1) elements. parser is implemented efficiently, so the resulting views point directly into the underlying internal I/O buffer for the file. This approach limits the split algorithm to one memory allocation (for the internal file buffer), zero data copies, and one parsing pass, which can be optimized with SIMD instructions. This level of efficiency is not possible with the API of fgets() and/or std::getline().

Working with regular string_view is fine, but one can achieve additional optimization if one can write into the resulting buffer to transform the text further without making copies of the data which will likely require memory allocations. The data is just sitting in the internal file buffer unused during this time, so there is no reason not to allow writes to it.
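
To make the zero-copy idea concrete, here is a minimal sketch of an in-place split. The mstring_view struct and the split_in_place() function are hypothetical stand-ins written for illustration (the real parser is not shown in this thread), and the caller's buffer plays the role of the internal I/O buffer. In this sketch, fields beyond the capacity of the output array are simply dropped.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed mstring_view: a non-owning,
// writable pointer + length pair. Not a standard type.
struct mstring_view {
    char* data = nullptr;
    std::size_t size = 0;
};

// Splits [buf, buf + len) on `delim`, writing at most N views into `out`.
// Returns the number of views written. No allocation and no copying:
// every view points straight into the caller's buffer.
template <std::size_t N>
std::size_t split_in_place(char* buf, std::size_t len, char delim,
                           std::array<mstring_view, N>& out) {
    std::size_t count = 0;
    char* field = buf;
    char* const end = buf + len;
    for (char* p = buf; count < N; ++p) {
        if (p == end || *p == delim) {
            out[count++] = mstring_view{field, static_cast<std::size_t>(p - field)};
            if (p == end) break;   // last field emitted
            field = p + 1;         // next field starts after the delimiter
        }
    }
    return count;
}
```

Because the views are writable, a caller can then transform any field through view.data without touching the other fields or allocating.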

zstring_view: a string_view (O(1) length()) which is guaranteed to be null terminated
zmstring_view: an mstring_view (O(1) length()) which is guaranteed to be null terminated

Unfortunately, I think legacy C APIs are here to stay for a long time, especially operating system APIs like POSIX. There are some cases where we know the underlying string data is null terminated, and we can use this fact to interact easily and efficiently with C APIs. zstring_view is more limited in that there is no substr() operation, since an arbitrary substring would not end at the terminator.

//We throw away the null termination invariant, even though it is guaranteed to exist.
constexpr string_view foo = "foo"sv;
//We retain the null termination invariant and can take advantage of it.
constexpr zstring_view bar = "bar"zsv;


constexpr zstring_view kLibraryPath = "./path/libfoo.so"zsv;
void *dl = dlopen(kLibraryPath.c_str(), RTLD_NOW);

This can also be used with my earlier example.

array<zmstring_view,kMaxSplits+1> split_store;
array_view<zmstring_view> s = p.splitNextLine(split_store,',');

The parsing algorithm here can replace all instances of ',' with '\0' and thus give us null terminated strings we can directly pass to C APIs.
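
A minimal sketch of that delimiter-to-terminator rewrite, assuming a writable, null-terminated line buffer. split_to_cstrings() is a hypothetical helper name chosen for this example, not part of any proposal; it hands back plain char* fields that a C function such as strtod() can consume directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Overwrites each `delim` in the null-terminated buffer `buf` with '\0'
// and stores a pointer to the start of each field in `out`.
// Returns the number of fields stored (at most `max_fields`).
inline std::size_t split_to_cstrings(char* buf, char delim,
                                     char** out, std::size_t max_fields) {
    if (max_fields == 0) return 0;
    std::size_t count = 0;
    out[count++] = buf;                    // first field starts at the buffer
    for (char* p = buf; *p != '\0'; ++p) {
        if (*p == delim) {
            *p = '\0';                     // terminate the previous field
            if (count == max_fields) break;
            out[count++] = p + 1;          // next field starts after the delimiter
        }
    }
    return count;
}
```

Each stored pointer is a valid C string, so std::strtod(fields[i], nullptr) works on it without any copy.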

zstring_ptr: a thin type wrapper around const char* (zstring_ptr::strlen() for length)
zmstring_ptr: a thin type wrapper around char* (zmstring_ptr::strlen() for length)

This one has more uses than just C APIs.
Storing a string as a single null-terminated pointer is more compact than storing a pointer and a length. If size is important, this can cut the memory usage of a data structure storing string_views in half. We can currently achieve this with const char* and char*, but those types are conflated with plain pointers, which causes problems and ambiguities.

HashMap<const char*,IntId>; //Hashes address
HashMap<zstring_ptr,IntId>; //Hashes string value, using only sizeof(char*) bytes for each key.

bool operator==(const char*, const char*); //Compares addresses, resulting in many surprises for novices
bool operator==(zstring_ptr l, zstring_ptr r); //Compares values using strcmp()
bool operator==(zstring_ptr l, string_view r); //Compares values
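
For illustration, a rough sketch of what such a zstring_ptr could look like. Everything here is an assumption made for the example (the explicit constructor, the member names, the std::hash specialization); the thread only describes the type informally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_map>

// Hypothetical sketch of zstring_ptr: a single pointer to a null-terminated
// string, with value semantics for comparison and hashing.
class zstring_ptr {
public:
    // explicit keeps `z == "literal"` unambiguous (it picks the string_view overload)
    constexpr explicit zstring_ptr(const char* s) noexcept : p_(s) {}
    constexpr const char* c_str() const noexcept { return p_; }
    std::size_t strlen() const noexcept { return std::strlen(p_); }  // O(n)
private:
    const char* p_;
};

inline bool operator==(zstring_ptr l, zstring_ptr r) noexcept {
    return std::strcmp(l.c_str(), r.c_str()) == 0;   // compares values
}
inline bool operator==(zstring_ptr l, std::string_view r) noexcept {
    return std::string_view(l.c_str()) == r;         // compares values
}

namespace std {
// Hashes the characters, not the address.
template <> struct hash<zstring_ptr> {
    size_t operator()(zstring_ptr s) const noexcept {
        return hash<string_view>{}(string_view(s.c_str()));
    }
};
}  // namespace std
```

With this in place, std::unordered_map<zstring_ptr, int> finds a key through any pointer to an equal string, which a map keyed on const char* would not do.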


Would you use these types in your projects?

Jeffrey Yasskin

May 14, 2015, 9:39:23 PM
to std-pr...@isocpp.org
On Thu, May 14, 2015 at 6:32 PM, Matthew Fioravante <fmatth...@gmail.com> wrote:
> Working with regular string_view is fine, but one can achieve additional optimization if one can write into the resulting buffer to transform the text further without making copies of the data which will likely require memory allocations. The data is just sitting in the internal file buffer unused during this time, so there is no reason not to allow writes to it.

Can you give some examples of these transformations? I see the example of replacing delimiters with '\0' to allow use of functions that expect null-terminated strings. What else do you use it for? Do you have a link to any code that does this?

Thanks,
Jeffrey

Matthew Fioravante

May 14, 2015, 10:08:51 PM
to std-pr...@isocpp.org


On Thursday, May 14, 2015 at 9:39:23 PM UTC-4, Jeffrey Yasskin wrote:
> Can you give some examples of these transformations?

One of the most common transformations is to_upper() or to_lower() if you want to do case-insensitive comparison. If you need to compare the data many times, you can transform it in place once, and the compiler can then use the optimal bulk SIMD / streaming instructions to compare the strings.
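
A small sketch of that normalize-once, compare-many pattern, restricted to ASCII. The helper names ascii_to_upper_in_place() and equal_bytes() are made up for this example, and the mapping deliberately ignores every non-ASCII byte, so it is not a general case-folding routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// ASCII-only: maps 'a'..'z' to 'A'..'Z' and leaves every other byte alone.
// This is NOT correct case mapping for non-ASCII text.
inline void ascii_to_upper_in_place(char* data, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        const char c = data[i];
        if (c >= 'a' && c <= 'z') data[i] = static_cast<char>(c - 'a' + 'A');
    }
}

// Once both sides are normalized, comparison is a plain byte compare.
inline bool equal_bytes(const char* a, std::size_t an,
                        const char* b, std::size_t bn) {
    return an == bn && std::memcmp(a, b, an) == 0;
}
```

The transformation never changes the length of the data, which is what makes it safe to run directly inside a shared buffer.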

Another is that right now we still don't have a string_view-compatible strtod() and friends. You can use zstring_view and strtod() to quickly parse comma-separated numbers.

> Do you have a link to any code that does this?

I don't have links to any public code which does this kind of thing, but I've used it before with considerable speedups over the conventional getline() solution. I'm not sure if any major open source projects are using these techniques for their parsing code.

Nicol Bolas

May 14, 2015, 10:45:45 PM
to std-pr...@isocpp.org
On Thursday, May 14, 2015 at 9:32:38 PM UTC-4, Matthew Fioravante wrote:
> mstring_view: a mutable string_view
>
> Working with regular string_view is fine, but one can achieve additional optimization if one can write into the resulting buffer to transform the text further without making copies of the data which will likely require memory allocations. The data is just sitting in the internal file buffer unused during this time, so there is no reason not to allow writes to it.

As a bikeshed point, if it's a "view", then it's not mutable. That's why the word "view" was chosen; you can look, but not touch.

Furthermore, I see no need for a class specifically for this. It seems to me that the range proposal will cover this circumstance well enough. What you have is a range of iterators, where the iterators are `const char*`. The reason that `string_view` exists separately from that proposal is because it's an exceptionally common case, one that has certain special needs.

If you're dealing with parsing a mutable character buffer, then you're probably dealing with iterators already (or something close enough). So ranges would only be par for the course.

> zstring_view: a string_view (O(1) length()) which is guaranteed to be null terminated
> zmstring_view: an mstring_view (O(1) length()) which is guaranteed to be null terminated

Well, there are all the issues I pointed out on the previous `zstring_view` thread, which are not addressed here.
 
> //We throw away the null termination invariant, even though it is guaranteed to exist.
> constexpr string_view foo = "foo"sv;
> //We retain the null termination invariant and can take advantage of it.
> constexpr zstring_view bar = "bar"zsv;

While this does show a more useful use-case than the previous thread, it's a use case that really only matters when dealing with string literals. And while there are quite a few string literals in code, there are many more strings in code (particularly internationalized codebases) that aren't literals. Which means that they're going to be stored in a std::string (already null-terminated) or some other similar object that can also already be null-terminated.

> zstring_ptr: a thin type wrapper around const char* (zstring_ptr::strlen() for length)
> zmstring_ptr: a thin type wrapper around char* (zstring_ptr::strlen() for length)
>
> This one has more uses than just a C API
> Storing a string as a single null terminated pointer is more compact than storing a pointer and a length. If size is important, this can cut the memory usage of a data structure storing string_views in half.

The only way that would be true is if the only thing in that data structure was a `string_view` (or array thereof). And in that case, I'd much rather we were clear that it's just storing `const char*`s by making the data structure just store `const char*`s. At least then we'd know what we're dealing with.

Remember: the standard should not require a specific implementation. It shouldn't say things like the size of a type must be no greater than the size of a `char*`. And without that guarantee in the standard, we can't assume implementations will implement it.

Lastly, this really isn't what the standard library is for. If a particular user has need for such a thing, then let them write it. But the standard library doesn't have to cover everyone's usage scenarios.

Matthew Fioravante

May 14, 2015, 10:51:01 PM
to std-pr...@isocpp.org


On Thursday, May 14, 2015 at 10:45:45 PM UTC-4, Nicol Bolas wrote:
> As a bikeshed point, if it's a "view", then it's not mutable. That's why the word "view" was chosen; you can look, but not touch.

I disagree; an array_view, for example, is mutable by default. string_view is immutable by default because that's the most useful default use case. A view is a view over the data.

This is similar to, but the reverse of, array_view, where we have array_view<T> for a read-write view over the data and carray_view<T> (array_view<const T>) for a read-only view of the data.

Nicol Bolas

May 14, 2015, 10:54:01 PM
to std-pr...@isocpp.org


On Thursday, May 14, 2015 at 10:08:51 PM UTC-4, Matthew Fioravante wrote:
> Another is that right now we still don't have a string_view compatible strtod() and friends. You can use zstring_view and strtod() to quickly parse comma separated numbers.

Or you could use std::stod, once those functions are updated to accept string_view. Maybe that would be a good proposal to make: get other functions to actually take std::string_view as well as std::string.

It makes more sense to make C++ proposals to help C++ be more compatible with itself than to help C++ be more compatible with C.

Jeffrey Yasskin

May 15, 2015, 3:10:58 AM
to std-pr...@isocpp.org
On Thu, May 14, 2015 at 7:08 PM, Matthew Fioravante <fmatth...@gmail.com> wrote:


> One of the most common transformations is to_upper() or to_lower() if you want to do case insensitive comparison. If you need to compare the data many times, you can transform it in place once and the compiler can use the optimal bulk simd / streaming instructions to compare the strings.

Code-unit-wise to_upper() and to_lower() give the wrong answer, except for English-only text, which we shouldn't be designing for anymore. Collation keys are also generally a different length than their source string.

 
> Another is that right now we still don't have a string_view compatible strtod() and friends. You can use zstring_view and strtod() to quickly parse comma separated numbers.

We'll get these; we just need someone to write up the proposal. I suspect the right signatures are actually something like

optional<double> consume_double(string_view& str);

And you'd check str.empty() if you want the double to occupy the whole string. But that's something the paper author should double-check.
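
One way the suggested signature could behave, as a sketch only. The implementation below is an assumption made for the example: it copies the view into a std::string so that std::strtod() sees a terminator, which a real implementation would presumably avoid, and it inherits strtod's locale dependence and leading-whitespace skipping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <string_view>

// On success, returns the value and advances `str` past the consumed
// characters. On failure, returns nullopt and leaves `str` unchanged.
inline std::optional<double> consume_double(std::string_view& str) {
    const std::string copy(str);   // assumption: a copy, only to get a '\0'
    char* end = nullptr;
    const double value = std::strtod(copy.c_str(), &end);
    if (end == copy.c_str()) return std::nullopt;   // nothing was parsed
    str.remove_prefix(static_cast<std::size_t>(end - copy.c_str()));
    return value;
}
```

A caller that wants the number to span the whole input checks str.empty() afterwards, as described above.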

>> Do you have a link to any code that does this?
>
> I don't have links to any public code which does this kind of thing but I've used it before with considerable speed ups over the conventional getline() solution. I'm not sure if any major open source projects are using these techniques for their parsing code.

I definitely believe that splitting strings into string_views is faster; it's just the mutation I'm skeptical of. (Not because it's called a "view". You're right that array_views are/should be mutable by default.)

Jeffrey

Olaf van der Spek

May 15, 2015, 7:53:37 AM
to std-pr...@isocpp.org


On Friday, May 15, 2015 at 4:08:51 AM UTC+2, Matthew Fioravante wrote:
> One of the most common transformations is to_upper() or to_lower() if you want to do case insensitive comparison. If you need to compare the data many times, you can transform it in place once and the compiler can use the optimal bulk simd / streaming instructions to compare the strings.

Don't sizes change with UTF-8 to-upper/to-lower?
 
> Another is that right now we still don't have a string_view compatible strtod() and friends. You can use zstring_view and strtod() to quickly parse comma separated numbers.

string_view variants of strtod-like functions should be introduced, no need for zstring_view here..

Douglas Boffey

May 15, 2015, 4:59:56 PM
to std-pr...@isocpp.org
> Don't sizes change with UTF-8 to-upper/to-lower?

Can you give an example where a letter has a different length between upper and lower case, when UTF8 encoded?



Jeffrey Yasskin

May 15, 2015, 5:08:49 PM
to std-pr...@isocpp.org
On Fri, May 15, 2015 at 1:59 PM, Douglas Boffey <douglas...@gmail.com> wrote:
>> Don't sizes change with UTF-8 to-upper/to-lower?
>
> Can you give an example where a letter has a different length between upper
> and lower case, when UTF8 encoded?

ftp://ftp.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt has a
bunch of examples of the *number of characters* changing when they
switch between upper, lower, and title case.

ftp://ftp.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt also
includes at least

0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;LATIN CAPITAL LETTER I DOT;;;0069;
0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
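
To spell out what those two UnicodeData.txt entries mean for UTF-8 byte lengths: U+0130 lowercases to U+0069, and U+0131 uppercases to U+0049. In both cases a two-byte sequence maps to a one-byte sequence. The constant names below are made up for this example; the byte values are the standard UTF-8 encodings of those code points.

```cpp
#include <cassert>
#include <string_view>

// UTF-8 encodings written as explicit bytes, so the example does not depend
// on the encoding of the source file.
constexpr std::string_view kCapitalIWithDot = "\xC4\xB0";  // U+0130
constexpr std::string_view kSmallI          = "i";         // U+0069, its lowercase
constexpr std::string_view kSmallDotlessI   = "\xC4\xB1";  // U+0131
constexpr std::string_view kCapitalI        = "I";         // U+0049, its uppercase

// Case conversion changes the encoded length from 2 bytes to 1 byte.
static_assert(kCapitalIWithDot.size() == 2 && kSmallI.size() == 1,
              "U+0130 -> U+0069 shrinks by one byte");
static_assert(kSmallDotlessI.size() == 2 && kCapitalI.size() == 1,
              "U+0131 -> U+0049 shrinks by one byte");
```

So an in-place case conversion cannot, in general, keep every field at its original length once the text is not pure ASCII.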

Matthew Fioravante

May 16, 2015, 12:20:34 PM
to std-pr...@isocpp.org


On Friday, May 15, 2015 at 3:10:58 AM UTC-4, Jeffrey Yasskin wrote:
> except for English-only text, which we shouldn't be designing for anymore.

ASCII still has an important role today, particularly in big data domains like finance, where you have huge ASCII text data files that need to be processed. Knowing that your data is single-byte English text gives you a huge advantage in implementing efficient processing.

Olaf van der Spek

May 16, 2015, 12:21:43 PM
to std-pr...@isocpp.org
2015-05-16 18:20 GMT+02:00 Matthew Fioravante <fmatth...@gmail.com>:
> ASCII still has an important role today, particularly in big data domains
> like finance where you have huge ascii text data files that need to be
> processed. Knowing that your data is single byte english text gives you a
> huge advantage in implementing efficient processing.

Sure, but the standard functions shouldn't be limited to ASCII..


--
Olaf

Jeffrey Yasskin

May 16, 2015, 12:49:28 PM
to std-pr...@isocpp.org
'k, that's what you should put in your proposal. I'll still argue that
we shouldn't put an English-only library into an international
standard, but you might convince enough other people.

Jens Maurer

May 16, 2015, 1:29:29 PM
to std-pr...@isocpp.org
On 05/15/2015 09:10 AM, 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals wrote:
> We'll get these; we just need someone to write up the proposal. I suspect the right signatures are actually something like
>
> optional<double> consume_double(string_view& str);

In my experience, functions like that are hard to make
efficient, because the compiler will (pessimistically) assume
the "char *" used inside might point to the "string_view"
object itself. (For this particular case, this issue might
be minuscule compared to the actual "double" parsing overhead.)

An iterator range approach might work better in that regard:

char * parse_double(char * first, char * last, double& out);


I'm interested in seeing the most versatile and basic interface
standardized, possibly in addition to a "nicer" high-level
interface. Regrettably, C++ I/O doesn't give you the former
for quite a few fundamental operations, such as T <-> string.
See N4412.
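
A sketch of how the iterator-range shape might look in use. This is an assumption-laden illustration, not the interface from N4412: it takes const char* rather than char*, caps the input at 63 characters, and makes a bounded stack copy purely so that std::strtod() sees a terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Parses a double from [first, last). Returns a pointer one past the last
// character consumed, or `first` if nothing could be parsed (in which case
// `out` is left untouched).
inline const char* parse_double(const char* first, const char* last, double& out) {
    char buf[64];   // assumption for this sketch: numbers fit in 63 characters
    std::size_t n = static_cast<std::size_t>(last - first);
    if (n > sizeof buf - 1) n = sizeof buf - 1;
    if (n > 0) std::memcpy(buf, first, n);
    buf[n] = '\0';
    char* end = nullptr;
    const double value = std::strtod(buf, &end);
    if (end == buf) return first;   // no conversion performed
    out = value;
    return first + (end - buf);
}
```

Because the function receives two pointers by value and writes only to the double, nothing it stores can alias the range being read.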

Jens

Olaf van der Spek

May 16, 2015, 1:33:45 PM
to std-pr...@isocpp.org
2015-05-16 19:29 GMT+02:00 Jens Maurer <Jens....@gmx.net>:
> On 05/15/2015 09:10 AM, 'Jeffrey Yasskin' via ISO C++ Standard - Future Proposals wrote:
>> We'll get these; we just need someone to write up the proposal. I suspect the right signatures are actually something like
>>
>> optional<double> consume_double(string_view& str);
>
> In my experience, functions like that are hard to make
> efficient, because the compiler will (pessimistically) assume
> the "char *" used inside might point to the "string_view"
> object itself. (For this particular case, this issue might
> be minuscule compared to the actual "double" parsing overhead.)

How's that an issue as you're not writing to it?

IMO it should be a non-reference string_view anyway.

Matthew Fioravante

May 16, 2015, 2:42:39 PM
to std-pr...@isocpp.org
I've made a few threads in the past talking about possible strtod() implementations for string_view. Every time we get into a big discussion about the procedural interface (optional? expected? out params? etc...). Since most of these things like optional and expected are still too new, nobody can really come to a conclusion.

The problem is that we have multiple pieces of information we would like to return:
- The converted value.
- Whether or not the conversion succeeded.
- The specific error if one occurred (overflow, underflow, failure to parse, etc...).
- The tail of the string after the value consumed.

Nobody has quite figured out what a "modern" procedural interface is supposed to look like. We all want something that's easy to use, enforces or at least encourages correctness, and is composable. I think most people agree that a class based interface where you create a Parser object, call member functions, etc.. may be too heavyweight.

I currently use an interface like this which needs to be implemented for every type supported.
template <typename T> error_code parse(T& value, string_view& tail, string_view s);

Yes, it uses those evil out parameters that we aren't supposed to use anymore, but we can also write generic wrappers for a more modern composable interface on top of this. The advantage of this base implementation is that it's very simple and has no dependencies on other libraries.

template <typename T> error_code parse(T& value, string_view s) { string_view t; return parse(value,t,s); }
template <typename T> T parseOr(string_view s, T value_if_error = T());
 

//I want to be careful
double value;
if(parse(value, str)) {
  //Handle the error (a non-zero error_code converts to true)
}

//Just give me a default value of 1.0 if the parsing failed
auto value = parseOr(str, 1.0);


One could also implement wrappers that use optional<T>, throw exceptions on errors, use expected<T>, etc..

The main problem I see with the out param approach as used here is that you always have to default construct an object and then set its value. That means this API would not be usable with objects that cannot be default constructed and then modified later. I'm not sure such an interface is the best one possible especially for standardization, but it works well enough for me for now.
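
For concreteness, here is a sketch of that layered interface for int only. The choice of std::from_chars() as the parsing engine, and the decision to leave value untouched on failure, are assumptions made for this example; the post does not say how the base function is implemented.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Base interface, sketched for `int`. `tail` receives the unparsed
// remainder of `s`; on failure `value` is left untouched.
inline std::error_code parse(int& value, std::string_view& tail, std::string_view s) {
    const char* const first = s.data();
    const char* const last = first + s.size();
    int tmp = 0;
    const std::from_chars_result r = std::from_chars(first, last, tmp);
    if (r.ec != std::errc()) {
        tail = s;
        return std::make_error_code(r.ec);
    }
    value = tmp;
    tail = s.substr(static_cast<std::size_t>(r.ptr - first));
    return {};
}

// Convenience wrapper that ignores the tail.
inline std::error_code parse(int& value, std::string_view s) {
    std::string_view tail;
    return parse(value, tail, s);
}

// Returns the parsed value, or `value_if_error` when parsing fails.
inline int parseOr(std::string_view s, int value_if_error = 0) {
    int value = 0;
    return parse(value, s) ? value_if_error : value;
}
```

Note that an error_code is truthy when an error occurred, so the careful path reads `if (parse(value, str)) { /* handle error */ }`.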

Nicol Bolas

May 16, 2015, 5:10:01 PM
to std-pr...@isocpp.org
On Saturday, May 16, 2015 at 2:42:39 PM UTC-4, Matthew Fioravante wrote:
> I've made a few threads in the past talking about possible strtod() implementations for string_view. Every time we get into a big discussion about the procedural interface (optional? expected? out params? etc...). Since most of these things like optional and expected are still too new, nobody can really come to a conclusion.
> <snip>

It would be highly unlikely that you could get the standards committee to abandon exceptions as the standard means for error reporting. That's how C++ has been since C++98, and unless something major happens, that's how it's going to be in the future.

It would be better to use existing specifications as your guide. For example, the FileSystem TS makes it clear how a C++ standard library API should look if you're going to use error codes:

* The output value is always the function's conceptual output, not an error code.
* If a function needs to have an error code, then it has two overloads: one that throws and one that uses an error code, which is an output parameter. The only difference is the error code parameter.

There's no point in returning an optional for the throwing version; you'll either get a valid value or you won't be in that scope anymore. And since you want the two functions to be as identical as possible, it makes sense to simply return by value in the error-code case. By using error-codes rather than exceptions, the user has made it clear that a priori code safety isn't their biggest concern. Or that they can't use exceptions at all, in which case, std::optional can't really do anything to save them.
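
Applied to a parsing function, that convention might look roughly like the pair below. to_int() is a hypothetical name, and treating trailing characters as an error is a choice made for this sketch; only the overall shape (value returned directly, overloads differing by a trailing error_code&) follows the style described above.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Non-throwing overload: the result is the return value, failure goes to `ec`.
inline int to_int(std::string_view s, std::error_code& ec) noexcept {
    int value = 0;
    const char* const last = s.data() + s.size();
    const std::from_chars_result r = std::from_chars(s.data(), last, value);
    if (r.ec != std::errc()) {
        ec = std::make_error_code(r.ec);
        return 0;
    }
    if (r.ptr != last) {   // trailing characters: treated as a parse failure here
        ec = std::make_error_code(std::errc::invalid_argument);
        return 0;
    }
    ec.clear();
    return value;
}

// Throwing overload: same result, failure becomes std::system_error.
inline int to_int(std::string_view s) {
    std::error_code ec;
    const int value = to_int(s, ec);
    if (ec) throw std::system_error(ec, "to_int");
    return value;
}
```

The throwing overload is a thin wrapper, so both entry points share one implementation and differ only in how failure is reported.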

Also, it should be noted that Boost (and their users) have tons of experience with optional. Many Boost libraries rely on it. Generally speaking, Boost uses optional for when returning a non-value does not represent an erroneous condition.

Olaf van der Spek

May 16, 2015, 5:14:14 PM
to std-pr...@isocpp.org
2015-05-16 23:10 GMT+02:00 Nicol Bolas <jmck...@gmail.com>:
> * The output value is always the function's conceptual output, not an error
> code.
> * If a function needs to have an error code, then it has two overloads: one
> that throws and one that uses an error code, which is an output parameter.
> The only difference is the error code parameter.

I thought the FS TS was a great example of how NOT to do it..

Nicol Bolas

May 16, 2015, 8:46:12 PM
to std-pr...@isocpp.org

Whatever anyone's individual thoughts on the matter may be, the FileSystem TS exists, has passed the PDTS phase, and, while not quite a TS yet, is being implemented as we speak in multiple standard library implementations. That makes it de facto precedent for C++ standardization, which puts the burden of proof on the person arguing against it, not the person following it.

Matthew Fioravante

May 17, 2015, 9:17:16 AM
to std-pr...@isocpp.org


On Saturday, May 16, 2015 at 5:14:14 PM UTC-4, Olaf van der Spek wrote:

> I thought the FS TS was a great example of how NOT to do it..

What are the major complaints about the FS TS procedural interface? The most important thing to me would be making sure exceptions are optional, and it looks like they are here.

Vicente J. Botet Escriba

May 18, 2015, 3:49:13 PM
to std-pr...@isocpp.org
On 17/05/15 15:17, Matthew Fioravante wrote:
> On Saturday, May 16, 2015 at 5:14:14 PM UTC-4, Olaf van der Spek wrote:
>> I thought the FS TS was a great example of how NOT to do it..
>
> What are the major complaints about the FS TS procedural interface? The most important thing to me would be making sure exceptions are optional and it looks like they are here.

Returning the error code as an out parameter makes the function compose poorly. This is why most functional libraries have adopted monadic interfaces. Of course, C++ is a multi-paradigm language, and since people know the imperative paradigm better, it is natural that they find the design taken by the FS TS a good one.

I sincerely believe that a monadic interface will eventually be introduced into the standard. I know there is a lot of reluctance, because one particular use of monads could change the way errors are reported. But haven't we already introduced one more way with the FS TS, without anyone raising comparable complaints?

Note that I'm not proposing to abandon exceptions, but I agree with Olaf that the design taken by the FS TS is worse than a monadic one.

If we have a function that can throw, as

    R f(T1, T2);

The following seems, to me, a preferable way of introducing another function that does the same thing but cannot throw:

    expected<R, error_code> f(no_throw_t, T1, T2);

This introduces some constraints on R, but I can live with these constraints (move shouldn't throw).

Nevertheless expected<T> is not enough, we need more, we need a complete monadic interface. Once this is ready,
the monadic interface would compose clearly better than

    R f(error_code&, T1, T2);

Monads are not an alternative to exceptions, but they are IMHO a valid alternative to the way the FS TS reports errors.
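
To illustrate what "composes better" could mean in practice, here is a toy sketch. The result<T> class is a hypothetical stand-in built on std::variant, not the proposed expected<T, E>, and and_then() and checked_div() are names invented for this example.

```cpp
#include <cassert>
#include <system_error>
#include <utility>
#include <variant>

// Minimal stand-in for an expected-like type: holds a T or an error_code.
template <typename T>
class result {
public:
    result(T value) : state_(std::move(value)) {}
    result(std::error_code ec) : state_(ec) {}

    bool has_value() const { return std::holds_alternative<T>(state_); }
    const T& value() const { return std::get<T>(state_); }
    std::error_code error() const { return std::get<std::error_code>(state_); }

    // Monadic bind: run `f` on the value, or pass the error through untouched.
    template <typename F>
    auto and_then(F&& f) const -> decltype(f(std::declval<const T&>())) {
        if (has_value()) return f(value());
        return error();
    }

private:
    std::variant<T, std::error_code> state_;
};

// A fallible step that never throws.
inline result<int> checked_div(int a, int b) {
    if (b == 0) return std::make_error_code(std::errc::invalid_argument);
    return a / b;
}
```

Steps then chain as checked_div(100, 5).and_then(...), with the first error short-circuiting the rest and no error_code variable threaded through by hand.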

I know, I must write a paper if I want things to change one day. However, as there is a lot of reluctance, the work needs to be done carefully, checking whether the C++ community changes its mind about monads, and all this takes time.

Vicente


Olaf van der Spek

May 20, 2015, 9:42:51 AM
to std-pr...@isocpp.org
Posix:
if (unlink(path)) {
    // unlink failed
}

C++ FS:
{
    error_code ec;
    fs::remove(path, ec);
    if (ec) {
        // remove failed
    }
}


--
Olaf

Nicola Gigante

May 20, 2015, 2:41:37 PM
to std-pr...@isocpp.org
I totally agree.

I think it’s also worth noting that with the await/yield proposal in C++17
we’ll have a de facto built-in syntax to deal with monadic values.

Imagine:

std::expected<std::string,fs::error_code> getcontents(std::string_view path);

// my code
std::expected<T, fs::error_code> myfunc() {
  T result;
  std::string filecontents = await getcontents(path);
  // my code
  yield result;
}



Bye,
Nicola
