array<zmstring_view,kMaxSplits+1> split_store;
array_view<zmstring_view> s = p.splitNextLine(split_store,',');
HashMap<const char*,IntId> by_address; //Hashes the address
HashMap<zstring_ptr,IntId> by_value; //Hashes the string value, using only sizeof(char*) bytes for each key.
//const char* == const char* is the built-in pointer comparison: it compares addresses, resulting in many surprises for novices
bool operator==(zstring_ptr l,zstring_ptr r); //Compares values using strcmp()
bool operator==(zstring_ptr l,string_view r); //Compares values
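The zmstring_view fields returned by splitNextLine in the fragment above must each end in '\0' somehow. One plausible way, assumed here rather than taken from the thread, is for the splitter to overwrite each delimiter with '\0' inside the buffer it already owns. The zmstring_view struct and split_in_place below are hypothetical names for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical zmstring_view: mutable, O(1) length(), and the characters
// are always followed by '\0'.
struct zmstring_view {
    char* ptr = nullptr;
    std::size_t len = 0;
    char* c_str() const noexcept { return ptr; }
    std::size_t length() const noexcept { return len; }
};

// Hypothetical splitter: splits the null-terminated `line` on `delim`,
// overwriting each delimiter it consumes with '\0' so that every field is
// a C string. Stores at most N fields; no characters are copied.
template <std::size_t N>
std::size_t split_in_place(char* line, char delim, std::array<zmstring_view, N>& out) {
    std::size_t count = 0;
    char* field = line;
    while (count < N) {
        char* end = field;
        while (*end != delim && *end != '\0') ++end;  // find the field's end
        const bool last = (*end == '\0');
        *end = '\0';                                   // terminate it in place
        out[count++] = zmstring_view{field, static_cast<std::size_t>(end - field)};
        if (last) break;
        field = end + 1;
    }
    return count;
}
```

The trade-off is that the original delimiters are destroyed, which is acceptable only because the buffer is scratch space owned by the parser.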
I've found some cases where having a library of string_view types allows efficient text processing, parsing, and storage in a strongly typed manner. The potential downside is a large number of string types.

mstring_view: a mutable string_view
array<mstring_view,kMaxSplits+1> split_store;
auto p = parser(filename);
array_view<mstring_view> s = p.splitNextLine(split_store,',');

In the above example, p reads the next line of the file and splits it on ',' into split_store, returning a view of the first std::min(split_store.size(), number of splits + 1) elements. parser is implemented efficiently, so the resulting views point directly into the parser's internal I/O buffer for the file. This limits the split algorithm to one memory allocation (for the internal file buffer), zero data copies, and a single parsing pass that can be optimized with SIMD instructions. This level of efficiency is not possible with the API of fgets() and/or std::getline(), which copy each line into a caller-supplied buffer or a std::string.

Working with regular string_view is fine, but one can achieve additional optimization by writing into the resulting buffer to transform the text further, without copying the data (which would likely require memory allocations). The data is just sitting unused in the internal file buffer during this time, so there is no reason not to allow writes to it.
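The post does not show how parser or splitNextLine are implemented. As a minimal sketch, assuming an mstring_view is just a pointer plus a length over mutable chars, a zero-copy splitter over an already-filled buffer could look like this. mstring_view and split_views are hypothetical names; the return value never exceeds the number of slots in the output array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical mstring_view: like std::string_view, but over mutable chars.
struct mstring_view {
    char* ptr = nullptr;
    std::size_t len = 0;
    char& operator[](std::size_t i) const { return ptr[i]; }
    std::size_t size() const { return len; }
    std::string_view view() const { return {ptr, len}; }
};

// Hypothetical splitter: records up to N fields of buf[0, n) separated by
// `delim`. Nothing is copied or allocated; every field points into buf.
// Returns min(N, number of delimiters + 1).
template <std::size_t N>
std::size_t split_views(char* buf, std::size_t n, char delim,
                        std::array<mstring_view, N>& out) {
    std::size_t count = 0;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= n && count < N; ++i) {
        if (i == n || buf[i] == delim) {   // end of a field
            out[count++] = {buf + start, i - start};
            start = i + 1;
        }
    }
    return count;
}
```

Because each field points into the caller's buffer, writing through a field modifies the buffer itself; no second copy of the line exists.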
On Thu, May 14, 2015 at 6:32 PM, Matthew Fioravante <fmatth...@gmail.com> wrote:
> Working with regular string_view is fine, but one can achieve additional optimization if one can write into the resulting buffer to transform the text further without making copies of the data which will likely require memory allocations.

Can you give some examples of these transformations?
Do you have a link to any code that does this?
zstring_view: a string_view (O(1) length()) which is guaranteed to be null terminated
zmstring_view: an mstring_view (O(1) length()) which is guaranteed to be null terminated
Unfortunately, I think legacy C APIs are here to stay for a long time, especially operating system APIs like POSIX. There are some cases where we know the underlying string data is null terminated, and we can use this fact to interact easily and efficiently with C APIs. zstring_view is more limited in that there is no substr() operation, since an arbitrary substring is not followed by a null terminator.
//We throw away the null termination invariant, even though it is guaranteed to exist.
constexpr string_view foo = "foo"sv;
//We retain the null termination invariant and can take advantage of it.
constexpr zstring_view bar = "bar"zsv;
constexpr zstring_view kLibraryPath = "./path/libfoo.so"zsv;
void *dl = dlopen(kLibraryPath.c_str(), RTLD_NOW); //dlopen() also requires a flags argument
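A minimal sketch of zstring_view, assuming only what the post states: pointer plus length, a guaranteed terminator, c_str() for C APIs, and no substr(). The class body is hypothetical. One correction to the snippet above: outside the standard library a user-defined literal suffix must begin with an underscore, so this sketch spells it _zsv rather than zsv.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical zstring_view: a read-only view with O(1) length() whose
// characters are always followed by a '\0'.
class zstring_view {
public:
    // Precondition: p[n] == '\0'. Callers must guarantee the terminator.
    constexpr zstring_view(const char* p, std::size_t n) noexcept : ptr_(p), len_(n) {}
    // std::string always stores a '\0' after its contents.
    zstring_view(const std::string& s) noexcept : ptr_(s.c_str()), len_(s.size()) {}

    constexpr const char* c_str() const noexcept { return ptr_; }  // safe for C APIs
    constexpr std::size_t length() const noexcept { return len_; }
    constexpr operator std::string_view() const noexcept { return {ptr_, len_}; }
    // Deliberately no substr(): a substring would not end at the '\0'.

private:
    const char* ptr_;
    std::size_t len_;
};

// Outside the standard library, literal suffixes must begin with '_'.
constexpr zstring_view operator""_zsv(const char* s, std::size_t n) noexcept {
    return zstring_view(s, n);
}
```

Like string_view, this type does not own its data: a zstring_view built from a temporary std::string dangles once the string is destroyed.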
zstring_ptr: a thin type wrapper around const char* (zstring_ptr::strlen() for length)
zmstring_ptr: a thin type wrapper around char* (zmstring_ptr::strlen() for length)

This one has more uses than just C APIs. Storing a string as a single pointer to null-terminated data is more compact than storing a pointer and a length. If size is important, this can halve the per-string storage of a data structure that would otherwise hold string_views (one pointer instead of a pointer plus a length).
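A minimal sketch of zstring_ptr as a value-semantic hash key, assuming the interface described above (a single const char*, strlen() for length). The class, the operator== overloads, and the std::hash specialization are illustrative, not an existing library. Equality and hashing look at the characters, so two keys with equal text at different addresses match, unlike a map keyed on raw const char*.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <string_view>
#include <unordered_map>

// Hypothetical zstring_ptr: one pointer to null-terminated text.
// The length is not stored, so strlen() is O(n).
class zstring_ptr {
public:
    explicit zstring_ptr(const char* s) noexcept : p_(s) {}
    const char* c_str() const noexcept { return p_; }
    std::size_t strlen() const noexcept { return std::strlen(p_); }
    explicit operator std::string_view() const noexcept { return std::string_view(p_); }

private:
    const char* p_;
};

// Equality compares characters, not addresses.
inline bool operator==(zstring_ptr l, zstring_ptr r) noexcept {
    return std::strcmp(l.c_str(), r.c_str()) == 0;
}
inline bool operator==(zstring_ptr l, std::string_view r) noexcept {
    return std::string_view(l) == r;
}

// Hashing the characters lets zstring_ptr serve as an unordered_map key
// while the key object itself is a single pointer.
namespace std {
template <>
struct hash<zstring_ptr> {
    std::size_t operator()(zstring_ptr s) const noexcept {
        return std::hash<std::string_view>{}(std::string_view(s));
    }
};
}  // namespace std
```

The cost of the smaller key is that every hash and length query rescans the text, so this pays off mainly for short keys or memory-bound tables.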
On Thursday, May 14, 2015 at 9:32:38 PM UTC-4, Matthew Fioravante wrote:
> mstring_view: a mutable string_view
As a bikeshed point, if it's a "view", then it's not mutable. That's why the word "view" was chosen; you can look, but not touch.
On Thursday, May 14, 2015 at 9:39:23 PM UTC-4, Jeffrey Yasskin wrote:
> Can you give some examples of these transformations?

One of the most common transformations is to_upper() or to_lower(), if you want to do case-insensitive comparison. If you need to compare the data many times, you can transform it in place once, and the compiler can then use optimal bulk SIMD/streaming instructions to compare the strings.
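A sketch of the in-place lowering described above, assuming ASCII-only case folding (full Unicode case mapping can change a string's length, e.g. German "ß" upper-cases to "SS", so it cannot always be done in place). ascii_to_lower_in_place is a hypothetical helper; it avoids std::tolower, which is locale-dependent and undefined for negative char values.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Lower-cases ASCII letters in place. Bytes outside 'A'..'Z' (including
// UTF-8 multibyte sequences) are left as they are.
inline void ascii_to_lower_in_place(char* data, std::size_t n) noexcept {
    for (std::size_t i = 0; i < n; ++i) {
        const char c = data[i];
        if (c >= 'A' && c <= 'Z') {
            data[i] = static_cast<char>(c - 'A' + 'a');
        }
    }
}
```

After this one pass, every later case-insensitive comparison is an ordinary std::string_view equality test, which standard libraries typically implement as a memcmp-style bulk byte comparison.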
Another is that right now we still don't have a string_view-compatible strtod() and friends. With zstring_view the field is guaranteed to be null terminated, so you can pass it straight to strtod() to quickly parse comma-separated numbers.
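A sketch of the strtod() approach, assuming the input is one null-terminated buffer (the guarantee a zstring_view provides). strtod() stops at the first character it cannot consume, so its end pointer lands on the ',' or on the final '\0'. Without a guaranteed terminator, strtod() on a plain string_view could read past the field into whatever digits follow it in memory. parse_csv_doubles is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Parses comma-separated numbers from a null-terminated buffer.
// Returns false on the first field that is not a number.
inline bool parse_csv_doubles(const char* s, std::vector<double>& out) {
    for (;;) {
        char* end = nullptr;
        const double d = std::strtod(s, &end);
        if (end == s) return false;        // no number at this position
        out.push_back(d);
        if (*end == '\0') return true;     // reached the terminator
        if (*end != ',') return false;     // unexpected character
        s = end + 1;                       // skip the comma
    }
}
```

This sketch ignores errno/ERANGE and inherits strtod()'s quirks: it skips leading whitespace and uses the current C locale's decimal point.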
> Do you have a link to any code that does this?

I don't have links to any public code that does this kind of thing, but I've used it before with considerable speedups over the conventional getline() solution. I'm not sure whether any major open source projects use these techniques in their parsing code.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.
except for English-only text, which we shouldn't be designing for anymore.
template <typename T> error_code parse(T& value, string_view& tail, string_view s);
template <typename T> error_code parse(T& value, string_view s) { string_view t; return parse(value,t,s); }
template <typename T> T parseOr(string_view s, T value_if_error = T());
//I want to be careful
double value;
if(parse(value, str)) { //error_code converts to true when parsing failed
//Handle the error
}
//Just give me a default value of 1.0 if the parsing failed
auto value = parseOr(str, 1.0);
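The declarations above leave the implementation open. Here is one possible sketch, assuming error_code means std::error_code and restricting T to integers (via std::from_chars) and double (via std::strtod on a temporary copy, because strtod() needs the null terminator that a plain string_view does not promise, which is exactly the gap zstring_view would fill). It is an illustration, not the proposed wording.

```cpp
#include <cassert>
#include <cerrno>
#include <charconv>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <string_view>
#include <system_error>
#include <type_traits>

// Parses a leading T from s; on success tail receives the unparsed rest.
template <typename T>
std::error_code parse(T& value, std::string_view& tail, std::string_view s) {
    if constexpr (std::is_integral_v<T>) {
        const auto res = std::from_chars(s.data(), s.data() + s.size(), value);
        if (res.ec != std::errc()) return std::make_error_code(res.ec);
        tail = s.substr(static_cast<std::size_t>(res.ptr - s.data()));
        return {};
    } else {
        static_assert(std::is_same_v<T, double>, "sketch handles integers and double only");
        // strtod() needs a '\0' terminator, so this sketch pays for a copy.
        const std::string tmp(s);
        char* end = nullptr;
        errno = 0;
        const double d = std::strtod(tmp.c_str(), &end);
        if (end == tmp.c_str()) return std::make_error_code(std::errc::invalid_argument);
        if (errno == ERANGE) return std::make_error_code(std::errc::result_out_of_range);
        value = d;
        tail = s.substr(static_cast<std::size_t>(end - tmp.c_str()));
        return {};
    }
}

// Convenience overload that ignores the unparsed tail.
template <typename T>
std::error_code parse(T& value, std::string_view s) {
    std::string_view tail;
    return parse(value, tail, s);
}

// Returns the parsed value, or value_if_error if parsing failed.
template <typename T>
T parseOr(std::string_view s, T value_if_error = T()) {
    T value{};
    return parse(value, s) ? value_if_error : value;
}
```

Returning std::error_code keeps the failure path exception-free; a non-zero code converts to true, so callers test `if (parse(...))` to detect an error.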
I've started a few threads in the past about possible strtod() implementations for string_view. Every time, we get into a big discussion about the procedural interface (optional? expected? out params? etc.). Since optional and expected are still so new, nobody can really come to a conclusion.
<snip>
I thought the FS TS was a great example of how NOT to do it.
On Saturday, May 16, 2015 at 5:14:14 PM UTC-4, Olaf van der Spek wrote:
> I thought the FS TS was a great example of how NOT to do it.
What are the major complaints about the FS TS procedural interface? The most important thing to me is making sure exceptions are optional, and it looks like they are here.
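Optional exceptions in the FS TS (and later std::filesystem) come from pairing overloads: one reports failure by throwing filesystem_error, the other takes a std::error_code& and reports through it instead. Applied to parsing, the pattern might look like the sketch below; parse_int is a hypothetical name, and integers are used so that std::from_chars needs no temporary copy.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>

// Non-throwing overload: failure is reported through ec.
inline long long parse_int(std::string_view s, std::error_code& ec) noexcept {
    long long value = 0;
    const auto res = std::from_chars(s.data(), s.data() + s.size(), value);
    std::errc err = res.ec;
    if (err == std::errc() && res.ptr != s.data() + s.size()) {
        err = std::errc::invalid_argument;  // reject trailing characters
    }
    ec = std::make_error_code(err);         // value 0 means success
    return err == std::errc() ? value : 0;
}

// Throwing overload: forwards to the one above and converts failure into
// std::system_error (std::filesystem::filesystem_error derives from it).
inline long long parse_int(std::string_view s) {
    std::error_code ec;
    const long long value = parse_int(s, ec);
    if (ec) throw std::system_error(ec, "parse_int");
    return value;
}
```

The error_code overload does all the work, so code built without exceptions can use it exclusively while other callers keep the simpler throwing form.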
Vicente