Low level file i/o library

Niall Douglas

Apr 4, 2018, 4:12:29 AM
to ISO C++ Standard - Future Proposals
Please find attached draft 1 of my low level file i/o library proposal which I will be presenting at Rapperswil for standardisation.

It provides:
  • Bare metal performance, no exception throws, no malloc, no mutexes, no threads. Functions are deliberately designed to maximally inline bespoke editions of themselves so overhead is usually unmeasurable over the syscall.
  • Zero whole system memory copy scatter-gather file i/o, including no memory copying of paths.
  • First class support for persistent memory storage.
  • Direct support for the kernel page cache.
  • Race free filesystem.
  • Asynchronous and synchronous file i/o.
  • Comprehensive suite of filesystem mutual exclusion facilities i.e. concurrent modification locks.
  • Deep integration with C++ 20, including Filesystem, Concepts, Coroutines, Ranges.
  • Platform for building out a suite of generic filesystem algorithms with which to replace the venerable iostreams with a v2 modern alternative. Papers on that are forthcoming, but I will essentially be proposing a new Study Group to build out a state of the art standard data persistence layer for C++.
Comments are welcome.

Niall

20180404 DGGGGr0 Low level file io library draft 1.pdf

Nicol Bolas

Apr 4, 2018, 9:18:46 PM
to ISO C++ Standard - Future Proposals

I haven't done a detailed review of the document, but my first impression is that it really needs a solid overview summary section. I'd say that P0515R0 (the initial `operator<=>` paper), section 1.4, is probably the gold standard in this regard.

The technical details seem on first glance reasonably comprehensive, but it's hard to wrap one's head around doing stuff with the API without getting bogged down in the details. A few examples of doing common filesystem tasks would go a long way into getting people into the headspace of the system.

Example code for the following would be very helpful in getting an understanding of the system:

1. Read an entire file (including size querying).
2. Write a buffer to a file.
3. Parse some text through a memory mapped file.

You'll probably know better than I what examples would show your API in the best light.

Niall Douglas

Apr 5, 2018, 3:20:58 AM
to ISO C++ Standard - Future Proposals
I haven't done a detailed review of the document, but my first impression is that it really needs a solid overview summary section. I'd say that P0515R0 (the initial `operator<=>` paper), section 1.4, is probably the gold standard in this regard.

Thanks for reading it through. I appreciate it is quite long.

There is a separate "big picture" paper which sets out the overall vision.

But I can see that some sort of intermediate overview section for just this paper would be useful.
 

The technical details seem on first glance reasonably comprehensive, but it's hard to wrap one's head around doing stuff with the API without getting bogged down in the details. A few examples of doing common filesystem tasks would go a long way into getting people into the headspace of the system.

Example code for the following would be very helpful in getting an understanding of the system:

1. Read an entire file (including size querying).
2. Write a buffer to a file.
3. Parse some text through a memory mapped file.

You'll probably know better than I what examples would show your API in the best light.

All those are very trivial. They look exactly like doing the same in POSIX syscalls. Perhaps that's the point of showing them?
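
For reference, a minimal sketch of the first requested example (read an entire file, including the size query) written against plain POSIX calls. This is not the proposed library's API; `read_entire_file` is a hypothetical name, and a POSIX system is assumed.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <system_error>
#include <vector>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: open, query the size with fstat, then read until done.
std::vector<char> read_entire_file(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd == -1) throw std::system_error(errno, std::generic_category(), "open");

    struct stat st;
    if (::fstat(fd, &st) == -1) {
        int e = errno;
        ::close(fd);
        throw std::system_error(e, std::generic_category(), "fstat");
    }

    std::vector<char> buffer(static_cast<std::size_t>(st.st_size));
    std::size_t done = 0;
    while (done < buffer.size()) {
        ssize_t n = ::read(fd, buffer.data() + done, buffer.size() - done);
        if (n == -1) {
            if (errno == EINTR) continue;  // interrupted, retry
            int e = errno;
            ::close(fd);
            throw std::system_error(e, std::generic_category(), "read");
        }
        if (n == 0) break;  // file became shorter after the size query
        done += static_cast<std::size_t>(n);
    }
    buffer.resize(done);
    ::close(fd);
    return buffer;
}
```

Note the gap between the size query and the read: the file can change in between, which is why the loop tolerates a short read.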

Are there any less trivial examples you'd like to see?

Niall

 

Viacheslav Usov

Apr 5, 2018, 5:15:21 AM
to ISO C++ Standard - Future Proposals
On Thu, Apr 5, 2018 at 9:20 AM, Niall Douglas <nialldo...@gmail.com> wrote:

> All those are very trivial. They look exactly like doing the same in POSIX syscalls.

Trivial, when you know it. As it stands, the proposal goes from a high-level WHY into low-level DETAIL without an equivalent of "Hello, world" anywhere in the proposal, as far as I can tell, so one does not even know where to start.

Cheers,
V.

Niall Douglas

Apr 5, 2018, 5:10:43 PM
to ISO C++ Standard - Future Proposals

Trivial, when you know it. As it stands, the proposal goes from a high-level WHY into low-level DETAIL without an equivalent of "Hello, world" anywhere in the proposal, as far as I can tell, so one does not even know where to start.


Ok, I'll see what I can do.

Is there anything else, apart from use case examples, that stands out as problematic in the proposal paper?

Niall 

Bengt Gustafsson

Apr 7, 2018, 7:40:15 AM
to ISO C++ Standard - Future Proposals
Is there an API specification or header file somewhere? While the proposal details a hierarchy of classes there are no details on what APIs those classes are to have. Do you have a public reference implementation?

Niall Douglas

Apr 7, 2018, 11:01:13 AM
to ISO C++ Standard - Future Proposals
On Saturday, April 7, 2018 at 12:40:15 PM UTC+1, Bengt Gustafsson wrote:
Is there an API specification or header file somewhere? While the proposal details a hierarchy of classes there are no details on what APIs those classes are to have. Do you have a public reference implementation?

As the second paragraph in the paper posted says:

"A reference implementation of the proposed library with reference API documentation can be found at https://ned14.github.io/afio/. It works well on FreeBSD, MacOS, Linux and Microsoft Windows on ARM, AArch64, x64 and x86."

It has drifted a bit from the proposed library, but I'm working right now on bringing it back into line.

The reason I don't get into specific API detail is that the paper would become a TS if I did. That is a lot of effort before I know whether WG21 wants this at all.

Niall

Niall Douglas

Apr 7, 2018, 9:53:49 PM
to ISO C++ Standard - Future Proposals

Example code for the following would be very helpful in getting an understanding of the system:

1. Read an entire file (including size querying).
2. Write a buffer to a file.
3. Parse some text through a memory mapped file.

You'll probably know better than I what examples would show your API in the best light.

Ok, I've implemented example code for all of the above at https://github.com/ned14/afio/blob/develop/example/use_cases.cpp

Do let me know if they are satisfactory, and if you think you need any more examples of usage to understand the proposal paper.

Niall

Bengt Gustafsson

Apr 9, 2018, 5:55:45 PM
to ISO C++ Standard - Future Proposals
Thanks for posting the link, and for making examples.

Here are some comments:

- The read/write functions would benefit from overloads with the traditional pointer/size parameters in addition to the scatter/gather overloads. Maybe also, in the spirit of Ranges TS, provide overloads for "containers" so that std::string and std::vector<T> can be read/written directly. I'm uncertain if reading into a container should resize it or read up to its original size. Maybe resize down but never up?

- Using 'file' as the name of a free function to open a file seems odd. I think the rule to let a verb lead a function name and a noun a class name is good, and I would guess that file was a class, not a function.

- Forcing all calls to end with a .value() to get the value actually returned does solve the problem of a missed error code check (and dual return value) but it gets very boring to write on every call. Where there is no return value to get as in close() appending a .value() is particularly strange. An alternative approach would be to let the return type have a cast operator to T which throws if there is an error and an operator!() which can be used to test for errors in an if-statement if you want to check for error codes manually. The dtor would throw a "missed check" error if neither of these operations were used (regardless of if there was an actual error or not). This however needs a member flag to remember if a check was made, which is annoying as it is knowable at compile time (at least in most cases).

- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?

- Are all (relevant) APIs duplicated as methods and functions? Your examples use a free function read() and a method write(). Duplicating everything was the choice in std::filesystem but is it really a good way forward forever? I would personally prefer not to revert back to C functions for everything but let C++ stay object oriented...

- I'm trying to understand how path_view handles a wchar_t* ctor parameter on Windows. Is there a flag inside which tells syscall wrappers which type of characters the view points to? If so it will introduce a portability problem. If not I wonder where the temporary string is stored. I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.

- What is the point of first creating a section and then mapping it? I know this is how it works in Win32, but I always felt it was unnecessarily complex to have two steps. Is it important for file integrity or something? The Linux mmap() does not have these two levels, right? As far as I can see the map_file example does not use the sh variable after creating the mh variable despite quite complicated use pattern.

- The function name truncate() is misleading as it can also extend the file length. set_length() would be more appropriate I think. The documentation should specify what the implementation should produce on reading bytes extended to but never written. It should also detail under what circumstances the implementation may return another size than asked for, or preferably remove such cases (returning out of disk error for instance).

- I have a hard time understanding the difference between the two mapping methods using a section_handle/map_handle vs. a mapped_file_handle, but I suspect the latter is some kind of shortcut. Is the need for this discussed in the proposal?

- The term inode seems very Unix specific. Would there be an advantage to use a more generic term? Are corresponding handles used in Windows and if so, what are they called?

- I was happy to see that mapped_span provides a byte offset ctor parameter, but I would suggest that it be placed before the length parameter as it is so common to have a file with a header and then "the rest" is an array of some fixed size data records.

Nicol Bolas

Apr 9, 2018, 6:25:42 PM
to ISO C++ Standard - Future Proposals
On Monday, April 9, 2018 at 5:55:45 PM UTC-4, Bengt Gustafsson wrote:
Thanks for posting the link, and for making examples.

Here are some comments:

- The read/write functions would benefit from overloads with the traditional pointer/size parameters in addition to the scatter/gather overloads. Maybe also, in the spirit of Ranges TS, provide overloads for "containers" so that std::string and std::vector<T> can be read/written directly. I'm uncertain if reading into a container should resize it or read up to its original size. Maybe resize down but never up?

That seems a little much for a low-level file API.

- Using 'file' as the name of a free function to open a file seems odd. I think the rule to let a verb lead a function name and a noun a class name is good, and I would guess that file was a class, not a function.

- Forcing all calls to end with a .value() to get the value actually returned does solve the problem of a missed error code check (and dual return value) but it gets very boring to write on every call.

I presume in real code, you're supposed to do error checking instead of just calling `.value()`. That being said, it would be good to see an example of what code with proper error checking would look like.

Where there is no return value to get as in close() appending a .value() is particularly strange. An alternative approach would be to let the return type have a cast operator to T which throws if there is an error and an operator!() which can be used to test for errors in an if-statement if you want to check for error codes manually. The dtor would throw a "missed check" error if neither of these operations were used (regardless of if there was an actual error or not). This however needs a member flag to remember if a check was made, which is annoying as it is knowable at compile time (at least in most cases).

- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?

- Are all (relevant) APIs duplicated as methods and functions? Your examples use a free function read() and a method write(). Duplicating everything was the choice in std::filesystem but is it really a good way forward forever?

I don't recall much duplication of this sort in the Filesystem API. Not of this sort, in any case. Path member functions do not affect the filesystem; they ask questions that are purely about the string contents of the path. Filesystem free functions directly query information about the filesystem.

The principal duplication in the Filesystem API was for error handling.

Niall Douglas

Apr 9, 2018, 6:36:57 PM
to ISO C++ Standard - Future Proposals

- The read/write functions would benefit from overloads with the traditional pointer/size parameters in addition to the scatter/gather overloads.

I had those until recently in fact, but I realised that they are a design mistake because they cause inadvertent sloppy programming, specifically, not using scatter-gather when scatter-gather is the right thing to use.

By forcing users to write {{ addr, len }} it reminds them they are using a scatter-gather list of one item. And it's not much extra typing, so I think it's a fair call.
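
To illustrate the point, a minimal sketch using POSIX `writev()`, not the proposed library's API. The helper names `write_two` and `write_one` are hypothetical. The single-buffer case is simply a scatter-gather list of one item, and the two-buffer case shows what the list form buys you: one syscall and no intermediate copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: gather-write a header and a body in one syscall,
// without first concatenating them into a temporary buffer.
ssize_t write_two(int fd, const std::string& header, const std::string& body) {
    iovec list[2];
    list[0].iov_base = const_cast<char*>(header.data());
    list[0].iov_len = header.size();
    list[1].iov_base = const_cast<char*>(body.data());
    list[1].iov_len = body.size();
    return ::writev(fd, list, 2);
}

// Hypothetical helper: the "traditional" pointer/size write expressed as a
// scatter-gather list of exactly one item.
ssize_t write_one(int fd, const void* addr, std::size_t len) {
    iovec list[1];
    list[0].iov_base = const_cast<void*>(addr);
    list[0].iov_len = len;
    return ::writev(fd, list, 1);
}
```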
 
Maybe also, in the spirit of Ranges TS, provide overloads for "containers" so that std::string and std::vector<T> can be read/written directly. I'm uncertain if reading into a container should resize it or read up to its original size. Maybe resize down but never up?

Out of scope for this low level library. The serialisation library we eventually choose will do that though.

(AFIO v1 had that support, Boost peer review rejected it)
 

- Using 'file' as the name of a free function to open a file seems odd. I think the rule to let a verb lead a function name and a noun a class name is good, and I would guess that file was a class, not a function.

It's consistent throughout though. If you want a directory_handle, you call directory(). I know it's a bit Pythonesque, but Ranges makes C++ a lot more Pythonesque.

Also, we can't use constructors to construct any of these, it must be a static function, and directory() is a lot nicer than directory_handle::directory().
 

- Forcing all calls to end with a .value() to get the value actually returned does solve the problem of a missed error code check (and dual return value) but it gets very boring to write on every call. Where there is no return value to get as in close() appending a .value() is particularly strange. An alternative approach would be to let the return type have a cast operator to T which throws if there is an error and an operator!() which can be used to test for errors in an if-statement if you want to check for error codes manually. The dtor would throw a "missed check" error if neither of these operations were used (regardless of if there was an actual error or not). This however needs a member flag to remember if a check was made, which is annoying as it is knowable at compile time (at least in most cases).

I think it's an open secret by now that Herb has a proposal in the works for replacing the C++ exception handling system with something always deterministic. SG14 are working on it. Right now the current draft effectively standardises Boost.Outcome, but it may be very different in a few months' time.

So the need for .value(), which stems from the low level i/o library being written with Outcome, would go away if something like Outcome becomes the new C++ exception handling system.
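
As a rough illustration of the two usage styles under discussion (checking the error explicitly versus calling `.value()` and letting it throw), here is a simplified stand-in. `result<T>` and `open_readonly` below are hypothetical and are not the Outcome API or the proposed library; the only real calls are POSIX `open()` and the standard `<system_error>` facilities.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>
#include <utility>
#include <variant>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical, heavily simplified result type: holds either a T or an error code.
template <class T>
class result {
    std::variant<T, std::error_code> state_;

public:
    result(T value) : state_(std::in_place_index<0>, std::move(value)) {}
    result(std::error_code ec) : state_(std::in_place_index<1>, ec) {}

    // True when a value is held; lets callers write `if (!r) { ... }`.
    explicit operator bool() const noexcept { return state_.index() == 0; }

    // The stored error, or a default (success) error_code when a value is held.
    std::error_code error() const {
        return *this ? std::error_code() : std::get<1>(state_);
    }

    // Returns the value, or throws std::system_error if an error is held.
    T& value() {
        if (!*this) throw std::system_error(std::get<1>(state_));
        return std::get<0>(state_);
    }
};

// Hypothetical wrapper: never throws, reports failure through the result.
result<int> open_readonly(const char* path) noexcept {
    int fd = ::open(path, O_RDONLY);
    if (fd == -1) return std::error_code(errno, std::generic_category());
    return fd;
}
```

Code that wants no exceptions tests the result and inspects `error()`; code that is happy with exceptions appends `.value()`, which is where the repetitiveness complained about above comes from.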
 

- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?

I am not sure if std::byte* has the same aliasing rules as char*. That's probably ignorance on my part, but the char* does indeed greatly simplify implementation as well.
 

- Are all (relevant) APIs duplicated as methods and functions?

No, it's a subset. Not dissimilar to ASIO/Networking TS.
 
Your examples use a free function read() and a method write(). Duplicating everything was the choice in std::filesystem but is it really a good way forward forever? I would personally prefer not to revert back to C functions for everything but let C++ stay object oriented...

It's to aid writing generic code. If I want to truncate something, I can call the free function truncate() and it'll ADL resolve to the right thing. That's why I chose a strict, most common denominator, subset.
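
A minimal sketch of that ADL argument. The namespace `lowio_sketch`, the two handle types and `reset_to_empty` are all hypothetical stand-ins, not the proposed library: the generic function calls `truncate()` unqualified, and argument-dependent lookup finds the free function in the handle's namespace.

```cpp
#include <cassert>
#include <cstddef>

namespace lowio_sketch {  // hypothetical namespace standing in for the library

struct file_handle {
    std::size_t extent = 0;
    std::size_t truncate(std::size_t n) { extent = n; return extent; }
};

struct mapped_file_handle {
    std::size_t extent = 0;
    std::size_t mapped = 0;
    // A different handle type may need to do more work for the same operation.
    std::size_t truncate(std::size_t n) { extent = n; mapped = n; return extent; }
};

// Free function forwarding to the member; lives next to the handle types so
// that unqualified calls find it through ADL.
template <class Handle>
std::size_t truncate(Handle& h, std::size_t n) { return h.truncate(n); }

}  // namespace lowio_sketch

// Generic code outside the namespace: no qualification, no knowledge of the
// concrete handle type.
template <class Handle>
std::size_t reset_to_empty(Handle& h) {
    return truncate(h, 0);
}
```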
 

- I'm trying to understand how path_view handles a wchar_t* ctor parameter on Windows. Is there a flag inside which tells syscall wrappers which type of characters the view points to?

It stores what the source representation is, yes. That way it knows if it needs to convert and how.
 
If so it will introduce a portability problem. If not I wonder where the temporary string is stored.

No temporary string is stored. UTF conversion is performed on demand. No memory allocation.
 
I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.

Literally exactly how it works. If you supply UTF-8 on POSIX, it is fed through unchanged. If you supply UTF-16 on Windows, it is fed through unchanged. If you supply UTF-8 on Windows, it is converted to UTF-16 onto the stack just before the syscall.
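
A minimal sketch of that behaviour, assuming a UTF-16 target as on Windows. `path_view_sketch` is hypothetical and is not the proposed `path_view`: it remembers the source encoding, passes UTF-16 through untouched, and decodes UTF-8 into a caller-supplied buffer (typically on the stack) without allocating. The decoder does no validation of malformed input.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sketch: a non-owning view that records its source encoding.
class path_view_sketch {
    const void* data_;
    std::size_t length_;  // length in code units of the source encoding
    bool is_utf16_;

public:
    explicit path_view_sketch(std::string_view utf8) noexcept
        : data_(utf8.data()), length_(utf8.size()), is_utf16_(false) {}
    explicit path_view_sketch(std::u16string_view utf16) noexcept
        : data_(utf16.data()), length_(utf16.size()), is_utf16_(true) {}

    bool is_utf16() const noexcept { return is_utf16_; }

    // Render as UTF-16. UTF-16 sources are returned as-is (buf is unused);
    // UTF-8 sources are decoded into buf. Returns an empty view with a null
    // data() if buf is too small.
    std::u16string_view as_utf16(char16_t* buf, std::size_t capacity) const noexcept {
        if (is_utf16_) return {static_cast<const char16_t*>(data_), length_};

        const unsigned char* p = static_cast<const unsigned char*>(data_);
        const unsigned char* const end = p + length_;
        std::size_t out = 0;
        while (p < end) {
            char32_t cp = *p++;
            int extra = 0;  // number of continuation bytes to consume
            if (cp >= 0xF0) { cp &= 0x07; extra = 3; }
            else if (cp >= 0xE0) { cp &= 0x0F; extra = 2; }
            else if (cp >= 0xC0) { cp &= 0x1F; extra = 1; }
            for (; extra > 0 && p < end; --extra) cp = (cp << 6) | (*p++ & 0x3Fu);

            const std::size_t units = cp >= 0x10000 ? 2 : 1;
            if (out + units > capacity) return {};
            if (units == 1) {
                buf[out++] = static_cast<char16_t>(cp);
            } else {  // encode as a surrogate pair
                cp -= 0x10000;
                buf[out++] = static_cast<char16_t>(0xD800 + (cp >> 10));
                buf[out++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            }
        }
        return {buf, out};
    }
};
```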
 

- What is the point of first creating a section and then mapping it? I know this is how it works in Win32, but I always felt it was unnecessarily complex to have two steps.

Sections represent mappable memory. They need not be associated with a file backing, they can be freestanding.
 
Is it important for file integrity or something? The Linux mmap() does not have these two levels, right? As far as I can see the map_file example does not use the sh variable after creating the mh variable despite quite complicated use pattern.

On POSIX section_handles can perform resource management. For example, if you resize the backing file, they might coordinate the updating of the committed pages within the address reservation. It depends on your POSIX flavour.
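
For comparison, the single-step POSIX route mentioned in the question looks roughly like this. The sketch uses `mmap()` directly, not the proposed section_handle/map_handle API; `count_lines_mapped` is a hypothetical name, and it doubles as a small "parse some text through a memory mapped file" example.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <system_error>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map a file read-only and count its newline characters.
std::size_t count_lines_mapped(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd == -1) throw std::system_error(errno, std::generic_category(), "open");

    struct stat st;
    if (::fstat(fd, &st) == -1) {
        int e = errno;
        ::close(fd);
        throw std::system_error(e, std::generic_category(), "fstat");
    }
    const std::size_t len = static_cast<std::size_t>(st.st_size);
    if (len == 0) {  // a zero-length mapping is an error, so handle it up front
        ::close(fd);
        return 0;
    }

    void* addr = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
    int e = errno;  // capture before close() can overwrite it
    ::close(fd);    // the mapping remains valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::system_error(e, std::generic_category(), "mmap");

    const char* text = static_cast<const char*>(addr);
    std::size_t lines = static_cast<std::size_t>(std::count(text, text + len, '\n'));
    ::munmap(addr, len);
    return lines;
}
```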
 

- The function name truncate() is misleading as it can also extend the file length. set_length() would be more appropriate I think.

Ah, that's deliberate. Technically speaking, files do not have length, they have maximum extent, and their contents are made up of extents, not data.

So you truncate the maximum extent, it's always to some absolute value. Direction doesn't mean anything, because there is no such thing as length. It's a high water mark you are setting.

I agree it's a filesystem term foreign to C++. But that's no harm I think, it's very important to not think that files have a length. They don't. C++ programmers just think they do.
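
The same naming exists in POSIX, where `ftruncate()` sets an absolute size and will grow a file as readily as shrink it. A minimal sketch with hypothetical helper names (`extent_of`, `set_extent`), using POSIX rather than the proposed API:

```cpp
#include <cassert>

#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: the file's maximum extent as reported by fstat,
// or -1 on failure.
off_t extent_of(int fd) {
    struct stat st;
    return ::fstat(fd, &st) == 0 ? st.st_size : static_cast<off_t>(-1);
}

// Hypothetical helper: set the maximum extent to an absolute value.
// The same call is used whether the new value is above or below the old one.
bool set_extent(int fd, off_t n) {
    return ::ftruncate(fd, n) == 0;
}
```

For example, calling `set_extent(fd, 4096)` on a 5 byte file raises the reported size to 4096, and `set_extent(fd, 2)` then lowers it to 2. POSIX describes the extended area as appearing zero-filled; how much of it is physically allocated is up to the filesystem.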
 
The documentation should specify what the implementation should produce on reading bytes extended to but never written.

Implementation specific I am afraid.
 
It should also detail under what circumstances the implementation may return another size than asked for, or preferably remove such cases (returning out of disk error for instance).

Also implementation specific.

I do intend to list what the major platforms do as a note. But we can't standardise what we don't know.
 

- I have a hard time understanding the difference between the two mapping methods using a section_handle/map_handle vs. a mapped_file_handle, but I suspect the latter is some kind of shortcut. Is the need for this discussed in the proposal?

Yes it is discussed.

mapped_file_handle is made out of a file_handle, a section_handle, and a map_handle combined together for your convenience. It's a convenience bundle.
 

- The term inode seems very Unix specific. Would there be an advantage to use a more generic term?

I'm open to suggestions, but everybody knows what an inode is. There is no ambiguity in the name choice.
 
Are corresponding handles used in Windows and if so, what are they called?

Windows has inodes too, complete with inode number, at least on NTFS. They work exactly the same as on POSIX.
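
On the POSIX side, the inode number is what `stat()` reports as `st_ino`, and together with `st_dev` it identifies a file regardless of which path was used to reach it (two hard links to one file compare equal). A minimal sketch, with the hypothetical helper name `same_file`; this is POSIX only and says nothing about how the proposed library exposes it:

```cpp
#include <cassert>

#include <sys/stat.h>
#include <sys/types.h>

// Hypothetical helper: two paths refer to the same file when both the device
// and the inode number match. Returns false if either path cannot be stat'ed.
bool same_file(const char* a, const char* b) {
    struct stat sa, sb;
    if (::stat(a, &sa) != 0 || ::stat(b, &sb) != 0) return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```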
 

- I was happy to see that mapped_span provides a byte offset ctor parameter, but I would suggest that it be placed before the length parameter as it is so common to have a file with a header and then "the rest" is an array of some fixed size data records.

That's an interesting viewpoint. I had assumed that mapped_span might be frequently destructed and reconstructed with an updated length you see. Hence I placed it before offset. Glad to hear feedback on this though.

Niall

Nicol Bolas

Apr 9, 2018, 7:05:34 PM
to ISO C++ Standard - Future Proposals
On Monday, April 9, 2018 at 6:36:57 PM UTC-4, Niall Douglas wrote:
- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?

I am not sure if std::byte* has the same aliasing rules as char*.

FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.
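
A minimal C++17 sketch of what that permits: reading and writing the object representation of an arbitrary object through `std::byte*`, just as one could through `unsigned char*`. The helper names `bytes_of` and `copy_bytes` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: view any object's storage as raw bytes.
template <class T>
std::byte* bytes_of(T& obj) {
    return reinterpret_cast<std::byte*>(&obj);
}

// Hypothetical helper: byte-by-byte copy through std::byte pointers.
void copy_bytes(std::byte* dst, const std::byte* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}
```

Unlike `char`, `std::byte` has no arithmetic and is not a character type, so a buffer API taking `std::byte*` cannot be mistaken for one taking a string.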
 
That's probably ignorance on my part, but the char* does indeed greatly simplify implementation as well.
 
I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.

Literally exactly how it works. If you supply UTF-8 on POSIX, it is fed through unchanged. If you supply UTF-16 on Windows, it is fed through unchanged. If you supply UTF-8 on Windows, it is converted to UTF-16 onto the stack just before the syscall.

That's disconcerting, since stack space is finite. How do you deal with Windows's extremely long pathnames? `std::filesystem::path` on Windows will be able to use their `\\?\` syntax for long paths painlessly and transparently. Will `path_view` provide the same? Will it make every system call take up 64KB of stack space?

Niall Douglas

Apr 10, 2018, 6:13:02 PM
to ISO C++ Standard - Future Proposals
On Tuesday, April 10, 2018 at 12:05:34 AM UTC+1, Nicol Bolas wrote:
On Monday, April 9, 2018 at 6:36:57 PM UTC-4, Niall Douglas wrote:
- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?

I am not sure if std::byte* has the same aliasing rules as char*.

FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)
 
 
That's probably ignorance on my part, but the char* does indeed greatly simplify implementation as well.
 
I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.

Literally exactly how it works. If you supply UTF-8 on POSIX, it is fed through unchanged. If you supply UTF-16 on Windows, it is fed through unchanged. If you supply UTF-8 on Windows, it is converted to UTF-16 onto the stack just before the syscall.

That's disconcerting, since stack space is finite. How do you deal with Windows's extremely long pathnames? `std::filesystem::path` on Windows will be able to use their `\\?\` syntax for long paths painlessly and transparently.

The reference implementation implements that, plus a few others:
  • For any paths beginning with \!!\, we pass the path + 3 characters directly through. This prefix is a pure AFIO extension, and will not be recognised by other code.
  • For any paths beginning with \??\, we pass the path + 0 characters directly through. Note the NT kernel keeps a symlink at \??\ which refers to the DosDevices namespace for the current login, so as an incorrect relation which you should not rely on, the Win32 path C:\foo probably will appear at \??\C:\foo.
  • \\?\ which is used to tell a Win32 API that the remaining path is longer than a DOS path.
  • \\.\ which since Windows 7 is treated exactly like \\?\.
I doubt those will enter any standard, but a note to implementers will be added. The \??\ prefix is the fastest by far, NT uses memcmp() instead of slow locale-specific mapping. You can, quite literally, call a random number generator and pass the untranslated random data in as a filename if prefixed with \??\. Works fine, though nothing in Windows can cope with that file. 
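
A minimal sketch of the prefix handling described in the list above. The enum, the function names and the choice of `std::wstring_view` are hypothetical; the reference implementation may structure this quite differently. Only the two pass-through cases are modelled, following the "+ 3 characters" and "+ 0 characters" wording.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical classification of the four prefixes listed above.
enum class path_prefix { none, afio_raw, nt_kernel, win32_long, win32_device };

path_prefix classify_prefix(std::wstring_view path) {
    auto starts_with = [&](std::wstring_view p) {
        return path.substr(0, p.size()) == p;
    };
    if (starts_with(L"\\!!\\")) return path_prefix::afio_raw;       // \!!\ prefix
    if (starts_with(L"\\??\\")) return path_prefix::nt_kernel;      // \??\ prefix
    if (starts_with(L"\\\\?\\")) return path_prefix::win32_long;    // \\?\ prefix
    if (starts_with(L"\\\\.\\")) return path_prefix::win32_device;  // \\.\ prefix
    return path_prefix::none;
}

// Hypothetical helper: the part handed through untranslated for the two
// pass-through prefixes; an empty view for everything else.
std::wstring_view passthrough_part(std::wstring_view path) {
    switch (classify_prefix(path)) {
        case path_prefix::afio_raw: return path.substr(3);  // path + 3 characters
        case path_prefix::nt_kernel: return path;           // path + 0 characters
        default: return {};
    }
}
```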

Will `path_view` provide the same? Will it make every system call take up 64KB of stack space?

Yes.

But it's not a problem in any typical code. Most people will use UTF-8 literals, which are pass through on POSIX, so this only affects Windows. And most people don't use long string literals. Paths retrieved on Windows from some API are usually in UTF-16. That gets passed through. Windows also does not require zero termination if using the NT API, so you can safely use subset views of paths without copying onto the stack.

And finally, even in the worst case, Windows does not care much about 64KB on the stack, which would only occur anyway if someone is passing in a 32KB path, which is exceedingly rare. I think it a fair design compromise, and memory allocation and most memory copying is entirely avoided.

Niall
 

Richard Smith

Apr 12, 2018, 5:37:56 AM
to std-pr...@isocpp.org
On Tue, 10 Apr 2018, 23:13 Niall Douglas, <nialldo...@gmail.com> wrote:
On Tuesday, April 10, 2018 at 12:05:34 AM UTC+1, Nicol Bolas wrote:
On Monday, April 9, 2018 at 6:36:57 PM UTC-4, Niall Douglas wrote:
- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?

I am not sure if std::byte* has the same aliasing rules as char*.

FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)

Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.


--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/b27a7889-636f-428e-8f0e-bd60dfcf8309%40isocpp.org.

Niall Douglas

Apr 12, 2018, 6:22:18 AM
to ISO C++ Standard - Future Proposals
FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)

Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.

1. On C++ 17 or later, byte-lite uses enum class byte : unsigned char {}; . I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?

2. Before C++ 17, byte-lite uses struct byte { typedef unsigned char type; type v; }; . If Martin changes that to union, would that cause the compiler to assume the type can alias?

3. Would __attribute__((__may_alias__)) be of use here?

Niall

Richard Smith

Apr 13, 2018, 4:49:35 AM
to std-pr...@isocpp.org
On Thu, 12 Apr 2018, 11:22 Niall Douglas, <nialldo...@gmail.com> wrote:
FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)

Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.

1. On C++ 17 or later, byte-lite uses enum class byte : unsigned char {}; . I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?

Yes. There is in general no aliasing relationship between an enumeration and its underlying type.

2. Before C++ 17, byte-lite uses struct byte { typedef unsigned char type; type v; }; . If Martin changes that to union, would that cause the compiler to assume the type can alias?

In practice, probably -- unless you turn on more aggressive field-sensitive alias analysis (or it's on by default in your compiler of choice).

3. Would __attribute__((__may_alias__)) be of use here?

Yes (assuming your compiler supports it, of course). In practice, at least clang and GCC recognize the std::byte type and give it special treatment, but I think a regular enum class with that attribute would be equivalent.

(At least, for now -- there are some proposed C++20 changes that might require std::byte to receive special treatment in constant expression evaluation.)
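
A minimal sketch of answer 3, assuming GCC or Clang: a scoped enum carrying the `may_alias` type attribute, used to walk the object representation of another type. The names `my_byte`, `SKETCH_MAY_ALIAS` and `sum_bytes` are made up for this example and are not byte-lite's.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC or Clang, which provide the may_alias type attribute.
// On other compilers the macro is empty and the access below is NOT safe.
#if defined(__GNUC__)
#define SKETCH_MAY_ALIAS __attribute__((__may_alias__))
#else
#define SKETCH_MAY_ALIAS
#endif

// Hypothetical stand-in for a library-provided byte type.
enum class SKETCH_MAY_ALIAS my_byte : unsigned char {};

// Sums the bytes of obj's object representation through a my_byte pointer.
// Without the attribute, an enum gets no aliasing exemption from its
// underlying type, so this read would not be covered.
template <class T>
unsigned sum_bytes(const T& obj) {
    const my_byte* p = reinterpret_cast<const my_byte*>(&obj);
    unsigned total = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        total += static_cast<unsigned>(p[i]);
    return total;
}
```

The sum is independent of byte order, so it can be checked on any platform.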

Niall


Niall Douglas

Apr 13, 2018, 5:12:50 AM
to ISO C++ Standard - Future Proposals
3. Would __attribute__((__may_alias__)) be of use here?

Yes (assuming your compiler supports it, of course). In practice, at least clang and GCC recognize the std::byte type and give it special treatment, but I think a regular enum class with that attribute would be equivalent.

(At least, for now -- there are some proposed C++20 changes that might require std::byte to receive special treatment in constant expression evaluation.)

I've reported the bug with solution to him at https://github.com/martinmoene/byte-lite/issues/3

Niall 

Martin Moene

Apr 13, 2018, 5:50:10 AM
to ISO C++ Standard - Future Proposals


On Friday, April 13, 2018 at 10:49:35 AM UTC+2, Richard Smith wrote:
On Thu, 12 Apr 2018, 11:22 Niall Douglas, <nialldo...@gmail.com> wrote:
FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)

Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.

1. On C++ 17 or later, byte-lite uses enum class byte : unsigned char {}; . I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?

Yes. There is in general no aliasing relationship between an enumeration and its underlying type.

So, would the following prevent aliasing:

struct byte { enum type : unsigned char {}; type v; };

 

Niall Douglas

Apr 13, 2018, 6:31:21 AM
to ISO C++ Standard - Future Proposals
FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)

Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.

1. On C++ 17 or later, byte-lite uses enum class byte : unsigned char {}; . I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?

Yes. There is in general no aliasing relationship between an enumeration and its underlying type.

So, would the following prevent aliasing:

struct byte { enum type : unsigned char {}; type v; };
 
You misunderstand.

std::byte* is specifically guaranteed in the standard to be able to alias any other type, same as char*. Your implementation does not do this. It is therefore unsafe under optimisation.

Niall

Martin Moene

Apr 13, 2018, 8:58:17 AM
to ISO C++ Standard - Future Proposals
I'm not referring to std::byte.


Richard Smith writes:

Yes. There is in general no aliasing relationship between an enumeration and its underlying type.

(emphasis mine).

My question is: does this also hold for struct byte { enum type : unsigned char {}; type v; }; ?

Richard Smith

Apr 13, 2018, 10:07:08 AM
to std-pr...@isocpp.org
Yes. The only types with the special "can alias anything" property (in a compiler not supporting std::byte) are char, signed char, and unsigned char. Not enums, not structs.



Nicol Bolas

Apr 13, 2018, 12:02:40 PM
to ISO C++ Standard - Future Proposals


On Friday, April 13, 2018 at 10:07:08 AM UTC-4, Richard Smith wrote:
On Fri, 13 Apr 2018, 13:58 Martin Moene, <m.j....@eld.physics.leidenuniv.nl> wrote:


On Friday, April 13, 2018 at 12:31:21 PM UTC+2, Niall Douglas wrote:
FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.

https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)

Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.

1. On C++ 17 or later, byte-lite uses enum class byte : unsigned char {}; . I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?

Yes. There is in general no aliasing relationship between an enumeration and its underlying type.

So, would the following prevent aliasing:

struct byte { enum type : unsigned char {}; type v; };
 
You misunderstand.

std::byte* is specifically guaranteed in the standard to be able to alias any other type, same as char*. Your implementation does not do this. It is therefore unsafe under optimisation.

Niall


I'm not referring to std::byte.

Richard Smith writes:

Yes. There is in general no aliasing relationship between an enumeration and its underlying type.

(emphasis mine).

My question is: does this also hold for struct byte { enum type : unsigned char {}; type v; }; ?

Yes. The only types with the special "can alias anything" property (in a compiler not supporting std::byte) are char, signed char, and unsigned char. Not enums, not structs.

Actually, `signed char` isn't even on that list; just `char` and `unsigned char`.

Martin Moene

Apr 13, 2018, 5:11:18 PM
to ISO C++ Standard - Future Proposals

On godbolt:

Christopher Jefferson

Apr 13, 2018, 5:35:11 PM
to std-pr...@isocpp.org
On 5 April 2018 at 08:20, Niall Douglas <nialldo...@gmail.com> wrote:
> All those are very trivial. They look exactly like doing the same in POSIX
> syscalls. Perhaps that's the point of showing them?
>
> Are there any less trivial examples you'd like to see?

I suggest that you tell us. At the moment I can't understand why I'd
use this over either 'read/write' or 'mmap' from POSIX. I've never
found those limiting, or limiting performance. I'm sure I'm just not
imaginative enough, but to introduce yet another set of filesystem
APIs needs to meet (in my opinion) an extremely high bar.

Chris

Richard Smith

Apr 13, 2018, 5:45:20 PM
to std-pr...@isocpp.org
:) I should have checked. We have too many variants of this (any of the three char types / char and unsigned char / any char type that is unsigned) for different rules.


Nicol Bolas

Apr 13, 2018, 6:38:23 PM
to ISO C++ Standard - Future Proposals
On Friday, April 13, 2018 at 5:35:11 PM UTC-4, Chris Jefferson wrote:
On 5 April 2018 at 08:20, Niall Douglas <nialldo...@gmail.com> wrote:
> All those are very trivial. They look exactly like doing the same in POSIX
> syscalls. Perhaps that's the point of showing them?
>
> Are there any less trivial examples you'd like to see?

I suggest that you tell us. At the moment I can't understand why I'd
use this over either 'read/write' or 'mmap' from POSIX.

By that reasoning, why bother having the Filesystem library? Or Threads? Or Networking? You can do all of those things with POSIX, so why not use that?

We want them so that we can be cross-platform. Not every platform offers POSIX natively. Or at all.

Niall Douglas

Apr 14, 2018, 4:52:07 AM
to ISO C++ Standard - Future Proposals
A surprisingly prescient comment.

Yesterday I was eating canapes in the ACCU Executive Lounge with Michael Wong, who is a member of the Direction Group. He literally asked the same question: "why bother standardising this if it's a mere wrap of the syscalls?"

As he explained in further detail, if this low level i/o library let one get straight to the hardware without any kernel in between, he'd be excited. But it simply wraps the kernel syscalls. You get exactly what the kernel gives you, quirks and all. So what's the value add? Just use the syscalls!

So here are my reasons, let's see what std-proposals makes of them:
  1. This low level file i/o library defines a common language of basic operations across platforms. In other words, it chooses a common denominator across 99% of platforms out there. If you append to a memory mapped file, that'll do the platform-specific magic on all supported platforms.
  2. This low level file i/o library only consumes and produces trivially copyable, trivially relocatable and standard layout objects. Empirical testing has found that the optimiser will eliminate this low level library almost always, inlining the platform specific syscall directly. So, it is no worse in any way over calling the platform syscalls directly, except that this library API is portable.
  3. Where trivial to do so, we encode domain specific knowledge about platform specific quirks. For example, fsync() on MacOS does not do a blocking write barrier, so our barrier() function calls the appropriate magic fcntl() on MacOS only where the barrier() is requested to block until completion.
  4. This low level file i/o library is a bunch of primitives which can be readily combined together to build filesystem algorithms whose implementation code is much cleaner looking and easier to reason about than using syscalls directly.
  5. We can provide deep integration with C++ language features in a way which platform specific syscalls cannot. Ranges, Coroutines and Generators are the obvious examples, but we also make a ton of use of span<T>, so all code which understands span<T> automagically works with no extra boilerplate needed.
How's that? Reasonably compelling?

Niall

Niall Douglas

Apr 15, 2018, 7:41:57 AM
to ISO C++ Standard - Future Proposals
Please find attached draft 2 of the low level file i/o proposal which incorporates much feedback from many places.

Niall

DGGGGr0 Low level file io library draft 2.pdf

Niall Douglas

Apr 17, 2018, 8:29:00 AM
to ISO C++ Standard - Future Proposals
Marshall Clow expressed surprise to me this morning at the last day of the LLVM conference that the proposed low level file i/o library also standardises kernel page allocation (i.e. mmap()), and shared memory.

Is this also a surprise to std-proposals from reading the proposal paper?

I supply two new examples of use, both demonstrating the kernel page allocation (malloc1) and shared memory (malloc2) use cases which can be viewed at https://github.com/ned14/afio/blob/develop/example/use_cases.cpp#L192

Before anyone asks, the reason the low level file i/o library "also" standardises kernel page allocation and shared memory is because it is for free. As soon as we implement memory mapped files, it becomes trivial to supply a file descriptor of -1 to mmap() and bam, now you have a kernel page allocator.
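
For readers who have not seen the trick, a minimal POSIX sketch of mmap() as a page allocator. The helper names are hypothetical and this is not the proposal's API. Note that besides a file descriptor of -1, the call also needs the anonymous mapping flag.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical helper: asks the kernel for `bytes` of zero-filled memory
// that is not backed by any file. Returns nullptr on failure.
inline void* allocate_pages(std::size_t bytes) {
    void* p = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Returns the pages to the kernel. `bytes` must match the allocation.
inline bool release_pages(void* p, std::size_t bytes) {
    return ::munmap(p, bytes) == 0;
}
```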

Similarly, memory mapping the same file into multiple processes does, by definition, map shared memory between those processes. In fact you must explicitly ask for your copy to be private (i.e. copy on write) on all platforms, so we do exactly the same in the proposed low level file i/o library.
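
A minimal POSIX sketch of the shared versus private distinction, done within a single process for brevity (two processes mapping the same file behave the same way for the shared case). The function and struct names are hypothetical, and the temporary file path is an assumption of the example.

```cpp
#include <cstddef>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct mapping_result {
    bool ok;                    // false if any syscall failed
    char shared_view_sees;      // what a second shared view reads
    char private_view_keeps;    // what the private view reads after its write
    char shared_after_private;  // shared view after the private write
};

// Maps one temporary file twice as shared and once as private.
inline mapping_result demo_shared_vs_private() {
    mapping_result r{false, 0, 0, 0};
    char name[] = "/tmp/llfio_sketch_XXXXXX";
    const int fd = ::mkstemp(name);
    if (fd < 0) return r;
    ::unlink(name);  // storage persists until the last reference goes away
    const std::size_t len = 4096;
    if (::ftruncate(fd, static_cast<off_t>(len)) == 0) {
        const int rw = PROT_READ | PROT_WRITE;
        void* a = ::mmap(nullptr, len, rw, MAP_SHARED, fd, 0);
        void* b = ::mmap(nullptr, len, rw, MAP_SHARED, fd, 0);
        void* c = ::mmap(nullptr, len, rw, MAP_PRIVATE, fd, 0);
        if (a != MAP_FAILED && b != MAP_FAILED && c != MAP_FAILED) {
            static_cast<char*>(a)[0] = 'S';           // write via shared view
            r.shared_view_sees = static_cast<char*>(b)[0];
            static_cast<char*>(c)[0] = 'P';           // copy-on-write
            r.private_view_keeps = static_cast<char*>(c)[0];
            r.shared_after_private = static_cast<char*>(a)[0];
            r.ok = true;
        }
        if (a != MAP_FAILED) ::munmap(a, len);
        if (b != MAP_FAILED) ::munmap(b, len);
        if (c != MAP_FAILED) ::munmap(c, len);
    }
    ::close(fd);
    return r;
}
```

The write through the private view stays in that view; the shared views continue to see what was written through the shared mapping.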

Niall

joewo...@gmail.com

May 8, 2018, 7:43:34 PM
to ISO C++ Standard - Future Proposals
I like it.

Section 4, page 13, under unlink_on_close it states "Causes the entry in the filesystem to disappear on first close by any process in the system."

Shouldn't that be "...last close by the processes..." (Which is what FILE_FLAG_DELETE_ON_CLOSE does in Windows.)

Also curious whether native_handle_type would allow duplicating the handle/descriptor.

Niall Douglas

May 9, 2018, 4:02:51 AM
to ISO C++ Standard - Future Proposals, joewo...@gmail.com
Firstly, the edition attached is the final one which will appear in the pre-Rapperswil mailing.

On Wednesday, May 9, 2018 at 12:43:34 AM UTC+1, joewo...@gmail.com wrote:
I like it.

Cool. I hope the committee does too.
 

Section 4, page 13, under unlink_on_close it states "Causes the entry in the filesystem to disappear on first close by any process in the system."

Shouldn't that be "...last close by the processes..." (Which is what FILE_FLAG_DELETE_ON_CLOSE does in Windows.)

Actually no, the wording is correct.

After the first handle to a file with FILE_FLAG_DELETE_ON_CLOSE set is closed anywhere in the system, it becomes unopenable in most situations because it is now being "deleted".

Also, we emulate POSIX semantics here. On first close, we rename the file to something very random before the close. It'll appear to disappear on Windows. On POSIX, it actually disappears, and apparently in the Spring update for Win10 there is a magic new syscall for making it actually disappear on Windows too. Or at least this is what Microsoft tell me.
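
For reference, a minimal sketch of the POSIX behaviour being emulated: once the entry is unlinked it is gone from the filesystem, yet the open handle keeps working. The names are hypothetical and the Windows rename-before-close emulation is not shown.

```cpp
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

struct unlink_result {
    bool ok;                   // false if setup failed
    bool path_gone;            // the name can no longer be found
    bool handle_still_usable;  // i/o through the open descriptor still works
};

// Creates a temporary file, unlinks it while open, then uses the handle.
inline unlink_result demo_unlink_while_open() {
    unlink_result r{false, false, false};
    char name[] = "/tmp/llfio_unlink_XXXXXX";
    const int fd = ::mkstemp(name);
    if (fd < 0) return r;
    if (::unlink(name) != 0) {
        ::close(fd);
        return r;
    }
    struct stat st;
    r.path_gone = (::stat(name, &st) != 0);
    char c = 0;
    r.handle_still_usable = ::write(fd, "x", 1) == 1 &&
                            ::pread(fd, &c, 1, 0) == 1 && c == 'x';
    ::close(fd);  // the storage is released here
    r.ok = true;
    return r;
}
```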
 

Also curious whether native_handle_type would allow duplicating the handle/descriptor.

It does not because it cannot. native_handle_type will tell you what kind of handle it is and what config it is in. But it cannot safely duplicate the handle from that alone.

class handle has a virtual member function clone() which will duplicate a handle. It must be virtual, because if the most derived class were say an async_file_handle, the duplicated handle would need to be registered with the i/o service. If it were a mapped_file_handle, the maps would need to be duplicated. And so on.

If you super need to duplicate a native_handle_type, you can attach an external instance to a class handle, clone it, then detach it.

Niall
P1031_file_io.pdf

Bryce Adelstein Lelbach aka wash

May 10, 2018, 4:25:42 AM
to std-pr...@isocpp.org
Niall,

A note:

Section 3 bullet 2 suggests that Coroutines have been voted into C++20; they have not been.

Please ensure this proposal is seen by SG1 as a number of the features will have an impact on the memory model.


Niall Douglas

May 10, 2018, 4:50:01 AM
to ISO C++ Standard - Future Proposals
On Thursday, May 10, 2018 at 9:25:42 AM UTC+1, Bryce Adelstein Lelbach wrote:
Niall,

A note:

Section 3 bullet 2 suggests that Coroutines have been voted into C++20; they have not been.

Sigh. Ok, can you give me a precise wording which correctly describes its current exact status? Last time I saw anything, the national bodies had approved it, with a list of concerns, and those concerns would be addressed at Rapperswil. Obviously the national bodies approving it is not what I thought it was (sorry, my ISO experience is mostly outside WG21).
 

Please ensure this proposal is seen by SG1 as a number of the features will have an impact on the memory model.

We discussed this in a separate thread on std-proposals, and I have removed anything in the low level file i/o library proposal which could have anything to do with Concurrency. That keeps things clean and orthogonal.

I have, separately, linked up the pmem folk from Intel with SG1 so they can collaborate, if they wish, on enhancing the C++ memory model to support persistent memory. But it is very much deliberately orthogonal to this proposal. I have deliberately dropped features and facilities to ensure this, same as I dropped the overlap with the Networking TS.

Niall

Bryce Adelstein Lelbach aka wash

May 10, 2018, 5:31:45 AM
to std-pr...@isocpp.org
Coroutines is published in a Technical Specification, ISO/IEC TS 22277

Fair warning - if this comes to LEWG, one of the first questions is going to be "has SG1 seen this". The proposal has the words "concurrent", "mutex", and "atomic" in it; SG1 will need to see this.

I'd like to see you be able to get feedback on your proposal at the next meeting; these features are important and I'm glad to see someone bring a proposal for them.

But if the proposal comes before LEWG before SG1 has seen it, you may miss out on a valuable opportunity to make progress on this proposal at this meeting. LEWG does not have the ability to review this; SG1 does. You will probably get asked by LEWG to take it to SG1. This means you will lose time, because you will have to try to get scheduled in SG1's queue mid-meeting, and then you would have to get time scheduled with LEWG again.

Also, keep in mind that the chairs select which papers their groups review. This paper will probably be put in the SG1 queue either way; it's probably best for you to be aware of that and request things be scheduled such that SG1 sees it before LEWG does, to ensure the best chance of you making forward progress.

I would suggest that you email the LEWG chair, Titus Winters, and the SG1 chair, Olivier Giroux, and ask them how to proceed.


Niall Douglas

May 10, 2018, 1:48:05 PM
to ISO C++ Standard - Future Proposals
On Thursday, May 10, 2018 at 10:31:45 AM UTC+1, Bryce Adelstein Lelbach wrote:
Coroutines is published in a Technical Specification, ISO/IEC TS 22277

Eh, okay. I'm obviously very wrong then. Thanks for the bug report. It's fixed in R1.
 

Fair warning - if this comes to LEWG, one of the first questions is going to be "has SG1 seen this". The proposal has the words "concurrent", "mutex", and "atomic" in it; SG1 will need to see this.

I'm entirely expecting Rapperswil to be full of reasons why it and my other five papers should be refused. But let's get ahead of the SG1 issue at least.
 

I'd like to see you be able to get feedback on your proposal at the next meeting; these features are important and I'm glad to see someone bring a proposal for them.

But if the proposal comes before LEWG before SG1 has seen it, you may miss out on a valuable opportunity to make progress on this proposal at this meeting. LEWG does not have the ability to review this; SG1 does. You will probably get asked by LEWG to take it to SG1. This means you will lose time, because you will have to try to get scheduled in SG1's queue mid-meeting, and then you would have to get time scheduled with LEWG again.

Do remember that P1031 is not a proposal. It's an outline of a proposal. So it doesn't need a review per se, just a general yay or nay nod. It's purely there as supporting information for P1026 A call for a Data Persistence (iostream v2) study group to illustrate one of the first things which said proposed study group would start working upon, if approved.

I've tried to formally ask Direction for guidance on whether they would like said study group. It does not fit obviously into P0939's Direction for ISO C++, you see. Despite multiple attempts to get hold of someone, I have received no reply, which rather defeats the point of having a Direction group to ask for guidance.

If you could give them a prod for me, I'd appreciate it. I'll forward you the formal request email there now.
 

Also, keep in mind that the chairs select which papers their groups review. This paper will probably be put in the SG1 queue either way; it's probably best for you to be aware of that and request things be scheduled such that SG1 sees it before LEWG does, to ensure the best chance of you making forward progress.

I would suggest that you email the LEWG chair, Titus Winters, and the SG1 chair, Olivier Giroux, and ask them how to proceed.

I'll email them as you suggest, and CC you. But my prior emails have not been answered. No harm trying again.

Niall 

Vinnie Falco

Jun 26, 2018, 9:36:26 PM
to ISO C++ Standard - Future Proposals
On Saturday, April 14, 2018 at 1:52:07 AM UTC-7, Niall Douglas wrote:

> we also make a ton of use of span<T>, so all code which understands
> span<T> automagically works with no extra boilerplate needed.
> How's that? Reasonably compelling?

It seems this low-level file I/O paper uses the same flawed reasoning for memory buffers that was proffered during the Boost.Beast formal review. There was strong objection to Boost.Asio's ConstBufferSequence and MutableBufferSequence concepts. I believe the phrase used was "10 year old outdated technology."

ConstBufferSequence requirements
<https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/ConstBufferSequence.html>

MutableBufferSequence requirements
<https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/MutableBufferSequence.html>

In particular you didn't understand why elements of buffer sequences needed to be convertible to boost::asio::const_buffer and boost::asio::mutable_buffer. I'll note that these concepts are now part of Networking.TS and very likely will not change before being voted into the standard. So they are very much with us, and new papers which work with buffer sequences (e.g. library components which wrap calls to ::readv, ::writev) need to be harmonious with established practice.

The missing ingredient is that since const_buffer and mutable_buffer are separate types, library implementors have the option of conditionally compiling in code for doing buffer debugging. This is particularly useful on MSVC's standard library which supports checked iterators:

<https://www.boost.org/doc/libs/1_67_0/doc/html/boost_asio/overview/core/buffers.html#boost_asio.overview.core.buffers.buffer_debugging>

The buffer debugging feature is a natural fit for [networking.ts] buffer sequences because the function template std::experimental::net::buffer() has the container's type information before performing the type-erasure implied by converting to const_buffer or mutable_buffer. Therefore, the necessary std::function<> for doing the debugging can be stashed away in the buffer object and invoked later (when the macro for enabling buffer debugging is set).

TL;DR: Any low-level file I/O paper should use the buffer sequence concepts from [networking.ts] (remember the Vasa).


Nicol Bolas

Jun 26, 2018, 10:24:24 PM
to ISO C++ Standard - Future Proposals
On Tuesday, June 26, 2018 at 9:36:26 PM UTC-4, Vinnie Falco wrote:
On Saturday, April 14, 2018 at 1:52:07 AM UTC-7, Niall Douglas wrote:

> we also make a ton of use of span<T>, so all code which understands
> span<T> automagically works with no extra boilerplate needed.
> How's that? Reasonably compelling?

It seems this low-level file I/O paper uses the same flawed reasoning for memory buffers that was proffered during the Boost.Beast formal review. There was strong objection to Boost.Asio's ConstBufferSequence and MutableBufferSequence concepts. I believe the phrase used was "10 year old outdated technology."

ConstBufferSequence requirements
<https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/ConstBufferSequence.html>

MutableBufferSequence requirements
<https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/MutableBufferSequence.html>

In particular you didn't understand why elements of buffer sequences needed to be convertible to boost::asio::const_buffer and boost::asio::mutable_buffer. I'll note that these concepts are now part of Networking.TS and very likely will not change before being voted into the standard. So they are very much with us, and new papers which work with buffer sequences (e.g. library components which wrap calls to ::readv, ::writev) need to be harmonious with established practice.

Would it not make more sense for `const_buffer` and `mutable_buffer` to be convertible from/to `span<byte>`? The latter is the lower-level, lingua-franca type after all.

The latter is also more likely to hit C++20 than the former ;)

The missing ingredient is that since const_buffer and mutable_buffer are separate types, library implementors have the option of conditionally compiling in code for doing buffer debugging. This is particularly useful on MSVC's standard library which supports checked iterators:

<https://www.boost.org/doc/libs/1_67_0/doc/html/boost_asio/overview/core/buffers.html#boost_asio.overview.core.buffers.buffer_debugging>

The buffer debugging feature is a natural fit for [networking.ts] buffer sequences because the function template std::experimental::net::buffer() has the container's type information before performing the type-erasure implied by converting to const_buffer or mutable_buffer. Therefore, the necessary std::function<> for doing the debugging can be stashed away in the buffer object and invoked later (when the macro for enabling buffer debugging is set).

Would such debugging be even possible without sacrificing how low-level the library is?

Networking is a middle-ground between the lowest level guts of a platform's networking system, and the higher-level application. The LLFIO is not a middle-ground; it's the bottom floor. Sadly, debugging tools usually don't work in the basement.

TL;DR: Any low-level file I/O paper should use the buffer sequence concepts from [networking.ts] (remember the Vasa).

LLFIO should not use buffer sequence concepts unless they are useful in the low-level world. And these buffer concepts seem rather high-level. Low level libraries need not be consistent with aspects of high level libraries. For the same reason, LLFIO doesn't need to provide a direct interface to iostream buffers.

LLFIO seems to want to deal directly in a single, contiguous span of bytes. If you need to do reads into multiple spans, you either do virtual memory gymnastics to make the two spans appear as a contiguous array (which I think LLFIO wants to allow you to do) or you do multiple reads. Because that's how file IO works at the low levels.

Niall Douglas

Jun 27, 2018, 4:31:13 AM
to ISO C++ Standard - Future Proposals

> we also make a ton of use of span<T>, so all code which understands
> span<T> automagically works with no extra boilerplate needed.
> How's that? Reasonably compelling?

It seems this low-level file I/O paper uses the same flawed reasoning for memory buffers that was proffered during the Boost.Beast formal review. There was strong objection to Boost.Asio's ConstBufferSequence and MutableBufferSequence concepts. I believe the phrase used was "10 year old outdated technology."

Yes, and it is becoming ever more clear that you should have taken my advice at the time. If I knew then what I know now, I would have objected even stronger than I did.
 
In particular you didn't understand why elements of buffer sequences needed to be convertible to boost::asio::const_buffer and boost::asio::mutable_buffer.

No, you didn't seem to understand that LLFIO's buffer_type is 100% compatible with ASIO's generalised buffer infrastructure. You can mark it up with free function overloads to tell ASIO about it, if you really wish.
 
I'll note that these concepts are now part of Networking.TS and very likely will not change before being voted into the standard. So they are very much with us, and new papers which work with buffer sequences (e.g. library components which wrap calls to ::readv, ::writev) need to be harmonious with established practice.

No, the Networking TS will be modified (eventually) to support Range idioms, which LLFIO was written around after its post-peer review redesign. It'll be Networking which is changed to meet LLFIO.
 

The missing ingredient is that since const_buffer and mutable_buffer are separate types, library implementors have the option of conditionally compiling in code for doing buffer debugging. This is particularly useful on MSVC's standard library which supports checked iterators:

<https://www.boost.org/doc/libs/1_67_0/doc/html/boost_asio/overview/core/buffers.html#boost_asio.overview.core.buffers.buffer_debugging>

The buffer debugging feature is a natural fit for [networking.ts] buffer sequences because the function template std::experimental::net::buffer() has the container's type information before performing the type-erasure implied by converting to const_buffer or mutable_buffer. Therefore, the necessary std::function<> for doing the debugging can be stashed away in the buffer object and invoked later (when the macro for enabling buffer debugging is set).

LLFIO uses a buffer type which must be standard layout and trivially copyable, and arrays of it must have an identical layout to an equivalent array of the struct iovec on your platform. A note in the TS wording will strongly encourage implementations to static assert this.
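
A minimal sketch of what such static assertions could look like on a POSIX platform. `buffer_sketch` is a hypothetical stand-in, not LLFIO's actual buffer type, and the member order assumes the common `iov_base` then `iov_len` layout; on a platform where that differs, the assertions fail at compile time, which is the point.

```cpp
#include <cstddef>
#include <cstring>
#include <sys/uio.h>
#include <type_traits>

// Hypothetical buffer type for this sketch.
struct buffer_sketch {
    void* data;
    std::size_t len;
};

static_assert(std::is_standard_layout<buffer_sketch>::value,
              "buffer must be standard layout");
static_assert(std::is_trivially_copyable<buffer_sketch>::value,
              "buffer must be trivially copyable");
static_assert(sizeof(buffer_sketch) == sizeof(struct iovec),
              "size must match struct iovec");
static_assert(alignof(buffer_sketch) == alignof(struct iovec),
              "alignment must match struct iovec");
static_assert(offsetof(buffer_sketch, data) == offsetof(struct iovec, iov_base),
              "pointer member must sit where iov_base sits");
static_assert(offsetof(buffer_sketch, len) == offsetof(struct iovec, iov_len),
              "length member must sit where iov_len sits");

// Runtime cross-check: equal values produce byte-identical objects.
inline bool same_bytes_as_iovec(void* p, std::size_t n) {
    buffer_sketch b;
    b.data = p;
    b.len = n;
    struct iovec v;
    v.iov_base = p;
    v.iov_len = n;
    return std::memcmp(&b, &v, sizeof b) == 0;
}
```

Matching size, alignment and member offsets together imply that an array of the buffer type has the same layout as an array of `struct iovec`.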

A major use case for LLFIO will be iostreams v2 where we envisage constexpr generation of scatter-gather buffer lists from Reflection which serialise and deserialise objects. This, naturally, generates a lot of small scatter-gather operations, and lots of said operations, potentially thousands.

ASIO's current implementation is highly likely to force the compiler to repack the scatter-gather buffers list, destroying performance. LLFIO's implementation is specifically designed to avoid that - what is supplied to the i/o call is what is passed to the kernel. No repacking.
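
As a rough illustration of "what is supplied is what is passed to the kernel", a minimal POSIX sketch in which the caller's buffer list goes to writev() untouched. This uses `struct iovec` directly rather than LLFIO's buffer type; the function name and temporary file path are assumptions of the example.

```cpp
#include <cstddef>
#include <stdlib.h>
#include <string>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Gathers three separate buffers into a file with one writev() call, then
// reads the file back. Returns an empty string on any failure.
inline std::string gather_write_roundtrip() {
    char name[] = "/tmp/llfio_gather_XXXXXX";
    const int fd = ::mkstemp(name);
    if (fd < 0) return std::string();
    ::unlink(name);

    char a[] = "scatter", b[] = "-", c[] = "gather";
    struct iovec iov[3];
    iov[0].iov_base = a; iov[0].iov_len = sizeof a - 1;
    iov[1].iov_base = b; iov[1].iov_len = sizeof b - 1;
    iov[2].iov_base = c; iov[2].iov_len = sizeof c - 1;
    const std::size_t total = iov[0].iov_len + iov[1].iov_len + iov[2].iov_len;

    std::string out;
    // The array is handed to the kernel as-is: no copy, no repacking.
    if (::writev(fd, iov, 3) == static_cast<ssize_t>(total)) {
        char buf[32];
        const ssize_t n = ::pread(fd, buf, sizeof buf, 0);
        if (n > 0) out.assign(buf, static_cast<std::size_t>(n));
    }
    ::close(fd);
    return out;
}
```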
 

TL;DR: Any low-level file I/O paper should use the buffer sequence concepts from [networking.ts] (remember the Vasa).
 
As a C++ 23 targeted library, LLFIO is specifically designed around Ranges, specifically ContiguousRange. I am currently minded that it should all be spans of spans, but Eric really would prefer ContiguousRanges, and we discussed this in Rapperswil. I'm of the opinion that spans are full subsets of ContiguousRange (they are now after a vote at LEWG in Rapperswil), and they come with the huge advantage of not creating an acyclic dependency between Ranges and LLFIO, and Eric had to admit that was a valuable feature of my current opinion on this. We will no doubt debate that in public at a future LEWG meeting.

The current reference library is spans of buffer types, and the simple reason is that span-lite currently does runtime bounds checking, which murders performance. The TS wording I am currently writing is spans of span, which is the exact advice I gave to you during the Boost peer review of Beast.

LLFIO seems to want to deal directly in a single, contiguous span of bytes. If you need to do reads into multiple spans, you either do virtual memory gymnastics to make the two spans appear as a contiguous array (which I think LLFIO wants to allow you to do) or you do multiple reads. Because that's how file IO works at the low levels.

Eric and Casey have lots of Ranges based data transformation stuff coming after this current first chunk of Ranges. LLFIO was explicitly designed to slot well into that, except where in my opinion Eric has made a mistake in his assumptions of how the low level must work. We discussed that in detail at Rapperswil, and we made good progress on narrowing down our differences.

In short, I think I persuaded Eric that he needs to adjust his design to work in fixed granularity chunks, so specifically, if he wants his async file i/o which I currently object to, he needs to make his end work very well in $PAGESIZE chunks of data. If he can deliver that, I'll remove my objections to standardising async file i/o.
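
A minimal sketch of the arithmetic behind working in fixed granularity chunks, assuming a POSIX system where the page size comes from sysconf(). The helper names and the 4096 fallback are assumptions of this example, not part of any proposal.

```cpp
#include <cstddef>
#include <unistd.h>

// Queries the system page size; the 4096 fallback is an assumption.
inline std::size_t page_size() {
    const long n = ::sysconf(_SC_PAGESIZE);
    return n > 0 ? static_cast<std::size_t>(n) : 4096;
}

// Rounds `bytes` up to the next multiple of `granularity` (must be > 0).
inline std::size_t round_up(std::size_t bytes, std::size_t granularity) {
    return (bytes + granularity - 1) / granularity * granularity;
}

// Number of fixed-size chunks needed to cover `bytes`.
inline std::size_t chunk_count(std::size_t bytes, std::size_t granularity) {
    return round_up(bytes, granularity) / granularity;
}
```

A consumer built this way would issue `chunk_count(length, page_size())` i/o operations, each on a whole page.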

Niall