I haven't done a detailed review of the document, but my first impression is that it really needs a solid overview summary section. I'd say that P0515R0 (the initial `operator<=>` paper), section 1.4, is probably the gold standard in this regard.
The technical details seem, at first glance, reasonably comprehensive, but it's hard to wrap one's head around doing things with the API without getting bogged down in the details. A few examples of common filesystem tasks would go a long way toward getting people into the headspace of the system.
Example code for the following would be very helpful in getting an understanding of the system:
1. Read an entire file (including size querying).
2. Write a buffer to a file.
3. Parse some text through a memory mapped file.
You'll probably know better than I what examples would show your API in the best light.
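For comparison, the first two tasks can be sketched with nothing but standard iostreams. This is only a baseline under stated assumptions: it does not use the proposed API, and the function names `write_buffer` and `read_entire_file` are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: write a buffer to a file, replacing previous contents.
// Returns false if the file could not be opened or the write failed.
inline bool write_buffer(const std::string& path, const std::vector<char>& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: query the file size first, then read it in one call.
// Returns false on failure; `out` is only meaningful when true is returned.
inline bool read_entire_file(const std::string& path, std::vector<char>& out) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);  // open positioned at end
    if (!in) return false;
    const std::streamoff size = in.tellg();  // position at end equals the file size
    if (size < 0) return false;
    out.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    in.read(out.data(), static_cast<std::streamsize>(out.size()));
    return in.gcount() == static_cast<std::streamsize>(out.size());
}
```

An equivalent example written against the proposal would show where the new API improves on this baseline.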
Trivial, when you know it. As it stands, the proposal goes from a high-level WHY into low-level DETAIL without an equivalent of "Hello, world" anywhere in the proposal, as far as I can tell, so one does not even know where to start.
Is there an API specification or header file somewhere? While the proposal details a hierarchy of classes, there are no details on what APIs those classes are to have. Do you have a public reference implementation?
Thanks for posting the link, and for making examples. Here are some comments:
- The read/write functions would benefit from overloads with the traditional pointer/size parameters in addition to the scatter/gather overloads.
Maybe also, in the spirit of Ranges TS, provide overloads for "containers" so that std::string and std::vector<T> can be read/written directly. I'm uncertain if reading into a container should resize it or read up to its original size. Maybe resize down but never up?
- Using 'file' as the name of a free function to open a file seems odd. I think the rule to let a verb lead a function name and a noun a class name is good, and I would guess that file was a class, not a function.
- Forcing all calls to end with a .value() to get the value actually returned does solve the problem of a missed error code check (and dual return value) but it gets very boring to write on every call. Where there is no return value to get as in close() appending a .value() is particularly strange. An alternative approach would be to let the return type have a cast operator to T which throws if there is an error and an operator!() which can be used to test for errors in an if-statement if you want to check for error codes manually. The dtor would throw a "missed check" error if neither of these operations were used (regardless of if there was an actual error or not). This however needs a member flag to remember if a check was made, which is annoying as it is knowable at compile time (at least in most cases).
- I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?
- Are all (relevant) APIs duplicated as methods and functions? Your examples use a free function read() and a method write(). Duplicating everything was the choice in std::filesystem but is it really a good way forward forever? I would personally prefer not to revert back to C functions for everything but let C++ stay object oriented...
- I'm trying to understand how path_view handles a wchar_t* ctor parameter on Windows. Is there a flag inside which tells syscall wrappers which type of characters the view points to?
If so it will introduce a portability problem. If not I wonder where the temporary string is stored.
I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.
- What is the point of first creating a section and then mapping it? I know this is how it works in Win32, but I always felt it was unnecessarily complex to have two steps.
Is it important for file integrity or something? The Linux mmap() does not have these two levels, right? As far as I can see, the map_file example does not use the sh variable after creating the mh variable, despite the quite complicated usage pattern.
- The function name truncate() is misleading as it can also extend the file length. set_length() would be more appropriate I think.
The documentation should specify what the implementation should produce on reading bytes extended to but never written.
It should also detail under what circumstances the implementation may return another size than asked for, or preferably remove such cases (returning out of disk error for instance).
- I have a hard time understanding the difference between the two mapping methods using a section_handle/map_handle vs. a mapped_file_handle, but I suspect the latter is some kind of shortcut. Is the need for this discussed in the proposal?
- The term inode seems very Unix specific. Would there be an advantage to use a more generic term?
Are corresponding handles used in Windows and if so, what are they called?
- I was happy to see that mapped_span provides a byte offset ctor parameter, but I would suggest that it be placed before the length parameter as it is so common to have a file with a header and then "the rest" is an array of some fixed size data records.
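The alternative to `.value()` suggested in the comments above (a conversion operator that throws, an `operator!()` for manual checks, and a flag remembering whether a check happened) can be sketched as follows. Everything here is an assumption made for illustration: `checked_result` is a hypothetical name, not the proposal's type, and instead of throwing from the destructor (which is hazardous during stack unwinding) this sketch merely counts missed checks.

```cpp
#include <cassert>
#include <system_error>
#include <utility>

// Hypothetical result wrapper; not the type used by the proposal.
template <class T>
class checked_result {
    T value_{};
    std::error_code ec_{};
    mutable bool checked_ = false;  // runtime "was this inspected?" flag

public:
    static int missed_checks;  // stand-in for the suggested throwing destructor

    checked_result(T v) : value_(std::move(v)) {}
    checked_result(std::error_code ec) : ec_(ec) {}

    // Moving transfers the obligation to check to the new object.
    checked_result(checked_result&& o)
        : value_(std::move(o.value_)), ec_(o.ec_), checked_(o.checked_) {
        o.checked_ = true;
    }

    ~checked_result() {
        if (!checked_) ++missed_checks;  // the comment above suggests throwing here
    }

    // Manual check: `if (!r) { /* handle r.error() */ }`.
    bool operator!() const {
        checked_ = true;
        return static_cast<bool>(ec_);
    }

    std::error_code error() const { return ec_; }

    // Implicit conversion replaces .value(); throws if an error is stored.
    operator T() const {
        checked_ = true;
        if (ec_) throw std::system_error(ec_);
        return value_;
    }
};

template <class T>
int checked_result<T>::missed_checks = 0;
```

The `checked_` member is exactly the runtime flag the comment finds annoying; nothing in this sketch lets the compiler prove statically that a check occurred.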
> - I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?
I am not sure if std::byte* has the same aliasing rules as char*. That's probably ignorance on my part, but the char* does indeed greatly simplify implementation as well.
> I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.
Literally exactly how it works. If you supply UTF-8 on POSIX, it is fed through unchanged. If you supply UTF-16 on Windows, it is fed through unchanged. If you supply UTF-8 on Windows, it is converted to UTF-16 onto the stack just before the syscall.
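The "converted to UTF-16 onto the stack just before the syscall" step can be illustrated with a small portable sketch. The names `utf8_to_utf16` and `open_like_call`, and the 260-unit buffer size, are assumptions chosen for this example; the reference implementation's actual conversion routine and buffer size are not shown in this thread. The decoder below checks structure only and does not reject overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>

// Decode UTF-8 from [src, src+len) into a caller-provided UTF-16 buffer.
// Returns the number of UTF-16 units written, or size_t(-1) on malformed
// input or if the destination is too small.
inline std::size_t utf8_to_utf16(const char* src, std::size_t len,
                                 char16_t* dst, std::size_t cap) {
    assert(dst != nullptr || cap == 0);
    const std::size_t fail = static_cast<std::size_t>(-1);
    std::size_t n = 0;
    for (std::size_t i = 0; i < len;) {
        const unsigned char b = static_cast<unsigned char>(src[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return fail;
        if (i + extra >= len) return fail;  // sequence truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(src[i + k]);
            if ((c & 0xC0) != 0x80) return fail;
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            if (n + 1 > cap) return fail;
            dst[n++] = static_cast<char16_t>(cp);
        } else {
            if (cp > 0x10FFFF || n + 2 > cap) return fail;
            cp -= 0x10000;  // encode as a surrogate pair
            dst[n++] = static_cast<char16_t>(0xD800 + (cp >> 10));
            dst[n++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return n;
}

// Hypothetical syscall wrapper: the UTF-16 copy lives only on this stack frame.
inline std::size_t open_like_call(const char* utf8, std::size_t len) {
    char16_t buffer[260];  // arbitrary size for this sketch
    const std::size_t n = utf8_to_utf16(utf8, len, buffer, 260);
    // A real wrapper would hand (buffer, n) to the wide-character OS API here.
    return n;
}
```

No heap allocation occurs, which is the trade-off being described: the cost is stack space proportional to the maximum path length supported.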
On Monday, April 9, 2018 at 6:36:57 PM UTC-4, Niall Douglas wrote:
> > - I get that using char* rather than void* in buffer related APIs simplifies pointer handling, but wouldn't C++17's std::byte be a better choice?
> I am not sure if std::byte* has the same aliasing rules as char*.
FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.
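A minimal sketch of what that guarantee permits: examining the bytes of an arbitrary object through `std::byte*`, just as one would through `unsigned char*`. This assumes a C++17 compiler; the helper names `byte_sum` and `views_agree` are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum the bytes of any object by viewing its storage through std::byte*.
// Accessing an object's representation this way is permitted for std::byte,
// as it is for char and unsigned char.
template <class T>
unsigned byte_sum(const T& obj) {
    const std::byte* p = reinterpret_cast<const std::byte*>(&obj);
    unsigned sum = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        sum += std::to_integer<unsigned>(p[i]);
    }
    return sum;
}

// The same storage seen through unsigned char* yields identical byte values.
template <class T>
bool views_agree(const T& obj) {
    const std::byte* b = reinterpret_cast<const std::byte*>(&obj);
    const unsigned char* c = reinterpret_cast<const unsigned char*>(&obj);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (std::to_integer<unsigned char>(b[i]) != c[i]) return false;
    }
    return true;
}
```

For `std::uint32_t x = 0x01020304;` the sum is 10 on any byte order, since only the order of the four bytes differs between platforms.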
> That's probably ignorance on my part, but the char* does indeed greatly simplify implementation as well.
> > I don't really see a reason for this to work differently on Windows vs. Linux: Let path_view remember what it points at and let the underlying system convert to whatever the syscall needs on the particular OS. Then, obviously, the optimum performance will be attained with different data types for the file name but this should not be a concern on this low level, it should always *work*.
> Literally exactly how it works. If you supply UTF-8 on POSIX, it is fed through unchanged. If you supply UTF-16 on Windows, it is fed through unchanged. If you supply UTF-8 on Windows, it is converted to UTF-16 onto the stack just before the syscall.
That's disconcerting, since stack space is finite. How do you deal with Windows's extremely long pathnames? `std::filesystem::path` on Windows will be able to use their `\\?\` syntax for long paths painlessly and transparently.
Will `path_view` provide the same? Will it make every system call take up 64KB of stack space?
On Tuesday, April 10, 2018 at 12:05:34 AM UTC+1, Nicol Bolas wrote:
> FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.
https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)
> That's disconcerting, since stack space is finite. How do you deal with Windows's extremely long pathnames? `std::filesystem::path` on Windows will be able to use their `\\?\` syntax for long paths painlessly and transparently.
The reference implementation implements that, plus a few others:
- For any paths beginning with \!!\, we pass the path + 3 characters directly through. This prefix is a pure AFIO extension, and will not be recognised by other code.
- For any paths beginning with \??\, we pass the path + 0 characters directly through. Note the NT kernel keeps a symlink at \??\ which refers to the DosDevices namespace for the current login, so as an incorrect relation which you should not rely on, the Win32 path C:\foo probably will appear at \??\C:\foo.
- \\?\, which is used to tell a Win32 API that the remaining path is longer than a DOS path.
- \\.\, which since Windows 7 is treated exactly like \\?\.
I doubt those will enter any standard, but a note to implementers will be added. The \??\ prefix is the fastest by far: NT uses memcmp() instead of slow locale-specific mapping. You can, quite literally, call a random number generator and pass the untranslated random data in as a filename if prefixed with \??\. Works fine, though nothing in Windows can cope with that file.
> Will `path_view` provide the same? Will it make every system call take up 64KB of stack space?
Yes. But it's not a problem in any typical code. Most people will use UTF-8 literals, which are pass through on POSIX, so this only affects Windows. And most people don't use long string literals. Paths retrieved on Windows from some API are usually in UTF-16. That gets passed through. Windows also does not require zero termination if using the NT API, so you can safely use subset views of paths without copying onto the stack.
And finally, even in the worst case, Windows does not care much about 64KB on the stack, which would only occur anyway if someone is passing in a 32KB path, which is exceedingly rare. I think it a fair design compromise, and memory allocation and most memory copying is entirely avoided.
Niall
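The prefix rules described in the message above can be sketched as a small classifier. This is an illustration under assumptions, not the reference implementation: the names `prefix_kind`, `classify_prefix`, and `nt_passthrough` are hypothetical, narrow strings are used for brevity, and the pass-through offset is only modelled for the two prefixes where the message states it (3 characters for `\!!\`, 0 for `\??\`).

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the four prefixes named in the message.
enum class prefix_kind { afio_extension, nt_kernel, win32_long_path, win32_dot, none };

inline bool has_prefix(const std::string& path, const std::string& prefix) {
    return path.compare(0, prefix.size(), prefix) == 0;
}

inline prefix_kind classify_prefix(const std::string& path) {
    if (has_prefix(path, R"(\!!\)")) return prefix_kind::afio_extension;
    if (has_prefix(path, R"(\??\)")) return prefix_kind::nt_kernel;
    if (has_prefix(path, R"(\\?\)")) return prefix_kind::win32_long_path;
    if (has_prefix(path, R"(\\.\)")) return prefix_kind::win32_dot;
    return prefix_kind::none;
}

// What would be handed to the NT kernel for the two pass-through prefixes:
// "path + 3" for \!!\ and "path + 0" for \??\.
inline std::string nt_passthrough(const std::string& path) {
    const prefix_kind k = classify_prefix(path);
    assert(k == prefix_kind::afio_extension || k == prefix_kind::nt_kernel);
    return path.substr(k == prefix_kind::afio_extension ? 3 : 0);
}
```

Under this reading, `\!!\Device\foo` is passed through as `\Device\foo`, while `\??\C:\foo` is passed through unchanged.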
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/b27a7889-636f-428e-8f0e-bd60dfcf8309%40isocpp.org.
> > FYI: it does. Indeed, the whole point of `byte*` is to give us a way to stop using `char*` for non-string purposes.
> https://github.com/martinmoene/byte-lite appears to implement byte right back to C++ 98. Ok, I'll byte :)
Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.
> Only if you compile with -fno-strict-aliasing or equivalent. Otherwise that implementation fails to provide the std::byte aliasing guarantees.
1. On C++ 17 or later, byte-lite uses `enum class byte : unsigned char {};`. I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?
2. Before C++ 17, byte-lite uses `struct byte { typedef unsigned char type; type v; };`. If Martin changes that to union, would that cause the compiler to assume the type can alias?
3. Would `__attribute__((__may_alias__))` be of use here?
Niall
On Thu, 12 Apr 2018, 11:22 Niall Douglas, <nialldo...@gmail.com> wrote:
> 1. On C++ 17 or later, byte-lite uses enum class byte : unsigned char {}; . I take it from you that the compiler is permitted to assume that strong enums never alias despite the underlying char type?
Yes. There is in general no aliasing relationship between an enumeration and its underlying type.
> 3. Would __attribute__((__may_alias__)) be of use here?
Yes (assuming your compiler supports it, of course). In practice, at least clang and GCC recognize the std::byte type and give it special treatment, but I think a regular enum class with that attribute would be equivalent. (At least, for now -- there are some proposed C++20 changes that might require std::byte to receive special treatment in constant expression evaluation.)
> Yes. There is in general no aliasing relationship between an enumeration and its underlying type.
So, would the following prevent aliasing:
struct byte { enum type : unsigned char {}; type v; };
?
On Fri, 13 Apr 2018, 13:58 Martin Moene, <m.j....@eld.physics.leidenuniv.nl> wrote:
On Friday, April 13, 2018 at 12:31:21 PM UTC+2, Niall Douglas wrote:
> So, would the following prevent aliasing:
> struct byte { enum type : unsigned char {}; type v; };
You misunderstand. std::byte* is specifically guaranteed in the standard to be able to alias any other type, same as char*. Your implementation does not do this. It is therefore unsafe under optimisation.
Niall
I'm not referring to std::byte.
Richard Smith writes:
> Yes. There is in general no aliasing relationship between an enumeration and its underlying type.
(emphasis mine).
My question is: does this also hold for
struct byte { enum type : unsigned char {}; type v; };
?
Yes. The only types with the special "can alias anything" property (in a compiler not supporting std::byte) are char, signed char, and unsigned char. Not enums, not structs.
signed char
. clang does not warn, but does remove the initialization with 5.
On 5 April 2018 at 08:20, Niall Douglas <nialldo...@gmail.com> wrote:
> All those are very trivial. They look exactly like doing the same in POSIX
> syscalls. Perhaps that's the point of showing them?
>
> Are there any less trivial examples you'd like to see?
I suggest that you tell us. At the moment I can't understand why I'd
use this over either 'read/write' or 'mmap' from POSIX.
I like it.
Section 4, page 13, under unlink_on_close it states "Causes the entry in the filesystem to disappear on first close by any process in the system."
Shouldn't that be "...last close by the processes..." (Which is what FILE_FLAG_DELETE_ON_CLOSE does in Windows.)
Also curious whether native_handle_type would allow duplicating the handle/descriptor.
Niall,
A note: Section 3 bullet 2 suggests that Coroutines have been voted into C++20; they have not been.
Please ensure this proposal is seen by SG1 as a number of the features will have an impact on the memory model.
Coroutines is published in a Technical Specification, ISO/IEC TS 22277
Fair warning - if this comes to LEWG, one of the first questions is going to be "has SG1 seen this". The proposal has the words "concurrent", "mutex", and "atomic" in it; SG1 will need to see this.
I'd like to see you be able to get feedback on your proposal at the next meeting; these features are important and I'm glad to see someone bring a proposal for them.
But if the proposal comes before LEWG before SG1 has seen it, you may miss out on a valuable opportunity to make progress on this proposal at this meeting. LEWG does not have the ability to review this; SG1 does. You will probably get asked by LEWG to take it to SG1. This means you will lose time, because you will have to try to get scheduled in SG1's queue mid-meeting, and then you would have to get time scheduled with LEWG again.
Also, keep in mind that the chairs select which papers their groups review. This paper will probably be put in the SG1 queue either way; it's probably best for you to be aware of that and request things be scheduled such that SG1 sees it before LEWG does, to ensure the best chance of you making forward progress.
I would suggest that you email the LEWG chair, Titus Winters, and the SG1 chair, Olivier Giroux, and ask them how to proceed.
On Saturday, April 14, 2018 at 1:52:07 AM UTC-7, Niall Douglas wrote:
> we also make a ton of use of span<T>, so all code which understands
> span<T> automagically works with no extra boilerplate needed.
> How's that? Reasonable compelling?
It seems this low-level file I/O paper uses the same flawed reasoning for memory buffers that was proffered during the Boost.Beast formal review. There was strong objection to Boost.Asio's ConstBufferSequence and MutableBufferSequence concepts. I believe the phrase used was "10 year old outdated technology."
ConstBufferSequence requirements
<https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/ConstBufferSequence.html>
MutableBufferSequence requirements
<https://www.boost.org/doc/libs/1_66_0/doc/html/boost_asio/reference/MutableBufferSequence.html>
In particular you didn't understand why elements of buffer sequences needed to be convertible to boost::asio::const_buffer and boost::asio::mutable_buffer. I'll note that these concepts are now part of Networking.TS and very likely will not change before being voted into the standard. So they are very much with us, and new papers which work with buffer sequences (e.g. library components which wrap calls to ::readv, ::writev) need to be harmonious with established practice.
The missing ingredient is that since const_buffer and mutable_buffer are separate types, library implementors have the option of conditionally compiling in code for doing buffer debugging. This is particularly useful on MSVC's standard library, which supports checked iterators:
<https://www.boost.org/doc/libs/1_67_0/doc/html/boost_asio/overview/core/buffers.html#boost_asio.overview.core.buffers.buffer_debugging>
The buffer debugging feature is a natural fit for [networking.ts] buffer sequences because the function template std::experimental::net::buffer() has the container's type information before performing the type-erasure implied by converting to const_buffer or mutable_buffer. Therefore, the necessary std::function<> for doing the debugging can be stashed away in the buffer object and invoked later (when the macro for enabling buffer debugging is set).
TL;DR: Any low-level file I/O paper should use the buffer sequence concepts from [networking.ts] (remember the Vasa).
LLFIO seems to want to deal directly in a single, contiguous span of bytes. If you need to do reads into multiple spans, you either do virtual memory gymnastics to make the two spans appear as a contiguous array (which I think LLFIO wants to allow you to do) or you do multiple reads. Because that's how file IO works at the low levels.
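The "multiple reads" alternative can be sketched portably with a standard stream standing in for the file. This is an assumption-laden illustration: `buffer` is a hypothetical stand-in for a span of bytes, `read_into_all` is an invented helper, and neither LLFIO's nor the Networking TS's actual buffer types are used. `<sstream>` is included only for the in-memory usage described afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical stand-in for a contiguous span of writable bytes.
struct buffer {
    char* data;
    std::size_t size;
};

// Fill each buffer in order, issuing one read call per buffer.
// Returns the total number of bytes read; stops early at end of input.
inline std::size_t read_into_all(std::istream& in, const std::vector<buffer>& buffers) {
    std::size_t total = 0;
    for (const buffer& b : buffers) {
        assert(b.data != nullptr || b.size == 0);
        in.read(b.data, static_cast<std::streamsize>(b.size));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        total += got;
        if (got < b.size) break;  // short read: no more input available
    }
    return total;
}
```

Reading the 14-byte stream `headerpayload!` into buffers of 6, 7, and 4 bytes fills the first two completely and places a single byte in the third. A scatter call such as POSIX `readv` would do the same work in one system call, which is the trade-off the buffer-sequence argument is about.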