file streams and access to the file descriptor


torto...@gmail.com

Aug 7, 2017, 4:35:24 AM
to ISO C++ Standard - Future Proposals
I thought something like this had already been proposed but I could not find the link. Does anyone recall any previous discussion?
I present this as something which could be turned into a proposal if the group thinks there is any merit.

Motivation

Some operations on a POSIX system, notably locking, require access to a low-level file descriptor.

The C++ standard does not mandate any mechanism to construct a stream buffer from a file descriptor
or to obtain the underlying file descriptor, though it is almost certainly accessible in an implementation
on a POSIX system.
Many standard library implementations (libstdc++ among them) provide their own extensions allowing a buffer to be constructed from one
directly.

The canonical and arguably more portable method of doing this is for the user to derive a new streambuf themselves
based on stdio functions. There are a few variants of this available off the shelf (e.g. stdio_buffer).

This is the typical answer given to the FAQ of how to do this on Stack Overflow and elsewhere.

This is 'portable' because libc and stdio are required even if POSIX is not.
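
As an illustration of the "derive your own streambuf on top of stdio" approach, here is a minimal sketch. The class name stdio_outbuf and the helper stdio_outbuf_roundtrip are hypothetical, not taken from any library; the buffer is output-only, unbuffered on the C++ side, and does not own the FILE*.

```cpp
#include <cassert>
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical name: an output-only stream buffer that forwards every
// character to a C FILE*. It does not own (or close) the FILE*.
class stdio_outbuf : public std::streambuf {
public:
    explicit stdio_outbuf(std::FILE* f) : file_(f) { assert(f != nullptr); }

    // Expose the wrapped FILE* so callers can reach fileno() and friends.
    std::FILE* file() const noexcept { return file_; }

protected:
    // Called per character because no put area is configured.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        return std::fputc(traits_type::to_char_type(c), file_) == EOF
                   ? traits_type::eof()
                   : c;
    }

    // Bulk writes go straight to fwrite().
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        return static_cast<std::streamsize>(
            std::fwrite(s, 1, static_cast<std::size_t>(n), file_));
    }

    // Flushing the C++ stream flushes the C stream.
    int sync() override { return std::fflush(file_) == 0 ? 0 : -1; }

private:
    std::FILE* file_;
};

// Writes text through the buffer into a temporary file and reads it back
// with plain stdio calls.
std::string stdio_outbuf_roundtrip(const std::string& text) {
    std::FILE* f = std::tmpfile();
    if (!f) return {};
    {
        stdio_outbuf buf(f);
        std::ostream out(&buf);
        out << text << std::flush;
    }
    std::rewind(f);
    std::string result(text.size(), '\0');
    std::size_t got = std::fread(&result[0], 1, result.size(), f);
    result.resize(got);
    std::fclose(f);
    return result;
}
```

Because all buffering is left to stdio, the C++ and C views of the file cannot get out of step, at the cost of a virtual call per character for unformatted single-character output.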

Proposal

I propose extending std::filebuf with:

  • a constructor accepting a representation of a low level file descriptor
  • a method for obtaining the representation of the underlying low level file descriptor

Options for the low level file descriptor representation:

  • a naked file descriptor. i.e. an integer
  • a FILE*
  • a struct wrapping a file descriptor which in POSIX is just an integer

A naked file descriptor is just an integer and therefore a very leaky abstraction.
However this is the abstraction that is most often available in existing implementations (notably gcc).

A FILE* mandates C stdio behaviour which may include buffering (see for example https://stackoverflow.com/questions/2423628/whats-the-difference-between-a-file-descriptor-and-file-pointer).
It may be that there are use cases where buffering identical to stdio is desired. These are not considered by this proposal.

I therefore suggest the methods make use of a 'file descriptor' struct which is opaque to the standard
but allows implementation defined access to the low level representation.

On a POSIX system this could be a POD type like:

struct std::file_descriptor
{
   int fileDes;
};

There already exist feature-test macros for POSIX itself.
The C++ standard could recognise the existence of _POSIX_C_SOURCE and, if it is defined, mandate that the fileDes member exists.

This does introduce a coupling between ISO C++ and POSIX but it is a trivial one.

On non-POSIX systems the type would remain opaque but there would still be a standard way
to obtain low level access that might have similar application on those platforms.

It is notable that the Filesystem TS introduces several Unix-friendly functions such as those
for dealing with symbolic links. This proposal aims to go in a similar direction.

Use cases

The primary use case considered is to provide interoperability with the POSIX functions
 fcntl, lockf & fdopen.

Specifically to enable:
  • file locking
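
To make the locking use case concrete, here is a minimal POSIX-only sketch. The struct file_descriptor stands in for the proposed std::file_descriptor (it lives outside namespace std here), and the helper names are hypothetical. It assumes the wrapped descriptor was opened for writing, which fcntl() requires for a write lock.

```cpp
#include <cassert>
#include <cstdio>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Stand-in for the proposed std::file_descriptor (hypothetical).
struct file_descriptor {
    int fileDes;
};

// Apply one advisory lock operation (F_WRLCK, F_RDLCK or F_UNLCK) to the
// whole file without blocking.
static bool set_whole_file_lock(file_descriptor fd, short type) {
    assert(fd.fileDes >= 0);
    struct flock fl {};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "through end of file"
    return ::fcntl(fd.fileDes, F_SETLK, &fl) == 0;
}

bool try_lock_exclusive(file_descriptor fd) { return set_whole_file_lock(fd, F_WRLCK); }
bool unlock(file_descriptor fd) { return set_whole_file_lock(fd, F_UNLCK); }

// Creates a temporary file, locks it, unlocks it, and cleans up.
bool lock_demo() {
    char path[] = "/tmp/fd_lock_demo_XXXXXX";
    int fd = ::mkstemp(path);  // opened read/write
    if (fd < 0) return false;
    file_descriptor wrapped{fd};
    bool locked = try_lock_exclusive(wrapped);
    bool unlocked = locked && unlock(wrapped);
    ::close(fd);
    ::unlink(path);
    return locked && unlocked;
}
```

With the proposed accessor, the wrapped value would come from the filebuf rather than from mkstemp(); everything after that point would look the same.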

'file' descriptor access also enables the creation of:

  • streams representing a pipe.
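
A stream over a pipe can be sketched with a small descriptor-based input buffer, in the spirit of the fdbuf examples mentioned later in this post. The names fd_inbuf and read_line_through_pipe are hypothetical; the sketch is POSIX-only, does not retry on EINTR, and does not support putback.

```cpp
#include <cassert>
#include <istream>
#include <streambuf>
#include <string>
#include <unistd.h>

// Hypothetical name: a buffered, input-only stream buffer over a POSIX
// descriptor. It does not own (or close) the descriptor.
class fd_inbuf : public std::streambuf {
public:
    explicit fd_inbuf(int fd) : fd_(fd) {
        assert(fd >= 0);
        setg(buf_, buf_, buf_);  // empty get area: first read calls underflow()
    }

protected:
    // Refill the get area from the descriptor when it runs dry.
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        ssize_t n = ::read(fd_, buf_, sizeof buf_);
        if (n <= 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    int fd_;
    char buf_[256];
};

// Pushes a short message into a pipe and reads the first line back through
// an std::istream. The message must fit in the pipe's kernel buffer.
std::string read_line_through_pipe(const std::string& message) {
    int fds[2];
    if (::pipe(fds) != 0) return {};
    ssize_t written = ::write(fds[1], message.data(), message.size());
    ::close(fds[1]);  // closing the write end lets the reader see EOF
    std::string line;
    if (written == static_cast<ssize_t>(message.size())) {
        fd_inbuf buf(fds[0]);
        std::istream in(&buf);
        std::getline(in, line);
    }
    ::close(fds[0]);
    return line;
}
```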


Alternatives to this proposal:

  • provide an implementation of stdio_buffer in the library which is required to be based on FILE*
  • provide an implementation of stdio_buffer in the library which is based on a file descriptor if _POSIX_C_SOURCE is defined
  • provide higher-level interfaces for the use cases considered
  • do nothing. Leave the status quo as is.

I argue against the "do nothing" option as this is a frequent requirement on POSIX-based systems,
as evidenced by questions on Stack Overflow and elsewhere going back many years.
See: https://www.google.co.uk/search?q=fstream+and+file+descriptor

and notably:
 https://stackoverflow.com/questions/2746168/how-to-construct-a-c-fstream-from-a-posix-file-descriptor

Implementing a fdbuf is also described in Nicolai Josuttis's "The C++ Standard Library".

Providing higher-level interfaces is fraught with implementation defined issues which would need to be worked out in full.
Such proposals might require something like this proposal as a low-level interface on which to build
or they might use an entirely different mechanism e.g. http://www.boost.org/doc/libs/1_44_0/doc/html/boost/interprocess/file_lock.html

Relation to previous proposals

Though this must come up often, I have not seen a recent concrete proposal.
The most recent discussion I can find is:

https://groups.google.com/a/isocpp.org/forum/?fromgroups#!searchin/std-proposals/posix/std-proposals/Q4RdFSZggSE/RcRWo1S7dKsJ

Boost ASIO includes a posix::stream_descriptor (see http://www.boost.org/doc/libs/1_47_0/doc/html/boost_asio/reference/posix__stream_descriptor.html).
This does not appear in the networking TS proposal - http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4656.pdf

Ville Voutilainen

Aug 7, 2017, 4:44:53 AM
to ISO C++ Standard - Future Proposals
On 7 August 2017 at 11:35, <torto...@gmail.com> wrote:
> I thought something like this had already been proposed but I could not find
> the link. Does anyone recall any previous discussion?
> I present this as something which could be turned into a proposal if the
> group thinks there is any merit.
>
> Motivation
>
> Some operations on a POSIX system, notably locking, require access to a
> low-level file descriptor.
>
> The C++ standard does not mandate any mechanism to construct a stream buffer
> from a file descriptor
> or to obtain the underlying file descriptor, though it is almost certainly
> accessible in an implementation
> on a POSIX system.
> Many standard library implementations (libstdc++ among them) provide their own
> extensions allowing a buffer to be constructed from one
> directly.
>
> The canonical and arguably more portable method of doing this is for the
> user to derive a new streambuf themselves
> based on stdio functions. There are a few variants of this available off the
> shelf (e.g. stdio_buffer).
>
> This is the typical answer given to the FAQ of how to do this on Stack
> Overflow and elsewhere.
>
> This is 'portable' because libc and stdio are required even if POSIX is not.

In general, yes, please, it would be about time C++ gains this
functionality in a semi-portable manner.

> Proposal
>
> I propose extending std::filebuf with:
>
> a constructor accepting a representation of a low level file descriptor
> a method for obtaining the representation of the underlying low level file
> descriptor

Why not just add a stdio_filebuf and a native_filebuf (for the native descriptor)?
Why should we extend filebuf?

> Options for the low level file descriptor representation:
>
> a naked file descriptor. i.e. an integer
> a FILE*
> a struct wrapping a file descriptor which in POSIX is just an integer

If we want to be minimal, fdopen() already gives us a FILE* from a descriptor,
and fileno() gives a descriptor from a FILE*. If we want to provide convenience,
we should support creating streams from both a FILE* and a descriptor.
I suggest we should provide convenience.
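
For readers less familiar with the two POSIX calls, here is a minimal sketch of the round trip. The function name is hypothetical. It duplicates the descriptor before calling fdopen() so that each FILE* can be closed independently; both still share one open file description and therefore one file offset.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <stdio.h>   // fileno(), fdopen() (POSIX)
#include <unistd.h>  // dup(), close()

// Returns true if FILE* -> descriptor -> FILE* works and both FILE*s see
// the same underlying file.
bool fileno_fdopen_roundtrip() {
    std::FILE* original = std::tmpfile();  // opened for update ("w+b")
    if (!original) return false;

    int fd = fileno(original);              // FILE* -> descriptor
    int copy = (fd >= 0) ? ::dup(fd) : -1;  // second descriptor, same open file
    std::FILE* second = (copy >= 0) ? fdopen(copy, "r+") : nullptr;  // descriptor -> FILE*

    bool ok = false;
    if (second) {
        // Write through the new FILE* and flush its separate stdio buffer.
        ok = std::fputs("via fdopen", second) >= 0 && std::fflush(second) == 0;
        // Read the data back through the original FILE*.
        std::rewind(original);
        char text[32] = {};
        ok = ok && std::fgets(text, sizeof text, original) != nullptr &&
             std::strcmp(text, "via fdopen") == 0;
        std::fclose(second);  // also closes the duplicated descriptor
    } else if (copy >= 0) {
        ::close(copy);
    }
    std::fclose(original);
    return ok;
}
```

Note that each FILE* has its own user-space buffer, which is why the explicit fflush() is needed before the other stream can observe the data.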

torto...@gmail.com

Aug 7, 2017, 5:33:59 AM
to ISO C++ Standard - Future Proposals
What would be the motivating use cases for providing both stdio_filebuf and a native_filebuf rather than extending filebuf,
which seems to me like a simpler change?

If you want convenience you could add a FILE* method as well as a file descriptor based one.
I think to do that you might have to consider the subtle interactions between FILE* buffering and fstream buffering.

There is already the case of what to do with file operations made via a file descriptor (or FILE*) rather than via the fstream.
My initial thought was that any read, write or seek operation could make the subsequent state of the fstream undefined (due to differences in buffering behaviour).
Otherwise we must mandate that filebuf is based on FILE* or filedes to guarantee it is defined.
Perhaps that is why you prefer stdio_filebuf and native_filebuf?

Niall Douglas

Aug 8, 2017, 7:15:01 PM
to ISO C++ Standard - Future Proposals
> The primary use case considered is to provide interoperability with the POSIX functions
>  fcntl, lockf & fdopen.
>
> Specifically to enable:
>   • file locking

AFIO also provides a "filesystem template algorithms library (the FTL)" which implements file locking via multiple algorithms: https://ned14.github.io/afio/namespaceafio__v2__xxx_1_1algorithm_1_1shared__fs__mutex.html

Niall

torto...@gmail.com

Aug 8, 2017, 8:27:57 PM
to ISO C++ Standard - Future Proposals

I'm not sure I understand the relevance here. There are many libraries that provide support for file locking. Is AFIO on track to become part of the standard?
Or are you just suggesting that alternative approaches (like handling locking entirely independently of fstream) are in general better?
Even if that is true, does it eliminate all realistic uses for accessing the file descriptor of an fstream?

Regards,

Bruce.

Niall Douglas

Aug 8, 2017, 9:49:14 PM
to ISO C++ Standard - Future Proposals

> I'm not sure I understand the relevance here. There are many libraries that provide support for file locking. Is AFIO on track to become part of the standard?
> Or are you just suggesting that alternative approaches (like handling locking entirely independently of fstream) are in general better?
> Even if that is true, does it eliminate all realistic uses for accessing the file descriptor of an fstream?
 
I'm in favour of iostreams exposing the underlying native handle as you propose.

I am not in favour of any C++ standard proposing any file locking API except the one AFIO implements. AFIO's is as portable as it can be, and even then I can tell you in advance that WG21 will be appalled at the lack of ability to implement consistent behaviours across platforms.

The Filesystem Template Library is my proposal for leveraging C++ genericity to work around those platform specific differences in a viable way. It's taken many years of work to reach even this stage where I don't expect AFIO to land before WG21 until at least 2020. This stuff is very hard to get right, let alone standardise. POSIX has badly ballsed it up already, and efforts to fix things there have also taken years to get to now with many years remaining till fixing the standard can land. But we're all working it as fast as we can. This is all unpaid effort, and inertia to change is immense.

Niall

Niall Douglas

Aug 8, 2017, 9:54:36 PM
to ISO C++ Standard - Future Proposals

> I therefore suggest the methods make use of a 'file descriptor' struct which is opaque to the standard
> but allows implementation defined access to the low level representation.
>
> On a POSIX system this could be a POD type like:
>
> struct std::file_descriptor
> {
>    int fileDes;
> };


Also a further data point: this is the file descriptor struct which Boost peer review consensus came up with after the v1 AFIO peer review:


It's basically a bitfield for metadata and a union for constexpr init, int fd, int pid and void *handle.

This was felt by the Boost community to be the correct design at the time of the review.

Niall

Thiago Macieira

Aug 8, 2017, 11:26:39 PM
to std-pr...@isocpp.org
On Tuesday, 8 August 2017 18:54:36 PDT Niall Douglas wrote:
> It's basically a bitfield for metadata and a union for constexpr init, int
> fd, int pid and void *handle.

What if it's a file descriptor that represents a process? :-)

See FreeBSD's pdfork(2) function and the proposed Linux CLONE_FD[1] (stuck in
review waiting for someone with ptrace knowledge to help us).

[1] https://lkml.org/lkml/2015/3/15/10

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Niall Douglas

Aug 8, 2017, 11:43:12 PM
to ISO C++ Standard - Future Proposals
On Wednesday, August 9, 2017 at 4:26:39 AM UTC+1, Thiago Macieira wrote:
> On Tuesday, 8 August 2017 18:54:36 PDT Niall Douglas wrote:
>> It's basically a bitfield for metadata and a union for constexpr init, int
>> fd, int pid and void *handle.
>
> What if it's a file descriptor that represents a process? :-)

Absolutely right. Some platforms represent process handles differently to process ids, some view them as the same. And that's even amongst POSIX implementations, let alone other OSs. QNX for example has the notion of a "triplet", so machine + process + fd which can, with UB, be squeezed into an unsigned long long.

AFIO only concerns itself with file i/o, so the native_handle_type linked to only makes use of that part rather than properly specifying the whole thing. But the overall design of kind-of-bitfield + union-of-intish-like-things was the design proposed by the peer review as they felt my approach in v1 AFIO was completely wrong. I find it acceptable personally.

In terms of what best suits the OP's design, either he takes the long road and specifies out properly a native_handle_type which is correct and everybody would find a great value add, or he takes the short road and just returns an int. He should be aware that on MSVCRT, he needs to return BOTH an int AND a void * because MSVCRT wraps up its HANDLE with a partial fd emulation. And end-user code needs to know about both depending on their use case.

It's for all these reasons this stuff isn't in the standard already. But I agree with the OP that it should be. Someone just needs to push it over the line.

Niall

torto...@gmail.com

Aug 9, 2017, 3:44:27 AM
to ISO C++ Standard - Future Proposals

I'm against returning a naked file number. I put it up there to show it as the "simple and wrong" answer to be argued against. I was proposing a more opaque (and thus implementation-defined) structure. I don't see that as incompatible with a proper native handle. It could easily be an incremental step towards the structure you need. I don't think there is any way I could justify the PID field at this point. That could be added in a separate "native file handle" proposal which you sound in a good position to lay the groundwork for.

I was just hoping to fix a simple and obvious defect without being too ambitious. With that in mind can you suggest good arguments for just returning the handle versus providing stdio_filebuf and native_filebuf as Ville suggested?

Regards,

Bruce.

Niall Douglas

Aug 9, 2017, 9:38:11 AM
to ISO C++ Standard - Future Proposals
> I was just hoping to fix a simple and obvious defect without being too ambitious. With that in mind can you suggest good arguments for just returning the handle versus providing stdio_filebuf and native_filebuf as Ville suggested?


I'm also warm on Ville's suggestion.

Most STL implementations use a C FILE * under the bonnet for iostreams anyway, so his suggestion makes sense.

If you do write up a proposal, note that every major STL already provides a proprietary mechanism for retrieving the underlying native handle. It would be useful if all those mechanisms were surveyed before you propose your standardised mechanism.

Niall

torto...@gmail.com

Aug 9, 2017, 7:49:49 PM
to ISO C++ Standard - Future Proposals

Absolutely. Though C++ 'standard'isation is in an odd (but good) place these days. It used to be you standardised what people had been doing for years anyway but nowadays the aim is often to improve on the current working practice instead.
 

torto...@gmail.com

Aug 9, 2017, 8:52:47 PM
to ISO C++ Standard - Future Proposals

For the case where filebuf is implemented as a FILE* or a FD based streambuf how do we discourage (or make safe) the case where users make their code less portable by assuming that filebuf is a stdio_buffer or native_buffer?
I think it would be going too far to mandate that filebuf is in fact a stdio_buffer and force existing implementations that aren't to sacrifice performance or whatever other gains they may have.

Perhaps the type of buffer should be an additional template parameter to a basic_fstream?
The new parameter would require a streambuf with close() and is_open() functions. The open function is a bit more problematic as that is the one you want to be able to alter for new kinds of file-like buffer thing (e.g. pipe, socket, memory mapped file).
It feels like quite a weakly defined concept.

Or perhaps, following your suggestion that a file descriptor is a union that includes FILE* and FD, we need only modify filebuf, essentially as I originally suggested. It would need to be constructible from both a FILE* and a FD (though not naked).
I am happy with the file_descriptor that goes in being the one that comes out. I am also happy with extra information being included where it's available (e.g. where FILE* and FDs are easily interchanged via OS APIs),
but presumably there needs to be a way (a template to specialise) for library vendors to provide implementations using just FILE* and/or just FD as they see fit.

What platforms beyond POSIXish and Windows need to be considered? Where in the standard is that documented?

Bruce.

Niall Douglas

Aug 9, 2017, 10:24:08 PM
to ISO C++ Standard - Future Proposals

> What platforms beyond POSIXish and Windows need to be considered? Where in the standard is that documented?

Historically both C and C++ have taken the view that only POSIX need be considered. But it's been changing, the Filesystem TS goes far out of its way to accommodate Windows without saying so explicitly except in footnotes. 

I'd start with int fd's and go from there. Your review of existing practice is what will persuade that your design is reasonable, and I'm afraid that's just donkey work of reading lots of documentation and summarising into your proposal with lists of URLs to the source references.

Niall

Nicol Bolas

Aug 9, 2017, 11:32:22 PM
to ISO C++ Standard - Future Proposals
On Wednesday, August 9, 2017 at 10:24:08 PM UTC-4, Niall Douglas wrote:

>> What platforms beyond POSIXish and Windows need to be considered? Where in the standard is that documented?
>
> Historically both C and C++ have taken the view that only POSIX need be considered.

... they have? I don't recall any specific parts of those standards that were non-POSIX-hostile. Indeed, the fact that `wchar_t` exists at all (and is implementation-defined), rather than keeping to the POSIX assumption of byte-sized `char`s suggests quite a lot of consideration of non-POSIX systems.

Niall Douglas

Aug 9, 2017, 11:58:09 PM
to ISO C++ Standard - Future Proposals


>> Historically both C and C++ have taken the view that only POSIX need be considered.
>
> ... they have? I don't recall any specific parts of those standards that were non-POSIX-hostile.

There's a fair chunk of the C standard which was painful/impossible to implement on Windows, even up to the C11 standard.
 
> Indeed, the fact that `wchar_t` exists at all (and is implementation-defined), rather than keeping to the POSIX assumption of byte-sized `char`s suggests quite a lot of consideration of non-POSIX systems.

wchar_t is widely used by Java, Python, lots of other stuff besides Windows.

Niall 

Myriachan

Aug 11, 2017, 10:30:42 PM
to ISO C++ Standard - Future Proposals

^^^ This.  Windows isn't really any more difficult to implement C on than any other general-purpose operating system.  Probably the weirdest stuff for Windows would involve stdio's handling of text versus binary files, which among the popular systems only matters on Windows.

Anyway...

A problem that I see here is that the definition of "native file handle" on Windows isn't straightforward like it is in the POSIX world.  Microsoft uses two underlying file handles for each FILE or basic_filebuf: the POSIX-like file number, and the Win32 file handle.

In Windows, something to keep in mind is that the C language isn't hardwired into the user-mode interface to the operating system.  The POSIX world tends to have libc as where the kernel system call interfaces are located, whereas Windows separates the concepts of C runtime and the kernel interface.  Applications are supposed to provide their own C runtime if one is needed.

Microsoft's C design implements some of the POSIX file functions within the C runtime library, not within the operating system.  Microsoft's open() returns an int, but this int is simply an index into internal tables within the C runtime library you're using.  This table contains, among other things, the mapping to the Win32 file handle.  The Win32 file handle is the true native file handle, usually.  (There is a corner case to this I won't go into.)  The Win32 file handle is nominally of type HANDLE, which is just a typedef for void *.

MinGW uses the Microsoft C runtime for the most part.  (It should have provided its own C runtime, given the "rules" of Windows, but it doesn't.  See https://blogs.msdn.microsoft.com/oldnewthing/20140411-00/?p=1273 )

So I ask: which "native file handle" should such an API return on Windows?

Melissa

torto...@gmail.com

Aug 12, 2017, 3:53:01 AM
to ISO C++ Standard - Future Proposals

I would say it's implementation-defined. The minimum interface would be to allow access to a file descriptor type that is mostly opaque from the C++ standard's point of view (but not from the implementation's point of view). I would like to have the standard recognise _POSIX_C_SOURCE (http://pubs.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_02.html) and then mandate that it contains a FD that can be used with fcntl etc. If we follow the reasoning that one of the important use cases is locking files, then on Windows it should contain a HANDLE, as that is what the LockFile() and LockFileEx() functions of Win32 take. You can't mandate that in the standard though unless you also make it recognise Windows and give it special treatment. I'm assuming you can't define _POSIX_C_SOURCE on Windows, so please correct me if I'm wrong.


For argument's sake though, why shouldn't FILE* and HANDLE=void* be used interchangeably? We could require that the file descriptor struct contains a pointer which on POSIX is a FILE* and on Windows is a HANDLE. You can get the integer FD on POSIX using fileno(). In fact why not go further and have the opaque file descriptor just be a pointer which is FILE* if _POSIX_C_SOURCE is defined and HANDLE=void* on Windows? What would that break and what limits would it impose?


Bruce.


 

Niall Douglas

Aug 12, 2017, 11:27:52 AM
to ISO C++ Standard - Future Proposals


> For argument's sake though, why shouldn't FILE* and HANDLE=void* be used interchangeably? We could require that the file descriptor struct contains a pointer which on POSIX is a FILE* and on Windows is a HANDLE. You can get the integer FD on POSIX using fileno(). In fact why not go further and have the opaque file descriptor just be a pointer which is FILE* if _POSIX_C_SOURCE is defined and HANDLE=void* on Windows? What would that break and what limits would it impose?


You may also wish to retrieve the underlying FILE * on Windows.

Also you can't assume HANDLE = void *. If you turn on strict type checking in windows.h, it won't be.

Niall 

Dietmar Kühl

Aug 12, 2017, 1:27:48 PM
to std-pr...@isocpp.org
On 12 Aug 2017, at 16:27, Niall Douglas <nialldo...@gmail.com> wrote:
> You may also wish to retrieve the underlying FILE * on Windows.
>
> Also you can't assume HANDLE = void *. If you turn on strict type checking in windows.h, it won't be.

Note that the implicit assumption that basic_filebuf has to be implemented in terms of FILE* is wrong! Despite the description of the standard defining behavior in terms of <stdio.h> there is no mandate that it is implemented that way. I know that Plauger's *choice* was to do so but other choices are possible and are arguably more reasonable.

My implementation certainly travels in terms of file descriptors (I never really cared about Windows). Using a FILE* would imply that either filebuf really is a FILE (i.e., the buffers are conflated, that's the libstdc++ choice when based on glibc as far as I know) or that independent buffers are used, making file streams noticeably slower. On POSIX it is common that a FILE* can be constructed from a file descriptor. If I had to create a FILE* from a stream it could be done but the underlying buffering data structures would be entirely separate.

If there should be access to some underlying structure I'd argue against anything which proposes something different than access to a native handle. That would be a non-owning representation of a possibly implementation-defined type (rather than an unspecified type). An implementation would then choose what to return: a file descriptor, a FILE*, a HANDLE, or whatever else it sees fit.

If users want to use something different I think they are best off just creating their own stream buffer in terms of whatever file access they see fit. We shouldn't waste much time on a fairly niche requirement and we shall certainly not impose any constraints limiting implementation freedom for something like that! Streams are user-extensible and there is no need to put everything into them. That is quite different to FILE* based operations which are not user extensible.

torto...@gmail.com

Aug 13, 2017, 7:39:24 PM
to ISO C++ Standard - Future Proposals


On Saturday, 12 August 2017 18:27:48 UTC+1, Dietmar Kühl wrote:
> On 12 Aug 2017, at 16:27, Niall Douglas <nialldo...@gmail.com> wrote:
>> You may also wish to retrieve the underlying FILE * on Windows.
>>
>> Also you can't assume HANDLE = void *. If you turn on strict type checking in windows.h, it won't be.
>
> Note that the implicit assumption that basic_filebuf has to be implemented in terms of FILE* is wrong! Despite the description of the standard defining behavior in terms of <stdio.h> there is no mandate that it is implemented that way. I know that Plauger's *choice* was to do so but other choices are possible and are arguably more reasonable.

I didn't think there was any such assumption (not from me anyway) but let's make it explicit that there isn't, just to be sure.
 
> My implementation certainly travels in terms of file descriptors (I never really cared about Windows). Using a FILE* would imply that either filebuf really is a FILE (i.e., the buffers are conflated, that's the libstdc++ choice when based on glibc as far as I know) or that independent buffers are used, making file streams noticeably slower. On POSIX it is common that a FILE* can be constructed from a file descriptor. If I had to create a FILE* from a stream it could be done but the underlying buffering data structures would be entirely separate.
>
> If there should be access to some underlying structure I'd argue against anything which proposes something different than access to a native handle. That would be a non-owning representation of a possibly implementation-defined type (rather than an unspecified type). An implementation would then choose what to return: a file descriptor, a FILE*, a HANDLE, or whatever else it sees fit.
>
> If users want to use something different I think they are best off just creating their own stream buffer in terms of whatever file access they see fit. We shouldn't waste much time on a fairly niche requirement and we shall certainly not impose any constraints limiting implementation freedom for something like that! Streams are user-extensible and there is no need to put everything into them. That is quite different to FILE* based operations which are not user extensible.


This seems to have come full circle. Let's try and summarise where we are:

Requirement:

* A streambuf for reading and writing a file:
* which has an additional constructor based on a file descriptor of some kind
* and which provides a member function to obtain that descriptor

On a POSIX system there needs to be a way of converting the descriptor returned into an integer FD so that functions like fcntl can be invoked.

This requirement must not over-constrain the implementation of std::filebuf.
If there is any indication of that, it should be met by a new derived class of streambuf rather than by altering std::filebuf.
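
The requirement above can be sketched as a separate class derived from streambuf. Everything here is hypothetical: the wrapper struct file_descriptor, the class name native_filebuf (borrowed from the suggestion earlier in the thread), and the accessor name native_handle(). The sketch is POSIX-only, output-only, unbuffered, and non-owning.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <ostream>
#include <stdlib.h>
#include <streambuf>
#include <unistd.h>

// Hypothetical opaque wrapper; on POSIX it just carries the integer FD.
struct file_descriptor {
    int fileDes;
};

// Hypothetical native_filebuf: constructed from a wrapped descriptor and
// able to hand the same wrapper back.
class native_filebuf : public std::streambuf {
public:
    using native_handle_type = file_descriptor;

    explicit native_filebuf(native_handle_type h) : handle_(h) {
        assert(h.fileDes >= 0);
    }

    // The descriptor that went in is the one that comes out.
    native_handle_type native_handle() const noexcept { return handle_; }

protected:
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        char ch = traits_type::to_char_type(c);
        return ::write(handle_.fileDes, &ch, 1) == 1 ? c : traits_type::eof();
    }

    std::streamsize xsputn(const char* s, std::streamsize n) override {
        std::streamsize done = 0;
        while (done < n) {  // write() may be partial, so loop
            ssize_t w = ::write(handle_.fileDes, s + done,
                                static_cast<std::size_t>(n - done));
            if (w <= 0) break;
            done += w;
        }
        return done;
    }

private:
    native_handle_type handle_;
};

// Writes through the buffer, then uses the recovered FD with fcntl().
// Returns the file's access mode, or -1 on failure.
int access_mode_via_handle() {
    char path[] = "/tmp/native_filebuf_XXXXXX";
    int fd = ::mkstemp(path);  // opened O_RDWR
    if (fd < 0) return -1;
    native_filebuf buf(file_descriptor{fd});
    std::ostream out(&buf);
    out << "payload";
    int flags = ::fcntl(buf.native_handle().fileDes, F_GETFL);
    off_t size = ::lseek(fd, 0, SEEK_END);
    ::close(fd);
    ::unlink(path);
    if (flags < 0 || size != 7) return -1;
    return flags & O_ACCMODE;
}
```

Keeping this in its own class leaves std::filebuf free to be implemented however a vendor likes, which is the constraint stated above.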

Two suggestions were given for new streambuf derivatives very similar to filebuf:
* stdio_filebuf - based on a FILE*
* native_filebuf - based on a native file handle

Aside from the type given to the "file descriptor" based constructor and the "file descriptor" getter,
these are identical to each other and to std::filebuf as it is now.

What justification is there for having both a stdio_filebuf and a native_filebuf?

What justification is there for std::filebuf not being the same as native_filebuf?

We have justification that std::filebuf must not be a stdio_filebuf as this would constrain implementations too much.

---

I don't see a strong need for stdio_filebuf on a POSIX system.
Assuming native_filebuf is based on a wrapped file descriptor, you can construct one from a FILE* by using fileno() to get a naked file descriptor,
which you can then wrap and pass to the native_filebuf constructor.

If std::filebuf is a native_filebuf you can retrieve the FD as required. On POSIX you can unwrap the naked file descriptor to access fcntl() etc.
If you need to, you can (try to) convert it into a FILE* using fdopen().

Ville Voutilainen

Aug 14, 2017, 12:47:53 AM
to ISO C++ Standard - Future Proposals
On 14 August 2017 at 02:39, <torto...@gmail.com> wrote:
> What justification is there for having both a stdio_filebuf and a
> native_filebuf?

stdio_filebuf's API is completely portable. It doesn't use any
non-portable types.
native_filebuf does.

torto...@gmail.com

Aug 14, 2017, 5:26:24 AM
to ISO C++ Standard - Future Proposals

Portability is not a justification for existence. There needs to be a use for it as well.
It does allow you to trivially implement a conforming (but non-optimal) std::filestream, but of what use is that?
Does it give any useful guarantees beyond those for std::filebuf?
The only thing I can think of is synchronisation with C stdio, which might allow the standard to drop sync_with_stdio() from elsewhere.

I've never quite understood the use for that myself.
I guess it would help if you wanted to use streams while accessing the same file simultaneously from a C library
but that always sounded like a "bad thing" to me.

I note here http://en.cppreference.com/w/cpp/io/ios_base/sync_with_stdio that it says:

  "By default, all eight standard C++ streams are synchronized with their respective C streams."


That could imply cin, cout & cerr are filestreams based on stdio_filebuf rather than native_filebuf. That sounds like it would be limiting (e.g. performance wise).
You could allow setfilebuf() to switch the type from native_filebuf to stdio_filebuf but I think that would be painful to implement.

So far the only 'non portable' part of native_filebuf is the file descriptor.
What is wrong with leaving that mostly implementation-defined so that only use of that part of the API is non-portable?

torto...@gmail.com

Aug 14, 2017, 5:31:05 AM
to ISO C++ Standard - Future Proposals, torto...@gmail.com

Incidentally, can't you implement sync_with_stdio() by just calling fsync() on the native descriptor before each stream operation and flush() on the stream afterwards?
Is there some other trick by which sync_with_stdio enables better performance?

Ville Voutilainen

Aug 14, 2017, 7:57:50 AM
to ISO C++ Standard - Future Proposals
On 14 August 2017 at 12:31, <torto...@gmail.com> wrote:
>>> stdio_filebuf's API is completely portable. It doesn't use any
>>> non-portable types.
>>> native_filebuf does.
>>
>>
>> Portability is not a justification for existence. There needs to be a use
>> for it as well.

The use is C interoperability. Systems that have open FILE*s in flight cannot wrap
them into iostreams, and then they end up using FILE* and stdio everywhere, which
kinda blows. Another portability aspect here is that some library vendors already
ship something like a stdio_filebuf as an extension, but those extensions aren't
agreed on by all vendors. Which is why I think we should standardize stdio_filebuf.

>> It does allows you to trivially implement a conforming (but non optimal)
>> std::filestream but of what use is that?

See above. The use of such a thing is C interoperability.

>> So far the only 'non portable' part of native_filebuf is the file
>> descripter.
>> What is wrong with leaving that mostly implementation defined so that use
>> of that use that part of the API is non-portable?

When I see a facility that I know contains no non-portable parts, I
know that I don't
need to worry about platform differences. When I see a facility that I
know contains
non-portable parts, I know that I need to worry about platform
differences. That's why
a) I don't want the native handles anywhere near std::filebuf b) I
don't want them
anywhere near std::stdio_filebuf. Sure, the native_handles in <thread>
et al. are done
differently, as part of the api of the portable thing. We don't need
to repeat that choice.

Nicol Bolas

Aug 14, 2017, 10:23:40 AM8/14/17
to ISO C++ Standard - Future Proposals

I would add that I feel the enforced native-handle design makes a lot more sense for `thread` than `filebuf`. `std::thread` is intended by the specification to be the lowest-level portable wrapper around an OS thread. `filebuf` is not; it's not even the only wrapper around OS file IO in the standard, let alone the "lowest-level portable wrapper".

Another important difference: you cannot create a `std::thread` instance from a `thread::native_handle_type`. You can only get the handle instance from an existing `std::thread`, not the other way around. By contrast, I'd say 90% of the point of these `*_filebuf` APIs is to be able to create an iostream buffer from an existing handle.

Ville Voutilainen

Aug 14, 2017, 10:39:08 AM8/14/17
to ISO C++ Standard - Future Proposals
On 14 August 2017 at 17:23, Nicol Bolas <jmck...@gmail.com> wrote:
> from an existing `std::thread`, not the other way around. By contrast, I'd
> say 90% of the point of these `*_filebuf` APIs is to be able to create an
> iostream buffer from an existing handle.


Yes, absolutely for the ones I have suggested. It's perhaps a source
of confusion
that std::filebuf doesn't do that; it opens a file - whereas the
suggested new filebufs
would indeed most often be used on already-open files. That's an
important aspect
of the motivation for native_filebuf: there are many things that
open() can do that
we can't necessarily reasonably map into C++, especially not into portable code.
The motivation of stdio_filebuf is different, it's mostly for use
cases where some
other part of your code already gave you a FILE*. And at that point
it's too late
to go back to a descriptor since some things must be done at open(). Conversely,
getting a FILE* from a descriptor doesn't help, because we
specifically have a FILE*
at hand. That's why I think we need both.
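
To make the FILE* case concrete, here is a rough sketch of the kind of stdio-backed buffer being discussed: a streambuf that forwards output to a FILE* somebody else already opened. It is not the proposed stdio_filebuf and not any vendor's extension. The names stdio_outbuf and write_through_stdio are hypothetical, and the sketch handles output only.

```cpp
#include <cstdio>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical output-only buffer that adopts an already-open FILE*.
// It never closes the FILE*; ownership stays with the caller.
class stdio_outbuf : public std::streambuf {
public:
    explicit stdio_outbuf(std::FILE* f) : file_(f) {}
    std::FILE* file() const { return file_; }

protected:
    // Called for single characters because no put area is installed.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        return std::fputc(traits_type::to_char_type(ch), file_) == EOF
                   ? traits_type::eof()
                   : ch;
    }
    // Bulk writes go straight to fwrite, so stdio does all the buffering.
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        return static_cast<std::streamsize>(
            std::fwrite(s, 1, static_cast<std::size_t>(n), file_));
    }
    int sync() override { return std::fflush(file_) == 0 ? 0 : -1; }

private:
    std::FILE* file_;
};

// Demo helper: write text through an ostream layered on a temporary FILE*,
// then read it back with plain stdio calls.
std::string write_through_stdio(const std::string& text) {
    std::FILE* f = std::tmpfile();
    if (!f) return {};
    stdio_outbuf buf(f);
    std::ostream out(&buf);
    out << text << std::flush;
    std::rewind(f);
    std::string result;
    for (int c; (c = std::fgetc(f)) != EOF;) result.push_back(static_cast<char>(c));
    std::fclose(f);
    return result;
}
```

Because every write is forwarded to stdio, the C++ stream and any C code holding the same FILE* share one buffer, which is the interoperability property argued for above.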

Niall Douglas

Aug 14, 2017, 12:41:33 PM8/14/17
to ISO C++ Standard - Future Proposals, torto...@gmail.com


Incidentally can't you implement sync_with_stdio() by just calling fsync() on the native descriptor before each stream operation and flush() on the stream after wards.
Is there some other trick that sync_with_stdio enables better performance?

As much as I support the exposure of the underlying file handle for iostreams, I do have to question the use cases:

1. Byte range locking has highly non-portable semantics, and is downright dangerous to use on POSIX with iostreams. Any code using the underlying fd for byte range locking on POSIX is probably incorrect.

2. fsync() generally does not do what people think it does, or what POSIX says it must do, and that is an increasing problem with time rather than decreasing i.e. fsync() is ever more becoming a partial or total noop on more and more systems. Any code using fsync() is probably incorrect.

What other major use case is there for exposing the native file handle for iostreams? I suppose maybe handing fds off to child processes. But that's best implemented by you opening the fds by hand, configuring them, then wrapping them into iostreams if needed.

That basically leaves twiddling the fd's flags e.g. turning on O_SYNC. But many POSIX implementations don't permit O_SYNC to be changed after fd open. So I'm kinda running out of valid use cases now.

I'm open to being corrected, but I've gotta wonder here, if someone wants to poke the internals of iostreams, perhaps they should just not use iostreams?

Niall

Ville Voutilainen

Aug 14, 2017, 1:50:03 PM8/14/17
to ISO C++ Standard - Future Proposals
On 14 August 2017 at 19:41, Niall Douglas <nialldo...@gmail.com> wrote:
> As much as I support the exposure of the underlying file handle for
> iostreams, I do have to question the use cases:
>
> 1. Byte range locking has highly non-portable semantics, and is downright
> dangerous to use on POSIX with iostreams. Any code using the underlying fd
> for byte range locking on POSIX is probably incorrect.
>
> 2. fsync() generally does not do what people think it does, or what POSIX
> says it must do, and that is an increasing problem with time rather than
> decreasing i.e. fsync() is ever more becoming a partial or total noop on
> more and more systems. Any code using fsync() is probably incorrect.
>
> What other major use case is there for exposing the native file handle for
> iostreams? I suppose maybe handing fds off to child processes. But that's
> best implemented by you opening the fds by hand, configuring them, then
> wrapping them into iostreams if needed.
>
> That basically leaves twiddling the fd's flags e.g. turning on O_SYNC. But
> many POSIX implementations don't permit O_SYNC to be changed after fd open.
> So I'm kinda running out of valid use cases now.
>
> I'm open to being corrected, but I've gotta wonder here, if someone wants to
> poke the internals of iostreams, perhaps they should just not use iostreams?


These are yet further reasons to leave iostreams' and filebuf's API
alone. It's not
generally sane to expose the underlying file descriptor at that level.
If you think it's
something to fiddle with, make sure you know that you're dealing with a filebuf
that is willing to expose the descriptor, cast down to that derived
filebuf and get
the descriptor from there.

torto...@gmail.com

Aug 14, 2017, 3:37:15 PM8/14/17
to ISO C++ Standard - Future Proposals


On Monday, 14 August 2017 18:50:03 UTC+1, Ville Voutilainen wrote:
On 14 August 2017 at 19:41, Niall Douglas <nialldo...@gmail.com> wrote:
> As much as I support the exposure of the underlying file handle for
> iostreams, I do have to question the use cases:
>
> 1. Byte range locking has highly non-portable semantics, and is downright
> dangerous to use on POSIX with iostreams. Any code using the underlying fd
> for byte range locking on POSIX is probably incorrect.
>
I think you mean non-portable rather than incorrect. I know of several implementations that work correctly.
I would not trust them to work correctly on a different platform out of the box or even a different file-system
on the same platform. That doesn't make it a bad thing to do.
They can be made a little more portable with effort.

> 2. fsync() generally does not do what people think it does, or what POSIX
> says it must do, and that is an increasing problem with time rather than
> decreasing i.e. fsync() is ever more becoming a partial or total noop on
> more and more systems. Any code using fsync() is probably incorrect.
>
> What other major use case is there for exposing the native file handle for
> iostreams? I suppose maybe handing fds off to child processes. But that's
> best implemented by you opening the fds by hand, configuring them, then
> wrapping them into iostreams if needed.
>

A pipe is exactly one of the other uses I would have in mind.
However, the semantics of pipe mean it must be an entirely separate class from filebuf
(obviously as it isn't really a file).

 
> That basically leaves twiddling the fd's flags e.g. turning on O_SYNC. But
> many POSIX implementations don't permit O_SYNC to be changed after fd open.
> So I'm kinda running out of valid use cases now.
>
> I'm open to being corrected, but I've gotta wonder here, if someone wants to
> poke the internals of iostreams, perhaps they should just not use iostreams?

It is a good thing to be able to use the same abstraction for different kinds of I/O (provided it has compatible behaviour).
It always feels a little dirty to me when I have to drop down to C or POSIX APIs.
Typically the first thing I do is create wrappers to them (which is being discussed on this thread https://groups.google.com/a/isocpp.org/forum/?fromgroups#!searchin/std-proposals/file$20locking/std-proposals/83tGrL1GhM4/fL9WnFYRiasJ)
 

These are yet further reasons to leave iostreams' and filebuf's API
alone. It's not
generally sane to expose the underlying file descriptor at that level.
If you think it's
something to fiddle with, make sure you know that you're dealing with a filebuf
that is willing to expose the descriptor, cast down to that derived
filebuf and get
the descriptor from there.

Are you suggesting it should be possible to cast from a filebuf to a native_filebuf?
If they are really the same thing then requiring an explicit downcast to native_filebuf isn't much more protection
from "here be dragons" than a similar warning on the method to retrieve the native file descriptor.

If they are different, what is the significant difference (in terms of implementation or abstraction)?

There was a remark earlier that filebuf is not the only interface to file IO in C++.
The only other one I know of is cstdio. I don't really class that as 'C++' any more than I would another library with a C binding.
It is in the standard because compatibility with C is not only good but essential.
I find the earlier arguments convincing that a stdio_buffer would be a good thing to have.

So that just leaves the native_filebuf == filebuf question.
 

Ville Voutilainen

Aug 14, 2017, 3:48:26 PM8/14/17
to ISO C++ Standard - Future Proposals
On 14 August 2017 at 22:37, <torto...@gmail.com> wrote:
>> These are yet further reasons to leave iostreams' and filebuf's API
>> alone. It's not
>> generally sane to expose the underlying file descriptor at that level.
>> If you think it's
>> something to fiddle with, make sure you know that you're dealing with a
>> filebuf
>> that is willing to expose the descriptor, cast down to that derived
>> filebuf and get
>> the descriptor from there.
>
>
> Are you suggesting it should be possible to cast from a filebuf to a
> native_filebuf?

Perhaps it shouldn't. If native_filebuf derives from filebuf, the cast becomes
automatically possible because it's just a dynamic_cast.

> If they are really the same thing then requiring an explicit downcast to
> native_filebuf isn't much more protection
> from "here be dragons" than a similar warning on the method to retrieve the
> native file descriptor.

Sure, except that you need to know the target type and explicitly
downcast to it. Sure,
you can also use typeid to query what it really is. That doesn't mean
that filebuf
and native_filebuf are the same thing, and perhaps native_filebuf should
not derive
from filebuf but from streambuf.

> If they are different? What is the significant difference? (in terms of
> implementation or abstraction)

The significant difference is that native_filebuf has a file
descriptor API, whereas filebuf
does not.

> So that just leaves the native_filebuf == filebuf question.

That should evaluate to false, and perhaps even so that you can't cast
a filebuf to a native_filebuf,
i.e. they wouldn't have an inheritance relationship.

Nicol Bolas

Aug 14, 2017, 4:37:38 PM8/14/17
to ISO C++ Standard - Future Proposals
On Monday, August 14, 2017 at 3:37:15 PM UTC-4, Bruce Adams wrote:
There was a remark earlier that filebuf is not the only interface to file IO in C++.
The only other one I know of is cstdio. I don't really class that as 'C++' anymore than I would another library with a C binding.
It is in the standard because compatibility with C is not only good but essential.

Too many C++ programmers use cstdio instead of iostreams for us to label it as a for-compatibility-only feature. The availability of `fprintf` alone makes it too useful of a feature for us to discount it as a thing for compatibility.

Niall Douglas

Aug 14, 2017, 5:09:31 PM8/14/17
to ISO C++ Standard - Future Proposals

>
> 1. Byte range locking has highly non-portable semantics, and is downright
> dangerous to use on POSIX with iostreams. Any code using the underlying fd
> for byte range locking on POSIX is probably incorrect.
>
I think you mean non-portable rather than incorrect. I know of several implementations that work correctly.

No, I mean incorrect.

The byte range locking API is so severely broken in POSIX as to make it impossible to write correct code with it.

It is possible to write correct code if and only if:

1. If you control all file descriptors in the entire process.

2. Files are never big.

3. Files are never on a network drive.

4. You don't care about pathological performance occurring (like, single digit grants per second).

5. You don't use threads.

6. You don't care about power consumption.

7. You don't switch between shared and exclusive on non-identical ranges.

8. You never recurse into code which needs to take a lock whilst holding a lock.

9. You can guarantee no third party is permuting the bit of filesystem you are using.

But if you can meet all those conditions, then almost any other form of synchronisation is better and faster. The sole thing which byte range locks have which is useful is that they auto-release if the holding process suddenly exits. That's it.
 
I would not trust them to work correctly on a different platform out of the box or even a different file-system
on same platform. That doesn't make it a bad thing to do.
They can be made a little more portable with effort.

AFIO provides four implementations of afio::algorithm::shared_fs_mutex. It makes use of byte range locks to implement those, but mostly solely as sentinels for detecting sudden process exit by another process holding a lock.

Now, Windows on the other hand really nailed byte range locks beautifully. Correct design. Amazing performance. Scales beautifully. Doesn't burn the CPU. Works with threads. Async API. POSIX literally should take the NT byte range lock design and use it verbatim. It's the right design. Shame about the mandatory locking though; a choice of advisory or mandatory would have been better.
 

>
> What other major use case is there for exposing the native file handle for
> iostreams? I suppose maybe handing fds off to child processes. But that's
> best implemented by you opening the fds by hand, configuring them, then
> wrapping them into iostreams if needed.
>

A pipe is exactly one of the other uses I would have in mind.
However, the semantics of pipe mean it must be an entirely separate class from filebuf
(obviously as it isn't really a file).

Unless they've removed it, the Networking TS should implement iostreams integration for pipes. ASIO certainly does. If it's still there, then you're covered by the TS and can untick that use case too.

Niall

torto...@gmail.com

Aug 14, 2017, 8:14:02 PM8/14/17
to ISO C++ Standard - Future Proposals
That's circular reasoning. Why shouldn't filebuf have a file descriptor API?
 
> So that just leaves the native_filebuf == filebuf question.

That should evaluate to false, and perhaps even so that you can't cast
a filebuf to a native_filebuf,
i.e. they wouldn't have an inheritance relationship.

I'm still not seeing a clear reason.

torto...@gmail.com

Aug 14, 2017, 8:17:20 PM8/14/17
to ISO C++ Standard - Future Proposals

I wasn't suggesting that officially. It's just a matter of personal taste (caveat: C programmers, and people writing C-style programs in C++). Though in this case I guess it's the only way to do unbuffered IO without having a native_filebuf?

Ville Voutilainen

Aug 14, 2017, 8:31:32 PM8/14/17
to ISO C++ Standard - Future Proposals
On 15 August 2017 at 03:14, <torto...@gmail.com> wrote:
>> > If they are different? What is the significant difference? (in terms of
>> > implementation or abstraction)
>>
>> The significant difference is that native_filebuf has a file
>> descriptor API, whereas filebuf
>> does not.
>>
> That's circular reasoning. Why shouldn't filebuf have a file descriptor API?

Because everything in filebuf's API is portable, and it has no
platform-specific parts.
As soon as I see you create a filebuf (and not a filebuf* that might
be something else
due to polymorphism), I know that I'm dealing with a cross-platform
facility, and platform-specific
facilities don't enter the picture.

File descriptors are platform-specific, and have platform-specific
semantics. Since I don't
have to amend the cross-platform API of filebuf with platform-specific
parts, I choose not to.
Every interesting way of getting a file descriptor is
platform-specific; I want to wrap iostreams
on top of descriptors acquired via platform-specific ways. Something
like fileno() is of no
interest to me whatsoever. What is of interest is open(), and whatever
the Windows equivalent
is.

While it might be to some extent convenient to just create a filebuf
directly from a file descriptor,
it becomes less convenient when the filebuf type no longer serves its
purpose as a cross-platform
abstraction that has no platform-specific parts. I can architecturally
separate my cross-platform
parts from my platform-specific parts by using different types in
these different abstraction layers,
and instantly recognize which kinds of operations and types I'm dealing with.

torto...@gmail.com

Aug 14, 2017, 8:47:06 PM8/14/17
to ISO C++ Standard - Future Proposals


On Monday, 14 August 2017 22:09:31 UTC+1, Niall Douglas wrote:

>
> 1. Byte range locking has highly non-portable semantics, and is downright
> dangerous to use on POSIX with iostreams. Any code using the underlying fd
> for byte range locking on POSIX is probably incorrect.
>
I think you mean non-portable rather than incorrect. I know of several implementations that work correctly.

No, I mean incorrect.

The byte range locking API is so severely broken in POSIX as to make it impossible to write correct code with it.

It is possible to write correct code if and only if:

1. If you control all file descriptors in the entire process.

why wouldn't you? or rather why would you think it was a good idea not to if you are trying to lock things?
 
2. Files are never big.

The reason for this one is less clear? performance perhaps? How big is big?
 
3. Files are never on a network drive.

Yes. File locking 101 - don't use NFS or Samba. Although allegedly it might work better on NFS4 which no-one has properly implemented yet.
I haven't tried it on other remote file systems like sshfs. But I'm sure it would be 'interesting'.
 
4. You don't care about pathological performance occurring (like, single digit grants per second).

That sounds like it should be the program's fault for locking too often. i.e. poor design
 
5. You don't use threads.

Just be careful about which thread is doing the locking. It's an issue as with many other shareable resources.
 
6. You don't care about power consumption.

This one makes little sense to me. Even on an embedded system it should be a case of flipping a few bits and checking if they're flipped.
 
7. You don't switch between shared and exclusive on non-identical ranges.

That sounds like a potentially bad design too.
 
8. You never recurse into code which needs to take a lock whilst holding a lock.

Don't deadlock. Multithreading 101
 
9. You can guarantee no third party is permuting the bit of filesystem you are using.

Yes. File locking is for interprocess synchronisation. You have to be in control of the processes involved in that.
 
But if you can meet all those conditions, then almost any other form of synchronisation is better and faster. The sole thing which byte range locks have which is useful is that they auto-release if the holding process suddenly exits. That's it.

Wouldn't a lot of other forms of synchronisation have to re-invent advisory locking for themselves to do this?
 
I would not trust them to work correctly on a different platform out of the box or even a different file-system
on same platform. That doesn't make it a bad thing to do.
They can be made a little more portable with effort.

AFIO provides four implementations of afio::algorithm::shared_fs_mutex. It makes use of byte range locks to implement those, but mostly solely as sentinels for detecting sudden process exit by another process holding a lock.

Your need for locks here seems to agree with my assessment above?
 
Now, Windows on the other hand really nailed byte range locks beautifully. Correct design. Amazing performance, Scales beautifully. Doesn't burn the CPU. Works with threads. Async API. POSIX literally should take the NT byte range lock design and use it verbatim. It's the right design. Shame about the mandatory locking though, choice of advisory or mandatory would have been better.
 
That is not something I often hear about a Windows API but it could perhaps be a Windows person versus Unix person thing?
Care to elaborate?
 

>
> What other major use case is there for exposing the native file handle for
> iostreams? I suppose maybe handing fds off to child processes. But that's
> best implemented by you opening the fds by hand, configuring them, then
> wrapping them into iostreams if needed.
>

A pipe is exactly one of the other uses I would have in mind.
However, the semantics of pipe mean it must be an entirely separate class from filebuf
(obviously as it isn't really a file).

Unless they've removed it, the Networking TS should implement iostreams integration for pipes. ASIO certainly does. If it's still there, then you're covered by the TS and can untick that use case too.

Niall


Pipes was never ticked for this proposal.

Pipe doesn't appear in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4656.pdf other than in reference to sigpipe.
But I think ASIO was split into bits so they might be in another proposal somewhere.

Bruce.

Thiago Macieira

Aug 14, 2017, 8:55:46 PM8/14/17
to std-pr...@isocpp.org
On Monday, 14 August 2017 17:47:06 PDT torto...@gmail.com wrote:
> > 1. If you control all file descriptors in the entire process.
> >
> why wouldn't you? or rather why would you think it was a good idea not to
> if you are trying to lock things?

When you write a library, you don't control the rest of the process.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

torto...@gmail.com

Aug 14, 2017, 8:59:53 PM8/14/17
to ISO C++ Standard - Future Proposals

I see what you're saying but I'm not sure I buy it here.

Does the rest of the standard separate portable and non-portable classes or does it just label functions accordingly?

A slightly related issue. You have an extra cognitive burden of needing another filebuf variant which needs to be specified in full.
I'm not sure that standardese lets you say - exactly like this class but with two additional functions. Maybe it does?
Adding a couple of functions looks dangerously like derivation but in this case the derivation is possibly inverted. What is a filebuf if not a native_filebuf with those
two functions hidden?

I think if we want to make a distinction it has to be on more solid grounds.
Buffering might make sense. If you said a native_filebuf has no buffering at all and a filebuf has some mandated buffering that might be the significant difference to justify it.

Thiago Macieira

Aug 14, 2017, 9:01:43 PM8/14/17
to std-pr...@isocpp.org
On Monday, 14 August 2017 17:47:06 PDT torto...@gmail.com wrote:
> Pipe doesn't appear in
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4656.pdf other
> than in reference to sigpipe.

Which is another POSIX braindamage (in hindsight), like most of signal
support. You can't handle signals cleanly unless you control the entire
application.

Now, with Linux you can skip SIGPIPE on sockets by always using MSG_NOSIGNAL
when sending, whereas on macOS and NetBSD, you use fcntl for F_SETNOSIGPIPE.
NetBSD also has SOCK_NOSIGPIPE to set the flag atomically.

On Linux, you can't cope with SIGPIPE on pipes unless you ignore the signal
globally, for every pipe and socket in the process.
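
A small sketch of the process-wide workaround just described, assuming a POSIX system. The function name errno_when_writing_to_closed_pipe is made up for illustration. Note that the signal() call changes the disposition for the whole process, which is exactly the drawback being complained about.

```cpp
#include <cerrno>
#include <csignal>
#include <unistd.h>

// Ignore SIGPIPE for the entire process, then write to a pipe whose read
// end has been closed. Returns the errno reported by write(), or 0 if the
// write unexpectedly succeeded.
int errno_when_writing_to_closed_pipe() {
    std::signal(SIGPIPE, SIG_IGN);  // affects every pipe and socket in the process

    int fds[2];
    if (pipe(fds) != 0) return errno;
    close(fds[0]);  // no reader is left

    char byte = 'x';
    ssize_t n = write(fds[1], &byte, 1);
    int err = (n < 0) ? errno : 0;

    close(fds[1]);
    return err;
}
```

With the signal ignored, the write fails with EPIPE instead of terminating the process, so the error can be handled at the call site.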

torto...@gmail.com

Aug 14, 2017, 9:02:57 PM8/14/17
to ISO C++ Standard - Future Proposals

True but you delegate the decision to your library's users.
Or better yet let them set the locking policy on your library.

Ville Voutilainen

Aug 14, 2017, 9:07:55 PM8/14/17
to ISO C++ Standard - Future Proposals
On 15 August 2017 at 03:59, <torto...@gmail.com> wrote:
> I see what your saying but I'm not sure I buy it here.
>
> Does the rest of the standard separating portable and non-portable classes
> or does it just label functions accordingly?

False equivalence. A std::thread creates a thread, and gives you
access to the native handle.
However, this leads to the main difference of filebuf and
native_filebuf. filebuf opens a file
for you. While native_filebuf can be made to do that as well, its main
motivation is to
adopt an existing descriptor to an already open file.

> A slightly related issue. You have an extra cognitive burden of needing
> another filebuf variant which needs to be specified in full.
> I'm not sure that standardese lets you say - exactly like this class but
> with two additional functions. Maybe it does?
> Adding a couple of functions looks dangerously like derivation but in this
> case the derivation is possibly inverted. What is a filebuf if not a
> native_filebuf with those
> two functions hidden?

See above.

> I think if we want to make a distinction it has to be on more solid grounds.
> Buffering might make sense. If you said a native_filebuf has no buffering at
> all and a filebuf has some mandated buffering that might be the significant
> difference to justify it.


That would be complete nonsense. Whether a native_filebuf ends up
doing buffered i/o
depends on how you opened the file. And filebuf has no mandated
buffering, it can hit
the disk on every byte if it so chooses, although that would be a poor
implementation.

Nicol Bolas

Aug 14, 2017, 9:31:54 PM8/14/17
to ISO C++ Standard - Future Proposals

And what of the libraries that are designed to use cross-platform file IO (filebuf/FILE*) internally, and therefore are written in complete ignorance of platform-specific features like "locking policy" and so forth? Isn't that essentially admitting to Niall's original point: "The byte range locking API is so severely broken in POSIX as to make it impossible to write correct code with it."

Sure, it's not "impossible" (which he goes on to admit to and explain what you have to do to make it "possible"), but it's certainly not something that an application can do in a vacuum. Either the entire application is written with an eye to it, or it cannot work.

torto...@gmail.com

Aug 14, 2017, 9:46:08 PM8/14/17
to ISO C++ Standard - Future Proposals


On Tuesday, 15 August 2017 02:07:55 UTC+1, Ville Voutilainen wrote:
On 15 August 2017 at 03:59,  <torto...@gmail.com> wrote:
> I see what your saying but I'm not sure I buy it here.
>
> Does the rest of the standard separating portable and non-portable classes
> or does it just label functions accordingly?

False equivalence. A std::thread creates a thread, and gives you
access to the native handle.

I am not and have never been thinking of std::thread and filebuf as equivalent.

There must be quite a few functions that can be used non-portably beyond
those inherited from the C library.

The Filesystem TS has functionality relating to symbolic links that might qualify.
While other bits of it are more portable.

I'm sure there are a few other cases of undefined, unspecified or implementation-defined behaviour
that might qualify. I recall having problems with the portability of seek() a while back though
it could have been down to a non-conforming implementation.
 
However, this leads to the main difference of filebuf and
native_filebuf. filebuf opens a file
for you. While native_filebuf can be made to do that as well, its main
motivation is to
adopt an existing descriptor to an already open file.

I disagree. The main motivation for me is to obtain the descriptor for a file opened using fstream.
Adopting an existing descriptor is a valid use case as well.

 
> A slightly related issue. You have an extra cognitive burden of needing
> another filebuf variant which needs to be specified in full.
> I'm not sure that standardese lets you say - exactly like this class but
> with two additional functions. Maybe it does?
> Adding a couple of functions looks dangerously like derivation but in this
> case the derivation is possibly inverted. What is a filebuf if not a
> native_filebuf with those
> two functions hidden?

See above.

> I think if we want to make a distinction it has to be on more solid grounds.
> Buffering might make sense. If you said a native_filebuf has no buffering at
> all and a filebuf has some mandated buffering that might be the significant
> difference to justify it.


That would be complete nonsense. Whether a native_filebuf ends up
doing buffered i/o
depends on how you opened the file. And filebuf has no mandated
buffering, it can hit
the disk on every byte if it so chooses, although that would be a poor
implementation.

True. I didn't think that one through at all.

Niall Douglas

Aug 14, 2017, 10:22:57 PM8/14/17
to ISO C++ Standard - Future Proposals


The byte range locking API is so severely broken in POSIX as to make it impossible to write correct code with it.

It is possible to write correct code if and only if:

1. If you control all file descriptors in the entire process.

why wouldn't you? or rather why would you think it was a good idea not to if you are trying to lock things?

Almost all software today makes extensive use of third party libraries, often with source code which cannot be easily modified.

And due to POSIX dropping all byte range locks as soon as any fd to that inode is closed, it makes byte range locks inherently problematic.

Consider for example an implementation of filesystem::path::exists() which tries to open the path to test for existence. It would open and then close a fd. If any code elsewhere in the process has byte range locks open on that inode, they get dropped.

Before you say "just use stat() then", you can't in many cases e.g. if the filesystem is permuting randomly, because then you can't use paths at all. Also, incidentally, there is nothing stopping stat() being implemented by your libc as open()-fstat()-close() like it must be on Windows. Again, game over thanks to the design of POSIX byte range locks.
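
The close-drops-locks behaviour can be sketched as below, assuming a POSIX system with traditional fcntl() record locks and a writable /tmp. All function names are hypothetical. A forked child acts as the observer, because a process never conflicts with its own locks.

```cpp
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

// Try to take an exclusive lock on the whole file without blocking.
bool try_lock_whole_file(int fd) {
    struct flock fl {};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "through end of file"
    return fcntl(fd, F_SETLK, &fl) == 0;
}

// Fork a child and report whether it managed to lock `path`,
// i.e. whether the parent's lock is no longer in the way.
bool another_process_can_lock(const char* path) {
    pid_t pid = fork();
    if (pid < 0) abort();
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        _exit(fd >= 0 && try_lock_whole_file(fd) ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

struct lock_demo_result {
    bool held_before;  // lock still held before the unrelated open/close
    bool held_after;   // lock still held after it
};

lock_demo_result demo_lock_lost_on_unrelated_close() {
    char path[] = "/tmp/lockdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0 || !try_lock_whole_file(fd)) abort();

    bool held_before = !another_process_can_lock(path);

    // "Unrelated" code in the same process, e.g. an existence check that
    // opens and closes the file. It never touches fd or any lock API.
    int other = open(path, O_RDONLY);
    close(other);

    bool held_after = !another_process_can_lock(path);

    close(fd);
    unlink(path);
    return {held_before, held_after};
}
```

On systems with the classic POSIX semantics this should report held_before == true and held_after == false, even though the descriptor that took the lock was never closed.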
 
 
2. Files are never big.

The reason for this one is less clear? performance perhaps? How big is big?

struct flock, due to some amazingly bad design, uses signed values, thus rendering the top half of your file unlockable.

You might think that not important. For filesystem algorithm programmers who might use the entire 64 bit space as an open hash table using sparse storage and hole punching, it's a showstopper.

I've also seen implementations fail at offsets after 1<<62 rather than 1<<63. Almost certainly a bug. But not comforting.
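
A tiny sketch of the signedness point, assuming a POSIX system where struct flock's l_start and l_len have type off_t. The helper names are hypothetical.

```cpp
#include <limits>
#include <sys/types.h>
#include <type_traits>

// POSIX requires off_t to be a signed integer type.
static_assert(std::is_signed<off_t>::value, "off_t is expected to be signed");

// Largest byte offset that can be named in struct flock's l_start.
constexpr off_t max_lockable_offset() {
    return std::numeric_limits<off_t>::max();
}

// True if an offset from the full unsigned 64-bit space can be expressed
// as an off_t and therefore passed to fcntl() for locking.
constexpr bool offset_is_lockable(unsigned long long offset) {
    return offset <= static_cast<unsigned long long>(max_lockable_offset());
}
```

With a 64-bit off_t, any offset at or above 1<<63 fails this check, which is the "top half" referred to above.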
 
 
3. Files are never on a network drive.

Yes. File locking 101 - don't use NFS or Samba. Although allegedly it might work better on NFS4 which no-one has properly implemented yet.
I haven't tried it on other remote file systems like sshfs. But I'm sure it would be 'interesting'.

Windows-type oplocks, if implemented correctly, are a much better design.
 
 
4. You don't care about pathological performance occurring (like, single digit grants per second).

That sounds like it should be the program's fault for locking too often. i.e. poor design

No, it's poor quality of implementation in some kernels and/or filing systems. In some cases they scale exponentially inverse to physical CPU count for example. Wonderful.
 
 
5. You don't use threads.

Just be careful about which thread is doing the locking. It's an issue as with many other shareable resources.

It's not that.

POSIX byte range locks are per-inode, and don't detect attempts to lock the same region twice, rather they just ignore the second attempt and then release too early.

You'll probably say now that better design of your code would fix this. But you don't control the threads in your process increasingly any more, various third party libraries will spin up threads and go run stuff in the background out of your control.
 
 
6. You don't care about power consumption.

This one makes little sense to me. Even on an embedded system it should be a case of flipping a few bits and checking if they're flipped.

POSIX byte range locks give you exactly two choices: block until lock granted, or return immediately.

You can't wait for a timeout. You can't be notified when it's become free. You can't do other work whilst being blocked.

Thus you end up either spinning on the lock burning CPU, or launching kernel threads for the sole purpose of waiting on the lock asynchronously.

This hits battery life badly on mobile devices. Lock files are actually cheaper on power budget, which is sad.
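As a rough sketch of the "spin on the lock" workaround mentioned above: since `fcntl()` only offers `F_SETLK` (return immediately) and `F_SETLKW` (block indefinitely), a timeout has to be emulated by polling. `lock_with_timeout` and `open_scratch_file` are hypothetical names, and the 10 ms poll interval is an arbitrary assumption.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstdlib>
#include <thread>

// Poll for an exclusive lock on byte 0 of `fd` until `timeout` expires.
// Returns true if the lock was obtained, false on timeout or error.
bool lock_with_timeout(int fd, std::chrono::milliseconds timeout,
                       std::chrono::milliseconds poll_interval =
                           std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        struct flock lk{};
        lk.l_type = F_WRLCK;
        lk.l_whence = SEEK_SET;
        lk.l_start = 0;
        lk.l_len = 1;
        if (fcntl(fd, F_SETLK, &lk) == 0) return true;
        if (errno != EAGAIN && errno != EACCES) return false;  // genuine error
        if (std::chrono::steady_clock::now() >= deadline) return false;
        // Every wake-up here costs CPU time and power while nothing useful happens.
        std::this_thread::sleep_for(poll_interval);
    }
}

// Create an already-unlinked scratch file for trying the function out.
int open_scratch_file() {
    char path[] = "/tmp/lock-timeout-XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);
    return fd;
}
```

The alternative is a dedicated thread parked in `F_SETLKW`, which avoids the polling but costs a kernel thread per outstanding wait.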
 
 
7. You don't switch between shared and exclusive on non-identical ranges.

That sounds like a potentially bad design too.

My issue is the lack of specification of atomicity. Implementations don't say whether shared to exclusive upgrades are atomic, for identical ranges or overlapping ranges.

It's a common use case to have a region locked for shared use, then you want to lock some or all of it for exclusive use without anybody else modifying it before the exclusive lock is granted. The POSIX API isn't fit for this purpose, it splits and replaces locks instead of laying exclusive over shared. Unhelpful.

You can't upgrade locks at all on Windows interestingly, but you can atomically downgrade them i.e. exclusive -> shared atomically. This isn't as useful, but at least the behaviour is guaranteed.
 
 
8. You never recurse into code which needs to take a lock whilst holding a lock.

Don't deadlock. Multithreading 101

You don't always control such code. What we'd much prefer to see is EDEADLK being returned.
 
 
9. You can guarantee no third party is permuting the bit of filesystem you are using.

Yes. File locking is for interprocess synchronisation. You have to be in control of the processes involved in that.

A major attack vector is generating races by maliciously fiddling with filesystem under IPC usage. TOCTOU etc

You might say "set perms etc" but in fact all that is unnecessary with a non-broken design. I hate to keep chirping on about Windows, but it won't let anybody permute part of a filesystem being used for synchronisation, thus eliminating TOCTOU et al entirely.
 
 
But if you can meet all those conditions, then almost any other form of synchronisation is better and faster. The sole thing which byte range locks have which is useful is that they auto-release if the holding process suddenly exits. That's it.

Wouldn't a lot of other forms of synchronisation have to re-invent advisory locking for themselves to do this?

If the kernel supplied implementation is really lousy - and everywhere but FreeBSD it is - then you're better off.
 
 
I would not trust them to work correctly on a different platform out of the box or even a different file-system
on same platform. That doesn't make it a bad thing to do.
They can be made a little more portable with effort.

AFIO provides four implementations of afio::algorithm::shared_fs_mutex. It makes use of byte range locks to implement those, but mostly solely as sentinels for detecting sudden process exit by another process holding a lock.

Your need for locks here seems to agree with my assessment above?

When you have multiple processes working on the same data, often you need to synchronise. You try not to of course, you exploit the natural synchronisation built into i/o which is actually finally viable as of just these past months thanks to Microsoft fixing Windows to follow POSIX i/o concurrency guarantees. You need the very latest Windows 10 however, and a pretty recent Linux kernel. But it does work, and it's portable-ish.

AFIO uses a list based system for its locking algorithms, so process A says it'll be locking 5, 22 and 13. Process B says it'll be locking 6, 99, and 13. The synchronisation will happen on the 13.

The numbers are arbitrary and mean whatever the application chooses them to mean. Under the bonnet, the four implementations have very different approaches to implementation. Some have amazing performance but are anti-social. Others scale amazingly. Some are NFS/Samba friendly. Some are shared memory only.

afio::shared_fs_mutex is an abstract base class, so runtime code can swap implementations and higher level code doesn't need to care how the locking works.
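To illustrate the list based idea only, and emphatically not AFIO's actual API or implementation, here is a hypothetical sketch that maps each entity number to one byte of a lock file. Sorting the list first means two processes asking for {5, 22, 13} and {6, 99, 13} both reach 13 in a consistent order; `lock_entities` and `set_byte` are invented names.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Set or clear a one-byte POSIX lock at offset `id` of `fd`.
static bool set_byte(int fd, std::uint64_t id, short type) {
    struct flock lk{};
    lk.l_type = type;
    lk.l_whence = SEEK_SET;
    lk.l_start = static_cast<off_t>(id);
    lk.l_len = 1;
    return fcntl(fd, F_SETLK, &lk) == 0;
}

// Try to lock every entity in `ids`; on any failure release what was taken.
bool lock_entities(int fd, std::vector<std::uint64_t> ids) {
    assert(fd >= 0);
    std::sort(ids.begin(), ids.end());  // consistent order across processes
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (!set_byte(fd, ids[i], F_WRLCK)) {
            while (i-- > 0) set_byte(fd, ids[i], F_UNLCK);  // roll back
            return false;
        }
    }
    return true;
}
```

Because these are plain POSIX byte range locks, this sketch inherits every problem listed earlier in the thread; it only shows the shape of the entity list idea.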
 
 
Now, Windows on the other hand really nailed byte range locks beautifully. Correct design. Amazing performance, Scales beautifully. Doesn't burn the CPU. Works with threads. Async API. POSIX literally should take the NT byte range lock design and use it verbatim. It's the right design. Shame about the mandatory locking though, choice of advisory or mandatory would have been better.
 
That is not something I often hear about a Windows API but it could perhaps be a Windows person versus Unix person thing?
Care to elaborate?

The Win32 API is generally pants.

But the NT kernel API is lovely. It's really what VMS 6.0 would have been of course had that team stayed at Digital. Such a lovely kernel API design, really well thought through, and so very polished. I find it a joy to program in, but then I always liked VMS's design language.
 

Unless they've removed it, the Networking TS should implement iostreams integration for pipes. ASIO certainly does. If it's still there, then you're covered by the TS and can untick that use case too.

Niall


Pipes was never ticked for this proposal.

Pipe doesn't appear in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4656.pdf other than in reference to sigpipe.
But I think ASIO was split into bits so they might be in another proposal somewhere.

I think pipe support makes far more sense next to sockets than to files.

Shame about the removal. I have always wished the Networking TS were much lower level than what it is, at a different abstraction level. But that ship has sailed.

Niall

Thiago Macieira

Aug 14, 2017, 10:25:30 PM8/14/17
to std-pr...@isocpp.org
On Monday, 14 August 2017 18:02:57 PDT torto...@gmail.com wrote:
> On Tuesday, 15 August 2017 01:55:46 UTC+1, Thiago Macieira wrote:
> > On Monday, 14 August 2017 17:47:06 PDT torto...@gmail.com wrote:
> > > > 1. If you control all file descriptors in the entire process.
> > >
> > > why wouldn't you? or rather why would you think it was a good idea not
> > > to if you are trying to lock things?
> >
> > When you write a library, you don't control the rest of the process.
>
> True but you delegate the decision to your library's users.
> Or better yet let them set the locking policy on your library.

Easier said than done.

It's ok to say something like "don't install a signal handler for SIGPIPE",
since most applications won't do it and any libraries will most likely just
re-ignore it.

It's probably ok to say "don't use getenv/putenv because it's not thread-safe,
use qgetenv and qputenv because they have an internal mutex".

It's not ok to say "don't open other file descriptors". That's REALLY not
something any application has control over.

Thiago Macieira

Aug 14, 2017, 10:49:01 PM8/14/17
to std-pr...@isocpp.org
On Monday, 14 August 2017 19:22:57 PDT Niall Douglas wrote:
> Consider for example an implementation of filesystem::path::exists() which
> tries to open the path to test for existence. It would open and then close
> a fd. If any code elsewhere in the process has byte range locks open on
> that inode, they get dropped.

Why would it open instead of access() with F_OK?

> Before you say "just use stat() then", you can't in many cases e.g. if the
> filesystem is permuting randomly, because then you can't use paths at all.

The problem here is the attempt to see if the file exists. Why do you want to
know that? If the application wants to know if it can open a file, it should
try to open the file, instead of checking first if it exists, as you can't
open a file that doesn't exist.

That leads to TOCTOU attacks.

> Also, incidentally, there is nothing stopping stat() being implemented by
> your libc as open()-fstat()-close() like it must be on Windows. Again, game
> over thanks to the design of POSIX byte range locks.

It might be implemented on Windows by way of a CreateFile() call (you need a
handle to get some information), but it doesn't follow that you get a file
descriptor from it. And you said yourself that Windows does implement proper
locking, so I don't see how POSIX's behaviour would apply.

That said, you do have a point: no one in any non-trivial application can
control all the file descriptors.

torto...@gmail.com

Aug 15, 2017, 3:17:21 AM8/15/17
to ISO C++ Standard - Future Proposals


On Tuesday, 15 August 2017 03:22:57 UTC+1, Niall Douglas wrote:


The byte range locking API is so severely broken in POSIX as to make it impossible to write correct code with it.

It is possible to write correct code if and only if:

1. If you control all file descriptors in the entire process.

why wouldn't you? or rather why would you think it was a good idea not to if you are trying to lock things?

Almost no software today doesn't make extensive use of third party libraries, often with source code which cannot be easily modified.

And due to POSIX dropping all byte range locks as soon as any fd to that inode is closed, it makes byte range locks inherently problematic.

Consider for example an implementation of filesystem::path::exists() which tries to open the path to test for existence. It would open and then close a fd. If any code elsewhere in the process has byte range locks open on that inode, they get dropped.

Before you say "just use stat() then", you can't in many cases e.g. if the filesystem is permuting randomly, because then you can't use paths at all. Also, incidentally, there is nothing stopping stat() being implemented by your libc as open()-fstat()-close() like it must be on Windows. Again, game over thanks to the design of POSIX byte range locks.
 
I would think not but does that also apply if using access()? 

But there are a couple of questions here. Why would you need to check if a file exists if you've just locked it? Similarly, why would you be separately opening and closing the same file descriptor?

This sounds like a multi-threading issue. I'd agree that posix is weaker where threads are concerned as the locking and signalling APIs predate that.
The expectation is that you use processes instead. It's a fundamentally different philosophy than on windows where processes are expensive and threads are king.
Processes are much cheaper on Linux than windows (that is of course not mandated by posix).

This does create a lot of difficulty in trying to write an application that is portable between windows and Linux.

However, you can work around that by designating only one thread as responsible for each lock. It works a bit less well for signals.


 
2. Files are never big.

The reason for this one is less clear? performance perhaps? How big is big?

struct flock, due to some amazingly bad design, uses signed values, thus rendering the top half of your file unlockable.

You might think that not important. For filesystem algorithm programmers who might use the entire 64 bit space as an open hash table using sparse storage and hole punching, it's a showstopper.

I've also seen implementations fail at offsets after 1<<62 rather than 1<<63. Almost certainly a bug. But not comforting.
 
I guess this would be because you can lock relative to the current file position so a signed value is necessary?

Using a 2^64 address space as a hash table strikes me as a bit of a specialist use case that they wouldn't have had in mind when posix was defined.
But I agree an improved design could support such use cases better.

 
3. Files are never on a network drive.

Yes. File locking 101 - don't use NFS or Samba. Although allegedly it might work better on NFS4 which no-one has properly implemented yet.
I haven't tried it on other remote file systems like sshfs. But I'm sure it would be 'interesting'.

Windows-type oplocks, if implemented correctly, are a much better design.
 
The descriptions here (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365713(v=vs.85).aspx) are a little off-putting.
The description of locking a file by using a function called CreateFile() twice on the same file is far from an intuitive API.

Of course it's the semantics that are of interest here and I need to read more to grok the design.

"For example, in the execution of a batch file, the batch file may be opened and closed once for each line of the file. A batch opportunistic lock opens the batch file on the server and keeps it open. As the command processor "opens" and "closes" the batch file, the network redirector intercepts the open and close commands."
 
Spoken like an OS that isn't littered with thousands of little scripting languages :).

 
4. You don't care about pathological performance occurring (like, single digit grants per second).

That sounds like it should be the program's fault for locking too often. i.e. poor design

No, it's poor quality of implementation in some kernels and/or filing systems. In some cases they scale exponentially inverse to physical CPU count for example. Wonderful.
 
That would be a bug report then. Though one requiring a complete re-design is quite hard to fix.
 
 
5. You don't use threads.

Just be careful about which thread is doing the locking. It's an issue as with many other shareable resources.

It's not that.

POSIX byte range locks are per-inode, and don't detect attempts to lock the same region twice, rather they just ignore the second attempt and then release too early.

You'll probably say now that better design of your code would fix this. But you don't control the threads in your process increasingly any more, various third party libraries will spin up threads and go run stuff in the background out of your control.
 
Do you have any specific examples in mind? This sounds like the realm of map/reduce frameworks but perhaps you are thinking along different lines.
 
6. You don't care about power consumption.

This one makes little sense to me. Even on an embedded system it should be a case of flipping a few bits and checking if they're flipped.

POSIX byte range locks give you exactly two choices: block until lock granted, or return immediately.

You can't wait for a timeout. You can't be notified when it's become free. You can't do other work whilst being blocked.

Thus you end up either spinning on the lock burning CPU, or launching kernel threads for the sole purpose of waiting on the lock asynchronously.

That is a notable absence.
 
This hits battery life badly on mobile devices. Lock files are actually cheaper on power budget, which is sad.
 
Surely a sleeping thread consumes little or no power?
 
7. You don't switch between shared and exclusive on non-identical ranges.

That sounds like a potentially bad design too.

My issue is the lack of specification of atomicity. Implementations don't say whether shared to exclusive upgrades are atomic, for identical ranges or overlapping ranges.

It's a common use case to have a region locked for shared use, then you want to lock some or all of it for exclusive use without anybody else modifying it before the exclusive lock is granted. The POSIX API isn't fit for this purpose, it splits and replaces locks instead of laying exclusive over shared. Unhelpful.

You can't upgrade locks at all on Windows interestingly, but you can atomically downgrade them i.e. exclusive -> shared atomically. This isn't as useful, but at least the behaviour is guaranteed.
 
 
8. You never recurse into code which needs to take a lock whilst holding a lock.

Don't deadlock. Multithreading 101

You don't always control such code. What we'd much prefer to see is EDEADLK being returned.
 
 
9. You can guarantee no third party is permuting the bit of filesystem you are using.

Yes. File locking is for interprocess synchronisation. You have to be in control of the processes involved in that.

A major attack vector is generating races by maliciously fiddling with filesystem under IPC usage. TOCTOU etc

You might say "set perms etc" but in fact all that is unnecessary with a non-broken design. I hate to keep chirping on about Windows, but it won't let anybody permute part of a filesystem being used for synchronisation, thus eliminating TOCTOU et al entirely.
 
 
But if you can meet all those conditions, then almost any other form of synchronisation is better and faster. The sole thing which byte range locks have which is useful is that they auto-release if the holding process suddenly exits. That's it.

Wouldn't a lot of other forms of synchronisation have to re-invent advisory locking for themselves to do this?

If the kernel supplied implementation is really lousy - and everywhere but FreeBSD it is - then you're better off.

Is that down to bugs or does FreeBSD tighten up and improve on the basic posix specification?
 
 
I would not trust them to work correctly on a different platform out of the box or even a different file-system
on same platform. That doesn't make it a bad thing to do.
They can be made a little more portable with effort.

AFIO provides four implementations of afio::algorithm::shared_fs_mutex. It makes use of byte range locks to implement those, but mostly solely as sentinels for detecting sudden process exit by another process holding a lock.

Your need for locks here seems to agree with my assessment above?

When you have multiple processes working on the same data, often you need to synchronise. You try not to of course, you exploit the natural synchronisation built into i/o which is actually finally viable as of just these past months thanks to Microsoft fixing Windows to follow POSIX i/o concurrency guarantees. You need the very latest Windows 10 however, and a pretty recent Linux kernel. But it does work, and it's portable-ish.

AFIO uses a list based system for its locking algorithms, so process A says it'll be locking 5, 22 and 13. Process B says it'll be locking 6, 99, and 13. The synchronisation will happen on the 13.

The numbers are arbitrary and mean whatever the application chooses them to mean. Under the bonnet, the four implementations have very different approaches to implementation. Some have amazing performance but are anti-social. Others scale amazingly. Some are NFS/Samba friendly. Some are shared memory only.

afio::shared_fs_mutex is an abstract base class, so runtime code can swap implementations and higher level code doesn't need to care how the locking works.
 
Have you considered trying to specify an improved OS API that could be aspired to?
It is exactly this kind of implementation experience from which such things tend to drop out.

Niall Douglas

Aug 15, 2017, 11:19:42 AM8/15/17
to ISO C++ Standard - Future Proposals
On Tuesday, August 15, 2017 at 3:49:01 AM UTC+1, Thiago Macieira wrote:
On Monday, 14 August 2017 19:22:57 PDT Niall Douglas wrote:
> Consider for example an implementation of filesystem::path::exists() which
> tries to open the path to test for existence. It would open and then close
> a fd. If any code elsewhere in the process has byte range locks open on
> that inode, they get dropped.

Why would it open instead of access() with F_OK?

access() is an enormous security hole. I really wish it could be eliminated.

open()-close() is slow, but safe. Indeed Windows internally does exactly this when checking for a file to exist.

Niall

Thiago Macieira

Aug 15, 2017, 11:30:00 AM8/15/17
to std-pr...@isocpp.org
On Tuesday, 15 August 2017 08:19:42 PDT Niall Douglas wrote:
> access() is an enormous security hole. I really wish it could be eliminated.

Explain. Are you referring to a TOCTOU attack?

And how do you implement X_OK?

> open()-close() is slow, but safe. Indeed Windows internally does exactly
> this when checking for a file to exist.

For that matter, trying W_OK by opening for write is also a bad idea, since it
modifies the file (atime update) and could run afoul of sharing violation on
Windows.

Niall Douglas

Aug 15, 2017, 11:53:51 AM8/15/17
to ISO C++ Standard - Future Proposals
 
But there are a couple of questions here. Why would you need to check if a file exists if you've just locked it? Similarly, why would you be separately opening and closing the same file descriptor?

It's more that other code, say in a library, may just happen to open-close an inode you have byte range locks open on. At which point, all your locks drop.
 

This sounds like a multi-threading issue. I'd agree that posix is weaker where threads are concerned as the locking and signalling APIs predate that.

It has nothing to do with threads. It has to do with inability to prevent different bits of code interacting in unpredictable ways.

Jeff ... (I forget his last name, he's well known in storage circles) has implemented less broken POSIX locks into recent Linux kernels. They are called OFD locks. They lock by fd, not by inode. A lot better, but still with all the other remaining problems. On the Austin Working Group reflector I have argued for completely new locks. Me and Jeff were supposed to write a paper, but then I got in a work contract, and it fell by the wayside.
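For comparison, a minimal sketch of the OFD locks mentioned above, assuming Linux 3.15 or later and a compiler that defines `_GNU_SOURCE` (g++ does by default), which is what exposes `F_OFD_SETLK` in `<fcntl.h>`. `try_ofd_lock` and `ofd_lock_demo` are illustrative names only.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>

// Try to take an exclusive OFD lock on byte 0 through `fd`.
static bool try_ofd_lock(int fd) {
    struct flock lk{};  // value-initialised: l_pid must be 0 for OFD commands
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 1;
    return fcntl(fd, F_OFD_SETLK, &lk) == 0;
}

struct OfdDemo {
    bool first_locked;               // lock taken through fd1
    bool second_refused;             // a second open() in the same process conflicts
    bool still_refused_after_close;  // closing that second fd did not drop the lock
};

OfdDemo ofd_lock_demo() {
    char path[] = "/tmp/ofd-demo-XXXXXX";
    int fd1 = mkstemp(path);
    assert(fd1 >= 0);
    bool first = try_ofd_lock(fd1);

    // An independent open file description, as library code might create.
    int fd2 = open(path, O_RDWR);
    assert(fd2 >= 0);
    bool second_refused = !try_ofd_lock(fd2);
    close(fd2);  // with classic POSIX locks this close would drop fd1's lock

    int fd3 = open(path, O_RDWR);
    assert(fd3 >= 0);
    bool still_refused = !try_ofd_lock(fd3);
    close(fd3);

    close(fd1);
    unlink(path);
    return {first, second_refused, still_refused};
}
```

The lock belongs to the open file description behind `fd1`, so an unrelated open-close of the same inode elsewhere in the process leaves it in place.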
 
The expectation is that you use processes instead. It's a fundamentally different philosophy than on windows where processes are expensive and threads are king.
Processes are much cheaper on Linux than windows (that is of course not mandated by posix).

That's a misconception. NT processes are actually just as light as POSIX ones. You can fork a process at the NT kernel level no problem.

It's actually all the Win32 stuff. MSVCRT in particular blows up spectacularly if you fork it. But it could be easily fixed if Microsoft wanted to. You'd then find COM, Advapi and all that would then blow up on fork. The Win32 stuff was simply never designed for it.

If you run the Windows Subsystem for Linux, there you'll see fork working lovely. I just wish they'd implement file locks for WSL, right now databases just corrupt in front of you because the Linux syscall emulation flat out lies.
 

This does create a lot of difficulty in trying to write an application that is portable between windows and Linux.

However, you can work around that by designating only one thread as responsible for each lock. It works a bit less well for signals.

Or just use AFIO, which was designed for portable sub-1us latency filesystem algorithms in modern, i.e. threaded, code, and has already been through a Boost peer review, so it's about one quarter of the way to getting into the standard :)
 
 
The descriptions here (https://msdn.microsoft.com/en-us/library/windows/desktop/aa365713(v=vs.85).aspx) are a little off-putting.
The description of locking a file by using a function called CreateFile() twice on the same file is far from an intuitive API.

Of course it's the semantics that are of interest here and I need to read more to grok the design.

"For example, in the execution of a batch file, the batch file may be opened and closed once for each line of the file. A batch opportunistic lock opens the batch file on the server and keeps it open. As the command processor "opens" and "closes" the batch file, the network redirector intercepts the open and close commands."
 
Spoken like an OS that isn't littered with thousands of little scripting languages :).

Ignore the Microsoft documentation. Most of it is wrong.

Trust the Samba documentation and especially its code. It's correct.

Oplocks work very well over a reliable network, but go pathological if even a little packet loss appears. But they're the right approach, they distribute the locking across the network giving optimum (i.e. local) performance whenever possible. Very clever.
 

No, it's poor quality of implementation in some kernels and/or filing systems. In some cases they scale exponentially inverse to physical CPU count for example. Wonderful.
 
That would be a bug report then. Though one requiring a complete re-design is quite hard to fix.

The API provided meets the POSIX specification. So it'll be a wontfix.
 

If the kernel supplied implementation is really lousy - and everywhere but FreeBSD it is - then you're better off.

Is that down to bugs or does FreeBSD tighten up and improve on the basic posix specification?

FreeBSD's kernel leadership really care about the filesystem. That hasn't traditionally been the case in Linux.

FreeBSD contains lots of great improvements over POSIX e.g. kqueues, which if it weren't for Linus then Linux would also have gained years ago because it's the correct design. I really wish POSIX would just standardise it already, but it's contentious. A lot of influential people oppose kqueues.

FreeBSD also contains a best in class file i/o implementation. Very close to the POSIX spec. Very high quality implementation of locks etc. Async i/o design throughout, auto DMA friendly like Windows, it's easy to write high performance filesystem code for Windows and FreeBSD which performs very similarly.

Linux is and has always been the worst to support of them all. Highly nonconforming, lousy implementation of locks, doesn't implement O_SYNC right, the list goes on. Though its O_DIRECT is now finally sane and can be trusted, again, despite Linus' best efforts.

OS X is infamous for frequent showstopper bugs like writes to a mmap not being written to disc ever. But once reported, they do fix them quick. And apart from the awful async i/o implementation, OS X is pretty good otherwise, though they took forever to implement the race free POSIX API. Oh, and fsync is non blocking :(.
  

When you have multiple processes working on the same data, often you need to synchronise. You try not to of course, you exploit the natural synchronisation built into i/o which is actually finally viable as of just these past months thanks to Microsoft fixing Windows to follow POSIX i/o concurrency guarantees. You need the very latest Windows 10 however, and a pretty recent Linux kernel. But it does work, and it's portable-ish.

AFIO uses a list based system for its locking algorithms, so process A says it'll be locking 5, 22 and 13. Process B says it'll be locking 6, 99, and 13. The synchronisation will happen on the 13.

The numbers are arbitrary and mean whatever the application chooses them to mean. Under the bonnet, the four implementations have very different approaches to implementation. Some have amazing performance but are anti-social. Others scale amazingly. Some are NFS/Samba friendly. Some are shared memory only.

afio::shared_fs_mutex is an abstract base class, so runtime code can swap implementations and higher level code doesn't need to care how the locking works.
 
Have you considered trying to specify an improved OS API that could be aspired to?
It is exactly this kind of implementation experience from which such things tend to drop out.

I'm new to WG21, but have been involved in standards for a long time. Indeed, I was once the SC22 mirror convenor for my country.

But ultimately it comes down to spare capacity. You see lots of progress when I'm between contracts like now. But having been without income since Christmas, another 12-18 months of paying work beckons soon during which there will be no progress. Gotta go earn. I've been working on AFIO since 2012. I expect it won't land before WG21 until 2020 or so. Best I can do without a sponsor.

And we can't persuade POSIX to change until some major implementation makes the changes existing practice. C++ counts. So, if we standardise AFIO into C++, that lets us fix POSIX. Until then they'll reject all changes as having no existing practice.

Niall

Niall Douglas

Aug 15, 2017, 11:59:25 AM8/15/17
to ISO C++ Standard - Future Proposals
On Tuesday, August 15, 2017 at 4:30:00 PM UTC+1, Thiago Macieira wrote:
On Tuesday, 15 August 2017 08:19:42 PDT Niall Douglas wrote:
> access() is an enormous security hole. I really wish it could be eliminated.

Explain. Are you referring to a TOCTOU attack?

Mainly. But access() is problematic in lots of other ways too. One should really not implement exists() with it, it's the wrong API.
 

> open()-close() is slow, but safe. Indeed Windows internally does exactly
> this when checking for a file to exist.

For that matter, trying W_OK by opening for write is also a bad idea, since it
modifies the file (atime update) and could run afoul of sharing violation on
Windows.

Just open the file without any permissions to do anything including update metadata (which is an individual permission on Windows). Windows implements this with a special quick fast path, normal NtCreateFile() takes tens of microseconds, unprivileged NtCreateFile() is usually single digit microseconds.

POSIX has a problem of course because O_RDONLY is usually zero.

Niall

Thiago Macieira

Aug 15, 2017, 1:47:24 PM8/15/17
to std-pr...@isocpp.org
On Tuesday, 15 August 2017 08:59:24 PDT Niall Douglas wrote:
> On Tuesday, August 15, 2017 at 4:30:00 PM UTC+1, Thiago Macieira wrote:
> > On Tuesday, 15 August 2017 08:19:42 PDT Niall Douglas wrote:
> > > access() is an enormous security hole. I really wish it could be
> > > eliminated.
> >
> > Explain. Are you referring to a TOCTOU attack?
>
> Mainly. But access() is problematic in lots of other ways too. One should
> really not implement exists() with it, it's the wrong API.

The TOCTOU attack is not the fault of the library or the access() function,
but that of the upper code that checked for existence before trying the
operation it was going to do if the file existed.

So you have two choices: provide a function that returns the existence of the
file or force the user to not check for it. If you choose to provide it, then
you have to use either access(F_OK) or stat(), but the latter is more
expensive than the former.

> > > open()-close() is slow, but safe. Indeed Windows internally does exactly
> > > this when checking for a file to exist.
> >
> > For that matter, trying W_OK by opening for write is also a bad idea,
> > since it modifies the file (atime update) and could run afoul of sharing
> > violation on Windows.
>
> Just open the file without any permissions to do anything including update
> metadata (which is an individual permission on Windows). Windows implements
> this with a special quick fast path, normal NtCreateFile() takes tens of
> microseconds, unprivileged NtCreateFile() is usually single digit
> microseconds.
>
> POSIX has a problem of course because O_RDONLY is usually zero.

You can even emulate F_OK on an unreadable file by checking that you got an
EPERM or EACCES error instead of ENOENT or ENAMETOOLONG or ELOOP. The problem
is what happens if you succeed and the file is a FIFO: you can't close it
again, lest the other side of the pipe be notified of the closure.
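
A minimal sketch of the open()-based existence check described above, assuming a POSIX system. The helper name exists_by_open is hypothetical, and O_NONBLOCK is an assumption added here so that opening a FIFO with no writer does not block; it does nothing about the notification problem mentioned above.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not from the thread): reports whether `path` names an
// existing file by trying to open it rather than calling access(F_OK).
bool exists_by_open(const char* path)
{
    // O_NONBLOCK is an assumption of this sketch, see the note above.
    int fd = ::open(path, O_RDONLY | O_NONBLOCK);
    if (fd >= 0) {
        ::close(fd);            // the file exists and was readable
        return true;
    }
    // The open was refused for permission reasons, so the name is taken to
    // exist. Caveat: EACCES is also reported when a directory in the path
    // is not searchable, in which case existence is really unknown.
    if (errno == EACCES || errno == EPERM)
        return true;
    // ENOENT, ENAMETOOLONG, ELOOP and the rest are treated as "not there".
    return false;
}
```

The result is still only a snapshot: the file can appear or vanish immediately after the call returns.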

That doesn't solve the problem of X_OK.

Niall Douglas

Aug 15, 2017, 5:32:09 PM8/15/17
to ISO C++ Standard - Future Proposals
 
> > Mainly. But access() is problematic in lots of other ways too. One should
> > really not implement exists() with it, it's the wrong API.
>
> The TOCTOU attack is not the fault of the library or the access() function,
> but that of the upper code that checked for existence before trying the
> operation it was going to do if the file existed.
>
> So you have two choices: provide a function that returns the existence of the
> file or force the user to not check for it. If you choose to provide it, then
> you have to use either access(F_OK) or stat(), but the latter is more
> expensive than the former.

Neither choice is sound.

The correct choice is to ban the use of absolute paths everywhere except the open() syscall. That encourages programmers to not write racy code.

You'll see this in AFIO. We never store a path. Paths = race conditions. It also means that all of AFIO's classes have trivial storage, which in turn means no mallocs and no performance surprises which is important when you have a 1 microsecond budget.

Also, absolute paths are slow. Most kernels implement a mutex per inode, so traversing a fifteen-deep path means at least fifteen mutex locks and unlocks, and inode mutexes tend to be of the heavier RW type. FreeBSD even implements a "path cache" to speed up path-to-inode lookups for recently traversed paths, which adds yet more overhead.

Finally, the very fastest way of checking for existence of a file is actually to enumerate its containing directory via faccessat on POSIX or a single file globbed enumeration call on Windows. Orders of magnitude faster than access() or anything which works with paths.
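
For reference, a minimal sketch of a directory-relative existence check with faccessat() on POSIX. Strictly speaking faccessat() performs a lookup relative to an already-open directory descriptor rather than an enumeration; the saving is that the directory's own path is resolved only once. The helper names are hypothetical.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: looks up `name` relative to an open directory
// descriptor, so no absolute path is walked for each query.
bool exists_in_dir(int dirfd, const char* name)
{
    return ::faccessat(dirfd, name, F_OK, 0) == 0;
}

// Hypothetical convenience wrapper: opens the directory, does one lookup and
// closes it again. Real code would keep dirfd open and reuse it.
bool exists_in_dir_path(const char* dir, const char* name)
{
    int dirfd = ::open(dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0)
        return false;
    bool found = exists_in_dir(dirfd, name);
    ::close(dirfd);
    return found;
}
```

As with any existence check, the answer can be stale by the time the caller acts on it.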

Niall

Thiago Macieira

Aug 15, 2017, 6:47:17 PM8/15/17
to std-pr...@isocpp.org
On Tuesday, 15 August 2017 14:32:08 PDT Niall Douglas wrote:
> > So you have two choices: provide a function that returns the existence of
> > the
> > file or force the user to not check for it. If you choose to provide it,
> > then
> > you have to use either access(F_OK) or stat(), but the latter is more
> > expensive than the former.
>
> Neither choice is sound.
>
> The correct choice is to ban the use of absolute paths everywhere except
> the open() syscall. That encourages programmers to not write racy code.

That is telling the programmer to open the file, not check if it exists or is
readable without opening it.

If you have an API that allows checking for the existence without opening,
then it is racy. But if you have that API, there's nothing better than access
or faccessat.

> You'll see this in AFIO. We never store a path. Paths = race conditions. It
> also means that all of AFIO's classes have trivial storage, which in turn
> means no mallocs and no performance surprises which is important when you
> have a 1 microsecond budget.

Path, whether relative or absolute, is not the issue. I was reading Raymond
Chen's post yesterday on hardlinking, which is relevant to this discussion:
https://blogs.msdn.microsoft.com/oldnewthing/20170707-00/?p=96555

One of the commenters asks
"According to the well-known UX guidelines (context) menu entries which can’t
be executed should not be shown to the user. [...] So: how should a proper
implemented shell extension handle it?"

That's similar to the case here. Sometimes the user needs to know if a file
exists before trying to open it. I know it's racy, but if the UX requires it,
the programmer needs to implement it. That means the library needs to provide
an API to check if the file exists.

The fact that you checked that the file is readable does not imply you will be
able to open it later. Regardless of how it's implemented.

Niall Douglas

Aug 15, 2017, 10:26:20 PM8/15/17
to ISO C++ Standard - Future Proposals

> > You'll see this in AFIO. We never store a path. Paths = race conditions. It
> > also means that all of AFIO's classes have trivial storage, which in turn
> > means no mallocs and no performance surprises which is important when you
> > have a 1 microsecond budget.
>
> Path, whether relative or absolute, is not the issue.

People tend to use absolute paths even though they are (a) slower and (b) racier.

I'm a great believer that programmers are lazy and will take the least-effort approach. So AFIO makes using absolute paths annoying, and relative paths easy. Plus we sprinkle over the docs "don't use absolute paths unless you really have to".
 
> I was reading Raymond
> Chen's post yesterday on hardlinking, which is relevant to this discussion:
> https://blogs.msdn.microsoft.com/oldnewthing/20170707-00/?p=96555

His answer, incidentally, is severely flawed. NTFS has a relatively low limit of 1023 hard links per file. It's easy to run into, and would confound his "easy" solution.
 

> One of the commenters asks
> "According to the well-known UX guidelines (context) menu entries which can’t
> be executed should not be shown to the user. [...] So: how should a proper
> implemented shell extension handle it?"
>
> That's similar to the case here. Sometimes the user needs to know if a file
> exists before trying to open it. I know it's racy, but if the UX requires it,
> the programmer needs to implement it. That means the library needs to provide
> an API to check if the file exists.
 
I am not disagreeing. That API ought to be enumeration of the containing directory i.e. faccessat(). It's the least worst design.

Niall

torto...@gmail.com

Aug 21, 2017, 1:07:37 PM8/21/17
to ISO C++ Standard - Future Proposals
Earlier in this thread it was suggested I survey some existing implementations of std::basic_filebuf to see what they do.
I have done some work in this direction and the results are below.

I've stuck to open source implementations I could easily get hold of, plus Visual C++ as the big player on Windows and the prime non-POSIX exemplar.


libstdc++ from GNU g++

filebuf is implemented on top of a file handle class "__basic_file<char>"
which is implemented on top of __c_file* which turns out to be a typedef for FILE*

It also provides out of the box two extensions relevant here:

libstdc++-v3/include/ext/stdio_filebuf.h  - implemented over FILE* and POSIX integer FD
libstdc++-v3/include/ext/stdio_sync_filebuf.h  - implemented directly over FILE*

stdio_filebuf is derived from std::basic_filebuf; stdio_sync_filebuf is derived from std::basic_streambuf

stdio_filebuf has operations:
      constructor accepting POSIX fd
      constructor accepting FILE*
      fd()    - to return the POSIX fd
      file()   - to return a C FILE*

stdio_sync_filebuf has:
      constructor accepting FILE*
      file()   - to return a C FILE*

The 'sync' variant is unbuffered and forwards every operation to the underlying FILE*, which keeps it synchronised with C stdio (hence the name).
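
As an illustration of the libstdc++ extension, here is a minimal sketch that wraps a POSIX descriptor in __gnu_cxx::stdio_filebuf and reads the descriptor and FILE* back. It only compiles against libstdc++; the function name and the temporary file path are hypothetical.

```cpp
#include <ext/stdio_filebuf.h>   // libstdc++ extension, not standard C++
#include <cstdio>
#include <cstdlib>
#include <ostream>
#include <unistd.h>

// Wraps a descriptor in stdio_filebuf, writes through an ostream, and reports
// whether fd() and file() expose the underlying handles.
bool roundtrip_fd_through_stdio_filebuf()
{
    char path[] = "/tmp/filebuf_demoXXXXXX";
    int fd = ::mkstemp(path);            // creates and opens a unique file
    if (fd < 0)
        return false;

    bool ok = false;
    {
        // When constructed from a descriptor, the filebuf closes it on
        // destruction, so there is no explicit close() below.
        __gnu_cxx::stdio_filebuf<char> buf(fd, std::ios_base::out);
        std::ostream out(&buf);
        out << "hello\n";
        out.flush();
        ok = (buf.fd() == fd) && (buf.file() != nullptr);
    }
    ::unlink(path);
    return ok;
}
```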


clang


fstream is implemented in terms of FILE*

libcxx-4.0.1.src/include/fstream
    FILE* __file_;

It provides neither a constructor accepting a FILE* nor a getter function.


apache stdcxx (the Roguewave standard library implementation was imported in 2007)

implementation is based on FILE*

std::filebuf has (non-standard) constructors accepting either FILE* or FD
and functions:

int fd()  -  returning a POSIX fd
attach(int fd) - to attach to an existing fd
detach() - to detach from and close the underlying fd


open watcom v2

not sure this is widely used but it is at least open source and therefore easy to assess.

filebuf is implemented in terms of filedesc, which is an integer fd.

It has attach() and fd() methods accepting and returning a POSIX style FD.


visual c++
From MSDN you can see the API but not the private members.
Surprisingly, access to POSIX-style integer FDs appears to be supported out of the box:

filedesc fd() const;
filebuf* attach( filedesc fd );
https://msdn.microsoft.com/en-us/library/aa243812(v=vs.60).aspx
https://msdn.microsoft.com/en-us/library/aa243810(v=vs.60).aspx

I can't actually find these in the implementation so it may be squirrelled away in the POSIX subsystem.

To access the implementation I had to download and install the VC++ build tools on Windows

fstream contains basic_filebuf implemented over _Filet*

_Filet turns out to be:

yvals.h:
  #define _Filet FILE

Not a HANDLE in sight!

I would be interested in data points from other implementations to which I don't have access
or pointers to others I could easily find online and should consider.

To summarise my admittedly far from thorough implementation review it would seem that:

Posix only implementations are typically based on integer FDs.
All other implementations are ultimately based on FILE*

Accessing the underlying implementation is a common extension.
g++ has required it to be via a downcast as suggested earlier.
The others that allow it, allow it directly.

A separate stdio_filebuf was previously justified on the grounds that implementations should not be forced to be based on FILE*.
Does this still stand?

I have not found an implementation where FILE* would not be accessible in principle.
If we have a POSIX FD we can (mostly) get a FILE* and vice versa.

Does this mean we don't need native_filebuf either?

Another justification for stdio_filebuf and native_filebuf would be to keep the interface of filebuf free
from platform & implementation dependent parts.
Does FILE* count as it is part of cstdio?

What harm would exposing a FILE* API directly cause?
Perhaps "who would be harmed?" is a better question as I don't think the implementations I've surveyed so far would be.

Likewise what harm would exposing an FD based API if _POSIX_C_SOURCE is defined cause?

Bruce.






torto...@gmail.com

Aug 22, 2017, 4:11:18 AM8/22/17
to ISO C++ Standard - Future Proposals
Note that I am not advocating one approach over the other but require arguments to justify additional complexity.

torto...@gmail.com

Aug 22, 2017, 9:20:53 PM8/22/17
to ISO C++ Standard - Future Proposals
Earlier in this thread it was suggested that requiring std::filebuf to be a stdio_filebuf would unnecessarily constrain the implementation.
I am not sure that this is the case. My reasoning is as follows. Please point out if and where it is flawed.

assertion: iostreams must be implemented in terms of FILE* or a lower level 'native' file descriptor

consider any hypothetical operating system that implements file system IO.
At minimum it must provide an interface like the following (ignoring many embellishments that would be necessary):

   FileHandle os_file_open(const char* file);
   int os_file_read(FileHandle, size_t size, void* buffer);
   int os_file_write(FileHandle, size_t size, void* buffer);

FileHandle could be:
* an integer (POSIX-style fd)
* a pointer to something (OSHandle*, not necessarily FILE*)
* a value type "struct OSHandle"

How does this OS implement cstdio?
In particular consider:

FILE* fopen(const char *path, const char *mode);

There must be a mapping from FileHandle to FILE*
E.g.

FILE* fopen(const char *path, const char *mode)
{
   FileHandle handle = os_file_open(path);  //ignore mode for simplicity
   return CFile_from_FileHandle(handle);
}

Now consider the implementation of std::filebuf

filebuf* open (const char* filename,  ios_base::openmode mode);

filebuf must either contain a FileHandle in order to use the low-level functions:

class filebuf
{
public:
   filebuf* open (const char* filename,  ios_base::openmode mode)
   {
      this->handle = os_file_open(filename); //ignoring mode for brevity
      return this;
   }
private:
   FileHandle handle;
};

or it can use cstdio FILE*:

class filebuf
{
public:
   filebuf* open (const char* filename,  ios_base::openmode mode)
   {
      this->cfile = fopen(filename, to_stdio_mode(mode)); //to_stdio_mode maps openmode to an fopen() mode string
      return this;
   }
private:
   FILE* cfile;
};

Given that CFile_from_FileHandle(handle) must exist, it is trivial to implement:

FILE* std::filebuf::file() const
{
   return CFile_from_FileHandle(this->handle);
}

or:

FILE* std::filebuf::file() const
{
   return this->cfile;
}

as necessary.

Likewise to implement:

   size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

in terms of:

   int os_file_read(FileHandle, size_t size, void* buffer);

there must be functionality to obtain a FileHandle from a FILE*, for example:

size_t fread(void *ptr, size_t size, size_t nmemb,
              FILE *stream)
{
   return os_file_read(FileHandle_from_CFile(stream), size, ptr); //nmemb ignored for brevity
}

This means it should be straightforward to implement a std::filebuf constructor taking a FILE*.

By the same logic #ifdef _POSIX_C_SOURCE should imply accessing the FD is equally straightforward.

I see no constraints on the implementation here.

Is filebuf <--> FILE* a dangerously leaky abstraction?

filebuf <--> fd is 'dangerous' because an fd might not refer to a file. It could be a plain integer or a socket, fifo or pipe, for example.

C++ permits us to be low-level and accept the consequences. Is there actually any need to wrap these conversions to make them safer?
I think the case can be made for an FD, since a naked integer is not type-safe, but not for anything else.

If I were to write this as a proposal, what do stdio_filebuf and/or native_filebuf make possible or prevent that altering std::filebuf would not?

Bruce.

T. C.

Aug 23, 2017, 1:47:14 AM8/23/17
to ISO C++ Standard - Future Proposals
The problem with your argument is that CFile_from_FileHandle need not return the same FILE* when called multiple times with the same file handle value, but your file() had better return the same value when called multiple times.

torto...@gmail.com

Aug 23, 2017, 3:47:55 AM8/23/17
to ISO C++ Standard - Future Proposals


On Wednesday, 23 August 2017 06:47:14 UTC+1, T. C. wrote:
> The problem with your argument is that CFile_from_FileHandle need not return the same FILE* when called multiple times with the same file handle value, but your file() had better return the same value when called multiple times.

Can you give an example where it legitimately could not?


T. C.

Aug 23, 2017, 4:11:42 AM8/23/17
to ISO C++ Standard - Future Proposals
CFile_from_FileHandle is not a trivial mapping. A FILE* points to a FILE, which, while opaque in the standard, typically contains additional data beyond the native file handle (e.g., pointers to buffers). To implement fopen(), it suffices to malloc() a FILE, initialize it appropriately and return a pointer to it (freeing it in fclose()). You can't do that in your file().

So...how do you plan to do it without maintaining a global fd-to-FILE* map?
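
The point can be seen with POSIX fdopen(), which behaves like the malloc-a-FILE scheme described above: two calls on the same descriptor yield two distinct FILE objects. A minimal sketch, with a hypothetical function name and temporary path:

```cpp
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Returns true if two fdopen() calls on one descriptor give different FILE*.
bool fdopen_returns_distinct_streams()
{
    char path[] = "/tmp/fdopen_demoXXXXXX";
    int fd = ::mkstemp(path);
    if (fd < 0)
        return false;

    FILE* first  = ::fdopen(fd, "r+");
    FILE* second = ::fdopen(fd, "r+");
    bool distinct = first && second && first != second;

    // Both streams own the same descriptor. The first fclose() closes it; the
    // second one's close() fails with EBADF but still frees the FILE. That is
    // tolerable in a single-threaded sketch and a bug in real code.
    if (first)  std::fclose(first);
    if (second) std::fclose(second);
    if (!first && !second) ::close(fd);
    ::unlink(path);
    return distinct;
}
```

Each FILE also carries its own buffer, so a file() accessor built this way would hand out streams that do not see each other's pending data.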

torto...@gmail.com

Aug 23, 2017, 4:23:12 AM8/23/17
to ISO C++ Standard - Future Proposals
The C runtime for the OS must do that anyway to implement stdio. The OS could have low-level functionality of its own to do that. In a C based OS they may well be one and the same but it is not required.

T. C.

Aug 23, 2017, 4:39:38 AM8/23/17
to ISO C++ Standard - Future Proposals


On Wednesday, August 23, 2017 at 4:23:12 AM UTC-4, Bruce Adams wrote:
> On Wednesday, 23 August 2017 09:11:42 UTC+1, T. C. wrote:
> > On Wednesday, August 23, 2017 at 3:47:55 AM UTC-4, Bruce Adams wrote:
> > > On Wednesday, 23 August 2017 06:47:14 UTC+1, T. C. wrote:
> > > > The problem with your argument is that CFile_from_FileHandle need not return the same FILE* when called multiple times with the same file handle value, but your file() had better return the same value when called multiple times.
> > >
> > > Can you give an example where it legitimately could not?
> >
> > CFile_from_FileHandle is not a trivial mapping. A FILE* points to a FILE, which, while opaque in the standard, typically contains additional data beyond the native file handle (e.g., pointers to buffers). To implement fopen(), it suffices to malloc() a FILE, initialize it appropriately and return a pointer to it (freeing it in fclose()). You can't do that in your file().
> >
> > So...how do you plan to do it without maintaining a global fd-to-FILE* map?
>
> The C runtime for the OS must do that anyway to implement stdio.

The C runtime does not need to maintain a data structure that allows you to lookup a FILE* given a file handle. Your CFile_from_FileHandle need only be called once per file handle, in fopen. Everything in the C library receives FILE*, and to call into the OS it suffices to be able to retrieve the file handle from a FILE*, not the other way around.

If you still insist that this is possible, can you provide a toy implementation on, say, linux+glibc?


torto...@gmail.com

Aug 23, 2017, 2:54:47 PM8/23/17
to ISO C++ Standard - Future Proposals
With regards to glibc there is no need for a toy implementation. The current implementation can already do it - see libstdc++-v3/include/ext/stdio_filebuf.h
Indeed I suspect it's trivial (but open to misuse) on any sane POSIX implementation supporting fileno() & fdopen().

You seem to be asserting that the mapping is not symmetrical. That may be difficult to achieve in practice.
I have only used Handle->CFile. You seem to be suggesting that CFile->Handle is trivial but Handle->CFile is not.
How would you implement fopen() without Handle->CFile?

What could be different in the code used by std::fopen() that makes it illegal to use in filebuf::file() ?


 

T. C.

Aug 23, 2017, 3:00:22 PM8/23/17
to ISO C++ Standard - Future Proposals


On Wednesday, August 23, 2017 at 2:54:47 PM UTC-4, Bruce Adams wrote:
> With regards to glibc there is no need for a toy implementation.

A toy implementation that only stores a file descriptor and converts to FILE* on demand. stdio_filebuf is not that.


> What could be different in the code used by std::fopen() that makes it illegal to use in filebuf::file() ?

I told you what. Several times. fopen can just malloc a FILE and return a pointer to that. Likewise for fdopen. Your file() cannot, because multiple calls to it need to return the same FILE*.

torto...@gmail.com

Aug 23, 2017, 7:57:43 PM8/23/17
to ISO C++ Standard - Future Proposals


On Wednesday, 23 August 2017 20:00:22 UTC+1, T. C. wrote:
> On Wednesday, August 23, 2017 at 2:54:47 PM UTC-4, Bruce Adams wrote:
> > With regards to glibc there is no need for a toy implementation.
>
> A toy implementation that only stores a file descriptor and converts to FILE* on demand. stdio_filebuf is not that.

You are suggesting a toy implementation is more useful than a real working implementation?
 

> > What could be different in the code used by std::fopen() that makes it illegal to use in filebuf::file() ?
>
> I told you what. Several times. fopen can just malloc a FILE and return a pointer to that. Likewise for fdopen. Your file() cannot, because multiple calls to it need to return the same FILE*.

Okay, let's work through that scenario:

FILE* fopen(const char *path, const char *mode)
{
   FileHandle handle = os_file_open(path);  //ignore mode for simplicity
   FILE* stdHandle = NULL;
   mallocate(stdHandle);
   initialiseStdioHandleFromOsHandle(stdHandle, handle);
   return stdHandle;
}

Let's ignore the case where std::filebuf is implemented on top of FILE* and assume it uses FileHandle:

class filebuf
{
public:
   filebuf():
      handle(),
      stdHandle(nullptr)
   {
   }

   filebuf* open (const char* filename,  ios_base::openmode mode)
   {
      this->handle = os_file_open(filename); //ignoring mode for brevity
      return this;
   }

   FILE* file()
   {
      if (this->stdHandle == nullptr)
      {
         mallocate(this->stdHandle);
         initialiseStdioHandleFromOsHandle(this->stdHandle, this->handle);
      }
      return this->stdHandle;
   }

private:
   FileHandle handle;
   FILE* stdHandle;
};

Okay so I've been forced to add a FILE* member which is initialised on demand.
That is one possible implementation with a slight bloat to filebuf.
Is that a constraint too far?
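
A runnable sketch of the same lazy-initialisation idea on POSIX, assuming the native handle is an int descriptor and fdopen() stands in for mallocate() plus initialiseStdioHandleFromOsHandle(). The class fd_holder and the function file_is_stable are hypothetical; only the handle bookkeeping is modelled, not the streambuf itself.

```cpp
#include <cstdio>
#include <cstdlib>
#include <unistd.h>

// Hypothetical stand-in for a descriptor-based filebuf.
class fd_holder
{
public:
    explicit fd_holder(int fd) : handle_(fd), stream_(nullptr) {}
    fd_holder(const fd_holder&) = delete;
    fd_holder& operator=(const fd_holder&) = delete;

    ~fd_holder()
    {
        if (stream_)
            std::fclose(stream_);   // also closes handle_
        else if (handle_ >= 0)
            ::close(handle_);
    }

    int fd() const { return handle_; }

    // Creates the FILE* on first use and returns the cached pointer afterwards.
    FILE* file()
    {
        if (!stream_)
            stream_ = ::fdopen(handle_, "r+");
        return stream_;
    }

private:
    int   handle_;
    FILE* stream_;
};

// Returns true if repeated file() calls hand back the same FILE*.
bool file_is_stable()
{
    char path[] = "/tmp/lazy_fileXXXXXX";
    int fd = ::mkstemp(path);
    if (fd < 0)
        return false;
    fd_holder holder(fd);
    FILE* a = holder.file();
    FILE* b = holder.file();
    ::unlink(path);
    return a != nullptr && a == b;
}
```

This gives a stable FILE* per object, but it does nothing to coordinate the FILE's buffer with writes made directly through the descriptor.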

Now there is a problem with this if you want to add a filebuf constructor from a native FileHandle which already maps to some other FILE* object.
i.e.
  FILE* myFile = fopen(fileName...);
  FileHandle handle = getNativeHandle(myFile);
  filebuf myBuff(handle);
  FILE* myBuffsFile = myBuff.file();
  assert(myBuffsFile != myFile);

However, if you know that's true you would be better off using a FILE* constructor:

filebuf::filebuf(FILE* stdFile):
   handle(getNativeHandle(stdFile)),
   stdHandle(stdFile)
{
}

Thiago Macieira

Aug 23, 2017, 8:29:23 PM8/23/17
to std-pr...@isocpp.org
On Wednesday, 23 August 2017 16:57:43 PDT torto...@gmail.com wrote:
> That is one possible implementation with a slight bloat to filebuf.
> Is that a constraint too far?

There's one very important difference between using FILE* and using the
underlying OS: the stdio buffers. Unless you remember to flush, this may come
out in the wrong order:

fwrite("Hello", 5, 1, buf->file());
buf->sputn("World", 5);

An implementation that always uses FILE* will always be synchronised. An
implementation that can run without FILE* will need to have two codepaths: one
for when it doesn't have FILE* and one for when it does. That leads me to the
conclusion that it's better to just use FILE* directly and avoid the native
codepath.
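
The reordering is easy to reproduce on POSIX by mixing fwrite() on a fully buffered FILE* with write() on its descriptor. A minimal sketch, with a hypothetical function name and temporary path; the observed order assumes a typical stdio implementation such as glibc that holds small writes in its buffer until flushed.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Writes "Hello" through stdio and "World" through the raw descriptor, then
// returns what actually landed in the file.
std::string interleave_stdio_and_fd_writes()
{
    char path[] = "/tmp/order_demoXXXXXX";
    int fd = ::mkstemp(path);
    if (fd < 0)
        return {};
    FILE* f = ::fdopen(fd, "w+");
    if (!f) { ::close(fd); ::unlink(path); return {}; }
    std::setvbuf(f, nullptr, _IOFBF, BUFSIZ);   // make full buffering explicit

    std::fwrite("Hello", 1, 5, f);              // held in the stdio buffer
    ssize_t n = ::write(fd, "World", 5);        // reaches the file immediately
    (void)n;
    std::fflush(f);                             // only now is "Hello" written

    char buf[32];
    ssize_t got = ::pread(fd, buf, sizeof buf, 0);
    std::string contents(buf, got > 0 ? static_cast<size_t>(got) : 0);
    std::fclose(f);
    ::unlink(path);
    return contents;
}
```

Calling fflush() before the raw write() restores the intended order.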

torto...@gmail.com

Aug 24, 2017, 3:52:04 AM8/24/17
to ISO C++ Standard - Future Proposals
Or to have separate stdio_filebuf and native_filebuf classes

but which is std::filebuf or is it a different beast altogether?

[we also have sync_with_stdio() for that case though that only applies to the standard streams so is easy to special case.
Besides which they are viewed as iostreams not fstreams regardless of implementation or they'd be in the fstream header]
 
I'm not sure you can mandate a native_filebuf as it is inherently implementation dependent.
For an implementation based on FILE* it would be redundant.

On the other hand if you know you have a FILE* you can use setvbuf and its ilk to control the use of buffering
and thus whether you need to sync.
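
For instance, switching a stream to unbuffered mode with setvbuf() means each fwrite() reaches the descriptor immediately, so stdio writes and raw write() calls keep their program order. A minimal POSIX sketch with a hypothetical function name and temporary path:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Mixes stdio and descriptor writes on an unbuffered stream and returns the
// resulting file contents.
std::string unbuffered_interleave()
{
    char path[] = "/tmp/setvbuf_demoXXXXXX";
    int fd = ::mkstemp(path);
    if (fd < 0)
        return {};
    FILE* f = ::fdopen(fd, "w+");
    if (!f) { ::close(fd); ::unlink(path); return {}; }

    // Must be called before any other operation on the stream.
    std::setvbuf(f, nullptr, _IONBF, 0);

    std::fwrite("Hello", 1, 5, f);          // written straight through
    ssize_t n = ::write(fd, "World", 5);    // follows it in the file
    (void)n;

    char buf[32];
    ssize_t got = ::pread(fd, buf, sizeof buf, 0);
    std::string contents(buf, got > 0 ? static_cast<size_t>(got) : 0);
    std::fclose(f);
    ::unlink(path);
    return contents;
}
```

The price is one system call per fwrite(), which is the trade-off buffering exists to avoid.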

I guess a subtle but important difference between a stdio_filebuf and a native_filebuf is that the buffering is (I think) always on by default
for stdio. You can't do the equivalent of setvbuf() when you fopen() a file but the buffers could be lazily constructed.

I'm no longer sure what the right thing to put in a proposal (other than the choices) should be.

torto...@gmail.com

Aug 24, 2017, 4:44:55 AM8/24/17
to ISO C++ Standard - Future Proposals
I guess the problem is that this lies on the cusp of C interoperability, low-level OS interfacing, POSIX interoperability & C++.

I'll bite the bullet and have one more crack at it. Here's the new idea:

1)
require a new stdio_filebuf supporting construction from a FILE* and a FILE* file() const method to extract it.    - I don't think this is controversial

2)
require a new native_filebuf supporting construction using native_filedescriptor and a getDescriptor method to obtain it.
FileDescriptor is left implementation defined.

3)
#ifdef _POSIX_C_SOURCE

introduce a posix namespace and require

// a non-naked file descriptor
struct std::posix::file_descriptor
{
   int fileDescriptor;
};

A std::posix::filebuf class having a constructor allowing a std::posix::file_descriptor in its constructor
and returning the same via a fileno() method.

This is more controversial as there is currently no C++ POSIX binding, but why not start here?

4)
It is implementation defined whether std::filebuf is the same as (i.e. a typedef of) std::stdio_filebuf or std::native_filebuf or something else.

It is implementation defined whether std::native_filebuf is the same as (i.e. a typedef of) std::posix::filebuf

It is implementation defined whether std::file_descriptor is the same as (i.e. a typedef of) std::posix::file_descriptor where _POSIX_C_SOURCE is defined

It is unsafe to rely on this in portable code.

5)
Conversion between the various filebuf types.

Direct conversion is not permitted.

A std::stdio_filebuf may be constructed from the other types if also given a FILE*
A std::native_filebuf may be constructed from the other types if also given a std::file_descriptor
A std::posix::filebuf may be constructed from the other types if also given a std::posix::file_descriptor

So we provide 'copy-like' constructors like:

std::stdio_filebuf::stdio_filebuf(std::filebuf buf, FILE*)

This allows implementations to copy the buffer safely rather than allocating a new one.

---
More complicated than originally planned. What do people think?


Ville Voutilainen

Aug 24, 2017, 4:58:12 AM8/24/17
to ISO C++ Standard - Future Proposals
Strike 3, 4 and 5 and it looks fine. I think stdio_filebuf and
native_filebuf need to be mandated to be different
types from filebuf. The posix-specific type is useless for non-posix
users and is best left to some other specification.
I don't see how you construct a native_filebuf from something that is
not a native_filebuf even if you provide
a file descriptor additionally, and same goes for the other conversions.

torto...@gmail.com

Aug 24, 2017, 7:24:28 PM8/24/17
to ISO C++ Standard - Future Proposals
3)
A posix binding for C++ is indeed a different beast. I'm not clear whether that would come from C++ or from Posix.
My instinct is that it should come from the subset of C++ people that are posix aware rather than the subset of Posix people who are C++ aware (ignoring the Venn diagram).
There is a lot of semi-standard stuff to build on there e.g. errno exceptions.
I threw it in there as food for thought because I am thinking that on Posix a native_filebuf would typically be exactly the same as a posix_filebuf.
I want to see how they would cohabit in principle.

4)
What is the reason for mandating that stdio_filebuf *must* not be a typedef for filebuf and likewise for native_filebuf?
How would you enforce that?

All I can think is to make sure that no methods are accidentally exposed which are not mandated in the interface of filebuf itself.
However, many implementations add non-standard methods to the standard interfaces already. This isn't always a good thing but
shouldn't the standard specify the minimum capability rather than the maximum in most cases?


Consider the GNU libstdc++ implementation.
They have an implementation of stdio_filebuf
It derives from std::filebuf but adds no additional member variables.
and it adds both FILE* and POSIX FD constructors and accessors.

So here we have a case where stdio_filebuf == native_filebuf == filebuf in terms of member variables at least.

They have a constructor which takes a std::filebuf and gives a stdio_filebuf (== native_filebuf).
This is only possible in their implementation because it uses FILE* under the hood and Posix provides fileno() to get the FD.

I was trying to think how you get from one to the other portably/safely.

The working definition of a native_filebuf is that it can be constructed from a native file descriptor and provides a method
to get it. It could have it as a member.

With regards to item 5 it's not so much that you are constructing a native_filebuf from some other filebuf.
You are still constructing it from the native file descriptor but saying that it relates to this other class in the filebuf family which is
in some way 'about the same' file.

The implementation might decide to do something special with that. Maybe the equivalent of setvbuf() or ios::tie().
I don't have a good motivating use case there.

5) was not a good answer to the conversion problem

A question to ask is why would you need to convert?

I suppose you would if you wanted C-style buffering but also wanted to use fcntl for POSIX advisory locks.
You want to drop down to the low-level descriptor for a file you've already opened with stdio_filebuf::open()  (using fopen() internally)
or climb up from it for a file you've already opened with native_filebuf::open()  (using open() internally)

We want to do that without requiring use of fdopen or fileno() as those are posix specific.
I want a C++ friendly wrapper to hide that if possible.
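
For context, this is roughly what the drop-down looks like today with the POSIX-specific calls spelled out: fileno() recovers the descriptor from a FILE*, and fcntl() takes the advisory lock. A minimal sketch; the function name lock_whole_file is hypothetical.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Takes a non-blocking POSIX advisory write lock on the whole file behind
// `stream`. Returns true on success. The lock is released when the process
// closes any descriptor for that file.
bool lock_whole_file(FILE* stream)
{
    int fd = ::fileno(stream);      // POSIX, not ISO C
    if (fd < 0)
        return false;

    struct flock lock = {};
    lock.l_type   = F_WRLCK;        // exclusive lock
    lock.l_whence = SEEK_SET;
    lock.l_start  = 0;
    lock.l_len    = 0;              // 0 means "through end of file"
    return ::fcntl(fd, F_SETLK, &lock) == 0;
}
```

A filebuf accessor for the native descriptor would replace the fileno() step; the fcntl() call itself would remain platform specific.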

For the Linux case you can have direct conversions. i.e.

class stdio_filebuf
{
   stdio_filebuf(native_filebuf& buf):
      stdioFileHandle(fdopen(buf.fileno(),
                             getStdioMode(buf.getMode())))
   {
   }

   operator native_filebuf() { return native_filebuf(*this); }
};

but that may not work in general.

It needs to be examined for a non-posix case.

A near miss on another Posix use case is wanting to use fprintf to write to a PID file. The PID file is opened using open() with O_EXCL
which has no analogue in fopen(). It turns out that a fprintf for FDs, dprintf() was standardised by POSIX in 2008.
I don't think there is an analogue for fscanf(). There you still need to use fdopen().
Of course in C++ you can just use operator<< and operator>> instead (not to mention format library proposals under consideration already) so it's mostly academic.
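
A minimal sketch of the PID-file case using open() with O_EXCL followed by dprintf(), assuming a POSIX.1-2008 system. The function name write_pid_file and the 0644 mode are illustrative choices, not anything from the thread.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Creates `path` exclusively and writes the current PID into it.
// Returns false if the file already exists or cannot be written.
bool write_pid_file(const char* path)
{
    // O_EXCL makes creation fail if the file already exists.
    int fd = ::open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return false;

    // dprintf() formats directly onto the descriptor, so no FILE* is needed.
    int written = ::dprintf(fd, "%ld\n", static_cast<long>(::getpid()));
    ::close(fd);
    return written > 0;
}
```

A second call with the same path fails, which is exactly the property O_EXCL is there to provide.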

On related notes:

What other non-posix platforms merit consideration? (preferably with source available)

What use cases are there for native_filebuf outside fcntl on Posix (e.g. on windows)?

RAII is one. The native file handle will be closed when the buffer is destroyed.
That is enough in itself but are there any others for completeness?

 

torto...@gmail.com

Sep 7, 2017, 9:17:00 PM9/7/17
to ISO C++ Standard - Future Proposals
I've created a first draft for comment based on this discussion and a little further thought.


access_file_descriptors.pdf