home_directory for filesystem


david.b...@gmail.com

Sep 9, 2017, 7:37:00 AM
to ISO C++ Standard - Future Proposals
There is a root_directory() function to get a root directory, but there is no function to get a home directory.

I would like to propose a new function - home_directory().
Returns a home directory.

It would be easy to implement, since it can use environment variables to get the path to a home directory:
Unix: HOME
Windows: USERPROFILE (or HOMEDRIVE+HOMEPATH)
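
A minimal sketch of the environment-variable approach described above. The name `home_directory_from_env` is hypothetical and not part of std::filesystem, and the sketch uses narrow-character `std::getenv` on every platform, which is a simplification on Windows. It trusts whatever the environment says.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the home directory named by the environment,
// or std::nullopt when the relevant variables are unset or empty.
std::optional<fs::path> home_directory_from_env() {
#ifdef _WIN32
    // Windows: prefer USERPROFILE, fall back to HOMEDRIVE + HOMEPATH.
    if (const char* profile = std::getenv("USERPROFILE"); profile && *profile) {
        return fs::path(profile);
    }
    const char* drive = std::getenv("HOMEDRIVE");
    const char* rest = std::getenv("HOMEPATH");
    if (drive && *drive && rest && *rest) {
        return fs::path(std::string(drive) + rest);
    }
#else
    // Unix: HOME.
    if (const char* home = std::getenv("HOME"); home && *home) {
        return fs::path(home);
    }
#endif
    return std::nullopt;
}
```

Returning an optional rather than an empty path makes the "environment has no home" case explicit for the caller.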

Niall Douglas

Sep 9, 2017, 6:57:31 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com
And what happens if the environment has been zopped? This is not uncommon. Untrusted processes are frequently run with a sanitised environment under a user id which has no home.

What if the environment has been set by a malicious actor?

What happens if someone uses TOCTOU to do timing attacks to swap the home directory for another in the middle of program execution?

If your answer is "do nothing", then I see no value add for this facility. The hard part in implementing this, and why it is missing in the Filesystem TS, is surveying all the proprietary mechanisms each OS has for determining a true home directory for any arbitrary process without using environment variables, and from that survey establishing a standards-quality proposal for the semantics which can be implemented by 99% of the OSs out there such that home_directory() always works correctly and safely.

The cost to benefit ratio is hard to argue in favour of. AFIO, which may become the File I/O TS one day, doesn't implement it publicly either. Beman rightly placed it out of scope for the Filesystem TS.

Niall

torto...@gmail.com

Sep 9, 2017, 7:12:19 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com


On Saturday, 9 September 2017 23:57:31 UTC+1, Niall Douglas wrote:
And what happens if the environment has been zopped? This is not uncommon. Untrusted processes are frequently run with a sanitised environment under a user id which has no home.

What if the environment has been set by a malicious actor?

What happens if someone uses TOCTOU to do timing attacks to swap the home directory for another in the middle of program execution?

If your answer is "do nothing", then I see no value add for this facility. The hard part in implementing this, and why it is missing in the Filesystem TS, is surveying all the proprietary mechanisms each OS has for determining a true home directory for any arbitrary process without using environment variables, and from that survey establishing a standards-quality proposal for the semantics which can be implemented by 99% of the OSs out there such that home_directory() always works correctly and safely.

The cost to benefit ratio is hard to argue in favour of. AFIO, which may become the File I/O TS one day, doesn't implement it publicly either. Beman rightly placed it out of scope for the Filesystem TS.

Niall

That seems an unduly hostile response. Just because a facility could be abused or subverted in some (I would argue unusual) circumstances doesn't seem a good reason to make it unnecessarily hard.
A common use case would be at the application level. E.g. cache some data in ~/.myappsrprefs when myapp is run by the user for the user. Is that really so dangerous?

To put it another way, why must you use something other than environment variables to find the home directory? And why is it so important that it be the "true" home directory rather than the obvious one as set by the environment?
The security implications don't seem that obvious. Perhaps documenting them would be sufficient?

Niall Douglas

Sep 9, 2017, 7:58:13 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com

That seems an unduly hostile response.

I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
 
Just because a facility could be abused or subverted in some (I would argue unusual) circumstances doesn't seem a good reason to make it unnecessarily hard.

It's not hard to call getenv("HOME") if that's what you really want to do.
 
A common use case would be at the application level. E.g. cache some data in ~/.myappsrprefs when myapp is run by the user for the user. Is that really so dangerous?

Yes. Extremely.
 

To put it another way, why must you use something other than environment variables to find the home directory? and why is it so important that it be the "true" home directory rather than the obvious one as set by the environment?
The security implications don't seem that obvious. Perhaps documenting them would be sufficient?

Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.

Now, a `path_handle persistent_files_directory()` which returns an open handle to some directory associated with the process owning user account which is safe to write to, and is probably persistent, I could live with. Pull requests implementing such a thing are welcome at https://github.com/ned14/afio/pulls.

But adding it to the Filesystem TS I think would be very unwise because of all the problems involved. Unlike root_directory(), which is safe, any home_directory() returning an absolute path based on $HOME is highly unsafe. And I am sure that enough on WG21 would agree with me that it won't be accepted as the OP proposed it, especially as getenv("HOME") is the trivial (though very unsafe) solution.

Niall

Arthur O'Dwyer

Sep 9, 2017, 8:06:22 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com
On Saturday, September 9, 2017 at 4:37:00 AM UTC-7, david.b...@gmail.com wrote:
There is a root_directory() function to get a root directory, but there is no function to get a home directory.

I think you've misunderstood the meaning of "root_directory()". The function in question is a member function of fs::path. It doesn't get "the" root directory (there's no such thing) but rather "the root directory portion of the given path."

    assert( fs::path("c:\\foo\\bar").root_name() / fs::path("c:\\foo\\bar").root_directory() == fs::path("c:\\") );

I'm not sure (on any file system) how to construct a root-directory that isn't named "/" or "\\", but I'm sure someone can enlighten me.
Anyway, root_directory() is just an accessor method of fs::path, similar to root_name(), parent_path(), filename(), stem(), extension(), and so on. Not knowing this hurts your presentation immensely.
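
To make the "accessor" point concrete, here is a small sketch that decomposes a path with those member functions. The struct `path_parts` and the function `decompose` are illustrative names only. The values in the note below assume a POSIX path syntax.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Illustrative aggregate; each field mirrors the fs::path accessor of the same name.
struct path_parts {
    std::string root_name;
    std::string root_directory;
    std::string parent_path;
    std::string filename;
    std::string stem;
    std::string extension;
};

// Splits a path into its lexical components. Nothing here touches the disk:
// every accessor works purely on the text of the path.
path_parts decompose(const fs::path& p) {
    return {p.root_name().string(), p.root_directory().string(),
            p.parent_path().string(), p.filename().string(),
            p.stem().string(), p.extension().string()};
}
```

For "/home/bob/notes.txt" on a POSIX system this gives an empty root_name, a root_directory of "/", a parent_path of "/home/bob", a filename of "notes.txt", a stem of "notes" and an extension of ".txt".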
 

I would like to propose a new function - home_directory().
Returns a home directory.

It would be easy to implement, since it can use environment variables to get the path to a home directory:
Unix: HOME
Windows: USERPROFILE (or HOMEDRIVE+HOMEPATH)


If you already know how to get what you want from environment variables, why not just use the environment variables? Plus, then you could add your own business logic for getting the install directory of your software, instead of just assuming the user wants you to put everything in $HOME.

If this function were to be added, the proper name would be fs::home_path(), since it returns a path. The two free functions that fit into this category already are — not fs::path::root_directory() but rather — fs::current_path() and fs::temp_directory_path().

current_path() exposes a variable that is already built into the operating system: the current working directory. It's equivalent to calling getcwd(). C++ has always needed a way to get at that variable, so it makes sense that <filesystem> added it.

temp_directory_path() is a little harder to justify. I see that the underlying Windows API provides GetTempPath(), even though all that does is look at a bunch of environment variables. My guess would be that <filesystem> was initially trying to provide a portable secure replacement for tmpnam / mkstemp, but that the actual "file creation" stuff proved difficult or controversial and got neutered at some point along the way, leaving only temp_directory_path() as a sort of useless-but-harmless vestigial organ.
I don't think real software can use temp_directory_path(), because "Where does WidgetServer(TM) store its temp files?" is absolutely the kind of thing that WidgetCorp would want to document in their user-facing documentation, and "i dunno lol, whatever temp_directory_path() does on your platform this week" is not an appropriate level of documentation.

Also, thinking in terms of appropriately general primitives, wouldn't you rather have a full-on tilde-expansion function that could expand "~/foo" to "/home/u/bob/foo" and "~alice/bar" to "/home/u/alice/bar" and "~root/baz" to "/baz" and so on? What's the point of getting just the current user-account's home, when "home" is a concept that applies equally to every user-account on the system?
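
A rough sketch of what such a tilde-expansion primitive could look like on POSIX. `expand_tilde` is a hypothetical name. The sketch uses $HOME (falling back to the password database) for a bare "~" and `getpwnam` for "~user". `getpwnam` and `getpwuid` are not thread-safe, so a real implementation would likely prefer the `_r` variants.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

#include <pwd.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical helper: expands a leading "~" or "~user" roughly as a shell would.
// Paths that do not start with '~' are returned unchanged.
// Returns std::nullopt when the relevant home directory cannot be determined.
std::optional<fs::path> expand_tilde(const std::string& input) {
    if (input.empty() || input[0] != '~') return fs::path(input);

    // Everything between '~' and the first '/' is the (possibly empty) user name.
    const std::size_t slash = input.find('/');
    const std::string user =
        input.substr(1, slash == std::string::npos ? slash : slash - 1);

    fs::path home;
    if (user.empty()) {
        const char* env = std::getenv("HOME");
        if (env && *env) {
            home = env;
        } else if (const struct passwd* pw = getpwuid(getuid())) {
            home = pw->pw_dir;
        } else {
            return std::nullopt;
        }
    } else {
        const struct passwd* pw = getpwnam(user.c_str());
        if (pw == nullptr) return std::nullopt;
        home = pw->pw_dir;
    }

    // Re-attach the remainder (including its leading '/') without adding a separator.
    if (slash != std::string::npos) home += input.substr(slash);
    return home;
}
```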

HTH,
Arthur

Howard Hinnant

Sep 9, 2017, 8:13:36 PM
to std-pr...@isocpp.org
I would use functionality like that. I’ve already had to roll my own, and it isn’t pretty:

https://github.com/HowardHinnant/date/blob/master/tz.cpp#L167-L237

And I’m not confident how portable it is. I’ve tried to cover all my bases, but bug reports that start with “X doesn’t work for my platform” are not vanishingly rare. It would be great if the std::lib could make this code simple and portable.

My code isn’t exactly looking for home directory, but the downloads directory. But the concept is in the ball park. There are a small number of common directories applications need, and don’t have portable access to.

I’m not sure environment variables are the way to do it (I didn’t go that route). But I won’t quibble with implementation details. I’d love to see a proposal in this area.

Howard


Thiago Macieira

Sep 10, 2017, 8:41:46 AM
to std-pr...@isocpp.org
On Saturday, 9 September 2017 20:58:12 -03 Niall Douglas wrote:
> > Just because a facility could be abused or subverted in some (I would
> > argue unusual) circumstances doesn't seem a good reason to make it
> > unnecessarily hard.
>
> It's not hard to call getenv("HOME") if that's what you really want to do.

Except that using getenv on Windows for anything containing file paths is
wrong. You need to use _wgetenv. So your code now has #if.

But on Unix, like you said, the environment may have been corrupted. So why
use getenv, if you can just use getpwuid? Or, better, getpwuid_r?

The fact that all those options exist means a convenience function that gets it
right, the first time, for you is of great use.
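
As a sketch of the getpwuid_r route on Unix (POSIX only; `home_directory_from_passwd` is a made-up name), ignoring the environment entirely:

```cpp
#include <cerrno>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <vector>

#include <pwd.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical helper: looks up the home directory of the real user id in the
// password database, without consulting any environment variable.
std::optional<fs::path> home_directory_from_passwd() {
    // sysconf may return -1 when there is no fixed limit; start from a guess then.
    const long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    std::vector<char> buffer(hint > 0 ? static_cast<std::size_t>(hint) : 1024);

    struct passwd entry{};
    struct passwd* result = nullptr;
    int rc = 0;
    // ERANGE means the buffer was too small for this entry: grow and retry.
    while ((rc = getpwuid_r(getuid(), &entry, buffer.data(), buffer.size(),
                            &result)) == ERANGE) {
        buffer.resize(buffer.size() * 2);
    }
    if (rc != 0 || result == nullptr) return std::nullopt;  // error or no entry
    if (result->pw_dir == nullptr || result->pw_dir[0] == '\0') return std::nullopt;
    return fs::path(result->pw_dir);
}
```

A user id with no passwd entry yields std::nullopt here, which is the "user id which has no home" case raised earlier in the thread.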

> > A common use case would be at the application level. E.g. cache some data
> > in ~/.myappsrprefs when myapp is run by the user for the user. Is that
> > really so dangerous?
>
> Yes. Extremely.

For one thing, you should use ~/.config/myappprefs and the other XDG
directories. Stop polluting my home dir.
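
For reference, a sketch of how the XDG config directory is usually resolved: $XDG_CONFIG_HOME if set to an absolute path, otherwise $HOME/.config. The function name `xdg_config_home` is illustrative, and the sketch relies on environment variables, with the caveats discussed elsewhere in this thread.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Illustrative helper following the XDG Base Directory convention:
// use $XDG_CONFIG_HOME when it holds an absolute path, else $HOME/.config.
std::optional<fs::path> xdg_config_home() {
    if (const char* xdg = std::getenv("XDG_CONFIG_HOME"); xdg && *xdg) {
        fs::path p(xdg);
        // Relative values are considered invalid and are ignored.
        if (p.is_absolute()) return p;
    }
    if (const char* home = std::getenv("HOME"); home && *home) {
        return fs::path(home) / ".config";
    }
    return std::nullopt;
}
```

An application would then store its preferences under something like `*xdg_config_home() / "myapp"` rather than directly in the home directory.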

> Now, a `path_handle persistent_files_directory()` which returns an open
> handle to some directory associated with the process owning user account
> which is safe to write to, and is probably persistent, I could live with.
> Pull requests implementing such a thing are welcome
> at https://github.com/ned14/afio/pulls.

Agreed, a handle is better. But please find a name people will recognise, like
the subject suggests. And I also advise a set of functions for the XDG
directories too, especially the XDG Runtime Dir. See QStandardPaths for more
examples.

> But adding it to the Filesystem TS I think would be very unwise because of
> all the problems involved. Unlike root_directory(), which is safe, any
> home_directory() returning an absolute path based on $HOME is highly unsafe.
> And I am sure that enough on WG21 would agree with me that it won't be
> accepted as the OP proposed it, especially as getenv("HOME") is the trivial
> (though very unsafe) solution.

Then let the API return an object that may or may not contain an opened file
descriptor to the correct dir, to avoid TOCTOU attacks. How it obtains that
file descriptor is its own business, but the fact that it may have something
smarter than getenv() is all the more reason to add such a method.

Then you can argue with Ville about implementing in libstdc++ a D-Bus call to
systemd (probably systemd-logind) on Linux to ask for the file descriptor for
the home dir. And argue with Lennart Poettering too (good luck).

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

torto...@gmail.com

Sep 10, 2017, 9:59:52 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com


On Sunday, 10 September 2017 00:58:13 UTC+1, Niall Douglas wrote:

That seems an unduly hostile response.

I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
 
Sorry. I didn't mean to imply personally hostile. Just hostile to the idea. The security implications may be obvious to you but not to the OP or myself.
 
Just because a facility could be abused or subverted in some (I would argue unusual) circumstances doesn't seem a good reason to make it unnecessarily hard.

It's not hard to call getenv("HOME") if that's what you really want to do.

Except that is not appropriate on non-Unix systems like Windows. A home_path() function can wrap that.
 
A common use case would be at the application level. E.g. cache some data in ~/.myappsrprefs when myapp is run by the user for the user. Is that really so dangerous?

Yes. Extremely.
 

To put it another way, why must you use something other than environment variables to find the home directory? and why is it so important that it be the "true" home directory rather than the obvious one as set by the environment?
The security implications don't seem that obvious. Perhaps documenting them would be sufficient?

Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.

Why?
Surely to use a relative path you have to have it relative to something. If no other environment is set why is the home directory not a sensible default?

torto...@gmail.com

Sep 10, 2017, 10:25:32 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com

On Sunday, 10 September 2017 00:58:13 UTC+1, Niall Douglas wrote:

Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.


Can you point to any links that would help explain this stance?

I can't find anything googling various combinations of "home directory" or "absolute path" with "dangerous" or "attack".

The only thing I'm aware of are "path traversal attacks". IMHO those tend to come from doing silly things with your directory permissions
and not limiting access to files that are not beneath a given root directory.
Some kind of sanitize path method (i.e. throw security exception if not a directory below X) might be useful here.
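
A sketch of what such a check might look like with std::filesystem. `is_within` and `require_within` are hypothetical names, and std::runtime_error stands in for a dedicated security exception. Note that the answer is only valid at the moment of the check: a symlink swapped in afterwards defeats it.

```cpp
#include <algorithm>
#include <filesystem>
#include <stdexcept>

namespace fs = std::filesystem;

// Hypothetical helper: true when `candidate`, after resolving "..", "." and any
// symlinks that currently exist, lies at or below `base`.
bool is_within(const fs::path& base, const fs::path& candidate) {
    fs::path b = fs::weakly_canonical(base);
    if (!b.has_filename()) b = b.parent_path();  // drop a trailing separator
    const fs::path c = fs::weakly_canonical(candidate);
    // `base` must be a component-wise prefix of `candidate`.
    return std::mismatch(b.begin(), b.end(), c.begin(), c.end()).first == b.end();
}

// Throws when the candidate escapes the base directory; otherwise returns the
// resolved path.
fs::path require_within(const fs::path& base, const fs::path& candidate) {
    if (!is_within(base, candidate)) {
        throw std::runtime_error("path escapes base directory: " + candidate.string());
    }
    return fs::weakly_canonical(candidate);
}
```

Comparing component by component avoids the classic string-prefix mistake where "/srv/data-old" is accepted as being inside "/srv/data".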

Regards,

Bruce.

Niall Douglas

Sep 10, 2017, 7:57:19 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.


Can you point to any links that would help explain this stance?

The problem with the home directory is that it is (a) particularly amenable to TOCTOU attacks as by definition everything has write access to it and (b) when storing settings into it, it is unusually common to update more than one file at once. So this might be common:

Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.

Looks safe right? It's a very common design pattern. Now think about this:

Process 1: Open "/home/ned/.config/mystuff/settings.mbx" and append stuff
Process 2: Rename  "/home/ned/.config/mystuff" to  "/home/ned/.config/myotherstuff" and rename  "/home/ned/.config/meow" to  "/home/ned/.config/mystuff"
Process 1: Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.

You've just corrupted mystuff's config, and lost the user their data.

This is why absolute paths are the spawn of satan and must be avoided in any correct code. They are only safe to use if you only ever touch a single file or directory. Otherwise your code is incorrect.


I can't find anything googling various combinations of "home directory" or "absolute path" with "dangerous" or "attack".

The inherent raciness of absolute path addressed filing systems was dealt with in 2006 or so by POSIX standardising the Solaris "race free filesystem" API extensions. Those are now available on all major platforms except Windows, though the NT kernel implements them.

There sadly remains a big gap between common programmer practice and correct file system usage. Much of it is ignorance, some of it is lack of good library support.

Niall

Nicol Bolas

Sep 10, 2017, 8:19:16 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com
On Sunday, September 10, 2017 at 7:57:19 PM UTC-4, Niall Douglas wrote:


Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.


Can you point to any links that would help explain this stance?

The problem with the home directory is that it is (a) particularly amenable to TOCTOU attacks as by definition everything has write access to it and (b) when storing settings into it, it is unusually common to update more than one file at once. So this might be common:

Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.

Looks safe right? It's a very common design pattern. Now think about this:

Process 1: Open "/home/ned/.config/mystuff/settings.mbx" and append stuff
Process 2: Rename  "/home/ned/.config/mystuff" to  "/home/ned/.config/myotherstuff" and rename  "/home/ned/.config/meow" to  "/home/ned/.config/mystuff"
Process 1: Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.

You've just corrupted mystuff's config, and lost the user their data.

I fail to see how this problem is specific to using the home directory. This can happen when any two processes have concurrent access to the same directory. Which is essentially any two processes. The user can tell process 1 to save to some directory, and process 2 to change the name of a directory. If the save involves manipulating two files, you have an inter-process data race.

How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.

So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.

Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves. This is similar to how we have `filesystem::current_path`, which can set the current path. We don't like modifiable globals, but the current path is a real thing that really exists and real users need to be able to change, so if we don't allow people to set it, people will just use non-portable code to set it.

The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.

Niall Douglas

Sep 10, 2017, 8:24:58 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com


That seems an unduly hostile response.

I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
 
Sorry. I didn't mean to imply personally hostile. Just hostile to the idea. The security implications may be obvious to you but not to the OP or myself.

Thanks for clarifying.

You're right that I hate anything returning an absolute path in the filesystem. I particularly take issue with std::filesystem::temp_directory_path() which claims to be "A directory suitable for temporary files. The path is guaranteed to exist and to be a directory" which is so underspecified as to approach uselessness:

1. Is it guaranteed to be writable? The standard doesn't actually say it must be.
2. Is it guaranteed to always exist? The standard only guarantees it exists at the time of check, not that it continues to exist.
3. Is it guaranteed that stuff written there doesn't vanish while I'm using it?
4. Is it guaranteed that if I write temp file A and then temp file B, that B will be placed alongside A in the same directory instance?
5. Does storage at that path count against the system paging file, or against the user's quota?
6. Does its contents persist across reboots?
7. Can others modify things I place there?
8. Can others see things I place there?
9. Is it safe to call from a signal handler and I need to dump crashlogs somewhere guaranteed writable?

And that's just off the top of my head. There are lots more.
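
To illustrate question 1: since nothing promises writability, about the only way to find out is to try. A small POSIX sketch, with the made-up name `temp_directory_is_writable`, that probes by creating and removing a file with `mkstemp`:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

#include <stdlib.h>
#include <unistd.h>

namespace fs = std::filesystem;

// Hypothetical probe: returns true only if a file could actually be created in
// the directory reported by temp_directory_path(). The answer can go stale
// immediately afterwards.
bool temp_directory_is_writable() {
    std::error_code ec;
    const fs::path dir = fs::temp_directory_path(ec);
    if (ec) return false;  // no usable temp directory reported at all

    // mkstemp replaces the trailing XXXXXX in place and creates the file.
    std::string name = (dir / "probe-XXXXXX").string();
    const int fd = mkstemp(name.data());
    if (fd < 0) return false;
    close(fd);
    unlink(name.c_str());
    return true;
}
```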

I raised these issues with Beman informally at a C++ Now, and he agreed with almost all of it. The problem was that doing something about it would be worse than not doing something about it (or words to that effect).

AFIO does tackle a reasonable number of the more important of the above issues with the temp directory. But then I have the race free API to hand, and Beman did not, so I can do a lot better (Beman specifically excluded the development of a race free file i/o API as out of scope for the Filesystem TS. He said it was too hard, too controversial. I agree on both, the Filesystem TS needed to be shipped now, not in 2025, or later)


I haven't thought about it rigorously, but a similar critique could be applied to any std::filesystem::home_directory_path(). What you'll find is that developers think they want access to $HOME, but if they think a bit harder, what they really want is "What is the Downloads directory for this user?" Or "What is the configuration store for this user so I can store user-specific data?" Or "Where in the filesystem can I create temp files which count towards the user's quota/Where on the filesystem can I create temp files which do NOT count towards the user's quota?" And so on.

A decent standards proposal on this topic which I could support would not propose adding "std::filesystem::home_directory_path()", but would rather propose a set of APIs for discovering paths suitable for various common use cases. And without relying on environment variables to deduce any of that (e.g. look up the true home directory from /etc/passwd and getuid(), and from there deduce - with rigorous checking of permissions and retaining st_ino+st_dev unique identifiers for fast later verification - the various common use case paths).

Such a proposal I'd very strongly support as delivering significant added value. I think so would everyone else.

Niall

FrankHB1989

Sep 10, 2017, 8:57:09 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com


On Saturday, September 9, 2017 at 7:37:00 PM UTC+8, david.b...@gmail.com wrote:
Is it %USERPROFILE%? Or %APPDATA%? Or even %LOCALAPPDATA%? Not sure which is the conventional counterpart to %HOME%.
And... What if the user sets %HOME% with a directory layout more like that in UNIX? Can it still be flexible enough to make things easy? I actually set %HOME% to be used for MSYS2 and several other UNIX-like environments (shared along with NTFS junctions and symlinks), which saves me a lot of work (as a system administrator). Not sure how to override the implementation with custom configurations after it is standardized, or I still have to roll my own.


torto...@gmail.com

Sep 11, 2017, 5:08:50 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com


On Monday, 11 September 2017 01:24:58 UTC+1, Niall Douglas wrote:


That seems an unduly hostile response.

I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
 
Sorry. I didn't mean to imply personally hostile. Just hostile to the idea. The security implications may be obvious to you but not to the OP or myself.

Thanks for clarifying.

You're right that I hate anything returning an absolute path in the filesystem. I particularly take issue with std::filesystem::temp_directory_path() which claims to be "A directory suitable for temporary files. The path is guaranteed to exist and to be a directory" which is so underspecified as to approach uselessness:

That's an interesting list but I think you are mixing things here. People usually mean something very specific when you say a temporary directory.
I would not expect it to be persistent for example. I immediately think of Linux FHS /tmp.
Neither would I want to rely on it being wiped. I think of /var/run for that.
If you need specific properties you need to ask for them. I'd agree that just giving you a temporary directory covers only a small subset of the
possible use cases.
 
1. Is it guaranteed to be writable? The standard doesn't actually say it must be.

I would say that is a defect. It must be writeable or it is useless. 

2. Is it guaranteed to always exist? The standard only guarantees it exists at the time of check, not that it continues to exist.
3. Is it guaranteed that stuff written there doesn't vanish while I'm using it?

I guess these are arguments for returning an open file descriptor rather than a path.
There are still uses for the path though. A configuration dialog might have a slot for the path that should be used and populate that with the default.

I suppose you could encourage using handles first if you had a way of saying handle::get_path() which is the inverse of
what is normally available.
 
4. Is it guaranteed that if I write temp file A and then temp file B, that B will be placed alongside A in the same directory instance?

Do you need that guarantee from the "temp" API?
If you need it, shouldn't you be creating a directory yourself inside the path returned? It's arguably neater that way anyhow.
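
Something along these lines, as a POSIX sketch (`make_private_temp_dir` is a made-up name). `mkdtemp` creates the directory with mode 0700, which also bears on questions 7 and 8.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

#include <stdlib.h>

namespace fs = std::filesystem;

// Hypothetical helper: creates a fresh, uniquely named directory under the
// system temp directory so that all of one program's temp files sit together.
std::optional<fs::path> make_private_temp_dir(const std::string& prefix) {
    std::error_code ec;
    const fs::path base = fs::temp_directory_path(ec);
    if (ec) return std::nullopt;

    // mkdtemp replaces the trailing XXXXXX in place and creates the directory.
    std::string tmpl = (base / (prefix + "-XXXXXX")).string();
    if (mkdtemp(tmpl.data()) == nullptr) return std::nullopt;
    return fs::path(tmpl);
}
```

The caller is responsible for removing the directory again, for example with fs::remove_all.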
 
5. Does storage at that path count against the system paging file, or against the user's quota?

So you mean: is it in RAM or on a persistent file-system?


6. Does its contents persist across reboots?

Does this need to be separate from the above? You can have a persistent file-system that is wiped on boot which isn't in ram.
I guess that could be important to know for security reasons.
 
7. Can others modify things I place there?
8. Can others see things I place there?

Isn't this an argument for using a directory relative to the users home directory versus a shared temporary directory?

The interesting case is a private temporary directory that is not persistent.
 
9. Is it safe to call from a signal handler and I need to dump crashlogs somewhere guaranteed writable?

You can never have that. If the file-system is full all bets are off.
 
And that's just off the top of my head. There are lots more.

I raised these issues with Beman informally at a C++ Now, and he agreed with almost all of it. The problem was that doing something about it would be worse than not doing something about it (or words to that effect).

AFIO does tackle a reasonable number of the more important of the above issues with the temp directory. But then I have the race free API to hand, and Beman did not, so I can do a lot better (Beman specifically excluded the development of a race free file i/o API as out of scope for the Filesystem TS. He said it was too hard, too controversial. I agree on both, the Filesystem TS needed to be shipped now, not in 2025, or later)

I haven't thought about it rigorously, but a similar critique could be applied to any std::filesystem::home_directory_path(). What you'll find is that developers think they want access to $HOME, but if they think a bit harder, what they really want is "What is the Downloads directory for this user?" Or "What is the configuration store for this user so I can store user-specific data?" Or "Where in the filesystem can I create temp files which count towards the user's quota/Where on the filesystem can I create temp files which do NOT count towards the user's quota?" And so on.

A decent standards proposal on this topic which I could support would not propose adding "std::filesystem::home_directory_path()", but would rather propose a set of APIs for discovering paths suitable for various common use cases. And without relying on environment variables to deduce any of that (e.g. look up the true home directory from /etc/passwd and getuid(), and from there deduce - with rigorous checking of permissions and retaining st_ino+st_dev unique identifiers for fast later verification - the various common use case paths).

Such a proposal I'd very strongly support as delivering significant added value. I think so would everyone else.

Niall

 
It sounds like what we want is APIs where you can specify the required properties of the 'special' directories. That does sound generally useful.
It's probably much easier for temp than for home.
Can't we do that via a layered approach? v1 builds on imperfect APIs like home_directory(), temporary_files_directory() and adds local knowledge FHS and XDG on Linux and the
equivalent for other OSs.


torto...@gmail.com

Sep 11, 2017, 5:42:51 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


On Monday, 11 September 2017 01:19:16 UTC+1, Nicol Bolas wrote:
On Sunday, September 10, 2017 at 7:57:19 PM UTC-4, Niall Douglas wrote:


Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.


Can you point to any links that would help explain this stance?

The problem with the home directory is that it is (a) particularly amenable to TOCTOU attacks as by definition everything has write access to it and (b) when storing settings into it, it is unusually common to update more than one file at once. So this might be common:

Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.

Looks safe right? It's a very common design pattern. Now think about this:

Process 1: Open "/home/ned/.config/mystuff/settings.mbx" and append stuff
Process 2: Rename  "/home/ned/.config/mystuff" to  "/home/ned/.config/myotherstuff" and rename  "/home/ned/.config/meow" to  "/home/ned/.config/mystuff"
Process 1: Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.

You've just corrupted mystuff's config, and lost the user their data.

I fail to see how this problem is specific to using the home directory. This can happen when any two processes have concurrent access to the same directory. Which is essentially any two processes. The user can tell process 1 to save to some directory, and process 2 to change the name of a directory. If the save involves manipulating two files, you have an inter-process data race.

How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.

So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.

Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves. This is similar to how we have `filesystem::current_path`, which can set the current path. We don't like modifiable globals, but the current path is a real thing that really exists and real users need to be able to change, so if we don't allow people to set it, people will just use non-portable code to set it.

The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.

+1
This is why absolute paths are the spawn of satan and must be avoided in any correct code. They are only safe to use if you only ever touch a single file or directory. Otherwise your code is incorrect.


I can't find anything googling various combinations of "home directory" or "absolute path" with "dangerous" or "attack".

The inherent raciness of absolute path addressed filing systems was dealt with in 2006 or so by POSIX standardising the Solaris "race free filesystem" API extensions. Those are now available on all major platforms except Windows, though the NT kernel implements them.

There sadly remains a big gap between common programmer practice and correct file system usage. Much of it is ignorance, some of it is lack of good library support.

Niall


I tried to locate the "race free filesystem" API extensions you speak of.
I think you might mean the XXXXat() calls mentioned here:


They certainly speak (E.g. man openat) of absolute and relative paths and in this context using an absolute path means not using a file descriptor
whereas relative paths are relative to the file descriptor.

> The motivation for the introduction of this set of interfaces is as
> follows:
> * Interfaces taking a path name are limited by the maximum length of
> a path name(_SC_PATH_MAX). The absolute path of files can far exceed
> this length. The current solution would be to change the working
> directory and use relative path names. This is not thread-safe.
>
> * A second motivation is that files accessed outside the current
> working directory are subject to attacks caused by the race condition
> created by change any of the elements of the path names used.
>
> * A third motivation is to allow implementing code which uses a
> virtual current working directory for each individual thread. In
> the current model there is only one current working directory for
> all threads.


I think I begin to see where you are coming from. 
When you say relative path you actually mean file descriptor + relative path.
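To make "file descriptor + relative path" concrete, here is a minimal sketch using the POSIX calls open(), openat() and write(). The helper names append_at and update_pair are made up for illustration, and the file names are borrowed from the settings.mbx / settings.idx example above. The idea is that the directory is opened once and both files are then resolved relative to that handle, so a rename of the directory between the two writes cannot redirect the second one.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Hypothetical helper: append `text` to the file `name`, resolved relative to
// the already-open directory `dirfd` instead of an absolute path string.
bool append_at(int dirfd, const char* name, const std::string& text) {
    assert(dirfd >= 0 && name != nullptr);
    int fd = ::openat(dirfd, name, O_WRONLY | O_CREAT | O_APPEND | O_CLOEXEC, 0600);
    if (fd < 0) return false;
    ssize_t written = ::write(fd, text.data(), text.size());
    ::close(fd);
    return written == static_cast<ssize_t>(text.size());
}

// Hypothetical helper: update two related files through one directory handle.
// If another process renames the directory between the two calls, both writes
// still land in the directory that was opened, not in whatever directory has
// since taken over the old name.
bool update_pair(const char* dir_path) {
    int dirfd = ::open(dir_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0) return false;
    bool ok = append_at(dirfd, "settings.mbx", "record\n") &&
              append_at(dirfd, "settings.idx", "index entry\n");
    ::close(dirfd);
    return ok;
}
```

This only pins the one directory that was opened. It does not make the pair of writes atomic, and the initial open() of dir_path is still an ordinary path lookup.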

I can see that it's quite counterintuitive to think of a path in terms of an open file descriptor + a string, rather than just a string.

std::experimental::filesystem::path is just the persistable string version.
So presumably we need an equivalent or an extension that can use a file descriptor + relative path?

To be truly safe wouldn't you have to have an open file descriptor for every path element you might re-use, not just the root element?

Is the filesystem TS expected to use openat() et al anywhere under the hood at present?

Niall Douglas

Sep 11, 2017, 10:25:48 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.

So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.

Nobody is denying anyone access.

What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.
 

Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.

This is not an argument. People can write random garbage all over memory all by themselves. It doesn't mean we should encourage it.
 
This is similar to how we have `filesystem::current_path`, which can set the current path. We don't like modifiable globals, but the current path is a real thing that really exists and real users need to be able to change, so if we don't allow people to set it, people will just use non-portable code to set it.

That's a bit different again. Being able to set the current path has quite a list of highly beneficial, code-improving consequences. It's not nearly as problematic as the current temp_directory_path() specification.
 

The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.

Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.

The point is that if you do the design right, you don't have to even have the concept of a home directory at all. Need somewhere to write settings? Ask for exactly that.

Sometime in the 2020s I intend to propose a simple transactional key-value store for standardisation into C++. One of those could be used to implement a reliable, transactions-based user settings store. But that's a long way out yet.

Niall

Matthew Woehlke

Sep 11, 2017, 10:34:55 AM
to std-pr...@isocpp.org, torto...@gmail.com, david.b...@gmail.com
On 2017-09-09 19:12, torto...@gmail.com wrote:
> A common use case would be at the application level. E.g. cache some data
> in ~/.myappsrprefs when myapp is run by the user for the user. Is that
> really so dangerous?

There is an XDG spec for this sort of thing. Should C++ adopt that as well?

Maybe it would be good to have a modern (i.e. written atop the FS API),
light-weight, cross-platform XDG library. Write that first, then if it
seems to work out, possibly propose it for standardization.

> On Saturday, 9 September 2017 23:57:31 UTC+1, Niall Douglas wrote:
>> And what happens if the environment has been zopped? [...]
>> Untrusted processes are frequently run with a sanitised environment
>> under a user id which has no home.

...then the application is doomed. (Well, the API should return an
invalid path, because there is nothing else it can do, and the
application needs to be able to deal with that. An application that
wants to use the home directory in such instance is no worse off for
having a standard API to query the home directory.)

>> What happens if someone uses TOCTOU to do timing attacks to swap the home
>> directory for another in the middle of program execution?

What happens *today*? Having a standard API to do something that
*programs already do* does not introduce any *new* security issues.

--
Matthew

Matthew Woehlke

Sep 11, 2017, 10:46:46 AM
to std-pr...@isocpp.org, Niall Douglas, david.b...@gmail.com
On 2017-09-10 20:24, Niall Douglas wrote:
> What you'll find is that developers *think* they want access to
> $HOME, but if they think a bit harder, what they really want is "What
> is the Downloads directory for this user?" Or "What is the
> configuration store for this user so I can store user-specific data?"
> Or "Where in the filesystem can I create temp files which count
> towards the user's quota/Where on the filesystem can I create temp
> files which do NOT count towards the user's quota?" And so on.
Right. And... we already have a spec for that...

> A decent standards proposal on this topic which I could support would not
> propose adding "std::filesystem::home_directory_path()", but would rather
> propose a set of APIs for discovering paths suitable for various common use
> cases. And without relying on environment variables to deduce any of that

...except the reason those environment variables *exist* is so the user
can *change those paths*. Any solution that does *not* honor the
environment variables is unacceptable in my book.

Now... you did say *rely on*. The API certainly ought to work if the
environment variables aren't set, but it should respect them if they are...
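As a rough illustration of "respect the variables if set, still work if not", here is a sketch of the lookup order the XDG Base Directory spec gives for the config directory: use $XDG_CONFIG_HOME when it is set, non-empty and absolute, otherwise fall back to $HOME/.config. The function names absolute_env_path and user_config_dir are invented for this example and are not part of any existing library.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <optional>

namespace fs = std::filesystem;

// Hypothetical helper: value of an environment variable as a path, but only
// if it is set, non-empty and absolute (XDG says relative values are invalid).
std::optional<fs::path> absolute_env_path(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') return std::nullopt;
    fs::path p(value);
    if (!p.is_absolute()) return std::nullopt;
    return p;
}

// Hypothetical function: the user's preference wins, then the XDG default.
std::optional<fs::path> user_config_dir() {
    if (auto xdg = absolute_env_path("XDG_CONFIG_HOME")) return xdg;
    if (auto home = absolute_env_path("HOME")) return *home / ".config";
    return std::nullopt;  // no usable environment; a caller could try other sources
}
```

The nullopt branch is where a non-environment source (for example the passwd database) could be consulted, which is the "ought to work if the variables aren't set" part.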

--
Matthew

Nicol Bolas

Sep 11, 2017, 11:26:51 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com
On Monday, September 11, 2017 at 10:25:48 AM UTC-4, Niall Douglas wrote:
How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.

So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.

Nobody is denying anyone access.

When you say that the standard shouldn't let people have a cross-platform way to access a cross-platform concept, that's denying access.

What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.

What you've said does not match your argument. Your overall arguments have been:

1: People may use the home directory incorrectly.

2: The home directory is globally accessible and therefore shouldn't be used for things.

And the closest thing to a better design you've suggested is to provide a plethora of directories, none of them explicitly called "home". That doesn't solve either of those problems, since the user could still use it for the wrong thing and the directories can still be manipulated by the user.

If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".

Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.

This is not an argument.

Um, yes it is. Standards are supposed to standardize existing practice. Home directories are existing practice. As such, they are legitimate candidates for standardization. There is a genuine need for home directory access, and if we don't provide it, someone else will.

I get that you don't like the home directory. But it is existing practice. And that makes it viable for standardization, despite your dislike.
 
People can write random garbage all over memory all by themselves. It doesn't mean we should encourage it.
 
This is similar to how we have `filesystem::current_path`, which can set the current path. We don't like modifiable globals, but the current path is a real thing that really exists and real users need to be able to change, so if we don't allow people to set it, people will just use non-portable code to set it.

That's a bit different again. Being able to set the current path has quite a list of highly beneficial, code-improving consequences. It's not nearly as problematic as the current temp_directory_path() specification.

And like any mutable global state, it has quite a list of highly detrimental, code damaging consequences too. My point is that there are many entirely reasonable arguments against it, yet we have it because the current path is a real thing that real filesystems provide and that real users need to be able to use.

The same is true of the temporary path and the home directory path. You can argue that they can be misused, but you can also argue that they exist and are useful features that users can make good use of.

The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.

Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.

Then make the function return an `optional<path>`, for cases where such a path does not exist. This is not a good enough reason to say no to the whole concept of getting a home directory.
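A sketch of what that signature might look like, assuming the environment-variable approach from the original post (HOME on Unix, USERPROFILE on Windows). home_directory() here is hypothetical and not part of std::filesystem; the point is only that the "no home" case becomes an empty optional the caller has to handle.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>

// Hypothetical function, not part of the standard library.
// Returns std::nullopt when the variable is unset or empty, e.g. for a daemon
// started with a sanitised environment.
std::optional<std::filesystem::path> home_directory() {
#ifdef _WIN32
    const char* name = "USERPROFILE";  // narrow-character lookup; a real version may prefer _wgetenv
#else
    const char* name = "HOME";
#endif
    const char* value = std::getenv(name);
    if (value == nullptr || *value == '\0') return std::nullopt;
    return std::filesystem::path(value);
}
```

A caller would then write something like `if (auto home = home_directory()) { /* use *home */ } else { /* no home: pick another location */ }`. Note this sketch trusts the environment completely, which is exactly the objection raised earlier in the thread.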

The point is that if you do the design right, you don't have to even have the concept of a home directory at all.

But the system has the "concept of a home directory". Why should I not be able to ask it what that directory is?
 
Need somewhere to write settings? Ask for exactly that.

Personally, I find that to be far more dangerous. That suggests some global directory for settings that every application can write to, where you're expected to create a unique directory for your particular settings or some such.

Also, I have no idea why you frequently conflate "ask for home directory" with "where you write your settings".

Niall Douglas

Sep 11, 2017, 11:27:33 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com

You're right that I hate anything returning an absolute path in the filesystem. I particularly take issue with std::filesystem::temp_directory_path() which claims to be "A directory suitable for temporary files. The path is guaranteed to exist and to be a directory" which is so underspecified as to approach uselessness:

That's an interesting list but I think you are mixing things here. People usually mean something very specific when you say a temporary directory.

Actually it's the other way round. People have very fuzzy, often mutually contradictory notions of what a temp directory is for. Those conflating notions have led to the poor, unreliable and unsafe engineering we have today surrounding /tmp.

It's not hard to significantly improve on this ambiguity. AFIO has, for example: there, if you need a temporary anonymous inode inaccessible to any other process, you ask for exactly that and you get exactly that. If you need some random path which can be safely handed off to other processes for IPC, again you can ask for exactly that and get exactly that.

And needless to say, AFIO's temp directory retrieval function makes a cast iron guarantee that the directory returned is writable. It can also be called from signal handlers when you need to dump a crash log.

None of this is hard. It just requires a bit of investment of thought, and careful design.
 
I would not expect it to be persistent, for example. I immediately think of Linux FHS /tmp.
Neither would I want to rely on it being wiped. I think of /var/run for that.

You might assume this. But your colleague sitting next to you might assume other things.

Standards are at their worst when they let people assume properties which are not constant. Standards need to be specific where there is variability in the real world.
 
  
1. Is it guaranteed to be writable? The standard doesn't actually say it must be.

I would say that is a defect. It must be writeable or it is useless. 

Well that depends. Another programmer might take the view that if /tmp is not writable, then we know that we are running in a recovery situation.
 

2. Is it guaranteed to always exist? The standard only guarantees it exists at the time of check, not that it continues to exist.
3. Is it guaranteed that stuff written there doesn't vanish while I'm using it?

I guess these are arguments for returning an open file descriptor rather than a path.
There are still uses for the path though. A configuration dialog might have a slot for the path that should be used and populate that with the default.

I'd live with returning a path if it comes with a post-verifiable st_ino + st_dev unique identifier. Secure code can then de-TOCTOU usage efficiently.
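One way to read "post-verifiable st_ino + st_dev" is sketched below with plain POSIX stat(), open() and fstat(). The struct dir_identity and the functions identify and open_verified are names invented for this example, not an existing or proposed API. The identity is recorded when the path is handed out, and the consumer later checks that the directory it actually opened is still the same one.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical: the identity of a directory as observed when its path was queried.
struct dir_identity {
    dev_t device;
    ino_t inode;
};

// Record the identity of the directory currently found at `path`.
bool identify(const char* path, dir_identity& out) {
    assert(path != nullptr);
    struct stat st;
    if (::stat(path, &st) != 0 || !S_ISDIR(st.st_mode)) return false;
    out = {st.st_dev, st.st_ino};
    return true;
}

// Open `path` and return the descriptor only if it still refers to the
// directory that was identified earlier; otherwise return -1.
int open_verified(const char* path, const dir_identity& expected) {
    assert(path != nullptr);
    int fd = ::open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0) return -1;
    struct stat st;
    if (::fstat(fd, &st) != 0 || st.st_dev != expected.device ||
        st.st_ino != expected.inode) {
        ::close(fd);  // something else now lives at that path
        return -1;
    }
    return fd;
}
```

The check is done with fstat() on the opened descriptor rather than a second stat() on the path, so there is no window between checking and using.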

Returning an open file descriptor is better again of course. But you can't graft that into the Filesystem TS. That's what AFIO, hopefully the future File I/O TS, is for.
 

I suppose you could encourage using handles first if you had a way of saying handle::get_path() which is the inverse of
what is normally available.

AFIO provides handle::current_path() which mostly works most of the time.
 
 
4. Is it guaranteed that if I write temp file A and then temp file B, that B will be placed alongside A in the same directory instance?

Do you need that guarantee from the "temp" API?
If you need it, shouldn't you be creating a directory yourself inside the path returned? It's arguably neater that way anyhow.

Spot on.

If you need an anonymous directory which nobody else can see, there should be an API for getting that too.
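For the "create a directory yourself inside the path returned" approach, a minimal POSIX-only sketch follows. It combines std::filesystem::temp_directory_path() with mkdtemp(), which creates a uniquely named directory with owner-only permissions. make_private_temp_dir is a made-up name. This is not the anonymous directory mentioned above: the name is still visible to others, only the contents are protected.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: create a fresh, uniquely named directory under the
// temp directory, so files A and B written into it are guaranteed to sit
// side by side.
std::optional<fs::path> make_private_temp_dir(const std::string& prefix) {
    std::error_code ec;
    fs::path base = fs::temp_directory_path(ec);
    if (ec) return std::nullopt;
    // mkdtemp() requires the template to end in six X characters and
    // rewrites them in place with the generated suffix.
    std::string tmpl = (base / (prefix + "XXXXXX")).string();
    if (::mkdtemp(tmpl.data()) == nullptr) return std::nullopt;
    return fs::path(tmpl);
}
```

The caller owns the returned directory and is responsible for removing it, for example with fs::remove_all.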
 
 
5. Does storage at that path count against the system paging file, or against the user's quota?

So you mean, is it in RAM or on a persistent file-system?

Yes. Linux tmpfs is billed against the page file. On every other OS, /tmp is billed against the user's quota.

This stuff is important when you're creating a temporary 4 TB file for a sparse array. Which AFIO makes as easy to do as writing std::vector<int> array(1024ULL*1024*1024*1024);
 

6. Do its contents persist across reboots?

Does this need to be separate from the above? You can have a persistent file-system that is wiped on boot which isn't in ram.
I guess that could be important to know for security reasons.

It's more that it's important to know for dumping emergency failure info. The use of the temp directory for this is particularly common on Windows.
 

 
7. Can others modify things I place there?
8. Can others see things I place there?

Isn't this an argument for using a directory relative to the users home directory versus a shared temporary directory?

The interesting case is a private temporary directory that is not persistent.

How it is implemented is specific to each platform. Each has its own conventions.

The key part is that the C++ standard should let users ask for what they want, and not care what the platform-local convention is (though it can be introspected if needs be).
 
 
9. Is it safe to call from a signal handler and I need to dump crashlogs somewhere guaranteed writable?

You can never have that. If the file-system is full all bets are off.

It's more that as currently specified in the standard, std::filesystem::temp_directory_path() cannot be called from a signal handler. Which is unfortunate.
 
 
It sounds like what we want is APIs where you can specify the required properties of the 'special' directories. That does sound generally useful.
It's probably much easier for temp than for home.

It's actually more just a difference of qualitative factors. For example, you might ask for the user's preferred download directory.
 
Can't we do that via a layered approach? v1 builds on imperfect APIs like home_directory() and temporary_files_directory(), and adds local knowledge such as FHS and XDG on Linux and the
equivalent for other OSs.

I am opposed to any API which returns the user's home directory per se as it will encourage sloppy programming, and adds no value over the user just calling getenv by hand which is trivially easy (even with wgetenv on Windows).

I would support an API which lets the user specify exactly what they are looking for in terms of properties requested. That adds significant value by encoding into the standard library a lot of tedious and messy platform specific logic and heuristics so end users no longer have to.

Niall

Niall Douglas

Sep 11, 2017, 11:30:35 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com

So presumably we need an equivalent or an extension that can use a file descriptor + relative path?

 

To be truly safe wouldn't you have to have an open file descriptor for every path element you might re-use, not just the root element?

AFIO actually encodes that into a suite of generic reusable templated algorithms. Or at least it will, once more of them are written.
 

Is the filesystem TS expected to use openat() et al anywhere under the hood at present?


The Filesystem TS explicitly made race free support out of scope in order to get it past WG21 in a reasonable time period.

An implementation may internally of course use the race free API if it wishes.

Niall 

Matthew Woehlke

Sep 11, 2017, 11:47:17 AM
to std-pr...@isocpp.org, Nicol Bolas, david.b...@gmail.com, torto...@gmail.com
Well... yes, that's pretty much what $XDG_CONFIG_HOME is... Or
$WINHOME/LocalData...

OTOH, compare QSettings, which takes application name and organization
name as inputs and handles the "create a unique directory" for you. I
believe it still respects XDG_CONFIG_HOME, though.

It's certainly plausible, however, that this set of 'get standard path'
API's would somehow consume information about the application's
identity. (In Qt, this happens via global state, but an STL API might
rather have the functions take it as direct parameters.)

--
Matthew

Matthew Woehlke

Sep 11, 2017, 12:08:31 PM
to std-pr...@isocpp.org, Niall Douglas, david.b...@gmail.com
On 2017-09-11 11:27, Niall Douglas wrote:
> Linux tmpfs is billed against the page file. On every other OS, /tmp
> is billed against the user's quota.

Note: tmpfs != /tmp. Yes, it's become something of a *convention* that
most Linux distros use tmpfs for /tmp, but there is no such
*requirement*. Neither that Linux *does* use tmpfs for /tmp, nor that
other OS's *don't* (assuming they have tmpfs or an equivalent).

> I am opposed to any API which returns the user's home directory per se as
> it will encourage sloppy programming, and adds no value over the user just
> calling getenv by hand which is trivially easy (even with wgetenv on
> Windows).

Really? Even in the trivial case, I have to write this function myself
(about 10 LOC) if I want to be able to use the result directly in an
expression. Even if I'm just assigning it to a temporary variable,
that's 5, probably 6 LOC (remember, on Windows, it's not just one, but
*two* environment variables), plus if I'm "being good" and using
_wgetenv, I'm either doing string conversion or dealing with different
string types depending on platform.

Compared to a canned API that does this all for me...

Now, add to that that a *good* API for this won't just crash if $HOME is
unset. Now, in addition to our already-non-trivial function, we have to
add at least minimal error handling. A *good* implementation might even
have the option to derive the home directory in such case from the
process UID and system data (i.e. /etc/passwd).

Doesn't sound "trivially easy" to me...

--
Matthew

Nicol Bolas

Sep 11, 2017, 12:22:17 PM
to ISO C++ Standard - Future Proposals, jmck...@gmail.com, david.b...@gmail.com, torto...@gmail.com
This sounds suspiciously like setting policy for various platforms. If there's no consensus on such directories, then it can't be standardized. And Qt by itself cannot be taken as "consensus".

Thiago Macieira

Sep 11, 2017, 12:52:45 PM
to std-pr...@isocpp.org
On Sunday, 10 September 2017 19:24:58 CDT Niall Douglas wrote:
> 1. Is it guaranteed to be *writable*? The standard doesn't actually say it
> must be.
> 2. Is it guaranteed to *always* exist? The standard only guarantees it
> exists at the time of check, not that it continues to exist.
> 3. Is it guaranteed that stuff written there doesn't vanish while I'm using
> it?
> 4. Is it guaranteed that if I write temp file A and then temp file B, that
> B will be placed alongside A in the same directory instance?
> 5. Does storage at that path count against the system paging file, or
> against the user's quota?
> 6. Do its contents persist across reboots?
> 7. Can others modify things I place there?
> 8. Can others see things I place there?
> 9. Is it safe to call from a signal handler and I need to dump crashlogs
> somewhere guaranteed writable?

10. Is it capable of having Unix sockets or FIFOs created there?
11. Is it capable of reducing permissions on individual files and directories
from world-readable and world-executable?

Niall Douglas

Sep 11, 2017, 4:19:05 PM
to ISO C++ Standard - Future Proposals, torto...@gmail.com, david.b...@gmail.com
On Monday, September 11, 2017 at 3:34:55 PM UTC+1, Matthew Woehlke wrote:
On 2017-09-09 19:12, torto...@gmail.com wrote:
> A common use case would be at the application level. E.g. cache some data
> in ~/.myappsrprefs when myapp is run by the user for the user. Is that
> really so dangerous?

There is an XDG spec for this sort of thing. Should C++ adopt that as well?

Maybe it would be good to have a modern (i.e. written atop the FS API),
light-weight, cross-platform XDG library. Write that first, then if it
seems to work out, possibly propose it for standardization.

A fusion of Windows' Shell Folders spec and XDG and whatever Apple does would be a reasonable starting point. And throw in Android's folder discovery API too.
 

> On Saturday, 9 September 2017 23:57:31 UTC+1, Niall Douglas wrote:
>> And what happens if the environment has been zopped? [...]
>> Untrusted processes are frequently run with a sanitised environment
>> under a user id which has no home.

...then the application is doomed. (Well, the API should return an
invalid path, because there is nothing else it can do, and the
application needs to be able to deal with that. An application that
wants to use the home directory in such instance is no worse off for
having a standard API to query the home directory.)

This viewpoint would stem solely from an incorrect design.

If I ask for somewhere to store files which will be made available to me next time I am run, that is independent of environment variables.
 

>> What happens if someone uses TOCTOU to do timing attacks to swap the home
>> directory for another in the middle of program execution?

What happens *today*? Having a standard API to do something that
*programs already do* does not introduce any *new* security issues.

It perpetuates bad practice.

Niall

Niall Douglas

Sep 11, 2017, 4:22:05 PM
to ISO C++ Standard - Future Proposals, nialldo...@gmail.com, david.b...@gmail.com

Now... you did say *rely on*. The API certainly ought to work if the
environment variables aren't set, but it should respect them if they are...

The API I had in my head would return a ranked list of options available, along with verification metadata for the consumer to check them against if security is needed.

I was thinking of a static giant struct of string/path views in fact. Signal handler safe, and storage can be reused.

Niall

Niall Douglas

Sep 11, 2017, 4:37:22 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com
On Monday, September 11, 2017 at 4:26:51 PM UTC+1, Nicol Bolas wrote:
On Monday, September 11, 2017 at 10:25:48 AM UTC-4, Niall Douglas wrote:
How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.

So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.

Nobody is denying anyone access.

When you say that the standard shouldn't let people have a cross-platform way to access a cross-platform concept, that's denying access.

No, it isn't. This is not the first time you've put words in my mouth and claimed I said things I did not.
 

What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.

What you've said does not match your argument. Your overall arguments have been:

1: People may use the home directory incorrectly.

2: The home directory is globally accessible and therefore shouldn't be used for things.

Where did I ever say it shouldn't be used for things?

I said it ought to be used correctly. That's very different.
 

And the closest thing to a better design you've suggested is to provide a plethora of directories, none of them explicitly called "home". That doesn't solve either of those problems, since the user could still use it for the wrong thing and the directories can still be manipulated by the user.

If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".

There is no such thing as home.
 

Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.

This is not an argument.

Um, yes it is. Standards are supposed to standardize existing practice. Home directories are existing practice. As such, they are legitimate candidates for standardization. There is a genuine need for home directory access, and if we don't provide it, someone else will.

I get that you don't like the home directory. But it is existing practice. And that makes it viable for standardization, despite your dislike.

Home directories are not standard practice. What is standard practice is some place on the filesystem where the user running the login process which launched the running process is permitted to read and write. Which could be /tmp, which is a legal $HOME value.

That's far short of what you're claiming. Specifically that the home directory is a property of the logged in user, and NOT that of the user running some process as you appear to think.

If you don't believe me, go look it up in the POSIX spec. And be aware Windows follows the exact same principle as well. If you are temporarily acting as a different user, you may, or may not, see "home" differently depending on a wide range of factors impossible to standardise.

This is part of why $HOME cannot be trusted. Or any environment variable.
 

The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.

Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.

Then make the function return an `optional<path>`, for cases where such a path does not exist. This is not a good enough reason to say no to the whole concept of getting a home directory.

The concept of there being a "home" directory is severely flawed. You cannot get what does not exist.

Instead ask for a path with a specified list of desired properties.

Niall

Niall Douglas

Sep 11, 2017, 4:41:40 PM
to ISO C++ Standard - Future Proposals
On Monday, September 11, 2017 at 5:52:45 PM UTC+1, Thiago Macieira wrote:
On Sunday, 10 September 2017 19:24:58 CDT Niall Douglas wrote:
> 1. Is it guaranteed to be *writable*? The standard doesn't actually say it
> must be.
> 2. Is it guaranteed to *always* exist? The standard only guarantees it
> exists at the time of check, not that it continues to exist.
> 3. Is it guaranteed that stuff written there doesn't vanish while I'm using
> it?
> 4. Is it guaranteed that if I write temp file A and then temp file B, that
> B will be placed alongside A in the same directory instance?
> 5. Does storage at that path count against the system paging file, or
> against the user's quota?
> 6. Does its contents persist across reboots?
> 7. Can others modify things I place there?
> 8. Can others see things I place there?
> 9. Is it safe to call from a signal handler and I need to dump crashlogs
> somewhere guaranteed writable?

10. Is it capable of having Unix sockets or FIFOs created there?
11. Is it capable of reducing permissions on individual files and directories
from world-readable and world-executable?

Yes, very good point. Windows locates fifos in the \\.\ namespace, plus it has Local, Global and Session variants of that namespace.

There is great opportunity for standardisation of discovery here. Though I still find it a shame that they excised pipe support from the Networking TS :(

Niall

Matthew Woehlke

Sep 11, 2017, 5:10:58 PM
to std-pr...@isocpp.org, Niall Douglas, david.b...@gmail.com, torto...@gmail.com
On 2017-09-11 16:37, Niall Douglas wrote:
> The concept of there being a "home" directory is severely flawed. You
> cannot get what does not exist.
>
> Instead ask for a path with a specified list of desired properties.

What am I supposed to do when I want the path to what the user considers
to be their "home" directory? Because... that's a thing. See just about
any file dialog...

--
Matthew

Hyman Rosen

Sep 11, 2017, 5:52:29 PM
to std-pr...@isocpp.org, Niall Douglas, david.b...@gmail.com, torto...@gmail.com
In Unix, the various flavors of getpwnam and getpwuid return a pointer to a struct passwd whose pw_dir field is the user's home directory. In Windows, the GetUserProfileDirectory function returns (at least a close approximation to) the user's home directory. So on those systems (which are the vast majority of all systems that have filesystems and users) there is a concept of home directory without reference to environment variables.
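As a rough illustration of the POSIX side of this, here is a minimal sketch that reads pw_dir through getpwuid_r and never consults $HOME. The helper name home_directory_for is made up for this example and is not a proposed std:: API; returning std::optional is only one possible way to report "no entry".

```cpp
#include <cerrno>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <vector>

#include <pwd.h>       // getpwuid_r, struct passwd (POSIX)
#include <sys/types.h> // uid_t
#include <unistd.h>    // sysconf, getuid

namespace fs = std::filesystem;

// Hypothetical helper (not a std:: API): look up a user's home directory in
// the password database, ignoring the environment entirely.
std::optional<fs::path> home_directory_for(uid_t uid)
{
    // sysconf only gives a size hint and may return -1; fall back to 1 KiB.
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    std::vector<char> buf(hint > 0 ? static_cast<std::size_t>(hint) : 1024);

    passwd entry{};
    passwd* result = nullptr;
    int rc;
    // getpwuid_r returns ERANGE when the buffer is too small: grow and retry.
    while ((rc = getpwuid_r(uid, &entry, buf.data(), buf.size(), &result)) == ERANGE)
        buf.resize(buf.size() * 2);

    // result == nullptr with rc == 0 means "no such user"; rc != 0 is an error.
    if (rc != 0 || result == nullptr)
        return std::nullopt;
    if (result->pw_dir == nullptr || *result->pw_dir == '\0')
        return std::nullopt;
    return fs::path(result->pw_dir);
}
```

Calling home_directory_for(getuid()) gives the password-database answer for the real user of the process, which may differ from $HOME. A Windows counterpart would go through GetUserProfileDirectory and is not shown here.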

Nicol Bolas

Sep 11, 2017, 6:34:03 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com
On Monday, September 11, 2017 at 4:37:22 PM UTC-4, Niall Douglas wrote:
On Monday, September 11, 2017 at 4:26:51 PM UTC+1, Nicol Bolas wrote:
On Monday, September 11, 2017 at 10:25:48 AM UTC-4, Niall Douglas wrote:
How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.

So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.

Nobody is denying anyone access.

When you say that the standard shouldn't let people have a cross-platform way to access a cross-platform concept, that's denying access.

No, it isn't. This is not the first time you've put words in my mouth and claimed I said things I did not.

If my understanding of your argument is wrong, what is your argument? Why do you think this is a bad idea? Lay out the whole thing in a single post. You've provided what seems to be several different lines of argument thus far (security concerns, absolute paths being bad, etc), and you seem to think that my summary doesn't adequately describe them.

So please do so.

 

What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.

What you've said does not match your argument. Your overall arguments have been:

1: People may use the home directory incorrectly.

2: The home directory is globally accessible and therefore shouldn't be used for things.

Where did I ever say it shouldn't be used for things?

Well, let's see:

> The problem with the home directory is that it is (a) particularly amenable to TOCTOU attacks as by definition everything has write access to it and (b) when storing settings into it, it is unusually common to update more than one file at once.

That seems to be strongly indicative of a "don't use this" standpoint.

I said it ought to be used correctly. That's very different.

> This is why absolute paths are the spawn of satan and must be avoided in any correct code. They are only safe to use if you only ever touch a single file or directory. Otherwise your code is incorrect.

To me, that reads "People should not use this." Maybe you meant it as something less firm, but when you start throwing terms around like "spawn of satan", it's hard not to get the impression that you think it's never a good idea to use such a feature.

And the closest thing to a better design you've suggested is to provide a plethora of directories, none of them explicitly called "home". That doesn't solve either of those problems, since the user could still use it for the wrong thing and the directories can still be manipulated by the user.

If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".

There is no such thing as home.

I'm not a filesystem expert at all. But other people seem to disagree with you.

Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.

This is not an argument.

Um, yes it is. Standards are supposed to standardize existing practice. Home directories are existing practice. As such, they are legitimate candidates for standardization. There is a genuine need for home directory access, and if we don't provide it, someone else will.

I get that you don't like the home directory. But it is existing practice. And that makes it viable for standardization, despite your dislike.

Home directories are not standard practice. What is standard practice is some place on the filesystem where the user running the login process which launched the running process is permitted to read and write. Which could be /tmp, which is a legal $HOME value.

That's far short of what you're claiming. Specifically that the home directory is a property of the logged in user, and NOT that of the user running some process as you appear to think.

I don't recognize that as a significant distinction. Nor do users who use applications. As far as users are concerned, they have a home directory. Whether that comes about through the login process or something else is utterly irrelevant to them.

The concept exists for users, and therefore it must exist for writers of applications too.


If you don't believe me, go look it up in the POSIX spec. And be aware Windows follows the exact same principle as well. If you are temporarily acting as a different user, you may, or may not, see "home" differently depending on a wide range of factors impossible to standardise.

Just as with your `temp_directory_path` examples, you're overthinking this. It doesn't matter if "temporarily acting as a different user" causes the home directory to shift or if it doesn't. It doesn't matter if it is writable or not writable.

What matters is that, if you run the main file manager and it pops up the home directory, that you can run another program and have it pop up that same directory.

This is part of why $HOME cannot be trusted. Or any environment variable.
 

The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.

Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.

Then make the function return an `optional<path>`, for cases where such a path does not exist. This is not a good enough reason to say no to the whole concept of getting a home directory.

The concept of there being a "home" directory is severely flawed. You cannot get what does not exist.

Instead ask for a path with a specified list of desired properties.

OK, what set of "desired properties" will give me the home directory? If there isn't a set of properties that would give that to me, then that is a poorer API than just asking for the home directory.

Remember: the home directory is where lots of users keep their stuff. Being able to find it is kind of important. If you're writing a program that loads files for some purpose, the user will want to start looking for stuff in their home directory.

We're not going to give users a dropdown of "desired properties" to pick directories from.

I've seen APIs like you're describing before. They remind me of the usage hints in `glBufferData`. You provide a general idea of how you're going to use the memory, and the implementation would decide where to allocate that storage from.

This idea turned out so bad that OpenGL implementations would in many cases flat-out ignore the usage hints altogether.

Now granted, that's probably not going to be the outcome here. But I just don't see the point of having to play guessing games with properties to figure out which combination spells "home directory".

Thiago Macieira

Sep 11, 2017, 7:30:18 PM
to std-pr...@isocpp.org
On Monday, 11 September 2017 09:22:17 PDT Nicol Bolas wrote:
> > OTOH, compare QSettings, which takes application name and organization
> > name as inputs and handles the "create a unique directory" for you. I
> > believe it still respects XDG_CONFIG_HOME, though.
> >
> > It's certainly plausible, however, that this set of 'get standard path'
> > API's would somehow consume information about the application's
> > identity. (In Qt, this happens via global state, but an STL API might
> > rather have the functions take it as direct parameters.)
>
> This sounds suspiciously like setting policy for various platforms. If
> there's no consensus on such directories, then it can't be standardized.
> And Qt by itself cannot be taken as "consensus".

True, Qt's scope is more limited than that of the Standard, which means it can
find consensus on the platforms it runs on.

Thiago Macieira

Sep 11, 2017, 7:40:08 PM
to std-pr...@isocpp.org
On Monday, 11 September 2017 13:19:05 PDT Niall Douglas wrote:
> A fusion of Windows' Shell Folders spec and XDG and whatever Apple does
> would be a reasonable starting point. And throw in Android's folder
> discovery API too.

See http://doc.qt.io/qt-5.6/qstandardpaths.html#StandardLocation-enum. That's
Shell Folders, XDG, Android, Blackberry 10, and both macOS and iOS. The 4.8
docs will also list Symbian.

There are a couple of things we had to make up so the full set was available
on all platforms. That is to say, we did not settle for lowest common
denominator.

torto...@gmail.com

Sep 11, 2017, 7:48:13 PM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


On Monday, 11 September 2017 21:37:22 UTC+1, Niall Douglas wrote:
On Monday, September 11, 2017 at 4:26:51 PM UTC+1, Nicol Bolas wrote:
If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".

There is no such thing as home.

You really missed an opportunity to say "there is no place like home" there.  :)

Niall Douglas

Sep 12, 2017, 11:33:45 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


If my understanding of your argument is wrong, what is your argument? Why do you think this is a bad idea? Lay out the whole thing in a single post. You've provided what seems to be several different lines of argument thus far (security concerns, absolute paths being bad, etc), and you seem to think that my summary doesn't adequately describe them.

So please do so.

The reason there are multiple strands of argument involved is because this thread has split into such. Most appear to understand that has happened and don't find it a problem. But to spell it out:

1. There is the "big picture" argument regarding good engineering (use absolute paths as sparingly as possible, fds to mark filesystem locations wherever possible) vs what's currently forced on us by lack of API support.

2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.

3. There is the "how do we calculate this" argument where, even though it's not been spelled out yet, I'm arguing for a runtime probing solution which fires once per process lifetime such that strong guarantees can be given to the paths returned as regards their properties.

And there are a few more sub-strands again. And I think most people get this.
 

To me, that reads "People should not use this." Maybe you meant it as something less firm, but when you start throwing terms around like "spawn of satan", it's hard not to get the impression that you think it's never a good idea to use such a feature.

Most currently written code using the filesystem is incorrect.

Whether people have the choice to write actually correct code is the central problem. Right now you cannot say: create file A and guaranteed sibling file B and both files must exist as siblings or neither. That forces people to write power loss recovery code which is usually buggy, certainly insufficiently tested. Most don't bother, and just allow incorrectness and corruption of persistent state to occur. And when that becomes a problem, they just fire everything into SQLite and call it "solved".

I aim to do something about that eventually, but Rome wasn't built in a day. I only properly work on this stuff when I'm out of contract. It's slow going.
 

There is no such thing as home.

I'm not a filesystem expert at all. But other people seem to disagree with you.

They are mistaken.

What a logged in user sees on the major desktop and mobile OSs is not what other code sees. I really wish Plan 9 replaced Unix as it should have done. They properly formalised this stuff.
 

OK, what set of "desired properties" will give me the home directory? If there isn't a set of properties that would give that to me, then that is a poorer API than just asking for the home directory.

A list of properties has been given already. I agree with Thiago that Qt is well ahead of everyone on the same topic, and starting from what they've done looks to be a great idea.
 

I've seen APIs like you're describing before. They remind me of the usage hints in `glBufferData`. You provide a general idea of how you're going to use the memory, and the implementation would decide where to allocate that storage from.

This idea turned out so bad that OpenGL implementations would in many cases flat-out ignore the usage hints altogether.

Another incommensurate argument.

When OpenGL standardised those hints, they were appropriate for the hardware existing a few years beforehand. As the hardware has evolved, those hints have become unhelpful. The same will happen where any part of the thing you based your design decisions upon is rapidly evolving. POSIX forces a ton of really suboptimal performance and behaviours for modern hardware, same problem, same cause.

Path discovery, if implemented by runtime probing, doesn't suffer from this problem of staleness by design. If I ask for a path where I can create fifos which will be visible to other processes running under the same user, the implementation will go off and try to create a fifo with user rw privs in a sequence of path locations until it finds one which works. It then returns that path, and you know for a guaranteed fact that that path will let you create fifos which will be visible to other processes running under the same user.

Same goes for Downloads, Documents, Pictures, Videos, Music folders. And meta-collections, based on composed search of all local storages, of the same.
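To make the fifo probing described above concrete, here is a minimal POSIX sketch. The helper name first_fifo_capable and the idea of passing the candidate list in from outside are assumptions for illustration; a real implementation would need its own per-platform candidate list and would cache the result once per process.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

#include <sys/stat.h>  // mkfifo, S_IRUSR, S_IWUSR
#include <unistd.h>    // getpid, unlink

namespace fs = std::filesystem;

// Hypothetical helper: return the first candidate directory in which a FIFO
// with user-only read/write permissions can actually be created.
std::optional<fs::path> first_fifo_capable(const std::vector<fs::path>& candidates)
{
    for (const fs::path& dir : candidates) {
        // Per-process probe name to avoid colliding with other probers.
        fs::path probe = dir / (".fifo-probe-" + std::to_string(getpid()));
        if (mkfifo(probe.c_str(), S_IRUSR | S_IWUSR) == 0) {
            unlink(probe.c_str());  // the probe itself is not needed afterwards
            return dir;
        }
        // Creation failed (missing directory, no permission, filesystem
        // without FIFO support, ...): try the next candidate.
    }
    return std::nullopt;
}
```

The returned path only reflects what worked at probe time; holding an open descriptor on the directory would be needed to stop it changing underneath the caller.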

Niall

Niall Douglas

Sep 12, 2017, 11:36:24 AM
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com
If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".

There is no such thing as home.

You really missed an opportunity to say "there is no place like home" there.  :)

You know, I had that on the tip of my tongue, but just couldn't think of it. Then my nine month old started swiping at the laptop screen as he likes to do, so I hit Send. 

Niall

Nevin Liber

Sep 12, 2017, 1:01:31 PM
to std-pr...@isocpp.org
On Sat, Sep 9, 2017 at 10:38 PM, Niall Douglas <nialldo...@gmail.com> wrote:
 

I would use functionality like that.  I’ve already had to roll my own, and it isn’t pretty:

https://github.com/HowardHinnant/date/blob/master/tz.cpp#L167-L237

That's pretty par for the course. I've seen, and written, far less pretty implementations in the past.
 
Don't get me wrong, I'd just love a decent proposal implementing this too. But as I mentioned, you'd need to start with a definitive survey of how each platform implements its particular race-free and secure mechanism for retrieving these paths,

Why?  Not every application or environment where the application is going to be run cares.  And if you do care, the rest of the filesystem library won't be applicable, will it?

I really don't see how this is any worse in terms of either security or race-free compared with the rest of the filesystem library.  We can pretend people aren't going to do this, or are willing to wait until C++26 or C++29 to do it, but neither of those is realistic.
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  +1-847-691-1404

Arthur O'Dwyer

Sep 12, 2017, 2:43:53 PM
to ISO C++ Standard - Future Proposals
On Tue, Sep 12, 2017 at 8:33 AM, Niall Douglas <nialldo...@gmail.com> wrote:

If my understanding of your argument is wrong, what is your argument? Why do you think this is a bad idea? Lay out the whole thing in a single post. You've provided what seems to be several different lines of argument thus far (security concerns, absolute paths being bad, etc), and you seem to think that my summary doesn't adequately describe them.

So please do so.

The reason there are multiple strands of argument involved is because this thread has split into such. Most appear to understand that has happened and don't find it a problem. But to spell it out:

1. There is the "big picture" argument regarding good engineering (use absolute paths as sparingly as possible, fds to mark filesystem locations wherever possible) vs what's currently forced on us by lack of API support.

2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.

I don't think the consequences of expanding "~root" to root's home directory are bad or unexpected at all — but notice that you got root's default home directory slightly wrong; it's "/root", not "/", at least on modern Linux boxes. It seems like common sense to me that my Linux box (where "~root" expands to "/root") is perfectly able to run applications as the root user. Linux applications often try to load config from files in the home directory by default; e.g. "~/.vimrc". An example of an application runnable by root is "vim". When you run "vim" as user-account "root", Vim will look for its configuration in root's home directory, i.e., "/root/.vimrc".

The "obvious" next question is, what happens when you run "vim" via "sudoedit", which lets you masquerade as root? Well, I'm no expert, but my understanding is that "sudo" by default does not change the current value of the environment variable HOME. So when "vim" goes to look in "$HOME/.vimrc", it finds your config; and "vim" will never look in "~root/.vimrc" at all. That is, "vim" seems to consider "the home directory" to be exactly synonymous with "the current value of $HOME".

None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.

None of this applies to embedded devices, period. If you don't have a filesystem, or if you don't have a multi-user environment, or if you have a flat filesystem without directory structure, then you simply have no use for $HOME either, and it seems like this thread ought to be irrelevant to you.


Most currently written code using the filesystem is incorrect.

Whether people have the choice to write actually correct code is the central problem. Right now you cannot say: create file A and guaranteed sibling file B and both files must exist as siblings or neither. That forces people to write power loss recovery code which is usually buggy, certainly insufficiently tested. Most don't bother, and just allow incorrectness and corruption of persistent state to occur. And when that becomes a problem, they just fire everything into SQLite and call it "solved".

I aim to do something about that eventually, but Rome wasn't built in a day. I only properly work on this stuff when I'm out of contract. It's slow going.

You earlier pointed to system calls such as openat(fd, path, flags) as the right way forward so that maybe one day "idiomatic C++" and "secure file handling" will be synonymous. That seems quite sensible to me. One might even say that a well-written program should not rely on the process-global "current working directory" at all. (Heck, I'd go even further and say that all global variables are bad!) I would applaud any movement in the direction of "every context that cares about directories maintains its own CWD and all paths opened/statted by that context are opened/statted using openat/fstatat".

However, that's orthogonal to the ability of the program to access its external environment, which is global by definition. I continue to see no problem with allowing the program to get the name of the home directory; that seems to be Step 1 of any workflow, even if you do everything "securely" from then on.
(Here and throughout, "fs::" means "std::filesystem::" and "fs2::"/"std2::" means things that aren't in C++17 but could be proposed.)

    fs::path home = fs2::get_home_directory_path();
    fs2::working_directory_entry home_wd(home);  // basically an fs::directory_entry plus an open file descriptor
    if (!home_wd.exists())
        throw std::runtime_error("Home directory could not be opened because it disappeared!");
    std::cout << "Successfully opened " << home.string() << std::endl;
    fs2::working_directory_entry git_wd = home_wd.create_directory(".git");
    std2::ifstream config_stream(git_wd, "config");
    if (config_stream.is_open())
        config_stream >> config;

OTOH, I guess now that I've proposed this "working_directory" abstraction, I see no problem with combining the first two steps of that workflow.

    fs2::working_directory_entry home_wd = fs2::home_working_directory();
    if (!home_wd.exists())
        throw std::runtime_error("Home directory does not exist!");
    std::cout << "Successfully opened " << home_wd.path().string() << std::endl;
    fs2::working_directory_entry git_wd = home_wd.create_directory(".git");
    std2::ifstream config_stream(git_wd, "config");
    if (config_stream.is_open())
        config_stream >> config;

If we ever get something like "fs2::working_directory", then it would make sense to provide "fs2::current_working_directory()" in addition to "fs::current_path()", and "fs2::home_working_directory()" in addition to "fs2::home_path()".
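For comparison, something close to the "directory entry plus an open file descriptor" idea can already be approximated on POSIX with open and openat. This is only a sketch under that assumption; open_directory and read_relative are made-up names, not part of any proposal.

```cpp
#include <cstddef>
#include <optional>
#include <string>

#include <fcntl.h>   // open, openat, O_DIRECTORY, O_CLOEXEC
#include <unistd.h>  // read, close

// Open a directory once. The returned descriptor keeps referring to the same
// directory even if it is renamed afterwards. Returns -1 on failure.
int open_directory(const char* path)
{
    return open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

// Hypothetical helper: read a small file named `name` relative to an already
// open directory descriptor, so the lookup does not depend on the directory's
// current absolute path.
std::optional<std::string> read_relative(int dir_fd, const char* name)
{
    int fd = openat(dir_fd, name, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return std::nullopt;

    std::string contents;
    char chunk[4096];
    ssize_t n;
    while ((n = read(fd, chunk, sizeof chunk)) > 0)
        contents.append(chunk, static_cast<std::size_t>(n));
    close(fd);

    if (n < 0)  // read error: report failure rather than partial data
        return std::nullopt;
    return contents;
}
```

With these, the workflow above becomes: obtain the home path once, call open_directory on it, and do every later lookup through the descriptor, e.g. read_relative(home_fd, ".vimrc").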

TLDR: I applaud your openat()-related opinions, but they are completely unrelated to $HOME.


OK, what set of "desired properties" will give me the home directory? If there isn't a set of properties that would give that to me, then that is a poorer API than just asking for the home directory.

A list of properties has been given already. I agree with Thiago that Qt is well ahead of everyone on the same topic, and starting from what they've done looks to be a great idea.

There's also a "list of special folders" in the Windows API, but I would hate to see the C++ standard imitate that mess.

 
Path discovery, if implemented by runtime probing, doesn't suffer from this problem of staleness by design. If I ask for a path where I can create fifos which will be visible to other processes running under the same user, the implementation will go off and try to create a fifo with user rw privs in a sequence of path locations until it finds one which works. It then returns that path, and you know for a guaranteed fact that that path will let you create fifos which will be visible to other processes running under the same user.

I would hate to use a library function that "will go off and try to create [files] in a sequence of path locations until it finds one which works."  That approach suffers from the same undocumentability problem as the current fs::temp_directory_path() — there's no way to document its file-creation behavior except "i dunno lol, wherever it happened to like today." There is immense value in being able to say, definitively, that "the Foo server creates its pidfile in /var/run/foo.pid".

–Arthur

Niall Douglas

Sep 12, 2017, 4:45:33 PM
to ISO C++ Standard - Future Proposals

2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.

I don't think the consequences of expanding "~root" to root's home directory are bad or unexpected at all — but notice that you got root's default home directory slightly wrong; it's "/root", not "/", at least on modern Linux boxes.

That's purely a choice of common Linux distros (and a very wise one).

I'm just about old enough to remember when root's home directory was '/'. And that's still legal under POSIX.
 

The "obvious" next question is, what happens when you run "vim" via "sudoedit", which lets you masquerade as root? Well, I'm no expert, but my understanding is that "sudo" by default does not change the current value of the environment variable HOME. So when "vim" goes to look in "$HOME/.vimrc", it finds your config; and "vim" will never look in "~root/.vimrc" at all. That is, "vim" seems to consider "the home directory" to be exactly synonymous with "the current value of $HOME".

Correct. $HOME is set by the initial login process.

Code needs to know if it's writing root-only accessible files into an unrelated user's home directory. This is part of why exposing $HOME is unsafe. Expose "Place where I can write the logged in user's space" and "Place where I can write the current user's space" instead.
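One way code could detect that situation on POSIX today is to compare the owner of the directory it is about to write into with the effective user of the process. This is only a sketch; the helper name owned_by_effective_user is invented for the example, and ownership is just one of several checks a careful program might want.

```cpp
#include <filesystem>

#include <sys/stat.h>  // stat, S_ISDIR
#include <unistd.h>    // geteuid

namespace fs = std::filesystem;

// Hypothetical helper: true if `dir` exists, is a directory, and is owned by
// the effective user of this process.
bool owned_by_effective_user(const fs::path& dir)
{
    struct stat st{};
    if (stat(dir.c_str(), &st) != 0)
        return false;  // missing or inaccessible: treat as "not ours"
    return S_ISDIR(st.st_mode) && st.st_uid == geteuid();
}
```

Under sudo with an unchanged $HOME, calling this on the $HOME value would return false for a root process, which is the signal that files written there would land in someone else's space.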
 

None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.

That's exactly the API we need to have instead. You ask for the home directory for a specific user, and Python goes and parses the passwd file and gives you the correct, untainted, path.

I have no problem with that API. In fact, that's what I want.
 
However, that's orthogonal to the ability of the program to access its external environment, which is global by definition. I continue to see no problem with allowing the program to get the name of the home directory; that seems to be Step 1 of any workflow, even if you do everything "securely" from then on.
(Here and throughout, "fs::" means "std::filesystem::" and "fs2::"/"std2::" means things that aren't in C++17 but could be proposed.)

    fs::path home = fs2::get_home_directory_path();
    fs2::working_directory_entry home_wd(home);  // basically an fs::directory_entry plus an open file descriptor
    if (!home_wd.exists())
        throw std::runtime_error("Home directory could not be opened because it disappeared!");
    std::cout << "Successfully opened " << home.string() << std::endl;
    fs2::working_directory_entry git_wd = home_wd.create_directory(".git");
    std2::ifstream config_stream(git_wd, "config");
    if (config_stream.is_open())
        config_stream >> config;

OTOH, I guess now that I've proposed this "working_directory" abstraction, I see no problem with combining the first two steps of that workflow.

    fs2::working_directory_entry home_wd = fs2::home_working_directory();
    if (!home_wd.exists())
        throw std::runtime_error("Home directory does not exist!");
    std::cout << "Successfully opened " << home_wd.path().string() << std::endl;
    fs2::working_directory_entry git_wd = home_wd.create_directory(".git");
    std2::ifstream config_stream(git_wd, "config");
    if (config_stream.is_open())
        config_stream >> config;

If we ever get something like "fs2::working_directory", then it would make sense to provide "fs2::current_working_directory()" in addition to "fs::current_path()", and "fs2::home_working_directory()" in addition to "fs2::home_path()".

TLDR: I applaud your openat()-related opinions, but they are completely unrelated to $HOME.

We're back into discussing stuff not possible with the current C++ standard, but you're assuming that the home directory is not renamed in between getting the path to it and using it. This is why all absolute paths are dangerous. If you're ever using an absolute path for more than one filesystem entry, your code is almost certainly incorrect.

I know few here consider this anything more than trivial minutiae, and think I'm overthinking things and making mountains from molehills. But this is about correctness dammit! If we were discussing std::memory_order then everybody here would agree that it's important to be correct. But when they talk about the filesystem they become very sloppy. They tolerate, indeed often advocate, all sorts of incorrectness and race conditions which would not be tolerated for std::memory_order related code.

And okay, this is C++ the programming language. It's not the Filesystem language. Historically we set a high bar for correctness in C++, and have not for the Filesystem. I get that. But once upon a time even seasoned hands felt that volatile was all you needed to write thread safe code. And quite a few still do.

Race conditions of any form don't make good engineering. You're either correct, or you're not correct. If there was no alternative, that's one thing. But the kernel folk, and POSIX, have delivered the potential to do better to userspace. We just need to do our bit too, and up our game.
 
 
Path discovery, if implemented by runtime probing, doesn't suffer from this problem of staleness by design. If I ask for a path where I can create fifos which will be visible to other processes running under the same user, the implementation will go off and try to create a fifo with user rw privs in a sequence of path locations until it finds one which works. It then returns that path, and you know for a guaranteed fact that that path will let you create fifos which will be visible to other processes running under the same user.

I would hate to use a library function that "will go off and try to create [files] in a sequence of path locations until it finds one which works."  That approach suffers from the same undocumentability problem as the current fs::temp_directory_path() — there's no way to document its file-creation behavior except "i dunno lol, wherever it happened to like today."

Alas the configuration of systems varies widely. Right down to the individual machine. For example I keep my Windows system partition very small, and mount my User account and a few other directories from a second partition. That way I can safely hose and reinstall my Windows partition, and it's trivially easy to restore my data. But it means different parts of C:\ will have different free space.

I'm not saying that probing should be unavoidable. It should be optional. So you can fetch a list of discovered paths, some or all of which might work. You can also fetch a list of guaranteed paths which have been probed. So for example, if you ask for a temp path where I can store a 10Gb file, it'll go probe for some temp path with that amount of free space.

On my Windows machine at least, C:\TEMP will have only a few Gb free at most. Whereas C:\Users\ned will have dozens  of Gb free.
 
There is immense value in being able to say, definitively, that "the Foo server creates its pidfile in /var/run/foo.pid".

I agree. But such things are conventions per distribution. Not set in stone. POSIX, as I remember, does give a list of possibilities for most of those things, and a conforming implementation ought to check them all when on POSIX.

Niall

Thiago Macieira

Sep 12, 2017, 5:44:17 PM9/12/17
to std-pr...@isocpp.org
On Tuesday, 12 September 2017 13:45:33 PDT Niall Douglas wrote:
> We're back into discussing stuff not possible with the current C++
> standard, but you're assuming that the home directory is not renamed in
> between getting the path to it and using it. This is why all absolute paths
> are dangerous. If you're ever using an absolute path for more than one
> filesystem entry, your code is almost certainly incorrect.

Considering we don't yet have race-free filesystem API, then every single
filesystem API is subject to this problem. Shall we just dump the new API and
have the race-free version only that you're working on?

I also agree with Arthur: the problem is orthogonal. I may want the path to
write it in a file, not to open it.

Arthur O'Dwyer

Sep 12, 2017, 6:20:20 PM9/12/17
to ISO C++ Standard - Future Proposals
On Tue, Sep 12, 2017 at 1:45 PM, Niall Douglas <nialldo...@gmail.com> wrote:

2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.

I don't think the consequences of expanding "~root" to root's home directory are bad or unexpected at all — but notice that you got root's default home directory slightly wrong; it's "/root", not "/", at least on modern Linux boxes.

That's purely a choice of common Linux distros (and a very wise one).

I'm just about old enough to remember when root's home directory was '/'. And that's still legal under POSIX.

Sure it's legal (I assume), but if everyone agrees that it would be unwise to do in practice, and common distros don't do it, and future distros won't do it, then we don't need to discuss it.


The "obvious" next question is, what happens when you run "vim" via "sudoedit", which lets you masquerade as root? Well, I'm no expert, but my understanding is that "sudo" by default does not change the current value of the environment variable HOME. So when "vim" goes to look in "$HOME/.vimrc", it finds your config; and "vim" will never look in "~root/.vimrc" at all. That is, "vim" seems to consider "the home directory" to be exactly synonymous with "the current value of $HOME".

Correct. $HOME is set by the initial login process.

Code needs to know if it's writing root-only accessible files into an unrelated user's home directory. This is part of why exposing $HOME is unsafe. Expose "Place where I can write the logged in user's space" and "Place where I can write the current user's space" instead.

I would have said, "Correct. $HOME is set by someone, but that someone is never the 'sudo' process itself. It won't override the $HOME setting of the current environment (unless you pass -H)."
Where the current setting of $HOME originally came from is irrelevant for vim's purposes. The user is free to "export HOME=/foo/bar" if they want to, and vim will respect that.

vim doesn't need to know if it's writing root-only-accessible files into some unrelated user's home directory. vim doesn't need to care. It just needs to have a well-supported, simple, stable way to open "the current home directory", so that it can document that that's what it does; and then if the user does something stupid like "sudoedit /MyPhotos/A, take myself out of sudoers, try to open /MyPhotos/A", well, it'll be obvious what the user did wrong and how they should fix it.
I find your examples fairly innocuous. They all seem to be of the form "A user with sudo privileges can mildly bork things up for themselves."
Besides, normal applications shouldn't be running as root in the first place, and normal users shouldn't be sudo'ing most commands — I'd consider these vastly more fundamental security principles than anything about race conditions — and so I also find your "root"-based examples fairly contrived unless I'm misunderstanding something.


None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.

That's exactly the API we need to have instead. You ask for the home directory for a specific user, and Python goes and parses the passwd file and gives you the correct, untainted, path.

I have no problem with that API. In fact, that's what I want.

I agree that expanduser() is a better API than merely home_directory_path(), because it can generate "~bob" in addition to just "~".
IIRC there's a FreeBSD library function to do tilde-expansion like this, but I've forgotten its name and can't figure out the right search terms to find it again. :P
But. When you ask Python to expand a lone "~", the first thing it does is look for the current setting of $HOME and use that. Only if that first approach fails, will it look up the current user (i.e. root, if you're sudo'ed) and then call getpwuid() to fetch that user's home directory.
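
A minimal sketch of that lookup order in C++17, assuming a POSIX system. The name `home_directory_path` is hypothetical (it is not an existing or proposed std:: function), and the empty-string checks are an added assumption rather than something Python is claimed to do:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>
#include <pwd.h>
#include <unistd.h>

// Hypothetical helper: $HOME first, then the passwd database.
std::optional<std::string> home_directory_path()
{
    // 1. Respect the current environment, whoever set it.
    if (const char *home = std::getenv("HOME"); home != nullptr && *home != '\0')
        return std::string(home);

    // 2. Fall back to the passwd entry of the real user id.
    const long hint = ::sysconf(_SC_GETPW_R_SIZE_MAX);
    std::vector<char> buffer(hint > 0 ? static_cast<std::size_t>(hint) : 16384);
    struct passwd entry;
    struct passwd *result = nullptr;
    if (::getpwuid_r(::getuid(), &entry, buffer.data(), buffer.size(), &result) == 0
        && result != nullptr)
    {
        assert(result == &entry);  // POSIX: on success, result points at entry
        if (result->pw_dir != nullptr && *result->pw_dir != '\0')
            return std::string(result->pw_dir);
    }
    return std::nullopt;  // no home directory could be determined
}
```

Returning an optional makes the "user id which has no home" case explicit instead of inventing a path. A too-small buffer (ERANGE) is simply treated as failure here.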

If all you had was a primitive function get_home_of(username), you'd merely have reduced the current problem to a harder problem — namely, instead of getting the current home directory, we'd have to get the current user's username! This is a harder problem than getting $HOME. Because all the Unix tools are designed to help you get the right value for $HOME (see sudo's behavior above), but for example "sudo" does modify the value of $USER in the sudo'ed environment.

(And then, although this is orthogonal, we'd have two race conditions. You'd have to guard not only against "malicious concurrent moving-around of directory structure re the home path" but also "malicious concurrent calls to usermod -d (change home directory)" and "malicious concurrent calls to usermod -l (change username)".  The fundamental issue here is that both POSIX paths and POSIX usernames are stringly typed; just because someone told me a few milliseconds ago that his identifier was "bob" doesn't mean that that's the correct identifier for him right now.)


TLDR: I applaud your openat()-related opinions, but they are completely unrelated to $HOME.

We're back into discussing stuff not possible with the current C++ standard, but you're assuming that the home directory is not renamed in between getting the path to it and using it. This is why all absolute paths are dangerous. If you're ever using an absolute path for more than one filesystem entry, your code is almost certainly incorrect.

More-or-less agreed.
But, consider again some user-facing application that is documented to "store its config in /foo/bar". And let's say that while this application is running, somebody maliciously runs "mv /foo/bar /foo/baz && mkdir /foo/bar" on the filesystem. The application has three options:
(1) Detect the malicious action and abort with an error. Let's assume aborting is not desirable.
(2) "Use absolute paths": the next time it needs to open a file in the config directory, it looks in the new /foo/bar and finds no config. This will likely result in a problem for the user. However, the user's problem is super easy to explain. "Doctor, it hurts when I do this." "Don't do that, then."
(3) "Use relative paths": the next time it needs to open a file in the config directory, it uses openat() relative to the handle it kept to /foo/baz (née /foo/bar). This will result in new files getting created in /foo/baz, contrary to the documentation's promise. Furthermore, this will result in a problem for the user, if the reason the user cleared out the old directory is that it was getting too full, and/or to atomically create a backup of the existing files. (See also: log rotation.)

Either (2) or (3) could be the "right behavior" in practice. Either (2) or (3) could be the "wrong behavior" in practice.
I would much rather debug a user complaint about (2), though.
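
The difference between (2) and (3) can be sketched with plain POSIX calls. All function names below are hypothetical, and the demo replays the "mv bar baz && mkdir bar" scenario inside a scratch directory under /tmp (an assumption about the machine it runs on):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Option (2): resolve the documented absolute path again on every access.
int open_config_by_path(const std::string &config_dir, const std::string &leaf)
{
    const std::string full = config_dir + "/" + leaf;
    return ::open(full.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0600);
}

// Option (3): keep a handle to the directory that was opened at startup...
int open_config_dir(const std::string &config_dir)
{
    return ::open(config_dir.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
}

// ...and resolve leaf names relative to that handle.
int open_config_by_handle(int dir_fd, const std::string &leaf)
{
    assert(dir_fd >= 0);
    return ::openat(dir_fd, leaf.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0600);
}

// Returns true if option (2) wrote into the new bar and option (3) into baz.
bool demo_rename_under_running_app()
{
    char scratch[] = "/tmp/config_demo_XXXXXX";
    if (::mkdtemp(scratch) == nullptr) return false;
    const std::string bar = std::string(scratch) + "/bar";
    const std::string baz = std::string(scratch) + "/baz";
    if (::mkdir(bar.c_str(), 0700) != 0) return false;
    const int dir_fd = open_config_dir(bar);                       // application starts
    if (dir_fd < 0) return false;
    if (std::rename(bar.c_str(), baz.c_str()) != 0) return false;  // mv bar baz
    if (::mkdir(bar.c_str(), 0700) != 0) return false;             // mkdir bar
    const int by_path = open_config_by_path(bar, "a.conf");
    const int by_handle = open_config_by_handle(dir_fd, "b.conf");
    const bool ok = by_path >= 0 && by_handle >= 0
        && ::access((bar + "/a.conf").c_str(), F_OK) == 0   // landed in the new bar
        && ::access((baz + "/b.conf").c_str(), F_OK) == 0;  // followed the old directory
    if (by_path >= 0) ::close(by_path);
    if (by_handle >= 0) ::close(by_handle);
    ::close(dir_fd);
    return ok;
}
```

Neither function is "the safe one"; the sketch only makes visible which directory each strategy ends up writing to. The scratch files are left behind for inspection.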

Related but not really related — I've seen in practice where a disk filled up due to a runaway server process logging into a file that had been deleted months before. The server process kept its file handle open and was not periodically re-checking the original path. It took altogether too long for us to figure out that all that disk usage was coming from a file that wasn't even in the filesystem anymore!
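
One way a long-running process could notice this situation is to periodically compare the inode behind its open descriptor with whatever the original path currently names. A minimal sketch, assuming POSIX; both helper names are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// True if `path` still names the same inode that `fd` has open.
bool fd_still_at_path(int fd, const std::string &path)
{
    assert(fd >= 0);
    struct stat by_fd, by_path;
    if (::fstat(fd, &by_fd) != 0) return false;
    if (::stat(path.c_str(), &by_path) != 0) return false;  // path is gone
    return by_fd.st_dev == by_path.st_dev && by_fd.st_ino == by_path.st_ino;
}

// Reopen the log if it was deleted or rotated away; otherwise keep the old fd.
int reopen_log_if_moved(int fd, const std::string &path)
{
    if (fd >= 0 && fd_still_at_path(fd, path)) return fd;
    if (fd >= 0) ::close(fd);
    return ::open(path.c_str(), O_WRONLY | O_APPEND | O_CREAT | O_CLOEXEC, 0644);
}
```

The check is itself racy (the path can change again right after the stat), so it bounds how long a process keeps writing to an unlinked file rather than preventing it.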

–Arthur

torto...@gmail.com

Sep 13, 2017, 4:19:47 AM9/13/17
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


On Monday, 11 September 2017 16:30:35 UTC+1, Niall Douglas wrote:

So presumably we need an equivalent or an extension that can use a file descriptor + relative path?

  
I saw various handle types but I didn't spot anything with a basic structure of:

class better_path
{
    path_handle root;
    std::string root_path;
    std::string leaf_path;
};

or

class better_path
{
    struct path_element
    {
       // one element may be invalid, one must be valid
       native_file_handle handle; 
       std::string    name;              
    };
    std::vector<path_element> path_impl;
};

You might build this on top of AFIO but it doesn't seem to have it yet.

Ideally you'd be able to use this both where a string is appropriate and where you need to be race-free.
Yes of course, this does potentially leak string based paths which wouldn't be race free but I'm not sure you can avoid that completely.
You just need documentation to educate users.

An issue is that adding a path element might be a purely string operation or an openat(). You need both.
It feels like much shakier ground than having separate purely string-based and purely handle-based representations and leaving it to programmers to be safe,
but it could work.
   

Niall Douglas

Sep 13, 2017, 8:46:57 AM9/13/17
to ISO C++ Standard - Future Proposals

I'm just about old enough to remember when root's home directory was '/'. And that's still legal under POSIX.

Sure it's legal (I assume), but if everyone agrees that it would be unwise to do in practice, and common distros don't do it, and future distros won't do it, then we don't need to discuss it.

The ISO C++ standard has always targeted POSIX, formerly the ISO portable operating systems standard. It is moot if current distros don't implement a part of POSIX, we have to assume whatever POSIX explicitly guarantees is supported.

Now, if someone logged a defect to the AWG about root's home being '/', apart from one or two I'd think you'd get widespread support. But somebody has to propose it, champion it, and push it through.
 

vim doesn't need to know if it's writing root-only-accessible files into some unrelated user's home directory. vim doesn't need to care. It just needs to have a well-supported, simple, stable way to open "the current home directory", so that it can document that that's what it does; and then if the user does something stupid like "sudoedit /MyPhotos/A, take myself out of sudoers, try to open /MyPhotos/A", well, it'll be obvious what the user did wrong and how they should fix it.
I find your examples fairly innocuous. They all seem to be of the form "A user with sudo privileges can mildly bork things up for themselves."
Besides, normal applications shouldn't be running as root in the first place, and normal users shouldn't be sudo'ing most commands — I'd consider these vastly more fundamental security principles than anything about race conditions — and so I also find your "root"-based examples fairly contrived unless I'm misunderstanding something.

I was trying to simplify for argument using an extremis example.

The real pain point actually is between users where ACLs are being used. On more than one occasion I've gone to delete something from my home directory only to find it refuses to delete, and I need to go reset ACLs in an entire directory tree before I can do what I want.

On Windows I've even found files which cannot be deleted, not by anyone because some a-hole with the SYSTEM account created them and set ACLs to nobody access. Even Administrator can't touch those.

But anyway, this is all ancillary. I just want path discovery to let me specify a user name and it will return a path which is appropriate for that user. That's all.
 

None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.

That's exactly the API we need to have instead. You ask for the home directory for a specific user, and Python goes and parses the passwd file and gives you the correct, untainted, path.

I have no problem with that API. In fact, that's what I want.

I agree that expanduser() is a better API than merely home_directory_path(), because it can generate "~bob" in addition to just "~".
IIRC there's a FreeBSD library function to do tilde-expansion like this, but I've forgotten its name and can't figure out the right search terms to find it again. :P
But. When you ask Python to expand a lone "~", the first thing it does is look for the current setting of $HOME and use that. Only if that first approach fails, will it look up the current user (i.e. root, if you're sudo'ed) and then call getpwuid() to fetch that user's home directory.

I have no problem with path discovery using environment variables for override if you ask for that.
 

(And then, although this is orthogonal, we'd have two race conditions. You'd have to guard not only against "malicious concurrent moving-around of directory structure re the home path" but also "malicious concurrent calls to usermod -d (change home directory)" and "malicious concurrent calls to usermod -l (change username)".  The fundamental issue here is that both POSIX paths and POSIX usernames are stringly typed; just because someone told me a few milliseconds ago that his identifier was "bob" doesn't mean that that's the correct identifier for him right now.)

All correct, but ancillary.

The primary class of incorrectness with filesystem paths is mainly "the index file". Consider a directory full of jpegs, and you are making an index file of those with say thumbnails. That index file is absolutely tied, in the strongest possible terms, to that directory. It is fundamentally incorrect to have that index file anywhere but that directory.

This is the class of incorrectness which needs fixing. Other, weaker, relations between files are less important for now at least.
 

But, consider again some user-facing application that is documented to "store its config in /foo/bar". And let's say that while this application is running, somebody maliciously runs "mv /foo/bar /foo/baz && mkdir /foo/bar" on the filesystem. The application has three options:
(1) Detect the malicious action and abort with an error. Let's assume aborting is not desirable.
(2) "Use absolute paths": the next time it needs to open a file in the config directory, it looks in the new /foo/bar and finds no config. This will likely result in a problem for the user. However, the user's problem is super easy to explain. "Doctor, it hurts when I do this." "Don't do that, then."
(3) "Use relative paths": the next time it needs to open a file in the config directory, it uses openat() relative to the handle it kept to /foo/baz (née /foo/bar). This will result in new files getting created in /foo/baz, contrary to the documentation's promise. Furthermore, this will result in a problem for the user, if the reason the user cleared out the old directory is that it was getting too full, and/or to atomically create a backup of the existing files. (See also: log rotation.)

Either (2) or (3) could be the "right behavior" in practice. Either (2) or (3) could be the "wrong behavior" in practice.
I would much rather debug a user complaint about (2), though.

All valid analysis.

The real problem here is that the documentation speaks in terms of paths at all in the first place. That's where the ambiguity stems from.

If the documentation spoke in terms of unique inodes, i.e. "the configuration shall be placed at the inode referred to as /foo/bar/ at the time of process creation", then the ambiguity is removed.

You can, absolutely, say in the documentation that the configuration will be placed in whatever inode exists at /foo/bar/ at the time of configuration writing.

It just takes a bit of clarity of language, and predictability and correctness are hugely improved.

Incidentally, afio::fs_handle::unique_id() returns a unique id for an open fd which is guaranteed to be unique on the running computer. It's useful for hash maps, comparing different open handles, security checking etc.
 

Related but not really related — I've seen in practice where a disk filled up due to a runaway server process logging into a file that had been deleted months before. The server process kept its file handle open and was not periodically re-checking the original path. It took altogether too long for us to figure out that all that disk usage was coming from a file that wasn't even in the filesystem anymore!
 
You can see now why Windows never lets you unlink a file until nobody is using it anymore in the strict atomic sense.

Niall

Niall Douglas

Sep 13, 2017, 8:59:17 AM9/13/17
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


So presumably we need an equivalent or an extension that can use a file descriptor + relative path?

  
I saw various handle types but I didn't spot anything with a basic structure of:

class better_path
{
    path_handle root;
    std::string root_path;
    std::string leaf_path;
};

or

class better_path
{
    struct path_element
    {
       // one element may be invalid, one must be valid
       native_file_handle handle; 
       std::string    name;              
    };
    std::vector<path_element> path_impl;
};

You might build this on top of AFIO but it doesn't seem to have it yet.

Heh.

It's actually staring you in the face:

afio::path_handle refers to some inode on the filesystem which could have any path different to when it was constructed.

afio::path_view is a borrowed reference to a string/path stored elsewhere.

Just combine the two to create a race free reference to a path fragment relative to wherever the path_handle is currently located. All the APIs in AFIO accept path_handle + path_view, in fact, it's the only thing they accept.
 

Ideally you'd be able to use this both where a string is appropriate and where you need to be race-free.
Yes of course, this does potentially leak string based paths which wouldn't be race free but I'm not sure you can avoid that completely.
You just need documentation to educate users.

An issue is that adding a path element might be a purely string operation or an openat(). You need both.
It feels like much shakier ground than having separate purely string-based and purely handle-based representations and leaving it to programmers to be safe,
but it could work.
   
AFIO never stores a path, too expensive (malloc), too racy. But you realise right that you can ask any open handle for its current path right? It's afio::handle::current_path().

That call goes and asks the kernel what the current path for that open fd is, makes sure what is returned is correct and returns it.

As the docs for that API say, that function is by definition racy, and you should not use it. It's also quite slow. But it's there when you need it, and some of the generic filesystem algorithms really do need it. For example, to find the parent directory of an inode race free, we need to loop opening the parent directory and checking that it contains the correct child. And you need correct parent directory finding to implement sibling inode creation, and so on.

Niall

torto...@gmail.com

Sep 14, 2017, 4:58:56 AM9/14/17
to ISO C++ Standard - Future Proposals, david.b...@gmail.com, torto...@gmail.com


On Wednesday, 13 September 2017 13:59:17 UTC+1, Niall Douglas wrote:


So presumably we need an equivalent or an extension that can use a file descriptor + relative path?

  

You might build this on top of AFIO but it doesn't seem to have it yet.

Heh.

It's actually staring you in the face:

afio::path_handle refers to some inode on the filesystem which could have any path different to when it was constructed.

afio::path_view is a borrowed reference to a string/path stored elsewhere.

Just combine the two to create a race free reference to a path fragment relative to wherever the path_handle is currently located. All the APIs in AFIO accept path_handle + path_view, in fact, it's the only thing they accept.
    
AFIO never stores a path, too expensive (malloc), too racy. But you realise right that you can ask any open handle for its current path right? It's afio::handle::current_path().

Ah yes. Missed both of those first time round.


Niall Douglas

Sep 14, 2017, 6:40:37 PM9/14/17
to ISO C++ Standard - Future Proposals, david.b...@gmail.com
Ok, I specced out how I would prefer to see user account path discovery implemented:

//! \brief Contains functions used to discover suitable paths for things
namespace path_discovery
{
  //! \brief A discovered path.
  struct discovered_path
  {
    path_view path;  //!< The path discovered.
    //! Source of the discovered path.
    enum class source_type
    {
      hardcoded,    //!< This path came from an internal hardcoded list of paths likely for this system.
      system,       //!< This path came from querying the system.
      environment,  //!< This path came from an environment variable (an override?).
      local         //!< This path was added locally.
    } source;

    /*! If this path was successfully probed for criteria verification, this was its stat after any symlink
    dereferencing at that time. Secure applications ought to verify that any handles opened to the path have
    the same `st_ino` and `st_dev` as this structure before use.
    */
    optional<stat_t> stat;
  };

  /*! \brief Returns a list of potential directories which might be usable for temporary files.

  This is a fairly lightweight call which builds a master list of all potential temporary file directories
  given the environment block of this process (unless SUID or SGID or Privilege Elevation are in effect) and the user
  running this process. It does not verify if any of them exist, or are writable, or anything else about them.
  An internal mutex is held for the duration of this call.

  \mallocs Allocates the master list of discovered temporary directories exactly once per process,
  unless `refresh` is true in which case the list will be refreshed. The system calls to retrieve paths
  may allocate additional memory for paths returned.
  \errors This call never fails, except to return an empty span.
  */
  AFIO_HEADERS_ONLY_FUNC_SPEC span<discovered_path> all_temporary_directories(bool refresh = false) noexcept;

  /*! \brief Returns a subset of `all_temporary_directories()` each of which has been tested to be writable
  by the current process. No testing is done of available writable space.

  After this call returns, the successfully probed entries returned by `all_temporary_directories()` will have their
  stat structure set. As the probing involves creating a non-zero sized file in each possible temporary
  directory to verify its validity, this is not a fast call. It is however cached statically, so the
  cost occurs exactly once per process, unless someone calls `all_temporary_directories(true)` to wipe and refresh
  the master list. An internal mutex is held for the duration of this call.
  \mallocs None.
  \error This call never fails, though if it fails to find any writable temporary directory, it will
  terminate the process.
  */
  AFIO_HEADERS_ONLY_FUNC_SPEC span<discovered_path> verified_temporary_directories() noexcept;

  /*! \brief Returns a reference to an open handle to a verified temporary directory where files created are
  stored in a filesystem directory, usually under the current user's quota.

  This is implemented by iterating all of the paths returned by `verified_temporary_directories()`
  and checking what file system is in use. The following regex is used:

  `btrfs|cifs|exfat|ext?|f2fs|hfs|jfs|nfs|nilf2|ufs|vfat|xfs|zfs|msdosfs|newnfs|ntfs|smbfs|unionfs|fat|fat32`

  The handle is created during `verified_temporary_directories()` and is statically cached thereafter.
  */
  AFIO_HEADERS_ONLY_FUNC_SPEC const path_handle &storage_backed_temporary_files_directory() noexcept;

  /*! \brief Returns a reference to an open handle to a verified temporary directory where files created are
  stored in memory/paging file, and thus access may be a lot quicker, but stronger limits on
  capacity may apply.

  This is implemented by iterating all of the paths returned by `verified_temporary_directories()`
  and checking what file system is in use. The following regex is used:

  `tmpfs|ramfs`

  The handle is created during `verified_temporary_directories()` and is statically cached thereafter.

  \note If you wish to create an anonymous memory-backed inode for mmap and paging tricks like mapping
  the same extent into multiple addresses e.g. to implement a constant time zero copy `realloc()`,
  strongly consider using a non-file-backed `section_handle` as this is more portable.
  */
  AFIO_HEADERS_ONLY_FUNC_SPEC const path_handle &memory_backed_temporary_files_directory() noexcept;
}


As you'll note, it refers only to temporary directory path discovery as that's my main concern right now with AFIO, and yes it's code I copied and pasted from AFIO's sources, so it's not a mockup.

But the general design principle can be extended to other user account path discovery too:

1. The lowest level API returns a list of potential paths, not some canonical path. This is because different paths may exist for even say the user's preferred Downloads directory e.g. environment supplied, system supplied, hardcoded.

2. We place environment variable discovered paths at a higher priority than other methods of discovery to permit environment variable based overrides of system supplied defaults.

3. We disable environment variable discovery if SUID, SGID or Privilege Elevation on Windows is in effect.

4. The middle level API probes the list of potential paths for validity, so in temp directory's case, that means writable. It returns a subset of the low level API's list. It fills in a stat_t structure for each path found to be valid so that secure programs can later choose to verify that no bait and switch has occurred via st_ino and st_dev.

5. The highest level API returns statically allocated open path handles to a temp directory discovered to be filesystem or memory backed.
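
The five points above can be sketched in a stripped-down form using only POSIX and C++17. This is not AFIO's implementation: every name below is hypothetical, the candidate list (TMPDIR, TMP, TEMP, /tmp, /var/tmp) is an assumption, there is no caching or locking, and the probe creates an empty scratch file rather than a non-zero sized one.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>
#include <vector>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

enum class source_type { hardcoded, environment };

struct candidate
{
    std::string path;
    source_type source;
    std::optional<struct stat> probed;  // set only after a successful probe
};

// Point 3: ignore the environment when running set-uid or set-gid.
bool environment_is_trusted()
{
    return ::getuid() == ::geteuid() && ::getgid() == ::getegid();
}

// Points 1 and 2: build the unverified list, environment overrides first.
std::vector<candidate> candidate_temp_dirs()
{
    std::vector<candidate> out;
    if (environment_is_trusted())
        for (const char *name : {"TMPDIR", "TMP", "TEMP"})
            if (const char *v = std::getenv(name); v != nullptr && *v != '\0')
                out.push_back({v, source_type::environment, std::nullopt});
    for (const char *p : {"/tmp", "/var/tmp"})
        out.push_back({p, source_type::hardcoded, std::nullopt});
    return out;
}

// Point 4: keep only candidates where a scratch file can actually be created,
// and record the directory's stat at probe time.
std::vector<candidate> verified_temp_dirs()
{
    std::vector<candidate> ok;
    for (candidate c : candidate_temp_dirs())
    {
        std::string probe = c.path + "/probeXXXXXX";
        const int fd = ::mkstemp(probe.data());
        if (fd < 0) continue;  // missing or not writable
        ::close(fd);
        ::unlink(probe.c_str());
        struct stat st;
        if (::stat(c.path.c_str(), &st) != 0) continue;
        c.probed = st;
        ok.push_back(std::move(c));
    }
    return ok;
}

// Point 5 plus the bait-and-switch check: open the directory and compare
// st_dev/st_ino with what was recorded at probe time. Returns -1 on mismatch.
int open_verified(const candidate &c)
{
    if (!c.probed) return -1;
    const int fd = ::open(c.path.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0) return -1;
    struct stat now;
    if (::fstat(fd, &now) != 0 || now.st_dev != c.probed->st_dev
        || now.st_ino != c.probed->st_ino)
    {
        ::close(fd);
        return -1;
    }
    return fd;
}
```

Callers that only want a list of strings can stop at the first layer; callers that care about races take the open descriptor from the last layer and use it with openat().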


If the OP proposed some form of lightweight path discovery framework to WG21 based on something like what was just illustrated, I'd have no objection. It's the right design: layered.

Niall

Thiago Macieira

Sep 14, 2017, 7:45:04 PM9/14/17
to std-pr...@isocpp.org
On Wednesday, 13 September 2017 05:59:17 PDT Niall Douglas wrote:
> AFIO never stores a path, too expensive (malloc), too racy. But you realise
> right that you can ask any open handle for its current path right? It's
> afio::handle::current_path().

And you do realise that /proc does not have to be mounted on Linux, right? On
a container environment, it might not have been set up.

torto...@gmail.com

Sep 14, 2017, 7:54:55 PM9/14/17
to ISO C++ Standard - Future Proposals
Good catch. Is there another means of converting an FD to a path from within a (Linux) container?
I see that AFIO has (for current_path) #ifdefed one strategy per OS but there are presumably more strategies possible for each OS and more OSs to consider.

Niall Douglas

Sep 14, 2017, 9:11:50 PM9/14/17
to ISO C++ Standard - Future Proposals
On Wednesday, 13 September 2017 05:59:17 PDT Niall Douglas wrote:
> AFIO never stores a path, too expensive (malloc), too racy. But you realise
> right that you can ask any open handle for its current path right? It's
> afio::handle::current_path().

And you do realise that /proc does not have to be mounted on Linux, right? On
a container environment, it might not have been set up.

In which case a ton of stuff stops working. That's not a showstopper, just heavily reduced functionality.

I may down the line implement some remedial support for such platforms, but it would come at very significant added runtime overhead as a process-wide shared list of fds would need to be kept, and it'll need a big global mutex around it. AFIO v1 implemented that by the way. For v2 I've not bothered yet.

Still, if AFIO ever enters standards track, such support may be demanded of me.



Good catch. Is there another means of converting an FD to a path from within a (Linux) container?
I see that AFIO has (for current_path) #ifdefed one strategy per OS but there are presumably more strategies possible for each OS and more OSs to consider.

According to the lsof sources, the sole method on Linux for reading the current path of an open fd is via /proc.
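
For reference, the /proc route looks roughly like this on Linux. It is a sketch, not AFIO's current_path(): the function names are hypothetical, and it returns an empty optional when /proc is not mounted, which is exactly the container case raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>
#include <unistd.h>

// Ask the kernel for the path currently associated with an open fd
// by reading the /proc/self/fd/N symlink (Linux only).
std::optional<std::string> current_path_of(int fd)
{
    const std::string link = "/proc/self/fd/" + std::to_string(fd);
    std::vector<char> buf(256);
    for (;;)
    {
        const ssize_t n = ::readlink(link.c_str(), buf.data(), buf.size());
        if (n < 0) return std::nullopt;  // no /proc, or no such fd
        if (static_cast<std::size_t>(n) < buf.size())
            return std::string(buf.data(), static_cast<std::size_t>(n));
        buf.resize(buf.size() * 2);  // possibly truncated, retry with more room
    }
}

// Usage sketch: the path reported for a scratch file after it has been unlinked.
std::optional<std::string> demo_path_after_unlink()
{
    char name[] = "/tmp/fd_path_demo_XXXXXX";
    const int fd = ::mkstemp(name);
    if (fd < 0) return std::nullopt;
    ::unlink(name);
    auto reported = current_path_of(fd);
    ::close(fd);
    return reported;
}
```

The result is a snapshot: the file can be renamed again immediately after readlink() returns, so it is only suitable for diagnostics or as one step of a loop that re-verifies what it found.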

Niall

Arthur O'Dwyer

Sep 14, 2017, 9:21:25 PM9/14/17
to ISO C++ Standard - Future Proposals
On Thu, Sep 14, 2017 at 6:11 PM, Niall Douglas <nialldo...@gmail.com> wrote:
On Wednesday, 13 September 2017 05:59:17 PDT Niall Douglas wrote:
> AFIO never stores a path, too expensive (malloc), too racy. But you realise
> right that you can ask any open handle for its current path right? It's
> afio::handle::current_path().

And you do realise that /proc does not have to be mounted on Linux, right? [...]

Good catch. Is there another means of converting an FD to a path from within a (Linux) container? [...]

According to the lsof sources, the sole method on Linux for reading the current path of an open fd is via /proc.

You guys seem to have more understanding of your platforms than I do, but for what it's worth, my current understanding is like this:

- a POSIX file descriptor (fd) might or might not refer to a file in the filesystem; it might equally well refer to a network socket
- even if the fd does refer to a file (that is, an inode), an inode does not have a "path"; it does not even have a "filename" (the leafiest part of a path). Instead, an inode holds metadata about the file, such as its last-modified date and how to find the blocks holding its bytewise content
- The filesystem itself might have a unique path that refers to that inode; or there might be several such paths (hardlinks); or there might be zero such paths (if the file has been removed from the filesystem but is still open, as Niall and I briefly mentioned as a dev-ops nightmare elsewhere in this thread).

So if you are trying to discover "the path of" an inode, you are already in a state of sin, and the platform is not obliged to offer you any help whatsoever, AFAIK. If you want to deal with paths-and-inodes together as pairs, you need to track them together in pairs — which is exactly what std::filesystem::directory_entry does.
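To make the "zero, one, or several paths per inode" point concrete, here is a minimal sketch using std::filesystem. The helper name `demo_hard_links` and the file names are made up for illustration, and it assumes the directory passed in lives on a filesystem that supports hard links.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Illustrative only (not from the thread): link counts observed while one
// inode gains and loses names.
struct link_counts {
    std::uintmax_t one_name = 0;      // right after creating the file
    std::uintmax_t two_names = 0;     // after adding a hard link
    std::uintmax_t after_remove = 0;  // after removing the original name
    bool same_file = false;           // did both names resolve to one inode?
};

inline link_counts demo_hard_links(const fs::path& dir) {
    const fs::path a = dir / "inode_demo_a.txt";
    const fs::path b = dir / "inode_demo_b.txt";
    std::error_code ec;
    fs::remove(a, ec);  // ignore errors: the files may simply not exist yet
    fs::remove(b, ec);

    {
        std::ofstream out(a);
        out << "payload";
    }

    link_counts r;
    r.one_name = fs::hard_link_count(a);
    fs::create_hard_link(a, b);           // second path, same inode
    r.two_names = fs::hard_link_count(a);
    r.same_file = fs::equivalent(a, b);
    fs::remove(a);                        // the inode survives under name b
    r.after_remove = fs::hard_link_count(b);
    fs::remove(b);
    return r;
}
```

Calling `demo_hard_links(fs::temp_directory_path())` should report one name, then two, then one again, with neither name being "the" path of the file.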

my $.02,
Arthur

Niall Douglas

Sep 14, 2017, 10:32:47 PM
to ISO C++ Standard - Future Proposals

You guys seem to have more understanding of your platforms than I do, but for what it's worth, my current understanding is like this:

- a POSIX file descriptor (fd) might or might not refer to a file in the filesystem; it might equally well refer to a network socket

Correct, though network sockets have (pseudo) paths on at least Windows and Linux.
 
- even if the fd does refer to a file (that is, an inode), an inode does not have a "path"; it does not even have a "filename" (the leafiest part of a path). Instead, an inode holds metadata about the file, such as its last-modified date and how to find the blocks holding its bytewise content
- The filesystem itself might have a unique path that refers to that inode; or there might be several such paths (hardlinks); or there might be zero such paths (if the file has been removed from the filesystem but is still open, as Niall and I briefly mentioned as a dev-ops nightmare elsewhere in this thread).

An inode has zero to many identifying paths, reflecting its hard link count.

An open fd has one path associated with it, namely the specific path used to open it. Changes made to that path by third parties are tracked by the kernel and reflected in the path it returns when queried.

Linux and Windows implement per-fd path change tracking very well, and in my experience bug free. Other systems have quirks or bugs, such as returning a different path under certain circumstances for an inode which is hard linked more than once. I've opened bug reports with the respective vendors for all that I know of, but none have been fixed to date.
 
Regarding unlinking, if you unlink an open fd's path, Linux appends the string " (deleted)" to the path it returns for that fd. Windows never deletes any path until the last handle is closed. Other systems have quirks here too, but none are showstoppers.
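For reference, a minimal Linux-only sketch of the /proc lookup being discussed. It is not AFIO's implementation; the names `fd_current_path` and `looks_deleted` are made up here, and the whole thing assumes procfs is mounted at /proc.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>
#include <fcntl.h>   // open(), for callers that need an fd to query
#include <unistd.h>  // readlink()

// Ask procfs which path the kernel currently associates with an open fd.
// Returns nullopt if /proc is not mounted or the lookup fails.
inline std::optional<std::string> fd_current_path(int fd) {
    const std::string link = "/proc/self/fd/" + std::to_string(fd);
    std::vector<char> buf(4096);
    for (;;) {
        const ssize_t n = ::readlink(link.c_str(), buf.data(), buf.size());
        if (n < 0) return std::nullopt;
        if (static_cast<std::size_t>(n) < buf.size())
            return std::string(buf.data(), static_cast<std::size_t>(n));
        buf.resize(buf.size() * 2);  // possibly truncated; retry with more room
    }
}

// Heuristic only: a live file can legitimately be named "foo (deleted)",
// which is exactly the ambiguity raised in this thread.
inline bool looks_deleted(const std::string& p) {
    static const std::string suffix = " (deleted)";
    return p.size() >= suffix.size() &&
           p.compare(p.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

Non-file fds come back as pseudo paths such as "socket:[...]" or "pipe:[...]", so a caller that wants a real filesystem path has to check the result rather than trust it.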
 

So if you are trying to discover "the path of" an inode, you are already in a state of sin, and the platform is not obliged to offer you any help whatsoever, AFAIK. If you want to deal with paths-and-inodes together as pairs, you need to track them together in pairs — which is exactly what std::filesystem::directory_entry does.

Open fd path tracking works well on all major OS kernels, sometimes with workarounds for quirks. And without it you cannot implement race-free filesystem operations. It's a hard prerequisite due to incompleteness in the POSIX race-free filesystem spec. Windows is in fact the only OS which doesn't require path discovery, thanks to its beautifully complete race-free filesystem support, yet ironically it has the best path discovery API of them all.

The bit you might be missing is that without it you cannot know whether a third party has moved something you are using, in which case you are about to rename/unlink a different inode to the one the user asked for, i.e. data loss. This is why we never store and always retrieve paths: storing them is inherently racy.

Niall

Thiago Macieira

Sep 14, 2017, 11:04:59 PM
to std-pr...@isocpp.org
On Thursday, 14 September 2017 18:11:50 PDT Niall Douglas wrote:
> > On Wednesday, 13 September 2017 05:59:17 PDT Niall Douglas wrote:
> >> > AFIO never stores a path, too expensive (malloc), too racy. But you
> >>
> >> realise
> >>
> >> > right that you can ask any open handle for its current path right? It's
> >> > afio::handle::current_path().
> >>
> >> And you do realise that /proc does not have to be mounted on Linux,
> >> right? On
> >> a container environment, it might not have been set up.
>
> In which case a ton of stuff stops working. That's not a showstopper, just
> heavily reduced functionality.
>
> I may down the line implement some remedial support for such platforms, but
> it would come at very significant added runtime overhead as a process-wide
> shared list of fds would need to be kept, and it'll need a big global mutex
> around it. AFIO v1 implemented that by the way. For v2 I've not bothered
> yet.
>
> Still, if AFIO ever enters standards track, such support may be demanded of
> me.

Or, drop the method. Provide it as a separate, optional functionality that may
be absent or fail unpredictably. People should keep the name if they need to,
or use the possibly-failing API that gets the current name.

From reading your code and your FreeBSD report, it looks like it has problems
with hardlinks too, at least on Darwin.

> > Good catch. Is there another means of converting an FD to a path from
> > within a (Linux) container?
> > I see that AFIO has (for current_path) #ifdefed one strategy per OS but
> > there are presumably more strategies possible for each OS and more OSs to
> > consider.
>
> According to the lsof sources, the sole method on Linux for reading the
> current path of an open fd is via /proc.

Yep, but it wouldn't be too hard to have the kernel add an ioctl(2) for it, so
that it did not require /proc. Plus, it could also let you know whether the
file was deleted, as opposed to a file whose name ends in " (deleted)".

Thiago Macieira

Sep 14, 2017, 11:14:06 PM
to std-pr...@isocpp.org
On Thursday, 14 September 2017 19:32:47 PDT Niall Douglas wrote:
> Open fd path tracking works well on all major OS kernels, sometimes with
> quirks workarounds.

Though it does, it's not really an intentional feature. The link name in /proc
on Linux is more for debugging and user purposes than for retrieving the name
the way you're doing. I like that you can do that and that it allows you to
provide an API for it, but it's not a reliable feature and you're very likely
going to find that the great majority of OSes don't have that. To name a few
that you don't have code for:

- OpenBSD
- NetBSD
- DragonflyBSD
- Solaris
- AIX
- QNX
- VxWorks
- INTEGRITY

At least half of the above I know Qt currently has users for, with QNX being a
major user, with millions of deployments in medical and automotive segment.

> The bit you might be missing is that without it you cannot know if a third
> party has moved something you are using, and thus you are about to
> rename/unlink a different inode to the one the user asked for i.e. data
> loss. This is why we never store and always retrieve paths, storing them is
> inherently racy.

But it's also racy if you retrieve the path, since a modification can happen
right after that. There's no race-free rename/move syscall.

BTW, one more failure mode: you may have files open that refer to files not
visible in the filesystem due to a clobbered mount or chroot.

Niall Douglas

Sep 15, 2017, 8:24:23 AM
to ISO C++ Standard - Future Proposals

> In which case a ton of stuff stops working. That's not a showstopper, just
> heavily reduced functionality.


Or, drop the method. Provide it as a separate, optional functionality that may
be absent or fail unpredictably. People should keep the name if they need to,
or use the possibly-failing API that gets the current name.

People using AFIO can store a path alongside a handle if they wish to. No need for AFIO to do it for them. handle::current_path() is virtual, so a subclass very easily could swap in a locally stored current path (this is intentional).
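A rough sketch of what such a subclass could look like. The class names `basic_handle` and `path_caching_handle` are hypothetical stand-ins, not AFIO's real types, and the base class here is only a stub.

```cpp
#include <cassert>
#include <filesystem>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical stand-in for a handle type; NOT AFIO's actual class.
class basic_handle {
public:
    virtual ~basic_handle() = default;
    // A real implementation would ask the OS here. This stub returns an
    // empty path to mean "could not be determined".
    virtual fs::path current_path() const { return {}; }
};

// Subclass which stores the path it was opened with and returns that,
// accepting that the stored path can go stale if a third party renames it.
class path_caching_handle : public basic_handle {
public:
    explicit path_caching_handle(fs::path opened_as)
        : stored_(std::move(opened_as)) {}
    fs::path current_path() const override { return stored_; }

private:
    fs::path stored_;
};
```

Code written against the base class picks up the stored path through the virtual call, so the trade-off (cheap and always available, but racy) is made by whoever chooses the subclass.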

Race-free unlinking and renaming, which require handle::current_path() to work, can be told not to be race free using handle::flag::disable_safety_unlinks.

 

From reading your code and your FreeBSD report, it looks like it has problems
with hardlinks too, at least on Darwin.

Darwin has an irritating bug in their relink implementation in that they fail to correctly update the path slot used by the open fds, thus causing the per-fd paths to cycle for inodes with more than one path. I could almost tell them exactly what the code bug is just from the pattern. Otherwise Darwin works well.

FreeBSD is more interesting. If your fd is in the LIFO kernel path cache, all works as expected. If it drops out of the path cache, a directory still works as expected, but a regular file returns a null string.

I actually got some feedback from the kernel devs on fixing this, and their view is that it's trivially easy to fix, please submit a patch.

None of these quirks are showstoppers. They can be worked around easily enough, if there is user demand. Or the user can just disable race free unlinking.

Something you probably aren't aware of is that AFIO v1 did work around all these quirks, and it was slammed for it in the Boost peer review in 2015. Overwhelmingly the peer review wanted these quirks exposed, untainted, to user facing APIs. I have delivered on that in v2.
 

> > Good catch. Is there another means of converting an FD to a path from
> > within a (Linux) container?
> > I see that AFIO has (for current_path) #ifdefed one strategy per OS but
> > there are presumably more strategies possible for each OS and more OSs to
> > consider.
>
> According to the lsof sources, the sole method on Linux for reading the
> current path of an open fd is via /proc.

Yep, but it wouldn't be too hard to have the kernel add an ioctl(2) for it, so
that it did not require /proc. Plus, it could also let you know whether the
file was deleted, as opposed to a file whose name ends in " (deleted)".

I know some of the Linux filesystem folk and have spoken to them about this. Their view is that procfs ought always to be mounted, and so much stuff breaks when it isn't that it's not viable. Therefore they won't add the ioctl as an existing, well tested, reliable interface for discovering paths is already present.

(For the record, my objection to procfs is more ideological: I strongly disagree philosophically with the entire idea. That was the basis of my approach to them. They disagree, naturally.)

Niall

Niall Douglas

Sep 15, 2017, 8:34:14 AM
to ISO C++ Standard - Future Proposals
On Friday, September 15, 2017 at 4:14:06 AM UTC+1, Thiago Macieira wrote:
On Thursday, 14 September 2017 19:32:47 PDT Niall Douglas wrote:
> Open fd path tracking works well on all major OS kernels, sometimes with
> quirks workarounds.

Though it does, it's not really an intentional feature. The link name in /proc
on Linux is more for debugging and user purposes than for retrieving the
name like you're doing.

It is an intentional feature on Linux for race free deletion. I've spoken to some of the kernel devs who debugged it exactly for that use case. And it is very well debugged incidentally, I've never once found instability, even on 2.6 kernels.

Now, all that said, there is a very obvious extension to unlinkat() and renameat() which lets you race free unlink and rename an open fd directly, just as Windows can. And that would eliminate the need for these current path based algorithms entirely. But again, I was told "please submit a patch for that".
 
I like that you can do that and that allows you to
provide an API for it, but it's not a reliable feature and you're very likely
going to find that the great majority of OSes don't have that. To name a few
that you don't have code for:

 - OpenBSD
 - NetBSD
 - DragonflyBSD
 - Solaris
 - AIX
 - QNX
 - VxWorks
 - INTEGRITY

As a personal rule, I don't add code for platforms I don't have access to. But don't confuse a lack of code support with a lack of comprehensive prior review: I surveyed all the main OSs before deciding to design the library around this. At least half of your list provide some method of retrieving the current path of an open fd, or else prevent the renaming of any path component of an open fd, which amounts to the same thing.


> The bit you might be missing is that without it you cannot know if a third
> party has moved something you are using, and thus you are about to
> rename/unlink a different inode to the one the user asked for i.e. data
> loss. This is why we never store and always retrieve paths, storing them is
> inherently racy.

But it's also racy if you retrieve the path, since a modification can happen
right after that. There's no race-free rename/move syscall.

The algorithm is this for deletion:

1. Fetch the current path of the open fd about to be unlinked.

2. Open its containing directory.

3. Check the directory just opened has a child inode with the same name, inode and device as the one we intend to delete. If not, loop to 1.

4. Call unlinkat() to delete the open fd.


This deletion algorithm is race free up to the containing directory, as the documentation states. Anything from the containing directory up to the root directory may permute without loss of safety.
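The four steps might be sketched like this on Linux. This is not AFIO's code: `unlink_open_fd` and its retry limit are made up for illustration, it assumes /proc is mounted, and it only handles regular files. The window between steps 3 and 4 is still there.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Returns true if a name for the inode behind `fd` was unlinked.
inline bool unlink_open_fd(int fd, int max_attempts = 16) {
    struct stat want{};
    if (::fstat(fd, &want) != 0) return false;
    const std::string link = "/proc/self/fd/" + std::to_string(fd);

    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        // 1. Fetch the current path of the open fd. A truncated result is
        //    harmless here: it simply fails the inode check in step 3.
        std::vector<char> buf(4096);
        const ssize_t n = ::readlink(link.c_str(), buf.data(), buf.size());
        if (n <= 0) return false;
        const std::string path(buf.data(), static_cast<std::size_t>(n));
        if (path[0] != '/') return false;  // pseudo path such as "socket:[...]"
        const std::size_t slash = path.rfind('/');
        const std::string dir = slash == 0 ? "/" : path.substr(0, slash);
        const std::string leaf = path.substr(slash + 1);
        if (leaf.empty()) return false;

        // 2. Open its containing directory.
        const int dirfd = ::open(dir.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (dirfd < 0) continue;  // directory moved or vanished; start over

        // 3. Check that the directory has a child with the same name,
        //    inode and device as the fd we intend to delete.
        struct stat have{};
        const bool match =
            ::fstatat(dirfd, leaf.c_str(), &have, AT_SYMLINK_NOFOLLOW) == 0 &&
            have.st_ino == want.st_ino && have.st_dev == want.st_dev;

        // 4. Unlink relative to the directory fd, so later renames of any
        //    parent directory cannot redirect the operation.
        const bool removed = match && ::unlinkat(dirfd, leaf.c_str(), 0) == 0;
        ::close(dirfd);
        if (removed) return true;
    }
    return false;
}
```

Because step 4 goes through the directory fd rather than an absolute path, everything above the containing directory can be renamed concurrently without changing which entry gets removed.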
 

BTW, one more failure mode: you may have files open that refer to files not
visible in the filesystem due to a clobbered mount or chroot.

We check for that already. 

Niall

Thiago Macieira

Sep 15, 2017, 11:47:56 AM
to std-pr...@isocpp.org
On Friday, 15 September 2017 05:34:14 PDT Niall Douglas wrote:
> The algorithm is this for deletion:
>
> 1. Fetch the current path of the open fd about to be unlinked.
>
> 2. Open its containing directory.
>
> 3. Check the directory just opened has a child inode with the same name,
> inode and device as the one we intend to delete. If not, loop to 1.
>
> 4. Call unlinkat() to delete the open fd.

How do you guarantee that the file wasn't replaced between 3 and 4?

> This deletion algorithm is race free up to the containing directory, as the
> documentation states. Anything from the containing directory up to the root
> directory may permute without loss of safety.

Agreed, but that was not what I was thinking of.

> > BTW, one more failure mode: you may have files open that refer to files
> > not
> > visible in the filesystem due to a clobbered mount or chroot.
>
> We check for that already.

I think Linux simply returns "/" for those.

Niall Douglas

Sep 15, 2017, 2:02:37 PM
to ISO C++ Standard - Future Proposals
On Friday, September 15, 2017 at 4:47:56 PM UTC+1, Thiago Macieira wrote:
On Friday, 15 September 2017 05:34:14 PDT Niall Douglas wrote:
> The algorithm is this for deletion:
>
> 1. Fetch the current path of the open fd about to be unlinked.
>
> 2. Open its containing directory.
>
> 3. Check the directory just opened has a child inode with the same name,
> inode and device as the one we intend to delete. If not, loop to 1.
>
> 4. Call unlinkat() to delete the open fd.

How do you guarantee that the file wasn't replaced between 3 and 4?

As the documentation says (and I mentioned before), you cannot on any system without a race-free unlink and relink API. Only Windows has that at the moment.

The race free guarantee up until the containing directory is still very valuable however, and prevents a wide range of potential data losses due to third party concurrent changes to the file system.
 

> This deletion algorithm is race free up to the containing directory, as the
> documentation states. Anything from the containing directory up to the root
> directory may permute without loss of safety.

Agreed, but that was not what I was thinking of.

What were you thinking of?
 

> > BTW, one more failure mode: you may have files open that refer to files
> > not
> > visible in the filesystem due to a clobbered mount or chroot.
>
> We check for that already.

I think Linux simply returns "/" for those.

Unless disabled explicitly, anything using handle::current_path() for race safety always verifies that the path it returns points to an inode with the same st_ino and st_dev as the open fd. If it differs, it loops the fetch up until any deadline for the i/o expires.
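As a sketch of that check (again not AFIO's code, and with made-up names `path_names_fd` and `fetch_verified_path`): compare st_ino and st_dev of whatever the fetched path names against the open fd, and retry until a deadline. The `fetch` callback stands in for whichever path lookup the platform offers.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// True if `path` currently names the same inode (st_ino and st_dev) as `fd`.
// Dummy results such as "/" or "... (deleted)" fail this test.
inline bool path_names_fd(const std::string& path, int fd) {
    struct stat by_path{};
    struct stat by_fd{};
    return ::lstat(path.c_str(), &by_path) == 0 && ::fstat(fd, &by_fd) == 0 &&
           by_path.st_ino == by_fd.st_ino && by_path.st_dev == by_fd.st_dev;
}

// Call `fetch` repeatedly until it yields a path which verifies against
// `fd`, or until `deadline` passes. At least one attempt is always made.
inline std::optional<std::string> fetch_verified_path(
    const std::function<std::optional<std::string>()>& fetch, int fd,
    std::chrono::steady_clock::time_point deadline) {
    do {
        if (auto p = fetch(); p && path_names_fd(*p, fd)) return p;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    } while (std::chrono::steady_clock::now() < deadline);
    return std::nullopt;
}
```

A verified path is still only a snapshot: it was correct at the moment of the stat calls and can be invalidated immediately afterwards.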

Niall

Thiago Macieira

Sep 15, 2017, 2:34:01 PM
to std-pr...@isocpp.org
On Friday, 15 September 2017 11:02:37 PDT Niall Douglas wrote:
> > Agreed, but that was not what I was thinking of.
>
> What were you thinking of?

The case above, about unlinking a file that was replaced.

> > I think Linux simply returns "/" for those.
>
> Unless disabled explicitly, anything using handle::current_path() for race
> safety always verifies that the path it returns points to an inode with the
> same st_ino and st_dev as the open fd. If it differs, it loops the fetch up
> until any deadline for the i/o expires.

Which is probably why we could convince kernel devs to add an ioctl that
allows us to distinguish dummy names ("/" and "(deleted)") from real ones.

Niall Douglas

Sep 15, 2017, 4:34:22 PM
to ISO C++ Standard - Future Proposals
> > I think Linux simply returns "/" for those.
>
> Unless disabled explicitly, anything using handle::current_path() for race
> safety always verifies that the path it returns points to an inode with the
> same st_ino and st_dev as the open fd. If it differs, it loops the fetch up
> until any deadline for the i/o expires.

Which is probably why we could convince kernel devs to add an ioctl that
allows us to distinguish dummy names ("/" and "(deleted)") from real ones.

An enhancement request submitted to https://bugzilla.kernel.org/ under your name may carry more weight than one under mine. If you file one, I'll gladly CC myself and write a comment in support.

Here's my enhancement request from 2015 for race free unlink and rename so we have no need in the first place for path fetching: https://bugzilla.kernel.org/show_bug.cgi?id=93441

I suspect it will require someone to write and submit a patch implementing this. I've tried nudging a few Linux kernel engineers I know, but as always, they have a full work queue.

Niall

Thiago Macieira

Sep 15, 2017, 7:22:45 PM
to std-pr...@isocpp.org
On Friday, 15 September 2017 13:34:22 PDT Niall Douglas wrote:
> > Which is probably why we could convince kernel devs to add an ioctl that
> > allows us to distinguish dummy names ("/" and "(deleted)") from real ones.
> >
> Your name submitting any enhancement requests to
> https://bugzilla.kernel.org/ may have more effect than my name. If you do,
> I'll gladly CC and write a comment in support.

From experience: there's no telling whether developers of a particular kernel
subsystem actually read the bugzilla. The best bet is to post to their mailing
list directly, preferably with a patch implementing such a feature.

> Here's my enhancement request from 2015 for race free unlink and rename so
> we have no need in the first place for path
> fetching: https://bugzilla.kernel.org/show_bug.cgi?id=93441

Added a comment there, because I think it can be improved further.

> I suspect it will require someone to write and submit a patch implementing
> this. I've tried nudging a few Linux kernel engineers I know, but as
> always, they have a full work queue.

Right.

olafv...@gmail.com

Sep 26, 2017, 10:02:42 AM
to ISO C++ Standard - Future Proposals
On Wednesday, 13 September 2017 00:20:20 UTC+2, Arthur O'Dwyer wrote:
I would have said, "Correct. $HOME is set by someone, but that someone is never the 'sudo' process itself. It won't override the $HOME setting of the current environment (unless you pass -H)."

It doesn't?

$ sudo env
LANG=en_US.UTF-8
TERM=linux
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
MAIL=/var/mail/root
LOGNAME=root
USER=root
USERNAME=root
HOME=/root
SHELL=/bin/bash
SUDO_COMMAND=/usr/bin/env
SUDO_USER=join
SUDO_UID=1000
SUDO_GID=1000

Farid Mehrabi

Oct 6, 2017, 4:25:54 PM
to std-proposals
Home is not the only directory one needs; there are a number of other user-dependent paths one might need access to. The real need is a handle to the current user, which in turn should be part of a user and access management library. A set of path functions could then be built on top of the current user's access rights. Every OS vendor provides its own system-dependent library for this. IMHO a standard user-access management API would make a good topic for discussion.

Regards,
FM.

On 9 September 2017 at 4:07 PM, <david.b...@gmail.com> wrote:
There is a root_directory() function to get a root directory, but there is no function to get a home directory.

I would like to propose a new function - home_directory().
Returns a home directory.

It would be easy to implement, since it can use environment variables to get the path to a home directory:
Unix: HOME
Windows: USERPROFILE (or HOMEDRIVE+HOMEPATH)
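For concreteness, the environment-variable approach quoted above might look like the sketch below. `home_directory_from_env` and `path_from_env` are hypothetical names, not proposed or standard functions, and the sketch inherits every objection raised earlier in the thread: the environment may be sanitised, unset, or set by someone else.

```cpp
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Returns the value of an environment variable as a path, or nullopt if the
// variable is unset or empty.
inline std::optional<fs::path> path_from_env(const char* name) {
    const char* v = std::getenv(name);
    if (v == nullptr || *v == '\0') return std::nullopt;
    return fs::path(v);
}

// Hypothetical helper: trusts the environment completely and returns
// nullopt when it offers no answer, rather than guessing.
inline std::optional<fs::path> home_directory_from_env() {
#ifdef _WIN32
    if (auto p = path_from_env("USERPROFILE")) return p;
    const char* drive = std::getenv("HOMEDRIVE");
    const char* tail = std::getenv("HOMEPATH");
    if (drive != nullptr && tail != nullptr)
        return fs::path(std::string(drive) + tail);
    return std::nullopt;
#else
    return path_from_env("HOME");
#endif
}
```

Returning an optional at least makes the "no home directory" case visible to the caller, though it does nothing about a value that is present but wrong.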
