I would like to propose a new function - home_directory().
Returns a home directory.
It would be easy to implement, since it can use environment variables to get the path to a home directory:
Unix: HOME
Windows: USERPROFILE (or HOMEDRIVE+HOMEPATH)
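For illustration only, here is a minimal sketch of what such an environment-based lookup might look like. The name home_directory() comes from the proposal; the std::optional return type and the empty-string handling are my assumptions, not part of it, and the sketch deliberately does nothing about an untrusted or missing environment.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <string>

// Hypothetical helper, not part of std::filesystem.
// Returns the home directory as reported by the environment, or an empty
// optional when the relevant variables are unset or empty.
std::optional<std::filesystem::path> home_directory()
{
#ifdef _WIN32
    if (const char* profile = std::getenv("USERPROFILE"); profile && *profile)
        return std::filesystem::path(profile);
    const char* drive = std::getenv("HOMEDRIVE");
    const char* tail = std::getenv("HOMEPATH");
    if (drive && *drive && tail && *tail)
        return std::filesystem::path(std::string(drive) + tail);
    return std::nullopt;
#else
    if (const char* home = std::getenv("HOME"); home && *home)
        return std::filesystem::path(home);
    return std::nullopt;
#endif
}
```

A caller would have to check the result before use, e.g. `if (auto home = home_directory()) { /* use *home */ }`.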
And what happens if the environment has been zopped? This is not uncommon. Untrusted processes are frequently run with a sanitised environment under a user id which has no home.
What if the environment has been set by a malicious actor?
What happens if someone uses TOCTOU to do timing attacks to swap the home directory for another in the middle of program execution?
If your answer is "do nothing", then I see no value add for this facility. The hard part in implementing this, and why it is missing in the Filesystem TS, is surveying all the proprietary mechanisms each OS has for determining a true home directory for any arbitrary process without using environment variables, and from that survey establishing a standards-quality proposal for the semantics which can be implemented by 99% of the OSs out there such that home_directory() always works correctly and safely.
The cost to benefit ratio is hard to argue in favour of. AFIO, which may become the File I/O TS one day, doesn't implement it publicly either. Beman rightly placed it out of scope for the Filesystem TS.
Niall
That seems an unduly hostile response.
Just because a facility could be abused or subverted in some (I would argue unusual) circumstances doesn't seem a good reason to make it unnecessarily hard.
A common use case would be at the application level. E.g. cache some data in ~/.myappsrprefs when myapp is run by the user for the user. Is that really so dangerous?
To put it another way, why must you use something other than environment variables to find the home directory? And why is it so important that it be the "true" home directory rather than the obvious one as set by the environment?
The security implications don't seem that obvious. Perhaps documenting them would be sufficient?
There is a root_directory() function to get a root directory, but there is no function to get a home directory.
> That seems an unduly hostile response.
I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
> Just because a facility could be abused or subverted in some (I would argue unusual) circumstances doesn't seem a good reason to make it unnecessarily hard.
It's not hard to call getenv("HOME") if that's what you really want to do.
> A common use case would be at the application level. E.g. cache some data in ~/.myappsrprefs when myapp is run by the user for the user. Is that really so dangerous?
Yes. Extremely.
> To put it another way, why must you use something other than environment variables to find the home directory? And why is it so important that it be the "true" home directory rather than the obvious one as set by the environment?
> The security implications don't seem that obvious. Perhaps documenting them would be sufficient?
Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.
On Sunday, 10 September 2017 00:58:13 UTC+1, Niall Douglas wrote:
> Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.
Can you point to any links that would help explain this stance?
I can't find anything googling various combinations of "home directory" or "absolute path" with "dangerous" or "attack".
>> Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.
> Can you point to any links that would help explain this stance?
The problem with the home directory is that it is (a) particularly amenable to TOCTOU attacks, as by definition everything has write access to it, and (b) when storing settings into it, it is unusually common to update more than one file at once. So this might be common:
1. Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
2. Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.
Looks safe, right? It's a very common design pattern. Now think about this:
Process 1: Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
Process 2: Rename "/home/ned/.config/mystuff" to "/home/ned/.config/myotherstuff" and rename "/home/ned/.config/meow" to "/home/ned/.config/mystuff".
Process 1: Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.
You've just corrupted mystuff's config, and lost the user their data.
>> That seems an unduly hostile response.
> I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
Sorry. I didn't mean to imply personally hostile. Just hostile to the idea. The security implications may be obvious to you but not to the OP or myself.
>>> That seems an unduly hostile response.
>> I re-read my reply. I saw no evidence of hostility. I did ask a set of motivating questions, and listed some of the good reasons it's not in the standard.
> Sorry. I didn't mean to imply personally hostile. Just hostile to the idea. The security implications may be obvious to you but not to the OP or myself.
Thanks for clarifying.
You're right that I hate anything returning an absolute path in the filesystem. I particularly take issue with std::filesystem::temp_directory_path() which claims to be "A directory suitable for temporary files. The path is guaranteed to exist and to be a directory" which is so underspecified as to approach uselessness:
1. Is it guaranteed to be writable? The standard doesn't actually say it must be.
2. Is it guaranteed to always exist? The standard only guarantees it exists at the time of check, not that it continues to exist.
3. Is it guaranteed that stuff written there doesn't vanish while I'm using it?
4. Is it guaranteed that if I write temp file A and then temp file B, that B will be placed alongside A in the same directory instance?
5. Does storage at that path count against the system paging file, or against the user's quota?
6. Does its contents persist across reboots?
7. Can others modify things I place there?
8. Can others see things I place there?
9. Is it safe to call from a signal handler and I need to dump crashlogs somewhere guaranteed writable?
And that's just off the top of my head. There are lots more.
I raised these issues with Beman informally at a C++ Now, and he agreed with almost all of it. The problem was that doing something about it would be worse than not doing something about it (or words to that effect).
AFIO does tackle a reasonable number of the more important of the above issues with the temp directory. But then I have the race free API to hand, and Beman did not, so I can do a lot better (Beman specifically excluded the development of a race free file i/o API as out of scope for the Filesystem TS. He said it was too hard, too controversial. I agree on both, the Filesystem TS needed to be shipped now, not in 2025, or later)
I haven't thought about it rigorously, but a similar critique could be applied to any std::filesystem::home_directory_path(). What you'll find is that developers think they want access to $HOME, but if they think a bit harder, what they really want is "What is the Downloads directory for this user?" Or "What is the configuration store for this user so I can store user-specific data?" Or "Where in the filesystem can I create temp files which count towards the user's quota?" Or "Where on the filesystem can I create temp files which do NOT count towards the user's quota?" And so on.
A decent standards proposal on this topic which I could support would not propose adding "std::filesystem::home_directory_path()", but would rather propose a set of APIs for discovering paths suitable for various common use cases. And without relying on environment variables to deduce any of that (e.g. look up the true home directory from /etc/passwd and getuid(), and from there deduce, with rigorous checking of permissions and retaining st_ino+st_dev unique identifiers for fast later verification, the various common use case paths).
Such a proposal I'd very strongly support as delivering significant added value. I think so would everyone else.
Niall
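To make the "look it up from the user database rather than the environment" idea concrete, here is a small POSIX-only sketch using getpwuid_r() and getuid(). The function name is hypothetical, and the sketch stops at the lookup: it does none of the permission checking or st_ino+st_dev bookkeeping mentioned above.

```cpp
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <filesystem>
#include <optional>
#include <vector>

// Hypothetical helper: asks the user database for the real user's home
// directory and ignores $HOME entirely. Returns an empty optional when the
// user has no entry or the entry has no home directory.
std::optional<std::filesystem::path> passwd_home_directory()
{
    long hint = ::sysconf(_SC_GETPW_R_SIZE_MAX);
    std::vector<char> buffer(hint > 0 ? static_cast<std::size_t>(hint) : 1024);
    struct passwd entry{};
    struct passwd* result = nullptr;
    int err = 0;
    // Grow the scratch buffer until the entry fits.
    while ((err = ::getpwuid_r(::getuid(), &entry, buffer.data(),
                               buffer.size(), &result)) == ERANGE)
        buffer.resize(buffer.size() * 2);
    if (err != 0 || result == nullptr ||
        result->pw_dir == nullptr || *result->pw_dir == '\0')
        return std::nullopt;
    return std::filesystem::path(result->pw_dir);
}
```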
On Sunday, September 10, 2017 at 7:57:19 PM UTC-4, Niall Douglas wrote:
>>> Anything which works via an absolute path is inherently dangerous, and I would argue strongly in favour of rejection purely based on that alone.
>> Can you point to any links that would help explain this stance?
> The problem with the home directory is that it is (a) particularly amenable to TOCTOU attacks as by definition everything has write access to it and (b) when storing settings into it, it is unusually common to update more than one file at once. So this might be common:
> 1. Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
> 2. Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.
> Looks safe right? It's a very common design pattern. Now think about this:
> Process 1: Open "/home/ned/.config/mystuff/settings.mbx" and append stuff.
> Process 2: Rename "/home/ned/.config/mystuff" to "/home/ned/.config/myotherstuff" and rename "/home/ned/.config/meow" to "/home/ned/.config/mystuff".
> Process 1: Open "/home/ned/.config/mystuff/settings.idx" and update index to reflect stuff appended to settings.mbx.
> You've just corrupted mystuff's config, and lost the user their data.
I fail to see how this problem is specific to using the home directory. This can happen when any two processes have concurrent access to the same directory. Which is essentially any two processes. The user can tell process 1 to save to some directory, and process 2 to change the name of a directory. If the save involves manipulating two files, you have an inter-process data race.
How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.
So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.
Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.
This is similar to how we have `filesystem::current_path`, which can set the current path. We don't like modifiable globals, but the current path is a real thing that really exists and real users need to be able to change, so if we don't allow people to set it, people will just use non-portable code to set it.
The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.
This is why absolute paths are the spawn of satan and must be avoided in any correct code. They are only safe to use if you only ever touch a single file or directory. Otherwise your code is incorrect.
> I can't find anything googling various combinations of "home directory" or "absolute path" with "dangerous" or "attack".
The inherent raciness of absolute path addressed filing systems was dealt with in 2006 or so by POSIX standardising the Solaris "race free filesystem" API extensions. Those are now available on all major platforms except Windows, though the NT kernel implements them.
There sadly remains a big gap between common programmer practice and correct file system usage. Much of it is ignorance, some of it is lack of good library support.
Niall
> The motivation for the introduction of this set of interfaces is as
> follows:
>
> * Interfaces taking a path name are limited by the maximum length of
>   a path name (_SC_PATH_MAX). The absolute path of files can far exceed
>   this length. The current solution would be to change the working
>   directory and use relative path names. This is not thread-safe.
>
> * A second motivation is that files accessed outside the current
>   working directory are subject to attacks caused by the race condition
>   created by changing any of the elements of the path names used.
>
> * A third motivation is to allow implementing code which uses a
>   virtual current working directory for each individual thread. In
>   the current model there is only one current working directory for
>   all threads.
> How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.
> So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.
Nobody is denying anyone access.
What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.
> Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.
This is not an argument.
People can write random garbage all over memory all by themselves. It doesn't mean we should encourage it.
> This is similar to how we have `filesystem::current_path`, which can set the current path. We don't like modifiable globals, but the current path is a real thing that really exists and real users need to be able to change, so if we don't allow people to set it, people will just use non-portable code to set it.
That's a bit different again. Being able to set the current path has quite a list of highly beneficial, code improving, consequences. It's not as mostly problematic as the current temp_directory_path() specification.
> The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.
Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.
The point is that if you do the design right, you don't have to even have the concept of a home directory at all.
Need somewhere to write settings? Ask for exactly that.
> You're right that I hate anything returning an absolute path in the filesystem. I particularly take issue with std::filesystem::temp_directory_path() which claims to be "A directory suitable for temporary files. The path is guaranteed to exist and to be a directory" which is so underspecified as to approach uselessness:
That's an interesting list but I think you are mixing things here. People usually mean something very specific when they say a temporary directory.
I would not expect it to be persistent, for example. I immediately think of Linux FHS /tmp.
Neither would I want to rely on it being wiped. I think of /var/run for that.
> 1. Is it guaranteed to be writable? The standard doesn't actually say it must be.
I would say that is a defect. It must be writeable or it is useless.
> 2. Is it guaranteed to always exist? The standard only guarantees it exists at the time of check, not that it continues to exist.
> 3. Is it guaranteed that stuff written there doesn't vanish while I'm using it?
I guess these are arguments for returning an open file descriptor rather than a path.
There are still uses for the path though. A configuration dialog might have a slot for the path that should be used and populate that with the default.
I suppose you could encourage using handles first if you had a way of saying handle::get_path(), which is the inverse of what is normally available.
> 4. Is it guaranteed that if I write temp file A and then temp file B, that B will be placed alongside A in the same directory instance?
Do you need that guarantee from the "temp" API?
If you need it, shouldn't you be creating a directory yourself inside the path returned? It's arguably neater that way anyhow.
> 5. Does storage at that path count against the system paging file, or against the user's quota?
So you mean is it in RAM or on a persistent file-system?
> 6. Does its contents persist across reboots?
Does this need to be separate from the above? You can have a persistent file-system that is wiped on boot which isn't in RAM.
I guess that could be important to know for security reasons.
> 7. Can others modify things I place there?
> 8. Can others see things I place there?
Isn't this an argument for using a directory relative to the user's home directory versus a shared temporary directory?
The interesting case is a private temporary directory that is not persistent.
> 9. Is it safe to call from a signal handler and I need to dump crashlogs somewhere guaranteed writable?
You can never have that. If the file-system is full all bets are off.
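On point 4, the "create a directory yourself inside the path returned" suggestion could be sketched roughly as follows. This assumes a POSIX system where mkdtemp() is available; the function name and the prefix parameter are hypothetical. It only addresses keeping A and B together in one private directory, and inherits every other open question about temp_directory_path() listed above.

```cpp
#include <stdlib.h>  // mkdtemp (POSIX)
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical helper: creates a fresh directory with a unique name under
// the system temp path. mkdtemp() creates it with owner-only permissions.
std::optional<std::filesystem::path> make_private_temp_dir(const std::string& prefix)
{
    std::error_code ec;
    std::filesystem::path base = std::filesystem::temp_directory_path(ec);
    if (ec)
        return std::nullopt;
    // mkdtemp() rewrites the trailing XXXXXX in place with a unique suffix.
    std::string tmpl = (base / (prefix + "XXXXXX")).string();
    if (::mkdtemp(tmpl.data()) == nullptr)
        return std::nullopt;
    return std::filesystem::path(tmpl);
}
```

Temp files A and B would then both be created under the returned directory, rather than directly under the shared temp path.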
It sounds like what we want is APIs where you can specify the required properties of the 'special' directories. That does sound generally useful.
It's probably much easier for temp than for home.
Can't we do that via a layered approach? v1 builds on imperfect APIs like home_directory(), temporary_files_directory() and adds local knowledge: FHS and XDG on Linux and the equivalent for other OSs.
So presumably we need an equivalent or an extension that can use a file descriptor + relative path?
To be truly safe wouldn't you have to have an open file descriptor for every path element you might re-use, not just the root element?
Is the filesystem TS expected to use openat() et al anywhere under the hood at present?
On 2017-09-09 19:12, torto...@gmail.com wrote:
> A common use case would be at the application level. E.g. cache some data
> in ~/.myappsrprefs when myapp is run by the user for the user. Is that
> really so dangerous?
There is an XDG spec for this sort of thing. Should C++ adopt that as well?
Maybe it would be good to have a modern (i.e. written atop the FS API),
light-weight, cross-platform XDG library. Write that first, then if it
seems to work out, possibly propose it for standardization.
> On Saturday, 9 September 2017 23:57:31 UTC+1, Niall Douglas wrote:
>> And what happens if the environment has been zopped? [...]
>> Untrusted processes are frequently run with a sanitised environment
>> under a user id which has no home.
...then the application is doomed. (Well, the API should return an
invalid path, because there is nothing else it can do, and the
application needs to be able to deal with that. An application that
wants to use the home directory in such instance is no worse off for
having a standard API to query the home directory.)
>> What happens if someone uses TOCTOU to do timing attacks to swap the home
>> directory for another in the middle of program execution?
What happens *today*? Having a standard API to do something that
*programs already do* does not introduce any *new* security issues.
Now... you did say *rely on*. The API certainly ought to work if the
environment variables aren't set, but it should respect them if they are...
On Monday, September 11, 2017 at 10:25:48 AM UTC-4, Niall Douglas wrote:
>> How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.
>> So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.
> Nobody is denying anyone access.
When you say that the standard shouldn't let people have a cross-platform way to access a cross-platform concept, that's denying access.
> What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.
What you've said does not match your argument. Your overall arguments have been:
1: People may use the home directory incorrectly.
2: The home directory is globally accessible and therefore shouldn't be used for things.
And the closest thing to a better design you've suggested is to provide a plethora of directories, none of them explicitly called "home". That doesn't solve either of those problems, since the user could still use it for the wrong thing and the directories can still be manipulated by the user.
If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".
>> Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.
> This is not an argument.
Um, yes it is. Standards are supposed to standardize existing practice. Home directories are existing practice. As such, they are legitimate candidates for standardization. There is a genuine need for home directory access, and if we don't provide it, someone else will.
I get that you don't like the home directory. But it is existing practice. And that makes it viable for standardization, despite your dislike.
>> The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.
> Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.
Then make the function return an `optional<path>`, for cases where such a path does not exist. This is not a good enough reason to say no to the whole concept of getting a home directory.
On Sunday, 10 September 2017 19:24:58 CDT Niall Douglas wrote:
> 1. Is it guaranteed to be *writable*? The standard doesn't actually say it
> must be.
> 2. Is it guaranteed to *always* exist? The standard only guarantees it
> exists at the time of check, not that it continues to exist.
> 3. Is it guaranteed that stuff written there doesn't vanish while I'm using
> it?
> 4. Is it guaranteed that if I write temp file A and then temp file B, that
> B will be placed alongside A in the same directory instance?
> 5. Does storage at that path count against the system paging file, or
> against the user's quota?
> 6. Does its contents persist across reboots?
> 7. Can others modify things I place there?
> 8. Can others see things I place there?
> 9. Is it safe to call from a signal handler and I need to dump crashlogs
> somewhere guaranteed writable?
10. Is it capable of having Unix sockets or FIFOs created there?
11. Is it capable of reducing permissions on individual files and directories
from world-readable and world-executable?
On Monday, September 11, 2017 at 4:26:51 PM UTC+1, Nicol Bolas wrote:
> On Monday, September 11, 2017 at 10:25:48 AM UTC-4, Niall Douglas wrote:
>>> How does having a home directory function make code more prone to this? After all, in the above example, the user is the one who explicitly provided the paths.
>>> So long as two processes can access the same directories, what you are talking about can happen. It's not particular to home directory usage, so I fail to see why that's a reason to deny people access to that directory.
>> Nobody is denying anyone access.
> When you say that the standard shouldn't let people have a cross-platform way to access a cross-platform concept, that's denying access.
No, it isn't. This is not the first time you've put words in my mouth and claimed I said things I did not.
>> What I am saying is that the standard should not encourage obviously suboptimal design patterns when with just a little bit of extra thought, we can enable ideal design patterns.
> What you've said does not match your argument. Your overall arguments have been:
> 1: People may use the home directory incorrectly.
> 2: The home directory is globally accessible and therefore shouldn't be used for things.
Where did I ever say it shouldn't be used for things?
I said it ought to be used correctly. That's very different.
> And the closest thing to a better design you've suggested is to provide a plethora of directories, none of them explicitly called "home". That doesn't solve either of those problems, since the user could still use it for the wrong thing and the directories can still be manipulated by the user.
> If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".
There is no such thing as home.
>>> Lastly, let us not forget that the "home directory" is a real thing that exists on various systems. Users expect applications to be able to know where it is. So if we don't let applications ask where it is, they will use platform-specific code to do it themselves.
>> This is not an argument.
> Um, yes it is. Standards are supposed to standardize existing practice. Home directories are existing practice. As such, they are legitimate candidates for standardization. There is a genuine need for home directory access, and if we don't provide it, someone else will.
> I get that you don't like the home directory. But it is existing practice. And that makes it viable for standardization, despite your dislike.
Home directories are not standard practice. What is standard practice is some place on the filesystem where the user running the login process which launched the running process is permitted to read and write. Which could be /tmp, which is a legal $HOME value.
That's far short of what you're claiming. Specifically that the home directory is a property of the logged in user, and NOT that of the user running some process as you appear to think.
If you don't believe me, go look it up in the POSIX spec. And be aware Windows follows the exact same principle as well. If you are temporarily acting as a different user, you may, or may not, see "home" differently depending on a wide range of factors impossible to standardise.
This is part of why $HOME cannot be trusted. Or any environment variable.
>>> The fact that such a directory may encourage the creation of configuration data in global space in ways that are potentially breakable is not sufficient reason to tell people that they shouldn't have access to such a directory.
>> Assuming the existence of a home directory at all is highly unwise. Daemon code won't have one for example.
> Then make the function return an `optional<path>`, for cases where such a path does not exist. This is not a good enough reason to say no to the whole concept of getting a home directory.
The concept of there being a "home" directory is severely flawed. You cannot get what does not exist.
Instead ask for a path with a specified list of desired properties.
On Monday, September 11, 2017 at 4:26:51 PM UTC+1, Nicol Bolas wrote:
> If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".
There is no such thing as home.
If my understanding of your argument is wrong, what is your argument? Why do you think this is a bad idea? Lay out the whole thing in a single post. You've provided what seems to be several different lines of argument thus far (security concerns, absolute paths being bad, etc), and you seem to think that my summary doesn't adequately describe them.
So please do so.
To me, that reads "People should not use this." Maybe you meant it as something less firm, but when you start throwing terms around like "spawn of satan", it's hard not to get the impression that you think it's never a good idea to use such a feature.
> There is no such thing as home.
I'm not a filesystem expert at all. But other people seem to disagree with you.
OK, what set of "desired properties" will give me the home directory? If there isn't a set of properties that would give that to me, then that is a poorer API than just asking for the home directory.
I've seen APIs like you're describing before. They remind me of the usage hints in `glBufferData`. You provide a general idea of how you're going to use the memory, and the implementation would decide where to allocate that storage from.
This idea turned out so badly that OpenGL implementations would in many cases flat-out ignore the usage hints altogether.
>> If you want to say that we should provide access to a number of "standard" directories, that's fine. But thus far, none of the arguments you've presented justifies not calling one of those standard directories "home".
> There is no such thing as home.
You really missed an opportunity to say "there is no place like home" there. :)
I would use functionality like that. I’ve already had to roll my own, and it isn’t pretty:
https://github.com/HowardHinnant/date/blob/master/tz.cpp#L167-L237
That's pretty par for the course. I've seen, and written, far less pretty implementations in the past.
Don't get me wrong, I'd just love a decent proposal implementing this too. But as I mentioned, you'd need to start with a definitive survey of how each platform implements its particular race-free and secure mechanism for retrieving these paths,
If my understanding of your argument is wrong, what is your argument? Why do you think this is a bad idea? Lay out the whole thing in a single post. You've provided what seems to be several different lines of argument thus far (security concerns, absolute paths being bad, etc), and you seem to think that my summary doesn't adequately describe them.So please do so.The reason there are multiple strands of argument involved is because this thread has split into such. Most appear to understand that has happened and don't find it a problem. But to spell it out:1. There is the "big picture" argument regarding good engineering (use absolute paths as sparingly as possible, fds to mark filesystem locations wherever possible) vs what's currently forced on us by lack of API support.2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.
Most currently written code using the filesystem is incorrect.
Whether people have the choice to write actually correct code is the central problem. Right now you cannot say: create file A and guaranteed sibling file B and both files must exist as siblings or neither. That forces people to write power loss recovery code which is usually buggy, certainly insufficiently tested. Most don't bother, and just allow incorrectness and corruption of persistent state to occur. And when that becomes a problem, they just fire everything into SQLite and call it "solved".
I aim to do something about that eventually, but Rome wasn't built in a day. I only properly work on this stuff when I'm out of contract. It's slow going.
OK, what set of "desired properties" will give me the home directory? If there isn't a set of properties that would give that to me, then that is a poorer API than just asking for the home directory.
A list of properties has been given already. I agree with Thiago that Qt is well ahead of everyone on the same topic, and starting from what they've done looks to be a great idea.
Path discovery, if implemented by runtime probing, doesn't suffer from this problem of staleness by design. If I ask for a path where I can create fifos which will be visible to other processes running under the same user, the implementation will go off and try to create a fifo with user rw privs in a sequence of path locations until it finds one which works. It then returns that path, and you know for a guaranteed fact that that path will let you create fifos which will be visible to other processes running under the same user.
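To make the probing idea concrete, here is a minimal POSIX sketch. It assumes the caller supplies the candidate list; the name `first_fifo_capable_directory` is hypothetical and is not an AFIO or std::filesystem function. It only shows the "try each location until one works" loop described above, and the answer is only known to be true at the moment of the probe.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

#include <sys/stat.h>   // mkfifo (POSIX)
#include <unistd.h>     // getpid, unlink

namespace fs = std::filesystem;

// Hypothetical helper: returns the first candidate directory in which this
// process can actually create a FIFO with user read/write permissions.
std::optional<fs::path> first_fifo_capable_directory(const std::vector<fs::path>& candidates)
{
    for (const fs::path& dir : candidates) {
        // The probe name includes the pid to reduce the chance of collisions.
        const fs::path probe = dir / (".fifo_probe_" + std::to_string(::getpid()));
        if (::mkfifo(probe.c_str(), S_IRUSR | S_IWUSR) == 0) {
            ::unlink(probe.c_str());  // the probe itself is not kept
            return dir;
        }
        // mkfifo failed (missing directory, no permission, filesystem without
        // FIFO support, ...): try the next candidate.
    }
    return std::nullopt;  // nothing in the list worked
}
```

A caller might pass something like `{"/run/user/1000", "/tmp"}` (illustrative values only) and treat an empty result as "no suitable location".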
2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.
I don't think the consequences of expanding "~root" to root's home directory are bad or unexpected at all, but notice that you got root's default home directory slightly wrong; it's "/root", not "/", at least on modern Linux boxes.
The "obvious" next question is, what happens when you run "vim" via "sudoedit", which lets you masquerade as root? Well, I'm no expert, but my understanding is that "sudo" by default does not change the current value of the environment variable HOME. So when "vim" goes to look in "$HOME/.vimrc", it finds your config; and "vim" will never look in "~root/.vimrc" at all. That is, "vim" seems to consider "the home directory" to be exactly synonymous with "the current value of $HOME".
None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.
However, that's orthogonal to the ability of the program to access its external environment, which is global by definition. I continue to see no problem with allowing the program to get the name of the home directory; that seems to be Step 1 of any workflow, even if you do everything "securely" from then on.
(Here and throughout, "fs::" means "std::filesystem::" and "fs2::"/"std2::" means things that aren't in C++17 but could be proposed.)
fs::path home = fs2::get_home_directory_path();
fs2::working_directory_entry home_wd(home);  // basically an fs::directory_entry plus an open file descriptor
if (!home_wd.exists())
    throw std::runtime_error("Home directory could not be opened because it disappeared!");
std::cout << "Successfully opened " << home.string() << std::endl;
fs2::working_directory_entry git_wd = home_wd.create_directory(".git");
std2::ifstream config_stream(git_wd, "config");
if (config_stream.is_open())
    config_stream >> config;
OTOH, I guess now that I've proposed this "working_directory" abstraction, I see no problem with combining the first two steps of that workflow.
fs2::working_directory_entry home_wd = fs2::home_working_directory();
if (!home_wd.exists())
    throw std::runtime_error("Home directory does not exist!");
std::cout << "Successfully opened " << home_wd.path().string() << std::endl;
fs2::working_directory_entry git_wd = home_wd.create_directory(".git");
std2::ifstream config_stream(git_wd, "config");
if (config_stream.is_open())
    config_stream >> config;
If we ever get something like "fs2::working_directory", then it would make sense to provide "fs2::current_working_directory()" in addition to "fs::current_path()", and "fs2::home_working_directory()" in addition to "fs2::home_path()".
TLDR: I applaud your openat()-related opinions, but they are completely unrelated to $HOME.
Path discovery, if implemented by runtime probing, doesn't suffer from this problem of staleness by design. If I ask for a path where I can create fifos which will be visible to other processes running under the same user, the implementation will go off and try to create a fifo with user rw privs in a sequence of path locations until it finds one which works. It then returns that path, and you know for a guaranteed fact that that path will let you create fifos which will be visible to other processes running under the same user.
I would hate to use a library function that "will go off and try to create [files] in a sequence of path locations until it finds one which works." That approach suffers from the same undocumentability problem as the current fs::temp_directory_path(): there's no way to document its file-creation behavior except "i dunno lol, wherever it happened to like today."
There is immense value in being able to say, definitively, that "the Foo server creates its pidfile in /var/run/foo.pid".
2. There is the "no such thing as home" argument where people have incorrect beliefs in what a "home directory" means, belongs to, and is, and far overestimate how common practice it is given all the embedded C++ devices out there. Moreover, historically root processes saw '/' as their "home" directory, and you can't not return '/' if a root process asks for its "home". Which almost certainly will result in very bad, unexpected, consequences.
I don't think the consequences of expanding "~root" to root's home directory are bad or unexpected at all, but notice that you got root's default home directory slightly wrong; it's "/root", not "/", at least on modern Linux boxes.
That's purely a choice of common Linux distros (and a very wise one).
I'm just about old enough to remember when root's home directory was '/'. And that's still legal under POSIX.
The "obvious" next question is, what happens when you run "vim" via "sudoedit", which lets you masquerade as root? Well, I'm no expert, but my understanding is that "sudo" by default does not change the current value of the environment variable HOME. So when "vim" goes to look in "$HOME/.vimrc", it finds your config; and "vim" will never look in "~root/.vimrc" at all. That is, "vim" seems to consider "the home directory" to be exactly synonymous with "the current value of $HOME".
Correct. $HOME is set by the initial login process.
Code needs to know if it's writing root-only accessible files into an unrelated user's home directory. This is part of why exposing $HOME is unsafe. Expose "Place where I can write the logged in user's space" and "Place where I can write the current user's space" instead.
None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.
That's exactly the API we need to have instead. You ask for the home directory for a specific user, and Python goes and parses the passwd file and gives you the correct, untainted, path.
I have no problem with that API. In fact, that's what I want.
TLDR: I applaud your openat()-related opinions, but they are completely unrelated to $HOME.
We're back into discussing stuff not possible with the current C++ standard, but you're assuming that the home directory is not renamed in between getting the path to it and using it. This is why all absolute paths are dangerous. If you're ever using an absolute path for more than one filesystem entry, your code is almost certainly incorrect.
So presumably we need an equivalent or an extension that can use a file descriptor + relative path?
I'm just about old enough to remember when root's home directory was '/'. And that's still legal under POSIX.
Sure it's legal (I assume), but if everyone agrees that it would be unwise to do in practice, and common distros don't do it, and future distros won't do it, then we don't need to discuss it.
vim doesn't need to know if it's writing root-only-accessible files into some unrelated user's home directory. vim doesn't need to care. It just needs to have a well-supported, simple, stable way to open "the current home directory", so that it can document that that's what it does; and then if the user does something stupid like "sudoedit /MyPhotos/A, take myself out of sudoers, try to open /MyPhotos/A", well, it'll be obvious what the user did wrong and how they should fix it.
I find your examples fairly innocuous. They all seem to be of the form "A user with sudo privileges can mildly bork things up for themselves."
Besides, normal applications shouldn't be running as root in the first place, and normal users shouldn't be sudo'ing most commands (I'd consider these vastly more fundamental security principles than anything about race conditions), and so I also find your "root"-based examples fairly contrived unless I'm misunderstanding something.
None of this applies to Windows, but it would not surprise me if the situation is almost as simple there.
That's exactly the API we need to have instead. You ask for the home directory for a specific user, and Python goes and parses the passwd file and gives you the correct, untainted, path.
I have no problem with that API. In fact, that's what I want.
I agree that expanduser() is a better API than merely home_directory_path(), because it can generate "~bob" in addition to just "~".
IIRC there's a FreeBSD library function to do tilde-expansion like this, but I've forgotten its name and can't figure out the right search terms to find it again. :P
But. When you ask Python to expand a lone "~", the first thing it does is look for the current setting of $HOME and use that. Only if that first approach fails, will it look up the current user (i.e. root, if you're sudo'ed) and then call getpwuid() to fetch that user's home directory.
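The lookup order just described ($HOME first, then the password database) can be sketched in C++ on POSIX roughly as follows. The function name `home_directory_path` is illustrative only, and treating an empty $HOME as unset is a choice made for this sketch rather than something taken from Python.

```cpp
#include <cstddef>
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <vector>

#include <pwd.h>        // getpwuid_r
#include <sys/types.h>
#include <unistd.h>     // getuid, sysconf

namespace fs = std::filesystem;

// Hypothetical function: not a proposed standard API.
std::optional<fs::path> home_directory_path()
{
    // 1. Prefer the environment. An empty HOME is treated as unset here,
    //    which is an assumption of this sketch.
    if (const char* home = std::getenv("HOME"); home != nullptr && *home != '\0')
        return fs::path(home);

    // 2. Fall back to the password database entry for the real user id.
    const long hint = ::sysconf(_SC_GETPW_R_SIZE_MAX);
    std::vector<char> buffer(hint > 0 ? static_cast<std::size_t>(hint) : 16384);
    struct passwd entry{};
    struct passwd* result = nullptr;
    if (::getpwuid_r(::getuid(), &entry, buffer.data(), buffer.size(), &result) == 0
        && result != nullptr && result->pw_dir != nullptr)
        return fs::path(result->pw_dir);

    return std::nullopt;  // no HOME and no usable passwd entry
}
```

Note that this inherits every concern raised in the thread: the first branch trusts whatever the environment says, and the returned path is only a string that may be stale by the time it is used.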
(And then, although this is orthogonal, we'd have two race conditions. You'd have to guard not only against "malicious concurrent moving-around of directory structure re the home path" but also "malicious concurrent calls to usermod -d (change home directory)" and "malicious concurrent calls to usermod -l (change username)". The fundamental issue here is that both POSIX paths and POSIX usernames are stringly typed; just because someone told me a few milliseconds ago that his identifier was "bob" doesn't mean that that's the correct identifier for him right now.)
But, consider again some user-facing application that is documented to "store its config in /foo/bar". And let's say that while this application is running, somebody maliciously runs "mv /foo/bar /foo/baz && mkdir /foo/bar" on the filesystem. The application has three options:
(1) Detect the malicious action and abort with an error. Let's assume aborting is not desirable.
(2) "Use absolute paths": the next time it needs to open a file in the config directory, it looks in the new /foo/bar and finds no config. This will likely result in a problem for the user. However, the user's problem is super easy to explain. "Doctor, it hurts when I do this." "Don't do that, then."
(3) "Use relative paths": the next time it needs to open a file in the config directory, it uses openat() relative to the handle it kept to /foo/baz (née /foo/bar). This will result in new files getting created in /foo/baz, contrary to the documentation's promise. Furthermore, this will result in a problem for the user, if the reason the user cleared out the old directory is that it was getting too full, and/or to atomically create a backup of the existing files. (See also: log rotation.)
Either (2) or (3) could be the "right behavior" in practice. Either (2) or (3) could be the "wrong behavior" in practice.
I would much rather debug a user complaint about (2), though.
Related but not really related — I've seen in practice where a disk filled up due to a runaway server process logging into a file that had been deleted months before. The server process kept its file handle open and was not periodically re-checking the original path. It took altogether too long for us to figure out that all that disk usage was coming from a file that wasn't even in the filesystem anymore!
So presumably we need an equivalent or an extension that can use a file descriptor + relative path?
I saw various handle types but I didn't spot anything with a basic structure of:
class better_path
{
path_handle root;
std::string root_path;
std::string leaf_path;
};
or
class better_path
{
struct path_element
{
// one element may be invalid, one must be valid
native_file_handle handle;
std::string name;
};
std::vector<path_element> path_impl;
};
You might build this on top of AFIO but it doesn't seem to have it yet.
Ideally you'd be able to use this both where a string is appropriate and where you need to be race-free.
Yes of course, this does potentially leak string based paths which wouldn't be race free but I'm not sure you can avoid that completely.
You just need documentation to educate users.
An issue is that adding a path element might be a purely string operation or an openat(). You need both.
It feels like much shakier ground than having separate purely string-based and purely handle-based representations and leaving it to programmers to be safe,
but it could work.
So presumably we need an equivalent or an extension that can use a file descriptor + relative path?
You might build this on top of AFIO but it doesn't seem to have it yet.
Heh.
It's actually staring you in the face:
afio::path_handle refers to some inode on the filesystem which could have any path different to when it was constructed.
afio::path_view is a borrowed reference to a string/path stored elsewhere.
Just combine the two to create a race free reference to a path fragment relative to wherever the path_handle is currently located. All the APIs in AFIO accept path_handle + path_view, in fact, it's the only thing they accept.
AFIO never stores a path, too expensive (malloc), too racy. But you realise right that you can ask any open handle for its current path right? It's afio::handle::current_path().
//! \brief Contains functions used to discover suitable paths for things
namespace path_discovery
{
//! \brief A discovered path.
struct discovered_path
{
path_view path; //!< The path discovered.
//! Source of the discovered path.
enum class source_type
{
hardcoded, //!< This path came from an internal hardcoded list of paths likely for this system.
system, //!< This path came from querying the system.
environment, //!< This path came from an environment variable (an override?).
local //!< This path was added locally.
} source;
/*! If this path was successfully probed for criteria verification, this was its stat after any symlink
dereferencing at that time. Secure applications ought to verify that any handles opened to the path have
the same `st_ino` and `st_dev` as this structure before use.
*/
optional<stat_t> stat;
};
/*! \brief Returns a list of potential directories which might be usable for temporary files.
This is a fairly lightweight call which builds a master list of all potential temporary file directories
given the environment block of this process (unless SUID or SGID or Privilege Elevation are in effect) and the user
running this process. It does not verify if any of them exist, or are writable, or anything else about them.
An internal mutex is held for the duration of this call.
\mallocs Allocates the master list of discovered temporary directories exactly once per process,
unless `refresh` is true in which case the list will be refreshed. The system calls to retrieve paths
may allocate additional memory for paths returned.
\errors This call never fails, except to return an empty span.
*/
AFIO_HEADERS_ONLY_FUNC_SPEC span<discovered_path> all_temporary_directories(bool refresh = false) noexcept;
/*! \brief Returns a subset of `all_temporary_directories()` each of which has been tested to be writable
by the current process. No testing is done of available writable space.
After this call returns, the successfully probed entries returned by `all_temporary_directories()` will have their
stat structure set. As the probing involves creating a non-zero sized file in each possible temporary
directory to verify its validity, this is not a fast call. It is however cached statically, so the
cost occurs exactly once per process, unless someone calls `all_temporary_directories(true)` to wipe and refresh
the master list. An internal mutex is held for the duration of this call.
\mallocs None.
\error This call never fails, though if it fails to find any writable temporary directory, it will
terminate the process.
*/
AFIO_HEADERS_ONLY_FUNC_SPEC span<discovered_path> verified_temporary_directories() noexcept;
/*! \brief Returns a reference to an open handle to a verified temporary directory where files created are
stored in a filesystem directory, usually under the current user's quota.
This is implemented by iterating all of the paths returned by `verified_temporary_directories()`
and checking what file system is in use. The following regex is used:
`btrfs|cifs|exfat|ext?|f2fs|hfs|jfs|nfs|nilf2|ufs|vfat|xfs|zfs|msdosfs|newnfs|ntfs|smbfs|unionfs|fat|fat32`
The handle is created during `verified_temporary_directories()` and is statically cached thereafter.
*/
AFIO_HEADERS_ONLY_FUNC_SPEC const path_handle &storage_backed_temporary_files_directory() noexcept;
/*! \brief Returns a reference to an open handle to a verified temporary directory where files created are
stored in memory/paging file, and thus access may be a lot quicker, but stronger limits on
capacity may apply.
This is implemented by iterating all of the paths returned by `verified_temporary_directories()`
and checking what file system is in use. The following regex is used:
`tmpfs|ramfs`
The handle is created during `verified_temporary_directories()` and is statically cached thereafter.
\note If you wish to create an anonymous memory-backed inode for mmap and paging tricks like mapping
the same extent into multiple addresses e.g. to implement a constant time zero copy `realloc()`,
strongly consider using a non-file-backed `section_handle` as this is more portable.
*/
AFIO_HEADERS_ONLY_FUNC_SPEC const path_handle &memory_backed_temporary_files_directory() noexcept;
}
On Wednesday, 13 September 2017 05:59:17 PDT Niall Douglas wrote:
> AFIO never stores a path, too expensive (malloc), too racy. But you realise
> right that you can ask any open handle for its current path right? It's
> afio::handle::current_path().
And you do realise that /proc does not have to be mounted on Linux, right? On
a container environment, it might not have been set up.
Good catch. Is there another means of converting an FD to a path from within a (Linux) container?
I see that AFIO has (for current_path) #ifdefed one strategy per OS but there are presumably more strategies possible for each OS and more OSs to consider.
On Wednesday, 13 September 2017 05:59:17 PDT Niall Douglas wrote:
> AFIO never stores a path, too expensive (malloc), too racy. But you realise
> right that you can ask any open handle for its current path right? It's
> afio::handle::current_path().
And you do realise that /proc does not have to be mounted on Linux, right? [...]
Good catch. Is there another means of converting an FD to a path from within a (Linux) container? [...]
According to the lsof sources, the sole method on Linux for reading the current path of an open fd is via /proc.
You guys seem to have more understanding of your platforms than I do, but for what it's worth, my current understanding is like this:
- a POSIX file descriptor (fd) might or might not refer to a file in the filesystem; it might equally well refer to a network socket
- even if the fd does refer to a file (that is, an inode), an inode does not have a "path"; it does not even have a "filename" (the leafiest part of a path). Instead, an inode holds metadata about the file, such as its last-modified date and how to find the blocks holding its bytewise content
- The filesystem itself might have a unique path that refers to that inode; or there might be several such paths (hardlinks); or there might be zero such paths (if the file has been removed from the filesystem but is still open, as Niall and I briefly mentioned as a dev-ops nightmare elsewhere in this thread).
So if you are trying to discover "the path of" an inode, you are already in a state of sin, and the platform is not obliged to offer you any help whatsoever, AFAIK. If you want to deal with paths-and-inodes together as pairs, you need to track them together in pairs — which is exactly what std::filesystem::directory_entry does.
> In which case a ton of stuff stops working. That's not a showstopper, just
> heavily reduced functionality.
Or, drop the method. Provide it as a separate, optional functionality that may
be absent or fail unpredictably. People should keep the name if they need to,
or use the possibly-failing API that gets the current name.
From reading your code and your FreeBSD report, it looks like it has problems
with hardlinks too, at least on Darwin.
> > Good catch. Is there another means of converting an FD to a path from
> > within a (Linux) container?
> > I see that AFIO has (for current_path) #ifdefed one strategy per OS but
> > there are presumably more strategies possible for each OS and more OSs to
> > consider.
>
> According to the lsof sources, the sole method on Linux for reading the
> current path of an open fd is via /proc.
Yep, but it wouldn't be too hard to have the kernel add an ioctl(2) for it, so
that it did not require /proc. Plus, it could also let you know whether the
file was deleted, as opposed to a file whose name ends in " (deleted)".
On Thursday, 14 September 2017 19:32:47 PDT Niall Douglas wrote:
> Open fd path tracking works well on all major OS kernels, sometimes with
> quirks workarounds.
Though it does, it's not really an intentional feature. The link name in /proc
on Linux is more for debugging and user purposes than for retrieving the
name like you're doing.
I like that you can do that and that it allows you to
provide an API for it, but it's not a reliable feature and you're very likely
going to find that the great majority of OSes don't have that. To name a few
that you don't have code for:
- OpenBSD
- NetBSD
- DragonflyBSD
- Solaris
- AIX
- QNX
- VxWorks
- INTEGRITY
> The bit you might be missing is that without it you cannot know if a third
> party has moved something you are using, and thus you are about to
> rename/unlink a different inode to the one the user asked for i.e. data
> loss. This is why we never store and always retrieve paths, storing them is
> inherently racy.
But it's also racy if you retrieve the path, since a modification can happen
right after that. There's no race-free rename/move syscall.
BTW, one more failure mode: you may have files open that refer to files not
visible in the filesystem due to a clobbered mount or chroot.
On Friday, 15 September 2017 05:34:14 PDT Niall Douglas wrote:
> The algorithm is this for deletion:
>
> 1. Fetch the current path of the open fd about to be unlinked.
>
> 2. Open its containing directory.
>
> 3. Check the directory just opened has a child inode with the same name,
> inode and device as the one we intend to delete. If not, loop to 1.
>
> 4. Call unlinkat() to delete the open fd.
How do you guarantee that the file wasn't replaced between 3 and 4?
> This deletion algorithm is race free up to the containing directory, as the
> documentation states. Anything from the containing directory up to the root
> directory may permute without loss of safety.
Agreed, but that was not what I was thinking of.
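For readers following along, one pass of the quoted steps 2 to 4 might look roughly like this on POSIX. This is a sketch under assumptions, not AFIO's code: the function name is hypothetical, the path from step 1 is passed in by the caller, and a `false` return stands in for "loop to 1". The window between the check and the unlinkat() call, which is what the question above is about, is still present here.

```cpp
#include <filesystem>

#include <fcntl.h>      // open, AT_SYMLINK_NOFOLLOW
#include <sys/stat.h>   // fstat, fstatat
#include <unistd.h>     // unlinkat, close

namespace fs = std::filesystem;

// Hypothetical helper. `candidate` is the path most recently fetched for `fd`
// (step 1). Returns true only if the entry was checked and removed.
bool unlink_if_same_inode(int fd, const fs::path& candidate)
{
    struct stat wanted{};
    if (::fstat(fd, &wanted) != 0) return false;

    // Step 2: open the containing directory.
    const fs::path parent = candidate.parent_path();
    const int dirfd = ::open(parent.empty() ? "." : parent.c_str(),
                             O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0) return false;

    // Step 3: the directory must contain a child with the same name, inode
    // and device as the fd we intend to delete.
    const fs::path leaf = candidate.filename();
    struct stat found{};
    bool removed = false;
    if (::fstatat(dirfd, leaf.c_str(), &found, AT_SYMLINK_NOFOLLOW) == 0
        && found.st_dev == wanted.st_dev && found.st_ino == wanted.st_ino) {
        // Step 4: unlink relative to the directory handle, not the full path.
        removed = ::unlinkat(dirfd, leaf.c_str(), 0) == 0;
    }
    ::close(dirfd);
    return removed;
}
```

Because the unlink is done relative to the opened directory, anything above that directory can be renamed without affecting the result, which matches the "race free up to the containing directory" claim.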
> > BTW, one more failure mode: you may have files open that refer to files
> > not
> > visible in the filesystem due to a clobbered mount or chroot.
>
> We check for that already.
I think Linux simply returns "/" for those.
> > I think Linux simply returns "/" for those.
>
> Unless disabled explicitly, anything using handle::current_path() for race
> safety always verifies that the path it returns points to an inode with the
> same st_ino and st_dev as the open fd. If it differs, it loops the fetch up
> until any deadline for the i/o expires.
Which is probably why we could convince kernel devs to add an ioctl that
allows us to distinguish dummy names ("/" and "(deleted)") from real ones.
I would have said, "Correct. $HOME is set by someone, but that someone is never the 'sudo' process itself. It won't override the $HOME setting of the current environment (unless you pass -H)."
There is a root_directory() function to get a root directory, but there is no function to get a home directory.
I would like to propose a new function - home_directory().
Returns a home directory.
It would easy to implement, since it can use environment variables to get path to a home directory:
Unix: HOME
Windows: USERPROFILE (or HOMEDRIVE+HOMEPATH)