On 2015-05-29, Kenny McCormack <gaz...@shell.xmission.com> wrote:
> I've discovered that, on Linux, if I (try to) "cp" a new version of an
> executable over an existing, running one (say, /usr/bin/xyz), I
> sometimes/usually get error "text file busy" and the file copy fails. This
> is normal and expected.
>
> But, I've found that I can just remove ("rm") the file and that works
> (IMLE, of course...). And, of course, once it is removed, you can copy
> over a new version.
>
> So my questions are:
>
> 1) Why does it let me remove it, but not copy over it? Note that
> other, more contentious OSes, would never let you remove it if
> it is open or "in-use".
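In Unix, objects in the file system have a persistent reference count (the
link count, stored with the object on disk). An object that appears under two
different names, whether in the same directory or in two different
directories, has a reference count of two. An object which is removed from the
directory structure has a reference count of zero.

There is an additional reference count: an in-memory reference count
which tracks how many descriptors are open on an object. A running program
counts here too, since the kernel keeps its executable referenced while it is
mapped.

An object is not recycled (i.e. keeps existing) until both reference counts go
to zero: it has to be removed from the directory structure and all file
descriptors have to be closed. On the "last close", the object becomes
garbage.

That is why "rm" works: it only removes a name, which drops the persistent
count and leaves the object itself alone. "cp" over the existing name instead
opens that same object for writing, and writing to the object a process is
executing is what gets refused with "text file busy".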
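
Both counts can be watched from user space. Here is a minimal sketch in
Python, assuming a local POSIX file system; the function name and the file
names "xyz" and "xyz-alias" are made up for illustration:

```python
import os
import tempfile


def demo_reference_counts():
    """Walk one file through both counts and report what is observed."""
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "xyz")         # hypothetical names
    second = os.path.join(workdir, "xyz-alias")

    with open(first, "w") as f:
        f.write("hello\n")
    os.link(first, second)                       # second name, same object
    links_with_two_names = os.stat(first).st_nlink

    fd = os.open(first, os.O_RDONLY)             # one open descriptor
    os.unlink(first)
    os.unlink(second)                            # no names left
    links_after_unlink = os.fstat(fd).st_nlink   # persistent count via the fd
    data = os.read(fd, 100)                      # the object is still readable
    os.close(fd)                                 # last close: now it is garbage

    os.rmdir(workdir)                            # directory is already empty
    return links_with_two_names, links_after_unlink, data


if __name__ == "__main__":
    print(demo_reference_counts())
```

On a local Linux file system I would expect this to report a link count of 2,
then 0, with the data still readable through the descriptor after both names
are gone. Over NFS the numbers may well differ, for the reason discussed
further down.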
If a file is deleted but still open when the OS crashes, you are left with an
object that doesn't exist in the directory structure but has not been
reclaimed properly (it continues to occupy storage). One of the jobs of a "fsck"
program is to find these and release them properly.
> 2) What happens to the running program if/when it finds it needs to
> access its "text" and the file has been removed?
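The text that is already mapped is not the problem: the mapping keeps the
object referenced, so the kernel can still page it in from the unlinked
file. The trouble starts if the program wants to open its own executable by
name. Unless there is some special mechanism for a process to gain access to
its own executable, it is screwed.

For instance, if we look at Linux, /proc/self/exe reads as a symlink, and
once the file is removed it reads back the old path with " (deleted)"
appended, so the path won't help. Opening /proc/self/exe itself is another
matter: Linux resolves it to the object directly rather than through the
path, so that does still reach the removed file.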
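
To see what /proc/<pid>/exe gives you once the executable has been removed,
something like the following can be tried. It is only a sketch and assumes
Linux with /proc mounted, a "sleep" binary on PATH, and a temporary directory
that allows execution; the function name is hypothetical. It runs a private
copy of "sleep", removes the copy, and then looks at the child's exe link:

```python
import os
import shutil
import subprocess
import tempfile


def exe_after_unlink():
    """Run a copied binary, unlink it, then inspect /proc/<pid>/exe."""
    source = shutil.which("sleep")
    if source is None or not os.path.exists("/proc/self/exe"):
        return None                            # assumptions not met, give up
    workdir = tempfile.mkdtemp()
    copy = os.path.join(workdir, "sleep")      # keep the name "sleep"
    shutil.copy(source, copy)                  # copies data and mode bits
    with open(copy, "rb") as f:
        original = f.read()

    child = subprocess.Popen([copy, "30"])     # returns once exec has happened
    try:
        os.unlink(copy)                        # remove the running executable
        link = f"/proc/{child.pid}/exe"
        target = os.readlink(link)             # what the symlink reads as
        path_still_exists = os.path.exists(copy)
        with open(link, "rb") as f:            # open through /proc, not by path
            same_bytes = f.read() == original
    finally:
        child.kill()
        child.wait()
        os.rmdir(workdir)
    return target, path_still_exists, same_bytes


if __name__ == "__main__":
    print(exe_after_unlink())
```

On the Linux systems I know of, the link target comes back ending in
" (deleted)", the old path no longer exists, and reading through the /proc
entry still yields the original bytes.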
Programs don't usually need access to their executable. Sometimes they do
though; for instance executables created by interpreted languages might have
an "image" appended to a generic executable part. Usually, that is accessed
at start up. Another use case is programs accessing their own debug symbols.
> Note that if the file is nfs-mounted, then when you remove it but somebody
> still has it open, the file gets renamed something like .nfs123456789 and
> the system automagically takes care of getting rid of this temporary file
> when it can. But the case I am talking about here (in this thread) is when
> nfs isn't involved.
When NFS is involved, these explicit objects have to be created. Why?
Because the in-memory reference counts are not sufficient for NFS: they live
on the client, and in the classic protocol the server does not know which
clients have a file open. A real remove would destroy the file on the server
while a client still had it open.

NFS is supposed to be "stateless", in that you can reboot the NFS server,
and things continue working from the clients' perspectives (when the server
comes back up). These explicit .nfs123456789 files can survive a reboot, which
then helps the client achieve the usual Unix semantics that the file remains
accessible while it is open.