Guanglei <ligua...@gmail.com> writes:
> My app will create a unix socket file on startup, and it will
> remove it on exit. But there are some abnormal situations where
> the cleanup code can't be called, such as killed by SIGKILL.
> For normal files, what we usually do is to call unlink() right
> after open(), so that the system could help to clean that file
> when there is no reference to it. But this won't work for unix
> socket file, since unlink() will remove the socket filename
> immediately.
It also won't work for regular files (this being the proper term)
because the create-and-unlink sequence is not atomic: the program
could be killed after it has created the file but before it has
unlinked it.
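
To make the window concrete, here is a minimal sketch of the
create-and-unlink idiom for a regular file. The helper name
open_anonymous() and the path it is given are made up for
illustration; the comment marks the spot where a SIGKILL would leave
the name behind.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file and drop its name at once,
 * so the kernel frees it when the last descriptor is closed.
 * Returns the open descriptor, or -1 with errno set.
 */
int open_anonymous(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;

    /*
     * Window: if the process is killed right here, the file has been
     * created but not yet unlinked, so the name stays in the directory.
     */

    if (unlink(path) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

The descriptor remains usable after the unlink; only the directory
entry is gone.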
There is no real solution to this problem because even if the kernel
were modified to unlink the socket directory entry automatically on
exit (or something like the Linux-specific 'abstract AF_UNIX
namespace' were used), there would still be a time window in which a
process that has not yet fully terminated has already stopped
processing data sent to the socket for good, while a process in the
process of being started can't recreate the socket (name) because it
hasn't yet been removed.
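
For reference, binding in the Linux-specific abstract namespace looks
roughly like the sketch below. This is an illustration only: the
helper name bind_abstract() is hypothetical and the code is not
portable beyond Linux. An abstract name has a leading NUL byte in
sun_path, never appears in the filesystem, and goes away once the
last descriptor for the socket is closed.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (Linux only): bind a stream socket to an abstract
 * AF_UNIX name. Returns the bound descriptor, or -1 with errno set.
 */
int bind_abstract(const char *name)
{
    struct sockaddr_un addr;
    size_t len = strlen(name);
    socklen_t addrlen;
    int fd;

    /* One byte of sun_path is taken by the leading NUL. */
    if (len + 1 > sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0', which selects the abstract namespace. */
    memcpy(addr.sun_path + 1, name, len);

    /* The address length covers exactly the NUL plus the name bytes. */
    addrlen = (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);

    if (bind(fd, (struct sockaddr *)&addr, addrlen) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

While one process holds the name, a second bind() to it fails with
EADDRINUSE, which is the window described above: a dying process that
has not yet released its descriptor still blocks its successor.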
I usually just do an unlink(2) before calling bind(2) during process
startup - that's going to work for the case which interests me most
(automatic restart of a 'crashed' process) without messing up any
coredumps possibly needed for debugging - and leave it to 'operator
intervention' to sort out the case of multiple 'live' instances of the
program running concurrently.
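
A minimal sketch of that startup sequence in C. The helper name
listen_unix() and the choice of a SOCK_STREAM listening socket are
assumptions made for the example, not something taken from a
particular program.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove a stale socket name, then bind and listen.
 * Returns the listening descriptor, or -1 with errno set.
 */
int listen_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /*
     * Drop whatever a previous (crashed) instance left behind.
     * ENOENT only means there was nothing to clean up.
     */
    if (unlink(path) == -1 && errno != ENOENT)
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);    /* length was checked above */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1
        || listen(fd, SOMAXCONN) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Note that the unlink() is unconditional: if another live instance is
still bound to the name, it silently loses its directory entry and
new clients reach the new process instead. Nothing in the sketch
guards against that case.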