1) common case
process A process B
main()
pidfile_open() -> success
perform_initialization()
daemon()
pidfile_write() -> success
perform_work() main()
pidfile_open() -> EEXIST
exit()
2) very unlikely but still possible case
process A process B
main()
pidfile_open() -> success main()
perform_initialization() pidfile_open() -> EAGAIN
daemon() perform_initialization()
pidfile_write() -> success daemon()
perform_work() perform_work()
The problem is that most of them (at least the ones I checked) ignore a
pidfile_open() failure unless errno == EEXIST.
How do we fix this? My suggestion is to loop until pidfile_open()
succeeds or errno != EAGAIN. Does anyone have any objections to that
approach?
DES
--
Dag-Erling Smørgrav - d...@des.no
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"
I think we already do that internally in pidfile_open(). Can you take a look at
the source and confirm that this is what you mean?
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com
No, it doesn't; pidfile_open(3) returns NULL with errno == EAGAIN if the
pidfile is locked but empty, as is the case in the window between a
successful pidfile_open(3) and the first pidfile_write(3). This is
documented in the man page:
[EAGAIN] Some process already holds the lock on the given pid‐
file, but the file is truncated. Most likely, the
existing daemon is writing new PID into the file.
I have a patch that adds a pidfile to dhclient(8), where I do this:
for (;;) {
pidfile = pidfile_open(path_dhclient_pidfile, 0600, &otherpid);
if (pidfile != NULL || errno != EAGAIN)
break;
sleep(1);
}
if (pidfile == NULL) {
if (errno == EEXIST)
error("dhclient already running, pid: %d.", otherpid);
warning("Cannot open or create pidfile: %m");
}
I'm not sure I agree with the common idiom (which I copied here) of
ignoring all other errors than EEXIST, but that's a different story.
You are also ignoring the return value of sleep(1), which would tell
you if the call was interrupted by a signal handler. This can be fine
for dhclient(8) but other utilities might require some guards against
such interruptions.
--
"The flames are all long gone, but the pain lingers on"
I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same
value on FreeBSD) should be converted to EEXIST on pidfile_open()
return.
Also if we now have for loop, why not to put count in there?
I'm not very happy about touching pidptr in case of error other than
EEXIST. This is not documented, but a bit unexpected anyway.
I'm happy with increasing the timeout.
> Index: lib/libutil/pidfile.c
> ===================================================================
> --- lib/libutil/pidfile.c (revision 226271)
> +++ lib/libutil/pidfile.c (working copy)
> @@ -118,22 +118,19 @@
> */
> fd = flopen(pfh->pf_path,
> O_WRONLY | O_CREAT | O_TRUNC | O_NONBLOCK, mode);
> - if (fd == -1) {
> - count = 0;
> + if (fd == -1 && errno == EWOULDBLOCK && pidptr != NULL) {
> + *pidptr = -1;
> + count = 20;
> rqtp.tv_sec = 0;
> rqtp.tv_nsec = 5000000;
> - if (errno == EWOULDBLOCK && pidptr != NULL) {
> - again:
> + for (;;) {
> errno = pidfile_read(pfh->pf_path, pidptr);
> - if (errno == 0)
> - errno = EEXIST;
> - else if (errno == EAGAIN) {
> - if (++count <= 3) {
> - nanosleep(&rqtp, 0);
> - goto again;
> - }
> - }
> + if (errno != EAGAIN || --count == 0)
> + break;
> + nanosleep(&rqtp, 0);
> }
> + if (errno == 0)
> + errno = EEXIST;
> free(pfh);
> return (NULL);
The historical (and documented) behavior is to return EAGAIN.
> Also if we now have for loop, why not to put count in there?
Because if we do, there will be a nanosleep after the last
pidfile_read() attempt. We need to break the loop after pidfile_read()
failed but before nanosleep().
> I'm not very happy about touching pidptr in case of error other than
> EEXIST. This is not documented, but a bit unexpected anyway.
Well, it was your idea, I just moved it to before the loop :)
DES
--
Dag-Erling Smørgrav - d...@des.no
Jos
We don't want to duplicate the code of handling EAGAIN into every single
pidfile(3) consumer. This is why we hav pidfile(3) API in the first
place - to make it easy for people to use.
> > Also if we now have for loop, why not to put count in there?
>
> Because if we do, there will be a nanosleep after the last
> pidfile_read() attempt. We need to break the loop after pidfile_read()
> failed but before nanosleep().
Right, ok.
> > I'm not very happy about touching pidptr in case of error other than
> > EEXIST. This is not documented, but a bit unexpected anyway.
>
> Well, it was your idea, I just moved it to before the loop :)
In my patch *pidptr was set to -1 only in the case of EAGAIN from
pidfile_read(), not for every other error.
BTW. With your patch we will continue even when flopen(3) failed for
other reasons, instead of returning NULL. Checking for fd being -1
should not be done in the same statement with other checks.
After proposed changes it would look like this, what do you think?
http://people.freebsd.org/~pjd/patches/pidfile.3.patch
Looks OK to me, but you should also remove the paragraph about EAGAIN in
the man page.