On Thu, Oct 13, 2011 at 12:54:38PM +0200, Dag-Erling Smørgrav wrote: > I looked at some of the programs that use pidfile(3) in base, and they > pretty much all get it wrong. Consider these two scenarios:
> Dag-Erling Smørgrav <d...@des.no> writes: > > How do we fix this? My suggestion is to loop until pidfile_open() > > succeeds or errno != EAGAIN. Does anyone have any objections to that > > approach? > I think we already do that internally in pidfile_open(). Can you take a look at > the source and confirm that this is what you mean?
No, it doesn't; pidfile_open(3) returns NULL with errno == EAGAIN if the pidfile is locked but empty, as is the case in the window between a successful pidfile_open(3) and the first pidfile_write(3). This is documented in the man page:
[EAGAIN] Some process already holds the lock on the given pid‐ file, but the file is truncated. Most likely, the existing daemon is writing new PID into the file.
I have a patch that adds a pidfile to dhclient(8), where I do this:
for (;;) { pidfile = pidfile_open(path_dhclient_pidfile, 0600, &otherpid); if (pidfile != NULL || errno != EAGAIN) break; sleep(1); } if (pidfile == NULL) { if (errno == EEXIST) error("dhclient already running, pid: %d.", otherpid); warning("Cannot open or create pidfile: %m"); }
I'm not sure I agree with the common idiom (which I copied here) of ignoring all other errors than EEXIST, but that's a different story.
> Pawel Jakub Dawidek <p...@FreeBSD.org> writes: >> Dag-Erling Smørgrav <d...@des.no> writes: >> > How do we fix this? My suggestion is to loop until pidfile_open() >> > succeeds or errno != EAGAIN. Does anyone have any objections to that >> > approach? >> I think we already do that internally in pidfile_open(). Can you take a look at >> the source and confirm that this is what you mean?
> No, it doesn't; pidfile_open(3) returns NULL with errno == EAGAIN if the > pidfile is locked but empty, as is the case in the window between a > successful pidfile_open(3) and the first pidfile_write(3). This is > documented in the man page:
> [EAGAIN] Some process already holds the lock on the given pid‐ > file, but the file is truncated. Most likely, the > existing daemon is writing new PID into the file.
> I have a patch that adds a pidfile to dhclient(8), where I do this:
> for (;;) { > pidfile = pidfile_open(path_dhclient_pidfile, 0600, &otherpid); > if (pidfile != NULL || errno != EAGAIN) > break; > sleep(1); > } > if (pidfile == NULL) { > if (errno == EEXIST) > error("dhclient already running, pid: %d.", otherpid); > warning("Cannot open or create pidfile: %m"); > }
> I'm not sure I agree with the common idiom (which I copied here) of > ignoring all other errors than EEXIST, but that's a different story.
You are also ignoring the return value of sleep(1), which would tell you if the call was interrupted by a signal handler. This can be fine for dhclient(8) but other utilities might require some guards against such interruptions.
-- "The flames are all long gone, but the pain lingers on" _______________________________________________ freebsd-curr...@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
After discussing this with pjd@ on IRC, I arrived at the attached patch, which increases the length of time pidfile_open() itself waits (I hadn't noticed that it already looped) and sets *pidptr to -1 if it fails to read a pid.
On Thu, Oct 13, 2011 at 02:54:16PM +0200, Dag-Erling Smørgrav wrote: > After discussing this with pjd@ on IRC, I arrived at the attached patch, > which increases the length of time pidfile_open() itself waits (I hadn't > noticed that it already looped) and sets *pidptr to -1 if it fails to read > a pid.
I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same value on FreeBSD) should be converted to EEXIST on pidfile_open() return.
Also if we now have for loop, why not to put count in there?
I'm not very happy about touching pidptr in case of error other than EEXIST. This is not documented, but a bit unexpected anyway.
> I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same > value on FreeBSD) should be converted to EEXIST on pidfile_open() > return.
The historical (and documented) behavior is to return EAGAIN.
> Also if we now have for loop, why not to put count in there?
Because if we do, there will be a nanosleep after the last pidfile_read() attempt. We need to break the loop after pidfile_read() failed but before nanosleep().
> I'm not very happy about touching pidptr in case of error other than > EEXIST. This is not documented, but a bit unexpected anyway.
Well, it was your idea, I just moved it to before the loop :)
Why not import daemontools? It's public domain these days. Pidfiles are a hacky mess. UNIX already has a way to track processes which avoids all these issues, with very little overhead.
On Thu, Oct 13, 2011 at 04:11:40PM +0200, Dag-Erling Smørgrav wrote: > Pawel Jakub Dawidek <p...@FreeBSD.org> writes: > > I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same > > value on FreeBSD) should be converted to EEXIST on pidfile_open() > > return.
> The historical (and documented) behavior is to return EAGAIN.
We don't want to duplicate the code of handling EAGAIN into every single pidfile(3) consumer. This is why we hav pidfile(3) API in the first place - to make it easy for people to use.
> > Also if we now have for loop, why not to put count in there?
> Because if we do, there will be a nanosleep after the last > pidfile_read() attempt. We need to break the loop after pidfile_read() > failed but before nanosleep().
Right, ok.
> > I'm not very happy about touching pidptr in case of error other than > > EEXIST. This is not documented, but a bit unexpected anyway.
> Well, it was your idea, I just moved it to before the loop :)
In my patch *pidptr was set to -1 only in the case of EAGAIN from pidfile_read(), not for every other error.
BTW. With your patch we will continue even when flopen(3) failed for other reasons, instead of returning NULL. Checking for fd being -1 should not be done in the same statement with other checks.
After proposed changes it would look like this, what do you think?