Hmmm...as you probably guessed from the comment in the code, I didn't
think there was any way that this ought ever to be happening when I
wrote the code. apr_proc_wait() can be called in two ways. One just
checks the status of a child process and returns immediately; "still
running" is one of the possible returns from that. The way I'm calling
it, it is supposed to wait until the child stops running and then
return the final status code. "Still running" should, in theory, never
happen. But here it is, happening anyway.
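To make the two calling modes concrete, here is a rough sketch using plain POSIX waitpid() instead of APR. It is only an analogy, not APR's code: WNOHANG plays the role of the "check and return immediately" mode, and a plain waitpid() plays the role of APR_WAIT. The names check_child(), spawn_sleeper() and the child_state values are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes, loosely mirroring "done" / "still running". */
enum child_state { CHILD_DONE, CHILD_NOTDONE, CHILD_ERROR };

/* Hypothetical helper: block != 0 waits for termination,
 * block == 0 only polls and returns immediately. */
static enum child_state check_child(pid_t pid, int block, int *status)
{
    pid_t r;

    assert(pid > 0);
    do {
        r = waitpid(pid, status, block ? 0 : WNOHANG);
    } while (r < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    if (r == pid)
        return CHILD_DONE;
    if (r == 0)
        return CHILD_NOTDONE;            /* only possible with WNOHANG */
    return CHILD_ERROR;
}

/* Hypothetical helper: fork a child that sleeps, then exits with `code`. */
static pid_t spawn_sleeper(unsigned seconds, int code)
{
    pid_t pid = fork();

    if (pid == 0) {
        sleep(seconds);
        _exit(code);
    }
    return pid;                          /* child pid, or -1 on fork failure */
}
```

With a child from spawn_sleeper(1, 7), calling check_child(pid, 0, &status) right away should report CHILD_NOTDONE, while check_child(pid, 1, &status) should block and then report CHILD_DONE. In this sketch the blocking mode has no path that reports "still running", which is why seeing it from a blocking call is so surprising.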
So we make the call:
    rc = apr_proc_wait(&proc, &status, &why, APR_WAIT);
The APR_WAIT flag means wait for termination. Then we do:
    if (!APR_STATUS_IS_CHILD_DONE(rc))
    {
        ap_log_rerror(APLOG_MARK, APLOG_ERR, rc, r,
                      "Could not get status from child process");
        return -5;
    }
Do you get the "Could not get status from child process" message? If
so, it should include the return code, which might be interesting.
It's tempting to try changing the apr_proc_wait() call into something
like this:
    do {
        rc = apr_proc_wait(&proc, &status, &why, APR_WAIT);
    } while (!APR_STATUS_IS_CHILD_DONE(rc));
But I'd be reluctant to do that without really understanding why
apr_proc_wait() is returning before the child process is done, because
it is likely to turn into a spin loop or infinite loop. Either option
could do ugly things to your server.
Clearly I'm going to have to read through Apache's process/thread
library source code (AGAIN!) to try to come up with some theory for how
the wait() could be returning prematurely.
An interesting experiment would be to try mod_auth_external version
3.1.2. That version doesn't use Apache's process library to run the
child process, but instead calls the UNIX OS functions directly, in
this case wait(). Even if it has the same problem, it might be easier
to debug.
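For anyone who wants to see what "calling the UNIX functions directly" looks like in isolation, here is a minimal sketch. It is not the 3.1.2 source: it uses waitpid() rather than wait(), the run_and_wait() name and its return conventions are invented for the example, and the EINTR retry is my assumption about what a careful caller would want.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given arguments and return
 * its exit code, -1 if fork/wait failed, or -2 if the child did not
 * exit normally (for example, it was killed by a signal). */
static int run_and_wait(char *const argv[])
{
    int status = 0;
    pid_t r;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                      /* only reached if exec failed */
    }

    /* Block until this particular child terminates. A signal arriving
     * in the parent can make waitpid() fail with EINTR, so retry. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    if (r != pid)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -2;
}
```

Called with { "/bin/sh", "-c", "exit 3", NULL }, this should return 3. Because every failure path here is visible in a dozen lines, an early return is much easier to pin down with a debugger or a few log statements than it is inside a library call.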
I'll get back to you on this, after I've had more time to dig through
the Apache code. Right now, I haven't even got a theory.
- Jan