I tried calling pthread_cancel in a while loop, but the API still fails
with the same error number. Does anyone have a suggestion on how to
proceed?
Unfortunately, the exact code cannot be shared.
Regards,
Vamsi
> I have an application that creates 32 threads. Later in the
> logic, it has to kill some 10 threads. Each of the 10 threads does a
> recvfrom on a unique socket in a while loop. To cancel a thread,
> pthread_cancel is called from the main thread, which created the
> threads, to send a cancel signal to the target thread. All the
> threads are initialized with cancel type asynchronous.
> pthread_cancel succeeds for 8 threads; however, for
> the 9th thread it fails with error number 11
> (EAGAIN).
My bet is that you have made one of three mistakes:
1) While your cancel type is set to asynchronous, you have called a
function that is not async-cancel-safe. You cannot call any functions
that are not async-cancel-safe while your cancellation type is set to
asynchronous.
2) You have misdiagnosed the returned error number. (Did you check
'errno'? That's a mistake as 'pthread_cancel' returns the error code
and does not set 'errno'.)
3) Your thread has actually terminated or was detached. As a result,
the pthread_t is not valid.
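
To illustrate point 2, here is a minimal sketch of reading the error from the return value of 'pthread_cancel' rather than from 'errno'. The names 'blocker', 'cancel_checked' and 'demo_cancel_checked' are made up for the example, and the worker uses the default deferred cancel type and blocks in pause() instead of recvfrom, so it is not the original poster's setup.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical worker: blocks forever in pause(), a cancellation point. */
static void *blocker(void *arg)
{
    (void)arg;
    for (;;)
        pause();
    return NULL;
}

/* The error number is the return value; errno is not consulted. */
static int cancel_checked(pthread_t tid)
{
    int rc = pthread_cancel(tid);
    if (rc != 0)
        fprintf(stderr, "pthread_cancel failed: %d (%s)\n", rc, strerror(rc));
    return rc;
}

/* Returns 0 if the thread was created, cancelled and joined cleanly. */
int demo_cancel_checked(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc = pthread_create(&tid, NULL, blocker, NULL);
    if (rc != 0)
        return rc;
    rc = cancel_checked(tid);
    if (rc != 0)
        return rc;
    rc = pthread_join(tid, &result);
    if (rc != 0)
        return rc;
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Calling demo_cancel_checked() from main and checking for 0 is enough to exercise it.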
DS
> 2) You have misdiagnosed the returned error number. (Did you check
> 'errno'? That's a mistake as 'pthread_cancel' returns the error code
> and does not set 'errno'.)
vamsi> The return value is 11 (EAGAIN), not errno.
>
> 3) Your thread has actually terminated or was detached. As a result,
> the pthread_t is not valid.
vamsi> I did a pthread_kill(th_id,0) to see whether the thread
exists. It does.
>
> DS
Vamsi: Please find the info inline above.
> 2) You have misdiagnosed the returned error number. (Did you check
> 'errno'? That's a mistake as 'pthread_cancel' returns the error code
> and does not set 'errno'.)
Vamsi> The error number returned is 11, not errno.
> 3) Your thread has actually terminated or was detached. As a result,
> the pthread_t is not valid.
Vamsi> I checked the validity of the pthread_t by calling pthread_kill
with signal 0 on the target thread. The call succeeds, and the
thread exists, blocked on recvfrom.
> DS
Please find the info inline above.
Regards,
Vamsi
> > 1) While your cancel type is set to asynchronous, you have called a
> > function that is not async-cancel-safe. You cannot call any functions
> > that are not async-cancel-safe while your cancellation type is set to
> > asynchronous.
> vamsi> The thread is waiting on recvfrom, which is an
> async-cancel-safe function.
How do you know this? In general, there is no way to tell where a
thread is.
> > 2) You have misdiagnosed the returned error number. (Did you check
> > 'errno'? That's a mistake as 'pthread_cancel' returns the error code
> > and does not set 'errno'.)
> vamsi> The return value is 11 (EAGAIN), not errno.
Odd. The 'pthread_cancel' function basically reduces to a call to
'tkill'. The only way 'tkill' can return EAGAIN, as far as I can tell,
is if the signal queue overflows. This can only happen if the kernel
is critically low on memory. Another possibility is that you have some
kind of special signal auditing service, and it has opted to return
EAGAIN.
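
If the signal queue is a suspect, one thing that might be worth ruling out on Linux is the per-user cap on queued signals, which getrlimit exposes as RLIMIT_SIGPENDING. The sketch below only prints that limit; whether it has anything to do with the EAGAIN here is a guess, and the function names are made up for the example.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Fetch the limit on queued signals (Linux-specific resource).
 * Returns 0 on success, -1 on failure, like getrlimit itself. */
int get_sigpending_limit(struct rlimit *out)
{
    return getrlimit(RLIMIT_SIGPENDING, out);
}

/* Print the soft limit so it can be compared against the number of
 * signals the application may have outstanding. */
void print_sigpending_limit(void)
{
    struct rlimit rl;
    if (get_sigpending_limit(&rl) != 0) {
        perror("getrlimit");
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("queued-signal limit: unlimited\n");
    else
        printf("queued-signal limit: %llu\n",
               (unsigned long long)rl.rlim_cur);
}
```

The same number is what 'ulimit -i' reports in a shell on Linux.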
> > 3) Your thread has actually terminated or was detached. As a result,
> > the pthread_t is not valid.
> vamsi> I did a pthread_kill(th_id,0) to see whether the thread
> exists. It does.
Technically that doesn't prove anything. If the thread had terminated
and was detached, it would be just as illegal to pass it to
pthread_kill as pthread_cancel.
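
One way to avoid probing with pthread_kill at all is to keep the threads joinable and have each thread record its own exit. A rough sketch, assuming deferred cancellation and a made-up 'struct worker' with an 'exited' flag set from a cleanup handler (none of these names come from the original code):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

/* Hypothetical bookkeeping record, one per thread. */
struct worker {
    pthread_t tid;
    atomic_int exited;   /* set to 1 by the thread itself on the way out */
};

/* Cleanup handler: runs when the thread is cancelled or exits. */
static void mark_exited(void *arg)
{
    atomic_store(&((struct worker *)arg)->exited, 1);
}

static void *tracked_worker(void *arg)
{
    struct worker *w = arg;
    pthread_cleanup_push(mark_exited, w);
    for (;;)
        pause();               /* cancellation point; stands in for recvfrom */
    pthread_cleanup_pop(1);    /* not reached; required to pair with push */
    return NULL;
}

/* Returns 0 if the flag was clear before the cancel and set after the join. */
int demo_tracked_exit(void)
{
    struct worker w;
    atomic_init(&w.exited, 0);
    if (pthread_create(&w.tid, NULL, tracked_worker, &w) != 0)
        return -1;
    int before = atomic_load(&w.exited);
    if (pthread_cancel(w.tid) != 0)
        return -1;
    if (pthread_join(w.tid, NULL) != 0)
        return -1;
    int after = atomic_load(&w.exited);
    return (before == 0 && after == 1) ? 0 : -1;
}
```

Because the thread is joinable and not yet joined, its pthread_t stays valid until pthread_join returns, so the main thread never passes a stale ID to pthread_cancel.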
DS
4) Somehow the thread id got clobbered in memory.
5) recvfrom for that thread has returned EINTR from some other signal and you
actually aren't in a cancel safe place even though you think you are.
Not sure of your program's logic, but is it possible to cancel the threads in
reverse order and see whether it still fails on the ninth call, fails on this
specific thread, or completes? This may help you figure out where to look
for a bug.
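
The reverse-order experiment could look something like the sketch below. It is only an approximation of the setup described: the worker, the socket setup and NUM_WORKERS are assumptions, and it uses the default deferred cancel type (recvfrom is a cancellation point) rather than asynchronous cancellation.

```c
#include <netinet/in.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define NUM_WORKERS 10   /* assumption: the 10 threads from the original post */

/* Cleanup handler: close the socket when the thread is cancelled. */
static void close_socket(void *arg)
{
    close(*(int *)arg);
}

/* Hypothetical worker: blocks in recvfrom on its own UDP socket. */
static void *receiver(void *arg)
{
    (void)arg;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return NULL;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;   /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return NULL;
    }

    pthread_cleanup_push(close_socket, &fd);
    char buf[512];
    for (;;)
        (void)recvfrom(fd, buf, sizeof buf, 0, NULL, NULL);
    pthread_cleanup_pop(1);   /* not reached; required to pair with push */
    return NULL;
}

/* Cancel the workers in reverse creation order and return the number
 * of pthread_cancel calls that failed. */
int cancel_in_reverse(void)
{
    pthread_t tids[NUM_WORKERS];
    int created = 0, failures = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tids[i], NULL, receiver, NULL) != 0)
            break;
        created++;
    }
    for (int i = created - 1; i >= 0; i--) {
        int rc = pthread_cancel(tids[i]);
        if (rc != 0) {
            fprintf(stderr, "cancel of thread %d failed: %d (%s)\n",
                    i, rc, strerror(rc));
            failures++;
        }
        pthread_join(tids[i], NULL);
    }
    return failures;
}
```

If the failure follows the position in the sequence rather than the thread, that points toward something accumulating with each cancel; if it follows one particular thread, that thread's state or its stored ID is the place to look.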