How do you get rid of a process which refuses to respond to kill -9
without rebooting?
Presumably this process is waiting for something to finish which
obviously never will finish!
I have searched many archives and have found many comments on WHY
processes don't get the signals, but nowhere have I seen a comment on
whether or not it is possible to clear this process without a reboot.
Can anyone help? I don't need a technical manifesto, just a clear "you
do this" or "it cannot be done" answer.
Thanks In Advance!!!
Ash Bowers
abowers at email.co.anson.nc.us
> This group seemed the most appropriate, so I will ask the question
> here:
>
> How do you get rid of a process which refuses to respond to kill -9
> Without rebooting??????
You can't, AFAIK.
> Presumably this process is waiting for something to finish which
> obviously never will finish!
Take a look at it with ps. If the process is in the D state
(uninterruptible sleep, often waiting for I/O), there's no way to kill
it, AFAIK, because only tasks in the TASK_INTERRUPTIBLE state are woken
up to receive signals.
If there is a way to move a process from the TASK_UNINTERRUPTIBLE state
to the TASK_INTERRUPTIBLE state, I haven't found it yet.
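
For anyone who wants to script that check, here is a minimal sketch in Python. It assumes a Linux system with /proc mounted; the helper names parse_state and uninterruptible_pids are made up for this example. The state letter is the field right after the parenthesised command name in /proc/<pid>/stat, the same letter ps shows in its STAT column.

```python
import os


def parse_state(stat_line):
    """Return the one-letter state from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')' characters, so split after the last ')' instead of
    splitting on whitespace from the start.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def uninterruptible_pids(proc_root="/proc"):
    """List PIDs currently in the D (uninterruptible sleep) state."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no /proc here (not Linux, or not mounted)
    for name in entries:
        if not name.isdigit():
            continue  # not a per-process directory
        path = os.path.join(proc_root, name, "stat")
        try:
            with open(path, errors="replace") as fh:
                if parse_state(fh.read()) == "D":
                    found.append(int(name))
        except (OSError, IndexError):
            continue  # process exited while we were looking
    return found


if __name__ == "__main__":
    print("PIDs in D state:", uninterruptible_pids())
```

An empty list only means nothing was in D state at the moment of the scan; a short disk wait will come and go between two runs, so a PID that stays in the list is the interesting one.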
Roland
--
Roland Smith
r s m i t h @ x s 4 a l l . n l
http://www.xs4all.nl/~rsmith/
"Traveler, there is no path. You make the path as you walk."
Sometimes it works to kill the parent process. Use pstree to find it.
Actually, this always works if you go up far enough and kill the mother
of all processes. ;-)
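
If pstree is not at hand, the chain of parents can also be read from /proc. This is only a sketch under the assumption of a Linux /proc layout; read_ppid and parent_chain are names invented for this example, and the lookup function is passed in so the walk itself can be tried without /proc.

```python
import os


def read_ppid(pid):
    """Return the parent PID from /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid, errors="replace") as fh:
        # Fields after the ')' that closes the command name: state, ppid, ...
        return int(fh.read().rsplit(")", 1)[1].split()[1])


def parent_chain(pid, get_ppid=read_ppid):
    """Return [pid, parent, grandparent, ...], ending at PID 1 or at a
    process whose parent cannot be determined."""
    chain = [pid]
    while pid > 1:
        try:
            pid = get_ppid(pid)
        except (OSError, IndexError, ValueError):
            break  # process vanished or /proc is unavailable
        if pid <= 0:
            break  # reached the top of this PID namespace
        chain.append(pid)
    return chain


if __name__ == "__main__":
    print(" <- ".join(str(p) for p in parent_chain(os.getpid())))
```

The last entry is normally PID 1, which is the "mother of all processes" you do not actually want to kill.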
Jörn
As Roland and Juergen have already pointed out, processes in the D state
cannot be killed. Although you don't want a "technical manifesto",
perhaps a few words on the "why" are in order, even if it is only for
the purpose of others correcting ME, before I spread more of this
rubbish B-{)
A process always "goes to sleep" when it waits for a specific event to
occur, e.g. an I/O operation to finish or a processor to become idle so
it can continue executing.
Besides the event the process is waiting for, a signal sent to the
process may also wake it up. When the process is woken up, it has to
check whether it was woken up by the event arriving or by a signal. In
the latter case it has to do some cleanup, for one of several reasons:
the event would normally have been handled by other (lower-level) code
before being passed to the process' handler, and that code would have
done the cleanup; OR the event has a strong relationship to the process,
and strange things might happen if it occurs when the process is no
longer around; OR the I/O device may be left blocked if the event is not
handled properly.
So, there are a number of reasons why it may not be possible to allow a
process waiting for an event to be interrupted.
If there is a chance that the event may never occur, e.g. due to faulty
hardware (e.g. a SCSI device that stops transferring data in the middle of a
block), some timeout mechanism must be implemented that does the cleanup
mentioned.
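
As a userspace analogy only (this is not how a driver is written), the "wait, but with a timeout that does the cleanup" pattern looks roughly like this in Python. The function name wait_for_event and the cleanup callback are assumptions made for the illustration; threading.Event stands in for the hardware event.

```python
import threading


def wait_for_event(event, timeout_s, cleanup):
    """Wait for `event`, but give up and run `cleanup` after `timeout_s`.

    Returns True if the event arrived, False if the wait timed out.
    """
    if event.wait(timeout_s):
        return True
    cleanup()  # the event never came: release whatever the wait was holding
    return False


if __name__ == "__main__":
    never = threading.Event()  # stands in for a device that never answers
    arrived = wait_for_event(never, 0.1, lambda: print("timed out, cleaning up"))
    print("event arrived:", arrived)
```

The point of the pattern is that the waiter always comes back, either with the result or with a timeout it can report as an error.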
--
Josef Möllers (Pinguinpfleger bei FSC)
If failure had no penalty success would not be a prize
-- T. Pratchett
Just a comment. This behaviour seems a little strange. Issuing a kill -9
shouldn't result in the process being signaled. It should just be wiped
by the kernel. After all, SIGKILL is not supposed to be "catchable" at
all.
Or am I missing something here?
Martin.
A process catches all signals in kernel mode; the kernel
code then chooses between a number of actions. It can
choose to ignore the signal, deliver the signal, freeze
the process, dump core or just quit by calling do_exit().
With signal 9 the choice is always to call do_exit().
--
Kasper Dupont
Unless, of course, the task's state is not TASK_INTERRUPTIBLE
(from signal.c):
    if (t->state == TASK_INTERRUPTIBLE && signal_pending(t))
        wake_up_process(t);
Tasks get their state set to TASK_UNINTERRUPTIBLE when they sleep_on().
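
To make the consequence of that check concrete, here is a toy model in Python. It is not kernel code and only mirrors the logic described in this thread: the Task class, its attributes and the single-letter state values are all invented for the illustration.

```python
TASK_RUNNING = "R"
TASK_INTERRUPTIBLE = "S"
TASK_UNINTERRUPTIBLE = "D"
SIGKILL = 9


class Task:
    """Toy stand-in for a kernel task; not real kernel code."""

    def __init__(self, state):
        self.state = state
        self.pending = set()  # signals queued but not yet acted on
        self.alive = True

    def send_signal(self, sig):
        # The signal is always recorded as pending ...
        self.pending.add(sig)
        # ... but only an interruptible sleeper is woken up to act on it.
        if self.state == TASK_INTERRUPTIBLE:
            self.wake_up()

    def wake_up(self):
        # Reached via a signal (interruptible sleep only) or when the
        # awaited event finally arrives (either kind of sleep).
        self.state = TASK_RUNNING
        if SIGKILL in self.pending:
            self.alive = False  # corresponds to the do_exit() path


if __name__ == "__main__":
    s = Task(TASK_INTERRUPTIBLE)
    d = Task(TASK_UNINTERRUPTIBLE)
    s.send_signal(SIGKILL)
    d.send_signal(SIGKILL)
    print("S sleeper alive:", s.alive)
    print("D sleeper alive:", d.alive)
    d.wake_up()  # the event it was waiting for finally happens
    print("D sleeper alive after its event:", d.alive)
```

In the model the SIGKILL sent to the D-state task is not lost. It stays pending and takes effect the moment the task is woken by its event, which is why a process stuck on an event that never comes appears immune to kill -9.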
The process must be running in kernel mode, say in a system call made
from a user application. It must have got stuck somewhere and is not
returning. I don't know if kernel code traps sure-kill signals. There is
no way to do that... maybe take the stack trace and check why the
syscall didn't return.
Regards
satya
> Hi,
>
> just a comment. This behaviour seems a little strange. Issuing a kill -9
> shouldn't result
> in the process being signaled. It should just be wiped by the kernel. After
> all SIGKILL is
> not supposed to be "catchable" at all.
It is not catchable by the process, that's true.
> Or am I missing something here?
See line 466 and further in kernel/signal.c (the signal_wake_up
function). If the task is not in the TASK_INTERRUPTIBLE state, it is not
woken up on a signal. So the process never _gets_ the signal.
What about a zombie process? IIRC, wouldn't it show up in the ps list,
but not really be kill-able?
--
Jeff Gentry jes...@hexdump.org gen...@hexdump.org
SEX DRUGS UNIX
Okay, zombies are not real processes anymore anyway, as they're dead, but
a zombie is "only" an entry in the system's process table. Since the
system's process table is a limited resource, zombies can become a
problem once there are too many of them.
Just like in everyday life, yes 8)
You basically may end up with not being able to start anything anymore,
not even shutdown 8-/
Juergen
--
\ Real name : Juergen Heinzl \ no flames /
\ EMail Private : jue...@monocerus.demon.co.uk \ send money instead /
>What about a zombie process? IIRC, wouldn't it show up in the ps list,
>but not really be kill-able?
A zombie is already dead, so it can't be killed. A zombie does show up
in the process list because it occupies a process table slot but no
other resources (no memory, no open files, etc...)
Nick.
--
Pacific Internet SP4 http://www.zeta.org.au/~nick/
>unless of course the task's state is not TASK_INTERRUPTABLE:
>(from signal.c:)
> if (t->state == TASK_INTERRUPTIBLE && signal_pending(t))
> wake_up_process(t);
>Tasks get their state set to TASK_UNINTERRUPTIBLE when they sleep_on().
For tasks in 'D' state which get a SIGKILL, is it reasonable to
wake them up anyway? They're just going to die from the do_exit().
Or will that leave the kernel in an undefined state?
I get vaguely annoyed by having (say) some disk error or a tape driver
error and a process in 'D' state which can't be killed because the driver
is off in laa-laa land. Usually it's not a problem, because the process
isn't going anywhere anytime soon, but if the process happened to hold
a few resources like open FDs or a large quantity of memory, it might
affect other things.
That is exactly it. For what is supposed to be a very short sleep in
the kernel, the kernel device driver can choose to make the sleep
uninterruptible so that it never has to clean up after catching a signal
before allowing the process to proceed to the exit processing. In some
cases the clean-up required can be quite complex, and the problem is
often better handled by a timeout at a lower level in the kernel, which
limits the time a device driver is allowed to sleep. However, having a
process sleep uninterruptibly while a tape device is rewinding is
probably too long.
Another source of uninterruptible sleep is NFS access while the server is
down. This is considered necessary to ensure NFS file system integrity,
and can be bypassed by soft-mounting the file system.
Villy
Thanks, I couldn't have said it better (the opposite would be more true
B-{)
> limits the time a device driver is allowed to sleep. However, having a
> process sleep uninterruptibly while a tape device is rewinding is
> probably too long.
That's when I open the drive door and hope none of the cog wheels gets
damaged B-{)
A process is a zombie from the time it dies until
the parent discovers that the process is dead. If
a process stays in the zombie state for a long
time, it is a problem with the parent. If you
want to get rid of a zombie process, kill the
parent; the zombie is then inherited by init,
which reaps it.
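
The zombie lifetime described here can be watched from Python. This is a rough sketch that assumes Linux with /proc mounted and a platform that has os.fork; proc_state and observe_zombie are names made up for the demonstration.

```python
import os
import time


def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat, or None if gone."""
    try:
        with open("/proc/%d/stat" % pid, errors="replace") as fh:
            return fh.read().rsplit(")", 1)[1].split()[0]
    except (OSError, IndexError):
        return None


def observe_zombie(timeout_s=2.0):
    """Fork a child that exits at once and report its state before and
    after the parent reaps it. Returns None where /proc is missing."""
    if not os.path.isdir("/proc/self"):
        return None
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: die immediately, skipping cleanup handlers
    deadline = time.monotonic() + timeout_s
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)  # give the child a moment to finish exiting
        state = proc_state(pid)
    os.waitpid(pid, 0)  # the parent "discovers" the death and reaps it
    return state, proc_state(pid)


if __name__ == "__main__":
    print(observe_zombie())
```

Between the child's exit and the waitpid() call the child sits in state Z; once the parent has collected the exit status, the process table entry disappears.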
--
Kasper Dupont
| A process catches all signals in kernel mode, the kernel
| code then chooses between a number of actions. It can
| choose to ignore the signal, deliver the signal, freeze
| the process, dump core or just quit by calling do_exit().
| With signal 9 the choice is always to call do_exit().
Then do_exit() fails to complete because the process is blocked or
otherwise cannot complete certain things like closing files or
devices.
Linux, like every other Unix system I know of, has process and
device layers too tightly bound, and too weak of an abstraction
between them, to make them fully autonomous. Given Linux's
strong monolithic design, this isn't likely to be changed.
Perhaps some microkernel design would be able to pull this off.
Still, there are some things the Linux design could do to help
the situation.
One of the typical problems caused by a hung process is that while
it may be holding an open file descriptor to the hung device, it
also holds open descriptors and current directory to other things
as well, such as file systems. If the uncatchable kill signal
were to cause the process to forcibly close every open file that
can be closed, the impact of the hung process can be lessened.
Revoking a file descriptor on a killed process shouldn't result in
that process trying to do I/O on a non-open fd, since the process
is killed and cannot run.
Suppose a process is hung writing a SCSI tape drive because the
cable was loose. That process has a current directory, and an
open file, in a large filesystem. What kill should try to do is
close everything. It would probably fail on the tape drive, but
it should succeed on the open file. It should also revoke the
current directory of the process. I know this can be done because
I have deleted directories underneath processes and found them in
limbo as a result. One way to accomplish this is to change the
current directory of the process to a dummy virtual inode just for
this purpose.
The process may not totally go away because the driver cannot
complete the close, but it will no longer prevent the graceful
unmounting of the filesystem it was backing up. I may still have
to reboot, but perhaps now I can schedule the reboot at a time
when it is more convenient, and get on with other work in the
mean time.
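
To see what a hung process is actually pinning, its current directory and open descriptors can be read from /proc. A small Python sketch, assuming Linux and enough permission to read the target's /proc entry; held_resources is a name invented for this example.

```python
import os


def held_resources(pid):
    """Return the cwd and open descriptors of `pid`, or None if they
    cannot be read (no /proc, process gone, or no permission)."""
    base = "/proc/%d" % pid
    try:
        cwd = os.readlink(base + "/cwd")
        fd_names = os.listdir(base + "/fd")
    except OSError:
        return None
    fds = {}
    for name in fd_names:
        try:
            fds[int(name)] = os.readlink("%s/fd/%s" % (base, name))
        except OSError:
            fds[int(name)] = None  # descriptor was closed in the meantime
    return {"cwd": cwd, "fds": fds}


if __name__ == "__main__":
    info = held_resources(os.getpid())
    if info is None:
        print("no readable /proc entry for this process")
    else:
        print("cwd:", info["cwd"])
        for fd, target in sorted(info["fds"].items()):
            print(fd, "->", target)
```

Run against the PID of the stuck process (as root if it belongs to another user), this shows which filesystems cannot be unmounted while it hangs around.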
Next, device drivers could do a better job. They should be made
to better handle closing of devices even if the device itself does
appear to be hung. If the timeout has expired for the last operation
then the device should be considered to be in an error condition.
A request to close should be honored with any appropriate status
showing that the device did not complete previous operations. A
process doing non-blocking I/O should even be able to get this done
via the close() syscall, with no more delay than it takes to determine
that the device is indeed hung (wait for the timeout, then finish the close).
--
-----------------------------------------------------------------
| Phil Howard - KA9WGN | Dallas | http://linuxhomepage.com/ |
| phil-...@ipal.net | Texas, USA | http://phil.ipal.org/ |
-----------------------------------------------------------------
Maybe we need yet another kill signal (zkill) which additionally changes
the process's parent PID to init (much like the parent being killed, but
without actually killing the parent) and then runs through the kill
again, maybe even briefly waking init so it calls wait again to finish
the status.