It'd be nice if we could rm -rf /proc/17665 or something... I'd
reboot, but I got another important process that is still running
responsibly (and will be for the next 8 hours). Any ideas?
> I need something stronger than kill -9 (maybe kill -9 with a shot of
> jack daniels or something). I started xine (a media player) and it
> stopped... so I did a ps aux and it died the same way, but I was able
> to narrow down its processes and tried a kill -9 on them and it
> doesn't kill them.
If the command "ps aux" died abnormally, then your system is in serious
trouble. Can you run "ls"?
--
CrayzeeWulf
Processes are getting stuck in D state, it sounds like. This is A Bad
Thing (tm). There may be something wrong with your kernel, unless someone
else has some better ideas...
--
Jon Portnoy
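One way to confirm this is to look at the process state: ps shows it in the STAT column, and the kernel exposes it as the third field of /proc/<pid>/stat. Below is a minimal Python sketch, assuming Linux /proc, that scans for processes in D (uninterruptible sleep) without spawning ps. The function names are made up for illustration. As later posts in this thread show, reads under /proc for a wedged process may themselves hang on some kernels, so run it from a shell you can afford to lose.

```python
import os

def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    lparen = text.index("(")
    rparen = text.rindex(")")     # comm may itself contain ')' or spaces
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state

def uninterruptible_processes(proc="/proc"):
    """Return [(pid, comm), ...] for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue              # skip /proc/meminfo, /proc/self, ...
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue              # process exited while we were scanning
        if state == "D":
            stuck.append((pid, comm))
    return stuck

if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Splitting on the last ')' rather than on whitespace matters because a command name such as "a) b" would otherwise shift every later field.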
What he probably means is that it locked up. I have found that this
happens during I/O on a device that fails to fail gracefully. I
have had this kind of problem (and question) come up before, and no one
has yet come up with a satisfactory reply.
So, I can't help either, can anyone?
Simon.
'Be Seeing You.
Who is number one?
When experimenting, it is probably a good idea to have SysRq enabled.
Then you can at least sync the drives, remount them read-only, and reboot
if the system is not totally hung. I used that extensively when testing a
partially working USB CD-RW drive, when SCSI would race in an unkillable
state.
--
David Efflandt - All spam ignored http://www.de-srv.com/
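Enabling it ahead of time is a one-line setting: writing 1 to /proc/sys/kernel/sysrq (or setting kernel.sysrq = 1 with sysctl) turns on the Alt+SysRq key combinations. On current kernels, root can also issue the same commands by writing a single character to /proc/sysrq-trigger, which helps when only a remote shell is still responsive. Below is a minimal Python sketch of the sync, remount read-only, reboot sequence described above. The function names and the two-second pause are assumptions, not anything taken from the kernel documentation.

```python
import time

SYSRQ_TRIGGER = "/proc/sysrq-trigger"   # Linux procfs interface, root only

# Magic SysRq command keys: s = emergency sync, u = remount all
# filesystems read-only, b = reboot immediately (no further sync/unmount).
SYNC_REMOUNT_REBOOT = ("s", "u", "b")

def send_sysrq(key, trigger=SYSRQ_TRIGGER):
    """Write one SysRq command character to the trigger file."""
    with open(trigger, "w") as f:
        f.write(key)

def sync_remount_reboot(pause=2.0, send=send_sysrq):
    """Send s, u, b in order, pausing so the sync and remount can finish.

    The pause is a guess: the kernel queues the sync and remount and
    reports completion only in the kernel log, not through this file.
    """
    for key in SYNC_REMOUNT_REBOOT:
        send(key)
        if key != "b":
            time.sleep(pause)
```

Calling sync_remount_reboot() as root reboots the machine at once after the final "b", so the send parameter is there to let you dry-run the sequence (for example, send=print) before trusting it.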
]I need something stronger than kill -9 (maybe kill -9 with a shot of
]jack daniels or something). I started xine (a media player) and it
]stopped... so I did a ps aux and it died the same way, but I was able
]to narrow down its processes and tried a kill -9 on them and it
]doesn't kill them.
The referents for all your ITs are obscure.
If the program has turned off all interrupts or is in a disk read, you
cannot kill it. Just leave it be. It is probably not doing anything
anyway.
]It'd be nice if we could rm -rf /proc/17665 or something... I'd
]reboot, but I got another important process that is still running
]responsibly (and will be for the next 8 hours). Any ideas?
Wait 8 hours and reboot.
]Jeff
]b...@baux.dyndns.org
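In this situation kill -9 does not fail: the SIGKILL is queued, but a process in uninterruptible sleep only acts on it once the kernel call it is blocked in returns. Below is a minimal Python sketch, assuming Linux /proc, that sends SIGKILL and then watches the state letter in /proc/<pid>/stat. This lets you tell apart "dead", "dead but not yet reaped by its parent" (Z), and "still stuck in D". The names kill9_and_watch and proc_state are hypothetical helpers, not existing tools.

```python
import os
import signal
import time

def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if it is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except OSError:
        return None
    # comm is in parentheses and may contain spaces; the state letter is
    # the first field after the last ')'.
    return data[data.rindex(")") + 1:].split()[0]

def kill9_and_watch(pid, timeout=10.0, interval=0.5):
    """Send SIGKILL, then report what became of the process.

    Returns "gone" if /proc/<pid> disappears, "zombie" if it died but its
    parent has not reaped it yet, or the last state letter seen (typically
    "D") if it is still alive when the timeout expires.
    """
    try:
        os.kill(pid, signal.SIGKILL)  # succeeds even if the target is in D state
    except ProcessLookupError:
        return "gone"
    deadline = time.monotonic() + timeout
    while True:
        state = proc_state(pid)
        if state is None or state == "X":
            return "gone"
        if state == "Z":
            return "zombie"
        if time.monotonic() >= deadline:
            return state  # still alive: the signal stays pending until the blocked call returns
        time.sleep(interval)
```

A result of "D" after the timeout is the situation described here. Per the advice above, the practical options are then to wait for the device to give up, or to reboot.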
> Processes are getting stuck in D state, it sounds like. This is A Bad
> Thing (tm). There may be something wrong with your kernel, unless someone
> else has some better ideas...
That doesn't mean the kernel is bad. It usually happens when there
is a hardware problem with a peripheral. You can also get D state
with long timeouts or slow devices (tape).
Vilmos
That is probably the smartest thing to do. If you couldn't get rid of it, the
process is most likely not doing anything except hogging a little memory
(which will end up in swap if you need the memory for anything).
Leave the thing alone, when your important process is done you could
experiment or perhaps reboot.
If that job is important enough not to reboot, then it's important enough not
to try anything like "kill -9 with a shot of JD" (killers that have been
drinking are known to often take out more than the intended target).
I stand corrected :-)
--
Jon Portnoy
Yeah, this is basically what happened. Actually, it happened several
times that day, which is what made me start to wonder if there was a
strong kill. It was related to my DVD decoder card or XFree86 (or
both), which is barely considered stable in Linux. I tried an
rmmod, but the module was in use... Maybe what I really needed was an
rmmod --force.
In reply to whoever asked whether I could run ls: yes, I could, except
that if I did an ls of certain /proc/<pid> directories (e.g. /proc/12345),
it would hang along with the rest of the processes (as did top and ps aux).
> Yeah, this is basically what happened. Actually, it happened several
> times that day, which is what made me start to wonder if there was a
> strong kill. It was related to my dvd decoder card or xfree86 (or
Actually there is a stronger kill. It is called the power button. :-)
> In reply to whoever asked whether I could run ls: yes, I could, except
> that if I did an ls of certain /proc/<pid> directories (e.g. /proc/12345),
> it would hang along with the rest of the processes (as did top and ps aux).
It is odd. I have never seen "ls /proc/xx" or "ps aux" hang.
It may mean that your /proc filesystem was hosed.
Vilmos
VS> It is odd. I have never seen that "ls /proc/xx" or "ps aux" hang.
VS> It means that your /proc directory was possibly hosed.
Or the hard drive is hanging. If it is a SCSI system, the SCSI bus can
hang (because some SCSI device has gone south or somehow the bus
terminator 'fell off'). This has happened to my system (pure SCSI
system) when I managed to bump the power connector out of my (external
SCSI) ZIP drive...
Note: in order to run ls or ps, the /bin/ls or /bin/ps files (or various
shared libraries in /lib or /usr/lib) have to be read off the disk.
If the disk is hanging (broken disk, wedged bus (IDE or SCSI), etc.),
then these commands will hang. The /proc file system might be fine, but
that does you no good if you cannot run the tools to look at it.
I'm afraid that if this is an I/O error (most likely), then I have not
found any way to rmmod the module, kill the process, or sometimes even
reset the IDE bus on which my CD/DVD drive has failed. Pressing the eject
button might not work if the drive is locked.
This problem, I'm sorry to say, is the only major drawback to Linux.
Apart from that I have found it to be a most stable system.
If you have other apps that need to complete, then I suggest that you
let them complete and then power-cycle the system. It's the only way
of recovering from this situation that I have found.
Shame the process can't time out when in such a state... locked D processes
should be *eventually* recoverable. No hardware wait should require more
than half an hour.
:)
Here's one contrary data point. An "ERASE" command to a DDS-2 tape
drive takes roughly 3 hours to complete. That's with the "long" bit
set, which is what is compiled into the kernel's "st" driver.
--
Bob Nichols rnic...@interaccess.com
That's the entire action though isn't it? The process isn't in a "D" state
all that time. It has to be actively erasing the tape for most of it.
I don't think the process is actually "actively erasing the
tape". It is my understanding that the application simply
sends an "erase" command of some sort to the tape and waits
for it to complete, with the hardware doing the work.
I'm not the one who made the comment about erasing a DDS-2
tape, but I can say this: on a Travan tape, a seek done
with 'mt fsf 1' remains in state "D" until the seek
completes. (The application is doing nothing while the
hardware does the seek.) With ~8GB in a single tar archive,
that could take _hours_ to get there on my Travan drive.
Some hardware does indeed require more than a half hour wait
in "D" state.
J-P Stewart
Of course. But it still would be nice to be able to easily remove
something from the proc table when a human being knows full well that
the hardware is never going to provide whatever magic bits the driver is
expecting.
In a better world, every driver would have an ioctl that would spit out
some standardized info about what it is up to and a way to tell it to
give up and return to initial state 'cause I know better than it does.
Of course this takes more work than just that - it would also have to
give back some false info to whatever processes were waiting for
whatever it is supposed to provide, but that could be a list of
possibilities too.
Imagine this in rather fanciful pseudo code:
Hey, tape driver?
- What? I'm waiting for register such and such to tell me the tape is
ready.
Give it up, it's not going to happen.
- Well, fine, but I've got pid 134567 waiting for data. I could return
an error status, but if you don't want that, you need to give me a 512
byte block of something to work with.
Just the error, thanks. Return "Not ready" from now until I tell you
different, OK? Don't even look at the hardware, it's whacked.
- Sure thing.
--
Tony Lawrence
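The imagined exchange can be modeled in a few lines. What follows is a toy Python sketch that assumes a userspace stand-in for a driver. Callers block in wait_ready() the way a process sits in D state. A hypothetical give_up() call wakes every waiter with a "Not ready" error and keeps failing new callers "from now until I tell you different", which here means until reset(). Nothing in it corresponds to a real kernel interface. The class and method names are invented for illustration.

```python
import threading

class NotReady(Exception):
    """Error handed back to waiters once the device has been written off."""

class HypotheticalTapeDriver:
    """Toy model of a driver whose waits can be cancelled from outside."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False         # stands in for "register such and such"
        self._written_off = False   # set by the administrator's give_up()

    def hardware_ready(self):
        """Called when the (simulated) ready register flips."""
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def wait_ready(self):
        """Block like a process in D state, but wake up if told to give up."""
        with self._cond:
            while not (self._ready or self._written_off):
                self._cond.wait()
            if self._written_off:
                raise NotReady("device written off by administrator")

    def give_up(self):
        """'Give it up': fail every current and future waiter with NotReady."""
        with self._cond:
            self._written_off = True
            self._cond.notify_all()

    def reset(self):
        """'Until I tell you different': return to the initial state."""
        with self._cond:
            self._written_off = False
            self._ready = False
```

The design choice is the one the dialogue asks for: the waiter gets an error instead of a fake 512-byte block, and the driver stops consulting the hardware until someone explicitly resets it.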
As I understand it, to erase a SCSI tape, you send a command to the
tape drive and wait for it to complete. There's nothing "active" to
do. I've replaced my SCSI tape backup systems with CD-Rs, so I can't
check, but I wouldn't be surprised if the process *was* in D state the
entire time the tape is being erased.
Ed
If I run an "mt erase" command on my DDS-2 tape drive, the "mt" process
remains in the "D" state for approximately 3 hours.
--
Bob Nichols rnic...@interaccess.com
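To reproduce a measurement like this, it is enough to poll the process's state letter and note when it leaves D. Below is a minimal Python sketch. The argument read_state is any callable that returns the current one-letter state (for example, by parsing the third field of /proc/<pid>/stat), or None once the process exits. The function name is hypothetical, and the clock and sleep parameters exist only so it can be tested without waiting hours.

```python
import time

def time_in_state(read_state, state="D", interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll read_state() until the process leaves `state`.

    Returns the seconds elapsed since the first sample, accurate to about
    one `interval`. read_state() should return the current one-letter
    state, or None once the process has exited.
    """
    start = clock()
    while read_state() == state:
        sleep(interval)
    return clock() - start
```

Start it right after launching the command (for example, "mt erase"). For waits measured in hours, sampling once a second or even once a minute is plenty.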