We are using T24, and from time to time we find zombie processes on
the server after a user has been accidentally disconnected. Some time
ago there was a discussion about corrupted files, and Jim Idle said
that one of the main causes of corrupted jBase files is administrators
using the "kill -9" command. I use the LOGOFF <port number> and
LOGOUT -<PID> commands first, and only after that do I use "kill" or
"kill -9" to eliminate zombies.
Is that correct? What would be the best practice for killing zombies
in our situation?
HP-UX 11.23i (Itanium)/jBase 4.1.5.11
Thanks in advance
Andrew
From: jB...@googlegroups.com [mailto:jB...@googlegroups.com] On Behalf Of Gary Calvin
Sent: Wednesday, April 11, 2007 2:26 AM
> In addition to the LOGOFF and LOGOUT, you should also try kill with -1, -2, -3 and -6, I believe (I'm sure I'll be corrected shortly). More importantly, you should try to diagnose the root cause of the zombies -- on a terminal disconnect, you should get the equivalent of a 'kill -1' which should close your app, albeit not very gracefully. You may find you have some record locking issues that are leading your users to shut down their terminal emulator out of frustration.
In general, if kill -1 does not work, then the other signals will not work either, as the process is likely hung or stuck in a kernel-bound loop.
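For anyone who wants to script the signal order discussed here (kill -1, -2, -3, -6, and only then -9), the following is a minimal Python sketch, not a tested jBASE tool. The names `escalate_kill`, `is_alive`, `grace_seconds` and `poll_interval` are made up for illustration, and the LOGOFF and LOGOUT steps are assumed to have been tried by hand first. Given the point above, expect the middle signals to rarely help once kill -1 has failed; the script just makes sure they are attempted before kill -9.

```python
import os
import signal
import time

# Order taken from the thread: kill -1, -2, -3, -6, and only then -9.
ESCALATION = [
    signal.SIGHUP,   # 1
    signal.SIGINT,   # 2
    signal.SIGQUIT,  # 3
    signal.SIGABRT,  # 6
    signal.SIGKILL,  # 9
]


def is_alive(pid):
    """Return True if the PID still has an entry in the process table.

    Note: a process that has exited but has not been reaped by its
    parent still has an entry, so it counts as alive here.
    """
    try:
        os.kill(pid, 0)  # signal 0 checks existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def escalate_kill(pid, grace_seconds=5.0, poll_interval=0.1, alive=is_alive):
    """Send each signal in turn, waiting grace_seconds after each one.

    Returns the last signal sent before the process disappeared, or
    None if the PID did not exist to begin with. Raises TimeoutError
    if the process is still there after SIGKILL.
    """
    last_sent = None
    for sig in ESCALATION:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return last_sent  # gone before this signal could be sent
        last_sent = sig
        deadline = time.monotonic() + grace_seconds
        while time.monotonic() < deadline:
            if not alive(pid):
                return sig
            time.sleep(poll_interval)
    raise TimeoutError(f"process {pid} survived SIGKILL")
```

Calling `escalate_kill(12345)` (the PID is a placeholder) returns the signal that finally worked, which is worth logging: if the answer is regularly SIGKILL, that is the evidence to take to the vendor.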
However, if you are getting these a lot, then you need to report this to TEMENOS and have them help you diagnose the fault. In most cases we have found this to be a UNIX kernel issue, whereby the dropping of the connection by the user does not correctly drop the process, but leaves it in either an infinite kernel loop (which only kill -9 will terminate) or awaiting some kernel lock (with the same consequences). Sometimes this can be a result of your network configuration not correctly dropping the connection, or the network configuration of the NIC(s) in your system. But basically, at some point, the kernel should tell the process that the connection was dropped and this should wrap it up.
Another possibility, of course, is that the server-side connection software is not written correctly and does not properly detect a dropped connection. The socket timeout parameters and so on are important here. If this happens with the TEMENOS connection server but not with telnet, then you may point the finger there.
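To illustrate what detecting a dropped connection can look like on the server side, here is a small Python sketch that turns on TCP keepalive for a socket. It shows the general mechanism only: nothing in this thread says how the TEMENOS connection server or telnetd configure their sockets, and the function name `enable_keepalive` and its default values are assumptions chosen for the example. The per-socket tuning options are platform-specific, so the sketch sets them only where the Python `socket` module exposes them; elsewhere the system-wide defaults apply.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Ask the kernel to probe an idle TCP connection.

    With keepalive on, a peer that vanished without closing the
    connection is eventually reported as an error on the socket,
    instead of leaving the reader blocked forever.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Optional tuning knobs; not every platform provides them, so
    # each one is applied only if the constant exists here.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before probing
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is switched on for this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

If the server software sets nothing like this and relies on defaults that take hours to fire, a process can sit blocked on a dead connection for a long time, which matches the symptom described in the question.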
So the order as posted was correct, the premise being that once you have tried everything else, kill -9 will likely be a safe option in terms of database corruption. I have yet to see a process hung halfway through completing a database write. A process usually hangs waiting to read from some input device that is no longer there, which for this kind of application is usually the socket serving the ‘tty’. This means you are virtually guaranteed not to be in the middle of a database write when only kill -9 will deal with the process.
> Oh, and everybody knows that the only way to kill zombies is to shoot them in the head, preferably with a shotgun, or decapitate them. :-P
I prefer the hidden Samurai sword outside the window of the Café, or placing clown masks on them.
Jim