SYBASE GHOST PROCESSES/HUNG PROCESSES

owen_philip_epstein

Jan 8, 1995, 3:26:19 PM

What is the preferred method for eliminating GHOST PROCESSES from SYBASE
4.9.2? We are having GHOST processes creep into the machine on a
fairly consistent basis and would like a resolution if anyone has one. We are running
a stored proc that checks for hung processes and kills them at 5-10 second intervals.

This is not the preferred method, however, because the burden is on our DBAs to deal
with it, and I believe SYBASE should have something in their software engine that handles
ghosts.

Bret Halford

Jan 8, 1995, 5:32:37 PM

What exactly do you mean by a ghost process?

When you say that ghost processes creep into the machine, do you mean that
processes are spontaneously appearing in sysprocesses?

What is your ebf level?

What platform is the Server on?

Do you have clients on PC computers?

Do the PC-based users quit out of their programs, or just control-c or power-off?

And finally, why isn't the reply-to field on your post valid?

>Path: sybase!halon!uunet!cs.utexas.edu!howland.reston.ans.net!news.sprintlink.net!news.interserv.net!usenet
>From: OWEN PHILIP EPSTEIN
>Newsgroups: comp.databases.sybase
>Subject: SYBASE GHOST PROCESSES/HUNG PROCESSES
>Date: 8 Jan 1995 20:26:19 GMT

--
---------------------------------------------------------------------
| Bret Halford br...@sybase.com ___|
| Sybase Technical Support __|
| 6475 Christie Avenue |__
| Emeryville, CA 94608 USA |___
| fax: (510)-922-3911 exec sp_realitycheck() |
#####################################################################

Lianne Hargreaves

Jan 8, 1995, 7:56:23 PM

On 8 Jan 1995 20:26:19 GMT (OWEN PHILIP EPSTEIN) wrote:

>SYBASE GHOST PROCESSES/HUNG PROCESSES
>
>What is the preferred method for eliminating GHOST PROCESSES
>from SYBASE 4.9.2? We are having GHOST processes creep into
>the machine on a fairly consistent basis and would like a
>resolution if anyone has one. We are running a stored proc
>that checks for hung processes and kills them at 5-10 second
>intervals.

We suffer from these too. What stored proc do you use to kill
them? I only seem to be able to kill active processes but the
ghosts are generally sitting idle and won't go away for HOURS.

Thx
Lianne

michael.jones

Jan 9, 1995, 1:19:11 PM

From lar...@sled.gsfc.nasa.gov Mon Jan 9 12:45:52 EST 1995

|
|In article <3ephpb$s...@data.interserv.net>, OWEN PHILIP EPSTEIN writes:

|I'm not sure what you mean by ghost processes. If you are talking about
|zombie processes -- like when a client aborts its connection to the
|server -- then the following information from the May 1994 Sybase
|Technical Newsletter might give you some insight.
|
| Hope this helps
| Teresa Larson
|
|+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
|| Teresa A. Larson - Hughes STX Corporation |
|| NASA/GSFC Code 933.0 voice: (301) 286-7867 |
|| Greenbelt, Maryland 20771 fax: (301) 286-1777 |
|| Teresa...@gsfc.nasa.gov |
|+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~+
| Standard disclaimer ...
|
| Hanging, Sleeping, and the "Zombie" Process
| -------------------------------------------
|
| What are the different states of SLEEP? When all processes are shown as
| SLEEPING by sp_who except the one which issued the sp_who command, how
| can a user tell the difference between hanging versus running
| processes? What is a "zombie" process and how can it be dealt with?

[...deleted...]

We had the same problem with sleeping processes. If you are
running on an NCR platform, ask for ebf #2828, which includes
the fix. (I've heard there is a similar one for Solaris but I
don't know the number.)

Also, the server is supposed to drop connections after a
specified period of time as of the NCR 4.9.1 release.
As it turns out this won't work on NCR platforms without a
kernel recompile. (Maybe others out there can confirm
other platforms - check your errorlog for dropped connections.)

Add a line to your /etc/conf/cf.d/stune: TCP_KATIMER 300.
Recompile the kernel and reboot (make this change only to the
server machine). The 300 means 300 seconds before the server
drops a connection. This can be set from 30-7200.

Hope this helps.

Mike J.

---------------------------------------------------
mjj...@cbews.att.com
AT&T - GCIS - WINGS System Administrator

Ken Prince

Jan 12, 1995, 8:15:58 AM

In message <3eq1jn$r...@pipe4.pipeline.com> - l...@pipeline.com (Lianne Hargreaves) writes:

Here is an excerpt from a Sybase Technical Newsletter:

Hanging, Sleeping, and the "Zombie" Process
-------------------------------------------

What are the different states of SLEEP? When all processes are shown as
SLEEPING by sp_who except the one which issued the sp_who command, how
can a user tell the difference between hanging versus running
processes? What is a "zombie" process and how can it be dealt with?

Definitions

In pre-4.9.2 SQL Servers, the output of sp_who could be difficult to
interpret. Processes showed only one type of SLEEP status, "sleeping".
In System 10, and 4.9.2 Rollup 2115 and above, sp_who shows four types
of sleep along with the other possible statuses:

Value         Meaning
-----         -------
infected      The server erred with a stack trace, and the process
              got an error that would normally kill it. The process
              is infected instead of killed.

background    This process is in the background.

recv sleep    The process is waiting for input from the client.

send sleep    The process is waiting for a write to the client to
              complete.

alarm sleep   The process is waiting for an alarm (usually means the
              process is waiting for a waitfor command to complete).

lock sleep    The process is waiting for a lock to be granted.

sleeping      The process is waiting for something not listed above.
              This is "normal" sleep.

runnable      The process is not waiting for anything, and is ready
              to run, but is not the currently running process.

running       The process is currently running (in a multiprocessing
              system, there can be more than one such process).

stopped       The process is stopped. In ancient history (before
              version 4.0.1), all processes stopped during a
              checkpoint. Now the only time a process is in the
              stopped state is when someone is using the kill command
              on it.

bad status    There is a bad value for the status of this process.
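
When there are many connections, it can help to group the spids by status before deciding which ones to look at. The following is only a rough Python sketch: the status strings come from the table above, but the function name, the (spid, status) row format and the "unrecognised" bucket are assumptions made for illustration, and nothing here queries the server.

```python
from collections import defaultdict

# Status values copied from the sp_who table above. Everything else
# (names, row format, the "unrecognised" bucket) is hypothetical.
KNOWN_STATUSES = {
    "infected", "background", "recv sleep", "send sleep", "alarm sleep",
    "lock sleep", "sleeping", "runnable", "running", "stopped", "bad status",
}


def group_by_status(rows):
    """Group spids by status.

    rows is an iterable of (spid, status) pairs, for example copied by
    hand from sp_who output. Padding and letter case are normalised.
    """
    groups = defaultdict(list)
    for spid, status in rows:
        key = " ".join(status.lower().split())
        if key not in KNOWN_STATUSES:
            key = "unrecognised"
        groups[key].append(spid)
    return dict(groups)


if __name__ == "__main__":
    sample = [(1, "recv sleep"), (2, "sleeping "), (3, "RECV  SLEEP"), (4, "running")]
    for status, spids in group_by_status(sample).items():
        print(status, spids)
```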

In uniprocessor hardware there can be only one process RUNNING and all
other processes are either SLEEPING or RUNNABLE. The next RUNNABLE
process gets scheduled to run after sp_who finishes. Processes sleep
for certain events like disk I/O, network I/O, alarm, etc. If all the
threads are shown as SLEEPING, at least one of them will become
RUNNABLE after an event on which the thread is waiting.

On a multi-processor machine, if more than one SQL Server engine is
started, you can see more than one thread in the RUNNING state. The
number of processes running can not exceed the number of SQL engines
running.

It is not possible to find out from sp_who output which client process
is hung waiting for Server response. But it is possible to find out if
any process (i.e. thread) is blocked by another by looking at the "blk"
field of sp_who. For more details please refer to the Commands
Reference Manual.
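
One way to read the "blk" field is to follow it from each blocked spid back to the process at the head of the chain. Here is a minimal Python sketch, assuming the sp_who rows have already been copied into a dict mapping spid to blk, with 0 meaning "not blocked". The function names are made up for illustration and the code does not talk to the server.

```python
def root_blocker(spid, blk_of):
    """Follow the blk column from spid to the process that heads the chain.

    blk_of maps spid -> blk (0 when the process is not blocked).
    Returns the spid itself if it is not blocked, or None if the chain
    loops back on itself and has no single head.
    """
    seen = set()
    while blk_of.get(spid, 0) != 0:
        if spid in seen:
            return None
        seen.add(spid)
        spid = blk_of[spid]
    return spid


def group_by_root(blk_of):
    """Map each head-of-chain spid to the sorted spids waiting behind it."""
    groups = {}
    for spid, blk in blk_of.items():
        if blk == 0:
            continue
        groups.setdefault(root_blocker(spid, blk_of), []).append(spid)
    return {root: sorted(spids) for root, spids in groups.items()}


if __name__ == "__main__":
    # Hypothetical data: 5 waits on 7, 7 waits on 9, 9 is not blocked.
    print(group_by_root({1: 0, 5: 7, 7: 9, 9: 0}))
```

The spid at the head of a chain is the one worth checking first, since clearing it may release everything queued behind it.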

Before System 10 -- Night of the Zombie Process

Pre-System 10 SQL Servers can end up with "zombie" (unkillable hanging)
processes if the event on which a thread is sleeping never happens. In
this case, the thread does not run and cannot be killed. This anomaly
existed right from the first release of 4.0 SQL Server until a recent
Rollup of 4.9.2 (2115 and above).

The problem is that the SQL Server scheduler is non-preemptive. This
means that tasks cannot be put to sleep or woken up arbitrarily by the
SQL Server scheduler; all task context switching is done voluntarily by
running tasks.

Pre-System 10 SQL Servers handle attention through a signal handler
set up to catch OUT-OF-BAND data which sets a status bit in the PSS
(Process Status Structure). This is an asynchronous event. For example:
a task is about to go to sleep waiting for input, but the client
cancels the query with dbcancel(). If the signal handler sets the bit
between the time the task is going to sleep and the time it is actually
put to sleep, then the server task sleeps forever waiting for the
client to send some data, and the client sleeps waiting for the server
to acknowledge the cancel request. This is the well-known "dbcancel
problem." Another source of trouble can be a DBMS task in the Server
which is sleeping on a network I/O from a client that just isn't there
any more (maybe because somebody rebooted the PC on which the front end
was running).

This kind of task cannot be killed because:

  o The task must be in RUNNABLE state so that the scheduler can
    kill the task the next time it runs.
  o The task cannot be preempted because its state is unknown.

To complicate the above scenario, if the eternally-sleeping task
started a transaction, it may potentially hold locks on different
pages. The only solution for older versions is to reboot the SQL
Server.
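
The "dbcancel problem" described above is a lost-wakeup race, and a toy model makes the timing window easier to see. The Python sketch below is not server code: the Task class, the state names and the assumption that the task looks at the attention bit just before it decides to sleep are all illustrative.

```python
class Task:
    """Toy stand-in for a server task; the fields are illustrative only."""

    def __init__(self):
        self.attention = False   # stands in for the status bit in the PSS
        self.state = "running"


def wait_for_client_input(task, cancel_arrives_in_window):
    """Walk one task through 'decide to sleep' and then 'actually sleep'."""
    # Step 1: the task sees no pending attention and decides to sleep.
    if task.attention:
        task.state = "handling cancel"
        return task.state

    # The window: the asynchronous signal handler may fire here. It only
    # sets the bit; it does not wake or reschedule anything.
    if cancel_arrives_in_window:
        task.attention = True

    # Step 2: the task is put to sleep. With a non-preemptive scheduler,
    # nothing runs it again until client input arrives.
    task.state = "sleeping"
    return task.state


def is_stuck(task):
    """True when the task sleeps with an attention bit nobody will service."""
    return task.state == "sleeping" and task.attention


if __name__ == "__main__":
    t = Task()
    wait_for_client_input(t, cancel_arrives_in_window=True)
    print(t.state, is_stuck(t))
```

In the stuck case the client is waiting for the cancel to be acknowledged, so the input that would wake the task never comes.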

A Wooden Stake for the Zombie Process

As of the 10.0 SQL Server, and 4.9.2 SQL Server Rollup 2115 and above,
zombie processes can now be killed. The new kill command not only sets
the bit in the PSS as it used to, but also wakes up the task if it
determines that the task is sleeping in one of four states:

  o waiting to receive something from the client, a common state
    (RECV SLEEP)
  o waiting for a send to be completed by the network service task
    (SEND SLEEP)
  o waiting on an alarm because user did a waitfor delay command
    (ALARM SLEEP)
  o waiting on a lock (resource, logical, semaphore, etc.) (LOCK SLEEP)

This means that any task can be cleaned up properly as if an exception
has occurred while the task was running, provided the task is in one of
the RECV, SEND, LOCK and ALARM sleep states. The new kill command can
kill infected processes as well, also a new feature.

The kill command can almost instantaneously kill a task that is
sleeping on any one of the events except the fifth state: normal sleep
(where the task is waiting for a resource to post network or disk I/O).
This was true for older versions of SQL Server, and is still true. The
reason for this is that all sleeps except "normal sleep" have
well-known and well-understood backout paths; however, tasks sleeping
on resources have a variety of different backout paths.

The new kill command will:

  o set the "kill yourself" bit on the task
  o wake up the task normally
  o put the task into the runnable queue

When the scheduler is ready to run the task it finds the "kill
yourself" bit and aborts the task. For tasks that are in normal sleep
or for running tasks, the new kill command acts exactly as the old kill
command: it sets the kill bit in the PSS and it is up to the task to
wake up and test the bit and decide to kill itself. Note that this
means that the new kill command may not have an effect on all tasks.
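
The behaviour of the new kill command described above can be summarised in a few lines. This is a rough Python model, not the server's implementation: the Task class, the run queue and the "aborted" label are assumptions chosen for illustration. Only the four wakeable sleep states and the order of the steps come from the text.

```python
from collections import deque

# The four sleep states the new kill command can wake a task from.
WAKEABLE = {"recv sleep", "send sleep", "alarm sleep", "lock sleep"}


class Task:
    """Toy task record; field names are hypothetical."""

    def __init__(self, spid, status):
        self.spid = spid
        self.status = status
        self.kill_bit = False    # the "kill yourself" bit in the PSS


def new_kill(task, run_queue):
    """Set the kill bit and wake the task if it is in a wakeable sleep.

    Returns True if the task was queued to run, False if only the bit
    was set (normal sleep or running: same effect as the old kill).
    """
    task.kill_bit = True
    if task.status in WAKEABLE:
        task.status = "runnable"
        run_queue.append(task)
        return True
    return False


def schedule_next(run_queue):
    """Run the next task; a task carrying the kill bit is aborted instead."""
    task = run_queue.popleft()
    task.status = "aborted" if task.kill_bit else "running"
    return task


if __name__ == "__main__":
    queue = deque()
    zombie = Task(7, "recv sleep")
    new_kill(zombie, queue)
    print(schedule_next(queue).status)
```

A task in normal "sleeping" status is never queued by this model, which mirrors the caveat that kill may have no immediate effect on it.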

NOTE! If a killed task is in the midst of a transaction, the entire
transaction will abort. All resource cleanup will occur in the task
backout path so that no inconsistent resources are left hanging around
that might cause the SQL Server to hang in a hibernating state and
eventually have to be rebooted.

There were regressions, caused by the new kill command's original
implementation, which could cause the server to hang (bug 51270) or not
completely kill the process under certain conditions (bug 48964). These
bugs were fixed as of Rollup 2359, which can be ordered from Tech
Support.