I have a question regarding "dead" inactive, orphan sessions.
Sometimes, there's a situation in which a client application (a
webserver on a remote machine) dies horribly and leaves an orphan
session (dedicated server mode) in the database. The user process at
the webserver no longer exists, but the shadow process, the database
session, remains inactive, holding all the resources and locks - which
is a bit of a problem as it usually blocks other transactions. In such
cases, we have to manually identify these sessions and kill them off
ourselves. The PMON then takes over and releases the resources and
locks, as described in 1020720.102 and a few other notes, and leaves
the sessions in the Killed state.
Now, I understand that this is correct behaviour, but I'm wondering if
there's a way or a feature that would look for such cases
automatically. The only sort of applicable thing I found is the
SQLNET.EXPIRE_TIME=xxx parameter but it doesn't really work for us in
this case.
The databases are Oracle 10.2.0.4 (on 64bit Win2k3 machines), the
clients are your typical IIS's using ODP.NET 11.2.0.2, using their own
connection pooling and connecting in dedicated server mode.
Thanks,
David
David, the problem is usually in the processes holding these connections.
If, say, Apache spawns a httpd process which is permanently connected to
the database, it will not get killed because the process will not be a
dead connection. The best solution that I've come up with is limiting the
idle time of the process. The same applies to weblogic and tomcat. If a
process acquires more than 1 hour of idle time, it shouldn't be running,
period.
Alternatively, the applications should not enter long waits while in
transaction. Transactions were initially modeled after the banking
business, which means that they are expected to be discrete, independent
and, above all other, short. The transaction should begin when the luser
pushes the "submit" button and end before the form is refreshed. Waiting
for input, while in transaction, is a bad coding practice.
As for the platform, I have fairly decent experience with 64 bit
Windows7, my impression is that it is much more stable than now fairly
old Win2k3. I cannot substantiate it with numbers, it's just my
impression.
Mladen,
the thing is, there are no user processes (httpd's in your comparison)
present at the time => you can completely reboot the box on which this
IIS is located and it has no effect on the database sessions
whatsoever. They are still there, orphaned, holding their resources.
Which is perfectly fine, in my understanding of how Oracle handles it
- the client is responsible for closing them. I'm just looking for a
way of getting rid of them or at least identifying them automatically.
It's something that very very rarely happens and I'm trying to find a
clean solution on the side of the database.
Limiting the idle time of the process is, I believe, for a completely
different discussion. I'm looking for something different and it's
hopefully clearer now.
Thank you for the input, I appreciate it.
David
> hey are still there, orphaned, holding their resources. Which is
> perfectly fine, in my understanding of how Oracle handles it - the
> client is responsible for closing them. I'm just looking for a way of
> getting rid of them or at least identifying them automatically. It's
> something that very very rarely happens and I'm trying to find a clean
> solution on the side of the database
The following SQLNET.ORA parameters may help you:
SQLNET.EXPIRE_TIME
SQLNET.SEND_TIMEOUT
SQLNET.SEND_TIMEOUT
They work well on Linux. Not sure about W2k3, though.
Thanks again
I seem to recall this can happen if the oracle user process has
spawned a child, the child dies or gets hung up in a wait (or odd
things that put errors in the alert log), and the parent waits forever
for the child to respond, ignoring the death signal from the client.
Or something like that.
Be careful if someone tells you to use code like this (X and z are
obfuscation):
SYS@XXXX> SELECT spid FROM v$process WHERE NOT EXISTS
( SELECT 1 FROM v$session WHERE paddr = addr); 2
SPID
------------
2029
2031
SYS@XXXX> !ps -ef|grep 2029
oracle 2029 1 0 Jun 5 ? 0:00 ora_d000_XXXX
oracle 20039 19987 0 13:08:13 pts/0 0:00 grep 2029
SYS@XXXX> !ps -ef|grep 2031
oracle 2031 1 0 Jun 5 ? 0:01 ora_s000_XXXX
zzzzzzz 12031 1 0 06:35:15 ? 0:20 oracleXXXX
(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle 20070 19987 1 13:08:30 pts/0 0:00 grep 2031
Note they've been handed off to init, but you can't tell if they are
orphans or legitimate (well, you could have excluded background
processes by username), and of course it picked up an extra process.
jg
--
@home.com is bogus.
http://www.freelists.org/post/oracle-l/IBM-Consulting-info,9
There's a known issue with Windows and DCD. Resolution depends on your
version of Windows and is too gruesome to spell out here. Google
"sqlnet.expire_time +keepalive" for the details.
-- Phil
> There's a known issue with Windows and DCD. Resolution depends on your
> version of Windows and is too gruesome to spell out here. Google
> "sqlnet.expire_time +keepalive" for the details.
I knew that my suggestion to upgrade to W7 would eventually prove useful.
You can specify directives to to terminate idle sessions belonging to
any specific user. You specify this by setting KILL_SESSION as the
switch group using the resource plan directive MAX_IDLE_TIME: Maximum
idle time for the session.
Info here: http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/dbrm.htm#i1010776
Phil,
yes, that's true. There a few bugs regarding this issue on MOS and the
one that appears to be the root cause is marked as fixed in 11.2. I'll
create an SR and see what solution they come up with. Originally I
thought I was just missing something obvious.
I'll update this once we find a reliable solution or a workaround.
Thanks everyone,
David
> You can specify directives to to terminate idle sessions belonging to
> any specific user. You specify this by setting KILL_SESSION as the
> switch group using the resource plan directive MAX_IDLE_TIME: Maximum
> idle time for the session.
I already suggested that, but it's not what the OP needs.
DCD seems to have a number of bugs and problems in recent releases
10.2 and 11.1 ... it won't work correctly in 11.1.0.7.7 unless ( they
say ) patch 6918493 is applied.
You may need to do some custom patching of your environment also.
Kind of hard to believe that fixes to an old functionality like DCD
need one off patches even on top of recent patchset updates ... but
there you have it.
# And about the "solution" you warned me about - no, never even
thought of trying to hunt those sessions down myself. Especially not
like that. :)
Nothing wrong with building in some custom reporting for exceptions
( long idle sessions for example ). You can always report on them for
a while without taking any actions to actually kill them off.
Really enabling DCD is not very different from killing sessions from
outside of oracle it is mostly a difference of who the hit man is
right?
Using SQLNET.EXPIRE_TIME is NOT the same as using the DB Resource
manager.
Your suggestion, which I agree is a valid one although doesn't seems
to work on the OP environment, address this issue in the Network
layer.
What I was proposing is to let the DB server identify and kill idle
sessions regardless of their connection status. The advantages of this
approach are two, the network is not affected by connection probing
and you can precisely control which sessions you need to monitor and
when to kill them.
As far as I remember Oracle's architecture in Windows versions they do
not use multiple processes with shared memory there but rather a single
process with a thread per connection (which would be the dedicated
server process on *nix).
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
> As far as I remember Oracle's architecture in Windows versions they do
> not use multiple processes with shared memory there but rather a single
> process with a thread per connection (which would be the dedicated
> server process on *nix).
That was stated in this document:
http://download.oracle.com/docs/html/B13831_01/ap_unix.htm
This, however, doesn't apply to Windows 7 and Oracle 11.2. Does anyone
have anything more on that? I did one NNF installation on Windows 7, but
that was all. I don't have sufficient knowledge to be able to
differentiate between processes and threads on Windows. I must say that I
am more interested in Windows 7/ 11G combination than in the older
software.
Still seems to be the same for 11gR2:
http://download.oracle.com/docs/cd/E11882_01/win.112/e10845/ap_unix.htm#i634536
Everything else would have surprised me. Such a dramatic architectural
change is not done easily.
jg
--
@home.com is bogus.
http://www.signonsandiego.com/news/2011/jun/24/viasats-space-gamble/