Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

No one could logon to productio database for a while

5 views
Skip to first unread message

dbapl...@hotmail.com

unread,
Sep 22, 2006, 2:06:07 PM9/22/06
to
I am using 9.2.0.5 on HP UNIX. I have an after logon trigger which
executes quickly - all it does it set some stored outline parameters
(trigger is provided below).

While this logon trigger is being executed, if someone else tries to
connect, will Oracle wait for logon trigger to be completed before they
can connect or will Oracle execute another "instance" of logon
trigger? There is only one logon trigger in database. For example if I
put a sleep in logon trigger for 5 minutes, will no one will be able to
logon during this 5 minutes, I do not think so but want o check.

Today for some reason, no one could logon to my production database
including sys.
When sys tried to logon from sqlplus, session was just hung.

We have no of process set to 1200 and there were only 383 oracle shadow
processes,
But then I killed a time-consuming oracle shadow process, I see a
message from this trigger in my alert.log:

ORA-1013: user requested cancellation of operation, *** Error in login
trigger.

Not sure whether this error message came from this long running killed
process or some other process which was causing execution of logon
trigger.

Aftre killng a process, people could connect. I logged on to system. My
trigger is valid and now everything works. All I did was killed an
Oracle shadow process which somehow will not let any one logon. This
sounds like some Oracle quirk/bug. I am not sure whether logon trigger
is the culprit or something else, but genrally if logon trigger has
some issue it can hose everyone (but not sys). I have no clue how to
find root cause f the problem. I have already looked in alert.log and
trace files

Below is trigger. It is a simple trigger, it calls a function and
function simply sets use_stored_outlines for some users.


CREATE OR REPLACE TRIGGER my_logon AFTER LOGON ON database
BEGIN
wmuser.my_outlnfun;
exception
when others then
sys.dbms_system.ksdwrt(2, SUBSTR (SQLERRM, 1, 100) || ' *** Error
in logon trigger');
END;
/

create or replace procedure my_outlnfun is
begin

if (user = 'JJPROS' or user = 'READGGPROS' or user = 'GGOSPP'') then
execute immediate 'alter session set
use_stored_outlines=my_custom';

end if;
end;

dbapl...@hotmail.com

unread,
Sep 22, 2006, 3:11:16 PM9/22/06
to

Listener.log shows that people were connecting to database, i.e.,
connections were being made, they appear to be hung.

DA Morgan

unread,
Sep 22, 2006, 8:24:07 PM9/22/06
to

My first instinct would be to believe you have a problem with your
trigger. The first problem I see is: 'GGOSPP'' so I am not sure how
it could compile successfully.
--
Daniel Morgan
University of Washington
Puget Sound Oracle Users Group

dbapl...@hotmail.com

unread,
Sep 22, 2006, 10:32:31 PM9/22/06
to


That was a typo when preparing the message for this newsgroup.Trigger
clearly was compiled and I can see trigger as valid by looking in
status in dba_triggers. There is no prolbem after that process was
killed nor before for close to a year when trigger was installed.
Nothing was changed in that trigger.If there was problem with
compilation, then Oracle gives another message when logging, logon does
not get hung it immediately exists with some error.

It is something really unexplainable. How one process can not let
anyone logon and after process is killed everyone can logon. Hung
means there is some kind of lock everyone is waiting for to be
released. Since sys could not login it is some internal disctionary
lock and not application lock.

DA Morgan

unread,
Sep 23, 2006, 12:20:35 AM9/23/06
to
dbapl...@hotmail.com wrote:
> DA Morgan wrote:
>> dbapl...@hotmail.com wrote:
>>> dbapl...@hotmail.com wrote:
>>>> I am using 9.2.0.5 on HP UNIX. I have an after logon trigger which
>>>> executes quickly - all it does it set some stored outline parameters
>>>> (trigger is provided below).

Nothing in inexplicable. I just built the trigger and procedure and it
leads me to wonder whether your assumption as to the root cause of your
problem is correct. I certainly don't see any issue.

dbapl...@hotmail.com

unread,
Sep 23, 2006, 12:50:24 AM9/23/06
to


Are your saying trigger may not be the issue?. I also think so, but I
do not know what else could cause this problem unless one is hitting in
some internal oracle bug which happens once in blue moon.

Sybrand Bakker

unread,
Sep 23, 2006, 2:08:28 AM9/23/06
to

Yeah nicely put, 'some internal oracle bug', on a patch release that
is more than a year old. Do you really think that this 'internal
oracle bug' wouldn't have surfaced?

Obviously you can't login anymore when the number of processes has
been exceeded. Even SYS can't login anymore in that situation (this
didn't apply BTW to version 6, internal could *always* login).
Also, you need to check whether your number of semaphores on Unix
level is adequate ( 2 * #processes + 10). If it isn't, you can't
login.

--
Sybrand Bakker, Senior Oracle DBA

DA Morgan

unread,
Sep 23, 2006, 12:23:53 PM9/23/06
to

It is not a bug. The cause is as yet undetermined. What I am saying is
that it caused no problem in my test lab with a 3 node RAC cluster and
100 connected users. In other words ... look elsewhere for now.

dbapl...@hotmail.com

unread,
Sep 23, 2006, 3:16:00 PM9/23/06
to

No of process was not a factor, because then one gets an Oracle error
message instead of getting hung. I need to look into semaphores.

joel garry

unread,
Sep 25, 2006, 5:05:45 PM9/25/06
to

This sounds like you did something to change privilege, so caused the
login trigger to be invalidated, but it couldn't recompile because it
couldn't get to where privilege for recompilation is evaluated, and the
alter session for everyone else couldn't be evaluated because that was
locked while privilege was being altered. Or something like that,
pehaps originally instigated by some completely unrelated procedure
recompiling or user being modified. Then, when you killed the
offending session (that is, the first one to be locked by the original
one), the privilege could be evaluated by everyone, your procedure
could automatically recompile, and the killed procedure left the
message in the alert log.

It doesn't matter how quickly a trigger works if it has to wait for
something it can't get.

Perhaps you should leave an OEM session open so you can check locks and
waits when a problem like this occurs. I'm speculating you would see
something locked in a base table for the *priv* views and some obscure
latch wait. It is also possible that you are running into memory leaks
or other almost-about-to-ORA-4031 issues since you are on 9205, which
could intensify recompilation issues. (See bug 3258390 for one example
of something where schemaname is messed up - and note that the fix for
it in 9206 makes Bug 4480100 Latch contention for "row cache objects"
latch - in other words, the first bug or something related to it could
cause your problem the second time something is run, and the fix would
make everyone have more latch contention. Six broken eggs or
half-a-dozen - what a mess).

jg
--
@home.com is bogus.
http://content.ytmnd.com/content/e/0/3/e035a08e616468d1b0500bc12dcc11e9.gif

dbapl...@hotmail.com

unread,
Sep 25, 2006, 5:57:49 PM9/25/06
to

Thanks, so looks like gpoing to 9206 is not recommended.

dbapl...@hotmail.com

unread,
Sep 25, 2006, 6:03:09 PM9/25/06
to

My HP UNIX semaphore parameters are:
semmni 8000
semmns 12000
far higher than semmni 2 *1200 + 10
so I do not think semaphore is an issue.

joel garry

unread,
Sep 25, 2006, 7:50:10 PM9/25/06
to

dbapl...@hotmail.com wrote:

> joel garry wrote:
>
> Thanks, so looks like gpoing to 9206 is not recommended.

Wellllllllll, I wouldn't say that, as I've been on 9206 (hp-ux 11i) for
a while. What you should do is figure out if the known bugs in each
affect you (see the descriptions in the patches, and test), and whether
9208 might be the way to go until you can get to 10. For my
environment, 9206 is better and I'll be evaluating 9208 sometime,
depending on a third-party vendors 10ishness. Sometimes you can
mindlessly throw hardware at a latch issue or shared_pool at a 4031,
but no one logging on might be considered more nasty. I don't seem to
have any of these problems, fortunately.

I've seen some pretty strong opinions both ways on 9205 v 9206, seems
very site-dependent.

jg
--
@home.com is bogus.

I'll take a 1000.
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9003581

DA Morgan

unread,
Sep 25, 2006, 8:44:42 PM9/25/06
to

Not sure what docs you are using as the source of your numbers
but mine is:
http://download-west.oracle.com/docs/html/A96167_01/pre.htm#sthref106

There the requirement is listed as:

SEMMNI 4096 Defines the maximum number of semaphore sets in the entire
system.
SEMMNS (SEMMNI * 2) Defines the maximum number of semaphores in the
system. The default value of SEMMNS is 128, which is, in most cases, too
low for Oracle9i software.

You might want to review the other kernel settings in SAM.

dbapl...@hotmail.com

unread,
Sep 29, 2006, 8:30:43 AM9/29/06
to

I noticed this problem all over again today. It probably comes from a
process which uses a statement like:

select plan_table_output from TABLE( dbms_xplan.display('dynamic
_plan_table','" + address + "_" + child_number + "'

Most of the time everything works fine, but sometime it seems to hang
everything. This is a wild guess, does any on know any issues with this
statement in the evrsion of Oracle 9.2.0.5 used.

code for script is quite simple, it does a v$sql to find all sql
satemenmts in SGA,
then does v$sqltext to get complete sql satement and then calls
dbms_xplan.display to show
the plan. Is there any other way to get the plan in a pretty fashion.

Jonathan Lewis

unread,
Oct 1, 2006, 9:58:53 AM10/1/06
to

<dbapl...@hotmail.com> wrote in message
news:1159533043.0...@m7g2000cwm.googlegroups.com...

Don't do this to v$sql_plan this aggressively.

Even using dbms_xplan.display() against a single
cursor in 9i hammers the libray cache and shared
pool latches - it's not surprising that you managed
to freeze your instance by trawling v$sql and
v$sql_text for everything in sight.

The problem with dbms_xplan in 9i is that it
does a hierarchical query (connect by) against
the target table. If you've faked a view that makes
V$sql_plan the target get, the impact on latching
is extreme.

--
Regards

Jonathan Lewis
http://www.oracle.com/technology/community/oracle_ace/ace1.html#lewis

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html

Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

hpuxrac

unread,
Oct 1, 2006, 10:14:04 AM10/1/06
to

joel garry wrote:
> dbapl...@hotmail.com wrote:
> > joel garry wrote:
> >
> > Thanks, so looks like gpoing to 9206 is not recommended.
>
> Wellllllllll, I wouldn't say that, as I've been on 9206 (hp-ux 11i) for
> a while. What you should do is figure out if the known bugs in each
> affect you (see the descriptions in the patches, and test), and whether
> 9208 might be the way to go until you can get to 10. For my
> environment, 9206 is better and I'll be evaluating 9208 sometime,
> depending on a third-party vendors 10ishness. Sometimes you can
> mindlessly throw hardware at a latch issue or shared_pool at a 4031,
> but no one logging on might be considered more nasty. I don't seem to
> have any of these problems, fortunately.
>
> I've seen some pretty strong opinions both ways on 9205 v 9206, seems
> very site-dependent.
>

We were running 9206 on hpux 11.11 pa-risc for a long time just
completed upgrading all our systems to 9208.

It wouldn't make any sense to me for someone still at 9205 to go to
9206 instead of 9208 based on what I have seen so far. As Joel noted,
could vary by site.

dbapl...@hotmail.com

unread,
Oct 1, 2006, 12:14:34 PM10/1/06
to

Jonathan:

Thanks a lot. I have decided not to use this statement any more because
I am almost sure this statement is culprit even though Oracle support
does not admit it. I will go directly against v$sql_plan and not use
some columns which are known to give ORA-600 errors. I do, however,
want to get the sql plan. My script runs every half an hour and tries
to find all the sql statements which take more than x seconds (based on
elapsed time). Then I want to see the plans for these so I need to go
to v$sql_plan.

I had used same script many months ago back in March 2006 for a month
and never had any problem, but in last few weeks, I found this problem
and took me a while to nail it down.

I still do not know why Oracle will get into this state unless it is
not releasing the latches because my script takes less than a minute to
complete unless it gets into this strange state. Is Oracle running out
of latches, can that be increased using some init.ora parameter.

I am really surprised that a package supplied by Oracle can make
database hung and only way to get out of hung state is to kill a shadow
process (this shadow process was created as a result of connecting to
database from my script). The shadow process was keeping one of the 6
CPU"s 100% busy. Normally, it should have exited in less than a
minute but as a result of hung was running for several hours.

dbapl...@hotmail.com

unread,
Oct 1, 2006, 12:26:24 PM10/1/06
to


To add when I do v$sql, I am doing only for my stamenets, i.e,
satements for a specifc schema., i.e., parsing_user_id = 'myschema', so
Oracle is probably running out of latches and unless latch is relased
everything is hung including my shadow process.

dbapl...@hotmail.com

unread,
Oct 2, 2006, 9:11:40 PM10/2/06
to

Jonathan:
I am looking at metalink note 260942.1 which provides another way of
getting explain plan without using dbms_xplan.display. I wnat to make
sure this does not create any latches problem. I do want to get plan's
from v$sql_plan but obviously cannot make productin database hung. My
program is very simple,
it does a loop of v$sql, then goes to v$sql_test and finally to
v$sql_plan (now using note 260942.1), Thanks a lot:
+++++++++++++++++++++++++++++++ Note 260942.1 ***************
Execution Environment:
SQL_PLUS

Access Privileges:
SELECT permission on V$SQL,V$SQL_PLAN and
V$SQL_PLAN_STATISTICS_ALL

Usage:
sqlplus <user>/<pw> @SCRIPTFILE

Instructions:
The script ask for the hash value of the SQL statement.
You can find this value in V$SQL or V$SQL_PLAN
The child_number is set to 0. When you get the wrong SQL statement you
have
also to change this number.
editors, e-mail packages, and operating systems handle text formatting
(spaces,
tabs, and carriage returns), this script may not be in an executable
state
when you first receive it. Check over the script to ensure that errors
of
this type are corrected.The script will produce an output file named
[outputfile].
This file can be viewed in a browser or uploaded for support analysis.

Description
This script displays the SQL and the SQL execution plan that is in the
librarycache. This script does not work on 9.0 because the view
v$sql_plan_statistics_all does not exits. This part can be removed and
than the script works also on 9.0.

References
None

Script
set pagesize 600
set tab off
set linesize 140
set echo off
set long 4000
col TQID format A4
col "SLAVE SQL" format A95 WORD_WRAP
col address format A12
col sql_hash format A15
col exec format 9999
col sql_text format A75 WORD_WRAP
repfooter off;
set timing off veri off space 1 flush on pause off termout on numwidth
10;
alter session set "_complex_view_merging"=false;
select hash_value||decode(child_number, 0, '', '/'||child_number)
sql_hash,
sql_text
from v$sql
where child_number= 0 and hash_value= &hashvalue;

select '| Operation | Name | Starts | E-Rows | A-Rows | Buffers | Reads
| Writes | E-Time |' as "Plan Table" from dual
union all /* QWEKLOIPYRTJHH7 */
select
'------------------------------------------------------------------------------------------------------------------------'
from dual
union all
select rpad('| '||substr(lpad(' ',1*(depth))||operation||
decode(options, null,'',' '||options), 1, 33), 34, ' ')||'|'||
rpad(substr(object_name||' ',1, 19), 20, ' ')||'|'||
lpad(decode(starts,null,' ',
decode(sign(starts-1000), -1, starts||' ',
decode(sign(starts-1000000), -1,
round(starts/1000)||'K',
decode(sign(starts-1000000000), -1,
round(starts/1000000)||'M',

round(starts/1000000000)||'G')))), 8, ' ') || '|' ||
lpad(decode(cardinality,null,' ',
decode(sign(cardinality-1000), -1,
cardinality||' ',
decode(sign(cardinality-1000000), -1,
round(cardinality/1000)||'K',
decode(sign(cardinality-1000000000), -1,
round(cardinality/1000000)||'M',

round(cardinality/1000000000)||'G')))), 8, ' ') || '|' ||
lpad(decode(outrows,null,' ',
decode(sign(outrows-1000), -1, outrows||' ',
decode(sign(outrows-1000000), -1,
round(outrows/1000)||'K',
decode(sign(outrows-1000000000), -1,
round(outrows/1000000)||'M',

round(outrows/1000000000)||'G')))), 8, ' ') || '|' ||
lpad(decode(crgets,null,' ',
decode(sign(crgets-10000000), -1, crgets||' ',

decode(sign(crgets-1000000000), -1,
round(crgets/1000000)||'M',

round(crgets/1000000000)||'G'))), 9, ' ') || '|' ||
lpad(decode(reads,null,' ',
decode(sign(reads-10000000), -1, reads||' ',
decode(sign(reads-1000000000), -1,
round(reads/1000000)||'M',

round(reads/1000000000)||'G'))), 8, ' ') || '|' ||
lpad(decode(writes,null,' ',
decode(sign(writes-10000000), -1, writes||' ',

decode(sign(writes-1000000000), -1,
round(writes/1000000)||'M',

round(writes/1000000000)||'G'))), 8, ' ') || '|' ||
lpad(decode(etime,null,' ',
decode(sign(etime-10000000), -1, etime||' ',
decode(sign(etime-1000000000), -1,
round(etime/1000000)||'M',

round(etime/1000000000)||'G'))), 8, ' ') || '|' as "Explain plan"
from
(select /*+ no_merge */
p.HASH_VALUE, p.ID, p.DEPTH, p.POSITION, p.OPERATION,

p.OPTIONS, p.COST COST, p.CARDINALITY CARDINALITY,
p.BYTES BYTES, p.OBJECT_NODE, p.OBJECT_OWNER,
p.OBJECT_NAME, p.OTHER_TAG, p.PARTITION_START,
p.PARTITION_STOP, p.DISTRIBUTION, pa.starts,
pa.OUTPUT_ROWS outrows, pa.CR_BUFFER_GETS crgets,
pa.DISK_READS reads, pa.DISK_WRITES writes,
pa.ELAPSED_TIME etime
from v$sql_plan_statistics_all pa,
V$sql_plan p
where p.hash_value = &hashvalue
and p.CHILD_NUMBER= 0
and p.hash_value = pa.hash_value(+)
and pa.child_number(+) = 0 )
union all
select
'------------------------------------------------------------------------------------------------------------------------'
from dual;
REM
REM Print slave sql
REM
select /* QWEKLOIPYRTJHH7 */
decode(object_node,null,'',
substr(object_node,length(object_node)-3,1) || ',' ||
substr(object_node,length(object_node)-1,2)) TQID,
other "SLAVE SQL"
from v$sql_plan vp
where other is not NULL
and hash_value = &hash_value
and CHILD_NUMBER= 0;

joel garry

unread,
Oct 3, 2006, 8:40:50 PM10/3/06
to

dbapl...@hotmail.com wrote:
> Description
> This script displays the SQL and the SQL execution plan that is in the
> librarycache. This script does not work on 9.0 because the view
> v$sql_plan_statistics_all does not exits. This part can be removed and
> than the script works also on 9.0.
>

I am not a lawyer, but you might want to look about on metalink for
their terms of service. Probably not a good idea to cut and paste to
the public from there.

"v$sql_plan_statiscs_all does not exits."

Self-fulfilling prophecy? :-)

jg
--
@home.com is bogus.

http://nickjr.co.uk/global/media/images/medium_blue.gif

0 new messages