*** 2005-09-19 10:37:01.153
Global Wait-For-Graph(WFG) at ddTS[0.0] :
BLOCKED 70000016f956fd8 5 [0x2b90011][0x5f20],[TX] [131321,1285] 1
BLOCKER 70000016b5ff288 5 [0x2b90011][0x5f20],[TX] [131317,1159] 1
BLOCKED 70000016b1c8440 5 [0x3fe001d][0x1a1d],[TX] [131317,1159] 1
BLOCKER 700000199574ea0 5 [0x3fe001d][0x1a1d],[TX] [131321,1285] 1
and sometimes also like:
----------enqueue------------------------
lock version : 10665
Owner node : 0
grant_level : KJUSEREX
req_level : KJUSEREX
bast_level : KJUSERNL
notify_func : 0
resp : 7000001b13711d8
procp : 700000160d71fa0
pid : 5365832
proc version : 39
oprocp : 0
opid : 0
gid : 618
xid : 67532 66652
dd_time : 0.0 secs
dd_count : 0
timeout : 0.0 secs
On_timer_q? : N
On_dd_q? : N
lock_state : GRANTED
Open Options : KJUSERDEADLOCK
Convert options : KJUSERNOQUEUE
History : 0x9a514395
Msg_Seq : 0x0
res_seq : 0
The problem is that it looks like no one including local Oracle support
knows what is the meaning of values in this tracefile. We are also
pursuing this with Oracle Worldwide Support, but it can take a lot of
time. Is there someone who investigated this kind of trace file before?
Search on Metalink showed only a couple of similar, but never answered
questions.
Thank you
Dusan Bolek
Forgot to mention that. Sorry.
Dusan
Yea, we 've experienced a problem like that. Due to some issues with
bitmap indexes we were getting periodically ORA-00060, with very similar
traces (i got those by setting '60 trace name ERRORSTACK level 1' event).
The traces weren't too informative, and i opened a TAR with the question
about more accurate diagnostic. The Oracle WWS suggestion was: "...
systemstate level 10 is what you need, though the trace information is
really **only to be read by Oracle** (there are no docs on how to read
it)...". So, i'm afraid, in a similar way your traces are useful for the
Oracle Support only...
Alexey Sergeyev
"Dusan Bolek" <spa...@seznam.cz> wrote in message
news:1130917453.7...@g44g2000cwa.googlegroups.com...
Dusan,
Metalink Note:262226.1 offers some interpretation. Your process 131321
on node 1 is waiting for a TX lock in mode 5 (the two hex numbers,
0x2b90011 and 0x5f20, may be id1 and id2 in v$lock, respectively).
Process 131317 on the same node is holding it. Unlike in a single
instance deadlock graph, a Global Wait-For-Graph is missing the current
SQL. The good news to you is that your database is capable of producing
this deadlock involving only one instance. So set cluster_database to
false and try to reproduce the deadlock to get a better trace file.
Note:181489.1 has an interesting section about global cache cr request.
But I think it's unrelated to the above deadlock.
>
> and sometimes also like:
>
> ----------enqueue------------------------
> lock version : 10665
> Owner node : 0
> grant_level : KJUSEREX
> req_level : KJUSEREX
> bast_level : KJUSERNL
> notify_func : 0
> resp : 7000001b13711d8
> procp : 700000160d71fa0
> pid : 5365832
That looks like lkdebug output and is about a different instance, a
different process. But confirm with Oracle support.
Yong Huang
Dusan
If the deadlock is on mode 5 enqueues at both
ends, it is most likely to be the foreign key locking
issue, possibly with a referential integrity constraint
declared as 'on delete cascade' but not covered by
a suitable index.
Look of foreign keys without indexes, and then
look for code that deletes the parent, or updates
the columns of the indexes protecting the parent key.
--
Regards
Jonathan Lewis
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
Cost Based Oracle: Fundamentals
Now available to pre-order.
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/appearances.html
Public Appearances - schedule updated 4th Sept 2005
--
Dusan Bolek
The strange thing is that I can find none of the processes that are
participating in a deadlock.
When checked against V$PROCESS, even immediately after a new dead lock
emerges in trace file,
there is no row returned from that table.
--
Dusan Bolek
You're right. I did some testing and find that the two numbers,
[131321,1285], in your case, do not in any way denote a process
(Note:262226.1 says the first number is PID). Instead they correspond
to transaction_id0 and transaction_id1 of v$ges_blocking_enqueue,
respectively (or the same in v$dlm_locks). Documentation says they're
lower and upper 4 bytes of the transaction identifier where the lock
belongs to. I can't find more information about it. Perhaps for our
purpose, we can conceptually think of the combination of the two
numbers, i.e. a transaction identifier, as a process identifier.
By the way, I do see the SQL involved in the global deadlock (tested in
9.2.0.7.0 on Linux):
...
*** 2005-11-04 13:38:33.199
user session for deadlock lock 0x7553ab14
...
Current SQL Statement:
update test set a = :"SYS_B_0" where a = :"SYS_B_1"
Global Wait-For-Graph(WFG) at ddTS[0.28] :
BLOCKED 0x7553ab14 5 [0xf001d][0x8353],[TX] [2162689,7995] 0
...
In any case, follow Jonathan's practical advice.
Yong Huang
--
Dusan Bolek
Using log miner is the way to go, unless you enable SQL trace
beforehand. Although it may not help solve your problem, can you ask
Oracle when the number(s) is/are PIDs? It may be that with two numbers
in brackets (as in your case), they're transaction IDs, and if it's a
single number as in Bug:3977157, it's PID. AIX can have surprisingly
high PIDs. A message at
http://unix.derkeiler.com/Mailing-Lists/AIX-L/2004-02/0059.html
says its max value is > 67 million. Solaris is only up to 999999 and
Linux 32k.
Please post back anything interesting you and Oracle support find.
Thanks.
Yong Huang
These values are:
<BLOCKED|BLOCKER> <lockp> <cvt|held mode> <res name> <pid|did|txn_id>
<node>
The second to last field (<pid|did|txn_id>) could be a pid or the txn
id depending
on if we have an XID based deadlock search flag.
However, as I already said. The mentioned txn id is not the transaction
id that could be found in a Log Miner output. :-(
--
Dusan Bolek
See below:
*** 2005-11-09 08:16:10.087
user session for deadlock lock 70000016a0b29b8
pid=825 serial=32833 audsid=14358159 user: 425/USER
O/S info: user: wasadmin, term: , ospid: 1323120, machine: cn67b1
program: /was5/WebSphere/AppServer/java/bin/java@cn67b1 (
application name: /was5/WebSphere/AppServer/java/bin/java@cn67b1 (,
hash value=0
Current SQL Statement:
DELETE FROM SESSION_DATA WHERE SESSION_ID = :1 AND DATA_KEY = :2
user session for deadlock lock 70000016c0b07b0
pid=867 serial=37699 audsid=14358272 user: 425/USER
O/S info: user: wasadmin, term: , ospid: 1323120, machine: cn67b1
program: /was5/WebSphere/AppServer/java/bin/java@cn67b1 (
application name: /was5/WebSphere/AppServer/java/bin/java@cn67b1 (,
hash value=0
Current SQL Statement:
DELETE FROM SESSION_DATA WHERE SESSION_ID = :1 AND DATA_KEY = :2
Global Wait-For-Graph(WFG) at ddTS[0.eff6] :
BLOCKED 70000016a0b29b8 5 [0x20a0008][0x1cb52],[TX] [66361,527468] 0
BLOCKER 70000016f469718 5 [0x20a0008][0x1cb52],[TX] [131371,47259] 1
BLOCKED 70000016f469718 5 [0x20a0008][0x1cb52],[TX] [131371,47259] 1
BLOCKER 70000016bc15980 5 [0x20a0008][0x1cb52],[TX] [131217,103403] 1
BLOCKED 70000016b47f828 5 [0xbc001d][0x6ed23],[TX] [131217,103403] 1
BLOCKER 70000016d0b11a0 5 [0xbc001d][0x6ed23],[TX] [131469,20450] 1
BLOCKED 70000016d0b11a0 5 [0xbc001d][0x6ed23],[TX] [131469,20450] 1
BLOCKER 70000016d0b8a18 5 [0xbc001d][0x6ed23],[TX] [131394,51017] 1
BLOCKED 70000016d0b8b98 5 [0x400005][0x6914],[TX] [131394,51017] 1
BLOCKER 70000016c0b0648 5 [0x400005][0x6914],[TX] [66403,574442] 0
BLOCKED 70000016c0b07b0 5 [0x3d0017][0x82a7],[TX] [66403,574442] 0
BLOCKER 70000016a0b2400 5 [0x3d0017][0x82a7],[TX] [66361,527468] 0
--
Dusan Bolek
My error - my comment is only appropriate for TM enqueues
in mode 5; not TX in mode 5.
On the other hand, I've never seen a TX in mode 5 - I had thought
that modes 4 and 6 were the only possibilities for TX. So I wonder
if we are being misled by the Metalink note.
Two thoughts - application oriented only:
Does this code come from a logoff trigger ?
Is webapplication server configured to drop a number of connections
if they have been idle for a few minutes.
The code looks like an attempt to record a logoff in the database -
but in RAC, you can use the same session_id from different nodes,
so maybe this code is simply not RAC-aware and causes a genuine
deadlock somehow.
--
Regards
Jonathan Lewis
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/appearances.html
Public Appearances - schedule updated 4th Nov 2005
Given that the documentation and notes are wrong about the
process id / transaction id - might the comment about the
mode 5 lock also be wrong ?
Could you see a transaction holding or requesting a TX mode 5
in the few seconds before the deadlock was reported ?
I just tested again. I do not see lock mode 5. When the two sessions
involving a deadlock were hung, the session that later detected ORA-60
had this in v$lock:
SID TY ID1 ID2 LMODE REQUEST CTIME
BLOCK
--- -- ---------- ---------- ---------- ---------- ----------
----------
133 TX 327689 60114 6 0 72
2
133 TM 114196 0 3 0 72
2
133 TX 852009 33959 0 6 60
0
and these in v$ges_blocking_enqueue:
HANDLE GRANT_LEV REQUEST_L OPEN_OPT_DEADLOCK WHICH_QUEUE STATE
-------- --------- --------- ----------------- ----------- -----
68D149C0 KJUSEREX KJUSEREX 1 1 GRANTED
68D157E4 KJUSERNL KJUSEREX 1 2 OPENING
After this session detected ORA-60, followed by a few seconds delay,
the last row in v$lock output (where LMODE=0) is gone and the second
row in v$ges_blocking_enqueue is also gone. The lmd0 process trace file
shows:
Global Wait-For-Graph(WFG) at ddTS[0.5e] :
BLOCKED 0x68d157e4 5 [0xd0029][0x84a7],[TX] [2621441,14115] 0
BLOCKER 0x6d6964bc 5 [0xd0029][0x84a7],[TX] [1835010,4352] 1
BLOCKED 0x6c633798 5 [0x50009][0xead2],[TX] [1835010,4352] 1
BLOCKER 0x68d149c0 5 [0x50009][0xead2],[TX] [2621441,14115] 0
So you're right. According to v$lock, there's no mode 5 lock involved.
In fact, showing 5 on all four lines in this Wait-For-Graph plus the
fact that 5 is the only number I see in all WFGs I find makes me wonder
if this number is hardcoded in Oracle code.
Yong Huang
--
Dusan Bolek