Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

distributed transaction hangs

1 view
Skip to first unread message

andrei...@gmail.com

unread,
Jan 5, 2007, 11:03:24 AM1/5/07
to
Hello,

My question is: how to determine which session on the remote db
corresponds to the local session while a distributed transaction is in
progress? In Note:118219.1 on metalink it is written that the
V$SESSION.PROCESS and V$SESSION.MACHINE will have the same values on
all nodes, but it seems to be not true to me, I guess that article is
old and things have changed since then, because I don't see any
sessions with same process and machine values on the remote db.

Basically the problem which I have is that in some cases (very rarely)
it happens so that after executing some DML over dblink, the local
session hangs waiting for "SQL*Net message from dblink" event. And all
other sessions who execute the same procedure hang waiting for the same
event. Which causes eventually the maximum number of session limit
being reached and I have to restart the database, because killing them
one by one is too inconvinient and takes too much time.

So the fact that they are not dying on the distributed_lock_timeout=60,
must be telling that the remote session is not waiting for a lock? But
it must be waiting for something. So I need to find out which remote
session corresponds to the local one and see what is that remote
session up to. Are my reasonings correct, or must another kind of
approach be used for solving such problems?

wbr,
Andrei Kubar

Jonathan Lewis

unread,
Jan 5, 2007, 11:10:13 AM1/5/07
to

<andrei...@gmail.com> wrote in message
news:1168013004.2...@51g2000cwl.googlegroups.com...

It may be out of date by now (written in 2002 for 8.1.7)
but Mark Powell has a note that might help here:

http://www.jlcomp.demon.co.uk/faq/find_dist.html

--
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html


Mark D Powell

unread,
Jan 5, 2007, 5:20:35 PM1/5/07
to

On Jan 5, 11:10 am, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:
> <andrei.ku...@gmail.com> wrote in messagenews:1168013004.2...@51g2000cwl.googlegroups.com...


>
>
>
>
>
> > Hello,
>
> > My question is: how to determine which session on the remote db
> > corresponds to the local session while a distributed transaction is in
> > progress? In Note:118219.1 on metalink it is written that the
> > V$SESSION.PROCESS and V$SESSION.MACHINE will have the same values on
> > all nodes, but it seems to be not true to me, I guess that article is
> > old and things have changed since then, because I don't see any
> > sessions with same process and machine values on the remote db.
>
> > Basically the problem which I have is that in some cases (very rarely)
> > it happens so that after executing some DML over dblink, the local
> > session hangs waiting for "SQL*Net message from dblink" event. And all
> > other sessions who execute the same procedure hang waiting for the same
> > event. Which causes eventually the maximum number of session limit
> > being reached and I have to restart the database, because killing them
> > one by one is too inconvinient and takes too much time.
>
> > So the fact that they are not dying on the distributed_lock_timeout=60,
> > must be telling that the remote session is not waiting for a lock? But
> > it must be waiting for something. So I need to find out which remote
> > session corresponds to the local one and see what is that remote
> > session up to. Are my reasonings correct, or must another kind of
> > approach be used for solving such problems?
>
> > wbr,

> > Andrei KubarIt may be out of date by now (written in 2002 for 8.1.7)


> but Mark Powell has a note that might help here:
>
> http://www.jlcomp.demon.co.uk/faq/find_dist.html
>
> --
> Regards
>

> Jonathan Lewishttp://jonathanlewis.wordpress.com
>
> Author: Cost Based Oracle: Fundamentalshttp://www.jlcomp.demon.co.uk/cbo_book/ind_book.html
>
> The Co-operative Oracle Users' FAQhttp://www.jlcomp.demon.co.uk/faq/ind_faq.html- Hide quoted text -- Show quoted text -

The queries still work on 9.2.0.6 and I am pretty sure I tested them on
our 10.1.0.4 system before we had to remove it.

>From the description of the problem it sounds as if the problem might
be related to the distributed locks taken in RBS segments. Do these
sessions that issue remote queries commit. If not, change them to do
so. At the bottom of the referenced article is a link to another
article that demostrates that remote queries are transactions and hold
RBS entries until the transaction is terminated (commit/rollback).

HTH -- Mark D Powell --

Mark D Powell

unread,
Jan 6, 2007, 11:57:26 AM1/6/07
to

Andrei,


>> So the fact that they are not dying on the distributed_lock_timeout=60, must be telling that the remote session is not waiting for a lock? But it must be waiting for something <<

The problem could be just poorly performing SQL, that is, the
distributed SQL needs to be tuned.

Here is a full example of how to find the remote query.

-- issue distributed query
SQL> select count(*) from mpowel01.r...@put.world;

COUNT(*)
----------
3

-- find out local session sid
SQL> select sid from v$mystat where rownum = 1;

SID
----------
35

-- get local session info
-- session_sid is a join of v$session to v$process for col of interest
SQL> @mon/session_sid
Enter value for session_id: 35

USERNAME OSUSER STATUS STATEMENT SID SERIAL#
------------ ------------ -------- ---------------- ------ -------
SVR PROCESS APPL MACHINE APPL PROCESS LOCKWAIT TM FR LAST
------------ --------------- ------------ ---------------- ----------
MPOWEL01 mpowel01 ACTIVE Select 35 14384
69814 ddcdev1 183418 000:00:00

-- on remote system run distr_fr_remote script
-- notice "<==" that remote v$session.process = local v$process.spid
PUT1 > @ora/distr/distr_fr_remote

USERNAME OSUSER STATUS SID SERIAL#
------------ ------------ -------- ---------- ----------
MACHINE
PROCESS
----------------------------------------------------------------
------------
TERMINAL PROGRAM
------------------------------
------------------------------------------------
SLOL mpowel01 INACTIVE 45 13094
ddcdev1 69814
<==
oracle@ddcdev1 (TNS V1-V3)

SLOL mmacle01 INACTIVE 14 1071
ddcdev1 165388
oracle@ddcdev1 (TNS V1-V3)


-- get session information for sessions that are initiated remotely

PUT1 > @mon/session_sid
Enter value for session_id: 45

USERNAME OSUSER STATUS STATEMENT SID SERIAL#
------------ ------------ -------- ---------------- ------ -------
SVR PROCESS APPL MACHINE APPL PROCESS LOCKWAIT TM FR LAST
------------ --------------- ------------ ---------------- ----------
SLOL mpowel01 INACTIVE Idle 45 13094
70334 ddcdev1 69814 000:02:21

andrei k.

unread,
Jan 8, 2007, 2:55:11 PM1/8/07
to
> > >From the description of the problem it sounds as if the problem might
> > be related to the distributed locks taken in RBS segments. Do these
> > sessions that issue remote queries commit. If not, change them to do
> > so. At the bottom of the referenced article is a link to another
> > article that demostrates that remote queries are transactions and hold
> > RBS entries until the transaction is terminated (commit/rollback).

If it is related to the locks, then should I see at the remote db, that
there are some locks which have v$lock.block = 1 ? If so, then it is
not related to the locks, since I didn't see any locks with block = 1
at the remote db.

> The problem could be just poorly performing SQL, that is, the
> distributed SQL needs to be tuned.

no, the SQL it is waiting for is not more than a simple INSERT of one
row..


> -- notice "<==" that remote v$session.process = local v$process.spid

thanks for this one! is it basically what I needed. Only in my case it
was like this:
at local db:
v$process.spid = 134932
and at the remote db
v$session.process = 1048:134932

so it has put some "1048:" in front of it, what ever that means.. but
still I can catch the remote session now.

the query from your FAQ, which uses the x$k2gte view also works with
me! So thanks a lot, now I'm ready to make some further investigations
when the problems appears again.. I'll be back here then I guess ;)


br.Andrei

0 new messages