
Any way to reduce waits over the link? (SQL*Net message from dblink)


NetComrade

Sep 10, 2003, 12:41:19 PM
The queries are fine (not pulling whole tables)
The network is fine (running at about half the capacity)
Db's are not direct-connected, traffic goes over a switch, 100Mbps
ethernet.
When looking at trace files, it looks like Oracle does some kind of
checking over the link to make sure the objects are valid, which might
be a contributor to these waits. Any way to get rid of this?
Some queries have already been 'tuned' to use snapshots, but some
other queries require instantaneous updates to local tables, so
snapshots won't work.
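
For reference, this is roughly how I look at which sessions pile up this
wait (just a generic sketch against v$session_event, nothing specific to
our setup; times are in centiseconds):

  -- sessions waiting on the dblink, and for how long
  SELECT sid, total_waits, time_waited, average_wait
    FROM v$session_event
   WHERE event = 'SQL*Net message from dblink'
   ORDER BY time_waited DESC;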

My last statspack for an hour shows:
                                                 Total Wait   Avg
Event                           Waits  Timeouts    Time(cs)  wait(ms)  Waits/txn
---------------------------  ---------  --------  ----------  --------  ---------
SQL*Net message from dblink  1,428,648        55     454,719         3        7.1
db file sequential read        588,295         0     440,978         7        2.9
log file sync                   94,051         5      67,230         7        0.5
db file parallel write          31,395         0      44,485        14        0.2

454,719 cs / 100 / 60 = ~75 minutes of wait time on a 10-CPU machine.
The file reads cannot really be tuned, unless I increase the buffer
cache a bit, but it's running at a 99% hit rate.
Log file syncs can't be tuned either, and they don't bother me much;
redo is running on dedicated disks, striping didn't help.
.......
We use Oracle 8.1.7.4 on Solaris 2.7 boxes
remove NSPAM to email

Brian Peasland

Sep 10, 2003, 2:40:15 PM
> The queries are fine (not pulling whole tables)

Are you sure? Have you taken the query that is sent to the remote
database and tuned it on that remote database? What percentage of your
data are you pulling across your database link as opposed to the amount
of data you actually need on the local database? Maybe it's a good idea
to do a full table scan instead of an index lookup! You never know for
sure until you tune the query on the remote database.
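
If you want to see exactly what gets shipped to the remote database, you
can explain the distributed query on the local side and look at the plan
table's REMOTE steps. A rough sketch only (the table and link names below
are made up):

  -- explain the distributed query locally
  EXPLAIN PLAN FOR
    SELECT l.col1, r.col2
      FROM local_tab l, remote_tab@your_link r
     WHERE l.id = r.id;

  -- for REMOTE steps, OBJECT_NODE shows the link and OTHER shows the SQL
  -- Oracle actually sends across; that is the statement to tune remotely
  SELECT id, operation, options, object_node, other
    FROM plan_table
   ORDER BY id;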

> The network is fine (running at about half the capacity)

Just because a network runs at 50% capacity or 10% capacity does not
necessarily mean that data is moving fast across the network. Granted,
if the network is at 100% capacity, data will move slower....

> Event                           Waits  Timeouts  Time(cs)  wait(ms)  Waits/txn
> SQL*Net message from dblink  1,428,648       55   454,719         3        7.1

This one event contributes approximately 50% of your total wait time.
This event will only occur if you are waiting on communication from a
remote database, initiated at your local database. i.e. a database link
is involved here between two databases. Have you tuned your distributed
query?

> db file sequential read 588,295 0 440,978 7 2.9

You might want to see if this event can be tuned too since it accounts
for nearly the other 50% of your total wait time.

> log file sync 94,051 5 67,230 7 0.5
> db file parallel write 31,395 0 44,485 14 0.2
>
> 454,719cs/100/60=75mins of wait time on a 10cpu machine
> the file reads cannot really be tuned, unless i increase the buffer
> cache a bit, but it's running at 99% hit rate.

Why can't the file reads be tuned? You may be requesting more logical
reads than you really need, which *may* translate to more physical reads
than you need. The BCHR at 99% doesn't mean that this wait event is a
waste of time to tune. This one event contributes 44% of your total wait
time. It may be a good idea to investigate this further. Is there I/O
contention on the disks or in the controllers? Are these disks fast
enough? Your BCHR of 99% does not mean that you don't have I/O tuning
(logical and/or physical) to do. That type of reasoning is a myth.
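
A rough way to start is to see which statements burn the most logical I/O
(a generic sketch; pick whatever threshold makes sense on your system):

  -- top statements by buffer gets (logical reads)
  SELECT buffer_gets, executions,
         ROUND(buffer_gets / GREATEST(executions, 1)) AS gets_per_exec,
         SUBSTR(sql_text, 1, 60) AS sql_text
    FROM v$sqlarea
   WHERE buffer_gets > 100000
   ORDER BY buffer_gets DESC;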

> log file sync's can't be tuned either, and they don't bother me much,
> redo is running on dedicated disks, striping didn't help.

HTH,
Brian

--
===================================================================

Brian Peasland
dba@remove_spam.peasland.com

Remove the "remove_spam." from the email address to email me.


"I can give it to you cheap, quick, and good. Now pick two out of
the three"

Tanel Poder

Sep 10, 2003, 3:09:56 PM
Hi!

My first question would be: what's the problem? Is your application slow, or
are you just tuning everything that appears in the statspack top-5 list? (In
other words, CTD - Compulsive Tuning Disorder - is a disease probably even
worse than SARS; thousands of DBAs have lost their lives to it and spend all
their time tuning until they die (or the same happens to all of their
databases) ;)

Note that when you post a statspack top-5 output taken over some timeframe,
you should also post how much CPU was used during that timeframe; this way
we can see the ratio between service & wait times and recommend whether it'd
be appropriate to concentrate on waits at all. There should be a statistic
called "CPU used by this session" or similar in the SP report as well.

But anyway, your stats report shows that during this hour all of your
sessions together were spending a total of about 12.5% of their time waiting
for data from the dblink. You have about 7 waits per transaction and the
average wait lasts 3 ms. You have two options here: either reduce the number
of waits (i.e. reduce the number of roundtrips to the remote server by tuning
SQL etc.), or reduce the average duration of a wait.
I can't help you with the SQL, but the wait time "more data from dblink"
generally breaks into two parts: 1) network latency (wait time) and 2) query
execution & fetching (service time).

1) Network latency can be tuned using TCP parameters, a faster protocol
(named pipes on Windows, maybe), a dedicated network, a faster network, etc.
2) Query execution & fetching - this is normal query response time tuning,
just on the remote server.

> The queries are fine (not pulling whole tables)

When latency is a problem, it is wiser to pull more data with fewer requests.
This is a design & SQL issue.
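
For example (table and column names here are completely made up): instead of
fetching remote rows one at a time from the application, let a single
statement move the whole set - array fetches over one cursor cost far fewer
roundtrips than a client-side loop:

  -- one statement, one cursor over the link, array fetches
  INSERT INTO local_order_copy (order_id, status)
  SELECT order_id, status
    FROM orders@remote_db
   WHERE updated_at > SYSDATE - 1/24;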

> The network is fine (running at about half the capacity)
> Db's are not direct-connected, traffic goes over a switch, 100Mbps
> ethernet.
> When looking at trace files, it looks like oracle does some kind of
> checking over the link to make sure the objects are valid, that might
> be a contributor to these waits, any way to get rid of this?

If you're joining tables in a remote location, then create views with the
required join conditions there and select from them instead of directly from
the tables. That way you can save the SQL*Net roundtrips required for checking
the validity (or whatever) of the referenced objects - you only evaluate your
view, not any underlying objects. Also, note that the first select over a
database link in a session does more roundtrips than subsequent operations.
Thus if you've got a web application which always spawns a new session, the
overhead could come from there..

Tanel.

Tanel Poder

Sep 10, 2003, 3:42:36 PM
Here's a little proof for my post (that selecting from a view in the remote database which joins some tables causes fewer SQL*Net roundtrips than doing the join directly, without a view):
 
C:\Work\Oracle>sqlplus "admin/admin"
 
SQL*Plus: Release 9.2.0.4.0 - Production on K Sep 10 22:15:25 2003
 
Copyright (c) 1982, 2002, Oracle Corporation.  All rights reserved.
 

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 - Production
 
SQL> create table t1 (id number, name varchar(10));
 
Table created.
 
SQL> create table t2 (id number, name varchar(10));
 
Table created.
 
I create a database link which actually references my own schema (a loopback), but it will do the job.
 
SQL> create database link l connect to admin identified by admin using 'orcl';
 
Database link created.
 
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                         0
 
SQL> select t1.name name1, t2.name name2 from t1@l, t2@l where t1.id = t2.id;
 
no rows selected
 
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        12
 
The first select against these 2 tables causes 12 SQL*Net roundtrips.

SQL> select t1.name name1, t2.name name2 from
t1@l, t2@l where t1.id = t2.id;
 
no rows selected
 
SQL>
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        17
 
The second one causes only 5 (17 - 12).
 
SQL> select t1.name name1, t2.name name2 from t1@l, t2@l where t1.id = t2.id;
 
no rows selected
 
SQL>
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        22
 
The third and any subsequent executions also take 5 roundtrips (22 - 17), so the cost of this kind of operation is about 5 roundtrips per query.
Now let's create a view on the remote server which does exactly the same join (since I made a loopback dblink to my own server, I am both the local and the remote server here, though using different sessions).
 
SQL> create view v as select t1.name name1, t2.name name2 from t1, t2 where t1.id = t2.id;
 
View created.
 
SQL>
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        23
 
Note that after creating the view, SQL*Net roundtrips increased by one. Why? Because when you select anything over a database link, a distributed transaction is started, and since "create view" is a DDL command, it issues an implicit commit, thus causing an acknowledgement to be sent to the remote server. You can verify this from v$transaction when you've run a select over a dblink.
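
(As a side note, V$DBLINK in the selecting session should also show whether the open link currently sits inside a transaction - roughly like this:)

  -- open links in this session and their transaction state
  SELECT db_link, logged_on, open_cursors, in_transaction
    FROM v$dblink;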
 
SQL> select name1, name2 from v@l;
 
no rows selected
 
SQL>
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        27
 
When selecting exactly the same "data" through the view, the first execution adds only 4 roundtrips (23 -> 27):
 
SQL> select name1, name2 from v@l;
 
no rows selected
 
SQL>
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        30
 
SQL> select name1, name2 from v@l;
 
no rows selected
 
SQL>
SQL> select n.name, s.value from v$sesstat s, v$statname n where n.statistic# = s.statistic#
  2  and s.sid = (select sid from v$mystat where rownum < 2)
  3  and n.name like 'SQL*Net roundtrips to/from dblink';
 
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
SQL*Net roundtrips to/from dblink                                        33
 
SQL> select name1, name2 from v@l;
 
no rows selected
 
And at 3 it remains - each subsequent select through the view costs 3 roundtrips, versus 5 for the direct join. So the number of SQL*Net roundtrips (and thus waits) went down 40% for this simple pair of tables.
 
Cheers,
Tanel.
 

Brian Peasland

Sep 10, 2003, 4:27:13 PM
And this is part of what I was referring to when I said that one needs
to tune the query. In a distributed query, there is the part that gets
executed locally and the part that gets executed remotely. One needs to
tune both the local and the remote ends of the query. In your case, the
join was quicker when done remotely than when bringing the data back and
joining it locally.

Cheers,
Brian

--

NetComrade

Sep 11, 2003, 12:22:37 PM
On Wed, 10 Sep 2003 18:40:15 GMT, Brian Peasland
<dba@remove_spam.peasland.com> wrote:

>> The queries are fine (not pulling whole tables)
>
>Are you sure?

I wouldn't say it if I wasn't :)
The tables that the distributed queries join against are large
(millions or tens of millions of rows), and the queries are executed very,
very frequently. The explain plans always show the 'remote' part of the
query using a where clause on an indexed column. Such queries on the
'master' database have been looked at, and require no further tuning.

> Have you taken the query that is sent to the remote
>database and tuned it on that remote database? What percentage of your
>data are you pulling across your database link as opposed to the amount
>of data you actually need on the local database?

Are you talking about a percentage within a query, or a percentage of
overall data?
Some queries are executed completely on the remote (master) site,
therefore 100% of the data is pulled; some queries are 50-50, some are
70-30 (local-remote). As for the overall amount of data, I would guess
about 10-15 percent of the data is pulled across the link. This is an
"OLTP" server; 99% of queries take less than .1 seconds to execute.

> Maybe its a good idea
>to do a full table scan instead of an index lookup! One never knows for
>sure until they tune the query on the remote database.

I don't think pulling any of the tables across the link is going to be
beneficial. In rare cases when such queries creep into production,
they cause havoc on the 'master' db. Distributed queries are very
difficult to tune; sometimes they require a subquery to use the index,
sometimes a join works better.
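
(One thing worth experimenting with is the DRIVING_SITE hint, to push the
join to whichever side owns the big table - a made-up example, not one of
our real queries:)

  SELECT /*+ DRIVING_SITE(r) */ l.order_id, r.status
    FROM local_orders l, order_status@masterdb r
   WHERE l.order_id = r.order_id
     AND l.created > SYSDATE - 1;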

>
>> The network is fine (running at about half the capacity)
>
>Just because a network runs at 50% capacity or 10% capacity does not
>necessarily mean that data is moving fast across the network. Granted,
>if the network is at 100% capacity, data will move slower....
>
>> Event                           Waits  Timeouts  Time(cs)  wait(ms)  Waits/txn
>> SQL*Net message from dblink  1,428,648       55   454,719         3        7.1
>
>This one event contributes approximately 50% of your total wait time.
>This event will only occur if you are waiting on communication from a
>remote database, initiated at your local database. i.e. a database link
>is involved here between two databases. Have you tuned your distributed
>query?

Yes, database link is obviously involved. See notes above. Queries are
fine.


>> db file sequential read 588,295 0 440,978 7 2.9
>
>You might want to see if this event can be tuned too since it accounts
>for nearly the other 50% of your total wait time.
>>

>> 454,719cs/100/60=75mins of wait time on a 10cpu machine
>> the file reads cannot really be tuned, unless i increase the buffer
>> cache a bit, but it's running at 99% hit rate.
>
>Why can't the file reads be tuned? You may be requesting more logical
>reads than you really need, which *may* translate to more physical reads
>than you need. The BCHR at 99% doesn't mean that this wait event is a
>waste of time to tune. This one event contributes 44% of your total wait
>time. It may be a good idea to investigate this further. Is there I/O
>contention on the disks or in the controllers? Are these disks fast
>enough? Your BCHR of 99% does not mean that you don't have I/O tuning
>(logical and/or physical) to do. That type of reasoning is a myth.
>

Unfortunately, increasing the buffer cache would bring only marginal
benefit. There is only a small set of data that is shared
between users, and it stays fairly static most of the day (and is cached
on the front ends anyway); the rest of the data belongs to individual web
users, or possibly small groups of users (groups of 10-20, out of
400K). I can try increasing the buffer cache, but we have no I/O wait
problem (disks are not overloaded; that same statspack shows
physical reads at 185.34 per second, on a 10-disk
stripe-mirror set (Sun A5200, via fibre hubs); there are 2 more
databases on the same disks, but they currently have very little
activity). The block size is 4K. That said, I'll try increasing it
anyway :), but I doubt this problem is going away until we upgrade the
disks (currently 10K RPM, under software RAID (Volume Manager)).

As for having too many 'logical' reads, that _might_ be the case for
some queries, but all queries are scrutinized all the time, and there
are no 'bad performers' currently (well, there is one, but that does
lots of memory reads - 20K or so - which don't translate into physical
reads; the rest are multi-joins, and have no more than 20-50 logical
reads, with a few exceptions that look at a few hundred rows).

NetComrade

Sep 11, 2003, 12:35:35 PM
On Wed, 10 Sep 2003 22:09:56 +0300, "Tanel Poder"
<change_to_m...@integrid.info> wrote:

>Hi!
>
>My first question would be, what's the problem? Is your application slow or
>are you just tuning everything what appears in statspack top-5 list? (In
>other words - CTD is a disease probably even worse than SARS, thousands of
>DBAs have left their lives and spend all their time tuning until they die
>(or the sama happens to all of their databases) ;)

I don't like wasting my time on tuning things that don't need to be
tuned :) The problem is that during peak times the load on the server
jumps to levels which I don't feel are acceptable (25 on a 10-CPU
machine), sometimes to the point where clients (the web app) cannot
connect to the db anymore. I don't know how much an hour of wait time
contributes to load, but all that context switching probably
contributes some significant CPU usage.


>Note that when you post a statspack top-5 output taken in some timeframe,
>you should also post the information how much CPU was used during this
>timeframe, this way we can see the ratio between service & wait times and
>recommend whether it'd be appropriate to concentrate on waits at all. There
>should be statistic called CPU used ... or something in SP report as well.

Here it is:
CPU used by this session        909,953 cs total        252.8 per second

>But anyway, your stats report shows that during this hour all of your
>sessions were spending totally 12,5% on waiting for data from dblink. You
>have about 7 waits per transaction and average wait lasts for 3ms. You have
>two options here, either reduce number of waits (thus reduce number of
>roundtrips to remote server by tuning sql etc or reduce the average duration
>of a wait).

That's what I am trying to do :)

>I can't help you with sql, but wait time "more data from dblink" generally
>breaks to two: 1) network latency(wait time) 2) query execution&fetching
>(service time).

'More data' would probably indicate pulling lots of data, and that
statistic is low:
SQL*Net more data from dblink      2,554    0      480     2     0.0

Queries are tuned, see my other post.

>1) network latency can be tuned using TCP parameters, using faster protocol
>(named pipes in windows maybe), using dedicated network, using faster
>network, etc..

The databases are on different physical machines. I'll be happy to use
any other protocol that can go over the ethernet and work faster than
TCP.
The latency doesn't seem to be an issue:
41 packets transmitted, 41 packets received, 0% packet loss
round-trip (ms) min/avg/max = 0/0/2
Bandwidth is not an issue, we monitor it via MRTG. Not sure if GigE is
going to help..


>If you're joining tables in remote location, then create views with required
>join conditions there and select from them instead of drectly from tables.
>That way you can save sqlnet roundtrips required for checking validity (or
>whatever) of referenced objects. That way you only evaluate your view, not
>any underlying objects. Also, note that selecting the first time over
>database link in a session does more roundtrips than subsequent operations.
>Thus if you got a web application which always spawns a new session, the
>overhead could come there..

Interesting suggestion.. I hadn't thought about it (about creating
views for joined-remote queries). Unfortunately this one would require
some application changes, but I'll definitely take it into
consideration.
As for your second point, yes, that is a bit of a problem for us. Although
our web application doesn't connect on every request, it does
disconnect and reconnect after N page requests, something we are
working to eliminate.
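
(A crude way to gauge that churn is to watch the cumulative logon count
between two points in time - a generic sketch:)

  SELECT name, value
    FROM v$sysstat
   WHERE name IN ('logons cumulative', 'logons current');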

NetComrade

Sep 11, 2003, 2:08:46 PM
Good stuff..
I just want to confirm you're saying that:

SQL*Net roundtrips to/from dblink 1,429,402 397.1/sec 7.1/tran
contribute to


SQL*Net message from dblink 1,428,648 55 454,719 3 7.1

I can think of at least two frequently executed queries that do a
join and would benefit from this, but I doubt it's going to help
significantly.

Thanks.

On Wed, 10 Sep 2003 22:42:36 +0300, "Tanel Poder"
<change_to_m...@integrid.info> wrote:


>Here's a little proof for my post (that selecting from a view in the remote
>database which joins some tables causes fewer SQL*Net roundtrips than doing
>the join directly, without a view):
>

.......
