
Slower performance after enabling async io on Oracle Linux


Matthias Hoys

May 14, 2012, 5:03:56 AM

Hello,

Platform is Oracle Linux 6.2 64-bit with Oracle 11.2.0.3 64-bit.

After enabling async io, I see slower performance for data pump exports, RMAN backups and long-running queries (between 10 and 15% slower).

I enabled async io with:

ALTER SYSTEM SET FILESYSTEMIO_OPTIONS=SETALL SCOPE=SPFILE;
aio-nr is 19052 and aio-max-nr is 1048576. Storage is on iSCSI.
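
For reference, those numbers come straight from /proc, and the instance parameters can be double-checked after the restart; roughly:

# kernel-wide async I/O bookkeeping (the values quoted above live here)
cat /proc/sys/fs/aio-nr        # AIO requests currently allocated
cat /proc/sys/fs/aio-max-nr    # system-wide limit

# verify the instance parameters once the spfile change is active
sqlplus -S / as sysdba <<'EOF'
show parameter filesystemio_options
show parameter disk_asynch_io
EOF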

Anyone else with the same experience? Is extra tuning for async io possible on Oracle Linux?

Thanks,
Matthias Hoys

joel garry

May 14, 2012, 12:17:31 PM

Well, it could simply be squeezing the bottleneck balloon, where
something goes faster and consequently something depending on it slows
down (for example, more CPU required to service more I/O). A hugepages
misconfiguration could have strange consequences simply from wasted
memory, too.

Also see, for example, http://blog.tanelpoder.com/2010/11/23/asynch-descriptor-resize-wait-event-in-oracle/
which is fixed by your patch level, but the Linux/Oracle async interaction
is so delicate that you may be seeing something the patch doesn't cover.
Any errors like
http://www.freelists.org/post/oracle-l/WARNINGio-submit-failed-due-to-kernel-limitations-MAXAIO-for-process128-pending-aio127,2
?

jg
--
@home.com is bogus.
http://www.utsandiego.com/news/2012/may/10/hackers-taking-aim-small-medium-businesses/

Mladen Gogala

May 14, 2012, 8:52:33 PM

Matthias, the problem may just as well lie on the network side. Check
netstat -s for timeouts, retransmits or packets being dropped. Depending
on the hardware, you may want to use jumbo frames and increase the rmem
and wmem parameters (some concrete commands follow after the links).
Below are two articles about tuning the network:

http://fasterdata.es.net/host-tuning/linux/
http://www.ibm.com/developerworks/linux/library/l-adfly/index.html
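
The kind of commands I mean, as a rough sketch (the interface name is only a placeholder and the buffer sizes are examples, not recommendations):

# quick look for trouble in the TCP counters
netstat -s | egrep -i 'retrans|timeout|drop|fail'

# current TCP buffer limits and the system-wide ceilings
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max net.core.wmem_max

# example values only -- raise the ceilings before widening tcp_rmem/tcp_wmem
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# jumbo frames: every hop on the iSCSI path must support the larger MTU
ip link set dev eth1 mtu 9000    # eth1 stands in for the iSCSI NIC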

The problem with Linux is that it is not well instrumented: there is no
wait event interface, so you cannot use the same techniques as with the
database. You can use SystemTap to explore the performance of the network
layer and figure out what is taking so long.
Also, your iSCSI traffic hopefully doesn't have to go through any
firewall? Packet inspection takes time, just as it does at the customs
office.

So, if you want a recipe: measure the duration of the system calls with
strace and SystemTap, check netstat -s, and make sure that your network
is optimally configured.
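
For the strace part, a minimal sketch (the PID is the Oracle server process of the session under test; -c prints its summary when you interrupt it):

# summarize time spent per system call for one shadow process
strace -c -p <pid>

# or watch the async I/O calls themselves, with timestamps and durations
strace -T -tt -e trace=io_submit,io_getevents -p <pid>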



--
http://mgogala.byethost5.com

Mladen Gogala

May 14, 2012, 8:58:49 PM

On Tue, 15 May 2012 00:52:33 +0000, Mladen Gogala wrote:

> Matthias, the problem may just as well lie on the network side.

BTW, can you test I/O against a local disk? Does it exhibit the same
slowdown or not? If not, your culprit is the network. It can be anything
from a bad configuration, a faulty port, adapter or cable, or even some
forgotten firewall between the iSCSI initiator and the server. Let us
know what the problem was once you solve the mystery.
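
Even something as crude as dd tells you a lot; a sketch with placeholder device names (iflag=direct bypasses the page cache, so you measure the device rather than the cache):

# local disk, if you have one
dd if=/dev/sda of=/dev/null bs=1M count=1024 iflag=direct

# the iSCSI LUN, for comparison
dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct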



--
http://mgogala.byethost5.com

Matthias Hoys

May 15, 2012, 9:01:51 AM

On Monday, May 14, 2012 6:17:31 PM UTC+2, joel garry wrote:
> Well, it could be simply squeezing the bottleneck balloon, where
> something goes faster and consequently something depending on it slows
> down (for example, more cpu required to service more I/O). Some
> hugepages misconfiguration could have strange consequences simply from
> wasted memory, too.
>
> Also see, for example, http://blog.tanelpoder.com/2010/11/23/asynch-descriptor-resize-wait-event-in-oracle/
> fixed by your patch, but the linux/oracle async interaction is so
> delicate, you may be seeing something unfixed by the patch. Any
> errors like
> http://www.freelists.org/post/oracle-l/WARNINGio-submit-failed-due-to-kernel-limitations-MAXAIO-for-process128-pending-aio127,2
> ?
>
> jg
> --
> @home.com is bogus.
> http://www.utsandiego.com/news/2012/may/10/hackers-taking-aim-small-medium-businesses/

Using vmstat/top, I don't see anything that would indicate a CPU bottleneck. I didn't configure hugepages, since this is a small Linux box with only a 2 GB SGA.
When async io is enabled, I see only a few "asynch descriptor resize" waits. When filesystemio_options is set to NONE and disk_asynch_io is set to TRUE, the following warning appears in the archive logging trace files:
"Log read is SYNCHRONOUS though disk_asynch_io is enabled!"
There are no "maxaio kernel limitation" errors...
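
For anyone who wants to check the same thing, the instance-wide counters are in v$system_event; something like:

sqlplus -S / as sysdba <<'EOF'
select event, total_waits, time_waited
from   v$system_event
where  event in ('asynch descriptor resize', 'db file sequential read');
EOF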

Matthias


Matthias Hoys

May 15, 2012, 9:16:46 AM

On Tuesday, May 15, 2012 2:52:33 AM UTC+2, Mladen Gogala wrote:
> Matthias, the problem may just as well lie on the network side. Check netstat
> -s for timeouts, retransmits or packets being dropped. Depending on the
> hardware, you may want to use jumbo frames and increase the rmem and wmem
> parameters. Below are two articles about tuning the network:
>
> http://fasterdata.es.net/host-tuning/linux/
> http://www.ibm.com/developerworks/linux/library/l-adfly/index.html
>
> Problem with Linux is that it is not well instrumented, there is no wait
> event interface so you cannot use the same technique as with the
> database. You can use systemtap to explore the performance of the network
> layer and figure out what is taking so long.
> Also, hopefully your iSCSI traffic doesn't have to go through any
> firewall? Packet inspection can take time, just as with the customs
> office.
>
> So, if you want a recipe, measure the duration of the system calls by
> using strace and systemtap. Check netstat -s, make sure that your network
> is optimally configured.
>

Using netstat -s, I think I found something:

Tcp:
1168305 active connections openings
264105 passive connection openings
910210 failed connection attempts

I'm no network specialist, but that seems like a huge number of "failed connection attempts"? Or is this normal for iSCSI? I have never worked with iSCSI storage before...

Btw: we don't have a local disk; everything is on VMware and the disks are on the iSCSI box.

Thanks for the help,
Matthias Hoys

Matthias Hoys

May 15, 2012, 10:07:47 AM

Update: the Oracle Listener is causing all the "failed connection attempts". When I stop it, there are no more failed connection attempts. I'm now debugging this further...
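
For the record, this is roughly how the counter can be tied to a process (the -p column needs root to show other users' programs):

# watch the failed-connection counter move
watch -d 'netstat -s | grep -i "failed connection"'

# see which process owns the half-open (SYN_SENT) connections
netstat -tnp | grep -i syn_sent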

Matthias

Matthias Hoys

May 15, 2012, 10:30:17 AM

Problem solved: I saw the following in the listener.log file: "WARNING: Subscription for node down event still pending". The solution is to add "SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF" to the listener.ora file and restart the listener. Apparently the listener keeps trying to subscribe to node down events from a RAC service, but this is a non-RAC installation. This was already a known issue with Oracle 10g. Way to go, Oracle ;-) Let's see if this improves the async io performance.
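
For completeness, the change boils down to something like this (assuming the default listener name LISTENER and the usual $ORACLE_HOME/network/admin location):

# the parameter name carries the listener name, LISTENER in this case
echo 'SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF' >> $ORACLE_HOME/network/admin/listener.ora

# bounce the listener to pick it up
lsnrctl stop
lsnrctl start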

Matthias Hoys

joel garry

May 15, 2012, 12:15:51 PM

lol, I ran into that on HP-UX, but since you are on Linux, I had no clue
to offer. The symptoms and async i/o are different, but the message is
the same.

jg
--
@home.com is bogus.
http://www.pcworld.com/businesscenter/article/253980/oracle_considered_buying_rim_and_palm_ellison_says.html
http://tkyte.blogspot.com/2005/09/lefties-unite.html#112653167997033408

Mladen Gogala

May 15, 2012, 2:05:53 PM

On Tue, 15 May 2012 07:30:17 -0700, Matthias Hoys wrote:

> Problem solved: I saw the following in the listener.log file: "WARNING:
> Subscription for node down event still pending". The solution is to add
> "SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF" to the listener.ora file,
> and restart the listener. Apparently, the listener service constantly
> tries to contact a RAC service, but this is a non-RAC installation. It
> was already a known issue with Oracle 10g. Way to go, Oracle ;-) Let's
> see if this improves the async io performance.
>
> Matthias Hoys


Please, let us know what the outcome is. I am really curious now.





--
http://mgogala.byethost5.com

Mladen Gogala

May 16, 2012, 12:32:59 AM

On Tue, 15 May 2012 09:15:51 -0700, joel garry wrote:


> lol, I ran into that on hp-ux, but since you are linux, I had no clue to
> say anything. The symptoms and async i/o are different, but the message
> is the same.

iSCSI has two components: disks and network. Who has ever heard of the
network causing trouble?



--
http://mgogala.byethost5.com

Matthias Hoys

May 16, 2012, 4:32:51 AM

On Tuesday, May 15, 2012 8:05:53 PM UTC+2, Mladen Gogala wrote:
>
> Please, let us know what the outcome is. I am really curious now.
>

It doesn't make any difference. The sysadmin checked everything between the guest/host and the iSCSI box and didn't see anything wrong.
I have now analyzed a long-running query (> 5 minutes) with a level 8 10046 trace.
This is with async io enabled (disk_asynch_io=TRUE, filesystemio_options=SETALL):

Elapsed times include waiting on following events:
  Event waited on                            Times   Max. Wait  Total Waited
  ----------------------------------------  Waited  ----------  ------------
  Disk file operations I/O                       3        0.00          0.00
  db file sequential read                   143264        0.78        242.64
  latch: cache buffers lru chain                 2        0.00          0.00
  latch: object queue header operation           2        0.00          0.00
  log file switch completion                     3        0.06          0.15
  db file scattered read                      2333        0.07          4.93
  asynch descriptor resize                       1        0.00          0.00
  log buffer space                               3        0.13          0.25
  direct path read temp                         97        0.00          0.32
  reliable message                              67        0.00          0.03
  latch: cache buffers chains                    1        0.00          0.00

11 user SQL statements in session.
6 internal SQL statements in session.
17 SQL statements in session.
********************************************************************************
1 session in tracefile.
11 user SQL statements in trace file.
6 internal SQL statements in trace file.
17 SQL statements in trace file.
13 unique SQL statements in trace file.
5560145 lines in trace file.
716 elapsed seconds in trace file.

This is with async io disabled (disk_asynch_io=FALSE, filesystemio_options=NONE):

Elapsed times include waiting on following events:
  Event waited on                            Times   Max. Wait  Total Waited
  ----------------------------------------  Waited  ----------  ------------
  Disk file operations I/O                       4        0.00          0.00
  db file sequential read                   166147        2.21         84.01
  db file scattered read                      2406        1.74          7.05
  undo segment extension                         2        0.01          0.01
  log file switch completion                     2        0.02          0.05
  log buffer space                               4        2.86          2.92
  latch free                                     1        0.00          0.00
  db file parallel read                          2        0.15          0.19
  direct path write temp                       577        0.53          2.46
  latch: shared pool                            11        0.00          0.00
  direct path read temp                        577        0.01          0.54
  latch: object queue header operation           1        0.00          0.00

11 user SQL statements in session.
6 internal SQL statements in session.
17 SQL statements in session.
********************************************************************************
1 session in tracefile.
11 user SQL statements in trace file.
6 internal SQL statements in trace file.
17 SQL statements in trace file.
13 unique SQL statements in trace file.
5583431 lines in trace file.
409 elapsed seconds in trace file.

So, you see, turning off async io drastically improves performance in my situation (409 versus 716 seconds). I'm going to leave it off for now, and maybe create a service request on MOS to see if they can help. I couldn't find a lot on OTN.
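
For reference, the traces above came from the usual 10046/tkprof routine, roughly like this (credentials and file names are placeholders):

sqlplus -S scott/tiger <<'EOF'
alter session set events '10046 trace name context forever, level 8';
-- run the long-running query here
alter session set events '10046 trace name context off';
EOF

# format the raw trace file from the diag trace directory
tkprof ORCL_ora_12345.trc query_trace.txt sys=no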

Matthias Hoys

Mladen Gogala

May 16, 2012, 8:41:32 AM

On Wed, 16 May 2012 01:32:51 -0700, Matthias Hoys wrote:
>
> So, you see, turning off async io drastically improves performance in my
> situation (409 versus 716 seconds). I'm going to leave it off for now,
> and maybe create a service request on MOS to see if they can help. I
> couldn't find a lot on OTN.
>
> Matthias Hoys

Yup, with the "async" option you have 242.64 seconds waited on single
block (sequential) reads and 4.93 on scattered reads, while without async
the time for single block I/O is 84.01 and for scattered reads 7.05. The
worst problem is the single block I/O; the scattered reads are even
slightly faster than in the synchronous version. If I am allowed to
guess, the problem is caching. You may want to turn off direct I/O, leave
just async I/O, and set the read-ahead on the block devices with blockdev
--setra; a sketch follows below. As I have said previously, async I/O on
Linux leaves a lot to be desired, especially when the network is involved.
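
A sketch of the blockdev part, with a placeholder device name (the value is in 512-byte sectors):

# show the current read-ahead for the device backing the datafiles
blockdev --getra /dev/sdb

# raise it, e.g. to 4 MB (8192 * 512 bytes); not persistent across reboots
blockdev --setra 8192 /dev/sdb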



--
http://mgogala.byethost5.com

Matthias Hoys

May 16, 2012, 9:55:18 AM

On Wednesday, May 16, 2012 2:41:32 PM UTC+2, Mladen Gogala wrote:
> Yup, with the "async" option, you have time waited for sequential read as
> 242.64 and for scattered read as 4.93 while without async the time for
> single block IO is 84.01 and for scattered read 7.05. The worst problem
> is for single block IO, while scattered read is even slightly faster than
> the synchronous version. If I am allowed to guess, the problem is
> caching. You may want to turn off direct IO and just leave async IO and
> to set the read ahead on the block devices, with blockdev --setra.
> As I have previously said, async I/O on Linux leaves a lot to be desired,
> especially when the network is involved.
>

You are right. It was the direct I/O that was causing the problems.
I have now enabled only async I/O, by setting filesystemio_options to ASYNCH and disk_asynch_io to TRUE, and performance is now about 10% faster than with everything disabled:

Elapsed times include waiting on following events:
  Event waited on                            Times   Max. Wait  Total Waited
  ----------------------------------------  Waited  ----------  ------------
  db file sequential read                   154100        1.81         62.99
  reliable message                             138        0.00          0.06
  log file switch (checkpoint incomplete)        2        0.02          0.04
  log buffer space                               8        4.06          8.41
  latch: cache buffers chains                    1        0.00          0.00
  latch: cache buffers lru chain                 5        0.00          0.00
  write complete waits                           1        0.96          0.96
  db file parallel read                       1431        0.03          1.34
  db file scattered read                      2589        0.02          3.51
  asynch descriptor resize                       1        0.00          0.00
  log file switch completion                     2        0.03          0.06
  library cache: mutex X                         1        0.00          0.00
  latch: object queue header operation           2        0.00          0.00
  SQL*Net break/reset to client                  2        0.00          0.00

1 session in tracefile.
11 user SQL statements in trace file.
15 internal SQL statements in trace file.
26 SQL statements in trace file.
21 unique SQL statements in trace file.
5576643 lines in trace file.
355 elapsed seconds in trace file.

Nice one :-)

Thanks,
Matthias

Mladen Gogala

May 16, 2012, 5:04:46 PM

On Wed, 16 May 2012 06:55:18 -0700, Matthias Hoys wrote:

> Nice one
>
> Thanks, Matthias

I would still recommend making the read and write network buffers larger
and switching to jumbo frames.



--
http://mgogala.freehostia.com