Large pages

Mladen Gogala

Nov 11, 2012, 2:01:17 AM

There is a feature on Windows 2008 (64bit) called "large page
support", completely analogous to huge pages on Linux:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa366720(v=vs.85).aspx
http://www.stanford.edu/dept/itss/docs/oracle/10g/win.101/b10113/ap_64bit.htm#CHDGFJJD

I've built such an instance only once, and I no longer work for the
company I built it for. Does anybody else here use it? What have your
experiences been? The subject has come up on a discussion forum.

--
http://mgogala.byethost5.com

Noons

Nov 11, 2012, 6:32:17 AM

Not on Windows, but on Aix and Linux I would not use anything else,
particularly on today's servers with very large memory sizes.

Jonathan Lewis

Nov 11, 2012, 4:51:43 PM

"Mladen Gogala" <gogala...@gmail.com> wrote in message
news:k7nift$rc3$2...@solani.org...

Not used it for Windows - but I wonder how relevant (or not) it might be
when compared to the Unix variant. The commonly stated problem relates to
the size of the memory map - along the lines of one 8-byte pointer for
each 8KB page.

In Unix (unless you use intimate shared memory) you get one map per
process - so an Oracle system with 1,000 processes would end up using as
much memory for maps of the SGA as it would on the SGA itself if it were
using standard pages. In Windows Oracle runs as a single process with
multiple threads - does this mean it needs only one map with the result
that standard pages don't present the same memory threat ?
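A back-of-the-envelope sketch of the arithmetic above, assuming one 8-byte entry per 8KB page and one private map per process (the premise of the post, which is disputed later in the thread). The 8GB SGA is a hypothetical figure chosen only for illustration.

```python
def page_table_bytes(sga_bytes: int, page_bytes: int, entry_bytes: int) -> int:
    """Bytes of translation entries needed to map sga_bytes once."""
    pages = -(-sga_bytes // page_bytes)  # ceiling division
    return pages * entry_bytes


GB = 1024 ** 3
MB = 1024 ** 2
sga = 8 * GB      # hypothetical SGA size
processes = 1000  # process count used in the post

one_map = page_table_bytes(sga, 8 * 1024, 8)  # 8-byte entry per 8KB page
all_maps = one_map * processes                # one private map per process
print(f"one map: {one_map / MB:.0f} MB")
print(f"{processes} private maps: {all_maps / GB:.2f} GB for an SGA of {sga / GB:.0f} GB")
```

At 8 bytes per 8KB page each private map costs 1/1024 of the SGA, so around a thousand private maps add up to roughly the size of the SGA again.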

Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com/all-postings

Author: Oracle Core (Apress 2011)
http://www.apress.com/9781430239543

Mladen Gogala

Nov 11, 2012, 7:24:26 PM

On Sun, 11 Nov 2012 21:51:43 +0000, Jonathan Lewis wrote:

> Not used it for Windows - but I wonder how relevant (or not) it might be
> when compared to the Unix variant. The commonly stated problem relates
> to the size of the memory map - along the lines of having an 8 byte
> pointer for an 8KB page.
>
> In Unix (unless you use intimate shared memory) you get one map per
> process - so an Oracle system with 1,000 processes would end up using as
> much memory for maps of the SGA as it would on the SGA itself if it were
> using standard pages. In Windows Oracle runs as a single process with
> multiple threads - does this mean it needs only one map with the result
> that standard pages don't present the same memory threat ?

Yup, that's precisely why I ask. Windows does not have the SVR4 (System V)
IPC mechanisms, probably because it does not belong to that family of
operating systems and has a completely different architecture. I played
with that once, but I no longer have access to the database I built,
so I am asking around.

--
http://mgogala.byethost5.com

Noons

Nov 11, 2012, 11:34:06 PM

On Nov 12, 11:24 am, Mladen Gogala <gogala.mla...@gmail.com> wrote:

> > In Unix (unless you use intimate shared memory) you get one map per
> > process - so an Oracle system with 1,000 processes would end up using as
> > much memory for maps of the SGA as it would on the SGA itself if it were
> > using standard pages. In Windows Oracle runs as a single process with

Actually, that is not correct. The problem has to do with the memory
map for ALL physical memory; the number of processes using that
virtual memory is irrelevant.

> Yup, that's precisely why I ask. Windows do not have SYSVR4 IPC
> mechanisms, probably because they do not belong to that family of
> operating systems, and have completely different architecture. I played
> with that once, but I no longer have access to the database I have built,
> so I am asking around.

Should be exactly the same. The problem has to do with mapping
virtual to physical memory for very large physical memory sizes. At
4K per map entry, a memory size of, say, 50GB will easily force a page
table with in excess of 10 million entries, which have to be
searched and resolved by the virtual memory firmware as well as the
OS, for normal memory management as well as statement execution and
branching.
It's got nothing to do with the number of processes. The entries in
the page table map physical addresses to virtual addresses; how many
processes use those virtual addresses is immaterial.
The total size of the mapped physical memory does not change whether a
process is using it or not.
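The entry count above can be checked with a few lines. This sketch assumes 1GB = 1024^3 bytes and one entry per page, and ignores how any particular OS or CPU actually structures its translation tables.

```python
def entries_needed(memory_bytes: int, page_bytes: int) -> int:
    """Translation entries needed to cover memory_bytes with pages of page_bytes."""
    return -(-memory_bytes // page_bytes)  # ceiling division


GB = 1024 ** 3
memory = 50 * GB  # the 50GB example from the post

for label, size in [("4K", 4 * 1024), ("64K", 64 * 1024), ("16M", 16 * 1024 ** 2)]:
    print(f"{label:>4} pages: {entries_needed(memory, size):>12,} entries")
```

With 4K pages this comes to 13,107,200 entries, against 3,200 with 16MB pages.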

Jonathan Lewis

Nov 13, 2012, 5:53:26 AM

|"Noons" <wizo...@gmail.com> wrote in message
news:10f49835-b336-4d9a...@q5g2000pbk.googlegroups.com...

|On Nov 12, 11:24 am, Mladen Gogala <gogala.mla...@gmail.com> wrote:
|
|> > In Unix (unless you use intimate shared memory) you get one map per
|> > process - so an Oracle system with 1,000 processes would end up using as
|> > much memory for maps of the SGA as it would on the SGA itself if it were
|> > using standard pages. In Windows Oracle runs as a single process with
|
|Actually, that is not correct. The problem has to do with the memory
|map for ALL physical memory, the number of virtual memory processes
|using it is irrelevant.

Noons,

I can't work out from your statement exactly which bit of what I've said is
not correct. Could you please clarify.

The "memory map" certainly maps logical addresses to physical memory
addresses - easy enough to see (if you know the answer) when you look at
some of the x$ objects and see wildly different addresses for objects - for
example, Oracle maps the PGA for a process into an entirely different set
of (logical) addresses from the SGA - to the extent that the (logical) gap
between memory addresses can be far larger than the physical memory
available on the machine.

But, depending on the O/S and the choice of O/S parameters, the page size
used in the mapping may vary, each process may have its own map, or all
processes that attach to the same physical memory may share a single map
of the memory (e.g. Solaris Intimate Shared Memory). I wrote a note about
this a couple of years ago highlighting a surprise side effect
(http://jonathanlewis.wordpress.com/2010/06/23/memory/ ); more
significantly, Christo Kutrovsky (of Pythian) posted a video of a
presentation demonstrating the various effects (and how to measure them)
when configuring Linux
(http://www.pythian.com/news/741/pythian-goodies-free-memory-swap-oracle-and-everything/ ).

--

Mladen Gogala

Nov 13, 2012, 5:56:53 AM

On Tue, 13 Nov 2012 10:53:26 +0000, Jonathan Lewis wrote:

> I can't work out from your statement exactly which bit of what I've said
> is not correct. Could you please clarify.

Jonathan, the part that is not correct is about every process on Unix
having its own page table. Page tables for shared memory are also shared.
There is one per segment, not one per process.

--
http://mgogala.byethost5.com

Jonathan Lewis

Nov 13, 2012, 12:24:32 PM

"Mladen Gogala" <gogala...@gmail.com> wrote in message

news:k7t91l$l4e$1...@solani.org...

Mladen,

The reason that I referenced Christo's video and notes is that he has
results that agree with my comment and contradict yours. From the text:
<quote:>
As discussed earlier, each process has a page table. This page table is
private for the process and cannot be shared. (Solaris is different in this
respect.)

In Oracle, there is usually a large shared memory segment shared amongst
multiple processes. Each process still has a page table that is maintained.
For example, for a 1.7 Gb SGA (the typical 32-bit limit), 445,440 x 4 Kb
pages are needed. We would need 445,440 leaf PTE entries times 4 bytes
each - that's about 2 Mb. Each process would need a PTE table that is 2 Mb
in size to fully describe its 2 Gb of mappings. If you have a large number
of processes, say 1000, then you will need 2000 Mb of RAM to manage a 1.7
Gb SGA. Quite inefficient.

<end quote>
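Redoing the arithmetic in the quoted text without rounding, using only its own figures (445,440 pages of 4KB, 4-byte leaf PTEs, 1000 processes, one private page table each as the quote assumes):

```python
def quoted_figures(pages: int, page_bytes: int = 4096, pte_bytes: int = 4,
                   processes: int = 1000) -> tuple[float, float, float]:
    """Return (SGA size, page table per process, page tables for all processes) in MB."""
    mb = 1024 ** 2
    sga_mb = pages * page_bytes / mb
    per_process_mb = pages * pte_bytes / mb
    return sga_mb, per_process_mb, per_process_mb * processes


sga_mb, per_mb, total_mb = quoted_figures(445_440)
print(f"SGA: {sga_mb:.0f} MB, per process: {per_mb:.2f} MB, "
      f"1000 processes: {total_mb:.0f} MB")
```

That gives about 1.7 MB per process and about 1.7 GB for 1000 processes (the quote rounds these to 2 Mb and 2000 Mb), i.e. roughly the size of the 1740 MB SGA itself.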

However,
a) he does mention Solaris and intimate shared memory - and explains that
shared page tables are possible
b) the article is dated Dec 2007 - so things may have changed

Do you have any specific versions of Unix in mind when you state that the
page table for a shared memory segment is automatically shared ? Is this
perhaps a specific default for OEL ?

Noons

Nov 13, 2012, 9:38:31 PM

On Nov 13, 9:53 pm, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:

> I can't work out from your statement exactly which bit of what I've said is
> not correct. Could you please clarify.

You said:
>In Unix (unless you use intimate shared memory) you get one map per
>process - so an Oracle system with 1,000 processes would end up using as
>much memory for maps of the SGA as it would on the SGA itself if it were
>using standard pages.

That is incorrect. In Unix there is no such thing as a memory map per
process.
The largepages feature has nothing to do with the memory of each
process.
"Memory map" is an incorrect term to describe virtual memory translation
to physical addresses in the context of a process or group of
processes.

> The "memory map" certainly maps logical addresses to physical memory
> addresses - easy enough to see (if you know the answer) when you look at
> some of the x$ objects and see wildly different addresses for objects - for
> example, Oracle maps the PGA for a process into an entirely different set
> of (logical) addresses from the SGA - to the extent that the (logical) gap
> between memory addresses can be far larger than the physical memory
> available on the machine.

What memory Oracle allocates for each required chunk does not
constitute a memory map. There is no such thing as a process "memory
map". Memory allocation is not the same as memory mapping.

> But (depending on O/S and choice of O/S parameters the page size used in
> the mapping may vary, each process may have it's own map, or all processes
> that attach to the same physical memory may share a single map of the
> memory (e.g. Solaris Intimate Shared Memory).

Not quite correct. Page size is not an attribute of the process. It's
an attribute of the virtual memory management that is used to map a
given virtual address space to actual memory, whatever the process.

> I wrote a note about this a
> coupld of years ago highlighting a surprise side effect
> (http://jonathanlewis.wordpress.com/2010/06/23/memory/); more
> significantly Christo Kutrovsky (of Pythian) posted a video of a
> presentation demonstrating the various effects (and how to measure them)
> when configuring Linux.
> (http://www.pythian.com/news/741/pythian-goodies-free-memory-swap-orac... )

Yes, but what might happen in Linux - according to Christo - does not
define what Unix does: last time I looked they were NOT the same OS!

As I said: in Unix, largepages are not part of a process "memory map".
They are used to define a portion (NOT an address range, simply a
quantity) of physical memory that will be managed by virtual memory
pages of a certain size. It is perfectly possible - in fact almost
mandatory - to have multiple regions of physical memory dedicated to
different types of paging. In Aix for example, there are 3 sizes of
pages possible which can be used in various combinations by various
processes, singly or concurrently: 4K, 64K and 16M. In Aix 7.1 and P7
hardware there is a 4th size, but I won't go into that.

What one can do in Unix is allocate a certain amount of physical
memory to a certain type of paging. Once that is done, processes are
free to use that memory as native address space for execution, or as
attached segments of "shared" memory. As an example, it is perfectly
possible for a process using 4KB pages to attach a shared memory
segment that is managed - from the point of view of virtual-to-
physical addressing - as 16MB pages. Nothing to do with the process.
All processes attaching to that shared segment will use its pagesize
for those addresses, regardless of what they started with.

Here is the breakdown of such page sizes and how much physical memory
is managed by each pagesize in my Aix test system:

aubdc00-ora01t:sandy$vmstat -P ALL
System configuration: mem=16384MB
pgsz            memory                              page
----- -------------------------- ------------------------------------
            siz      avm      fre    re    pi    po    fr    sr    cy
   4K   1222496   341519   615634     0     0     0     0     0     0
  64K     32138    31995      143     0     0     0     0     0     0
  16M       600      376      224     0     0     0     0     0     0

Or in round numbers:
9GB of 16MB pages,
2GB of 64K pages and
5GB of 4K pages,
for a grand total, as indicated above, of around 16GB of physical
memory.
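The round numbers can be checked against the siz column of the vmstat -P ALL output above, which gives the number of pages of each size. A small sketch, taking 1GB as 1024^3 bytes:

```python
# label -> (page size in bytes, "siz" column from vmstat -P ALL above)
SIZ = {
    "4K": (4 * 1024, 1_222_496),
    "64K": (64 * 1024, 32_138),
    "16M": (16 * 1024 ** 2, 600),
}
GB = 1024 ** 3

total_bytes = 0
for label, (page_bytes, pages) in SIZ.items():
    region = page_bytes * pages  # physical memory managed with this page size
    total_bytes += region
    print(f"{label:>4} pages: {region / GB:5.2f} GB")
print(f"total: {total_bytes / GB:.2f} GB")
```

The three regions come to about 4.7GB, 2.0GB and 9.4GB, and they add up to exactly 16GB, matching mem=16384MB in the System configuration line.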

Note that there are 224 free 16MB pages above (fre column): this will
become relevant below.

Where each pagesize region of physical memory starts is entirely up to
the OS to manage. It will be mapped to virtual addresses anyway, not
relevant here. What is important to note is that the memory managed by
each type of pagesize is contiguous, whatever its quantity may be.

A program can map virtual memory to physical memory with various page
sizes, or it can use its entire address space in one type of page
size. For example, I can run sqlplus in 4K pagespace, using a SGA
entirely in 16MB pages.

See above for the pages active in 16MB: they are the SGA, and there
are 376 of them, with 224 spare not being used at the moment.

If I now set the environment variables that cause Aix to execute
sqlplus ENTIRELY in 16MB space, I'll get the following:
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2390433  604576   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  376  224
aubdc00-ora01t:sandy$export LDR_CNTRL=LARGE_PAGE_TEXT=Y@LARGE_PAGE_DATA=M
(from now on ALL programs in my session will be executing - BY DEFAULT
- entirely in 16MB pagesize - the column "flp" will measure how many
are free)
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2403604  620137   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  396  204
(notice how the free largepages - flp column - has dropped to 204?
That's vmstat itself running in largepages)
(Now for sqlplus:)
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:23:00 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2480102  600964   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  430  170
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2403607  603731   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  396  204
(Notice how flp dropped to 170 when sqlplus was running. And it's
back to 204 flp after I exit sqlplus and run vmstat in 16MB pages)
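Reading flp by eye gets tedious; a small helper can pull alp and flp out of a vmstat -l data row. This sketch assumes, as in the output above, that alp and flp are the last two whitespace-separated fields of the row; check the header on your own AIX level before relying on it.

```python
def large_page_counts(vmstat_row: str) -> tuple[int, int]:
    """Return (alp, flp) from a single vmstat -l data row.

    Assumes alp and flp are the last two whitespace-separated fields.
    """
    fields = vmstat_row.split()
    return int(fields[-2]), int(fields[-1])


# Data rows copied from the vmstat -l runs above
before = "2 1 2403604 620137 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 396 204"
during = "2 1 2480102 600964 0 0 0 0 0 0 10 4194 588 6 3 91 0 0.06 8.9 430 170"

alp0, flp0 = large_page_counts(before)
alp1, flp1 = large_page_counts(during)
print(f"free 16MB pages dropped by {flp0 - flp1} while sqlplus was running")
```

For the two snapshots above it reports a drop of 34 free 16MB pages.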

Now to prove my point, I'll turn off the largepages for the whole
process and show that sqlplus is NOT using up the largepages at all,
although of course it is attaching to an Oracle instance where the SGA
DOES use 16MB pages:
aubdc00-ora01t:sandy$unset LDR_CNTRL
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2390890  604159   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  376  224
(Notice how the flp has now popped back to 224, as initially - now
vmstat is running in the default 4K pages)
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:30:00 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2394755  600292   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  376  224
(and of course from inside sqlplus we can check vmstat and there is
the constant 224 flp again, as sqlplus is NOT using largepages
itself. Although it is attaching to a SGA that has been locked and is
using largepages)
SQL> sho parameter lock_sga
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     TRUE

Now, to prove that the SGA is indeed using the largepages, let's shut
down and start up, and check the flp column while we do that:
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2390933  604112   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  376  224
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:39:08 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1  782537  672407   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9    0  600
SQL> startup
ORACLE instance started.
Total System Global Area 6263357440 bytes
Fixed Size                  2233112 bytes
Variable Size            1073745128 bytes
Database Buffers         5167382528 bytes
Redo Buffers               19996672 bytes
Database mounted.
Database opened.
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr      memory               page              faults             cpu          large-page
----- --------------- ----------------------- ------------ ---------------------- ---------
 r  b     avm     fre  re  pi  po  fr  sr  cy  in   sy  cs  us sy id wa   pc   ec  alp  flp
 2  1 2386771  608244   0   0   0   0   0   0  10 4194 588   6  3 91  0 0.06  8.9  376  224
SQL>

See how the free large pages (flp) change as I shut down and restart?
That's the SGA in 16MB pages, while the rest of the programs,
libraries and OS are using 4K pages.
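As a cross-check, the SGA size reported by startup can be converted into 16MB pages. A sketch using only the figures from the startup output above:

```python
LARGE_PAGE = 16 * 1024 ** 2  # 16MB

# Component sizes in bytes, copied from the startup output above
components = {
    "Fixed Size": 2_233_112,
    "Variable Size": 1_073_745_128,
    "Database Buffers": 5_167_382_528,
    "Redo Buffers": 19_996_672,
}
sga_total = sum(components.values())
pages_needed = -(-sga_total // LARGE_PAGE)  # ceiling division

print(f"SGA total: {sga_total:,} bytes")
print(f"16MB pages needed: {pages_needed}")
```

The components add up to the reported 6,263,357,440 bytes, which needs 374 pages of 16MB; that is close to the 376 active large pages (alp) seen while the instance is up. The remaining couple of pages are not accounted for in the thread.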

What Christo might have found in Linux is SPECIFIC to Linux only, and
I strongly suspect to the specific version and flavour he was
running. Unix manages virtual memory in a different manner and there
are again small differences between Solaris, Aix, and others.

But the bottom line is: largepages (hugepages, or whatever we decide
to call them) have nothing to do with a per process count and ALL to
do with how much physical memory is addressed by each type of paging.

Processes simply use largepages or not, either for execution space or
as attached shared memory. Mixed page size spaces in a single program
are perfectly possible, precisely because the pagesize and how it maps
to physical memory is EXTERNAL to the entire per process memory
management.

All a process knows is that it uses (virtual) memory between numeric
addresses set by the OS, as in 0A000(hex) - 0FFFFF(hex). Those two
limits are set by the virtual memory management mechanism of the OS,
once, for each program. The CPU then maps those to physical addresses
using the page tables set up and managed by the OS, depending on memory
configuration.
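To make the virtual-to-physical split concrete: with a power-of-two page size, the low bits of a virtual address are the offset inside the page, and the remaining high bits are the page number the translation has to look up. This is a simplified sketch that ignores the actual page-table layout (which differs between, for example, POWER and x86), using the 0A000-0FFFFF (hex) example range from the post.

```python
def split_address(vaddr: int, page_bytes: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, offset within page).

    Simplified model: page_bytes must be a power of two.
    """
    if page_bytes <= 0 or page_bytes & (page_bytes - 1):
        raise ValueError("page size must be a power of two")
    shift = page_bytes.bit_length() - 1  # log2(page_bytes)
    return vaddr >> shift, vaddr & (page_bytes - 1)


low, high = 0x0A000, 0x0FFFFF  # example limits from the post
for size in (4 * 1024, 16 * 1024 ** 2):
    first, _ = split_address(low, size)
    last, offset = split_address(high, size)
    print(f"page size {size:>9}: pages {first:#x}..{last:#x} "
          f"({last - first + 1} pages), last offset {offset:#x}")
```

With 4K pages the range spans 246 pages; with 16MB pages it sits entirely inside page 0, so a single entry covers it.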
There can be boundaries between those two limits, which are mapped to
pages of various sizes. Remember the old Oracle shared memory setups
of release 5 (and 6), where we had to re-link Oracle to use shared
memory in Unix? One of the parameters was the size of shared memory
available (SHMALL) and another (optional) was the virtual address
where it started. All to do with that.
Thank the Gods that's gone now!

The whole thing can get very complex very quickly and there are many,
many nuances, so I won't go into much more detail here.
In fact, I've already put too much into it and it will be difficult to
follow for a lot of folks: my apologies!

The whole field of virtual memory, paging translation and physical-to-
virtual mapping is a separate universe that has nothing to do with
individual processes, although of course it affects their execution.
This was explained in detail in some of the old McGraw-Hill OS books
from the 70s and 80s, which are unfortunately out of print nowadays -
but no less relevant. There is also at least one Dijkstra book that
describes the whole thing in detail and I do believe from a faint
memory Knuth talks about it as well in one of his tomes. IBM's online
technical library still has some excellent books on the subject, going
into much more detail.

Kevin and I have discussed this largepage thing quite a few times in
his blog, there are many entries there that might be worth folks
reading as he does a much better job of addressing this than my feeble
attempts to explain it.

Again, just to make my point very clear: Linux does NOT manage virtual
memory the same way as Unix nor is it valid to extrapolate from Linux
to Unix - or Windows, for that matter. If I may quote you, it all
must be tested for validity rather than assumed.
As such, I'd suggest that anyone looking at using largepages in
Windows give it a try and check results before assuming anything.
Worth doing? You bet: I get an immediate CPU usage improvement of
between 10 and 15% in Aix just by switching the SGA to 16MB pages: there
is a MUCH smaller page table for the virtual memory translation to
traverse when I do that, and that traversal costs CPU cycles!

Mladen Gogala

Nov 13, 2012, 11:19:37 PM

On Tue, 13 Nov 2012 17:24:32 +0000, Jonathan Lewis wrote:

> However,
> a) he does mention Solaris and intimate shared memory - and explains that
> shared page tables are possible
> b) the article is dated Dec 2007 - so things may have changed
>
> Do you have any specific versions of Unix in mind when you state that the
> page table for a shared memory segment is automatically shared ? Is this,
> perhaps a specific default for OEL.

Jonathan, I don't see any results, and articles about Linux page
tables are surprisingly hard to find. I was unable to find conclusive
proof. The best I was able to find are the following articles:

http://appcrawler.com/wordpress/2010/05/11/686/
http://lwn.net/Articles/149888/
http://lwn.net/Articles/149804/

However, logic tells me that it is not possible for every process to have
its own, independently maintained page table, because there would be no way
to ensure coherence. If process A experiences a page fault and gets frame
1234 into its page table, it wouldn't be possible for process B to
see that event without sharing page tables. And shared memory means not
only the same content but also the same addresses and sequence number.
There would be no way to ensure that without sharing page tables. The
proof can be seen from the fact that shared memory is only counted once
when you sum memory consumption per process.
It would be absolutely impossible to coordinate access if each process
was operating its own map of the shared memory.
That is why shared memory is a separate part of the kernel, very expensive
to implement. One needs exceedingly complex, segment-based memory management.
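On Linux, one way to test the two positions empirically would be to watch the kernel's total page-table memory, reported as PageTables in /proc/meminfo, while adding sessions to an instance that uses standard pages: if the page tables for the SGA were shared, the figure should stay roughly flat; if they are per process, it should grow with each process that touches the SGA. That procedure is a suggestion, not something run in this thread. The sketch below only parses the file, and the sample text in it is made up for illustration.

```python
def meminfo_kb(text: str, field: str) -> int:
    """Return the numeric value of a field (e.g. 'PageTables') from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            return int(rest.split()[0])  # value is in kB for most fields
    raise KeyError(field)


def read_page_tables_kb(path: str = "/proc/meminfo") -> int:
    """Read the live PageTables value (Linux only)."""
    with open(path) as f:
        return meminfo_kb(f.read(), "PageTables")


# Illustrative sample only, not real measurements
sample = """MemTotal:       16384000 kB
PageTables:        52344 kB
HugePages_Total:       0
"""
print(meminfo_kb(sample, "PageTables"))
```

Sampling read_page_tables_kb() before and after connecting, say, 100 extra sessions would show which way the figure moves on a given kernel.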
--
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com

Jonathan Lewis

Nov 14, 2012, 4:07:33 AM

"Noons" <wizo...@gmail.com> wrote in message

news:2b35f5db-4533-4d76...@u4g2000pbo.googlegroups.com...

On Nov 13, 9:53 pm, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk>
wrote:

|You said:
|>In Unix (unless you use intimate shared memory) you get one map per
|>process - so an Oracle system with 1,000 processes would end up using as
|>much memory for maps of the SGA as it would on the SGA itself if it were
|>using standard pages.
|
|That is incorrect, In Unix there is no such thing as a memory map per process.
|The largepages feature has nothing to do with the memory of each process.
|Memory map is an incorrect term to describe virtual memory translation
|to physical addresses in the context of a process or group of
|processes.

Noons,

Thanks for that (mostly cut) - I'm going to have to read through that lot
very carefully.

Jonathan Lewis

Nov 14, 2012, 4:14:55 AM

"Mladen Gogala" <gogala...@gmail.com> wrote in message

news:pan.2012.11...@gmail.com...

Mladen,

Thanks for the comments and links. I'll be following this stuff up to see
where I went wrong.

In passing - logic also dictates that I ask the question:
What happens if I attach to a shared memory segment AFTER I've
allocated a load of private memory and the (logical) addresses in the page
table for the private memory overlap the logical addresses in the page
table for the shared memory segment ? Clearly one can bypass the issue with
a little thought - e.g. by allocating from high logical addresses downwards
for shared memory and from low addresses upwards for private memory - but
I hadn't previously thought that point through.
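The workaround suggested above can be illustrated with a toy model of one process's address space: private memory grows upwards from a low address, shared segments are attached downwards from a high address, and any request that would make the two regions cross is refused. This is a hypothetical illustration of the idea only, not how any particular Unix lays out an address space (with shmat(), for instance, passing a NULL address lets the kernel choose where the segment goes).

```python
class ToyAddressSpace:
    """Toy model: private memory grows up from `low`, shared segments are
    attached growing down from `high`; the two regions never overlap."""

    def __init__(self, low: int, high: int) -> None:
        self.private_top = low      # next free private address
        self.shared_bottom = high   # lowest address used by shared segments

    def alloc_private(self, size: int) -> int:
        if self.private_top + size > self.shared_bottom:
            raise MemoryError("private region would overlap shared region")
        start = self.private_top
        self.private_top += size
        return start

    def attach_shared(self, size: int) -> int:
        if self.shared_bottom - size < self.private_top:
            raise MemoryError("shared region would overlap private region")
        self.shared_bottom -= size
        return self.shared_bottom


space = ToyAddressSpace(0x1000, 0x100000)
p = space.alloc_private(0x4000)   # private allocation made first
s = space.attach_shared(0x10000)  # shared segment attached afterwards
print(hex(p), hex(s))
```

Whatever order the private allocations and shared attaches happen in, the two regions can only meet in the middle, where the overlap is detected instead of silently clashing.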

Noons

Nov 14, 2012, 5:49:58 AM

Jonathan Lewis wrote,on my timestamp of 14/11/2012 8:07 PM:

>
> Thanks for that (mostly cut) - I'm going to have to read through that lot
> very carefully

Please do run it by Kevin to ensure I didn't fall into any inconsistency. The
subject is very complex and with many nuances and it is only too easy to say too
much or too little and mess up the overall message as a result. I struggle to
explain these concepts as English is not my first language, and it's not easy to
put it all in sensible, coherent text. If you need me to do any
testing/verification, just holler: I'm taking the next 2 days off on annual leave
and have free time to go check in detail if needed.
What I said is mostly applicable to Unix, specifically to the latest releases of
Aix. Linux - and Windows, of course! - handle things differently.
I'll go view Christo's video to try and fully understand his points.

joel garry

Nov 14, 2012, 12:05:11 PM

On Nov 13, 6:38 pm, Noons <wizofo...@gmail.com> wrote:

>
> The whole thing can get very complex very quickly and there are many,
> many nuances so I won't go into a lot more detail here than I could.
> In fact, I've already put too much into it and it will be difficult to
> follow for a lot of folks: my apologies!
>

Don't apologize; blog! There is surely a shortage of excellent Oracle
AIX exposition.

jg
--
@home.com is bogus.
"Focus, focus, focus." - Career advice from Warren Buffett.

Mladen Gogala

Nov 16, 2012, 10:46:25 AM

On Wed, 14 Nov 2012 09:14:55 +0000, Jonathan Lewis wrote:

> Mladen,
>
> Thanks for the comments and links. I'll be following this stuff up to
> see where I went wrong.

Jonathan, I apologize for the delay, but I am extremely busy these days. I'll
try responding during the Thanksgiving holiday, which is next week.

Noons

Nov 19, 2012, 11:53:47 PM

On Nov 15, 4:05 am, joel garry <joel-ga...@home.com> wrote:

>
> Don't apologize; blog! There is surely a shortage of excellent Oracle
> AIX exposition.

On a totally different note, nothing to do with Jonathan's point
whatsoever:

...and a lot of plain, DEAD WRONG lore in the so-called "community"
forums...

I've lost count of the number of times I've heard "IBM experts" and
other such claiming in forums that to get performance with Oracle in
Aix 5L (and later) one needs to mount all jfs2 file systems storing
database data with the "cio" option.

Which promptly removes any use of the file system buffer cache for ALL
Unix commands involving file IO on ANY file in such file systems!

It could have been true before Oracle 10g.
Since then, all we have to do is set filesystemio_options = SETALL
and bingo: ALL Oracle data files (and ONLY those) located on jfs2
filesystems will be opened with the cio option,
while the Aix commands used on them to copy/move/back up will use the
normal file system cache and not lose access speed.

But what do I see everywhere I go with Oracle in Aix, regardless of
what version? All jfs2 file systems mounted with cio option.
Now, THERE is a "rule of thumb" GUARANTEED to destroy ALL IO
performance other than through Oracle...
(groan...)

hhk...@gmail.com

Nov 20, 2012, 6:01:43 AM

That's interesting... My reason for requesting the cio mount option for some AIX jfs2 filesystems (which BTW don't contain anything but Oracle .dbf files) is that otherwise any attempt at accessing the .dbf files from "outside" - cp or TSM ARCHIVE - fails with "Invalid argument" because the Oracle instance has the file open with the cio option due to filesystemio_options = SETALL, but the utility attempts to open the file without cio.

My need for this is caused by some .dbf files for a READ ONLY tablespace, which I want to archive on tape, preferably without first taking the tablespace OFFLINE, which is the only workaround I have found. I'm on AIX 5.3.

Noons

Nov 21, 2012, 1:31:35 AM

hhk...@gmail.com wrote,on my timestamp of 20/11/2012 10:01 PM:

> That's interesting... My reason for requesting the cio mount option for some
> AIX jfs2 filesystems (which BTW don't contain anything but Oracle .dbf files)
> is that otherwise any attempt at accessing the .dbf files from "outside" - cp
> or TSM ARCHIVE - fails with "Invalid argument" because the Oracle instance
> has the file open with the cio option due to filesystemio_options = SETALL,
> but the utility attempts to open the file without cio.

That should not be a problem at all with Aix 7.1. But I'll give it a try as
well, just to be absolutely sure.

>
> My need for this is caused by some .dbf files for a READ ONLY tablespace,
> which I want to archive on tape, preferably without first taking the
> tablespace OFFLINE, which is the only workaround I have found. I'm on AIX
> 5.3.

Better yet: use RMAN to copy them to the FRA and then zoom them off from there.
Nowadays I use RMAN for just about everything to do with datafiles.
11.2.0.3 and Aix 7.1 here.

joel garry

Nov 21, 2012, 12:49:05 PM

On Nov 20, 10:31 pm, Noons <wizofo...@yahoo.com.au> wrote:

Googling:
rman read only datafiles

Gives some interesting reading, especially Hemant pointing out the
difference with backup optimization set on.

jg
--
@home.com is bogus.

"...a surprise birthday purge. Ready, aim, surprise!" - Congress of
Wonders

TheBoss

Nov 21, 2012, 3:56:22 PM

joel garry <joel-...@home.com> wrote in
news:a68df625-c348-4647...@uc4g2000pbc.googlegroups.com:

And an equally interesting article on Martin Bach's blog:

http://martincarstenbach.wordpress.com/2012/04/04/rman-duplicate-and-read-only-tablespaces/

And while we are talking AIX, he also has an interesting one on that:

http://martincarstenbach.wordpress.com/2010/11/01/frightening-number-of-linking-errors-for-11-2-0-1-3-on-aix-5-3-tl11/

--
Jeroen

Noons

Nov 21, 2012, 7:00:58 PM11/21/12

On Nov 22, 7:56 am, TheBoss <TheB...@invalid.nl> wrote:
>
> > Googling:
> > rman read only datafiles
>
> > Gives some interesting reading, especially Hemant pointing out the
> > difference with backup optimization set on.

Like I said: I am on Aix 7.1 with Oracle 11.2.0.3.
What happens/happened in earlier release combos doesn't concern or
alarm me in the least.

> http://martincarstenbach.wordpress.com/2012/04/04/rman-duplicate-and-read-only-tablespaces/

Once again: old lore.

> And while we are talking AIX, he also has an interesting one on that:
>
> http://martincarstenbach.wordpress.com/2010/11/01/frightening-number-of-linking-errors-for-11-2-0-1-3-on-aix-5-3-tl11/

Yeah, but once again: Aix 5.3 is NOT, I repeat - NOT - Aix 7.1.

Noons

Nov 21, 2012, 7:02:45 PM11/21/12

On Nov 20, 10:01 pm, hhkr...@gmail.com wrote:

> That's interesting... My reason for requesting the cio mount option for some AIX jfs2 filesystems (which BTW don't contain anything but Oracle .dbf files) is that otherwise any attempt at accessing the .dbf files from "outside" - cp or TSM ARCHIVE - fails with "Invalid argument" because the Oracle instance has the file open with the cio option due to filesystemio_options = SETALL, but the utility attempts to open the file without cio.
>
> My need for this is caused by some .dbf files for a READ ONLY tablespace, which I want to archive on tape, preferably without first taking the tablespace OFFLINE, which is the only workaround I have found. I'm on AIX 5.3.

Upgrade to Aix 7.1. Just confirmed this morning: database files in
jfs2 fs not mounted in cio mode and opened by Oracle in cio mode can
certainly be concurrently read by cp and moved somewhere else. That
is what you were trying to do with the readonly ts datafile, is it not?

hhk...@gmail.com

Nov 22, 2012, 3:59:45 PM11/22/12

Yep, that's what I was trying to achieve. Thanks for testing this, I'll be sure to remember this if we ever get to upgrade AIX - as of now we're on AIX 5.3 ML08, so I cannot even get to play with 11gR2.

( I say *if* we ever get to upgrade: These AIX boxes also host some Oracle 8.1.7.4 databases (don't laugh), which we cannot upgrade because an application based on Webdb 2.2 - stop that laughter, I said! )

BTW I have only made small attempts at looking into large pages on AIX, but your old blog entry on dbasrus.blogspot.dk about how to do it is bookmarked for when I finally get the time!

TheBoss

Nov 22, 2012, 5:11:41 PM11/22/12

Noons <wizo...@gmail.com> wrote in news:9a858c66-856e-4a6a-ae39-8d732b...@uk1g2000pbb.googlegroups.com:

> On Nov 22, 7:56 am, TheBoss <TheB...@invalid.nl> wrote:
>>
>> > Googling:
>> > rman read only datafiles
>>
>> > Gives some interesting reading, especially Hemant pointing out the
>> > difference with backup optimization set on.
>
> Like I said: I am on Aix 7.1 with Oracle 11.2.0.3.

Yes you have made that perfectly clear.
As clear as the fact that someone else in this thread said he is on 5.3.

> What happens/happened in earlier release combos doesn't concern or
> alarm me in the least.
>

Well, I'm sorry you got the impression my post was directed specifically to
you; that certainly wasn't my intention.

>
>
>> http://martincarstenbach.wordpress.com/2012/04/04/rman-duplicate-and-read-only-tablespaces/
>
> Once again: old lore.

Mwah, just over half a year old; I'm sure you have seen info here much
older than that.

>
>> And while we are talking AIX, he also has an interesting one on that:
>>
>> http://martincarstenbach.wordpress.com/2010/11/01/frightening-number-of-linking-errors-for-11-2-0-1-3-on-aix-5-3-tl11/
>
>
> Yeah, but once again: Aix 5.3 is NOT, I repeat - NOT - Aix 7.1.

I'm perfectly aware of that, thank you.
I've used both, and still use 7.1 although not with Oracle.

Cheers!

--
Jeroen

Noons

Nov 22, 2012, 10:26:39 PM11/22/12

On Nov 23, 7:59 am, hhkr...@gmail.com wrote:

> Yep, that's what I was trying to achieve. Thanks for testing this, I'll be sure to remember this if we ever get to upgrade AIX - as of now we're on AIX 5.3 ML08, so I cannot even get to play with 11gR2.

One thing: you don't need 11gr2 for Aix 7.1. 10.2.0.3 works perfectly
well, I have it here. Although it isn't *officially* supported.
The only thing I found in 10g that doesn't work with Aix 7.1 is largepages,
and again that might have been an error of mine. Everything else
seems to work perfectly fine. In fact that's how I upgraded most of my
dbs:
upgrade Aix to 7.1, start instance in 10g, use dbua to upgrade it to
11g.
Strangely enough, dbua 11g logs in to the database before it starts
its upgrade. How on earth do they expect it to be up if it is being
upgraded? (rolling eyes...)
I had to leave the 10g ORACLE_HOME intact, with the db running. dbua
was then run from the 11g ORACLE_HOME and proceeded to do all the
transformations to the db and changes to ORACLE_BASE from 10g setup to
11g.

> ( I say *if* we ever get to upgrade: These AIX boxes also host some Oracle 8.1.7.4 databases (don't laugh), which we cannot upgrade because an application based on Webdb 2.2 - stop that laughter, I said! )

Aye.... That was my first battle here, nearly 6 years ago. Had
everything and the kitchen sink running, including 7.1 in Windows NT 4
server! Took me a couple of years to get EVERYTHING in 10g and Aix
5.3. And now it's almost all there in 11gr2 and 7.1.
Except for Peoplesoft's HR - of course, it *HAD* to be an Oracle
application...

> BTW I have only made small attempts at looking into large pages on AIX, but your old blog entry on dbasrus.blogspot.dk about how to do it is bookmarked for when I finally get the time!

Sure. You don't have to use largepages until you are into VERY large
SGAs, >> 10GB. That's where it really makes a difference. I've found
it improves overall CPU performance by an average of 10% once a SGA in
the tens of GBs gets moved to largepages. And upgrading to Aix 7.1
and Oracle 11gr2 gave me another el-cheapo 15% CPU gain (on Power 6
h/w)! I've got the Unix performance graphic to show it in the
SydneyOracle meetup library, "11g Upgrade adventure" presentation:
http://www.sydneyoracle.com.au/files/

hhk...@gmail.com

Nov 23, 2012, 7:34:07 AM11/23/12

What I have tested on 10.2.0.4 databases on AIX 5.3 is backing them by 64K pages, in reference to an IBM document:
http://www.scribd.com/doc/50959984/Tuning-IBM-AIX-5-3-and-AIX-6-1-for-Oracle-Database
and also input from Doug Burns and Maxym Kharchenko:
http://intermediatesql.com/aix/can-oracle-11g-memory_target-work-with-aix-large-pages/
and also input from some MetaLink notes.

Quote from page 13 of IBM's document:
"The new 64 KB pages are preferred over 16 MB pages, and it is recommended that you use them instead of 16 MB pages on systems that support both page sizes. This is because 64 KB pages provide most of the benefit of 16 MB pages, without the special tuning requirements of 16 MB pages."

In my tests it was as simple as:
export ORACLE_SGA_PGSZ=64k
export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K
before starting up the database.

Quite confusingly, 'ps' still reports 4K for the SHMPGSZ column:
ps -fZ -p "$(ps -ef | grep ora_smon_ | grep -v grep | awk '{print $2}')"
but the other columns report 64K, and using 'vmstat -P all' you can see that the SGA is backed by 64K pages.
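For what it's worth, the grep pipeline above returns the smon of every instance on the box, so the -p list is not limited to the instance you mean. A small sketch that pins the lookup to one SID, assuming "pid args" output from ps -eo pid,args (PROD and TEST in the test data are made-up SIDs):

```shell
#!/bin/sh
# Print the PID of the smon process for one ORACLE_SID, reading
# "pid args" lines such as those from `ps -eo pid,args` on stdin.
# Comparing the whole second field skips the grep process itself and
# the smon of any other instance on the same host.
smon_pid() {
    awk -v name="ora_smon_$1" '$2 == name { print $1 }'
}

# Usage, with the instance up:
#   pid=$(ps -eo pid,args | smon_pid "$ORACLE_SID")
#   ps -fZ -p "$pid"
```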

Noons

Nov 25, 2012, 6:06:44 AM11/25/12

On Nov 23, 11:34 pm, hhkr...@gmail.com wrote:

> What I have tested on 10.2.0.4 databases on AIX 5.3 is backing them by 64K pages, in reference to an IBM document:

> http://www.scribd.com/doc/50959984/Tuning-IBM-AIX-5-3-and-AIX-6-1-for...

> and also input from Doug Burns and Maxym Kharchenko:

> http://intermediatesql.com/aix/can-oracle-11g-memory_target-work-with...

> and also input from some MetaLink notes.
>
> Quote from page 13 of IBM's document:
> "The new 64 KB pages are preferred over 16 MB pages, and it is recommended that you use them instead of 16 MB pages on systems that support both page sizes. This is because 64 KB pages provide most of the benefit of 16 MB pages, without the special tuning requirements of 16 MB pages."

Now, you see: right there, is some mighty confusion.

First of all: page 13 of the document does NOT say that. Note: I am
FULLY aware of the Massanari documents: there are multiple versions of
them and some other better documents. Nothing new.

The quote is out of context. What the doc says is that we can use
16MB large pages for the SGA, but if we ALSO use large pages for code
and text, those should instead use 64KB large pages. Which is
perfectly correct. And nothing to do with what I said: I made it very
clear that I was referring only to the SGA.

But the way you quoted it, it sounds like 16MB pages are incorrect for
ANYTHING. Wrong: they are correct for the SGA and should be used for
it.

From the SAME page of that document:

"When using large pages to map virtual memory, the translation
lookaside buffer (TLB) is able to map more virtual memory with a given
number of entries, resulting in a lower TLB miss rate for applications
that use a large amount of virtual memory. Additionally, when using
large pages, there are fewer page boundaries, which improve the
performance of prefetching. Both online transaction processing (OLTP)
and data-warehouse environments can benefit from using large pages."

THAT is the really important bit that I urge you to understand in
depth. All other "recommendations" and "rules of thumb" are dross.
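To put numbers on the TLB point: the memory a TLB can map without a miss is the number of entries times the page size, and the number of pages needed to cover an SGA shrinks by the same factor. A back-of-the-envelope sketch, where the 512-entry TLB and the 20 GB SGA are made-up round figures, not Power6 specifications:

```shell
#!/bin/sh
# Back-of-the-envelope TLB arithmetic, all sizes in KB.
# ENTRIES and SGA_KB are illustrative round numbers, not hardware figures.
ENTRIES=512
SGA_KB=$(( 20 * 1024 * 1024 ))   # a hypothetical 20 GB SGA

# Memory reachable through the TLB without a miss: entries x page size.
tlb_reach_kb() {
    echo $(( ENTRIES * $1 ))
}

# Pages needed to map the whole SGA with a given page size.
sga_pages() {
    echo $(( SGA_KB / $1 ))
}

for pg_kb in 4 64 16384; do
    printf '%5d KB pages: TLB reach %7d KB, %7d pages for the SGA\n' \
        "$pg_kb" "$(tlb_reach_kb "$pg_kb")" "$(sga_pages "$pg_kb")"
done
```

With these made-up figures, going from 4 KB to 16 MB pages multiplies the reach by 4096 and divides the page count by the same factor; how much CPU that saves in practice depends on the workload.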

As well and with all due respect: Doug and Maxym are not exactly Aix
experts for Oracle and the link you provided is for 11g testing of a
very specific feature: dynamic SGA resizing with large page sizes, and
it doesn't even mention which Aix release is being used. Being a
member of the Oaktable/Ace/whatever is NOT a certificate that one is
fully knowledgeable in EVERYTHING and we should all follow blindly
what is said without any reasoning being applied.

I urge you not to confuse apples with oranges: it does NOT work when
dealing with subtle and complex subjects such as memory management.

And as I said in a comment to Maxym's blog: if anyone thinks
dynamically changing SGA size is an operation that one wants to do
very often if at all, large pages or not, then I've got a bridge to
sell them. So, what's the point of even considering it?

Dynamic SGA memory resizing is one of the most imbecile features
introduced to Oracle recently, to cater for idiots who can't configure
a system upfront. The result is the mountain of bugs reported in MOS
when using that feature. I wish that for once the "auto-everything"
madness would abate...

> In my tests it was as simple as:
> export ORACLE_SGA_PGSZ=64k
> export LDR_CNTRL=DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K
> before starting up the database.

Careful with that LDR_CNTRL. It does NOTHING whatsoever to control
what pagesize the SGA uses. It exclusively deals with the program
execution from there on. If you don't include it in ALL programs/
scripts/sessions/anything that runs in that server, it will be mostly
useless when used to start the instance. Go back to the example I
gave for Jonathan: it shows the effect of that environment variable
and what it does to sqlplus. It's got nothing to do with the SGA
itself.
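The scoping point is plain Unix environment inheritance, which a small sketch can show without any AIX involved: a variable is seen only by processes started from an environment that contains it. Since the AIX loader reads LDR_CNTRL when a program is exec'd, a value exported only in the shell that started the instance never reaches sqlplus sessions started from other shells or logins:

```shell
#!/bin/sh
# Generic Unix sketch, not AIX-specific: each child shell reports the
# LDR_CNTRL value it inherited, or <unset> if it got none.
PAGES='DATAPSIZE=64K@TEXTPSIZE=64K@STACKPSIZE=64K'
CHILD='printf "%s\n" "${LDR_CNTRL:-<unset>}"'   # run inside each child

unset LDR_CNTRL

# 1. Prefix assignment: only this one child process gets the value.
with_prefix=$(LDR_CNTRL=$PAGES sh -c "$CHILD")

# 2. A later child started without it inherits nothing.
later_child=$(sh -c "$CHILD")

# 3. Exported here: every child of THIS shell inherits it, but shells
#    started elsewhere (other logins, cron jobs) still do not.
export LDR_CNTRL="$PAGES"
after_export=$(sh -c "$CHILD")

printf '%s\n%s\n%s\n' "$with_prefix" "$later_child" "$after_export"
```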

Or rather: given that once again we are going into "what the summity
says" instead of analyzing things properly and rationalizing what is
being done in full knowledge, I'd rather you didn't change anything.

I really can't be bothered with religion and quite frankly, the Oracle
community nowadays is nothing else than a "church of the Ace". Sorry,
but I really don't have time for that sort of bull. Good bye and keep
using Oracle's "engineered systems" instead of true leading edge
hardware: it's a lot "safer"...