On Nov 13, 9:53 pm, "Jonathan Lewis" <jonat...@jlcomp.demon.co.uk> wrote:
> I can't work out from your statement exactly which bit of what I've said is
> not correct. Could you please clarify.
You said:
> In Unix (unless you use intimate shared memory) you get one map per
> process - so an Oracle system with 1,000 processes would end up using
> as much memory for maps of the SGA as it would on the SGA itself if
> it were using standard pages.
That is incorrect. In Unix there is no such thing as a memory map per
process. The largepages feature has nothing to do with the memory of
each process. "Memory map" is an incorrect term for the translation of
virtual addresses to physical addresses in the context of a process or
group of processes.
> The "memory map" certainly maps logical addresses to physical memory
> addresses - easy enough to see (if you know the answer) when you look at
> some of the x$ objects and see wildly different addresses for objects - for
> example, Oracle maps the PGA for a process into an entirely different set
> of (logical) addresses from the SGA - to the extent that the (logical) gap
> between memory addresses can be far larger than the physical memory
> available on the machine.
What memory Oracle allocates for each required chunk does not
constitute a memory map. There is no such thing as a process "memory
map". Memory allocation is not the same as memory mapping.
> But (depending on O/S and choice of O/S parameters the page size used in
> the mapping may vary, each process may have it's own map, or all processes
> that attach to the same physical memory may share a single map of the
> memory (e.g. Solaris Intimate Shared Memory).
Not quite correct. Page size is not an attribute of the process. It's
an attribute of the virtual memory management that is used to map a
given virtual address space to actual memory, whatever the process.
> I wrote a note about this a couple of years ago highlighting a
> surprise side effect
> (http://jonathanlewis.wordpress.com/2010/06/23/memory/); more
> significantly Christo Kutrovsky (of Pythian) posted a video of a
> presentation demonstrating the various effects (and how to measure
> them) when configuring Linux.
> (http://www.pythian.com/news/741/pythian-goodies-free-memory-swap-orac...)
Yes, but what might happen in Linux - according to Christo - does not
define what Unix does: last time I looked they were NOT the same OS!
As I said: in Unix, largepages are not part of a process "memory map".
They are used to define a portion (NOT an address range, simply a
quantity) of physical memory that will be managed by virtual memory
pages of a certain size. It is perfectly possible - in fact almost
mandatory - to have multiple regions of physical memory dedicated to
different types of paging. In AIX, for example, there are three page
sizes which can be used in various combinations by various processes,
singly or concurrently: 4K, 64K and 16M. In AIX 7.1 and P7 hardware
there is a fourth size, but I won't go into that.
What one can do in Unix is allocate a certain amount of physical
memory to a certain type of paging. Once that is done, processes are
free to use that memory as native address space for execution, or as
attached segments of "shared" memory. As an example, it is perfectly
possible for a process using 4KB pages to attach a shared memory
segment that is managed - from the point of view of
virtual-to-physical addressing - as 16MB pages. Nothing to do with the
process. All processes attaching to that shared segment will use its
pagesize for those addresses, regardless of what they started with.
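On AIX the supported page sizes can be listed with `pagesize -a`; a portable way to check a process's base page size (shown here with POSIX `getconf`, which exists on both AIX and Linux) is:

```shell
# Base page size for a process, in bytes (typically 4096).
# On AIX, `pagesize -a` would additionally report 64K and 16M if supported.
getconf PAGESIZE
```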
Here is the breakdown of such page sizes and how much physical memory
is managed by each pagesize on my AIX test system:

aubdc00-ora01t:sandy$vmstat -P ALL
System configuration: mem=16384MB
pgsz         memory                        page
----- ------------------------- ----------------------------------
          siz     avm     fre    re   pi   po   fr   sr   cy
   4K 1222496  341519  615634     0    0    0    0    0    0
  64K   32138   31995     143     0    0    0    0    0    0
  16M     600     376     224     0    0    0    0    0    0
Or in round numbers:
9GB of 16MB pages,
2GB of 64K pages and
5GB of 4K pages,
for a grand total, as indicated above, of around 16GB of physical
memory.
Note that there are 224 free 16MB pages above (fre column): this will
become relevant below.
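Those round numbers can be checked directly from the vmstat page counts; a quick sketch of the arithmetic (the counts are hard-coded from the `siz` column of the output above):

```shell
# Convert the vmstat -P ALL page counts above into megabytes per page size.
# Totals come to roughly 9.6GB + 2GB + 4.8GB, i.e. about 16GB.
awk 'BEGIN {
  printf "16M pages: %d MB\n", 600     * 16          # 600 pages of 16MB
  printf "64K pages: %d MB\n", 32138   * 64 / 1024   # 32138 pages of 64K
  printf "4K  pages: %d MB\n", 1222496 * 4  / 1024   # 1222496 pages of 4K
}'
```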
Where each pagesize region of physical memory starts is entirely up to
the OS to manage. It will be mapped to virtual addresses anyway, not
relevant here. What is important to note is that the memory managed by
each type of pagesize is contiguous, whatever its quantity may be.
A program can map virtual memory to physical memory with various page
sizes, or it can use its entire address space in one type of page
size. For example, I can run sqlplus in 4K pages while using an SGA
entirely in 16MB pages.
See above for the pages active in 16MB: they are the SGA, and there
are 376 of them in use, with 224 spare at the moment.
If I now set the environment variable that causes AIX to execute
sqlplus ENTIRELY in 16MB pages, I'll get the following:
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2390433  604576  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  376 224
aubdc00-ora01t:sandy$export LDR_CNTRL=LARGE_PAGE_TEXT=Y@LARGE_PAGE_DATA=M
(from now on ALL programs in my session will be executing - BY DEFAULT
- entirely in 16MB pagesize - the column "flp" will measure how many
are free)
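Note that `export` makes the setting apply to every subsequent command in the session; to confine it to a single command, the variable can be set only in that command's environment. A small sketch of the difference (the value is the AIX loader syntax from above; the scoping behaviour itself is plain POSIX shell):

```shell
# Set only for one command: the child process sees the value,
# but the session's environment is left untouched.
LDR_CNTRL='LARGE_PAGE_TEXT=Y@LARGE_PAGE_DATA=M' sh -c 'echo "child: $LDR_CNTRL"'
echo "session: ${LDR_CNTRL:-unset}"
```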
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2403604  620137  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  396 204
(notice how the free largepages - flp column - has dropped to 204?
That's vmstat itself running in largepages)
(Now for sqlplus:)
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:23:00 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2480102  600964  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  430 170
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2403607  603731  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  396 204
(Notice how flp dropped to 170 when sqlplus was running. And it's
back to 204 flp after I exit sqlplus and run vmstat in 16MB pages)
Now to prove my point, I'll turn off the largepages for the whole
process and show that sqlplus is NOT using up the largepages at all,
although of course it is attaching to an Oracle instance where the SGA
DOES use 16MB pages:
aubdc00-ora01t:sandy$unset LDR_CNTRL
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2390890  604159  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  376 224
(Notice how the flp has now popped back to 224, as initially - now
vmstat is running in the default 4K pages)
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:30:00 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2394755  600292  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  376 224
(and of course from inside sqlplus we can check vmstat and there is
the constant 224 flp again, as sqlplus is NOT using largepages
itself, although it is attaching to an SGA that has been locked and is
using largepages)
SQL> sho parameter lock_sga
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
lock_sga                             boolean     TRUE
Now to prove that the SGA is indeed using the largepages, let's shut
down and start up the instance and watch the flp column while we do
that:
aubdc00-ora01t:sandy$vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2390933  604112  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  376 224
aubdc00-ora01t:sandy$sqlplus
SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 14 12:39:08 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Enter user-name: / as sysdba
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1  782537  672407  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9    0 600
SQL> startup
ORACLE instance started.
Total System Global Area 6263357440 bytes
Fixed Size                  2233112 bytes
Variable Size            1073745128 bytes
Database Buffers         5167382528 bytes
Redo Buffers               19996672 bytes
Database mounted.
Database opened.
SQL> !vmstat -l
System configuration: lcpu=2 mem=16384MB ent=0.70
kthr    memory            page            faults        cpu          large-page
----- --------------- ------------------ ------------- ----------------- ---------
 r  b     avm     fre re pi po fr sr cy  in   sy  cs us sy id wa   pc  ec  alp flp
 2  1 2386771  608244  0  0  0  0  0  0  10 4194 588  6  3 91  0 0.06 8.9  376 224
SQL>
See how the free large pages (flp) changes as I shutdown and restart?
That's the SGA in 16MB pages, while the rest of the programs,
libraries and OS are using 4K pages.
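As a sanity check, the component sizes printed at startup do sum to the reported total, and dividing that total by the 16MB page size lands close to the jump seen in the alp column (the instance rounds its shared memory segment up, so alp moves by slightly more than the bare minimum). The arithmetic, with the byte counts hard-coded from the startup output above:

```shell
# Sum the SGA components from the startup output and express the
# total in 16MB pages.
awk 'BEGIN {
  total = 2233112 + 1073745128 + 5167382528 + 19996672
  printf "total: %.0f bytes\n", total          # matches 6263357440 above
  printf "16MB pages: %.1f\n", total / (16 * 1024 * 1024)
}'
```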
What Christo might have found in Linux is SPECIFIC to Linux only, and
I strongly suspect to the specific version and flavour he was
running. Unix manages virtual memory in a different manner, and there
are again small differences between Solaris, AIX, and others.
But the bottom line is: largepages (hugepages, or whatever we decide
to call them) have nothing to do with a per-process map and ALL to do
with how much physical memory is addressed by each type of paging.
Processes simply use largepages or not, either for execution space or
as attached shared memory. Mixed page-size spaces in a single program
are perfectly possible, precisely because the pagesize and how it maps
to physical memory is EXTERNAL to the entire per-process memory
management.
All a process knows is that it uses (virtual) memory between numeric
addresses set by the OS, as in 0A000(hex) - 0FFFFF(hex). Those two
limits are set by the virtual memory management mechanism of the OS,
once, for each program. The CPU then maps those to physical addresses
using the translation tables (cached in the TLB) set up and managed by
the OS, depending on memory configuration.
There can be boundaries between those two limits, which are mapped to
pages of various sizes. Remember the old Oracle shared memory setups
of release 5 (and 6), where we had to re-link Oracle to use shared
memory in Unix? One of the parameters was the size of shared memory
available (SHMALL) and another (optional) was the virtual address
where it started. All to do with that.
Thank the Gods that's gone now!
The whole thing can get very complex very quickly and there are many,
many nuances, so I won't go into much more detail here. In fact, I've
already put too much into it and it will be difficult to follow for a
lot of folks: my apologies!
The whole field of virtual memory, paging translation and
physical-to-virtual mapping is a separate universe that has nothing to
do with individual processes, although of course it affects their
execution.
This was explained in detail in some of the old McGraw-Hill OS books
from the 70s and 80s, which are unfortunately out of print nowadays -
but no less relevant. There is also at least one Dijkstra book that
describes the whole thing in detail and I do believe from a faint
memory Knuth talks about it as well in one of his tomes. IBM's online
technical library still has some excellent books on the subject, going
into much more detail.
Kevin and I have discussed this largepage thing quite a few times in
his blog, there are many entries there that might be worth folks
reading as he does a much better job of addressing this than my feeble
attempts to explain it.
Again, just to make my point very clear: Linux does NOT manage virtual
memory the same way as Unix nor is it valid to extrapolate from Linux
to Unix - or Windows, for that matter. If I may quote you, it all
must be tested for validity rather than assumed.
As such, I'd suggest that anyone looking at using largepages in
Windows give it a try and check results before assuming anything.
Worth doing? You bet: I get an immediate CPU usage improvement of
between 10 and 15% in AIX just by switching the SGA into 16MB pages:
there is a MUCH smaller set of translations for the TLB and page
tables to cover when I do that, and traversing them costs CPU cycles!
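The scale of that effect is easy to sketch: for the ~6GB SGA shown at startup above, compare how many page translations are needed to cover it at 4K versus 16M page size (the byte count is hard-coded from that output; the exact TLB behaviour is of course hardware-specific):

```shell
# Translations needed to cover the SGA at each page size.
awk 'BEGIN {
  sga = 6263357440                       # SGA bytes from the startup output
  printf "4K  pages to cover SGA: %d\n", sga / 4096
  printf "16M pages to cover SGA: %d\n", int(sga / (16 * 1024 * 1024)) + 1
}'
```

A difference of four orders of magnitude in the number of mappings the hardware has to deal with.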