Linux kernel 2.6.38 introduced a new feature called "transparent huge
pages" (THP). The goal was to blur the difference between huge 2M pages
and normal 4K pages. The patch is described here:
http://lwn.net/Articles/359158
Of course, current "production grade" kernels are version 2.6.32, in both
the RH and OL distributions, which means that they do not contain this patch.
[root@rac1 hugepages]# uname -r
2.6.32-300.3.1.el6uek.i686
[root@rac1 hugepages]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)
[root@rac1 hugepages]# cat /etc/oracle-release
Oracle Linux Server release 6.2
[root@rac1 hugepages]#
However, bleeding-edge Linux distributions such as Fedora 16 use a much
newer kernel version, which does have this patch. Oracle RDBMS will use
large pages normally:
SQL*Plus: Release 11.2.0.3.0 Production on Sat Apr 28 18:07:56 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 2137886720 bytes
Fixed Size 2230072 bytes
Variable Size 469764296 bytes
Database Buffers 1660944384 bytes
Redo Buffers 4947968 bytes
Database mounted.
Database opened.
SQL> show parameter use_large
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
use_large_pages                      string      ONLY
[root@medo transparent_hugepage]# grep -i huge /proc/meminfo
AnonHugePages: 284672 kB
HugePages_Total: 4096
HugePages_Free: 3072
HugePages_Rsvd: 1
HugePages_Surp: 0
Hugepagesize: 2048 kB
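The figures in this output can be checked with a little arithmetic: pages in use are HugePages_Total minus HugePages_Free, multiplied by Hugepagesize. The following is only a sketch; the function name hugepages_used is made up for illustration, and it reads /proc/meminfo unless another file is given.

```shell
# hugepages_used is a hypothetical helper name, not a standard tool.
# It reports how many preallocated huge pages are in use and their size in MB.
# Optional argument: a meminfo-style file (defaults to /proc/meminfo).
hugepages_used() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free_pages = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            used = total - free_pages
            printf "%d pages, %d MB\n", used, used * size_kb / 1024
        }
    ' "${1:-/proc/meminfo}"
}
```

With the values shown above (4096 total, 3072 free, 2048 kB per page) this gives 1024 pages and 2048 MB.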
So, a 2GB SGA and 1024 large pages consumed (4096 total minus 3072 free).
That's expected. However, I expected this patch to help with VirtualBox.
It doesn't do that. After some digging, I figured out that a program must
be written in a special way to take advantage of this feature. VirtualBox,
apparently, still cannot do that:
[root@medo 2996]# ps -fp 2996
UID PID PPID C STIME TTY TIME CMD
mgogala 2996 2979 26 18:08 ? 00:04:34 /usr/lib/virtualbox/VirtualBox -
[root@medo 2996]# ps -F -p 2996
UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
mgogala 2996 2979 26 568521 691156 1 18:08 ? 00:04:34 /usr/lib/virtual
[root@medo 2996]# grep -i huge /proc/meminfo
AnonHugePages: 286720 kB
HugePages_Total: 4096
HugePages_Free: 4096
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
[root@medo 2996]#
From the second ps listing, it is visible that the VirtualBox process,
PID=2996, is using a hefty 690MB of memory (RSS column, expressed in KB).
However, not a single huge page was consumed: HugePages_Free equals
HugePages_Total. If you read the THP description from the earlier link,
you will also see that the patch makes huge pages swappable, which is not
what I want. THP can be disabled like this:
echo never>/sys/kernel/mm/transparent_hugepage/enabled
cat enabled
always madvise [never]
Of course, in order to do that, the kernel must be 2.6.38 or newer:
[root@medo transparent_hugepage]# uname -r
3.3.2-6.fc16.x86_64
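The HugePages_* counters in /proc/meminfo only cover preallocated huge pages. On THP-capable kernels, transparent huge pages show up per mapping as AnonHugePages lines in /proc/<pid>/smaps. A rough sketch for summing them follows; the function name thp_kb is made up for illustration.

```shell
# thp_kb is a hypothetical helper name, not a standard tool.
# It sums all AnonHugePages entries (in kB) of a smaps-style file.
# Prints 0 when no such entries are present.
thp_kb() {
    awk '/^AnonHugePages:/ { sum += $2 } END { print sum + 0 }' "$1"
}

# Example call, using the PID from the listing above (reading another
# user's smaps normally requires root):
#   thp_kb /proc/2996/smaps
```

A result of 0 would mean that no mapping of that process is currently backed by transparent huge pages.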
When VirtualBox is patched to utilize transparent huge pages, turning
this feature on will make sense. A database instance can use huge pages
even without transparent huge pages support. Also, for serious work I
advise XFS:
[root@medo vm]# mount |grep xfs
/dev/sdb1 on /misc type xfs (rw,relatime,attr2,noquota)
/dev/sdb2 on /data type xfs (rw,relatime,attr2,noquota)
/dev/mapper/vg_medo-lv_home on /home type xfs (rw,relatime,attr2,noquota)
XFS has a defragmenter, supports direct I/O and async I/O, and my
experience with it so far has been great. Btrfs will really have to be
something special in order to beat it. It is possible to set "real time
IO priority" for database files on an XFS file system (xfs_io command),
which will make any I/O request against the database files be executed
before any other I/O requests pending against that device. Turning that
flag on for redo logs and the system tablespace makes sense.
--
http://mgogala.byethost5.com