
Oracle memory allocation on Linux 2.6


vital...@gmail.com

Apr 1, 2008, 10:37:48 AM
Hi all,

With the latest Linux kernels, it has become hard to tune how much
memory we can allocate to the SGA/PGA. I'm mainly talking about the
physical memory that the OS allocates to its buffers/caches in order
to make full use of the RAM. With earlier kernels, we had vm.pagecache
to control these amounts of memory easily. Now there is vm.swappiness
with the unfathomable algorithm behind it (read: AFAIK we cannot really
control the OS buffers/caches anymore.)
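
To put numbers on how much RAM the OS buffers/caches are holding, something like the following could be used. It is only a minimal sketch: it assumes a Linux box where /proc/meminfo exposes the usual MemTotal, MemFree, Buffers and Cached fields, and the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integers.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are bare counts and are returned unchanged.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def cache_summary(fields):
    """Summarize free memory versus memory held by OS buffers and page cache."""
    cached = fields.get("Buffers", 0) + fields.get("Cached", 0)
    return {
        "free_kb": fields["MemFree"],
        "buffers_and_cache_kb": cached,
        "cache_share_of_ram": cached / fields["MemTotal"],
    }
```

On a live system the input would come from `open("/proc/meminfo").read()`. Note that on many kernels the Cached figure also includes shared memory, so it does not cleanly separate datafile caching from the SGA itself.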

Moreover, SGA locking is not possible (or rather not advised by Oracle
support) and it is really difficult to check how much physical memory
our SGA is actually using. Even with vm.swappiness=0, huge amounts of
OS buffers/caches get allocated, with the obvious consequence that
"inactive" SGA blocks are paged out to disk. This has at least two
drawbacks: the unwanted paging activity itself (even if it is limited),
and the fact that it becomes almost impossible to calculate how much
"free" physical memory on our box we could allocate to the PGA/SGA (or
other applications) without major paging.
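
One rough way to see how much of the SGA a given Oracle process has resident is to sum the Rss (and, where the kernel reports it, Swap) lines of its System V shared memory mappings in /proc/<pid>/smaps. The sketch below assumes the SGA is a plain SysV segment (mappings named /SYSV...), that smaps is available on the kernel in question, and that the segment is not backed by huge pages; the function name is hypothetical.

```python
import re

# A mapping header looks like "60000000-a0000000 rw-s 00000000 00:05 98304 /SYSV..."
_MAPPING_HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def sysv_shm_usage(smaps_text):
    """Sum Rss and Swap (in kB) over the SysV shared memory mappings in smaps text."""
    totals = {"Rss": 0, "Swap": 0}
    in_sysv = False
    for line in smaps_text.splitlines():
        if _MAPPING_HEADER.match(line):
            # Start of a new mapping: remember whether it is a SysV segment.
            in_sysv = "/SYSV" in line
            continue
        if in_sysv:
            name, _, rest = line.partition(":")
            if name in totals:
                totals[name] += int(rest.split()[0])
    return totals
```

The figures only cover pages that this particular process has touched, so they are a lower bound on what the whole instance keeps in RAM. If the kernel does not print a Swap line, that total simply stays at 0.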

Any advice in this regard from your experience with Linux 2.6 (and any
Oracle version)?

TIA,
Jerome

Mladen Gogala

Apr 1, 2008, 12:28:26 PM
On Tue, 01 Apr 2008 07:37:48 -0700, vitalisman wrote:

> With the latest Linux kernels, things have gotten hard to tune regarding
> the SGA/PGA we can allocate. I'm mainly talking about the physical
> memory that is allocated to the OS buffers/caches to make full use of
> the memory. With earlier kernels, we had vm.pagecache to control easily
> these amounts of memory. Now there is vm.swappiness with the unfathomable
> algorithm behind it (read: AFAIK we cannot really control the OS
> buffers/caches anymore.)

True. Linux is beginning to imitate Windows, even the shortcomings.
No control over the VM that Unix OS was renowned for. Linux, too, started
with undocumented algorithms and is taking control away from the system
administrators. It is beginning to resemble an ego trip of yet another
"IT industry giant", namely Ken Olsen. To make a long story short - Linux
sucks. I am awaiting a competing OS. What they did with 2.6 is a disaster.
A competitor is bound to appear, sooner or later. My money is on some flavor
of BSD. This is not the first time things like that have been discussed
here. As far as I can recollect, Noons was also very critical of Linux,
same as Kevin Closson.

>
> Moreover, SGA locking is not possible (or rather not advised by Oracle
> support) and it is really difficult to check how much physical memory
> our SGA is actually using. Even with vm.swappiness=0, huge amounts of OS
> buffers/caches get

Do you have FILESYSTEMIO_OPTIONS set to either "SETALL" or "DIRECTIO"?
In my experience, that will significantly reduce the use of caches.
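
For reference, the four values this parameter accepts map onto asynchronous and direct I/O as in the small lookup below. This is only an illustrative sketch (the helper name is made up); it describes what Oracle asks the OS for, and whether a particular filesystem honors the request is a separate matter.

```python
# (asynchronous I/O requested, direct I/O requested) per FILESYSTEMIO_OPTIONS value
FILESYSTEMIO_OPTIONS = {
    "NONE": (False, False),
    "ASYNCH": (True, False),
    "DIRECTIO": (False, True),
    "SETALL": (True, True),
}


def requests_direct_io(value):
    """Return True if the given setting asks for direct I/O (page cache bypass)."""
    key = value.strip().upper()
    if key not in FILESYSTEMIO_OPTIONS:
        raise ValueError(f"unknown FILESYSTEMIO_OPTIONS value: {value!r}")
    _async_io, direct_io = FILESYSTEMIO_OPTIONS[key]
    return direct_io
```

So only "DIRECTIO" and "SETALL" can be expected to reduce OS caching of datafile blocks; "ASYNCH" alone changes how I/O is submitted, not whether it is buffered.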

--
http://mgogala.freehostia.com

vital...@gmail.com

Apr 2, 2008, 6:20:27 AM
On Apr 1, 6:28 pm, Mladen Gogala <mgog...@yahoo.com> wrote:
> On Tue, 01 Apr 2008 07:37:48 -0700, vitalisman wrote:
> > With the latest Linux kernels, things have gotten hard to tune regarding
> > the SGA/PGA we can allocate. I'm mainly talking about the physical
> > memory that is allocated to the OS buffers/caches to make full use of
> > the memory. With earlier kernels, we had vm.pagecache to control easily
> > these amounts of memory. Now there is vm.swappiness with the unfathomable
> > algorithm behind it (read: AFAIK we cannot really control the OS
> > buffers/caches anymore.)
>
> True. Linux is beginning to imitate Windows, even the shortcomings.
> No control over the VM that Unix OS was renowned for. Linux, too, started
> with undocumented algorithms and is taking control away from the system
> administrators. It is beginning to resemble an ego trip of yet another
> "IT industry giant", namely Ken Olsen. To make long story short - Linux
> sucks. I am awaiting a competing OS. What they did with 2.6 is a disaster.
> Competitor is bound to appear, sooner or later. My money is on some flavor
> of BSD.

I agree with you. I was a big supporter of FreeBSD a few years ago.
Its TCP/IP stack and Virtual Memory system allow for great overall
performance. Oracle benefits from them, even if it has to run in
Linux-compatibility mode. However this architecture is not supported
(and some components like Oracle client aren't regularly updated), and
few companies are interested in FreeBSD (at least in Europe), except
for webservers and security-related apps (thanks to the code ported
from OpenBSD.)

The latest Linux kernels just seem to ignore apps that have their own
buffer cache and that need to accurately control memory allocation.
That's rather sad, and Oracle Unbreakable Linux making use of this
kernel is a bit surprising. Maybe the Oracle dev team knows better
regarding Linux VM tuning, but on the Unbreakable Linux boxes I got my
hands on, no VM settings seemed to be tuned differently from a
standard RHEL.

> > Moreover, SGA locking is not possible (or rather not advised by Oracle
> > support) and it is really difficult to check how much physical memory
> > our SGA is actually using. Even with vm.swappiness=0, huge amounts of OS
> > buffers/caches get
>
> Do you have FILESYSTEMIO_OPTIONS set to either "SETALL" or "DIRECTIO"?
> In my experience, that will significantly reduce the use of caches.

We are on RAC 9i and RAC 10g with OCFS2.

On 10g, filesystemio_options=asynch. I've not seen any read
performance improvement from changing it to setall; I/O still seems
to be cached (unlike with ext3.)

On 9i, filesystemio_options=none (default) and Oracle binaries are not
linked for AIO (default, at least with RAC). I'm in the process of
testing AIO and DIO on RAC 9i following questions from Oracle support,
but the first attempts resulted in some internal instance errors...
I've asked support whether this is really a supported configuration
(at least with OCFS1, it was not, apparently.)

Thanks Mladen for your answer!

Jerome

NetComrade

Apr 2, 2008, 12:16:34 PM

Have you tried playing with Veritas/ODM?

.......
We run Oracle 9iR2,10gR2, 10g2RAC on RH4/RH5 and Solaris 10 (Sparc)
remove NSPAM to email

joel garry

Apr 2, 2008, 2:10:04 PM

Of course, I don't know anything in depth about this, but Wim
Coekaerts is the director of Linux engineering, and he mentions in
Note:261889.1 that O10 uses the hugetlb pool (preallocated and
unpageable) by default, but that you have to ask for a patch for 9i.
Now, if you are not seeing this, and are actually seeing some evidence
that the SGA is being swapped out, maybe someone knows how to get Wim
involved in this discussion? Maybe a support call is called for?
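
For sizing that pool, the arithmetic is just the SGA size divided by the huge page size, rounded up. A minimal sketch, assuming the page size is taken from the Hugepagesize line of /proc/meminfo (2048 kB is common on x86-64, but that should be checked on the box) and that the result would be fed to vm.nr_hugepages; the function name is made up:

```python
def hugepages_needed(sga_bytes, hugepage_kb):
    """Return the number of huge pages required to hold an SGA of sga_bytes."""
    if sga_bytes <= 0 or hugepage_kb <= 0:
        raise ValueError("sizes must be positive")
    page_bytes = hugepage_kb * 1024
    # Integer ceiling division: a partially filled page still needs a whole page.
    return -(-sga_bytes // page_bytes)
```

For example, `hugepages_needed(4 * 1024**3, 2048)` gives 2048 pages for a 4 GB SGA. In practice a little headroom is usually added, and the oracle user's locked-memory limit has to be large enough to cover the pool.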

He also mentioned on his blog a while back that Oracle is simply
supporting linux. Maybe if someone put together a _real_ Oracle-
specific linux they might become as rich as Linus :-)

jg
--
@home.com is bogus.
It's amusing to see xmlloader directories in .se domains.

vital...@gmail.com

Apr 2, 2008, 4:14:25 PM
On Apr 2, 8:10 pm, joel garry <joel-ga...@home.com> wrote:

> Of course, I don't know anything in depth about this, but Wim
> Coekaerts is the director of linux engineering, and he mentions in
> Note:261889.1 that O10 uses the hugetbl pool preallocated unpageable
> by default, but you have to ask for a patch for 9i. Now, if you are

Thanks Joel. I'll have a thorough look at that tomorrow.

> He also mentioned on his blog a while back that Oracle is simply
> supporting linux. Maybe if someone put together a _real_ Oracle-
> specific linux they might become as rich as Linus :-)

lol

Mladen Gogala

Apr 3, 2008, 4:28:50 AM
On Wed, 02 Apr 2008 03:20:27 -0700, vitalisman wrote:

> On 9i, filesystemio_options=none (default) and Oracle binaries are not
> linked for AIO (default, at least with RAC). I'm in the process of
> testing AIO and DIO on RAC 9i following Oracle support questions, but
> the first attempts gave way to some instance internal errors... I've
> asked the support if this is really a supported configuration (at least
> with OCFS1, it was not, apparently.)
>
> Thanks Mladen for your answer!

Jerome, direct I/O should lower memory consumption simply because it
bypasses the buffer cache, over which we have no control. You can still
control the amount of consumed memory through

- min_free_kbytes,
- dirty_background_ratio,
- dirty_expire_centisecs,
- dirty_writeback_centisecs.
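
To see what these are currently set to, the values can be read straight from /proc/sys/vm. A minimal sketch follows, assuming a Linux system with /proc mounted; swappiness is included only because it comes up elsewhere in this thread, and the function name and tuple are made up for illustration.

```python
import os

VM_TUNABLES = (
    "swappiness",
    "min_free_kbytes",
    "dirty_background_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_tunables(base="/proc/sys/vm", names=VM_TUNABLES):
    """Return {name: int value} for each tunable, or None where it cannot be read."""
    values = {}
    for name in names:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # Missing on this kernel, unreadable, or not a plain integer.
            values[name] = None
    return values
```

Calling `read_vm_tunables()` with no arguments reads the live settings; changing them is done with sysctl or by writing to the same files as root.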

You cannot control the size of cache components (like buffer cache) but
you can control behavior and overall limit with min_free_kbytes. It's
essentially the same approach as the one taken by Oracle. You can
determine an overall lump of memory that you want to use, but not the
structure of that lump. The structure will be adjusted by some AI
predictive component of the software. So far, the "I" part is failing
miserably in both cases.

With Linux, I really have a problem with the "OOM killer" component. I
cannot see how that would be superior to the traditional mechanisms for
fine-tuning the memory allocation. I used to work on an ancient and
arcane OS, the best one I've ever seen, called "VAX/VMS". I started with
version 4.2 and the last version that I used was 5.5-2. Believe it or
not, VAX/VMS used to have far superior monitoring tools and memory tuning
mechanisms to any of the modern Unix systems. Linux systems are far
inferior to things like AIX 5.3, HP-UX 11.11 or Solaris 10 and those
systems are, in turn, far inferior to VAX/VMS.

I believe that the root of the evil is in the crusade against expensive
administrators. DBA personnel, as well as system administrators, are
considered "expensive" and disliked by modern management. Just as
there is a tendency to cut the number of expensive workers in the
automobile industry, there is also a tendency to replace everybody by
"business professionals", people programming with Hibernate and Tapestry
for WebLogic or JBoss, knowing next to nothing about databases or the
underlying OS. That's the real spirit of the Windows platform, as well as
the Linux platform. If I understand the business correctly, what is
wanted is a system that can be used and administered by Elbonians.

Fortunately for us, the effect is precisely the opposite. Oracle11 is the
most complex database to date and Linux 2.6 is the most problem-prone
Linux version ever. You need better administrators than ever, and fewer
and fewer are up to the task. I will not shed a tear over the demise of
Linux, when that happens. However, don't be mistaken: the industry will
succeed, eventually. They succeeded in the automobile industry, so there
is no reason to doubt the progress and the prospect of the ultimate
victory for idiots. HAL 9000 awaits us.

--
Mladen Gogala
http://mgogala.byethost5.com

Robert Klemme

Apr 3, 2008, 9:17:23 AM
On Apr 3, 10:28 am, Mladen Gogala <mgog...@yahoo.com> wrote:

> I believe that the root of the evil is in the crusade against expensive
> administrators. DBA personnel, as well as system administrators are
> considered "expensive" and disliked by the modern management. Just as
> there is a tendency to cut the number of expensive workers in automobile
> industry, there is also a tendency to replace everybody by "business
> professionals", people programming with Hibernate and Tapestry for
> WebLogic or JBoss, knowing next to nothing about databases or the
> underlying OS. That's the real spirit of Windows platform, as well as the
> Linux platform. If I understand the business correctly, what is wanted is
> the system that can be used and administered by Elbonians. Fortunately for
> us, the effect is precisely the opposite. Oracle11 is the most complex
> database to date and Linux 2.6 is the most problems-prone Linux version
> ever. You need better administrators than ever, fewer and fewer are up to
> the task. I will not shed a tear over a demise of Linux, when that
> happens. However, don't be mistaken: the industry will succeed,
> eventually. They succeeded in the automobile industry, there is no reason
> to doubt the progress and the prospect of the ultimate victory for idiots.
> HAL 9000 awaits us.

While I sympathize with your statement, I do not fully agree. It is
true that appreciation for well trained, smart people that know how to
do their job (and more) seems to be declining. This is sad and I
dislike that tendency as I prefer to do a good job over having a job.

It is also true that too many people develop applications with too
little background - especially on databases. I find it particularly
distressing that OR mappers are so prevalent nowadays because they
hide away database details from the application - which is good in a
way, because it leads to separation of concerns - but at the same time
hides important aspects such as transactions, when data is read from
the database and when from memory, how to best pool accesses to
minimize DB round trips etc. This almost automatically (!) leads to
bad performance.

The point where I disagree is when you turn down improvements in
management automation. Basically systems become more complex all the
time and we also learn more about system behavior. It is much more
efficient to try to put that knowledge in code than to train a lot of
people to do the same. The Java Virtual Machines of today are an
extremely good example of where this can lead - performance has much
improved over earlier versions and this is because the JVM is
"intelligent" enough to do optimizations on the running code.

My 0.02 EUR...

Kind regards

robert

Mladen Gogala

Apr 3, 2008, 9:36:20 AM
On Thu, 03 Apr 2008 06:17:23 -0700, Robert Klemme wrote:

> The point where I disagree is when you turn down improvements in
> management automation.

Management automation? I know a few managers that could be replaced by
a Perl script, but I wasn't aware of the trend.

> Basically systems become more complex all the
> time and we also learn more about system behavior. It is much more
> efficient to try to put that knowledge in code than to train a lot of
> people to do the same.

HAL 9000 awaits us. How do you put knowledge in the code? How do you
get a computer to fix a horribly botched application in which queries
are created on views upon views upon views?

> The Java Virtual Machines of today are an
> extremely good example of where this can lead - performance has much
> improved over earlier versions and this is because the JVM is
> "intelligent" enough to do optimizations on the running code.

Hmmm, I am not sure that JVM can outrun Perl.

joel garry

Apr 3, 2008, 1:28:10 PM

I started with 3.something, having been a RSTS pro. What I don't
understand is why the Cutler group didn't put those mechanisms in NT -
or did they, and the Elbonians didn't know what to do with them?

Of course, I thought VMS was unnecessarily verbose in administration
and slow with I/O at that time, hence DEC's predilection for forcing
people off of superior performing PDP's (PDP 11/70 v. VAX 750)...

Having worked with RSTS, VMS, some Windows, and various unix, I find
my bias fairly consistent towards unix, even the jurassic parts seem
better - just in general being able to edit a text file rather than a
registry or GUI helps feed my feeling of control.

> I believe that the root of the evil is in the crusade against expensive
> administrators. DBA personnel, as well as system administrators are
> considered "expensive" and disliked by the modern management. Just as
> there is a tendency to cut the number of expensive workers in automobile
> industry, there is also a tendency to replace everybody by "business
> professionals", people programming with Hibernate and Tapestry for
> WebLogic or JBoss, knowing next to nothing about databases or the
> underlying OS. That's the real spirit of Windows platform, as well as the
> Linux platform. If I understand the business correctly, what is wanted is
> the system that can be used and administered by Elbonians. Fortunately for
> us, the effect is precisely the opposite. Oracle11 is the most complex
> database to date and Linux 2.6 is the most problems-prone Linux version
> ever. You need better administrators than ever, fewer and fewer are up to
> the task. I will not shed a tear over a demise of Linux, when that
> happens. However, don't be mistaken: the industry will succeed,
> eventually. They succeeded in the automobile industry, there is no reason
> to doubt the progress and the prospect of the ultimate victory for idiots.
> HAL 9000 awaits us.

Depends how you define victory. In dollars, technical superiority is
rarely the determinant.

Not sure what you mean by succeeded in the automobile industry,
manufacturing capacity planning is AFU there, labor cost control still
has a long way to go. US economy will even impact Toyota. Chrysler,
Land Rover, Jaguar, Aston Martin have all been bought with inflated IT
(that would be Tata), oil and casino dollars at fire-sale prices.
Design (Nissan, Chrysler) and luxury (Ford Premier) centers on the US
west coast are being "consolidated" towards Detroit, England and
Japan. Things aren't going to improve any time soon.

HAL 9000? More likely 2010, a Stooges Odyssey.
http://thedailywtf.com/Articles/Youll-Need-to-Come-Downtown.aspx

jg
--
@home.com is bogus.

Right in front of the bank I saved my pennies at as a kid. But why
has no one mentioned the jet fuel pipeline that runs right under
Sepulveda there? http://www.latimes.com/news/local/la-me-explosion29mar29,0,5311983.story

Robert Klemme

Apr 4, 2008, 4:00:32 AM
On Apr 3, 3:36 pm, Mladen Gogala <mgog...@yahoo.com> wrote:
> On Thu, 03 Apr 2008 06:17:23 -0700, Robert Klemme wrote:
> > The point where I disagree is when you turn down improvements in
> > management automation.
>
> Management automation? I know few managers that could be replaced by
> a Perl script, but I wasn't aware of the trend?
>
> > Basically systems become more complex all the
> > time and we also learn more about system behavior. It is much more
> > efficient to try to put that knowledge in code than to train a lot of
> > people to do the same.
>
> HAL 9000 awaits us. How do you put knowledge in the code?

We do that all the time when developing software.

> How do you
> get a computer to fix a horribly botched applications in which queries
> are created on views upon views upon views?

Of course you don't. As I said in my earlier posting, I'm all for
people that know their job and do good work. But this thread is about
automated memory management and you seem to generally dislike features
like this while I tried to point out that they can actually work by
pointing at the JVM example. The fact (?) that Linux MM is not as
good as it could be does not prove that automation like this is bad in
general.

> > The Java Virtual Machines of today are an
> > extremely good example of where this can lead - performance has much
> > improved over earlier versions and this is because the JVM is
> > "intelligent" enough to do optimizations on the running code.
>
> Hmmm, I am not sure that JVM can outrun Perl.

I am sure you can find benchmarks that prove each claim - Perl being
faster and Java being faster.

Cheers

robert

vital...@gmail.com

Apr 5, 2008, 2:11:36 AM
On Apr 2, 10:14 pm, vitalis...@gmail.com wrote:
> On Apr 2, 8:10 pm, joel garry <joel-ga...@home.com> wrote:
>
> > Of course, I don't know anything in depth about this, but Wim
> > Coekaerts is the director of linux engineering, and he mentions in
> > Note:261889.1 that O10 uses the hugetbl pool preallocated unpageable
> > by default, but you have to ask for a patch for 9i. Now, if you are
>
> Thanks Joel. I'll have a thorough look at that tomorrow.

This important setting is mentioned in the RAC installation guide for
Linux. Something the guy who installed our cluster seems to have
overlooked or deemed unnecessary... (It's not a prereq but rather a
suggestion in the guide.)

It's also surprising that, apart from the RAC install guide, hugetlb
is mentioned only in the Linux database admin guide (and not in the
database install guide.)

Thanks again Joel for pointing that out.

Jerome

vital...@gmail.com

Apr 5, 2008, 2:24:54 AM
On Apr 2, 6:16 pm, NetComrade <netcomradeNS...@bookexchange.net> wrote:

> >Do you have FILESYSTEMIO_OPTIONS set to either "SETALL" or "DIRECTIO"?
> >In my experience, that will significantly reduce the use of caches.
>
> Have you tried playing with Veritas/ODM?

I've never used ODM. But is it compatible with OCFS2?

vital...@gmail.com

Apr 5, 2008, 2:41:38 AM
On Apr 3, 10:28 am, Mladen Gogala <mgog...@yahoo.com> wrote:

> On Wed, 02 Apr 2008 03:20:27 -0700, vitalisman wrote:
> > On 9i, filesystemio_options=none (default) and Oracle binaries are not
> > linked for AIO (default, at least with RAC). I'm in the process of
> > testing AIO and DIO on RAC 9i following Oracle support questions, but
> > the first attempts gave way to some instance internal errors... I've
> > asked the support if this is really a supported configuration (at least
> > with OCFS1, it was not, apparently.)

> > Thanks Mladen for your answer!

> Jerome, direct I/O should lower memory consumption simply because it
> bypasses buffer cache over which we have no control.

I know how DIO should work, but my first tests with OCFS2 showed no
evidence of the OS cache being bypassed. Have you ever enabled DIO
with OCFS2? Did that work?

> - min_free_kbytes,
> - dirty_background_ratio,
> - dirty_expire_centisecs
> - dirty_writeback_centisecs.

> You cannot control the size of cache components (like buffer cache) but
> you can control behavior and overall limit with min_free_kbytes.

Those are parameters I usually check/tune on Linux.

Thanks again for your answer.

NetComrade

Apr 14, 2008, 3:39:19 PM

Sorry, no clue.. my understanding with OCFS, an oracle supplied FS,
double buffering shouldn't be an issue... but i never used it... was
'fortunate' enough to work with ADM

.......
We run Oracle 9iR2,10gR2, 10g2RAC on RH4/RH5 and Solaris 10 (Sparc)

We use RMAN and remote catalog for backups

vital...@gmail.com

Apr 15, 2008, 9:11:35 AM
On Apr 14, 9:39 pm, NetComrade <netcomradeNS...@bookexchange.net>
wrote:

> On Fri, 4 Apr 2008 23:24:54 -0700 (PDT), vitalis...@gmail.com wrote:
> >On Apr 2, 6:16 pm, NetComrade <netcomradeNS...@bookexchange.net> wrote:
>
> >> >Do you have FILESYSTEMIO_OPTIONS set to either "SETALL" or "DIRECTIO"?
> >> >In my experience, that will significantly reduce the use of caches.
>
> >> Have you tried playing with Veritas/ODM?
>
> >I've never used ODM. But is it compatible with OCFS2?
>
> Sorry, no clue.. my understanding with OCFS, an oracle supplied FS,
> double buffering shouldn't be an issue... but i never used it... was
> 'fortunate' enough to work with ADM

Could you please explain why this "double buffering" is wanted? And
have you found a means of inhibiting it altogether (using Direct IO on
Linux/OCFS2 does not do it according to my tests and a few messages
found on the net)?

Thanks.

NetComrade

Apr 15, 2008, 12:58:29 PM

If my reply read as though 'double buffering' is a good thing, my
apologies. With ODM, I am pretty sure it's not an issue... I've verified
it in the past by trussing/stracing and seeing the proper system calls..
I haven't done anything beyond that (no clue about 'peeking' into buffers)

I am not familiar with OCFS2, but why would Oracle build a file system
that sucks similarly to others? I think w/o peeking into buffers, it's
hard to tell if buffered data is Oracle's. There will also be local
file systems.. Linux does not like to 'waste' memory.
If you really think it's an issue in your case, go raw or ODM.. I'd be
curious to see if you have links showing that directio is a 'scam'
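
As an alternative to stracing, the open flags of a datafile descriptor can be read from /proc/<pid>/fdinfo/<fd>, which reports them in octal on a "flags:" line. The sketch below assumes a kernel that provides fdinfo; the function name is made up, and the fallback constant is the x86 value of O_DIRECT, used only where Python's os module does not define it.

```python
import os

# O_DIRECT differs between architectures; prefer the value Python knows about.
O_DIRECT = getattr(os, "O_DIRECT", 0o40000)


def opened_with_o_direct(fdinfo_text):
    """Return True if the fdinfo text shows the descriptor was opened with O_DIRECT."""
    for line in fdinfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name == "flags":
            # The kernel prints the open flags as an octal number.
            return bool(int(rest.strip(), 8) & O_DIRECT)
    raise ValueError("no 'flags:' line found")
```

This only confirms that the process asked for direct I/O on that file. It says nothing about whether the filesystem still keeps the blocks in the page cache, which is the open question in this thread.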

vital...@gmail.com

Apr 16, 2008, 6:01:52 AM
On Apr 15, 6:58 pm, NetComrade <netcomradeNS...@bookexchange.net>
wrote:

According to the (few) tests I've done, datafile blocks on OCFS2 are
still buffered by Linux with DIO. It seems to be the "expected"
behaviour as evidenced by messages by other Net comrades ;-) such as
http://tinyurl.com/6rehxw

tomsla

Apr 17, 2008, 3:32:19 AM
To solve the problem with the SGA being swapped out and datafiles
being cached, you should use HugePages on Linux and Oracle ASM for
your storage.

That works perfectly.

Have a look here:

http://www.puschitz.com/TuningLinuxForOracle.shtml#SwapSizeRecommendations

--------------
Tomasz Slawek

vital...@gmail.com

Apr 17, 2008, 4:55:57 AM
On Apr 17, 9:32 am, tomsla <tomasz.sla...@gmail.com> wrote:
> To solve the problem witch swapping out SGA and caching datafiles you
> should use HugePages on Linux

Yes, but they are allocated from Linux high memory. Depending upon the
kernel and the available physical memory, this is not always an
option.

> and for your Storage ORACLE ASM.
>
> That works perfect.

Thanks for your reply. I haven't tested ASM with RAC yet.
Regarding I/O performance, have you found any drawback in using ASM
rather than OCFS2?
