Any gotchas for STMM

Richard

Jun 11, 2010, 10:37:33 PM

Folks,

I wonder what the real-world results are for turning STMM on, along
with many of the AUTOMATIC memory config parameters that have been
introduced since V9? We are turning STMM on in production. There's
never enough traffic or volume of activity in development to get good
estimates and find gotchas. Your experience with this in production
settings, and your recommendations, will get us far.

Thank you in advance.
RL

Mark A

Jun 12, 2010, 12:38:13 AM

"Richard" <rsl...@gmail.com> wrote in message
news:aa240752-92fc-4146...@a30g2000yqn.googlegroups.com...

Only a donkey would use STMM.

IBM introduced STMM as a way to convince customer executives that DB2 needs
less expertise than other databases because it is self-tuning (or at least a
lot less expertise than needed by previous versions of DB2). This is
supposed to lower the cost of ownership. The reality is 100% opposite of the
claims.

If you want to use STMM, I would recommend that you use 9.7.2. It doesn't
work well in 9.5 and can crash your system, although I think 9.5.5 may be
noticeably better than previous fixpacks. Personally, I think STMM is
worthless anyway and I recommend that it be turned off (default is on for
new databases). If you do use STMM, and end up in a mental institution,
don't blame me.

One other thing. If you intend to take any DB2 certification exams, they
will ask you a bunch of questions about STMM that not even the DB2 STMM
developers would know the answers to.

Richard

Jun 13, 2010, 5:15:38 PM

Thanks for your input.
How about AUTOMATIC database memory, the close cousin to STMM? How does
it fare in real production?

RL

Mark A

Jun 14, 2010, 10:11:01 AM

"Richard" <rsl...@gmail.com> wrote in message
news:065314d9-0315-48bc...@37g2000vbj.googlegroups.com...

Automatic settings are generally recommended, and I have seen no problems
with them.


ajs...@gmail.com

Jun 14, 2010, 10:29:29 AM

In general, I'd say that STMM is recommended both in test and in
production. We have thousands of customers happily running STMM in
production and have hit only a handful of issues. If you don't
believe me, here are some of our customer quotes around STMM:

"The self tuning memory manager (STMM) technology we now consider a
"must have". We don't let one database go live without this feature
enabled. STMM saves us 1 month of manual adjustments to the memory
model and the fact that it works across instances really helps us
consolidate databases and get the most out of our servers." - Canadian
Department of National Defence

"STMM in DB2 is rock solid, no matter how many transactions our
customers throw at DB2 it just auto configures and hums along." - TMW
Systems

"We are very impressed with the performance improvements achieved with
the DB2 Self-Tuning Memory Manager (STMM). Reports that took two to
three minutes to extract before are now extracted in less than 10
seconds!" - Automatos

Mark, it might help if you outline specific issues that you've hit
with STMM. The vagueness of your response leaves more questions than
answers. For instance, why do you say that STMM in DB2 9.5 can "crash
your system"?

Thanks,
Adam

Jean-Marc Blaise

Jun 14, 2010, 3:00:52 PM

I have many customers using STMM in France in DB2 9.5 (from FP3 to FP5) and
we have no problems with it.
We have only deactivated the tuning of LOCKLIST, because of one application;
that's all.

Regards,

JM

"Mark A" <no...@nowhere.com> wrote in message
news:hv5d9n$v60$1...@news.eternal-september.org...

Mark A

Jun 15, 2010, 1:00:44 AM

"ajs...@ca.ibm.com" <ajs...@gmail.com> wrote in message
news:7b8510a1-1bdc-4d1f...@j4g2000yqh.googlegroups.com...

> In general, I'd say that STMM is recommended both in test and in
> production. We have thousands of customers happily running STMM in
> production and have hit only a hand-full of issues. If you don't
> believe me, here are some of our customer quotes around STMM:
>
>
> "The self tuning memory manager (STMM) technology we now consider a
> "must have". We don't let one database go live without this feature
> enabled. STMM saves us 1 month of manual adjustments to the memory
> model and the fact that it works across instances really helps us
> consolidate databases and get the most out of our servers." - Canadian
> Department of National Defence
>
> "STMM in DB2 is rock solid, no matter how many transactions our
> customers throw at DB2 it just auto configures and hums along." - TMW
> Systems
>
> "We are very impressed with the performance improvements achieved with
> the DB2 Self-Tuning Memory Manager (STMM). Reports that took two to
> three minutes to extract before are now extracted in less than 10
> seconds!" - Automatos
>
> Mark, it might help if you outline specific issues that you've hit
> with STMM. The vagueness of your response leaves more questions than
> answers. For instance, why do you say that STMM in DB2 9.5 can "crash
> your system"?
>
> Thanks,
> Adam

1. I cannot disclose the specific extent of the problems I have seen in our
applications due to confidentiality issues (some of which involve IBM).

2. If someone went from 2-3 minutes to 10 seconds on a query (I assume this
is a data warehouse) then they must have used the DB2 defaults previously,
including the pitifully small default bufferpool size of 1000 pages, or
pitifully small sort heaps. Just because the DB2 database defaults are
ridiculously small (most of them in 9.5 have not been changed since OS/2
Database Manager when the largest PC's had 16 MB of main memory), does not
mean STMM works well. Any half-competent DBA could change the defaults to
run quite well without STMM. Unfortunately, not all DBA's are competent, or
more often than not many managers don't think they even need DBA's.

3. I will admit that the problem may be worse with Linux, where DB2 STMM
gives up memory it doesn't need at the moment, but can't get it back when it
asks for it back a few seconds later. We experienced this big time with
locklist and with sort heaps (with disastrous results). However, I have also
heard others complain about AIX. I don't know about DB2 on Windows.

4. There have been known problems with multiple instances on the same server
with STMM, and these problems are irrefutable (although they may have been fixed
with 9.5.5 or 9.7.x). Most likely STMM works better with one instance and
one database per server. That is not the norm in most enterprises.

5. The whole idea that STMM can manage bufferpools for you completely takes
away the advantage of having multiple bufferpools, since DB2 treats them all
the same. In other words if you have two bufferpools, how does STMM know
that you always want a 100% hit ratio on one of them, but can live with less
than 100% on the other?

6. The Balanced Warehouse configurations (DPF) do not use STMM and
specifically advise against it in the Balanced Warehouse manuals.

7. Current IBM DB2 Instructors have admitted that STMM does not work well in
9.5 and have recommended that you wait until 9.7. I don't want to mention
their names for obvious reasons.

8. Since I am a former IBM employee and former DB2 instructor when I worked
for IBM, I have some idea what I am talking about.

Yes, STMM may work better than the defaults, but that is not saying much.
And STMM can definitely crash or hang your system. Just ask any IBM support
person. I did admit that it works better in 9.7, but to be honest I fail to
see the benefit, especially for bufferpools (as I mention above when one
wants more than one bufferpool, each with different priorities).

One theoretical advantage of STMM is that it saves memory by giving it up
when not needed, so other parts of DB2 can use it. In practice, memory is
cheap and plentiful, and aside from bufferpools, the amounts of memory we
are talking about are not worth being stingy about. So if one just allocated
liberal amounts of memory to locklist, sortheaps, and the few other heaps
controlled by STMM, then one would almost always be better off (and also you
won't have your logs filled up with notices of STMM adjustments ad nauseam).

Adam, I would suggest that if you want some education on this matter, you
read each and every APAR that has been fixed with each 9.5 fixpack. You can
get the list at the fixpack download site.

In summary, the DB2 default memory allocations were atrocious, and poorly
documented as to how to rationally configure them. In response, we get a
totally automated overkill that doesn't yet work as advertised. Like most
things happening in Toronto these days, it is designed to sell DB2 to
executives who don't know any better, not to really help out DBA's.

Also, I would like to know why those DB2 certification tests have such nasty
and complicated questions about STMM?


Mark A

Jun 15, 2010, 1:02:57 AM

"Jean-Marc Blaise" <jmbl...@hotmail.com> wrote in message
news:4c167c5e$0$24821$426a...@news.free.fr...

>I have many customers using STMM in France in DB2 9.5 (from FP3 to FP5) and
>we have not problem with it.
> We have only deactivated because of 1 application the tuning of LOCKLIST,
> that's all.
>
> Regards,
>
> JM

Can you explain why it does not work with LOCKLIST for that particular
customer? Some of us (in the real world) don't have the luxury of things
that may, or may not work. We need things that work 100% of the time,
every time.


ajs...@gmail.com

Jun 15, 2010, 10:20:46 AM

Mark,

Thanks for the response. I can respond to some of your points here,
but others we should probably discuss in more detail through email as
they don't seem generally applicable to a broad audience.

> 1. I cannot disclose the specific extent of the problems I have seen in our
> applications due to confidentiality issues (some of which involves IBM).

If you're willing, I'd be interested in learning more about these
problems. You can send me these details via email.

> 2. If someone went from 2-3 minutes to 10 seconds on a query (I assume this
> is a data warehouse) then they must have used the DB2 defaults previously,
> including the pitifully small default bufferpools size of 1000 pages, or
> pitifully small sort heaps. Just because the DB2 database defaults are
> ridiculously small (most of them in 9.5 have not been changes since OS/2
> Database Manager when the largest PC's had 16 MB of main memory), does not
> mean STMM works well. Any half-competent DBA could change the defaults to
> run quite well without STMM. Unfortunately, not all DBA's are competent, or
> more often than not many managers don't think they even need DBA's.

You make a valid point, but it's only partially correct. I agree that
DB2's default configuration will not give you optimal performance.
That being said, there are two things that you're not considering.
First of all, DB2 defaults have changed substantially over the past 10
years. For example, it's now the case that the DB2 Configuration
Advisor will run automatically as part of database creation. After
the Configuration Advisor completes, it will have set 36 of the
database configuration parameters (including the size of the default
buffer pool) based on the machine specifications. The result is a
"default" configuration that is tailored to the environment in which
the database will be run.

The second thing that you may not be considering is that even
competent DBAs do not always have the time to optimally configure each
and every database created at their shop. I know many DBAs that will
devote hours and hours to optimally configure some of their databases
and yet, for their test environments, they're happy to let STMM do the
heavy lifting. That being said, some of these same DBAs have enabled
STMM on their most critical databases and found that its tuning
outperformed their hand tuned configuration.

> 3. I will admit that the problem may be worse with Linux, where DB2 STMM
> gives up memory it doesn't need at the moment, but can't get it back when it
> asks for it back a few seconds later. We experienced this big time with
> locklist and with sort heaps (with disastrous results). However, I have also
> heard of others complain about AIX also. I don't know about DB2 on Windows.

I'd be interested to hear more about these situations, perhaps over
email.

> 4. There have been known problems with multiple instances on the same server
> with STMM, and these problems are irrefutable (although may have been fixed
> with 9.5.5 or 9.7.x). Most likely STMM works better with one instance and
> one database per server. That is not the norm in most enterprises.

I'd be interested in hearing more about the problems you've hit over
email. While there have been some issues with multiple instances that
we've fixed, they only affected a handful of customers.

> 5. The whole idea that STMM can manage bufferpools for you completely takes
> away the advantage of having multiple bufferpools, since DB2 treats them all
> the same. In other words if you have two bufferpools, how does STMM know
> that you always want a 100% hit ratio on one of them, but can live with less
> than 100% on the other?

I think you may not completely understand how STMM works with multiple
buffer pools. STMM works to optimize the configuration of multiple
buffer pools not by trying to increase their hit rates, but instead by
determining a configuration that will lead to the minimum possible
amount of time spent retrieving pages from disk. With this model, it
is valuable to treat all of the buffer pools the "same" since each of
them is caching pages in an attempt to prevent disk reads/writes. If
all you care about is overall database performance, the tuning that
STMM provides for multiple bufferpools is extremely effective.
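
To make the idea concrete, here is a toy Python sketch of cost-based tuning
across two buffer pools. It is not STMM's actual algorithm, which this thread
does not describe: the linear miss model, the IO_MS constant, the pool names
and the workload numbers are all assumptions made up for illustration. It only
shows the general principle of moving pages toward whichever pool saves the
most disk time.

```python
from dataclasses import dataclass
from typing import List, Optional

IO_MS = 5.0  # assumed cost of one physical page read, in milliseconds


@dataclass
class Pool:
    name: str
    size: int             # current size in pages
    reads_per_sec: float  # logical page reads per second (made-up figure)
    working_set: int      # distinct pages the workload touches (made-up figure)


def disk_time(pool: Pool, size: Optional[int] = None) -> float:
    """Estimated ms per second spent on physical reads (toy linear miss model)."""
    size = pool.size if size is None else size
    miss_ratio = max(0.0, 1.0 - size / pool.working_set)
    return pool.reads_per_sec * miss_ratio * IO_MS


def total_disk_time(pools: List[Pool]) -> float:
    return sum(disk_time(p) for p in pools)


def rebalance(pools: List[Pool], step: int = 100, max_rounds: int = 10_000) -> None:
    """Greedily move `step` pages from the cheapest donor to the best receiver."""
    for _ in range(max_rounds):
        # Disk time saved if a pool grows by `step` pages.
        receiver = max(pools, key=lambda p: disk_time(p) - disk_time(p, p.size + step))
        gain = disk_time(receiver) - disk_time(receiver, receiver.size + step)
        # Disk time added if a pool shrinks by `step` pages.
        donors = [p for p in pools if p is not receiver and p.size > step]
        if not donors:
            break
        donor = min(donors, key=lambda p: disk_time(p, p.size - step) - disk_time(p))
        cost = disk_time(donor, donor.size - step) - disk_time(donor)
        if gain <= cost:
            break  # stop when the best candidate transfer no longer helps
        donor.size -= step
        receiver.size += step


pools = [
    Pool("BP_HOT", size=5000, reads_per_sec=2000.0, working_set=8000),
    Pool("BP_COLD", size=5000, reads_per_sec=50.0, working_set=100_000),
]
before = total_disk_time(pools)
rebalance(pools)
after = total_disk_time(pools)
print(f"disk time before: {before:.1f} ms/s, after: {after:.1f} ms/s")
for p in pools:
    print(p.name, p.size)
```

With these made-up numbers the sketch moves pages from BP_COLD to BP_HOT until
BP_HOT's working set fits in memory. The objective is total disk time only;
nothing in this toy model lets a DBA state that one pool must keep a
particular hit ratio.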


> 6. The DPF Balanced Warehouse Configurations (DPF) do not use STMM and
> specifically advise against it in the Balanced Warehouse manuals.

That is correct. STMM is not recommended in the Balanced Warehouse
because in DPF environments, STMM must be used only on partitions that
have similar memory requirements. That being said, I know of several
customers who are happily running STMM in DPF after exercising the
necessary precautions. You can read more about the precautions here:

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0023815.html

> 7. Current IBM DB2 Instructors have admitted that STMM does not work well in
> 9.5 and have recommended that you wait until 9.7. I don't want to mention
> their names for obvious reasons.

That is unfortunate because that's not the official IBM position.

> 8. Since I am a former IBM employee and former DB2 instructor when I worked
> for IBM, I have some idea what I am talking about.
>
> Yes, STMM may work better than the defaults, but that is not saying much.
> And STMM can definitely crash or hang your system. Just ask any IBM support
> person. I did admit that it works better in 9.7, but to be honest I fail to
> see the benefit, especially for bufferpools (as I mention above when one
> wants more than one bufferpool, each with different priorities).
>
> One theoretical advantage of STMM is that is saves memory by giving it up
> when not needed, so other parts of DB2 can use it. In practice, memory is
> cheap and plentiful, and aside from bufferpools, the amounts of memory we
> are talking about are not worth being stingy about. So if one just allocated
> liberal amounts of memory to locklist, sortheaps, and the few other heaps
> controlled by STMM, then one would almost always be better off (and also you
> won't have your logs filled up with notices of STMM adjustments ad nauseam).

I would strongly disagree with your argument. While memory may be
cheap and plentiful in your shop, most of our customers are
consolidating servers to the point where many databases are all
fighting for the same small amount of memory. It is in these
environments where STMM can be the most effective at managing the
needs of the databases, especially if their peak workload requirements
are at different times of the day. In general, I think you're greatly
oversimplifying the configuration dilemma faced by a DBA in the
absence of tools like STMM.

> Adam, I would suggest that if want some education on this matter, that you
> read each and every APAR that has been fixed with each 9.5 fixpack. You can
> get the list at the fixpack download site.

Mark, I've been personally involved in almost all of the STMM APARs to
date.

> In summary, the DB2 default memory allocations were atrocious, and poorly
> documented as to how to rationally configure them. In response, we get a
> totally automated overkill that doesn't yet work as advertised. Like most
> thinks happening in Toronto these days, it is designed to sell DB2 to
> executives who don't know any better, not to really help out DBA's.

There are a great many DBAs (one of whom has already posted to this
thread) who are quite happy with STMM. I think it's a gross mis-
statement of the facts to say that STMM was designed to sell DB2 to
executives as opposed to helping out DBAs.

> Also, I would like to know why those DB2 certification tests have such nasty
> and complicated questions about STMM?

That's a good question, and something I can look into. Unfortunately,
I haven't taken a DB2 certification exam in a number of years so I
don't know the answer off-hand.

Again, I welcome feedback about STMM and am willing to help you
through any issues you may be having. Please follow-up via email.

Thanks,
Adam

Richard

Jun 16, 2010, 12:45:29 AM

Thanks for all the expert discussion on this important subject.

Many of Mark's points are valid, at least for understanding what some of
the potential problems are. I am hoping to get more input from other
DBAs who have experienced STMM in production.

Please give us your feedback. What was your system like before STMM,
what kind of tuning did you do to it, and how did STMM improve (or not
improve) that status quo?

Thanks, Richard

Richard

Jun 16, 2010, 10:24:10 AM

B.U.M.P
Bring up my post. -Richard

Michel Esber

Jun 16, 2010, 1:43:55 PM

> 2. If someone went from 2-3 minutes to 10 seconds on a query (I assume this
> is a data warehouse) then they must have used the DB2 defaults previously,
> including the pitifully small default bufferpools size of 1000 pages, or
> pitifully small sort heaps. Just because the DB2 database defaults are
> ridiculously small (most of them in 9.5 have not been changes since OS/2
> Database Manager when the largest PC's had 16 MB of main memory), does not
> mean STMM works well. Any half-competent DBA could change the defaults to
> run quite well without STMM. Unfortunately, not all DBA's are competent, or
> more often than not many managers don't think they even need DBA's.


Mark, I am one of the happy customers that have been using STMM in
production and I totally disagree with you.

I believe that STMM is really useful for companies that do not have a
very skilled team of DB2 DBAs. Most DBAs are not more than skilled DB2
users, but are very far from being experts. This includes me.

If you know DB2 internals and are an expert, then yes you probably
don't need STMM. If you'd rather sit in front of a DB2 instance and
fine tune it analyzing performance metrics for hours and hours, then
it is best to leave it disabled.

Since I don't have enough time and hardcore DB2 DBAs, I use STMM :).

And no, my production instance was not using a default bufferpool.
STMM increased our original bufferpool sizes several times until our
response time, which was very very reasonable for our customers,
became virtually instantaneous.

I am not an IBM lover, really. But STMM is great.

-M

Mark A

Jun 20, 2010, 1:46:26 PM

"ajs...@ca.ibm.com" <ajs...@gmail.com> wrote in message
news:6054bd1a-1210-4048...@t10g2000yqg.googlegroups.com...

> Mark,
>
> Thanks for the response. I can respond to some of your points here,
> but others we should probably discuss in more detail through email as
> they don't seem generally applicable to a broad audience.
>
> If you're willing, I'd be interested in learning more about these
> problems. You can send me these details via email.

Thanks for the offer, but after many months of desperation trying to get
STMM to work on a critical database server with multiple instances and
databases, we gave up and just hard-coded the memory values. It was not very
difficult. I don't think there is anything to discuss at this point.

Maybe you could contact someone in support and just go over every PMR opened
on STMM if you want more information on the problems customers have
encountered.

> You make a valid point, but it's only partially correct. I agree that
> DB2's default configuration will not give you optimal performance.
> That being said, there are two things that you're not considering.
> First of all, DB2 defaults have changed substantially over the past 10
> years. For example, it's now the case that the DB2 Configuration
> Advisor will run automatically as part of database creation. After
> the Configuration Advisor completes, it will have set 36 of the
> database configuration parameters (including the size of the default
> buffer pool) based on the machine specifications. The result is a
> "default" configuration that is tailored to the environment in which
> the database will be run.
>
> The second thing that you may not be considering is that even
> competent DBAs do not always have the time to optimally configure each
> and every database created at their shop. I know many DBAs that will
> devote hours and hours to optimally configure some of their databases
> and yet, for their test environments, they're happy to let STMM do the
> heavy lifting. That being said, some of these same DBAs have enabled
> STMM on their most critical databases and found that its tuning
> outperformed their hand tuned configuration.

Very few default parameters were changed over the past 10 years, and
none of the important ones. DB2 Version 8.2 (which was used by many until
about two years ago) did not have STMM and had the following as defaults:

LOCKLIST 100 (400 KB)
(Linux/UNIX) IBMDEFAULTBP bufferpool 1000 (4 MB)
(Windows) IBMDEFAULTBP bufferpool 250 (1 MB)
LOGBUFSZ 8 (32 KB)
etc.

These are the same exact values as OS/2 Database Manager circa 1990, when
the largest PC's had 16 MB of memory.
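
As a quick check of the sizes in parentheses, the following Python sketch
converts the page counts quoted above into kilobytes, assuming (as those
figures imply) that each value is a count of 4 KB pages. The helper name
pages_to_kb is made up for illustration.

```python
PAGE_KB = 4  # assumption: all of these parameters are counted in 4 KB pages

# V8.2 default values as quoted in this post, in pages.
V82_DEFAULTS_PAGES = {
    "LOCKLIST": 100,
    "IBMDEFAULTBP (Linux/UNIX)": 1000,
    "IBMDEFAULTBP (Windows)": 250,
    "LOGBUFSZ": 8,
}


def pages_to_kb(pages: int) -> int:
    """Convert a count of 4 KB pages to kilobytes."""
    return pages * PAGE_KB


for name, pages in V82_DEFAULTS_PAGES.items():
    print(f"{name}: {pages} pages = {pages_to_kb(pages)} KB")
```

Note that 1000 pages is 4000 KB and 250 pages is 1000 KB, which the list
above rounds to 4 MB and 1 MB.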

Only after 8.2 was STMM added (introduced in 9.1 but changed in 9.5), and in
9.5 auto-configure was made the default (but not before then).

If the DB2 documentation had been written so people could understand typical
values that should be used (in the Reference Guides, not some other manual)
based on the types of applications, then STMM would not have been necessary. The
auto-configure is nice, but that was not invoked automatically until 9.5 and
is not very easy to use properly IMO (using properly would force the
answering of the key questions about the intended database).

The problem of STMM is threefold:

1. When multiple bufferpools are desired to accommodate different priorities
for different tables, then STMM cannot know that, as it treats all SQL (and
the tables they go after) the same, giving equal weight to all of them. If
STMM is used for bufferpools (-2), I am not even sure if there is any
benefit to having multiple bufferpools.

2. STMM can cause severe database server problems, such as when STMM gives
up memory (as it frequently does when not needed at a particular moment) but
cannot get the memory back when it tries to get the memory back a few
seconds later. This obviously does not always happen, but when it does, it
can be catastrophic for an OLTP system with high transaction rates. When we
opened PMR's on this problem, IBM was not able to resolve it (9.5.4).

3. STMM has had problems in the past with being able to manage a large number of
databases with multiple instances, especially under Linux. IBM support flat
out told us that automatic instance memory would not work under Linux if
more than one instance existed since DB2 could not coordinate the multiple
instance memory (and the databases within those instances).

As I pointed out, some of these may have been fixed in 9.5.5 or 9.7.x, but they
were very serious problems. But even before these newer releases were
available, we were getting the same story from some in IBM who claimed
everything was fine with STMM, while we were told by other IBM'ers (correctly)
that there were still some problems with STMM that occurred in certain situations.
Even though improvements have been made in STMM in the most recent fixpacks
and in 9.7, I am skeptical about the claim now (from people who have not
admitted the past problems) that everything is now working perfectly.

> I'd be interested to hear more about these situations, perhaps over
> email.
>

> I'd be interested in hearing more about the problems you've hit over
> email. While there have been some issues with multiple instances that
> we've fixed, they only affected a hand-full of customers.

I don't have time to do that. We already opened several PMR's and engaged
Lab Services to assist, but nothing worked. So we just hard-coded the values
(which was quite simple for any competent DBA to do).

As to how many people have been affected by the problems, I don't think you
have a good gauge on the real number. In our case we use DB2 Linux for
mission critical databases running moderate to high transaction rates, and
we cannot afford to have any problems. A lot of customers don't have such
mission critical systems, or don't use DB2 LUW for them. For example, IBM
doesn't even have any mission critical systems (where they could go out of
business if a database was down for 4 hours).

Surveys conducted by independent consultants have shown that at least 25% of
customers have had at least some problems with STMM. Those running a single
instance and single database have probably had the fewest problems. AIX and
Windows have probably had far fewer problems than Linux.

For example, a poll conducted on 2010-03-12 by DB2Night (file
20100312DB2Night14.wmv on www.DB2NightShow.com) revealed the following:

Are you using STMM in Production?

Yes, and with good results                                17%
Yes, and uncertain of the measurable results              22%
No, we tried it but suffered with adverse consequences    28%
No, we are still on 8.2 or earlier                         6%
No, we are not ready to turn it on yet                    28%
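
A small Python sketch for reading these figures, using only the percentages
quoted above (the dictionary keys are shortened labels made up here). The
shares sum to 101, presumably because of rounding, so the derived ratio is
approximate.

```python
# Poll shares in percent, as quoted above.
poll = {
    "yes_good_results": 17,
    "yes_uncertain_results": 22,
    "no_tried_adverse": 28,
    "no_still_on_8_2_or_earlier": 6,
    "no_not_ready": 28,
}

total = sum(poll.values())
# Respondents who have actually run STMM in production at some point.
tried = (
    poll["yes_good_results"]
    + poll["yes_uncertain_results"]
    + poll["no_tried_adverse"]
)
adverse_share_of_tried = 100 * poll["no_tried_adverse"] / tried

print(f"sum of shares: {total}%")
print(f"respondents who tried STMM: {tried}%")
print(f"of those, reported adverse consequences: {adverse_share_of_tried:.1f}%")
```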

BTW, IBM'ers are frequent presenters on the DB2NightShow sessions.

> I think you may not completely understand how STMM works with multiple
> buffer pools. STMM works to optimize the configuration of multiple
> buffer pools not by trying to increase their hit rates, but instead by
> determining a configuration that will lead to the minimum possible
> amount of time spent retrieving pages from disk. With this model, it
> is valuable to treat all of the buffer pools the "same" since each of
> them is caching pages in an attempt to prevent disk reads/writes. If
> all you care about is overall database performance, the tuning that
> STMM provides for multiple bufferpools is extremely effective.

On the contrary, I do understand. As a DBA I don't necessarily want to treat
access to all tables and indexes with the same priority, especially with a
very large database where the data is many times the size of server memory.

"Overall" database performance treats every table and every SQL the same,
which is often not optimum IMO.

Granted, there are many DBA's who don't understand how to set up
bufferpools, but if IBM had provided some documentation and guidance on
this, it could be tuned manually in a matter of seconds.

> That is correct. STMM is not recommended in the Balanced Warehouse
> because in DPF environments, STMM must be used only on partitions that
> have similar memory requirements. That being said, I know of several
> customers who are happily running STMM in DPF after exercising the
> necessary precautions. You can read more about the precautions here:
>
> http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0023815.html

If you read your own doc carefully, it says that STMM can be used for DPF if
all partitions have the same characteristics as follows:

- All database partitions are on identical hardware, and there is an even
distribution of multiple logical database partitions to multiple physical
database partitions
- There is a perfect or near-perfect distribution of data
- Workloads are distributed evenly across database partitions, meaning that
no database partition has higher memory requirements for one or more heaps
than any of the others

For the Balanced Warehouse offerings from IBM, all of the above requirements
are true (since IBM has complete control over them). The reason why the
consultants who configured the Balanced Warehouse don't use STMM is because
they have had problems with it, and not because it does not meet the
requirements stated above.

If IBM solves all the problems with STMM, then that may change, but the
reason for not using it in 9.5 Balanced Warehouse was because of the many
problems they encountered. BTW, the IBM Balanced Warehouse config also says
that auto-configure must be turned off, since it creates havoc. I think this
recommendation to shut it off also applied to single partition databases in
9.7.0 (fixpack 0), but I don't recall.

> That is unfortunate because that's not the official IBM position.

IBM'ers who make a living by giving consulting advice to customers for
hundreds of dollars per hour (and even more than that for classes) and who
actually go to customer sites to implement things, cannot worry about
"the official IBM position" of a bunch of marketing people who are trying to
cram stuff down our throats. The customer comes first, and keeping the
customer systems up and running is more important than your marketing goals.

This is surely the most troublesome comment I have heard from IBM in a long
time, because it shows that IBM is not listening to customers or their own
consultants while trying to force things into the marketplace before they
are fully tested.

> I would strongly disagree with your argument. While memory may be
> cheap and plentiful in your shop, most of our customers are
> consolidating servers to the point where many databases are all
> fighting for the same small amount of memory. It is in these
> environments where STMM can be the most effective at managing the
> needs of the databases, especially if their peak workload requirements
> are at different times of the day. In general, I think you're greatly
> oversimplifying the configuration dilemma faced by a DBA in the
> absence of tools like STMM.

With the exception of bufferpools, the things that STMM controls use an
insignificant amount of memory in 99% of the cases. I have either 32 GB or
64 GB of memory on all my database servers and with the exception of
bufferpools, the other things that STMM controls don't amount to anything
even close to 1 GB (even with multiple databases). If IBM had set more
realistic defaults for these, or better yet just documented how to set them
for realistic scenarios likely to be encountered, then there would be very
little wasted memory and no need for STMM.

> Mark, I've been personally involved in almost all of the STMM APARs to
> date.

There have been STMM APARs? I thought you clearly implied it has been
working quite well? In fact there have been many PMRs and many APARs and
many changes made to STMM without APARs, to fix the problems.

You are implying that the problems are all fixed now. I am sure there have
been improvements over time up to and including 9.7.2, but since you are not
exactly candid about the past problems with STMM, how can I trust you
when you say it works fine now? I can't risk my company on something
that only takes a few minutes to hardcode (about 5 parameters). Also, I
cannot migrate all my databases to 9.7 at this time due to the amount of
regression testing that would be needed on the application side.

If IBM had documented how to set the 5 parameters controlled by STMM (don't
recall the exact number) for various types of database scenarios, STMM would
not have been necessary. The one possible exception is bufferpools, but if
DBA's just allocate 50% of the server memory to bufferpools (the total of
all bufferpools for all databases on the server) then they wouldn't need
STMM to be constantly trying to adjust for them. You would be surprised how
many people try and use the default of 1000 4K pages per database and then
complain about performance.

> There are a great many DBAs (one of which has already posted to this
> thread) who are quite happy with STMM. I think it's a gross mis-
> statement of the facts to say that STMM was designed to sell DB2 to
> executives as opposed to helping out DBAs.

If IBM really cared about their current customers, they would not be
releasing code that has so many bugs. All of these changes are to sell DB2
to new customers who think DB2 is too complex. In some ways it is too
complex, but in reality if IBM manuals provided "how-to" documentation for
setting up the memory values (other than the default, min, and max values)
for various types of common database scenarios, STMM would rarely be
needed.

> Again, I welcome feedback about STMM and am willing to help you
> through any issues you may be having. Please follow-up via email.
>
> Thanks,
> Adam

I appreciate your offer, but I am extremely busy. I don't need your help
since I solved the problems by hard-coding the STMM memory settings. If you
need my help to debug DB2, then you would have to pay my company for my
time, which I doubt you are willing to do.

One other point. I am not against automating the memory configurations for
DB2. Many of them are/were ridiculously complex. The use of automatic memory
was a great improvement, but STMM is a different story and I cannot risk my
company on it based on the problems we have already encountered versus a
payback that is questionable assuming a competent DBA is available (and no, it
doesn't take months to tune it, just minutes).


Mark A

Jun 20, 2010, 1:50:56 PM6/20/10
to
"Jean-Marc Blaise" <jmbl...@hotmail.com> wrote in message
news:4c167c5e$0$24821$426a...@news.free.fr...
>I have many customers using STMM in France in DB2 9.5 (from FP3 to FP5) and
>we have no problem with it.
> We have only deactivated the tuning of LOCKLIST because of 1 application,
> that's all.
>
> Regards,
>
> JM

Well, if the application using that one database where you had a problem with
LOCKLIST was a mission critical application, the STMM problems encountered
could have bankrupted your customer. I need a database that stays up, and/or
doesn't hang, all the time, not just most of the time.

Also, I can't play Russian Roulette trying to figure out when it works and
when it doesn't work.


Mark A

Jun 20, 2010, 2:00:39 PM6/20/10
to
"Michel Esber" <mic...@automatos.com> wrote in message
news:dcdd14e2-cdbc-47a4...@i28g2000yqa.googlegroups.com...

> Mark, I am one of the happy customers that have been using STMM in
> production and I totally disagree with you.
>
> I believe that STMM is really useful for companies that do not have a
> very skilled team of DB2 DBAs. Most DBAs are not more than skilled DB2
> users, but are very far from being experts. This includes me.
>
> If you know DB2 internals and are an expert, then yes you probably
> don't need STMM. If you'd rather sit in front of a DB2 instance and
> fine tune it analyzing performance metrics for hours and hours, then
> it is best to leave it disabled.
>
> Since I don't have enough time and hardcore DB2 DBAs, I use STMM :).
>
> And no, my production instance was not using a default bufferpool.
> STMM increased our original bufferpool sizes several times until our
> response time, which was very very reasonable for our customers,
> became virtually instantaneous.
>
> I am not an IBM lover, really. But STMM is great.
>
> -M

If you don't have a skilled team of DBA's then you must not be using DB2 LUW
for any mission critical applications that your company absolutely depends
on. I can understand that because most companies, even IBM, don't have
ANY mission critical applications where the company would be out of business
if the database was hung or down for a day. Unfortunately, that is not the
case for my company.

If STMM increased your bufferpool size to noticeably improve performance,
then why didn't you just increase it yourself? DB2 bufferpools should be
about 50% of the server memory, unless your total database size is smaller.
Maybe if IBM had documented this, you wouldn't need STMM.

As far as your comment that "Most DBAs are not more than skilled DB2 users,"
maybe your company just needs some different DBA's. That sounds like a
management problem, not a database or DBA problem. There are plenty of
competent DB2 DBA's around, and also competent managers who know how to hire
competent DBA's (and who know the value of having a competent DBA).


Serge Rielau

Jun 21, 2010, 2:48:05 PM6/21/10
to
Mark,

How about sending a note to Adam with your company affiliation so he can
drill down on the issues your company had on his own?
That's not much work for you and saves Adam from divining up what might
have been wrong in your case.

Cheers
Serge


--
Serge Rielau
SQL Architect DB2 for LUW
IBM Toronto Lab

Mark A

Jun 21, 2010, 5:19:51 PM6/21/10
to
"Serge Rielau" <sri...@ca.ibm.com> wrote in message
news:889qf6...@mid.individual.net...

> Mark,
>
> How about sending a note to Adam with your company affiliation so he can
> drill down on the issues your company had on his own?
> That's not much work for you and saves Adam from divining up what might
> have been wrong in your case.
>
> Cheers
> Serge

Ok. I will do that.


Frederik Engelen

Jun 22, 2010, 6:04:05 AM6/22/10
to
Mark,

I'm not going into more detail regarding the whole STMM thing except
saying that it works pretty well for us, as long as we fix the
instance_memory parameter.

What I am curious for is why you would only assign 50% of server
memory to the bufferpools. Give or take a few gigs for OS and database
housekeeping, that would leave half of your server memory unused, no?

Kind regards,

Frederik

Mark A

Jun 22, 2010, 8:45:56 AM6/22/10
to
"Frederik Engelen" <engelen...@gmail.com> wrote in message
news:cad1fcee-4f54-4f83...@d37g2000yqm.googlegroups.com...

I would normally assign more than 50% of total system memory to bufferpools.
But most DB2 novices used the defaults in 8.2 or very small amounts which
are closer to 1% or less, so 50% would be a huge improvement over that. One
might go as high as 75-80% depending on total server memory and other
factors, but in most situations one would not notice much difference between
50% and 75%. Also, at least with Linux, DB2 servers do tend to run out of
memory for various reasons.

The documentation of Linux kernel parameters is sometimes contradictory in
the manuals, or sometimes has been completely lacking.

For example, although not mentioned anywhere in the official doc, some
Redbooks recommend:

vm.swappiness=0 (default for RHEL is 60)
vm.dirty_ratio=10
vm.dirty_background_ratio=5
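
A minimal sketch, in Python, of how one might compare a server's current settings against the three values quoted above. The helper names `read_sysctl` and `compare_settings` are hypothetical (they are not part of DB2 or any Red Hat tool), and the sketch assumes the usual Linux convention that a sysctl name such as vm.swappiness maps to the file /proc/sys/vm/swappiness.

```python
from pathlib import Path

# Values quoted above from the Redbooks; not official DB2 documentation.
RECOMMENDED = {
    "vm.swappiness": 0,
    "vm.dirty_ratio": 10,
    "vm.dirty_background_ratio": 5,
}


def read_sysctl(name, root="/proc/sys"):
    """Read one integer sysctl, e.g. vm.swappiness -> /proc/sys/vm/swappiness.

    Returns None if the file is missing (for example, on a non-Linux host).
    """
    path = Path(root, *name.split("."))
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def compare_settings(current, recommended=RECOMMENDED):
    """Return {name: (current_value, recommended_value)} for every mismatch."""
    return {
        name: (current.get(name), wanted)
        for name, wanted in recommended.items()
        if current.get(name) != wanted
    }


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in RECOMMENDED}
    for name, (have, want) in compare_settings(current).items():
        print(f"{name}: current={have} recommended={want}")
```

Changing the values still requires root (for example through /etc/sysctl.conf), so a report like this is mainly something a DBA could hand to the OS team.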

Recommendations for SHMALL have been all over the place, from 90% of system
memory, 100% of system memory, to now apparently 200% of system memory. Any
changes to Linux Kernel Parm recommendations should be a Hiper doc APAR and
not slipstreamed in the InfoCenter.

I noticed in the 9.7 Fixpack 2 Info Center webpages, they are now enforcing
Linux kernel parms automatically that were not enforced even in 9.7.1. This
is obviously to fix the problems that customers have been experiencing with
memory on DB2 Linux systems, especially with STMM activated.


Mark A

Jun 22, 2010, 9:13:25 AM6/22/10
to
"Mark A" <no...@nowhere.com> wrote in message
news:hvqba4$h5f$1...@news.eternal-september.org...

> Recommendations for SHMALL have been all over the place, from 90% of
> system memory, 100% of system memory, to now apparently 200% of system
> memory. Any changes to Linux Kernel Parm recommendations should be a Hiper
> doc APAR and not slipstreamed in the InfoCenter.

Here is where it states SHMALL should be 200% (and now enforced that way in
9.7.2).
"2 * <size of RAM in bytes> (setting is in 4K pages)"
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.qb.server.doc/doc/c0057140.html

Here is where it states that SHMALL should be 90%:
"...whereas the parameter SHMALL should be set to 90% of the available
memory on the database server."
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.perf.doc/doc/c0054689.html
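
To make the difference between the two quoted recommendations concrete, here is a small Python sketch that converts "a percentage of RAM" into a SHMALL value in pages. It assumes a 4 KB page size, as the first quote does, and a hypothetical 64 GB server; the function name `shmall_pages` is made up for illustration.

```python
def shmall_pages(ram_bytes, percent_of_ram, page_size=4096):
    """SHMALL value (in pages) for a target expressed as a percentage of RAM."""
    return ram_bytes * percent_of_ram // (100 * page_size)


GIB = 1024 ** 3
ram = 64 * GIB  # hypothetical 64 GB server

for percent in (90, 200):
    print(f"{percent}% of RAM -> kernel.shmall = {shmall_pages(ram, percent)}")
```

For a 64 GB server the two recommendations differ by more than a factor of two (roughly 15.1 million pages versus 33.6 million pages), so which page of the documentation one happened to read matters.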

Is anyone at IBM awake these days?


Serge Rielau

Jun 22, 2010, 11:25:58 AM6/22/10
to
We are under G20 lock down, tasered into unconsciousness...

The recommendation has been changed from 90% to 200%. So the 90% is
outdated.
I have used the Feedback button in the wrong doc to get this fixed
(Hint, hint, it does NOT require an IBM employee to use this
button...docs are big, mistakes happen)

Cheer

The Boss

Jun 22, 2010, 12:46:18 PM6/22/10
to
On Jun 22, 5:25 pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> On 6/22/2010 9:13 AM, Mark A wrote:
>
> > "Mark A" <no...@nowhere.com> wrote in message
> > news:hvqba4$h5f$1...@news.eternal-september.org...
> > > Recommendations for SHMALL have been all over the place, from 90% of
> > > system memory, 100% of system memory, to now apparently 200% of system
> > > memory. Any changes to Linux Kernel Parm recommendations should be a Hiper
> > > doc APAR and not slipstreamed in the InfoCenter.
> > Here is where it states SHMALL should be 200% (and now enforced that way in
> > 9.7.2).
> > "2 * <size of RAM in bytes> (setting is in 4K pages)"
> > http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db...
> >
> > Here is where it states that SHMALL should be 90%:
> > "...whereas the parameter SHMALL should be set to 90% of the available
> > memory on the database server."
> > http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db...
> >
> > Is anyone at IBM awake these days?
>
> We are under G20 lock down, tasered into unconsciousness...
>
> The recommendation has been changed from 90% to 200%. So the 90% is
> outdated.
> I have used the Feedback button in the wrong doc to get this fixed
> (Hint, hint, it does NOT require an IBM employee to use this
> button...docs are big, mistakes happen)
>
> Cheer
> Serge

What I would like to see in the docs (and "Best Practice" documents)
is a rationale for these kind of recommendations.
Why is 200% better than 90%?
And is this valid under all circumstances, like running (many)
virtualised Linux-boxes under zVM (or VMware)?
As is, figures like these are just 'silver bullets' and should be
handled with great caution.

--
Jeroen

Mark A

Jun 22, 2010, 3:41:38 PM6/22/10
to
"Serge Rielau" <sri...@ca.ibm.com> wrote in message
news:88c30a...@mid.individual.net...

> We are under G20 lock down, tasered into unconsciousness...
>
> The recommendation has been changed from 90% to 200%. So the 90% is
> outdated.
> I have used the Feedback button in the wrong doc to get this fixed
> (Hint, hint, it does NOT require an IBM employee to use this button...docs
> are big, mistakes happen)
>
> Cheer
> Serge
> --
> Serge Rielau
> SQL Architect DB2 for LUW
> IBM Toronto Lab

The following PDF docs on the IBM website still say 90% for SHMALL (I
realize the PDF docs are not always up to date). The point is that the 200%
recommendation in the InfoCenter docs is very recent.
- DB2 9.5 Quick Beginnings for DB2 Servers
- DB2 9.7 Installing DB2 Servers

It also still says 90% in this "Best Practices" document:
http://www.ibm.com/developerworks/data/bestpractices/systemperformance/

The real problem is that no one seems to have known (maybe until recently)
what the value of SHMALL should actually be set to on Linux, rather than the
problem being any typographical errors in the doc. It may be that the 200%
issue is needed to make STMM work properly on a server with a lot of DB2
instances. One IBM support person implied that to me a few months ago before
it was officially recommended in the InfoCenter to use 200%.

Interestingly, in 9.7.2 (but not 9.7.1 or lower) the 200% value for SHMALL
Linux kernel parm is now enforced by DB2 at instance startup. I am not sure
what is going to happen in 9.5.6, but the same enforcement policy would be
nice, because DBA's typically cannot change these parms themselves (this is
the reality of the corporate world, where the OS people with root are in a
separate organization).

As far as "docs are big, mistakes happen" there is another version of that,
which goes like this:
"docs are ... big mistakes happen."

I know it is sometimes hard for IBM'ers to understand, but the multi-billion
dollar company I work for can be out of business in a few hours because of
mistakes like these (and I don't mean discrepancies where it says 2
different things in different places, I mean the 200% recommendation is very
recent).

Another problem is that IBM and Red Hat don't get along that well together
(on a personal level) and they are constantly pointing fingers at each
other. For whatever reasons, IBM seems to be closer to SUSE. Unfortunately,
DBA's in large companies rarely have control of which OS to use. This has
been a problem for us in understanding how DB2 uses memory running on RHES.


Serge Rielau

Jun 22, 2010, 4:35:51 PM6/22/10
to
Mark,

As you said yourself earlier defaults change (or rather: should change).
90% may have been better in the past and the team has learned that 200%
is better on average now.
I don't think that 90% is wrong or right.
What we are talking about here are best practices and best practices
change as products change and as experience accumulates.

I'm being told that not all the doc changes for the recent change from
90% to 200% have been rolled out yet and the doc team is working on it.

But this is not a HIPER APAR. Your company will not go out of business
because you are still, obviously successfully, using 90%.

Cheers

Mark A

Jun 22, 2010, 6:13:25 PM6/22/10
to
"Serge Rielau" <sri...@ca.ibm.com> wrote in message
news:88cl59...@mid.individual.net...

You are wrong on several fronts.

When IBM released 9.5 and 9.7, 90% was unequivocally specified as the value
to use for SHMALL Linux kernel parm. I provided proof of this in the PDF
manuals (the 9.7 PDF manuals are not very old). 90% has apparently been
discovered to not be appropriate, and now 200% is recommended (as of just a
few months ago). This had nothing to do with DB2 code changes, it had to do
with problems with STMM (I was told by an IBMer that this was particularly
the case if a server had a lot of DB2 instances and was doing high volume
production). The recommendation is retroactive to 9.5.0.

You are claiming that 90% is neither right nor wrong. But the 200% setting is
now enforced in future fixpacks (already in 9.7.2 and maybe in 9.5.6), so
that tells me that someone in IBM thinks 90% is wrong, although admittedly,
not every customer is going to encounter a problem if their server has only
one DB2 instance and is only moderately loaded or not using STMM.

The only reason we are successfully using 90% is because we turned off STMM.
So either the 90% is wrong, or STMM doesn't work correctly (or some
combination of the two).

My company suffered serious customer relations problems with several large
customers because of STMM and memory problems in general, and I don't need
you to tell me what happened, since you don't know anything about it.
Sounds to me you have been working on that Oracle compatibility code for so
long that you are starting to sound just like Oracle. Maybe you should apply
for a job with them.


ajs...@gmail.com

Jun 23, 2010, 10:43:55 AM6/23/10
to
Mark,

Rather than continue what I'm sure could be an endless debate, let me
try to bring this discussion to a close. Here's what we've heard so
far:

- At your company you hit some very specific problems with STMM which
occur only on Linux and/or when running a large number of instances/
databases on the same physical machine.
- You agree that these problems have been fixed in more recent
fixpacks.
- Several other customers have chimed in to report that STMM has been
beneficial in their environments.

While I'm not trying to minimize the problems you encountered, it's
always helpful to have some perspective. It's true that there were
some problems with some of the initial versions of STMM (something
which I have never attempted to deny). That being said, if you look
at these problems in the context of the number of customers that are
using STMM, the issues have only affected a handful of customers
running in some very specific environments. It's unfortunate that you
happened to be one of those customers and that it affected your
business so much.

In all my discussions with customers I can say that the overwhelming
majority of those who have tried STMM have been very happy with it. I
am aware of that DB2Night Show poll (I have appeared on the DB2Night
Show myself), but am not sure what to make of it. It's been my
experience that many customers who claim to have had "issues" running
STMM have either not been using it correctly (enabled only one or two
memory consumers) or have expected it to achieve an ideal memory
configuration immediately (when in fact STMM works to achieve an ideal
configuration over time). Without more detailed information, it would
be incorrect to use those numbers to support your argument that a
significant number of people have experienced severe issues as a
result of STMM.

If you feel that it's easy to tune the database memory on your
systems, then you are correct to disable STMM. You'll get consistent
performance (even if it is likely sub-optimal) and you won't have to
worry about hitting some rare STMM problem. Let me reiterate though,
that if you find memory configuration easy, you are in the minority.
I know from speaking with hundreds of customers that most of them find
memory configuration challenging, and for them, STMM is welcome
relief.

I'm still looking into the specific issues that you encountered and
will be in touch via email with any information I can dig up.

Thanks,
Adam

Mark A

Jun 23, 2010, 4:43:19 PM6/23/10
to
"ajs...@ca.ibm.com" <ajs...@gmail.com> wrote in message
news:9c32158e-f666-4c2b...@k39g2000yqb.googlegroups.com...

I do not know if the problems with STMM have been fixed in the latest
fixpacks. I said it is a possibility that they have been fixed, but I have not
tried STMM with anything later than 9.5.4, which had a lot of problems. I
try not to jump to conclusions about what I have not personally tested, but
it seems like you are misquoting me. Several IBM'ers told me it doesn't work
well in 9.5, but works better in 9.7, so not sure what exactly that means as
to whether the problems are fixed in 9.5.5.

I also don't know if the problems with STMM are isolated to Linux. I said
that is a possibility (since I have only used STMM with Linux). Again, you
misquoted me. Based on the survey done on DB2Night (during same session) to
poll what OS's are being used for DB2, it seems likely the STMM problems
also affect other OS's (AIX, etc) since Linux is not widely enough used
to account for all the negative STMM responses. I don't believe that your
statement that STMM problems affect only a "handful" of customers is anywhere close
to accurate.

Here is just one person who had problems with STMM on DB2 for AIX:
http://www.dbforums.com/db2/1646953-stmm-does-not-allocate-enough-sheapthres_shr.html

I don't think that tuning of the DB CFG is particularly easy in DB2, but
STMM is not easy either, especially when there are problems, and then the
problems can be monumental. I noticed that you think the reason many
customers complain about STMM (and have negative responses to DB2Night
surveys about STMM) is that they are not using it correctly. If that is so,
it just proves my point that STMM is actually much more complex than you
have claimed. The DBA's who participate in the DB2Night.com surveys are
typically not novice DBA's.

When DBA's say that they have problems with STMM, they usually don't mean
that there is a slight sub-optimal memory allocation problem. They mean that
they are having serious DB2 memory problems that lead to denial of service
or extremely poor response time. I don't think many would notice, or even
care about, a "slightly" sub-optimal STMM so long as it was in the ballpark
of acceptable performance and did not cause serious database availability
problems.

Your assumption that my not using STMM would result in sub-optimal memory
configuration is false. The problems we have encountered with STMM are
specifically that it frequently gives up memory it thinks is no longer
needed at a particular moment, and then when it tries to re-acquire the
memory a few seconds later when it is needed again, DB2 throws errors in the
diagnosis log that say that DB2 cannot get the memory requested from the OS.
This specifically happens on LOCKLIST memory and sort memory heaps. When
this happens, the memory heap in question is extremely low (because STMM
shrunk it previously) causing one or more of the following:

- lock escalation is out of control when locklist has insufficient memory,
resulting in significant lock timeouts and deadlocks
- all sorts spill to temporary tablespaces instead of sorting in memory
- user bufferpools cannot be allocated, and system bufferpools get 100%
filled with dirty pages and SQL statements start getting error messages
- new apps cannot connect because shared memory segments cannot be
allocated
- CPU reaches 90%+ because the system is extremely slow, and application SQL
requests are getting stacked up in the queue faster than DB2 can process
them
- etc

Given that we had severe problems with STMM, we decided to hard-code the
following, which was quite easy and solved all our problems:

LOCKLIST 16384
MAXLOCKS 30
SHEAPTHRES_SHR 40000
SORTHEAP 8000
Bufferpools (separate issue in our database which has multiple bufferpools
for a reason, but normally 50% of server memory for bufferpools is fine)

Given adequate information in the documentation, I think anyone could
configure these even without my suggestions above (but there are no real
recommendations). The numbers I provided above would work fine for 99% of
databases. The sum total of the above values (not including bufferpools) is
not even 250 MB, so one is hardly "wasting" any significant amount of memory
by hard-coding them (and by hard-coding them one will be saving CPU time by
not having STMM constantly trying to tune and adjust them).
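
The arithmetic behind that memory estimate can be checked with a short Python sketch. It assumes that LOCKLIST, SHEAPTHRES_SHR and SORTHEAP are all counted in 4 KB pages; the names `settings_pages` and `pages_to_mb` are illustrative only.

```python
PAGE_BYTES = 4096  # assumption: these parameters are counted in 4 KB pages

# Values quoted above. MAXLOCKS is a percentage of LOCKLIST, not a size,
# so it is left out of the memory total.
settings_pages = {
    "LOCKLIST": 16384,
    "SHEAPTHRES_SHR": 40000,
    "SORTHEAP": 8000,
}


def pages_to_mb(pages, page_bytes=PAGE_BYTES):
    """Convert a page count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return pages * page_bytes / (1024 * 1024)


sizes_mb = {name: pages_to_mb(pages) for name, pages in settings_pages.items()}

# Simple sum of all three values.
total_mb = sum(sizes_mb.values())

# Assumption: SORTHEAP is a per-sort ceiling inside the SHEAPTHRES_SHR budget,
# so it is not added on top of it.
budget_mb = sizes_mb["LOCKLIST"] + sizes_mb["SHEAPTHRES_SHR"]

for name, mb in sizes_mb.items():
    print(f"{name:15s} {mb:8.2f} MB")
print(f"{'simple sum':15s} {total_mb:8.2f} MB")
print(f"{'shared budget':15s} {budget_mb:8.2f} MB")
```

Under these assumptions the simple sum comes to about 251.5 MB and the shared budget to about 220 MB, which is in line with the "not even 250 MB" ballpark and small next to 32 or 64 GB of server memory.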

I don't have any theoretical problem with DB2 managing memory for me (not
trying to protect my job), or else I wouldn't have tried STMM in the first
place (maybe with the exception of bufferpools which is totally separate
issue if one has multiple bufferpools for very specific reasons).

I am all for trying to make DB2 easier to configure and use, but I am very
concerned when IBM tries to brow-beat people into using new features before
they are completely debugged. Doing that only harms IBM because one of the
main advantages of DB2 over other database products (some more expensive,
and some much less expensive) has been the reliability and support we get
with DB2. If customers don't have that reliability level with DB2 anymore
because of problems due to untested product features, then that is not in
the best interests of IBM, regardless of what marketing strategy some at IBM
have dreamed up to position the product as easy to use.


Liam

Jun 24, 2010, 9:54:00 AM6/24/10
to
On Jun 22, 6:13 pm, "Mark A" <no...@nowhere.com> wrote:
> "Serge Rielau" <srie...@ca.ibm.com> wrote in message

It sounds like I may regret jumping in on this thread, but I'll just
throw in some of the rationale behind this change in
recommendation.... I'll start with some background on how DB2 uses
shared memory, and then get to the whole 90% vs 200% recommendations.

The SHMALL kernel tuneable on Linux has always been a hindrance to any
programs that make extensive use of shared memory. My personal
opinion is that Linux should remove this tuneable completely (along
with SHMMAX), particularly since the total amount of shared memory
"created" on the box doesn't have any real impact on the system unless
that memory is actually consuming either RAM or swap space. Linux
(and all other OSes) do a good job of virtualizing memory to
processes, so it shouldn't really matter how much "virtual" memory is
consumed by the sum of all shared memory segments created on the
system, as long as the total of all RAM pages (including both
committed shared memory pages, and committed private memory pages) is
less than the amount of memory on the box (or at least that all the
swap space hasn't been consumed, but I'm sure we all agree that we
don't want database servers to start swapping).

As for how DB2 comes into the picture, most of DB2's memory
allocations are from shared memory regions. As of 9.5 (when DB2 went
threaded), it became much easier for DB2 to "grow" its shared memory
regions, by simply creating new shared memory segments (all EDUs are
just threads, so are implicitly connected to the new shared memory
segment). Growing is just half the battle though, STMM needs to be
able to "shrink" DB2's memory footprint when it detects there is too
little free memory left on the box. To accomplish this, we use APIs
provided by the OS (all OSes DB2 runs on have their own flavour of
APIs for this) to "decommit" portions of shared memory segments,
meaning the OS will release any RAM + backing store consumed by those
shared memory pages (thus, increasing the amount of free RAM on the
box for other programs to use). If STMM later decides there's enough
free RAM and can grow again, we will first re-commit those regions,
and if we need to grow more, will then allocate new shared memory
regions again. This is where I'm confused by one of your other
comments saying that STMM cannot reclaim this memory.... re-committing
memory on UNIX is as simple as touching those memory pages again,
there is no OS API that needs to be called (the OS just faults those
pages in on demand), so as long as STMM sees enough free memory on the
system, we should have no issue reclaiming that memory. I'm sure Adam
will contact me when he digs up the info on this particular issue :-)
Note that Windows is slightly different - we need to issue an OS API
to re-commit memory there, so in that case, there is a chance that the
OS will deny the request, but that should not happen on any UNIX
platform.
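
The distinction drawn above, between the size of a shared memory segment and the part of it that is actually backed by RAM, can be illustrated with a toy Python model. This is only a sketch of the accounting, not of DB2's implementation or of any OS API; the class `SharedSegment` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SharedSegment:
    """Toy model: 'size' counts against SHMALL, 'committed' counts against RAM."""
    size: int        # pages reserved when the segment was created
    committed: int   # pages currently backed by RAM

    def decommit(self, pages):
        """Release the RAM backing for some pages; the segment size is unchanged."""
        self.committed -= min(pages, self.committed)

    def recommit(self, pages):
        """Touch pages again; committed pages never exceed the segment's size."""
        self.committed = min(self.size, self.committed + pages)


seg = SharedSegment(size=1000, committed=1000)
seg.decommit(400)
print(seg.size, seg.committed)   # size is still 1000, committed is now 600
seg.recommit(400)
print(seg.size, seg.committed)   # fully committed again, no new segment created
```

In this model, shrinking frees RAM but never reduces the segment size, which is why the SHMALL accounting discussed next does not go down when STMM gives memory back.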

Now for the SHMALL recommendation.... We really wanted SHMALL to be
set out of the way, but we were reluctant to recommend customers set
that to a value greater than RAM. I would wager that most DBAs are
probably not intimately familiar with how the various OSes implement
their virtual memory managers, so it would seem odd to recommend
setting SHMALL larger than RAM (i.e. doesn't that mean it will cause
paging?!). So, the recommendation was to set this to a value that is
sufficiently large such that a single DB2 instance can use most of the
memory on the box with no issues - the assumption being that the OS,
file cache, etc will use up at least 10% of free memory on the box, so
total committed memory by DB2 should never be greater than 90% of
RAM. This recommendation was fine prior to 9.5, and should still be
acceptable in most 9.5+ systems if STMM is not enabled. However, once
STMM is enabled, we start monitoring free memory and will start
shrinking our committed memory footprint (RAM) when needed, however,
our shared memory segment footprint that is accounted for by SHMALL
stays the same. Again, with a single DB2 instance on the box, the
original 90% recommendation should still be fine, since we will favor
re-committing that memory prior to creating new segments, so our total
shared memory footprint should stay below the 90% SHMALL limit.
However, this SHMALL recommendation breaks down as soon as there are
multiple instances. Consider a simple scenario where one instance
starts up with STMM enabled, and that instance sees plenty of free
memory, so grows DB2's shared memory regions to account for, say, 75%
of memory on the box. Now, a new DB2 instance starts up, again with
STMM enabled. STMM will see this new memory pressure on the box, and
start releasing RAM, but will still consume 75% of the SHMALL limit.
If both instances have a fairly "equal" need for memory, the new
instance would try to grow its shared memory to account for 37.5% of
memory on the box, however, since the first instance is already
consuming 75% of the SHMALL limit, the second instance cannot grow
that much, and will be limited to at most 15% of RAM on the box due to
SHMALL. This is the main reason why the new 200% recommendation is
coming in - we have to bite the bullet and recommend setting SHMALL
larger than RAM so that customers with more than one instance on the
box, where those instances' memory consumption is controlled primarily
by STMM (so will grow and shrink), are less likely to hit this limit.
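
The two-instance scenario above comes down to simple arithmetic, sketched here in Python. All figures are percentages of RAM taken from the example; the function name `max_new_segments` is made up for illustration, and the model ignores everything except the SHMALL ceiling.

```python
def max_new_segments(shmall_pct, existing_segments_pct):
    """Headroom (as % of RAM) left under SHMALL for creating new segments."""
    return max(0, shmall_pct - sum(existing_segments_pct))


first_instance = 75      # % of RAM held as shared segments by instance 1
wanted_by_second = 37.5  # what instance 2 would like under an even split

for limit in (90, 200):
    room = max_new_segments(limit, [first_instance])
    granted = min(wanted_by_second, room)
    print(f"SHMALL={limit}% of RAM: headroom={room}%, second instance gets {granted}%")
```

With SHMALL at 90% the second instance is capped at 15% of RAM, while at 200% it can reach the 37.5% it wants, because the first instance's decommitted pages still count toward the limit.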

So, although far from perfect, the new 200% recommendation is to help
ensure that a larger group of customers are not affected by how SHMALL
interacts with STMM. The unfortunate part is that now that we
recommend setting SHMALL larger than RAM, it raises more questions, so
we now have to try to explain how SHMALL interacts with RAM on the box
- details that most DBAs should not need to care about (as long as
we're not causing paging, of course!).

From what I've heard, this same recommendation will be applied to all
9.5+ versions of the docs, it will just take some time to get those
docs updated.

Hope this helps clarify this situation....

Cheers,
Liam.

Mark A

Jun 24, 2010, 10:12:44 PM6/24/10
to
"Liam" <lemon...@gmail.com> wrote in message
news:eed86f59-bf21-4530...@i28g2000yqa.googlegroups.com...

> It sounds like I may regret jumping in on this thread, but I'll just
> throw in some of the rationale behind this change in
> recommendation.... I'll start with some background on how DB2 uses
> shared memory, and then get to the whole 90% vs 200% recommendations.

I don't have a problem if IBM wants to change the recommendation. I don't
really want to know "why" I just want to know what needs to be done to
properly configure DB2. If DBA's are not knowledgable enough to hard-code
4-5 database config parameters, then surely they are not knowledgable about
Linux kernel parms and are looking for IBM to tell them what to do (or
better yet for DB2 to automatically configure the parms). DBA's usually
don't have root authority to change the parms themselves anyway, since this
is up to the OS admins.

> ...This is where I'm confused by one of your other
> comments saying that STMM cannot reclaim this memory.... re-committing
> memory on UNIX is as simple as touching those memory pages again,
> there is no OS API that needs to be called (the OS just faults those
> pages in on demand), so as long as STMM sees enough free memory on the
> system, we should have no issue reclaiming that memory. I'm sure Adam
> will contact me when he digs up the info on this particular issue :-)
> Note that Windows is slightly different - we need to issue an OS API
> to re-commit memory there, so in that case, there is a chance that the
> OS will deny the request, but that should not happen on any UNIX
> platform.

We observed that about 1-2 weeks after rebooting the server (for the reboot,
we used HADR takeovers so that service was not interrupted), DB2 could
no longer acquire the memory it needed (especially for LOCKLIST and the sort
heaps). DB2 was writing numerous errors to the DB2 diagnostic log stating
that memory allocation errors had occurred trying to acquire memory for these
heaps. Unfortunately, I don't have the log in question (I am not the
principal DBA for the application that had this problem) and it looks like
the logs from 6-8 weeks ago have rolled off. But it was clear that DB2 would
give up the memory and not be able to get it back when needed for both
LOCKLIST and sort heaps. This was on a server with 64 GB of memory and
bufferpools hardcoded at about 10 GB total for all databases, and all other
parms set to the db and dbm defaults, and with STMM on.

This sounds reasonable to me and I appreciate the detailed explanation, but
I am not (and don't want to be) an OS memory expert. Neither do most DBAs.
That's why STMM sounded so attractive to begin with (so we don't have to
know anything about DB2 memory heaps and OS memory kernel parms). I think
the vast majority of customers want IBM to tell DB2 customers how to
configure DB2 properly, with or without STMM, or better yet have DB2 set
these values automatically.

Here is a summary of the situation from my perspective as a customer:

1. DB2 9.5 and later has STMM enabled by default.

2. The original recommendation for SHMALL was 90% of server memory. This is
documented in the PDF manuals I mentioned in a previous post for both 9.5
and 9.7 (so the 90% recommendation stood until fairly recently since it was
in the 9.7 PDF doc).

3. Customers who have STMM on by default, and have a lot of instances with
high transaction rate applications (needing locklist and sort memory), and had
SHMALL at the original recommendation of 90% of server memory, are susceptible
to the possibility of serious database memory problems. I have confirmed
that this has happened on both Linux and AIX by searching various forums.

4. Because of problems using the 90% value with STMM (with multiple DB2
instances), IBM is now recommending 200% of server memory for SHMALL. IBM is
so confident about the need to use 200% that apparently in 9.7.2 (but not
before then) DB2 sets SHMALL to 200% automatically (that is what the
InfoCenter doc says, although I still need to verify that this happens
in 9.7.2).

5. The 90% value is not a problem if one has STMM turned off, or has it
turned on with a small number of instances. But STMM is turned on by
default.

6. I was admonished earlier in this thread for suggesting that the change
made in the SHMALL recommendation (90% to 200%) in the online InfoCenter
should have been included in a Hiper APAR to make sure all customers knew
about the potential problem and solution. It seems to me that IBM has been
trying to hide (or minimize) the problems with STMM, for whatever reasons. I
am very disappointed that IBM did not communicate this known problem sooner
to its customers because it had a big impact on my company.

7. I mentioned in a previous post that it is possible that IBM has now fixed
the problems with STMM (especially if one knows the recommendation for
SHMALL has changed). I don't know one way or the other. But since IBM has
not exactly been candid about the past problems, I am not going to
automatically assume that everything is now fixed. I have heard from some
IBM'ers that STMM works better in 9.7 than in 9.5. Others can decide for
themselves how much risk vs. reward there is in using STMM at this time.

8. The db and dbm configs are still fairly complex, and they have gotten much
more complex with STMM, IMO. There are many things that a DB2 DBA can set in
the db and dbm config, but not many actually know that STMM only controls
these:

LOCKLIST
PCKCACHESZ
SHEAPTHRES_SHR
SORTHEAP
All buffer pools.

9. If a DBA hard-coded these as shown below, it would work fine for 99%+
of databases, and one would not encounter the memory problems with STMM
discussed above. The total memory for these parms is about 250 MB (not
counting bufferpools) when they are hard-coded (they could actually
grow much higher if STMM was enabled).

LOCKLIST 16384
MAXLOCKS 30 [%]
SHEAPTHRES_SHR 40000
SORTHEAP 8000
PCKCACHESZ 4096
bufferpools (set total of all bufferpools in all databases to about 50% of
server memory, or size of database, whichever is less) .

That's all there is to it. Not that complicated, and you don't need an OS
admin with root, don't need to be an OS expert or understand how DB2 shared
memory works. All the other parms in the database can be set to the defaults,
and you can even leave STMM on if you want (but it may not be doing anything
unless two or more of the above are set to automatic--but don't quote me on
this exactly because STMM is so complex very few actually understand it).

10. If IBM had published the above recommendations (or something similar) in
the DB2 reference manuals (or made them the defaults--except for
bufferpools), IBM would not have had to spend millions to develop STMM.
Bufferpools could be configured some other way, other than dynamically
changing them via STMM.
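
As a rough check on the 250 MB figure in point 9, here is a small Python
sketch of the arithmetic. It assumes each of these parameters is counted in
4 KB pages. The helper names (pages_to_mb, bufferpool_target_gb) are made up
for illustration and are not part of DB2.

```python
# Ballpark memory footprint of the hard-coded heaps from point 9.
# Assumption: every parameter below is expressed in 4 KB pages.
PAGE_BYTES = 4096

heaps_in_pages = {
    "LOCKLIST": 16384,
    "SHEAPTHRES_SHR": 40000,
    "SORTHEAP": 8000,
    "PCKCACHESZ": 4096,
}


def pages_to_mb(pages: int) -> float:
    """Convert a count of 4 KB pages to megabytes."""
    return pages * PAGE_BYTES / (1024 * 1024)


def bufferpool_target_gb(server_ram_gb: float, database_size_gb: float) -> float:
    """Rule of thumb from point 9: about 50% of server memory or the size
    of the database, whichever is less (total across all bufferpools)."""
    return min(0.5 * server_ram_gb, database_size_gb)


heap_mb = {name: pages_to_mb(pages) for name, pages in heaps_in_pages.items()}
total_mb = sum(heap_mb.values())

for name, mb in heap_mb.items():
    print(f"{name:15s} {mb:8.2f} MB")
print(f"{'TOTAL':15s} {total_mb:8.2f} MB")
print("bufferpool target for 64 GB RAM, 20 GB database:",
      bufferpool_target_gb(64, 20), "GB")
```

The straight sum comes out a little over 250 MB. SORTHEAP is a per-sort
limit rather than a fixed allocation, so treat the total as a ballpark only.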

> So, although far from perfect, the new 200% recommendation is to help
> ensure that a larger group of customers are not affected by how SHMALL
> interacts with STMM. The unfortunate part is that now that we
> recommend setting SHMALL larger than RAM, it raises more questions, so
> we now have to try to explain how SHMALL interacts with RAM on the box
> - details that most DBAs should not need to care about (as long as
> we're not causing paging, of course!).
>
> From what I've heard, this same recommendation will be applied to all
> 9.5+ versions of the docs, it will just take some time to get those
> docs updated.
>
> Hope this helps clarify this situation....
>
> Cheers,
> Liam.

Liam, thanks for the detailed explanation. I will pass this on to our OS
Admins to set the kernel parms correctly. Your explanation may help if they
baulk at the latest 200% recommendations in InfoCenter. For some of my apps,
I cannot migrate to 9.7 anytime soon, so we have to change the kernel parms
in Linux for DB2 9.5, but I am still hoping that SHMALL is set automatically
by DB2 in 9.5.6 when it is available.
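
For anyone handing numbers to their OS admins, here is a rough Python sketch
of what 90% and 200% of RAM look like as SHMALL values. On Linux,
kernel.shmall is counted in pages rather than bytes. The 4096-byte page size
and the 64 GB server are assumptions for the example, and shmall_pages is
just an illustrative helper name.

```python
# kernel.shmall on Linux is a page count, not a byte count.
# Assumption: 4096-byte pages (confirm with `getconf PAGE_SIZE`).
PAGE_BYTES = 4096


def shmall_pages(ram_bytes: int, percent: int, page_bytes: int = PAGE_BYTES) -> int:
    """Return `percent` of RAM expressed in pages, rounded down."""
    return ram_bytes * percent // (100 * page_bytes)


ram_bytes = 64 * 1024**3  # example: a server with 64 GB of memory

old_value = shmall_pages(ram_bytes, 90)   # original 90% recommendation
new_value = shmall_pages(ram_bytes, 200)  # newer 200% recommendation

print("SHMALL at  90% of RAM:", old_value, "pages")
print("SHMALL at 200% of RAM:", new_value, "pages")
```

The OS admins would still need to apply the value themselves (for example
through sysctl), since that requires root.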


ajs...@gmail.com

Jun 25, 2010, 8:19:56 AM
Mark,

I think that is a valuable summary. While I disagree with your ninth
point, you're certainly entitled to your opinion. My experience with
hundreds of DB2 customers tells me that hard coded values will not
work in 99% of cases, which is why we don't document any hard coded
recommendations. Instead, we've tried hard with STMM to create a
feature that works well for 100% of customers. Were we 100%
successful with our first attempt? Has any software product ever been
released entirely problem-free? No. This is why we're using
subsequent fixpacks to fix all issues that have been reported by
customers, such as yourself.

Is STMM in the latest 9.5 and 9.7 fixpacks perfect? It would be
foolish for me to claim that it is. That being said, the problems you
mention above have been fixed, and should no longer be a cause for
concern. From its inception, STMM has provided value for a great
number of customers and it will continue to do so. Yes, there are/
were some "gotchas" when running STMM in its first few releases.
We're hopeful however, that most of the serious problems have been
resolved.

Thanks,
Adam

Mark A

Jun 25, 2010, 9:44:45 AM

"ajs...@ca.ibm.com" <ajs...@gmail.com> wrote in message
news:40152b40-fd96-4e17...@y4g2000yqy.googlegroups.com...

DBAs have had to hard code values for those 4-5 memory heaps since 1990
(until recent releases of DB2 that had STMM), so I am not suggesting
anything new. The defaults were way too low, and they were not raised (by
having auto-configure on by default) until after the decision was made to
develop STMM (this is the catch-22).

There has not been any documentation in the reference manuals for the last
20 years on what realistic values should be for those 4-5 memory heaps,
which has been a problem for many customers, especially given that the
defaults were so low. Just listing the default and range of possible values
is not sufficient IMO. If you don't like my values, then a couple of
different scenarios (different types and sizes of databases) could have
been discussed with different values for each scenario. It does not hurt to
over-allocate a little bit, since memory is cheap and plentiful these days
and these settings use a very small amount of memory (with the exception of
bufferpools, which is a totally different subject). These memory allocations
don't need to be perfect anyway, so long as the defaults are not used.

It comes down to risk vs. reward. For a company without any competent DBAs,
the benefit of STMM may outweigh the risk (especially if they don't have
mission critical DB2 databases). For other companies, the opposite may be
true. I don't understand why IBM is pushing so hard for everyone to use STMM
when it may not be appropriate for every situation at the current time.

ajs...@gmail.com

Jun 25, 2010, 3:53:30 PM
On Jun 25, 9:44 am, "Mark A" <no...@nowhere.com> wrote:
> "ajst...@ca.ibm.com" <ajst...@gmail.com> wrote in message

I completely agree with you that our documentation was lacking in this
area for a long time. While we could add to it now, most of the work
has already been done automatically and under the covers (since v9.1
at least), now that the configuration advisor is being run on database
creation. The new defaults (i.e. what will be generated by the
configuration advisor when the database is created) are in the same
ballpark as what you've suggested.

> It comes down to risk vs. reward. For a company without any competent DBA's,
> the benefit of STMM may outweigh the risk (especially if they don't have
> mission critical DB2 databases). For other companies, the opposite may be
> true. I don't understand why IBM is pushing so hard for everyone to use STMM
> when it may not be appropriate for every situation at the current time.

I agree with this too. I think somehow you may have misinterpreted
IBM's stance on STMM. While STMM is enabled by default, customers can
obviously turn it off if they want. In a past post you mentioned that
we're "brow-beating" people into using the feature. As far as I'm
concerned, that's not the case. In all of my interactions with
customers I try to lay out the best uses for STMM, as well as the
cases where it might not be advantageous. For example, I always
mention that if you have a database that is performing well, it might
not be a good idea to enable STMM because you have very little
upside. Additionally, in cases where DBAs are present, and are
experienced in memory tuning, there might not be substantial benefit.
As you mention however, in companies where DBAs aren't experienced, or
do not have the time to tune every single database that gets created,
there is benefit to using STMM.

Thanks,
Adam
