Our Environment:
Our servers are currently configured to use "relaxed LRU".
We are running into tons of memory errors / messages (both
4K and 32K memory pools are spitting out memory errors). We
are doing a "bulk insert" from regular database tables to
datamart tables (also regular tables). This is when both
the 4K and 32K pool memory errors are generated. The error
messages are below.
"The 4K memory pool of named cache default data cache (cache
id 0, cachelet id 1) is configured too small for current
demands (state 2). Transaction progress may cease or
response time may increase."
- AND -
"The 32K memory pool of named cache default data cache
(cache id 0, cachelet id 1) is configured too small for
current demands (state 2). Transaction progress may cease or
response time may increase."
Question:
If we set the Replacement Policy from "relaxed LRU" to
"strict LRU" will this resolve the memory issues?
P.S. Please forgive us for posting this question in a
different way than our prior post. We just want to rule out
that changing the Replacement Policy from "relaxed LRU" to
"strict LRU" will not resolve the issue and is not an
option.
Thank you.
The recommended policy for the default data cache is the default policy, ie, 'strict LRU'.
Primary reason is that for the vast majority of users the default data cache is a catch-all for all activity that isn't
explicitly bound to a user-defined data cache, to include ongoing/moderate/heavy/periodic page writes. This
ongoing/moderate/heavy/periodic write activity is the basic reason you'll rarely find a valid recommendation for using
'relaxed LRU' in the default data cache.
While no one can give you a 100% guarantee that the use of the 'strict LRU' policy will [not] help your situation, it
sounds like you've already figured out that you (probably) have enough intermittent write activity in your default data
cache that precludes the use of 'relaxed LRU'.
I'd suggest you make the change and see what happens.
NOTE: I don't recall ever seeing these messages in a default data cache configured with 'strict LRU'.
NOTE: I *do* recall seeing these error messages (quite often) in default data caches configured with 'relaxed LRU'.
-----------------------
"But sp_sysmon recommends that I use a 'relaxed LRU' policy in the default data cache." ?
sp_sysmon will make generic tuning recommendations based on currently available monitoring data (eg, if cache hit rates
are > 99% then recommend 'relaxed LRU').
It is quite common to run sp_sysmon multiple times and get conflicting recommendations for various tuning suggestions
(eg, 'relaxed LRU' vs 'strict LRU', 'increase ULC' vs 'decrease ULC', etc).
If/when you convert your default data cache to use 'stict LRU', don't be surprised to find the occasional sp_sysmon
recommendation to convert it back to 'relaxed LRU'.
When it comes to tuning recommendations sp_sysmon follows a very limited, hard-coded list of if-then-else rules.
sp_sysmon does not know enough about your entire environment and all of your possible possible workloads. sp_sysmon
does not maintain/compare results from different sp_sysmon invocations. sp_sysmon does not distinguish between
user-defined data caches and the default data cache when making cache tuning recommendations.
sp_sysmon tuning recommendations should be looked at as more of a hint that you *may* want to look into your current
dataserver configuration to see if there could be some tuning opportunities based on *your* knowledge of *your* environment.
The unfortunate part is that for Sybase DBAs just starting to look at sp_sysmon output, these tuning recommendations are
typically more confusing than they are helpful.
-----------------------
"But the error messages say my cache is 'too small'." ?
This error message, when related to the cache replacement policy, is one of those situations where Sybase could have
done a better job at wordsmithing their error messages. (If you work with Sybase long enough you're bound to run into a
few others! ;-)
When using a 'relaxed LRU' policy this message can imply that the dataserver is having a hard time finding clean and/or
unused pages in the data cache pool. Instead of a thorough search of the entire pool, it will make some effort and then
complain (see error message). The upside of this policy is that there is minimal overhead in maintaining the pool; the
downside is that it may become detrimental to overall performance if the dataserver cannot find a clean/unused page in a
timely manner.
With a 'strict LRU' policy all clean/unused pages are maintained in one end of the pool, thus making it easier for the
dataserver to find a 'new' page when it needs one. The upside of this policy is a reduction (elimination?) of the 'pool
is too small' error message, ie, the dataserver can find an unused page in a timely manner; the downside is that there's
a slight overhead cost to maintaining the pool in this manner.
But let's play devil's advocate for a second ... let's say the 'pool is too small' message has nothing to do with the
cache replacement policy. Let's assume that the pool really is 'too small'.
OK, so how would you go about making a pool 'big enough' for your current activity?
Does the pool have to be big enough to hold an entire table(s) and/or index(es)? What about different queries running
at the same time but against different tables/indexes, ie, do you have to size the pool large enough to hold all
possible tables/indexes? Do you have to size the pool large enough to hold the entire database(s)?
The answer to all of these questions is, and has to be, "No". If any of the above were true then most Sybase
installations would fail to run, ie, most companies do not have the money or hardware (if it's even available) to put
100's of GB's (and even TB's) of databases completely into data cache.
While throwing more memory at a pool *may* help your situation ... what happens if/when you add more memory and you
still get the 'pool is too small' error message? If you get to this point then you really need to step back and take a
more detailed look at your environment (eg, replacement policy, actual queries being run - can they be improved, etc.)
In the end, throwing more memory at a 'pool is too small' error message is an attempt at addressing the symptom when you
really want/need to address the root cause of the error message.
One potential root cause is the replacement policy. Change the policy and see if this solves the problem.
If a change to a 'strict LRU' policy does not reduce/eliminate the 'pool is too small' error messages, then the next
step will require an analysis of the workload on your dataserver (eg, looking at query efficiency, looking at batch job
efficiency, etc).
The majority of the DBAs in this newsgroup who work in the deep-end of the pool (eg, 100-500+GB dataservers, dozens and
100's of dataservers, diverse applications and batch job requirements, diverse developer/SQL capabilities, etc) rarely
run into the 'pool is too small' error message ... and I guarantee that very, very few (if any) of us have the luxury of
stuffing our 100+ GB dataservers completely into a data cache pool.
-----------------------
"So, what good is the 'relaxed LRU' policy?" ?
The 'relaxed LRU' policy is ideally used in a data cache where there is little (preferably no) write activity. The idea
being that the dataserver never has to go looking for an unused page. (One obvious (?) contradiction occurs during the
initial population of said data cache pages from disk, eg, while running a cache warming query just after dataserver
startup.)
So for example, a 'relaxed LRU' policy could come in handy for a user-defined cache that has been configured large
enough to hold a set of commonly used reference tables which are pre-loaded at dataserver startup and then maintained in
the cache for always-fast access.
Another view is that the 'relaxed LRU' policy may be ideally suited for short-lived, sequential-in-nature queries where
the dataserver can always find an unused page as it goes around the circular list of pages (ie, the next page in the
list is the oldest page and thus no longer in use).
One case of this situation could be a cache setup for tempdb and/or log activity. It the case of a log cache the idea
is that all transactions are (relatively) short-lived thus insuring plenty of time for the changes to be written to disk
before the dataserver makes a complete circuit of the linked list of pages. Same idea applies to the idea of a tempdb
cache where you can be assured that all tempdb activity is short-lived.
Obviously (?) the viability of using the 'relaxed LRU' policy in a log and/or tempdb cache will be based on the workload
in *your* environment.
One other example where you could probably get away with the use of a 'relaxed LRU' policy ... you could use it for a
default data cache where all dataserver activity can easily fit into the data cache *and* the data is read-only.
There are probably a few other times when you can use a 'relaxed LRU' replacment policy ...
-----------------
The net result is that you should always be safe running with a 'strict LRU' policy for all of your data caches.
But if you have a mostly (all?) read-only database, or a cache with verifiable short-lived activity, you may be able to
obtain some performance improvement if said cache is configured with a 'relaxed LRU' policy. The use of a 'relaxed LRU'
policy will obviously (?) have to be based on the particulars of your dataserver environment.
Your advice and detailed explanation was awesome!!! I'm not
kidding! Wow, the feedback and detailed information you
provided was right on target! Your feedback addressed all
our concerns and quetions.
Sincere thanks for your help and time you have set aside to
give such great feedback and insight!
Thank you,
Susan