Many thanks
Colin Dawson
www.ladbrokes.com
The Checkpoint Stall Problem
We are all aware that updates must wait while a checkpoint completes, and
that SELECTs are not affected. The second part is not quite true! Read on.
If you have more than 500 dirty buffers at checkpoint time, the checkpoint
thread will not relinquish control of CPU VP #1 until all buffers have been
handed off to CLEANER threads, as long as there are page cleaner threads
available that are not busy. Because user threads currently executing in
CPU VP #1 are not migrated to other VPs before the checkpoint begins (and
you may have only one CPU VP), even FETCHes and OPENs will hang for
approximately the first half of the checkpoint duration (longer if you set
LRU_MAXDIRTY and LRU_MINDIRTY high). If you have response-time requirements
measured in seconds, this can be deadly.
If no page cleaner thread is ready, the checkpoint thread will relinquish
the VP and place itself on a wait queue until a cleaner thread frees up.
With fewer than 500 dirty buffers, a single cleaner is launched to handle
all dirty buffers, and the checkpoint thread relinquishes the CPU VP to
other threads after that launch. This sounds better, but the result is
the same.
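In pseudo-C, the hand-off behaves roughly like this (a sketch only; every
name below is my own invention, not Informix source):

  /* Sketch of the hand-off logic described above. All names are
     hypothetical -- this is the control flow as I understand it. */
  typedef struct cleaner cleaner_t;

  extern cleaner_t *find_idle_cleaner(void);  /* NULL if all are busy */
  extern void wait_for_idle_cleaner(void);    /* sleep on a wait queue:
                                                 VP #1 is RELEASED here */
  extern void assign_chunk(cleaner_t *c);
  extern void launch_single_cleaner(void);
  extern int  dirty_buffers_remain(void);

  void checkpoint_handoff(int n_dirty)
  {
      if (n_dirty > 500) {
          while (dirty_buffers_remain()) {
              cleaner_t *c = find_idle_cleaner();
              if (c != NULL)
                  assign_chunk(c);          /* no yield between hand-offs:
                                               VP #1 stays blocked */
              else
                  wait_for_idle_cleaner();  /* the only point where other
                                               threads get VP #1 back */
          }
      } else {
          launch_single_cleaner();  /* one cleaner for everything, then
                                       the VP is relinquished */
      }
  }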
Based on this, I am recommending that sites with large numbers of buffers
and short response time requirements configure with one or only a few page
cleaners so that the checkpoint will have to enter a wait state while the
previous page cleaners complete their last assignments. This should cause
the checkpoint thread to release the VP and permit queries to continue. This
issue is being examined by Informix R&D and, by the time this goes to press,
I should have had an opportunity to test the latest theoretical solution.
Search the C.D.I. archives at http://www.iiug.org/ for my results.
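By way of illustration, that recommendation amounts to something like the
following in the ONCONFIG file (the value is only an example; tune it for
your own system):

  CLEANERS 2  # deliberately few page-cleaner threads, so the checkpoint
              # thread soon blocks waiting for a free cleaner and yields
              # the VP to other threads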
There is a problem at checkpoint time for online systems with huge
buffer pools, something like > 8GB of buffers.
Systems with "more than 500 dirty buffers at checkpoint time" are NOT
affected at all if they don't have a huge buffer pool.
In those systems that are affected a poll or sql listener thread on cpu
1 cannot run while the checkpoint code is collecting all dirty pages.
This lasts for maybe a few seconds with 8GB of buffers (depending on cpu
speed) and increases linearly with buffer pool size. If there is no poll
thread on cpu vp 1, connected clients are still not affected (They DO
migrate to other vps!). If there is a sql listener waiting, new
connections must wait for that time.
The problem will NOT go away by tuning LRU_MAXDIRTY, etc. It happens even
with 0 dirty buffers.
This is what happens: At checkpoint time the main_loop thread loops
through the whole buffer pool inspecting the DIRTY flag in every page's
buffer header and collects a list of dirty pages in memory which is
later used by the flushers. This is done in a single thread and without
yielding. No other thread can run on cpu 1 for that time. When the list
of dirty pages is collected the problem is over. This is BEFORE the
first flusher even starts doing anything. This is done twice per
checkpoint. There are two flushes.
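A minimal sketch of that collection pass (names and structures are
hypothetical, not actual Informix code):

  #include <stddef.h>

  struct bufhdr { int dirty; /* ... */ };

  /* Every header in the pool is inspected, in one thread, without
     yielding: O(total buffers), not O(dirty buffers), and it runs
     twice per checkpoint (once per flush). */
  size_t collect_dirty(struct bufhdr *pool, size_t nbuffers,
                       struct bufhdr **list)
  {
      size_t n = 0;
      for (size_t i = 0; i < nbuffers; i++)  /* whole pool, even if */
          if (pool[i].dirty)                 /* nothing is dirty    */
              list[n++] = &pool[i];
      return n;  /* the flushers consume this list afterwards */
  }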
This algorithm has been roughly like this even in old turbo (version 5)
times and it has never been a problem until recently when customers can
afford to configure buffer pools in the GB range. What used to take
milliseconds even in systems that were huge at the time can now take a
couple of seconds. It blocks cpu vp 1 and it increases checkpoint times.
The fix will go through the dirty LRU queues to collect all dirty pages
if less than 1% of all buffers are dirty. This is much faster.
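Sketched in the same style, the fixed path would look something like this
(again hypothetical, not the actual patch):

  #include <stddef.h>

  struct bufhdr { int dirty; struct bufhdr *next_dirty; };
  struct lruq   { struct bufhdr *dirty_head; };

  /* Walk only the dirty LRU queues: O(dirty buffers), independent of
     total pool size. Chosen when under 1% of all buffers are dirty. */
  size_t collect_dirty_fast(struct lruq *lrus, size_t nlrus,
                            struct bufhdr **list)
  {
      size_t n = 0;
      for (size_t q = 0; q < nlrus; q++)
          for (struct bufhdr *b = lrus[q].dirty_head; b != NULL;
               b = b->next_dirty)
              list[n++] = b;
      return n;
  }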
Michael
We're talking about different problems Michael. Colin may indeed be seeing
the problem you describe, but the one he quotes (which is from me) is no
longer a problem because I proved its existence to the IDS development team
and they redesigned the checkpoint process. BTW, you are just wrong about
certain details; see below:
> There is a problem at checkpoint time for online systems with huge
> buffer pools, something like > 8GB of buffers.
>
> Systems with "more than 500 dirty buffers at checkpoint time" are NOT
> affected at all if they don't have a huge buffer pool.
That was a configurable threshold (adjustable via an environment variable)
built in to versions through 7.30.
> In those systems that are affected a poll or sql listener thread on cpu
> 1 cannot run while the checkpoint code is collecting all dirty pages.
Exactly. This was supposed to have been fixed! Is it still there?
> This lasts for maybe a few seconds with 8GB of buffers (depending on cpu
Actually it averages half of the checkpoint duration.
> speed) and increases linearly with buffer pool size. If there is no poll
> thread on cpu vp 1, connected clients are still not affected (They DO
> migrate to other vps!). If there is a sql listener waiting, new
> connections must wait for that time.
>
> The problem will NOT go away by tuning LRU_MAXDIRTY, etc. It happens
> even with 0 dirty buffers.
>
> This is what happens: At checkpoint time the main_loop thread loops
> through the whole buffer pool inspecting the DIRTY flag in every page's
Makes no sense. Each LRU has a separate DIRTY queue and CLEAN queue. The
checkpoint code only has to collect every buffer registered in a dirty queue
with a timestamp earlier than the checkpoint. The size of the total buffer
pool should not affect the checkpoint duration. If they recoded it that
way they should be shot. I KNOW it was not coded that way through 7.30,
because I had hours of discussions with the guys who were maintaining
that code to try to help them duplicate the problem.
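A collection pass of the kind I mean would look roughly like this (my
names, purely illustrative):

  #include <stddef.h>

  struct bufhdr { long dirtied_ts; struct bufhdr *next_dirty; };
  struct lruq   { struct bufhdr *dirty_head; };

  /* Only the dirty queues are walked, and only buffers dirtied BEFORE
     the checkpoint began are taken; total pool size is irrelevant. */
  size_t collect_for_ckpt(struct lruq *lrus, size_t nlrus, long ckpt_ts,
                          struct bufhdr **list)
  {
      size_t n = 0;
      for (size_t q = 0; q < nlrus; q++)
          for (struct bufhdr *b = lrus[q].dirty_head; b != NULL;
               b = b->next_dirty)
              if (b->dirtied_ts < ckpt_ts)
                  list[n++] = b;
      return n;
  }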
> buffer header and collects a list of dirty pages in memory which is
> later used by the flushers. This is done in a single thread and without
> yielding. No other thread can run on cpu 1 for that time. When the list
True. I asked them to just shift the checkpoint thread to the ADM VP and
the problem goes away. Alternatively I suggested migrating all threads in
CPU VP #1 to other VPs before starting the checkpoint. Did they listen to
me? No.
> of dirty pages is collected the problem is over. This is BEFORE the
> first flusher even starts doing anything. This is done twice per
> checkpoint. There are two flushes.
Never heard of two flushes. Where is this documented?
> This algorithm has been roughly like this even in old turbo (version 5)
Yes, but in TURBO each user had his/her own copy of sqlturbo, so when the
coordinator process (sqlinit? I don't remember anymore) performed the
checkpoint, no users were affected by a block.
> times and it has never been a problem until recently when customers can
> afford to configure buffer pools in the GB range. What used to take
> milliseconds even in systems that were huge at the time can now take a
> couple of seconds. It blocks cpu vp 1 and it increases checkpoint times.
>
> The fix will go through the dirty LRU queues to collect all dirty pages
> if less than 1% of all buffers are dirty. This is much faster.
>
> Michael
Art S. Kagel
>
> Colin Dawson wrote:
>
>> Whilst searching for information on DS_HASHSIZE, I found the
>> following. I went in search of the results and summary on CDI but
>> couldn't find them. Two questions arise: 1) Which versions does this
>> apply to? and 2) Does anyone know where the results are located? A
>> search of the CDI archive for Checkpoint Stall returned 54 pages of
>> results!!!!!!!
>><SNIP>
HEY! I know that posting, it's mine. It was a problem in versions prior to
7.31. The checkpoint process was redesigned in response to the bug reports
I put in and the extensive test cases I supplied. FYI, Michael Mueller's
posting is incorrect. See my reply.
Art S. Kagel
> Many thanks
>
> Colin Dawson
> www.ladbrokes.com
<SNIP>
To get this straight, both of us should make clear what we are talking
about. I was referring to bug 170919 (a dup of 169447), fixed in 9.40.UC7
and 10.0.UC2. It occurs in all previous versions and behaves as I
described.
Can you give us the same information about your checkpoint stall problem
(especially the bug number; that would help me)? I think this would answer
Colin's original question too.
About the two flushes in a checkpoint:
# get some pages dirty first (run a few updates) ...
onstat -z    # zero the statistics counters
onmode -c    # force a checkpoint
onstat -p    # shows numckpts 1, flushes 2
Flush 1 is done without waiting for writers to exit critical sections.
It may miss some dirty buffers. Flush 2 is done after all writers have
left their critical sections and is reliable. Both flushes currently go
through the whole buffer pool to collect dirty pages, which can be a
problem for huge buffer pools. Checkpoint activity (including the two
flushes) can be traced in the message log file by setting env variable
TRACEFUZZYCKPT (in 9.x) before starting oninit.
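As a rough sketch of that sequence (hypothetical names; the real code is
certainly more involved):

  /* Two flushes per checkpoint, as described above. Names are mine. */
  extern void collect_and_flush_dirty(void);    /* scans the whole pool */
  extern void wait_for_critical_sections(void);

  void fuzzy_checkpoint(void)
  {
      collect_and_flush_dirty();     /* flush 1: does not wait for writers
                                        in critical sections, so it may
                                        miss some dirty buffers */
      wait_for_critical_sections();  /* all writers have now exited */
      collect_and_flush_dirty();     /* flush 2: reliable -- no buffer can
                                        be dirtied behind our back now */
  }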
Michael