We have a highly sporadic performance problem. Sporadically (it hasn't
happened for a couple of months but has started again today) our database server
starts exhibiting 2-second checkpoints (otherwise they are almost always 0s)
and the performance of critical processes dives.
14:01:19 Checkpoint Completed: duration was 2 seconds.
14:16:21 Checkpoint Completed: duration was 2 seconds.
14:31:24 Checkpoint Completed: duration was 2 seconds.
14:46:26 Checkpoint Completed: duration was 2 seconds.
15:01:28 Checkpoint Completed: duration was 2 seconds.
15:16:31 Checkpoint Completed: duration was 2 seconds.
15:31:33 Checkpoint Completed: duration was 2 seconds.
15:46:35 Checkpoint Completed: duration was 2 seconds.
16:01:37 Checkpoint Completed: duration was 2 seconds.
16:16:40 Checkpoint Completed: duration was 2 seconds.
16:31:42 Checkpoint Completed: duration was 2 seconds.
<== HDR turned off at 1642
16:46:43 Checkpoint Completed: duration was 0 seconds.
17:01:42 Checkpoint Completed: duration was 0 seconds.
17:16:43 Checkpoint Completed: duration was 0 seconds.
17:31:43 Checkpoint Completed: duration was 0 seconds.
17:46:44 Checkpoint Completed: duration was 0 seconds.
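For anyone wanting to spot this pattern automatically, here is a minimal Python sketch that pulls the checkpoint durations out of log lines in the format posted above and flags the non-zero ones. The function names and the 1-second threshold are my own assumptions, and a real online.log may carry extra prefixes that the regular expression would need to allow for.

```python
import re
from datetime import datetime

# Matches lines such as "14:01:19 Checkpoint Completed: duration was 2 seconds."
CHECKPOINT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+Checkpoint Completed:\s+duration was (\d+) seconds?\."
)

def parse_checkpoints(lines):
    """Return (time, duration_seconds) tuples for every checkpoint line found."""
    results = []
    for line in lines:
        match = CHECKPOINT_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S").time()
            results.append((stamp, int(match.group(2))))
    return results

def slow_checkpoints(checkpoints, threshold_seconds=1):
    """Keep only checkpoints at or above the threshold (an assumed cut-off)."""
    return [(t, d) for t, d in checkpoints if d >= threshold_seconds]

# A few lines copied from the excerpt above; non-matching lines are ignored.
sample = [
    "16:16:40 Checkpoint Completed: duration was 2 seconds.",
    "16:31:42 Checkpoint Completed: duration was 2 seconds.",
    "<== HDR turned off at 1642",
    "16:46:43 Checkpoint Completed: duration was 0 seconds.",
]

for stamp, duration in slow_checkpoints(parse_checkpoints(sample)):
    print(stamp, duration)
```
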
As before, turning HDR off immediately clears the problem. There are no
obvious performance problems on either PRI or SEC server. It's all a
mystery. The common thread is that shutting off HDR fixes it instantly.
Any observations?
thx
--
Neil Truby t:01932 724027
Director m:07798 811708
Ardenta Limited e:neil....@ardenta.com
HDR's checkpoints are synchronized between the primary and the
secondary. That means the checkpoint occurs on the primary and then on
the secondary before the checkpoint on the primary is considered complete.
In v11, this is addressed by non-blocking checkpoints, which work just
fine with HDR.
It's not the checkpoint times themselves that are the problem; it is that this
symptom is accompanied by user-facing issues of slow processing.
"Madison Pruet" <mpr...@verizon.net> wrote in message
news:46B78E32...@verizon.net...
Neil,
I would suggest taking a look at the network connection between the
servers. Like Madison says, the checkpoints are synchronous, even if
you have set up HDR as async.
HTH,
Paul M.
A minor correction. The checkpoint does not have to be totally
completed on the secondary before the ACK of the checkpoint is sent back
to the primary. But there is still a lot of work which must be done
during checkpoint processing, and the checkpoint processing on the
secondary can cause a backflow issue which can impact the primary.
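To make the dependency concrete, here is a deliberately crude toy model in Python. It is not how Informix accounts for checkpoint time internally; it only assumes that the primary does its own work and then waits one network round trip plus whatever the secondary must do before it sends the ACK. All names and numbers are invented for illustration.

```python
def observed_checkpoint_seconds(primary_work, round_trip, secondary_work_before_ack):
    """Toy model: the primary finishes its own work, then waits for the ACK.

    All three inputs are in seconds and are hypothetical values,
    not measurements from any real system.
    """
    return primary_work + round_trip + secondary_work_before_ack

# Same primary and same network; only the secondary's pre-ACK work differs.
healthy = observed_checkpoint_seconds(0.2, 0.001, 0.1)
degraded = observed_checkpoint_seconds(0.2, 0.001, 2.0)

print(f"healthy secondary:  {healthy:.3f} s")
print(f"degraded secondary: {degraded:.3f} s")
```

The point of the sketch is only that a sub-millisecond network does not rule out the secondary as the cause: in this model the round trip is negligible next to the secondary's own work.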
What is DRINTERVAL set to?
What sort of h/w is on the Primary as compared to the Secondary?
On the secondary server, would it help at all to set LRU min/max lower
than on the primary server?
Would that keep the number of dirty pages to a minimum and reduce the amount
of time needed to flush the pages on the secondary?
Kernoal
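A rough back-of-the-envelope version of that reasoning, as a Python sketch. It assumes the dirty data left for a checkpoint is bounded by roughly the LRU max dirty percentage of the buffer pool, which is a simplification, and the buffer count, page size and write rate below are made-up figures, not values from these servers.

```python
def dirty_data_at_checkpoint_mb(buffers, page_size_kb, lru_max_dirty_pct):
    """Approximate upper bound (MB) on dirty data a checkpoint must flush.

    Simplifying assumption: page cleaners hold the dirty share of the
    buffer pool at or below lru_max_dirty_pct.
    """
    dirty_pages = buffers * lru_max_dirty_pct / 100.0
    return dirty_pages * page_size_kb / 1024.0

def flush_seconds(dirty_mb, write_mb_per_sec):
    """Time to write dirty_mb at a sustained (hypothetical) write rate."""
    return dirty_mb / write_mb_per_sec

# Hypothetical secondary: 500,000 buffers of 4 KB, disks sustaining 100 MB/s.
for pct in (60, 10, 1):
    mb = dirty_data_at_checkpoint_mb(buffers=500_000, page_size_kb=4,
                                     lru_max_dirty_pct=pct)
    print(f"LRU max dirty {pct:>2}%: ~{mb:.0f} MB dirty, "
          f"~{flush_seconds(mb, 100):.1f} s to flush")
```
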
From: Madison Pruet <mpruet1@verizon.net>
Sent by: informix-list-bounc...@iiug.org
To: inform...@iiug.org
Date: 08/07/2007 11:15 AM
Subject: Re: Performance problem immediately fixed by disabling HDR
_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list
From the standpoint that it would decrease the risk of backflow - yes.
> What is DRINTERVAL set to?
30
> What sort of h/w is on the Primary as compared to the Secondary?
The primary is a dogs' bollocks 16-core IBM p570.
The secondary is a 4-core IBM p510Q
There's an IBM call open for this now, btw, 33150,019,866.
Try pinging between servers to check for any TCP/NET issues.
Regards
--
Fernando Nunes
Portugal
http://informix-technology.blogspot.com
My email works... but I don't check it frequently...
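If ICMP ping is blocked or needs privileges, timing a plain TCP connect gives a comparable first impression of the network path between the two servers. This is only a sketch: the function names are mine, and the host and port you pass in would be whatever the other server actually listens on.

```python
import socket
import time

def tcp_connect_ms(host, port, timeout=2.0):
    """Time a single TCP connect to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it again straight away
    return (time.perf_counter() - start) * 1000.0

def sample_latency(host, port, count=5):
    """Return (min, avg, max) connect time in ms over `count` attempts."""
    samples = [tcp_connect_ms(host, port) for _ in range(count)]
    return min(samples), sum(samples) / len(samples), max(samples)
```

Run from the primary, something like `sample_latency("secondary-host", 9088)` would give min/avg/max figures; both the host name and the port number there are placeholders. A clean result only rules out gross network problems, it says nothing about how busy the secondary itself is.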
I can give you three scenarios where you can see increased
checkpoint durations in an HDR pair:
1. (I saw this many times) Disk array on Secondary switches
from write-back to write-through caching mode
(e.g. one PSU fails in the array)
This immediately shows up as increased HDR checkpoints
in a high-load OLTP environment
2. You are running some write-intensive application on the Primary
(batch job or OLTP) and a read-intensive application on the Secondary.
In this case, the Secondary quickly falls behind the Primary, just because
it needs to retrieve some data from disk that the Primary has cached.
A long HDR checkpoint becomes inevitable.
3. Almost the same as #2, with an asymmetric configuration:
either the disk array on the Secondary is slower than on the Primary, or
the Secondary has a smaller Informix cache.
The rule of thumb to avoid long checkpoints in an HDR pair:
SECONDARY MUST BE MORE POWERFUL THAN PRIMARY -
more memory and a more powerful disk subsystem.
CPU power is not that important
(unless you run out of CPU resources on the Secondary).
-Alexey
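The checklist above can be written down as a small Python sketch that compares the two servers and reports where the secondary is the weaker side. The ServerSpec fields and every figure in the example are hypothetical; none of them are measurements from the servers discussed in this thread.

```python
from dataclasses import dataclass

@dataclass
class ServerSpec:
    name: str
    memory_gb: float
    buffers: int                  # size of the Informix buffer cache, in pages
    disk_write_mb_per_sec: float  # sustained write rate of the disk subsystem
    write_back_cache: bool        # False if the array fell back to write-through

def asymmetry_warnings(primary, secondary):
    """List the ways in which the secondary is weaker than the primary."""
    warnings = []
    if not secondary.write_back_cache:
        warnings.append("secondary disk array is in write-through mode")
    if secondary.buffers < primary.buffers:
        warnings.append("secondary has a smaller buffer cache")
    if secondary.memory_gb < primary.memory_gb:
        warnings.append("secondary has less memory")
    if secondary.disk_write_mb_per_sec < primary.disk_write_mb_per_sec:
        warnings.append("secondary disk subsystem is slower")
    return warnings

# Invented figures, purely to exercise the checks.
pri = ServerSpec("PRI", memory_gb=64, buffers=2_000_000,
                 disk_write_mb_per_sec=400, write_back_cache=True)
sec = ServerSpec("SEC", memory_gb=16, buffers=500_000,
                 disk_write_mb_per_sec=150, write_back_cache=False)

for warning in asymmetry_warnings(pri, sec):
    print("WARNING:", warning)
```
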
"Alexey Sonkin" <ale...@cidc.com> wrote in message
news:mailman.563.118654180...@iiug.org...
I think we did this before when we had the problem (it's gone away now) and
the times were sub-millisecond (the servers are right next to one another).
We've had it again this evening. Before disabling HDR (which again
instantly cured the problem) I pinged from Pri->Sec: 0ms constantly.
I also did onstat -a and onstat -g all on both servers before disabling, if
anyone's interested ......
Neil,
Open a case on this and attach the onstats to the case.
Thx
M.P.
Hi Madison
Thanks
There is a case, 33150,019,866.
However, other than by emailing them to UK support, who have gone home, I
don't think I can get the stats attached to it.
Use the ESR tool on the IBM Website!
I find it very difficult. I guess it's easy if, like you, you have a single
support contract. My IBM ID has access to 20.
IBM Tech Support demand the ICN when you place a call, but the ESR tool uses
a different number, the PPA Site Number. So I'm presented with a list of
multiple Site Numbers which mean nothing to me, and clicking on any of
them gives no information at all to identify which customer it is!
So I avoid ESR. It's far, far easier to use email or phone to raise calls.
Having said all that, I did find out this customer's number and was able to
upload the files by ESR!