
Performance problem immediately fixed by disabling HDR


Neil Truby

Aug 6, 2007, 4:52:46 PM
IDS 10.0FC5 on AIX 5.3

We have a highly sporadic performance problem. Sporadically (it hasn't
happened for a couple of months but has started today) our database server
starts exhibiting 2-second checkpoints (otherwise they are almost always 0s)
and the performance of critical processes dives.

14:01:19 Checkpoint Completed: duration was 2 seconds.
14:16:21 Checkpoint Completed: duration was 2 seconds.
14:31:24 Checkpoint Completed: duration was 2 seconds.
14:46:26 Checkpoint Completed: duration was 2 seconds.
15:01:28 Checkpoint Completed: duration was 2 seconds.
15:16:31 Checkpoint Completed: duration was 2 seconds.
15:31:33 Checkpoint Completed: duration was 2 seconds.
15:46:35 Checkpoint Completed: duration was 2 seconds.
16:01:37 Checkpoint Completed: duration was 2 seconds.
16:16:40 Checkpoint Completed: duration was 2 seconds.
16:31:42 Checkpoint Completed: duration was 2 seconds.

<== HDR turned off at 1642
16:46:43 Checkpoint Completed: duration was 0 seconds.
17:01:42 Checkpoint Completed: duration was 0 seconds.
17:16:43 Checkpoint Completed: duration was 0 seconds.
17:31:43 Checkpoint Completed: duration was 0 seconds.
17:46:44 Checkpoint Completed: duration was 0 seconds.


As before, turning HDR off immediately clears the problem. There are no
obvious performance problems on either PRI or SEC server. It's all a
mystery. The common thread is that shutting off HDR fixes it instantly.
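
For anyone wanting to track this pattern over time, here is a minimal Python sketch that pulls the checkpoint durations out of message-log lines in the format shown above. The function names and the sample list are illustrative assumptions, not part of any Informix tooling, and the regular expression only covers the exact "Checkpoint Completed: duration was N seconds." wording quoted in this post.

```python
import re
from collections import Counter

# Matches message-log lines such as
# "14:01:19 Checkpoint Completed: duration was 2 seconds."
CHECKPOINT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+Checkpoint Completed:\s+duration was (\d+) seconds?\."
)


def checkpoint_durations(lines):
    """Return a list of (time, seconds) tuples, one per checkpoint line."""
    found = []
    for line in lines:
        match = CHECKPOINT_RE.match(line.strip())
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def summarize(lines):
    """Count how many checkpoints took each duration (in seconds)."""
    return Counter(seconds for _, seconds in checkpoint_durations(lines))


# Hypothetical sample taken from the log excerpt in this post.
sample = [
    "16:16:40 Checkpoint Completed: duration was 2 seconds.",
    "16:31:42 Checkpoint Completed: duration was 2 seconds.",
    "<== HDR turned off at 1642",
    "16:46:43 Checkpoint Completed: duration was 0 seconds.",
]
print(summarize(sample))
```

Feeding it the whole message log (for example `summarize(open(path))`, with `path` pointing at wherever the log lives) would show how often the non-zero checkpoints occur and whether they line up with the slow periods.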

Any observations?

thx
--
Neil Truby t:01932 724027
Director m:07798 811708
Ardenta Limited e:neil....@ardenta.com


Madison Pruet

Aug 6, 2007, 5:09:59 PM

HDR's checkpoints are synchronized between the primary and the
secondary. That means the checkpoint occurs on the primary and then on
the secondary before the checkpoint on the primary is considered complete.

In v11, this is addressed by non-blocking checkpoints, which work just
fine with HDR.

Neil Truby

Aug 6, 2007, 5:36:51 PM
The checkpoint times are almost invariably 0 seconds at all other times.
When the problem was occurring on Sat 6th June, I could do an onmode -c
every few seconds and it would report 2s:
14:31:35 Checkpoint Completed: duration was 2 seconds.
14:31:35 Checkpoint loguniq 23582, logpos 0x1a16018, timestamp: 0x6718fb
14:31:35 Maximum server connections 991
14:34:09 Logical Log 23582 Complete, timestamp: 0x6ef5ad.
14:34:09 Logical Log 23582 - Backup Started
14:34:11 Logical Log 23582 - Backup Completed
14:36:21 Checkpoint Completed: duration was 2 seconds.
14:36:21 Checkpoint loguniq 23583, logpos 0x1435018, timestamp: 0x753693
14:36:21 Maximum server connections 991
14:36:38 Checkpoint Completed: duration was 2 seconds.
14:36:38 Checkpoint loguniq 23583, logpos 0x16ca018, timestamp: 0x7640b8
14:36:38 Maximum server connections 991
14:36:58 Checkpoint Completed: duration was 2 seconds.

It's not the checkpoint times themselves that are the problem; it is that
this symptom is accompanied by user-facing issues of slow processing.


"Madison Pruet" <mpr...@verizon.net> wrote in message
news:46B78E32...@verizon.net...

mos...@wellsfargo.com

Aug 6, 2007, 5:36:39 PM
to inform...@iiug.org, neil....@ardenta.com

Neil,

I would suggest taking a look at the network connection between the
servers. Like Madison says, the checkpoints are synchronous, even if
you have set up HDR as async.

HTH,
Paul M.

Madison Pruet

Aug 7, 2007, 12:15:44 PM

A minor correction. The checkpoint does not have to be totally
completed on the secondary before the ACK of the checkpoint is sent back
to the primary. But there is still a lot of work which must be done
during checkpoint processing, and the checkpoint processing on the
secondary can cause a backflow issue which can impact the primary.
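
To make the backflow idea concrete, here is a toy Python model. It is only an illustration under simplifying assumptions (fixed rates per tick, a single bounded buffer, a secondary that applies nothing while it is busy with checkpoint work) and it is not a description of how HDR is actually implemented. All names and numbers are hypothetical.

```python
def simulate(ticks, produce_rate, apply_rate, buffer_size, checkpoint_ticks):
    """Toy model: return how many log records the primary could not send.

    checkpoint_ticks is a set of tick numbers during which the secondary
    is busy with checkpoint work and applies nothing from its queue.
    """
    queued = 0   # log records waiting to be applied on the secondary
    blocked = 0  # records the primary wanted to send but had to hold back
    for tick in range(ticks):
        # The secondary drains its queue unless it is busy checkpointing.
        if tick not in checkpoint_ticks:
            queued = max(0, queued - apply_rate)
        # The primary sends only as much as fits in the remaining buffer.
        space = buffer_size - queued
        sent = min(produce_rate, space)
        queued += sent
        blocked += produce_rate - sent
    return blocked


# With no checkpoint on the secondary, nothing backs up.
print(simulate(10, 5, 5, 10, set()))
# A three-tick stall on the secondary fills the buffer and blocks the primary.
print(simulate(10, 5, 5, 10, {3, 4, 5}))
```

In this model the primary is only held up once the buffer between the two servers is full, which is one way to picture why a short stall on the secondary can show up as slow processing on the primary.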


TBP (The Big Potato)

Aug 7, 2007, 12:18:22 PM

What is DRINTERVAL set to?

What sort of h/w is on the Primary as compared to the Secondary?

kernoal....@autozone.com

Aug 7, 2007, 1:47:53 PM
to Madison Pruet, informix-l...@iiug.org, inform...@iiug.org

Madison,

On the secondary server, would it help at all to set LRU min/max lower
than on the primary server?
Would that keep the number of dirty pages to a minimum and reduce the
amount of time needed to flush the pages on the secondary?


Kernoal


Madison Pruet <mpruet1@verizon.net>
Sent by: informix-list-bounc...@iiug.org
08/07/2007 11:15 AM
To: inform...@iiug.org
Subject: Re: Performance problem immediately fixed by disabling HDR




_______________________________________________
Informix-list mailing list
Inform...@iiug.org
http://www.iiug.org/mailman/listinfo/informix-list


Madison Pruet

Aug 7, 2007, 1:53:31 PM
kernoal....@autozone.com wrote:
> Madison,
>
> On the secondary server, would it help at all to set LRU min/max lower
> than on the primary server?
> Would that keep the number of dirty pages to a minimum and reduce the
> amount of time needed to flush the pages on the secondary?

From the standpoint that it would decrease the risk of backflow - yes.

Neil Truby

Aug 7, 2007, 2:01:33 PM
"TBP (The Big Potato)" <T...@NotHere.Co.Uk> wrote in message
news:iX0ui.11814$Db6....@newsfe3-win.ntli.net...
> mos...@wellsfargo.com wrote:

> What is DRINTERVAL set to?

30

> What sort of h/w is on the Primary as compared to the Secondary?

The primary is a dogs' bollocks 16-core IBM p570.
The secondary is a 4-core IBM p510Q

There's an IBM call open for this now, btw, 33150,019,866.


Fernando Nunes

Aug 7, 2007, 6:58:48 PM

Try pinging between the servers to check for any TCP/network issues.
Regards

--
Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...

Alexey Sonkin

Aug 7, 2007, 10:56:33 PM
to Neil Truby, inform...@iiug.org
Neil,

I can give you three scenarios in which you can see longer
checkpoints in an HDR pair:

1. (I have seen this many times) The disk array on the Secondary switches
from write-back to write-through caching mode
(e.g. one PSU fails in the array).
This immediately shows up as longer HDR checkpoints
in a high-load OLTP environment.

2. You are running a write-intensive application on the Primary
(batch job or OLTP) and a read-intensive application on the Secondary.
In this case, the Secondary quickly falls behind the Primary, simply because
it needs to retrieve data from disk that the Primary has cached.
A long HDR checkpoint becomes inevitable.

3. Almost the same as #2, with an asymmetric configuration:
either the disk array on the Secondary is slower than on the Primary, or
the Secondary has a smaller Informix cache.

The rule of thumb to avoid long checkpoints in an HDR pair:

SECONDARY MUST BE MORE POWERFUL THAN PRIMARY -
more memory and a more powerful disk subsystem.
CPU power is not that important
(unless you run out of CPU resources on the Secondary).

-Alexey

Neil Truby

Aug 7, 2007, 11:37:33 PM
Thanks for your interest, Alexey.
The strange thing here is that although application processes are slowed
down, the only symptom we see - 2s checkpoints - is really not all that bad
in the scale of things. The checkpoints are also 0, 1 or 2 seconds on the
(much less powerful) secondary.
So 2s checkpoints are hardly a matter of concern, except in the context of
their normal 0s, but the coincident downturn in some application performance
really hits them hard.
We turned HDR off on Monday evening and the performance immediately
improved - put it back on Tue morning and the problem hasn't recurred!

"Alexey Sonkin" <ale...@cidc.com> wrote in message
news:mailman.563.118654180...@iiug.org...

Neil Truby

Aug 7, 2007, 11:38:45 PM
"Fernando Nunes" <sp...@domus.online.pt> wrote in message
news:f9atdb$7ev$2...@aioe.org...

> Try pinging between the servers to check for any TCP/network issues.

I think we did this before when we had the problem (it's gone away now) and
the times were sub-millisecond (the servers are right next to one another).


Neil Truby

Aug 8, 2007, 3:22:49 PM
"Fernando Nunes" <sp...@domus.online.pt> wrote in message
news:f9atdb$7ev$2...@aioe.org...

We've had it again this evening. Before disabling HDR- which again
instantly cured the problem - I pinged from Pri->Sec - 0ms constantly.

I also did onstat -a and onstat -g all on both servers before disabling if
anyone's interested ......
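
Since ping only exercises ICMP, a rough complement is to time TCP connects to the port the other server actually listens on. The following Python sketch does that under stated assumptions: the function name is made up for illustration, and the host and port in the usage comment are placeholders, not values from this system.

```python
import socket
import time


def tcp_connect_ms(host, port, attempts=5, timeout=2.0):
    """Return a list of TCP connect times in milliseconds."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


# Hypothetical usage (placeholder host name and port):
#   times = tcp_connect_ms("secondary-host", 1526)
#   print(min(times), max(times))
```

This only measures connection setup, so it says nothing about throughput or about what happens once the link is busy; it is just a quick way to rule out gross latency between the two machines.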


Madison Pruet

Aug 8, 2007, 4:41:46 PM
Neil Truby wrote:

>
> We've had it again this evening. Before disabling HDR- which again
> instantly cured the problem - I pinged from Pri->Sec - 0ms constantly.
>
> I also did onstat -a and onstat -g all on both servers before disabling if
> anyone's interested ......
>
>

Neil,

Open a case on this and attach the onstats to the case.

Thx

M.P.

Neil Truby

Aug 8, 2007, 4:54:57 PM

"Madison Pruet" <mpr...@verizon.net> wrote in message
news:46BA2A8...@verizon.net...

Hi Madison
Thanks
There is a case, 33150,019,866.
However, other than by emailing it to UK support, who have gone home, I
can't get the stats attached to it I don't think.


da...@smooth1.co.uk

Aug 9, 2007, 4:57:59 PM
On 8 Aug, 21:54, "Neil Truby" <neil.tr...@ardenta.com> wrote:
> "Madison Pruet" <mpru...@verizon.net> wrote in message

Use the ESR tool on the IBM Website!

Neil Truby

Aug 9, 2007, 5:54:58 PM

<da...@smooth1.co.uk> wrote in message
news:1186693079.3...@d30g2000prg.googlegroups.com...

> On 8 Aug, 21:54, "Neil Truby" <neil.tr...@ardenta.com> wrote:
>> "Madison Pruet" <mpru...@verizon.net> wrote in message
>> There is a case, 33150,019,866.
>> However, other than by emailing it to UK support, who have gone home, I
>> can't get the stats attached to it I don't think.
>
> Use the ESR tool on the IBM Website!

I find it very difficult. I guess it's easy if, like you, you have a single
support contract. My IBM ID has access to 20.

IBM Tech Support demand the ICN when you place a call, but the ESR tool uses
a different number, the PPA Site Number. So I'm presented with a list of
multiple Site Numbers which mean nothing to me, and clicking on any of
them gives no information at all to identify which customer it is!

So I avoid ESR. It's far, far easier to use email or phone to raise calls.

Having said all that, I did find out this customer's number and was able to
upload the files by ESR!
