Corrupted sar entries after power cycle of hung nodes


Joseph Dowell

Jul 31, 2008, 7:14:51 PM
to lnx...@googlegroups.com

 

Periodically we've seen nodes hang in such a way that they can only be restored to service by an ungraceful power cycle (pm -0 node).

 

When this happens, we see a corrupted sar entry after the node reboots, and usually a bogus Average entry as well (see below).

 

Does anyone know how we can reset the sar counters on startup, before cron and accounting run, to avoid the corrupt sar entries?

 

Thanks,

Joe

 

00:00:01          CPU     %user     %nice   %system   %iowait     %idle
17:20:01          all      0.09      0.00      0.10      0.20     99.61
17:40:01          all      0.09      0.00      0.10      0.29     99.52
18:00:01          all    106.59    106.59    106.59    106.58      0.00
Average:          all    105.93    105.93    105.93    105.92      0.00
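
The rows in the output above are easy to tell apart mechanically: a CPU percentage cannot exceed 100, and the five columns of a sane row should add up to roughly 100. A minimal sketch in Python of a filter that flags the bogus rows when post-processing sar output, assuming the five-column layout and 24-hour timestamps shown here. The function name find_bogus_rows and its tolerance parameter are illustrative only and not part of sysstat. This only detects bad rows; it does not reset any counters.

```python
import re

# A sar CPU row: a timestamp or "Average:", a CPU label, then exactly five
# percentage columns (assumed layout: %user %nice %system %iowait %idle).
ROW = re.compile(r"^(\S+)\s+(all|\d+)\s+((?:\d+\.\d+\s*){5})$")


def find_bogus_rows(sar_text, tolerance=1.0):
    """Return (label, values) for rows whose percentages are implausible.

    A row is flagged if any value lies outside 0..100 or if the columns
    do not sum to 100 within `tolerance` (to allow for rounding).
    """
    bogus = []
    for line in sar_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header line, blank line, or unrelated output
        values = [float(v) for v in match.group(3).split()]
        out_of_range = any(v < 0.0 or v > 100.0 for v in values)
        bad_sum = abs(sum(values) - 100.0) > tolerance
        if out_of_range or bad_sum:
            bogus.append((match.group(1), values))
    return bogus


if __name__ == "__main__":
    sample = """\
00:00:01          CPU     %user     %nice   %system   %iowait     %idle
17:20:01          all      0.09      0.00      0.10      0.20     99.61
17:40:01          all      0.09      0.00      0.10      0.29     99.52
18:00:01          all    106.59    106.59    106.59    106.58      0.00
Average:          all    105.93    105.93    105.93    105.92      0.00
"""
    for label, values in find_bogus_rows(sample):
        print(label, values)
```

On the sample above this flags the 18:00:01 row and the Average row and leaves the two earlier rows alone. Newer sysstat versions print additional columns (for example %steal), so the column count in the pattern would need adjusting there.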

 

Joshua Aune

Jul 31, 2008, 7:22:22 PM
to lnx...@googlegroups.com
Hi Joe,

When I see this, it generally means that a user's process ran away,
frequently into swap. I suspect the % usages are real (though off a
little bit).

Josh

Joseph Dowell

Aug 1, 2008, 10:28:08 AM
to lnx...@googlegroups.com

Thanks Josh. I think we may be talking about two different situations.

What I am asking about is the bogus sar data that shows up after a node
has been power cycled to restore it to use, that is, power cycled
without a clean shutdown first.

We know that the %user logged immediately after the power cycle is not
correct.

What we are interested in doing is resetting the sar counters at
startup, immediately after the power cycle, so that the sar information
returns to a sane state and the sar command reports only accurate data.

I've tried an /etc/rc.d/rc3.d script that runs '/usr/sbin/sadc -'
before cron or accounting is started (it runs as an S07 script), but
that did not do the job.

Thanks,
Joe

