On SCO 3.2v5.0.5 running on a Compaq Prosignia 500 for several years.
Recently (about 5-6 weeks ago), machine started rebooting on its own at
the same time in the morning, about 3 or 4 times a week.
Anyone else had this happen, or hear of it happening?
Checked all entries of "ps -ef" before the reboots and find nothing out
of sorts. Looked at all rc.d and rc2.d files and find nothing amiss.
Got a lot of head scratching going on here.
Any ideas?
Thanks for your wisdom.
Archie Coffman
--
A Coffman
Posted via http://dbforums.com
Is there a cleaning crew in the office at that time of the morning?
--
JP
No, and besides, the timing of the reboots is precisely the same moment
of the day (7:08am). I have watched it occur and it happens without any
prelude or warning of a pending reboot. The single cleaning person we
have varies her routine.
Your reply does reinforce my thinking that this is probably sabotage of
some sort. I've changed root passwords several times, but a script file
could have been planted. There just isn't any trace of one.
Thanks for your help, JP.
Archie
So what's in any crontab that runs at 07:08?
--
JP
When you say a reboot, exactly what is it that happens? Does it do a clean
shutdown and run the scripts in /etc/rc0.d?
In any case, a script that does something like this should be fairly easy to
find. Look for stuff in your crontabs (or even inittab) that doesn't
belong.
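A rough sketch of that crontab check, for anyone who wants to automate
it. It assumes the usual five-field crontab layout (minute, hour, ...)
and an awk that accepts -v (on older systems that may have to be nawk).
The function name cron_at is made up for illustration, and step values
such as */5 are not handled.

```shell
#!/bin/sh
# cron_at HOUR MINUTE
# Reads crontab lines on stdin and prints the entries whose minute and
# hour fields can fire at HOUR:MINUTE. Understands "*", plain numbers,
# comma lists and a-b ranges only (no step values).
cron_at() {
    awk -v hour="$1" -v min="$2" '
        # Return 1 if the crontab field can match the number "want".
        function matches(field, want,    n, parts, i, range) {
            if (field == "*") return 1
            n = split(field, parts, ",")
            for (i = 1; i <= n; i++) {
                if (split(parts[i], range, "-") == 2) {
                    if (want + 0 >= range[1] + 0 && want + 0 <= range[2] + 0)
                        return 1
                } else if (parts[i] + 0 == want + 0) {
                    return 1
                }
            }
            return 0
        }
        /^[ \t]*#/ || NF < 6 { next }    # skip comments and short lines
        matches($1, min) && matches($2, hour) { print }
    '
}
```

Feed it each user's crontab in turn, for example the output of
"crontab -l" piped into "cron_at 7 8". Entries with "*" in both fields
will show up too, which is intended: they also run at 07:08.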
Is the server attached to a battery backup device? If so, has it
accidentally been set to do a self-test at 7:08 in the morning? If not, are
you 100% sure that the power is not blinking out for a fraction of a second?
(All things I've encountered at other clients...)
Good luck,
Fabio
If you think some process is doing this (as opposed to a hardware
reboot) see "How do I find out who or what halted my system?" at
http://aplawrence.com/SCOFAQ/scotec1.html#haltcatch
(appended here for your convenience)
First, look in crontab for a call to haltsys or init. Someone may have
added this for silly reasons.
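A minimal sketch of that first check, not part of the FAQ itself. The
default directory /usr/spool/cron/crontabs is an assumption about where
the crontabs live on your system, and the function name is made up;
pass a different directory if yours differs.

```shell
#!/bin/sh
# find_halt_calls [DIR]
# Print every line in the crontab files under DIR that mentions a
# shutdown-type command, as file:line-number:text.
find_halt_calls() {
    dir=${1:-/usr/spool/cron/crontabs}   # assumed location, see above
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # /dev/null as an extra file makes grep always print the file name.
        grep -n -E 'haltsys|shutdown|reboot|uadmin|init[[:space:]]+[056]' \
            /dev/null "$f"
    done
    return 0    # "nothing found" is not an error here
}
```

This only catches direct calls. A crontab entry that runs some other
script which in turn calls haltsys will not show up, so an empty result
does not clear cron.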
If you think some privileged user or process has run /etc/haltsys, add
these lines to it right after the PATH= line
{
echo $0 `tty` `id`
MYPROC=$$
NEXTPROC=$MYPROC
while [ $NEXTPROC != 0 ]
do
ps -lp $NEXTPROC
MYPROC=$NEXTPROC
NEXTPROC=`ps -p $MYPROC -o "ppid=" `
done
} | mail -s "haltsys was run" root
This will give you a full trace of where it was called from. You can use
a similar technique with /etc/shutdown.
You might also write a "K" script and put it in /etc/rc0.d.
Unfortunately, by that time there isn't as much information to glean
from the system. Adding to /etc/rc0 doesn't gain you much either, but at
least you know it was not a crash and you *might* still see a suspect
process in a ps listing.
If your only concern is when the system went down,
who -a /etc/wtmp | grep uadmin
will give you that. Note that on "out of the box" systems, the
information in /etc/wtmp is cleared out weekly by a cron job that runs
/etc/cleanup; you may want to adjust this script if you need longer records.
Jeff Hyman tells me that the old 3.2v4.2 "last" included shutdown
information, so
last | grep shutdown
would work on those releases. It doesn't on OSR5.
--
Please note new phone number: (781) 784-7547
Tony Lawrence
Unix/Linux Support Tips, How-To's, Tests and more: http://aplawrence.com
Free Unix/Linux Consultants list: http://aplawrence.com/consultants.html
Correct on the assumption. All cron entries *appear* normal. I've just
had an idea to apply subtle changes to some of the starting times for
cron entries to see if the timing of the reboots changes accordingly. If
so, then I'll try to narrow down the offender. I'm not holding out much
hope, though.
All daemons are normal to the system
Thanks, marshallbrown.
are there any peculiar daemons running?
I had that happen before on my HP tc4100 server. I flashed the BIOS and
it stopped rebooting. The problem, in my situation, was that some
process that ran at the time it normally rebooted was so CPU-intensive,
or used so many resources, that it made the server reboot because of the
corrupted BIOS (which got corrupted in some unknown way). But in my case
it was rebooting within the same time period, +/- 30 minutes.
If you suspect something in cron/at may be causing the problem
you can stop cron for a while, say at 6:58 in the morning, and
see if you still have the reboot.
Mike
--
Michael Brown
The Kingsway Group
Are you running any scheduled cron jobs around that time, e.g. backup, etc.?
Abid
>Originally posted by Jean-Pierre Radley
>> alcthree typed (on Mon, Dec 02, 2002 at 01:56:28PM +0000):
>> | On SCO 3.2v5.0.5 running on a Compaq Prosignia 500 for several years.
>> | Recently (about 5-6 weeks ago), machine started rebooting on its own at
>> | the same time in the morning, about 3 or 4 times a week.
>>
>> Is there a cleaning crew in the office at that time of the morning?
>No, and besides, the timing of the reboots is precisely the same
>moment of the day (7:08am).
Turn the server's clock 2 minutes ahead (or back, as you like ;-)).
Watch whether the reboot still occurs at 7:08 by the server's clock
(make sure that no clock synchronization runs during the test).
If it still happens at 7:08 by the server's clock, the cause is almost
certainly on that box itself: inspect _all_ crontabs on the system for
a program being started just before or at that time. (That would not
be a bad first step in any case.)
If it now occurs at 7:10 by the server's clock (or 7:06 if you set the
clock back), another server or device (a big machine being powered
on) may be the cause.
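The arithmetic behind the clock-shift test, as a small sketch. The
function name and output wording are made up for illustration; the only
input is how many minutes the server clock was moved (+2 for ahead,
-2 for back), and 7:08 is taken from the original report.

```shell
#!/bin/sh
# predict_reboot SHIFT
# SHIFT = minutes the server clock was moved (+2 = ahead, -2 = back).
# Prints the time, as read on the server's own clock, at which the
# usual 7:08 reboot should show up under each hypothesis.
predict_reboot() {
    shift_min=$1
    base=$((7 * 60 + 8))          # 7:08 as minutes since midnight
    # Something on the box (cron etc.) follows the server clock,
    # so it still fires when that clock reads 7:08.
    printf 'internal (cron etc.): %02d:%02d\n' $((base / 60)) $((base % 60))
    # Something outside the box still happens at real 7:08,
    # when the shifted server clock reads 7:08 plus the shift.
    ext=$((base + shift_min))
    printf 'external (power etc.): %02d:%02d\n' $((ext / 60)) $((ext % 60))
}
```

With the clock moved 2 minutes ahead, an external cause shows up at
7:10 on the server clock; with the clock moved 2 minutes back, at 7:06.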
Activate "sar" and edit its crontab so that it runs around the clock.
Inspect the sar results after the reboot. (Memory?)
If the server does not need to be available 24 hours a day, get a cheap
timer switch and turn off the Ethernet hub between 6:00 and 8:00 or so,
so that the server is not reachable from other computers.
If it still reboots, no cron job is running, and "sar" says "all is
well", the problem might be in the mains power (a big machine being
powered on somewhere).
Get an APC UPS (which is not a bad idea anyway, IMHO; those 300 bucks
are very well invested) or borrow one for a fortnight.
Install the apcd logging daemon and watch the log file for "brown outs" etc.
If the server does not need to be available 24 hours a day, have cron
reboot it at, say, 3:00am. (If a process is running out of memory, that
could make the reboot disappear or (less likely) make it show up
during the main working hours...)
Configure the BIOS(?) so that it halts the machine instead of
rebooting, so you have a (slight) chance to read the "famous last
words" on the console...
Configure BIOS+SCO to use the serial port as console, not the VGA.
Connect a PC to the serial port and use a terminal emulator
that allows "logging to a file".
>I have watched it occur and it happens
>without any prelude or warning of a pending reboot.
Activate cron logging
>Your reply does reinforce my thinking that this is probably sabotage
>of some sort.
No ;-). I would assume he meant that your cleaning personnel
a) unplug the server to connect their hoover/vacuum cleaner... (been there...)
b) use devices that introduce electrical noise into your server
c) cause vibrations, and your server has (by now) bad contacts, so the
vibrations may cause the reboot. (Take out all RAM modules and PCI
cards and reseat them.)
> If you think some process is doing this (as opposed to a hardware
> reboot) see "How do I find out who or what halted my system?" at
> http://aplawrence.com/SCOFAQ/scotec1.html#haltcatch
>
> (appended here for your convenience)
>
> First, look in crontab for a call to haltsys or init. Someone may have
> added this for silly reasons.
>
> If you think some privileged user or process has run /etc/haltsys, add
> these lines to it right after the PATH= line
>
> {
> echo $0 `tty` `id`
> MYPROC=$$
> NEXTPROC=$MYPROC
> while [ $NEXTPROC != 0 ]
> do
> ps -lp $NEXTPROC
> MYPROC=$NEXTPROC
> NEXTPROC=`ps -p $MYPROC -o "ppid=" `
> done
> } | mail -s "haltsys was run" root
Change this to:
} | mail -s "$0 $* was run" root
sync
sleep 5
sync
The sync and pause routine is necessary because mail delivery can take a
while (especially if you've installed spamassassin ;-), and you don't
want the mail still in flight when the very next thing that happens is
a shutdown.
> This will give you a full trace of where it was called from. You can use
> a similar technique with /etc/shutdown.
This is true enough, but misses /etc/reboot as well as /etc/uadmin.
_All_ of the SCO-provided shutdown techniques (init [056], shutdown,
haltsys, reboot) eventually funnel through /etc/uadmin. So the best way
to do this is to move /etc/uadmin to /etc/uadmin.real and use the above
script bit as /etc/uadmin, ending it with:
exec /etc/uadmin.real "$@"
The `ps` chain is cute, but unnecessary -- better to give `ps -elf`
output and let the reader figure out the chaining. The actual cause of
shutdown might not be in the parenthood of the process doing the
shutdown. (e.g. if someone ran `sd shutdown`.) So the entire script
can be reduced to:
#!/bin/sh
{
echo Process $$, on tty `tty`, user `id`, ran:
echo " $0 $@"
echo
ps -elf
} | mail -s "uadmin was run" root
sync; sleep 5; sync
exec /etc/uadmin.real "$@" # "real" /etc/uadmin was renamed /etc/uadmin.real
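A minimal sketch of the rename-and-wrap step, under stated assumptions.
The function name, the uadmin.new staging file and the 755 mode are all
made up for illustration (check the mode and ownership of your real
/etc/uadmin). The directory is a parameter so the procedure can be
rehearsed in a scratch directory before it is pointed at /etc.

```shell
#!/bin/sh
# install_wrapper DIR WRAPPER
# Renames DIR/uadmin to DIR/uadmin.real and installs WRAPPER as the new
# DIR/uadmin. Refuses to run twice, so the real binary is never
# overwritten by a copy of the wrapper.
install_wrapper() {
    dir=$1 wrapper=$2
    if [ -f "$dir/uadmin.real" ]; then
        echo "$dir/uadmin.real already exists; leaving everything alone" >&2
        return 1
    fi
    # Stage the wrapper first so uadmin is missing only for an instant.
    cp "$wrapper" "$dir/uadmin.new" || return 1
    chmod 755 "$dir/uadmin.new" || return 1     # assumed mode
    mv "$dir/uadmin" "$dir/uadmin.real" || return 1
    mv "$dir/uadmin.new" "$dir/uadmin"
}
```

To back out, move uadmin.real back over uadmin. Keep in mind that a
wrapper which ends with "exec /etc/uadmin.real" only works once it is
installed in /etc, so a rehearsal copy needs that path adjusted.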
>Bela<
Thanks for the information, Fabio. You would think that if this were
caused by a script, it would stick out like a sore thumb. The
back-up self test idea is a new angle, and a true possibility. I'll
check that out.
Thanks again.
Archie
Just for fun... have cron change the time at 7:07 to 7:09 and then at
7:10 change it to 7:09.
perhaps a file that runs regularly throughout the day was rewritten as a
script to check for time and reboot at 7:08 am... and if not just run a
renamed version of the original file that was replaced.
I don't know... couldn't that be done with getty?
anyhow.... just a loose straw to grab at.
have you tried verifying files in SCOADMIN?
Reboot is the same file as haltsys, is it not?
>
> _All_ of the SCO-provided shutdown techniques (init [056], shutdown,
> haltsys, reboot) eventually funnel through /etc/uadmin. So the best way
> to do this is to move /etc/uadmin to /etc/uadmin.real and use the above
> script bit as /etc/uadmin, ending it with:
>
> exec /etc/uadmin.real "$@"
Agreed - a better place and now so noted in the faq.
>
> The `ps` chain is cute, but unnecessary -- better to give `ps -elf`
> output and let the reader figure out the chaining.
Perhaps. It can be a lot to look through though..
> The actual cause of
> shutdown might not be in the parenthood of the process doing the
> shutdown. (e.g. if someone ran `sd shutdown`.) So the entire script
> can be reduced to:
>
> #!/bin/sh
> {
> echo Process $$, on tty `tty`, user `id`, ran:
> echo " $0 $@"
> echo
> ps -elf
> } | mail -s "uadmin was run" root
> sync; sleep 5; sync
> exec /etc/uadmin.real "$@" # "real" /etc/uadmin was renamed /etc/uadmin.real
Good points, faq updated.
>If you suspect something in cron/at may be causing the problem
>you can stop cron for a while,
nice trick.
How do you do that?
What happens to the jobs?
Will they be run "en bloc" or "forgotten" when cron/at is resumed?
maybe a user's cron is running it?
permissions on init and reboot-able files were checked, eh?
Tony(FAQ)>>> } | mail -s "haltsys was run" root
Bela>> This is true enough, but misses /etc/reboot as well as /etc/uadmin.
Tony> Reboot is the same file as haltsys, is it not?
Yes, but the mail it sent would be wrong...
>Bela<
I had a server that ran perfectly for more than two years, a 5.0.5. It
started rebooting at night, then the time and frequency got worse over a
month or so until it rebooted a couple of times a day. I tried a million
possibilities; none worked.
Then one day, while calculating how long it would take, in milliseconds, for
me to hit the street below (:), I decided to install oss497c. I have not
revisited or completed that calculation since. It mysteriously stopped
rebooting.
>On SCO 3.2v5.0.5 running on a Compaq Prosignia 500 for several years.
>Recently (about 5-6 weeks ago), machine started rebooting on its own at
>the same time in the morning, about 3 or 4 times a week.
>Anyone else had this happen, or hear of it happening?
And if all the other suggestions have been tried and they
did not cure it - take an inventory of your surroundings.
Has any new AC equipment been installed in the building? If you
are in a multiple-occupancy building, check with building maintenance
to see if any major changes have been made, such as new AC, new
heating, or any power-consuming devices at all.
Look around your building and see if you spot something different
such as a brand new cell phone tower.
And just because you have a UPS on the computer, that won't stop
reboots caused by attached equipment if it isn't properly
connected, set up or protected. Terminals used to be prime
candidates, but since they are rapidly disappearing, check for things
such as printers. Then check also that any device directly
connected is on the same leg of the power. If not, it could be
getting something in from another power-consuming device on the
other leg and just passing it through.
Since you did not specify the type of building but you said it
was always at the same time, it makes me think of timer-oriented
devices in a multiple-occupancy unit for such things as heaters,
AC, or other equipment that would be timed to start about an hour
before other users arrive.
Lots of luck. If it's external it may take some time to track it
down.
Bill
--
Bill Vermillion - bv @ wjv . com
Someone mentioned the battery backup, but I was skeptical because we
aren't using the "Smart" option. At any rate, I peeked in back of
the unit and found that the alarm had been disabled (probably by me
years ago). At 7:07am, the UPS flickered and the Unix box rebooted.
Mystery solved.
I'm still looking into why this is occurring in precise 24 hour cycles,
but the condition of the battery has been gradually degrading. This is
why, until recently, it had only been occurring once or twice a week (the
battery kept the machine running). Reviewing the APC manual, nothing is
mentioned about a "self-test" but I'll test that and monitor the
electrical outlet as well.
For now, the machine is connected to a newer backup and another power
source and we are happy again.
Thanks to everyone for their input. I can only hope to one day have
enough systems knowledge to make a similar contribution.
I seem to remember (from an old reading on the group) that someone
witnessed the very same behaviour (unattended reboots at a given time)
and that was due to a "glitch" (sp?) on the power line.
In fact, in a nearby room/office, someone (at the same time) switched on
a very power-consuming device or a device which, for some reason, caused
some electrical problems on the line (I'm not an engineer, so the above
may not be the real cause, only some ideas to crunch :-).
The solution was like yours: replace the UPS (or connect the server to
another electrical line), and that did the trick.
Best,
Roberto
--
---------------------------------------------------------------------
Roberto Zini email : r.z...@strhold.it
Technical Support Manager -- Strhold Evolution Division R.E. (ITALY)
---------------------------------------------------------------------
"Has anybody around here seen an aircraft carrier?"
(Pete "Maverick" Mitchell - Top Gun)
The funniest one I had was a server that "mysteriously" rebooted every
now and then. We had looked at everything, but it kept happening. One
day I was working with the Sysadmin and while we were doing something
else, the UPS chirped, indicating a momentary power dip. The sysadmin
looked at me and said
"You know, every time that darn thing chirps, it's about ten minutes
later that the system reboots."
Duh :-)
So we replaced the UPS and stopped the reboots. I'm not sure what was
going on but I suspect that when it switched to battery something went
wrong when it wanted to switch back and it would drop power to the
machine entirely long enough to crash it. By the time the machine
recovered, it would be back on wall power just fine. Until the next
brownout.
--
Tony Lawrence
Unix/Linux Support Tips, How-To's, Tests and more: http://aplawrence.com
Free Linux Skills Test: ftp://aplawrence.com/pub/linuxquestions.zip