Daily shutdown of our alpha server 800

Dup

Jan 13, 2009, 3:58:34 AM
Hi,

I'm new to this newsgroup, but I need some help. At work we have a Digital
AlphaServer 800, and each day this server powers off. More precisely, it
happens every 24 hours plus 4 or 5 minutes (the time the machine takes to
reboot).

No Unix log helps me find the problem, but at boot, during the AlphaServer
firmware (BIOS) phase, the console reports that dke100 has sense key
'Not Ready'. I think this is just the time the hard disk needs to come up.

16:35.34 failed to send Start Unit to dke100.1.0.5.0
16:35.34 sense key = 'Not Ready' (04|02) from dke100.1.0.5.0
16:35.34 sense key = 'Not Ready' (04|03) from dke100.1.0

Between one startup and the following hot reset, nothing is written to the
/usr/adm/messages log.

Is there a way to determine what causes this behaviour?

Sorry for my bad English, I'm French.
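
For reference, the pair in parentheses in the console lines above looks like the SCSI additional sense code and qualifier (ASC|ASCQ) in hexadecimal; that reading of the console format is an assumption, not something stated in the thread. Under that assumption, a minimal lookup sketch for the "Not Ready" family follows. The function name `decode_not_ready` is made up for illustration; the texts are the standard SCSI meanings of ASC 04h.

```shell
# Hypothetical helper: translate an "ASC|ASCQ" pair as printed by the
# console (assumed hexadecimal) into the standard SCSI description.
decode_not_ready() {
    case "$1" in
        "04|00") echo "Logical unit not ready, cause not reportable" ;;
        "04|01") echo "Logical unit is in process of becoming ready" ;;
        "04|02") echo "Logical unit not ready, initializing command required" ;;
        "04|03") echo "Logical unit not ready, manual intervention required" ;;
        "04|04") echo "Logical unit not ready, format in progress" ;;
        *)       echo "unknown (consult the SCSI ASC/ASCQ tables)" ;;
    esac
}
```

For example, `decode_not_ready '04|02'` prints the "initializing command required" text, which would fit a disk that has not yet accepted a Start Unit command.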

H Vlems

Jan 13, 2009, 5:46:20 AM

What version of unix do you run?
It looks as if one disk fails and somehow this causes the system to
fail.
You write "is powering off". Do you mean that the system is actually
without mains power, no fans, no lights?
Or is the system in console mode, you can still type commands on the
>>> prompt?

How do you restart the system?
- press a button, or
- type: b -fl a

Hans

Dup

Jan 13, 2009, 8:21:47 AM
On Tue, 13 Jan 2009 02:46:20 -0800, H Vlems wrote:

> What version of unix do you run?

The Unix version I run is: OSF1 V4.0 878 alpha

> It looks as if one disk fails and somehow this causes the system to
> fail.

Yes, I agree it looks like a hard disk failure, but what puzzles me is that
it happens every 24 hours.

> You write "is powering off". Do you mean that the system is actually
> without mains power, no fans, no lights? Or is the system in console
> mode, you can still type commands on the >>> prompt?

Sorry, I made a mistake: the system doesn't power off but reboots immediately
(not a clean reboot). It seems to be a hard disk failure, but...

>
> How do you restart the system?
> - press a button, or
> - type: b -fl a

As I just said, the system restarts and recovers automatically.

>
> Hans

Sorry for the wrong information, and thank you.

David Dupin

Jan 13, 2009, 8:49:19 AM
Dup wrote:


Second mistake: they gave me documentation for an AlphaServer 800, but what
we have is an AlphaServer 4000.
In the log I get a lot of SCSI errors, but these are logged after the system
reboots.

A processor interrupt was generated by the
CACHEA Dynamic Ram controller and
ArBitration engine (DRAB) with an
indication that the CACHE backup battery
has failed or is low (needs charging).


Does anyone know if it can be a battery problem?

H Vlems

Jan 14, 2009, 11:17:35 AM
> Someone know if it can be a battery problem ?

David,
I doubt it is a battery problem because once the system runs it has no
need for the battery.
In another post you wrote that the system is an AS 4000, not an 800.
That was not a problem, but it solved the mystery of the DKE device
name. An AS 800 has 4 internal disks and not many PCI slots to put a
SCSI controller in. It is rather rare for an AS 800 to have three SCSI
controllers; 5 is definitely a lot. On a 4000 it is quite possible.
If the system shuts down and reboots, then the failure of the disk
somehow affects Unix. So I'd guess that the DKE100 disk holds data or
data structures of Unix, like a pagefile or so.
It is rather odd that this happens every 24 hours. Does that mean it
happens at the same time of day as well?
And is something happening on the system at that time, like a big
batch job that starts?
Perhaps that job uses DKE100, or it uses a lot of memory, which causes
excessive paging to that disk.

On v5.0 there is a utility called sysman station and this shows the
mounted filesystems and the physical disks that
are part of the filesystem. Can you find out to what filesystem DKE100
belongs?
Hans
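
One way to check whether the restarts really fall at the same time each day is to write down the boot times and compute the gaps between them. The sketch below is only an illustration: it assumes the boot times have been collected by hand (from the console or whatever record is available) as `day-of-month HH:MM` lines within a single month, and the function name `boot_gaps` is hypothetical.

```shell
# Hypothetical helper: read "DD HH:MM" lines (one boot per line, oldest
# first, all in the same month) and print the gap between successive boots.
boot_gaps() {
    awk '
    {
        split($2, t, ":")
        now = ($1 * 24 + t[1]) * 60 + t[2]   # minutes since start of month
        if (NR > 1) {
            gap = now - prev
            printf "%dh%02dm\n", int(gap / 60), gap % 60
        }
        prev = now
    }'
}
```

With made-up boot times such as `12 16:30`, `13 16:35` and `14 16:40` on standard input, this prints `24h05m` twice. A constant gap slightly above 24 hours would point to something that counts from the last boot rather than to a job scheduled at a fixed clock time.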

Dennis Grevenstein

Jan 14, 2009, 12:29:58 PM
H Vlems <hvl...@freenet.de> wrote:
>
> I doubt it is a battery problem because once the system runs it has no
> need for the battery.

Many RAID controllers have a battery to back up a write cache.
This is necessary to ensure file system integrity in case of
a power failure.
Some RAID systems refuse to work if their battery is empty,
but I don't know much about the RAID controllers used in
Alphaservers.

Dennis

--
Don't suffer from insanity...
Enjoy every minute of it.

H Vlems

Jan 15, 2009, 3:18:32 AM
On 14 jan, 18:29, Dennis Grevenstein <dennis.grevenst...@gmail.com>
wrote:

Dennis, you are correct but the OP reports just one disk that logs
errors.
With an empty battery in a RAID array I'd have expected several disks
to fail.
So my initial response was "bad disk" not "flat battery".
Hans

David Dupin

Jan 15, 2009, 4:12:40 AM
H Vlems wrote:

No specific job is launched at that time, and there is no crontab entry
either, which is why I don't understand where it comes from. Our maintainer
will come this afternoon to help diagnose the problem (if it is a hardware
problem).

If it were a disk failure, I would expect some Unix logs, but I see
nothing (/usr/adm/messages).

I confirm that the dke100 message appears at boot; it looks like the system
is waiting for the hard disk to come up.

H Vlems

Jan 16, 2009, 8:53:19 AM
> harddisk to be up.

Is DKE100 part of an Advanced File System set?

David Dupin

Jan 16, 2009, 11:22:28 AM
Hi,

Yesterday the maintainer changed our battery, and the system now has 1 day
and 30 minutes of uptime, so it seems that changing the battery fixed our
restart problem.

I'll keep an eye on it.

Kari Uusimäki

Jan 17, 2009, 1:58:37 PM


Hello David,

I suggest you get acquainted with Tru64 UNIX. Many things differ from
other Unices, especially the system management.

The online documentation is found at:
http://h30097.www3.hp.com/docs/pub_page/V40G_DOCS/V40G_DOCLIST.HTM

The V4.0G documentation is sufficient for all the V4.x versions.

A few notes about finding error information on a Tru64 system.

The syslog, where you can find OS-logged information, is spread over the
different log files found in /var/adm/syslog.dated/

The binary (mostly hardware) errors are logged in the binary errorlog
which is not a text file and therefore needs to be read with a special
utility called uerf. Its documentation is found at:
http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V40G_HTML/APS2RFTE/RFPXXXXX.HTM

Usually it is easiest to start reading it from the end (last events)
towards the beginning, as follows:

# uerf -R

I hope this helps you.


Good luck!

Kari
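
As a small companion to the notes above, here is a sketch for locating the most recently written log directory under /var/adm/syslog.dated/. It assumes the logs sit in one subdirectory per date; the function name `latest_syslog_dir` is hypothetical, and the file name in the trailing comment is an assumption to check on the actual system.

```shell
# Hypothetical helper: print the most recently modified subdirectory of
# the syslog.dated tree (default path as given in the post above).
latest_syslog_dir() {
    base=${1:-/var/adm/syslog.dated}
    # -t sorts by modification time, newest first; -d lists the
    # directories themselves instead of their contents.
    ls -td "$base"/*/ 2>/dev/null | head -n 1
}

# Possible use, assuming a kernel log file named kern.log exists there:
#   grep -i scsi "$(latest_syslog_dir)kern.log"
```

The function prints nothing if the directory does not exist or has no subdirectories, so the result should be checked before it is used in a path.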

David Dupin

Jan 19, 2009, 3:59:07 AM
Kari Uusimäki wrote:
Hi Kari,

Thanks for the documentation link; it will help me maintain our
Tru64 UNIX system.

To see what binary.errlog reports I use dia, which is the "new" tool for
reading binary.errlog, but maybe uerf is better?

Since the battery was changed in our disk bay, uptime is 3 days, so there
has been no restart since that day.

Thanks to all for the help. I will do my best to learn Tru64 and the
AlphaServer (management put me on this machine because I know Linux, but
there are quite a few differences ;).

And apologies for my bad English, I'm French :D

David.
