
[HPADM] SUMMARY DISK FAILURE


REYES, Bienvenido F.

Jul 2, 2000, 3:00:00 AM
HP replaced the FC card in our server and the DF SCSI card in the FC SCSI MUX,
and our system is now back to normal.
Thank you to all who responded:
-Danilo Israel -Rita Workman
-Joseph S. Clark -Michael Lampi
-Jack Gallagher -Marlou Everson
-Cindy Yoho
especially to Hardeep Bhogal who wrote:


We had the same problem last year. Find below an outline of what was
recommended; it may help.


The following was discovered.

I. Reliability Problems of Fibre Channel Adapter cards

There are reliability problems with systems using the
HP-HSC Fibre Channel Adapter (Product number A3404A). The Gigabit
Link Module (GLM) signal quality may decrease over time. Failure
analysis indicates the failure rate increases with the amount of time
that the GLM is installed and operating. GLMs installed nine months
or longer are at a higher risk of failure.

Common symptoms include ALERT, POWERFAIL, and/or LVM messages
displayed on the system console and in the system log file. These
messages can be caused by BOTH software and hardware.

The server may also report a high load average of 100%.

Software Defects

Software defects in the operating system can result in false ALERT
and POWERFAIL messages being logged. These are resolved with recent
patches for fibre channel mass storage (FCMS), LVM and SCSI services.
It is recommended that recent versions of these patches be installed.

The following are the minimum recommended patch revisions. Please ensure
that their dependencies are also installed.

HP-UX:  PHSS_18326 (FCMS)
        PHKL_17547 (LVM)
        PHKL_19615 (SCSI Services)

These patches reduce the false POWERFAIL and ALERT messages that may
be logged during normal operation. HP strongly recommends that
customers install the above patches BEFORE proceeding with the
hardware upgrade. Otherwise the false messages could impede
troubleshooting of a true hardware failure.
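
As a rough aid for the check described above, the following Python sketch scans a saved patch listing for the three recommended patch IDs. It is only an illustration: the listing text is whatever your own patch inventory produced, the function name missing_patches is made up for this example, and the check looks for exact IDs only, so a newer superseding patch would still be reported as missing.

```python
import re

# Patch IDs and areas taken from the recommendation above.
RECOMMENDED = {
    "PHSS_18326": "FCMS",
    "PHKL_17547": "LVM",
    "PHKL_19615": "SCSI Services",
}

# Matches HP-UX style patch IDs such as PHKL_17547.
PATCH_ID = re.compile(r"\bPH(?:CO|KL|NE|SS)_\d+\b")


def missing_patches(listing):
    """Return the recommended patches whose IDs do not appear in the listing text."""
    installed = set(PATCH_ID.findall(listing))
    return {pid: area for pid, area in RECOMMENDED.items() if pid not in installed}


if __name__ == "__main__":
    # Hypothetical listing text, for illustration only.
    sample = "PHKL_17547 some description\nPHCO_11490 some description\n"
    for pid, area in sorted(missing_patches(sample).items()):
        print(pid, "(" + area + ")", "not found in listing")
```

Feed it the text of your own patch inventory and treat any hit as a prompt to check by hand, not as a verdict.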

Hardware Upgrade

Hardware defects result from a degradation of the FC GLM laser
signal and can lead to poor server performance, which is characterised
by exceptional load averages. The /opt/fcms/bin/show.fc utility will
also report an ever-increasing 'BAD Tx Char Count' indicator on the
defective FC interface. We have also observed that the defective FC
interface does not fail over in environments where two cards have
been installed for resilience.
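
One way to judge whether the 'BAD Tx Char Count' is really "ever increasing" is to note the value at intervals and compare the readings. The Python sketch below assumes you have already copied the numbers out by hand; it does not parse show.fc output, and the function name and the strict "every reading higher than the last" rule are choices made for this illustration.

```python
def is_ever_increasing(samples):
    """True if there are at least two readings and each one is higher than the one before."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )


if __name__ == "__main__":
    # Hypothetical readings noted down at regular intervals.
    readings = [120, 340, 910, 2250]
    if is_ever_increasing(readings):
        print("Counter rose at every interval; the interface is suspect.")
    else:
        print("Counter did not rise at every interval.")
```

A counter that stays flat between readings is not flagged by this rule; loosen the comparison if you want to catch slower growth.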

We strongly recommend that you contact your local HP supplier to
assess whether the Fibre Channel Gigabit Link Module needs to be
upgraded.

II. Firmware upgrade for HP Model 30 disk Arrays

Users of Model 30 disk arrays should ensure that the FLARE
(firmware) is upgraded to revision 9.46.07 or newer. This FLARE
revision resolves false POWERFAIL messages on Model 30 arrays. Please
contact your local HP supplier if you are unsure whether the FLARE
firmware version is compliant.
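
The "9.46.07 or newer" test can be sketched as a field-by-field numeric comparison. This Python example assumes the revision is written as dot-separated numbers, as in the text above; the function names are invented for the illustration and nothing here reads the revision from the array itself.

```python
# Minimum FLARE revision quoted above, as numeric fields.
MINIMUM_FLARE = (9, 46, 7)


def parse_revision(text):
    """Turn a string such as '9.46.07' into a tuple of integers, here (9, 46, 7)."""
    return tuple(int(part) for part in text.strip().split("."))


def flare_is_compliant(revision):
    """True if the given revision is equal to or newer than the minimum."""
    return parse_revision(revision) >= MINIMUM_FLARE


if __name__ == "__main__":
    # Hypothetical revision strings, for illustration only.
    for rev in ("9.45.12", "9.46.07", "9.47.01"):
        print(rev, "compliant" if flare_is_compliant(rev) else "needs upgrade")
```

Comparing the fields as integers avoids the trap of comparing "9.5" and "9.46" as plain strings.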

III. Hewlett-Packard recall of patch PHSS_17615:

Patch PHSS_17615 (s800 10.20 Fibre Channel Mass Storage Driver
Patch) has been recalled by HP. It has been replaced by PHSS_18326.


Patch dependencies for PHSS_18326 include:

PHKL_13015 PHCO_11490 PHKL_17547 PHKL_19615


Warning Description:

This patch has been removed by the HP-UX Patch Administrator.

PHSS_17469 introduces a problem that manifests itself as a Fibre
Channel loop hang for between 35 and 65 minutes. The hang can occur
when a Fibre Channel hub is power cycled or a cable is disconnected
for a few seconds.

When the hang occurs, using fcmutil(1M) on any of the adapters in
the loop will show the driver state as 'OFFLINE' or 'RESETTING',
often cycling between the two states. The hang can be cleared by
disabling all but one of the adapters in the loop. The disabled
cards can then be enabled to restart I/O.
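
If you record the driver state reported for an adapter over a few minutes, the pattern described above can be checked mechanically. The Python sketch below works on a list of state names you have noted yourself; it does not run or parse fcmutil, and the function name and the "nothing but OFFLINE or RESETTING" rule are assumptions made for this illustration.

```python
# The two states named in the warning text above.
HANG_STATES = {"OFFLINE", "RESETTING"}


def looks_hung(states):
    """True if at least one state was recorded and all of them are OFFLINE or RESETTING."""
    observed = [state.strip().upper() for state in states]
    return bool(observed) and all(state in HANG_STATES for state in observed)


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    noted = ["OFFLINE", "RESETTING", "OFFLINE", "RESETTING"]
    print("possible loop hang" if looks_hung(noted) else "adapter left the hang states")
```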

The problem also exists in superseding patch PHSS_17615.

PHSS_17615 also introduces a problem accessing DLT tape devices
connected via a FC SCSI MUX. The DLT tape devices will experience
I/O errors, due to a shortened I/O timer that does not allow the I/O
to complete.

Both problems have been corrected in patch PHSS_18326. HP recommends
that PHSS_17469 and PHSS_17615 be removed from all systems on which
they have been installed. PHSS_18326 should be installed after the
patches have been removed. PHSS_17469 and PHSS_17615 should also be
removed from all depots that contain them.

For system maintenance reasons, HP recommends that PHSS_17469 and
PHSS_17615 be removed before PHSS_18326 is installed.
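
The ordering above (remove the recalled patches first, then install the replacement) can be written down as a small planning helper. This Python sketch only builds a list of steps from a set of installed patch IDs that you supply; it does not remove or install anything. Listing the dependencies ahead of PHSS_18326 is an assumption of this example, and the function name is made up.

```python
# Patch IDs taken from the recall notice above.
RECALLED = ["PHSS_17469", "PHSS_17615"]
REPLACEMENT = "PHSS_18326"
DEPENDENCIES = ["PHKL_13015", "PHCO_11490", "PHKL_17547", "PHKL_19615"]


def remediation_steps(installed):
    """Return (action, patch) pairs: removals first, then any missing installs."""
    installed = set(installed)
    steps = [("remove", pid) for pid in RECALLED if pid in installed]
    # Assumption of this sketch: dependencies are listed before the replacement.
    steps += [("install", pid) for pid in DEPENDENCIES if pid not in installed]
    if REPLACEMENT not in installed:
        steps.append(("install", REPLACEMENT))
    return steps


if __name__ == "__main__":
    # Hypothetical set of installed patches, for illustration only.
    for action, pid in remediation_steps({"PHSS_17615", "PHKL_17547"}):
        print(action, pid)
```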

The views expressed in this e-mail are my personal views and not
necessarily the views of DHL Systems Limited or any affiliated
organisation or partners.

"REYES, Bienvenido F." wrote:

> Here are some of the activities done by HP Philippines, which unfortunately
> have not solved our problem:
>
> DATE      ACTIVITY
>
> 06/20/00  - rebooted system without changing anything; POWERFAILED error
>             seen during boot-up process
>           - performed mstm self test on HP array 000000136F93, which is
>             connected to the DB server via FC cable; test hung
>           - reseated all boards inside the FC (Fiber Channel) SCSI Mux
>             (A3511AZ); all self tests passed on this equipment via
>             offline diagnostics in the control panel of this equipment
>           - swapped J2389-69008 (HSC FC adapter card) in the DB server
>           - performed "pvchange -t 300 /dev/dsk/cxtxdx" on all affected
>             disks as seen by Response Center in Kmine
>           - moved the FC cable running from DB server to FC SCSI Mux to
>             another port (port #9); when the server was rebooted, the
>             POWERFAILED error was eliminated during boot-up; upon
>             clustering & database activation, POWERFAILED error was
>             seen again; escalated the call to Singapore Escalation
>             Center
>
> 06/22/00  - installed patch PHCO_21309 in all servers (DB, APP & HMI -
>             K380 servers)
>           - performed firmware upgrade of HP disk array (AUTORAID) from
>             firmware version HP32 to HP60 as recommended by Singapore
>             Escalation team for both AUTORAID s/n 000000136F93 &
>             000000137BCA
>           - Response Center (Philippines) asked client to do I/O
>             monitoring & submit data for analysis
>           - ran "nickel" script in all servers before & after firmware
>             upgrade as recommended by Singapore Escalation team to
>             check all patches installed, all boards installed, all
>             volume group configurations, etc. for in-depth analysis
>
> 06/26/00  - installed the following patches as recommended by Singapore
>             Escalation team based on all the "nickel" script output:
>
>             PHKL_16751, PHKL_16959, PHKL_17858, PHKL_20529,
>             PHKL_20611, PHKL_21085, PHKL_21595, PHKL_21661,
>             PHNE_20834, PHNE_10607, PHSS_20418, PHCO_16591,
>             PHCO_18563 & PHCO_21186
>
> --
> ---> Please post QUESTIONS and SUMMARIES only!! <---
> To subscribe/unsubscribe to this list, contact majo...@dutchworks.nl
> Name: hpux-...@dutchworks.nl Owner: owner-hp...@dutchworks.nl
>
> Archives: ftp.dutchworks.nl:/pub/digests/hpux-admin (FTP, browse only)
> http://www.dutchworks.nl/htbin/hpsysadmin (Web, browse & search)
