
Warning: patchadd 137138-09 can leave system unbootable


Horst Scheuermann

Nov 26, 2008, 4:27:56 AM

System: X4500 Thumper, 5.10 Generic_137112-08, mirrored root, swap, and var.

patchadd 137138-09 in single-user mode hung the system during
postpatch, in the patch's own bootadm program.

Executing postpatch script...
Creating GRUB menu in /
- searching for UFS boot signatures
- no existing UFS boot signatures

Call 38210519 was opened.

The system was not interruptible and had to be powered down.
Every subsequent boot ended in a system panic.

I had to reinstall the machine. I don't think my system was
unique, so this could happen to other Sun customers too.

I don't know why they did not issue an alert.

Martha Starkey

Nov 30, 2008, 10:36:24 AM

There are a few Sun Alerts posted about Update 6
kernel patches and booting problems:

246206: Solaris 10 Kernel Patches 137137-09/137138-09
May Cause Boot Failure For An MPxIO Enabled ...

246207: A Lack of Root Filesystem Space When
Installing Solaris 10 Kernel Patch 137137-09/137138-09

Horst Scheuermann

Dec 1, 2008, 5:30:09 AM
Martha Starkey <martha....@sun.com> writes:

> there are a few Sun Alerts posted about update 6
> kernel patches and booting problems:

> 246206: Solaris 10 Kernel Patches 137137-09/137138-09
> May Cause Boot Failure For An MPxIO Enabled ... |

The symptoms were different from those in 246206; MPxIO was not enabled.

> 246207: A Lack of Root Filesystem Space When
> Installing Solaris 10 Kernel Patch 137137-09/137138-09

There was plenty of room in the root filesystem.


--
11th Commandment: When you hear a bicycle bell, turn around, open your
mouth, nose, and eyes wide, but on no account step aside.

Martha Starkey

Dec 1, 2008, 9:09:55 AM

There are more Sun Alerts for the 137137-09 issues:

Document Audience: PUBLIC
Document ID: 244606
Title: Solaris 10 SPARC Kernel patch 137111-01 through 137111-08
Enforces Mutex Alignment Rules and May Cause Some Applications to Fail

Document Audience: PUBLIC
Document ID: 245626
Title: ZFS Pool Corruption May Occur With Sun Cluster 3.2 Running
Solaris 10 with patch 137137-09 or 137138-09

It doesn't look like either of those matches this problem.

I looked at your ticket in the system and it's the only one I could find
with this precise error and patch 137138-09. If this comes up again
with you or another customer, we'd obviously have to investigate further
prior to any OS reload.

But at least it's now recorded here on Usenet.

ji...@specsol.spam.sux.com

Dec 6, 2008, 2:15:00 AM

137138-09 has more problems than that.

According to smpatch on Solaris 10 x86 U5, the following patches are
needed and all require 137138-09:

139573-01
139499-01
139484-01
139552-01
139580-01

But it fails to install with a weird error about not being able to mount
user's home directories to a path under /var/run.


--
Jim Pennino

Remove .spam.sux to reply.

user60...@spamcorptastic.com

Dec 16, 2008, 9:53:00 AM
On 26 Nov., 10:27, h...@use-reply-to.invalid (Horst Scheuermann)
wrote:

Hi Horst,
I ran into exactly the same condition as you did, and I got out of
it with some sweat and swearing. ;)

For everyone else who reads this while trying to find out what
happened to their system, here is how I did it (it may not
work for you, but it's a chance):

My system (a Sun Fire X4100, so it's real Sun hardware; no chance
for Sun to say "not supported" ;) ) hung just as Horst described. My
root filesystem is about 52 GB and I don't use MPxIO.

* I switched to the ILOM and reset the hung system (power off/on
would work as well).
* Every subsequent boot into the standard Solaris system crashed.
* So I started a "Solaris failsafe" session from GRUB. The system
told me that no installed Solaris was found, because my root disk is
mirrored with SVM (SDS) metadevices, which failsafe cannot handle. So I
mounted the primary mirror of my boot device (c0t2d0s0) on /mnt:
# mount -F ufs /dev/dsk/c0t2d0s0 /mnt
# bootadm update-archive -R /mnt
# sync
# umount /mnt
# reboot
Next boot (not failsafe!); the boot screen still showed the old
kernel 127128-11.
msg:
files in / differ from the boot archive: ....
action:

# svcadm clear system/boot-archive
State: system crash while booting up


Next boot (137138-09) !!!! (new Kernel found!)
msg:
....

WARNING: The following files in / differ from the boot archive: ....

(many more than before!!!!)
# svcadm clear system/boot-archive

State: error due to filesystem corruption.
action: fsck (3 times, until no more errors were displayed)

# fsck -y -F ufs /dev/rdsk/c0t2d0s0
After that (a bit of screen logging here):
# mount
/ on /pci@0,0/pci1022,7450@2/pci1000,3060@3/sd@2,0:a
# svcadm clear system/boot-archive
svcadm: Instance "svc:/system/boot-archive:default" is not in a maintenance or degraded state.

# svcs -xv svc:/system/filesystem/usr:default

(read/write root file systems mounts)
State: maintenance since Tue Dec 16 12:18:52 2008
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
See: http://sun.com/msg/SMF-8000-KS
See: /etc/svc/volatile/system-filesystem-usr:default.log
Impact: 68 dependent services are not running:
.....(68 services are listed here)
## This was caused by the earlier root filesystem corruption, so let's
## clear the maintenance state now:
# svcadm clear system/filesystem/usr
Configuring devices.

Loading smf(5) service descriptions: 4/4
No pending job.
Reading ZFS config: done.

SYSTEMNAME console login: root

Password: *************
# showrev -p | grep 137138
Patch: 137138-09 Obsoletes: 118997-10 ....


=> Now another test-reboot -> successful, no crash

So my system is up to date and running again.

What seemed strange was the root mirror disk. After a minute it
reported 77% resynced (52 GB!), and a few seconds later the resync was
finished. I am not aware of SVM mirrors doing an incremental resync after a
crash, so I broke the mirror and started a normal sync by attaching
the mirror device again (that took more than an hour!).
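
For anyone who wants to force the same full resync, here is a minimal sketch of the detach and re-attach. The metadevice names d0 (the root mirror) and d20 (the second submirror) are hypothetical; look up the real ones with metastat first. By default the script only prints what it would run; set DRY_RUN=0 to execute.

```shell
#!/bin/sh
# Sketch: detach and re-attach one submirror to force a full SVM resync.
# MIRROR and SUBMIRROR are hypothetical names; check metastat for yours.
MIRROR=${MIRROR:-d0}
SUBMIRROR=${SUBMIRROR:-d20}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

resync_submirror() {
    run metadetach "$MIRROR" "$SUBMIRROR" &&   # break the mirror
    run metattach "$MIRROR" "$SUBMIRROR" &&    # re-attach, which starts a resync
    run metastat "$MIRROR"                     # show the resync progress
}

resync_submirror
```

If the detach is refused, do not force it blindly; check the state of both submirrors with metastat before going further.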

Happy repairing and:
Sun-Support, replace this buggy patch ASAP!

....

Jeffery Small

Dec 16, 2008, 2:39:06 PM

user60...@spamcorptastic.com writes:

>Hi Horst,
>as I went exactly into the same condition as you did - I got out of
>that with some sweat and swearing. ;)

>[...]


I also experienced the same problem on a SPARC system with the corresponding
patch, 137137-09. The patch appears to install properly, but then
the system fails to boot. I had a mirrored root filesystem on c0t0d0s0 and
c0t1d0s0, and these were the steps I took to repair things. These are
the instructions for a SPARC system with OBP, not GRUB.

1: Upon reboot, when the system comes up with the error messages, enter
maintenance mode using the root password and then immediately halt the
system:

telinit 0

2: Reboot in failsafe mode and log in with the root password:

boot -F failsafe

3: You should get instructions telling you what to do. If you have a mirrored
system, they instruct you to fix the "primary" side of the mirror and
reboot, presumably with md resyncing the other disk to match. However,
this did not work for me, so I had to repeat the process and perform the
operation on both disks [substitute your own disks here]:

mount /dev/dsk/c0t0d0s0 /mnt

bootadm update-archive -R /mnt

umount /mnt

mount /dev/dsk/c0t1d0s0 /mnt

bootadm update-archive -R /mnt

umount /mnt

halt

4: Reboot the system. Since I had installed a bunch of patches along with
this one, I decided to do a "boot -r" just to be on the safe side. However,
I doubt that this is necessary. Surprisingly, the system came up clean
and the mirrors didn't even need to resync! I rebooted a couple of times
after that and everything seems to be OK.
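
The mount/update/umount sequence above can be written as a small loop, sketched here. The slice names are the ones from my system, MNT is just the usual /mnt, and the DRY_RUN switch and function names are made up for illustration. By default it only prints the commands; set DRY_RUN=0 in the failsafe shell to run them.

```shell
#!/bin/sh
# Sketch: rebuild the boot archive on each half of a mirrored root.
# DISKS holds the slices from the post; substitute your own.
DISKS=${DISKS:-"c0t0d0s0 c0t1d0s0"}
MNT=${MNT:-/mnt}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = really run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

update_archives() {
    for slice in $DISKS; do
        run mount "/dev/dsk/$slice" "$MNT" || return 1
        run bootadm update-archive -R "$MNT"
        rc=$?
        run umount "$MNT"            # always unmount, even if bootadm failed
        [ "$rc" -eq 0 ] || return "$rc"
    done
}

update_archives
```

The loop stops at the first slice that fails to mount or update, so a problem on one disk is not hidden by a success on the other.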

I agree that Sun should have immediately pulled these patches after the first
report. What are they waiting for?

Regards,
--
Jeffery Small

syn_ack

Dec 17, 2008, 3:30:11 AM

> I agree that Sun should have immediately pulled these patches after the first
> report. What are they waiting for?
>
> Regards,
> --
> Jeffery Small

Jeffery, it seems you did not have to resync because you wrote the
new boot block on both disks that make up your boot metadevices ;)

Looking at SunSolve again, I found that the problem described in
http://sunsolve.sun.com/search/document.do?assetkey=1-1-6772822-1
seems to affect NOT ONLY MPxIO and FC disks! So maybe generally
installing 125556-01 (for x86!) or 125555-01 (for SPARC) before
installing this 13713[78]-09 patch would avoid the situation we ran
into.
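
A pre-check along those lines could look like the sketch below. It assumes the "Patch: NNNNNN-RR Obsoletes: ..." line format that showrev -p prints, and has_patch is a made-up helper name. It only looks at directly installed patches, so a prerequisite that was obsoleted by a later patch would not be detected.

```shell
#!/bin/sh
# Sketch: look for a prerequisite patch in `showrev -p` style output.
# has_patch ID MINREV reads the listing on stdin and succeeds if a line
# "Patch: ID-RR ..." with RR >= MINREV is present.
has_patch() {
    want_id=$1
    min_rev=$2
    while read tag patch rest; do
        [ "$tag" = "Patch:" ] || continue
        id=`echo "$patch" | cut -d- -f1`
        rev=`echo "$patch" | cut -d- -f2`
        if [ "$id" = "$want_id" ] && [ "$rev" -ge "$min_rev" ]; then
            return 0
        fi
    done
    return 1
}

# Illustrative input only, not real output from any system.
if printf 'Patch: 125556-01 Obsoletes:\n' | has_patch 125556 1; then
    echo "prerequisite present"
else
    echo "install 125556-01 first"
fi
```

On a live x86 system the check would be fed from the real listing, for example: showrev -p | has_patch 125556 1 (use 125555 on SPARC).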

Regards,
Burkard
