
Live Upgrade Solaris 10 update 9 assertion failure


Doug

Oct 14, 2010, 12:13:52 PM
I have a problem similar to what is described here
http://groups.google.com/group/comp.unix.solaris/msg/16adab950f1cf19a
The other commenters in that thread thought the problem was due to
SAM-QFS, but I disagree.

Here is what I did: I installed Solaris 10 update 8 from DVD onto an
x86 system with a pair of 250GB SATA disks. After installing the root
filesystem onto a UFS disk slice, I used Solaris Volume Manager to
create a mirror metadevice. I also created a mirrored zpool on
another pair of disk slices.
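
(For reference, a rough sketch of that kind of setup; the slice names
and metadevice numbers below are placeholders, not the exact ones I
used:)

# SVM state database replicas, then a mirrored metadevice for root
metadb -a -f -c 2 c0t0d0s7 c0t1d0s7
metainit -f d11 1 1 c0t0d0s0
metainit d12 1 1 c0t1d0s0
metainit d10 -m d11
metaroot d10        # updates /etc/vfstab and /etc/system for the SVM root
# after rebooting onto d10, attach the second submirror:
metattach d10 d12

# mirrored zpool on another pair of slices
zpool create tank mirror c0t0d0s5 c0t1d0s5
zfs create tank/scratch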

I used lucreate to make a copy of the s10u8 boot environment onto
another disk slice without problems. I was able to run luactivate and
boot into it without problems.
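
(Roughly like this, with placeholder names for the BE and the target
metadevice:)

# copy the running BE onto a spare UFS metadevice, then switch to it
lucreate -n s10u8a -m /:/dev/md/dsk/d30:ufs
luactivate s10u8a
init 6              # luactivate requires init/shutdown, not reboot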

Now, I want to use Live Upgrade to upgrade to Solaris 10 update 9.

I replaced the SUNWlu packages by running the
Solaris_10/Tools/Installers/liveupgrade20 script on the s10u9 DVD.
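
(Something like this, assuming the DVD is mounted under /cdrom/cdrom0;
the -noconsole -nodisplay options just suppress the GUI installer:)

# install the Live Upgrade packages from the target (s10u9) media
cd /cdrom/cdrom0/Solaris_10/Tools/Installers
./liveupgrade20 -noconsole -nodisplay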

That script says I am supposed to install some patches to make Live
Upgrade work, but I can't find a clear list of which patches are
needed. The script says to find infodoc 206844 on SunSolve, but that
doesn't seem to exist anymore. Is it 1004881.1? But that was last
updated 2009-11-12 (before s10u9 was released?)

I installed these patches:

119255-76 SunOS 5.10_x86: Install and Patch Utilities Patch
119253-33 SunOS 5.10_x86: System Administration Applications Patch
119535-19 SunOS 5.10_x86: Flash Archive Patch
120200-16 SunOS 5.10_x86: sysidtool Patch
121431-54 SunOS 5.8_x86 5.9_x86 5.10_x86: Live Upgrade Patch
124629-15 SunOS 5.10_x86: CD-ROM Install Boot Image Patch
124631-44 SunOS 5.10_x86: System Administration Applications, Network, and C
140915-02 SunOS 5.10_x86: cpio patch
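
(For completeness, one way to stage and apply a list like that,
assuming the patches were downloaded as zip files into /var/tmp:)

cd /var/tmp
for p in 119255-76 119253-33 119535-19 120200-16 \
         121431-54 124629-15 124631-44 140915-02
do
    unzip -q ${p}.zip          # unpack the downloaded patch
    patchadd /var/tmp/${p}     # apply the unpacked patch directory
done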

Now, when I try to run lucreate, it just hangs; here is an excerpt of
"ptree root":

11387 lucreate -n s10u8c -m /:/dev/md/dsk/d40:ufs
11388 /etc/lib/lu/plugins/lupi_zones plugin
11389 /etc/lib/lu/plugins/lupi_svmio plugin
11391 /etc/lib/lu/plugins/lupi_bebasic plugin
11400 /sbin/sh /usr/lib/lu/lucreate -n s10u8c -m /:/dev/md/dsk/d40:ufs
12012 /sbin/sh /usr/lib/lu/lumake -b 11400 -c -s s10u8a -n s10u8c -i /etc/lu/INODE.3
12383 /sbin/sh /usr/lib/lu/lupop -i /etc/lu/ICF.3 -p s10u8a -s /tmp/.liveupgrade.4816
12541 /sbin/sh /usr/lib/lu/lucopy -i /etc/lu/ICF.3 -c s10u8a -p /etc/lu/ICF.1 -z /tmp
12909 /sbin/sh /usr/lib/lu/lucopy -i /etc/lu/ICF.3 -c s10u8a -p /etc/lu/ICF.1 -z /tmp
12912 /bin/awk -F: { if ($2 == "/") { printf("%s %s\n", $3, $4); }
12913 /sbin/sh /usr/lib/lu/lumk_iconf s10u8b
12927 /usr/lib/lu/lumount -f -Z s10u8b
12929 /etc/lib/lu/plugins/lupi_svmio plugin
12931 /etc/lib/lu/plugins/lupi_zones plugin

I interrupt it with Ctrl-C. When I then try ludelete, it fails with:
Assertion failed: ("attempt to free unallocated memory", *ptrKey ==
(unsigned long long)_lu_malloc), file lu_mem.c, line 365

Even "init 6" fails with same message

But, if I "zpool export" and have no zfs filesystem, then lucreate,
etc
commands work. So, I think the zpool/zfs is the cause of this
problem.
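
(In other words, something like this works, while the same lucreate
with tank imported hangs:)

zpool export tank
lucreate -n s10u8c -m /:/dev/md/dsk/d40:ufs
zpool import tank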

If I put back the three SUNWlu packages from the s10u8 DVD, then I am
able to lucreate without problems (with a mounted zfs filesystem). I
then was able to luupgrade to s10u9 (using the s10u8 live upgrade
programs).
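
(Roughly this sequence, with the DVD mount points as placeholders:)

# revert to the s10u8 Live Upgrade packages
pkgrm SUNWlucfg SUNWluu SUNWlur
pkgadd -d /cdrom/s10u8_dvd/Solaris_10/Product SUNWlucfg SUNWlur SUNWluu

# create a new BE and upgrade it from the s10u9 media
lucreate -n s10u9 -m /:/dev/md/dsk/d40:ufs
luupgrade -u -n s10u9 -s /cdrom/s10u9_dvd
luactivate s10u9
init 6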

The s10u9 environment booted, but there were problems:
$ svcs -xv
svc:/milestone/multi-user:default (multi-user milestone)
State: offline since Thu Oct 14 12:03:42 2010
Reason: Start method is running.
See: http://sun.com/msg/SMF-8000-C4
See: man -M /usr/share/man -s 1M init
See: /var/svc/log/milestone-multi-user:default.log
Impact: 7 dependent services are not running:
svc:/system/boot-config:default
svc:/milestone/multi-user-server:default
svc:/application/autoreg:default
svc:/system/basicreg:default
svc:/system/zones:default
svc:/application/stosreg:default
svc:/application/cde-printinfo:default

$ tail -4 /var/svc/log/milestone-multi-user:default.log

[ Oct 14 12:03:42 Executing start method ("/sbin/rc2 start") ]
Executing legacy init script "/etc/rc2.d/S10lu".
last activated environment: <s10u8a>.
Assertion failed: ("attempt to free unallocated memory", *ptrKey ==
(unsigned long long)_lu_malloc), file lu_mem.c, line 365

I'll submit a support request to Oracle, but any advice in the
meantime?

Thanks

Rainer Orth

Oct 14, 2010, 12:30:07 PM
Doug <dy2...@gmail.com> writes:

> $ tail -4 /var/svc/log/milestone-multi-user:default.log
>
> [ Oct 14 12:03:42 Executing start method ("/sbin/rc2 start") ]
> Executing legacy init script "/etc/rc2.d/S10lu".
> last activated environment: <s10u8a>.
> Assertion failed: ("attempt to free unallocated memory", *ptrKey ==
> (unsigned long long)_lu_malloc), file lu_mem.c, line 365
>
> I'll submit a support request to Oracle, but any advice in the
> meantime?

I had exactly the same problem with a Live Upgrade from S10 U7 to S10
U9. I'm not completely sure whether the mere existence of the zpool is
the problem or its name. I suspected that the - in the pool name was
the problem, but it could well be that any pool would cause the same
problem. We had a pool called j4500-01, and I worked around the
problem by replacing /sbin/zfs with a wrapper script which simply
filters that pool name out of the zfs list output:

#!/bin/sh
# Wrapper for the real zfs binary (moved aside to /sbin/zfs.bin) that
# filters the j4500-01 pool out of its output.

base=`basename $0`
case $base in
zfs)
	# invoked directly as "zfs"
	/sbin/zfs.bin "$@" > /tmp/zfs.out.$$
	;;
*mount)
	# invoked via the /etc/fs/zfs/{mount,umount} links
	/sbin/zfs.bin $base "$@" > /tmp/zfs.out.$$
	;;
esac
status=$?
grep -v j4500-01 /tmp/zfs.out.$$
rm -f /tmp/zfs.out.$$
exit $status

It's a bit convoluted because the script is sometimes called directly
and sometimes via a symlink from /etc/fs/zfs/*mount.
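
To be clear, the wrapper assumes the real binary has been moved aside,
i.e. something like this (the wrapper file name is arbitrary):

# keep the real binary, then install the wrapper in its place
mv /sbin/zfs /sbin/zfs.bin
cp /var/tmp/zfs-wrapper.sh /sbin/zfs
chmod 755 /sbin/zfs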

I meant to open a case with Oracle, but haven't gotten around to it yet.

Hope this helps.

Rainer

--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University

Doug

Oct 14, 2010, 12:34:26 PM
On Oct 14, 12:30 pm, Rainer Orth <r...@CeBiTec.Uni-Bielefeld.DE>
wrote:

Thanks for the info. But, I named my zpool "tank" and used "zfs
create tank/scratch" for two zfs filesystems in my example above, so
no hyphen in a zfs/zpool name for me.

Rainer Orth

Oct 14, 2010, 12:44:40 PM
Doug <dy2...@gmail.com> writes:

> Thanks for the info. But, I named my zpool "tank" and used "zfs
> create tank/scratch" for two zfs filesystems in my example above, so
> no hyphen in a zfs/zpool name for me.

You could still try the script to filter that pool name out of the zfs
list output. If this helps, we know that the pool name is irrelevant
(which would speak volumes about the quality of LU testing at Oracle ;-(

Ian Collins

Oct 14, 2010, 2:51:21 PM
On 10/15/10 05:13 AM, Doug wrote:
>
> $ tail -4 /var/svc/log/milestone-multi-user:default.log
>
> [ Oct 14 12:03:42 Executing start method ("/sbin/rc2 start") ]
> Executing legacy init script "/etc/rc2.d/S10lu".
> last activated environment:<s10u8a>.
> Assertion failed: ("attempt to free unallocated memory", *ptrKey ==
> (unsigned long long)_lu_malloc), file lu_mem.c, line 365
>
> I'll submit a support request to Oracle, but any advice in the
> meantime?

Have you tried migrating to ZFS root first?
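
Something along these lines (pool, slice, and BE names are just
examples):

zpool create rpool c0t0d0s4
lucreate -n s10u8-zfs -p rpool
luactivate s10u8-zfs
init 6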

--
Ian Collins

Rainer Orth

Oct 14, 2010, 2:58:23 PM
Ian Collins <ian-...@hotmail.com> writes:

I've seen this exact issue *with* ZFS root, so this won't help.

Doug

Oct 14, 2010, 5:17:13 PM
On Oct 14, 12:44 pm, Rainer Orth <r...@CeBiTec.Uni-Bielefeld.DE>
wrote:

Did you have more than one zpool on your system? Or did your
/sbin/zfs wrapper script always output an empty list?

Ian Collins

Oct 14, 2010, 5:34:23 PM
On 10/15/10 05:30 AM, Rainer Orth wrote:
> Doug<dy2...@gmail.com> writes:
>
>> $ tail -4 /var/svc/log/milestone-multi-user:default.log
>>
>> [ Oct 14 12:03:42 Executing start method ("/sbin/rc2 start") ]
>> Executing legacy init script "/etc/rc2.d/S10lu".
>> last activated environment:<s10u8a>.
>> Assertion failed: ("attempt to free unallocated memory", *ptrKey ==
>> (unsigned long long)_lu_malloc), file lu_mem.c, line 365
>>
>> I'll submit a support request to Oracle, but any advice in the
>> meantime?
>
> I had exactly the same problem with a Live Upgrade from S10 U7 to S10
> U9. I'm not completely sure whether the mere existence of the zpool is
> the problem or its name. I suspected that the - in the pool name was
> the problem, but it could well be that any pool would cause the same
> problem. We had a pool called j4500-01, and I worked around the
> problem by replacing /sbin/zfs with a wrapper script which simply
> filters that pool name out of the zfs list output:

I've upgraded systems with more than one pool, but never one with a
hyphen in the name. So if there is a problem with pool names, it's with
the hyphen.

--
Ian Collins

Ian Collins

Oct 14, 2010, 5:34:56 PM
On 10/15/10 10:17 AM, Doug wrote:
> On Oct 14, 12:44 pm, Rainer Orth<r...@CeBiTec.Uni-Bielefeld.DE>
> wrote:
>> Doug<dy2...@gmail.com> writes:
>>> Thanks for the info. But, I named my zpool "tank" and used "zfs
>>> create tank/scratch" for two zfs filesystems in my example above, so
>>> no hyphen in a zfs/zpool name for me.
>>
>> You could still try the script to filter that pool name from zfs list
>> output. If this helps, we know that the pool name is irrelevant (which
>> would speak volumes for the quality of LU testing at Oracle ;-(
>
> Did you have more than one zpool on your system? Or did your
> /sbin/zfs wrapper script always output an empty list?

Have you tried exporting the pool first?

--
Ian Collins

Rainer Orth

Oct 15, 2010, 5:00:58 AM
Doug <dy2...@gmail.com> writes:

> Did you have more than one zpool on your system?

Sure, the rpool and the data (j4500-01) pool.

> Or did your /sbin/zfs wrapper script always output an empty list?

No, it still emitted the rpool info, otherwise lu wouldn't have worked.

Doug

Oct 15, 2010, 9:04:29 AM
On Oct 14, 2:58 pm, Rainer Orth <r...@CeBiTec.Uni-Bielefeld.DE> wrote:

OK, thanks again for all the good advice.

The root filesystem is on UFS. I also use zpool/zfs for non-OS
files. Some systems use Solaris zones/containers. The zone roots are
sometimes on zfs.

I could "zpool export ..." prior to live upgrade and then import the
zpools again once I have booted into s10u9. But then the zones would
not work. (I guess I could do a "zone upgrade on attach" and hope
that upgrades the zones.) But I would rather use Live Upgrade to do
what it is supposed to do.
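
(That detach/attach fallback would look roughly like this, with a
made-up zone name:)

# before the upgrade
zoneadm -z myzone detach
zpool export tank

# after booting the upgraded BE
zpool import tank
zoneadm -z myzone attach -u    # update the zone to match the new BE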

I think LU has become more fragile and buggy as more Solaris updates
have been released (the opposite of what should be happening as
Solaris 10 "matures"). I get that LU now has to deal with UFS, ZFS,
zones, etc. But I hate having to guess which patches I am supposed to
install before running LU, and the suspense of wondering whether it
will work or fail with a random shell error message or, as in this
case, a pointlessly cryptic message.

cindy

Oct 15, 2010, 11:52:43 AM

Hi Doug,

The Superblock message is definitely coming from SAM-QFS, but this
message emitted by lu* and friends:

Assertion failed: ("attempt to free unallocated memory",

is CR 6990618, which was just recently filed.

It was difficult to reproduce, but we hope to have some information
soon.

I've tested hundreds of LUs from Solaris 10 10/08, 5/09, 10/09, and
9/10 on lab systems with low memory and existing ZFS storage pools.
Never saw this, and I've seen a lot.


Doug

Oct 15, 2010, 12:05:41 PM
Thanks for the info, Cindy.

To reproduce, I just started with an x86 S10U8 full+OEM install from
the DVD, with the root filesystem and swap on SVM mirrored slices and
a zfs filesystem on a zpool mirrored on slices. Then I installed the
patches I mentioned in my first message and ran the "liveupgrade20"
script from the S10U9 DVD to replace the three SUNWlu* packages.

Finally I ran lucreate to install a boot environment onto another
mirrored SVM UFS filesystem. The lucreate hangs (I let it run over an
hour--when I ran the lucreate from S10U8, without the live upgrade
patch 121431-54, it finished OK in 20 minutes). After aborting the
lucreate, ludelete and init 6 fail with "Assertion failed..."

Doug

Oct 17, 2010, 6:37:54 PM
I experimented more and observed the apparent cause of the problem to
be having the zpool/zfs mountpoint set to "none".

This zfs config consistently triggers the LU problems on s10u9:

$ zfs list -r
NAME           USED  AVAIL  REFER  MOUNTPOINT
tank           122K  19.2G    22K  none
tank/scratch    21K  19.2G    21K  /scratch

I set that up with these commands:
# zfs set mountpoint=none tank
# zfs set mountpoint=/scratch tank/scratch

That will cause lucreate, ludelete, init 6, etc. to fail, either by
hanging or with "Assertion failed" messages.

The workaround is to set a non-"none" mountpoint for the zfs dataset
that corresponds to the zpool (tank, in this case).
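
For example (/tank here is just an example path):
# zfs set mountpoint=/tank tank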

I bet Rainer's "j4500-01" zfs had mountpoint=none which triggered the
problem.

This LU bug is a regression since the LU that came with s10u8 did not
have a problem with zfs mountpoints.

stuart

Oct 17, 2010, 10:46:17 PM
On Oct 17, 3:37 pm, Doug <dy2...@gmail.com> wrote:
> I experimented more and observed the apparent cause of the problem to
> be having the zpool/zfs mountpoint set to "none".
>

Very nice sleuthing. Unfortunately, with zvols there is no option to
set a mountpoint to avoid this problem.
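
(Volumes simply don't carry the property; with a made-up zvol, zfs get
shows something like this:)

# zfs create -V 10G tank/vol1
# zfs get mountpoint tank/vol1
NAME       PROPERTY    VALUE  SOURCE
tank/vol1  mountpoint  -      -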
