After upgrading from OE 2008/03 to 2009/03 on an rx3600, the kernel
booted the first time after the upgrade was not the expected one (it
lacked the random module, among other things, causing some software to
fail at startup). When the correct kernel is booted, it crashes
early during device discovery (at the USB stuff) with a Data TLB Fault
when there is no kernel stack (my interpretation).
The actual words are:
[...]
btlan_load() Loaded Successfully
0 sba
0/0 lba
0/0/1/0 rmp3f01
0/0/1/1 rmp3f01
0/0/1/2 asio0
0/0/2/0 hcd
Bad News: pr == 0x144800c0b10c003d
Bad News: Cannot use the Kernel Stack when interrupted on the ICS.
Bad News: Predicate set: 0x144800c0b10c003d.
******************************************************************************
******************************************************************************
reg_dump(): Displaying register values (in hex) from the save state at
ssp e0000001_002ff800 return_status/reason/flags 0000/0008/00000001
Interruption type: Data TLB Fault
[...]
Most interestingly a kernel that runs fine on an rx6600 also crashes at
the very same spot on that machine. The machine has the latest firmware
installed. The guy at HP said it's not a hardware problem, and the
software guy said without a crashdump it's not possible to analyze the
problem. However I had sent HP the full console output, including the
messages, the register dump and the backtrace with program counters. I
could also send them the (big) kernel. Now if someone says one couldn't
analyze the problem with those data, that one has little knowledge on
the subject. After all, it's not my fault that the combination of HP
hardware and HP software crashes.
Also note that the kernel on the recovery DVD (March 2009) crashes at
the same spot.
The machine has roughly all the critical patches that HP recommended.
For your amusement, I'll display a full "boot & crash" cycle here (in
case you have never seen an Itanium 2 register dump):
Loading.: HP-UX Primary Boot: 0/4/1/0.0xcb1440d93fc4d71.0x0
Starting: HP-UX Primary Boot: 0/4/1/0.0xcb1440d93fc4d71.0x0
(C) Copyright 1999-2008 Hewlett-Packard Development Company, L.P.
All rights reserved
HP-UX Boot Loader for IPF -- Revision 2.037
Press Any Key to interrupt Autoboot
\EFI\HPUX\AUTO ==> boot vmunix
Seconds left till autoboot - 0
AUTOBOOTING...> System Memory = 24545 MB
loading section 0
...................................................................... (complete)
loading section 1
............. (complete)
loading symbol table
loading System Directory (boot.sys) to MFS
.....
loading MFSFILES directory (bootfs) to MFS
...................
Launching /stand/vmunix
SIZE: Text:35682K + Data:6355K + BSS:22140K = Total:64178K
Console is on Serial Device - via PCDP
Booting kernel...
Stored message buffer up to system crash:
MFS is defined: base= 0xe000000101bd6000 size= 27860 KB
Loaded ACPI revision 2.0 tables.
MMIO on this platform supports Write Coalescing.
MCA recovery subsystem disabled, not supported on this platform.
Using /stand/ext_ioconfig
Memory Class Setup
-------------------------------------------------------------------------
Class Physmem Lockmem Swapmem
-------------------------------------------------------------------------
System : 23346 MB 23346 MB 23346 MB
Kernel : 23346 MB 23346 MB 23346 MB
User : 22263 MB 19737 MB 19814 MB
-------------------------------------------------------------------------
ktracer is off until requested.
Installing Socket Protocol families AF_INET and AF_INET6
64000/0xfa00 esvroot
Kernel EVM initialized
64000/0x0 mass_storage
64000/0x0/0x0 usb_ms_scsi
sec_init(): kernel RPC authentication/security initialization.
secgss_init(): kernel RPCSEC_GSS security initialization.
rpc_init(): kernel RPC initialization.
rpcmod_install(): kernel RPC STREAMS module "rpcmod" installation. ...(driver_install)
NOTICE: nfs_client_pv3_install(): nfs3 File system was registered at index 10.
NOTICE: nfs_client_pv4_install(): nfs4 File system was registered at index 11.
NOTICE: cachefsc_install: cachefs File system was registered at index 13.
btlan_load() Loaded Successfully
0 sba
0/0 lba
0/0/1/0 rmp3f01
0/0/1/1 rmp3f01
0/0/1/2 asio0
0/0/2/0 hcd
Bad News: pr == 0x144800c0b10c003d
Bad News: Cannot use the Kernel Stack when interrupted on the ICS.
Bad News: Predicate set: 0x144800c0b10c003d.
******************************************************************************
******************************************************************************
reg_dump(): Displaying register values (in hex) from the save state at
ssp e0000001_002ff800 return_status/reason/flags 0000/0008/00000001
Interruption type: Data TLB Fault
0- 3 00000000_00000000 e0000001_0062f510 e0000001_00300b90 e0000001_002ee050
4- 7 00000000_00000000 00000000_00000000 00000000_00000000 00000000_00000000
8-11 00000000_00000000 e0000001_379443a8 e0000001_0b04f490 e0000001_0b04f4b0
12-15 e0000001_002ffba0 9fffffff_7f7e8000 e0000000_00ea4f40 e0000001_002ffbe1
16-19 e0000001_04a55c40 e0000001_04a55c48 e0000000_00ea4f40 e0000001_04a53000
20-23 e0000001_04a55000 e0000001_04a55c40 00000000_00000001 00000000_00000000
24-27 e0000001_0b04f480 e0000001_0b04f4a0 00000000_00000001 00000000_00000000
28-31 00000000_00000010 00000000_00000001 e0000000_f0000b90 00000000_000000c0
br_0-3 e0000000_00ea4fa0 00000000_3de51a20 00000000_00000000 00000000_00000000
br_4-7 00000000_00000000 00000000_00000000 e7ffffff_02fa4750 e0000000_0163cc60
             6 5   5                3    3   2   22 2   1          000  00
pr bits = ---0-8---4----------------7----2---8---43-1---7----------654--10
pr value = 14400021_11a20073
k0 00000000_00000000 rsc 00000000_00000013 fpsr 0009a04d_0278037f
k1 00000000_00000000 bsp e0000001_002e0370 unat ffffffff_ffffffff
k2 00000000_00000000 bspstore e0000001_002e0178 lc 00000000_00000000
k3 00000000_00000000 bsp_base 00000000_00000000
ppdp e0000001_00300000 dirty 00000000_000001f8 csd 00000000_00000000
ktp e0000001_3eff2000 rnat 00000000_00000000 ssd 00000000_00000000
ksp 9fffffff_7f7e8003 ccv 00000000_00000001
sv e0000001_2088e000 pfs 00000000_00000308 ec 00
iip e0000000_0163cca0/0 ifs 80000000_0000048e [i]psr 00001010_086ae01a
iipa e0000000_0163cc90 tpr 00000000_000000c0 isr 00000804_00000000
ibe sddiimic rtldsdpsddd p i mmaub
andrisdadtcspl tbpbiipphlt kic hlcpe
psr bits = ------------------01000000010000----10000110101-111-------01101-
e snirsn
deioirsparwx <vector>< code >
isr bits = --------------------100000000100--------000000000000000000000000
arg0=00000000_00000000 (cr.ifa)
arg1=00000000_00000000 (unknown)
Dirty Registers:
0178: 00000000_00000288 e0000000_01a36f80 00000000_00000000 00000000_00000000
0198: 00000000_0000000c 00000000_00000000 00000000_00000000 00000000_00000000
01b8: 00000000_00000000 00000000_00000000 e0000001_002ffc00 e0000001_002ffbf0
01d8: e0000001_002e01c8 9fffffff_7f7e8003 14400021_11a20033 e0000001_002ffc00
01f8: 00000000_00000000 00000000_000003ff e0000001_002ffbe0 00000000_00000010
0218: 00000000_00000288 e0000000_01dc1780 14400021_11a20033 e0000000_f0000000
0238: 00000000_00000010 00000000_00000000 e0000001_002ffc00 e0000001_002ffbe0
0258: 00000000_00000001 e0000000_f0000b90 e0000001_00627168 e0000001_00627174
0278: 00000000_00000003 00000000_00000001 e0000001_04c41080 e0000001_04a55000
0298: 00000000_00000000 e0000001_00627158 00000000_00000000 00000000_000000df
02b8: e0000000_f0000fd0 e0000001_0b04f490 00000000_000000c4 e0000001_379238d0
02d8: 00000000_00000d9f e0000000_00ea4060 14400021_11a20073 e0000001_37944380
02f8: 00000000_00000000
out28: e0000001_37930800 00000000_00000308 e0000000_00ea4fa0 14400021_11a20073
out2c: e0000001_002ffbb0 00000000_00050149 e0000001_00b6a840 e0000001_002ffbb4
out30: e0000001_002ffbb0 00000000_00000000 00000000_00000000 00000000_00000000
out34: 00000000_00000000 e0000001_00300000
Message buffer contents after system crash:
panic: Bad News!
Stack Trace:
IP Function Name
0xe000000001dbafb0 bad_news+0x3b0
0xe000000001dc1780 bubbledown+0x0
0xe00000000163cca0 hcd_isr+0x40
0xe000000000ea4fa0 sapic_interrupt+0x60
0xe000000000ea4060 external_interrupt+0x4b0
0xe000000001dc1780 bubbledown+0x0
0xe0000000012d6380 cec_cfg_in8+0xa0
0xe000000001a36f80 elroy_rd_cfg_b+0x60
0xe000000001d14f00 h2p_rd_cfg_b+0x60
0xe00000000140a800 ehcd_attach+0xe0
0xe0000000015378a0 fclp_ifc_attach+0x4c0
0xe00000000163e050 hcd_attach+0x710
0xe000000001737830 intl100_attach+0x90
0xe000000001815a60 ipmi_pci_attach+0x70
0xe000000001eaf120 rmp3f01_pci_attach+0x80
0xe000000001eec1e0 sac_attach+0xc0
0xe000000001f6c420 side_attach+0xa0
0xe000000001f708e0 side_multi_pci_attach+0xc0
0xe0000000021b6100 hp_attach+0x60
0xe00000000219c5b0 run_attach+0x50
0xe00000000219eac0 wsio_claim+0x400
0xe000000001d22500 pci_cdio_config+0x3c0
0xe000000001602710 gio_scan_subtree+0x1030
0xe0000000015fffe0 gio_scan_subtree_real+0x280
0xe000000001601840 gio_scan_subtree+0x160
0xe0000000015fffe0 gio_scan_subtree_real+0x280
0xe000000001601840 gio_scan_subtree+0x160
0xe0000000015fffe0 gio_scan_subtree_real+0x280
0xe000000001601840 gio_scan_subtree+0x160
End of Stack Trace
linkstamp: Mon Jul 13 17:39:20 METDST 2009
_release_version: @(#) $Revision: vmunix: B.11.31_LR FLAVOR=perf
Calling function e000000001819a60 for Shutdown State 1 type 0x2
i 0 pfn 0x1 pages 0x3dcb3
i 1 pfn 0x3e19a pages 0x38e
i 2 pfn 0x3fc00 pages 0x162
i 3 pfn 0x3fd64 pages 0x4
i 4 pfn 0x3fdca pages 0x6
i 5 pfn 0x100000 pages 0x500000
i 6 pfn 0x10040000 pages 0xbf1fe
i 7 pfn 0x100ff200 pages 0xe00
*** A system crash has occurred. (See the above messages for details.)
*** The system is now preparing to dump physical memory to disk, for use
*** in debugging the crash.
ERROR: Your system crashed before I/O and dump configuration was complete.
This system does not support a crash dump under these circumstances.
Contact your HP support representative for assistance.
Enjoy!
Regards,
Ulrich
The message said it couldn't take a crash dump and told you to contact the
Response Center. That's the question you need to put to Support.
Any reason you are asking questions here instead of the Response Center?
> Now if someone says one couldn't
> analyze the problem with those data, that one has little knowledge on
> the subject.
No, you have to have a lot of knowledge to even begin to make any
statements or wild assumptions about kernel triage.
When I get one of those, I do binary board swapping, since it is
something we changed in the compiler.
> Booting kernel...
> Stored message buffer up to system crash:
You may have to boot with -vs.
> Bad News: pr == 0x144800c0b10c003d
> Bad News: Cannot use the Kernel Stack when interrupted on the ICS.
> Bad News: Predicate set: 0x144800c0b10c003d.
> panic: Bad News!
> Stack Trace:
> 0xe000000001dbafb0 bad_news+0x3b0
> 0xe000000001dc1780 bubbledown+0x0
> 0xe00000000163cca0 hcd_isr+0x40
> 0xe000000000ea4fa0 sapic_interrupt+0x60
> 0xe000000000ea4060 external_interrupt+0x4b0
> 0xe000000001dc1780 bubbledown+0x0
> 0xe0000000012d6380 cec_cfg_in8+0xa0
There may be something they can get from the trace.
> ERROR: Your system crashed before I/O and dump configuration was complete.
> This system does not support a crash dump under these circumstances.
> Contact your HP support representative for assistance.
Here you really really need to get a dump. Or have a maintenance panel
to look at memory and registers.
> Ulrich Windl wrote:
>> the software guy said without a crashdump it's not possible to analyze the
>> problem.
>
> The message said it couldn't take a crash dump and contact the Response
> Center. That's the question you need to give Support.
>
> Any reason you are asking questions here instead of the Response Center?
Yes: Although it's not official here, the audience is partly more
competent than the similarly named center. Also I thought the group
could use a little traffic ;-) Finally, the HP patch database had no
matches for the problem, and neither did Google. Soon there will be at
least this match ;-)
>
>> Now if someone says one couldn't
>> analyze the problem with those data, that one has little knowledge on
>> the subject.
>
> No, you have to have a lot of knowledge to even begin to make any statements
> or wild assumptions about kernel triage.
Yes, but with closed-source software the only one that can do that is HP,
and if they are not willing to look at their buggy code, I'm getting a bit
impatient at least.
>
> When I get one of those, I do binary board swapping, since it is something we
> changed in the compiler.
>
>> Booting kernel...
>> Stored message buffer up to system crash:
>
> You may have to boot with -vs.
Never heard of that; is that "verbose"? "-vs" is not documented, only
"-vm" is. If that is single-user mode, that will not help anything,
because the OS crashes way before init is started. It even crashes way
before any disk device is discovered.
>
>> Bad News: pr == 0x144800c0b10c003d
>> Bad News: Cannot use the Kernel Stack when interrupted on the ICS.
>> Bad News: Predicate set: 0x144800c0b10c003d.
>> panic: Bad News!
>> Stack Trace:
>> 0xe000000001dbafb0 bad_news+0x3b0
>> 0xe000000001dc1780 bubbledown+0x0
>> 0xe00000000163cca0 hcd_isr+0x40
>> 0xe000000000ea4fa0 sapic_interrupt+0x60
>> 0xe000000000ea4060 external_interrupt+0x4b0
>> 0xe000000001dc1780 bubbledown+0x0
>> 0xe0000000012d6380 cec_cfg_in8+0xa0
>
> There may be something they can get from the trace.
>
>> ERROR: Your system crashed before I/O and dump configuration was complete.
>> This system does not support a crash dump under these circumstances.
>> Contact your HP support representative for assistance.
>
> Here you really really need to get a dump. Or have a maintenance panel to
> look at memory and registers.
The typical customer is unable to get a dump, and I don't have that
fancy machinery.
However if I have the kernel binary and that dump, I could write a
parser that puts the register values into the disassembly of the
kernel. And with the source at hand, one could easily find the line of
code that causes the problem.
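The first step of such a parser, reading the general-register rows of
reg_dump() into numbers, could look roughly like the following Python
sketch. The row format is taken from the dump posted above; the names ROW
and parse_general_registers are hypothetical, and nothing here touches the
disassembly itself.

```python
import re

# Matches a general-register row as printed by reg_dump(), e.g.
# "12-15 e0000001_002ffba0 9fffffff_7f7e8000 e0000000_00ea4f40 e0000001_002ffbe1"
# (format assumed from the console output in this thread).
ROW = re.compile(r"^\s*(\d+)-\s*(\d+)\s+((?:[0-9a-f]{8}_[0-9a-f]{8}\s*){4})$")


def parse_general_registers(dump_text):
    """Return {register_number: value} for the general-register rows found."""
    regs = {}
    for line in dump_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # branch registers, dirty registers, prose, ...
        first, last = int(m.group(1)), int(m.group(2))
        values = [int(v.replace("_", ""), 16) for v in m.group(3).split()]
        if last - first + 1 != len(values):
            continue  # range and value count disagree: not a register row
        for offset, value in enumerate(values):
            regs[first + offset] = value
    return regs


if __name__ == "__main__":
    sample = (
        " 0- 3 00000000_00000000 e0000001_0062f510 e0000001_00300b90 e0000001_002ee050\n"
        "12-15 e0000001_002ffba0 9fffffff_7f7e8000 e0000000_00ea4f40 e0000001_002ffbe1\n"
    )
    for number, value in sorted(parse_general_registers(sample).items()):
        print(f"r{number} = {value:#018x}")
```

Rows for the branch registers ("br_0-3 ...") and the "Dirty Registers"
section are deliberately skipped here; they would need patterns of their own.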
As a matter of fact, 24 hours after having reported this critical
problem and having supplied a lot of information to HP, I did not get
any better proposal than: Restore the previous state of the system,
upgrade the OS again, and see whether the problem happens again.
I might do that for MS Windows if I didn't have any support, but the case
is different here.
Regards,
Ulrich
During kernel build (kconfig -i) there is a message about an unresolved
dependency of usbd for mass_storage. This problem however does not
prevent the tool from installing the kernel for next boot.
I guess that the kernel has an unresolved reference in it which will
cause a NULL-pointer reference when the first USB mass storage device is
detected. Unfortunately pulling out the DVD drive did not help.
Probably during update-ux the problem of the bad kernel was not handled
properly, so we have this bad situation.
Regards,
Ulrich
> Hi there!
>
> After having upgraded from OE 2008/03 to 2009/03 on an rx3600, the
> kernel booted the first time after upgrade was not the expected one
> (it lacked the random module, among other things, causing some
> software to fail startup). When the correct kernel is booted, the
> kernel crashes early during device discovery (at the USB stuff) with
> a Data TLB Fault when there is no kernel stack (my interpretation).
The reason the system did not boot the new kernel after the update is
that something went wrong during the kernel build. The lack of several
expected modules is the key clue. Booting a misconfigured kernel is
almost always destined to end badly -- which you've experienced here.
The cause of the update failure can't be diagnosed from the
information you provided in your message, but the Response Center can
help you diagnose it from the update logs on the system.
>
> The actual words are:
> [...]
> btlan_load() Loaded Successfully
> 0 sba
> 0/0 lba
> 0/0/1/0 rmp3f01
> 0/0/1/1 rmp3f01
> 0/0/1/2 asio0
> 0/0/2/0 hcd
> Bad News: pr == 0x144800c0b10c003d
> Bad News: Cannot use the Kernel Stack when interrupted on the ICS.
> Bad News: Predicate set: 0x144800c0b10c003d.
This message doesn't mean there's no kernel stack, just that the fault
occurred on the ICS so the kernel couldn't switch back to the kernel
stack.
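A side note on the value in the "pr ==" and "Predicate set" lines quoted
above: pr holds the Itanium predicates, one bit per predicate p0..p63, so
the hex value is simply a 64-bit mask. A minimal Python sketch of decoding
such a mask follows. The function names are made up for illustration, and
the bit-map layout is an assumption inferred from the "pr bits" line of
the register dump in the original post.

```python
def set_bits(value, width=64):
    """Return the numbers of the bits that are set, lowest first."""
    return [bit for bit in range(width) if (value >> bit) & 1]


def bit_map(value, width=64):
    """One character per bit, most significant first: the last decimal
    digit of the bit number if the bit is set, '-' otherwise."""
    return "".join(
        str(bit % 10) if (value >> bit) & 1 else "-"
        for bit in reversed(range(width))
    )


if __name__ == "__main__":
    # Value from the "Bad News" lines.
    print(set_bits(0x144800C0B10C003D))
    # "pr value" from the register dump, written without the underscore.
    print(bit_map(0x1440002111A20073))
```

For the dump's "pr value" 14400021_11a20073 the low byte is 0x73, so bits
6, 5, 4, 1 and 0 are set, which agrees with the tail of the "pr bits" line
(654--10). Knowing which predicates are set says nothing yet about why the
fault happened; that still needs the code around the faulting IP.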
>
> ******************************************************************************
> ******************************************************************************
>
> reg_dump(): Displaying register values (in hex) from the save state
> at ssp e0000001_002ff800 return_status/reason/flags
> 0000/0008/00000001
>
> Interruption type: Data TLB Fault
> [...]
>
> Most interestingly a kernel that runs fine on an rx6600 also crashes
> at the very same spot on that machine. The machine has the latest
> firmware installed. The guy at HP said it's not a hardware problem,
> and the software guy said without a crashdump it's not possible to
> analyze the problem. However I had sent HP the full console output,
> including the messages, the register dump and the backtrace with
> program counters. I could also send them the (big) kernel. Now if
> someone says one couldn't analyze the problem with those data, that
> one has little knowledge on the subject. After all, it's not my
> fault that the combination of HP hardware and HP software crashes.
>
> Also note that the kernel on the recovery DVD (March 2009) crashes
> at the same spot.
Well that's more than a little troubling. I don't know enough about
creation and use of the recovery DVD to help you troubleshoot it but
opening a call with the Response Center will get your problem directed
to the right people.
>
> The machine has roughly all the critical patches that HP
> recommended. For your amusement, I'll display a full "boot & crash"
> cycle here (If you have never seen a Itanium2 register dump):
>
> [ deleted because I've seen more of these than I care to as part of
> my day job. :-) ]
>
> Enjoy!
> Regards,
> Ulrich
--
Carl E. Davidson (carl.d...@hp.com)
Hewlett-Packard Company, Cupertino, CA
You can't please all of the people any of the time.
Has it gotten to the Expert Center yet?
> Yes, but the only ones that can with closed source software is HP, and
> if they are not willing to look at their buggy code
As I said, they need to read the message at the bottom, the one saying it
can't create a dump and to contact themselves. ;-)
> Never heard of that; is that "verbose"? "-vs" is not documented, only
> "-vm" is.
I think it's just -v. It provides progress messages in the kernel.
>>> ERROR: Your system crashed before I/O and dump configuration was complete.
>>> This system does not support a crash dump under these circumstances.
>>> Contact your HP support representative for assistance.
> The typical customer is unable to get a dump, and I don't have that
> fancy machinery.
Right. It says that HP is supposed to be able to magically help. :-)
> However if I have the kernel binary and that dump, I could write a
> parser that puts the register values into the disassembly of the
> kernel. And with the source at hand, one could easily find the line of
> code that causes the problem.
It may not be that simple, i.e. you have to look at data structures and
figure out how they got that way. And there is the fact that you have to
unwind the stack.
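Part of the unwinding has already been done by the kernel in the "Stack
Trace" it prints. A small Python sketch, assuming the "IP function+offset"
line format shown in the original post, that recovers each function's entry
address by subtracting the offset from the IP (FRAME and parse_frames are
hypothetical names):

```python
import re

# One frame per line, e.g. "0xe00000000163cca0 hcd_isr+0x40"
# (format assumed from the panic output in this thread).
FRAME = re.compile(r"^(0x[0-9a-f]+)\s+(\w+)\+(0x[0-9a-f]+)$")


def parse_frames(trace_text):
    """Return a list of (function, ip, entry_address) tuples, top frame first."""
    frames = []
    for line in trace_text.splitlines():
        # Drop quote markers ("> ", ">> ") and surrounding whitespace.
        m = FRAME.match(line.lstrip("> ").strip())
        if m:
            ip = int(m.group(1), 16)
            offset = int(m.group(3), 16)
            frames.append((m.group(2), ip, ip - offset))
    return frames


if __name__ == "__main__":
    trace = (
        "0xe000000001dc1780 bubbledown+0x0\n"
        "0xe00000000163cca0 hcd_isr+0x40\n"
        "0xe000000000ea4fa0 sapic_interrupt+0x60\n"
    )
    for name, ip, entry in parse_frames(trace):
        print(f"{name:<16} ip={ip:#x} entry={entry:#x}")
```

For the hcd_isr+0x40 frame this gives 0xe00000000163cc60, the same value
that the register dump shows in br7. It does not replace a real unwind or a
look at the data structures.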
> Restore the previous state of the system,
> upgrade the OS again, and see whether the problem happens again.
> Ulrich
This is elementary binary board swapping. And lots easier to try to do
than analyze your bad kernel with that unsat you mention later.
I hope Carl has pointed you in the right direction.
> After having upgraded from OE 2008/03 to 2009/03 on an rx3600, the
> kernel booted the first time after upgrade was not the expected one (it
> lacked the random module, among other things, causing some software to
> fail startup). When the correct kernel is booted, the kernel crashes
> early during device discovery (at the USB stuff) with a Data TLB Fault
> when there is no kernel stack (my interpretation).
I noticed very similar behavior, although my boxes (rx8620 and rx8640)
only complained about an unsupported kernel and advised me to reboot
a supported kernel. I only saw the one message, no TLB Fault like
you.
After finishing the boot I also noticed that rng-dependent systems like
WBEM, iCAP or Serviceguard did not run correctly.
Simply trying as advised, I rebooted and the kernel "magically" became
supported again, rng was loaded, Serviceguard worked again, just WBEM forgot
about nearly all registered providers and the complete configuration.
I suspect something went wrong at installation time and it was "corrected"
while the swmodify after the reboot was running.
I did this upgrade several times and noticed this bug only on (some) physical
machines like rx2660, rx8620 and rx8640. It does not happen when a vPar or
HP-VM is upgraded, and I also don't see it when I upgrade from 11.23 to 11.31
(with update-ux).
> Most interestingly a kernel that runs fine on an rx6600 also crashes at
> the very same spot on that machine. The machine has the latest firmware
> installed. The guy at HP said it's not a hardware problem, and the
> software guy said without a crashdump it's not possible to analyze the
> problem. However I had sent HP the full console output, including the
> messages, the register dump and the backtrace with program counters. I
> could also send them the (big) kernel. Now if someone says one couldn't
> analyze the problem with those data, that one has little knowledge on
> the subject. After all, it's not my fault that the combination of HP
> hardware and HP software crashes.
I had the same experience with HP here. They couldn't tell me anything
and asked me to tell them the exact console output (which I unfortunately
did not capture) otherwise they can't do anything, even if it happened
multiple times to me.
My next planned update of a physical host is in a few weeks, so I can only
wait for the update to crash again.
So if anybody at HP reads this, please connect both cases and at least describe
a workaround! Until now it was just inconvenient for me to reinstall the
whole WBEM stuff and reconfigure RSP, but when I read this posting, I'm a
little afraid to do the next update...
Regards,
Armin
This might be a known problem with IA updates to March 2009 OE ...
In rare cases, when doing an update from September 2008 OE to March 2009 OE,
it is possible upon first reboot after running update-ux for the pre-update
version of vmunix to be used rather than the vmunix created during the
update.
This can result in a system panic or other problems during the reboot after
the update.
This has been occasionally observed on IPF systems and mostly when updating
using the tui or gui interface of update-ux (update-ux -i).
To recover and get the update to complete properly, stop the system at the
bootloader (HPUX> prompt) and type:
boot /stand/current/vmunix
The system should then boot the correct, new matched set of vmunix and
dlkm's (dynamically loaded kernel modules) created by the update, and it
should boot up normally then and complete the update operation successfully.
This problem has been fixed for the upcoming September 2009 OE. For
reference, the HP CR number is QXCR1000894737.
> Ulrich Windl wrote:
>> Yes: Although it's not official here, the audience is partly more
>> competent than the similarly named center.
>
> Has it gotten to the Expert Center yet?
Yes, and while the problem was reported as "critical", Level 2 has been
waiting for Level 3 for about two weeks now, without my even having got any
reasonable question from L2/L3. Last week I asked to change the L3,
but still no news since then (other than that L2 is waiting for L3).
I wonder who does kernel programming at HP...
>
>> Yes, but the only ones that can with closed source software is HP, and
>> if they are not willing to look at their buggy code
>
> As I said, they need to read the message at the bottom about it can't create a
> dump and to contact themselves. ;-)
>
>> Never heard of that; is that "verbose"? "-vs" is not documented, only
>> "-vm" is.
>
> I think it's just -v. It provides progress messages in the kernel.
OK!
>
>>>> ERROR: Your system crashed before I/O and dump configuration was complete.
>>>> This system does not support a crash dump under these circumstances.
>>>> Contact your HP support representative for assistance.
>
>> The typical customer is unable to get a dump, and I don't have that
>> fancy machinery.
>
> Right. It says that HP is supposed to be able to magically help. :-)
>
>> However if I have the kernel binary and that dump, I could write a
>> parser that puts the register values into the disassembly of the
>> kernel. And with the source at hand, one could easily find the line of
>> code that causes the problem.
>
> It may not be that simple. I.e. you have to look at data structures and
> figure out how they got that way. And the fact that you have to unwind the
> stack.
>
>> Restore the previous state of the system,
>> upgrade the OS again, and see whether the problem happens again.
>> Ulrich
>
> This is elementary binary board swapping. And lots easier to try to do than
> analyze your bad kernel with that unsat you mention later.
Yes, but why are we paying for this type of support? You get the same
suggestion for free for Microsoft Windows!
>
> I hope Carl has pointed you in the right direction.
I feel this is a good example to demonstrate where HP support (and
software quality) has gone. The bosses are already asking "Why do you
buy HP?"
Regards,
Ulrich
> Ulrich Windl wrote:
>
>> After having upgraded from OE 2008/03 to 2009/03 on an rx3600, the
>> kernel booted the first time after upgrade was not the expected one (it
>> lacked the random module, among other things, causing some software to
>> fail startup). When the correct kernel is booted, the kernel crashes
>> early during device discovery (at the USB stuff) with a Data TLB Fault
>> when there is no kernel stack (my interpretation).
>
> I noticed a very similar behavior, although my boxes rx8620 and rx8640
> only complained about an unsupported kernel and advised me to reboot
> a supported kernel. I did only see the one message, no TLB Fault like
> you.
OK, now if you examine the situation, you will find that your kernel in
/stand will be the one before upgrade, so effectively you are missing
all the updates in the kernel. Good that the rest works for you. Now if
you build a new kernel, trying to boot it, it's quite likely that you
end up where I am already. So have a good backup kernel ready!
>
> After finishing the boot I also noticed that rng depended systems like
> WBEM, iCAP or Serviceguard did not run correctly.
Yes, that's what I had. So you need a consistent kernel, then do a
"swconfig" to fix those.
> Simply trying like advised, I rebooted and the kernel "magically" went
> supported again, rng was loaded, ServiceGuard worked again, just WBEM forgot
> about nearly all registered providers and the complete configuration.
See above.
>
> I suspect something went wrong at installation time and it was "corrected"
> while the swmodify after the reboot was running.
>
> I did this upgrade several times and noticed this bug only on (some) physical
> machines like rx2660, rx8620 and rx8640. It does not happen, when a vPar or
> HP-VM is upgraded and I also don't see it when I upgrade from 11.23 to 11.31
> (with update-ux).
Good.
[...]
> This might be a known problem with IA updates to March 2009 OE ...
Hi Gary!
I wonder why you seem to know more about that issue than my L3
"Expert". Are you level 999 ? ;-)
>
> In rare cases, when doing an update from September 2008 OE to March 2009 OE,
> it is possible upon first reboot after running update-ux for the pre-update
> version of vmunix to be used rather than the vmunix created during the
> update.
>
> This can result in a system panic or other problems during the reboot after
> the update.
>
> This has been occasionally observed on IPF systems and mostly when updating
> using the tui or gui interface of update-ux (update-ux -i).
Actually I used update-ux -i, but believe it or not: None of those
L1/L2/L3 "engineers" ever asked for the update logs since I had reported
the problem.
>
> To recover and get the update to complete properly, stop the system at the
> bootloader (HPUX> prompt) and type:
>
> boot /stand/current/vmunix
Well, I wish I had known that earlier.
>
> The system should then boot the correct, new matched set of vmunix and
> dlkm's (dynamically loaded kernel modules) created by the update, and it
> should boot up normally then and complete the update operation
> successfully.
if that "current" kernel configuration is still there. Unfortunately my
attempts to recover did destroy it. At the moment I restarted with the
old system file, editing it until "mk_kernel" was happy (mk_kernel seems
to be the only program that actually links a new kernel).
>
> This problem has been fixed for the upcoming September 2009 OE. For
> reference, the HP CR number is QXCR1000894737.
Thanks a lot for this information!
Regards,
Ulrich
Ulrich,
I'm usually quite satisfied with HP software support, but this time I agree
with you. HP should improve their communication. It's ridiculous to pay for a
support contract, receive nothing for weeks, stumble upon some other person's
post by accident and get an answer the next day for free.
Armin
An update: Even after installing Base-OE September/2009, the bug was
still there! (Un-?)fortunately one of our two rx3600s (the older one) did
not show the defect, while the other one did. The only solution was to
install an unofficial "instrumentation" patch from HP (PHKL_40226). That
patch prints a few lines of text on every (USB?) interrupt on the 9600 baud
serial console, meaning that even after 30 minutes the kernel hasn't
finished device initialization. You'll have to use adb to modify the
kernel to stop printing those. (It's an interesting exercise how to do
that when your kernel doesn't boot.)
OK, summary: With the patch mentioned above the system runs fine;
without it, it doesn't.
It's completely incomprehensible why HP did not fix the problem if they
knew about it roughly four months before the next release of their
software.
At the time when I installed, there was no patch in the database to fix
that problem. (HP had had confirmation at least six weeks before that
the instrumented kernel did not crash.)
Regards,
Ulrich