Meaning of CheckSum: 0x02 * * Invalid * * in hw output?

Terry Paterson

Dec 18, 2002, 11:52:09 AM

Hi There,

I have a server in the field which is occasionally panicking with a
k_trap (..00E) message.

We are trying to have the hardware checked out - running diagnostics
on the RAM, etc.

However, whilst looking around I spotted the following in the output
from the hw command. Can anyone tell me what it means? And does anyone
have any thoughts as to what might be causing the panics? There
doesn't seem to be any obvious pattern - sometimes there is nobody
logged in, sometimes 100-200 users. It has an Intel PRO100/B NIC,
which I seem to remember has had quite a few driver updates, so I
might check that one out too.

<snip>

Adapter BIOS ROM
Address: 0xc8000 - 0xc87ff
Size: 2Kb
CheckSum: 0x00 (As expected)

Adapter BIOS ROM
Address: 0xc8800 - 0xcc7ff
Size: 16Kb
CheckSum: 0x02 * * Invalid * *

System BIOS ROM
Address: 0xe0000 - 0xfffff
Size: 128Kb
BIOS Date: 05/17/01
BIOS Category: IBM PC/XT-286
BIOS ID: PhoenixBIOS 4.06.43 RK

<snip>

Output from hwconfig:

%kernel - - - rel=3.2v5.0.6 kid=2000-07-27
%cpu - - - unit=1 family=6 type=gt PentIII
%cpuid - - - unit=1 vend=GenuineIntel tfms=0:6:10:4(3)
%fpu - 13 - unit=1 type=80387-compatible
%pci 0x0CF8-0x0CFF - - am=1 sc=0 buses=8
%PnP - - - nodes=0
%clock - - - type=TSC/700036964Hz
%serial 0x03F8-0x03FF 4 - unit=0 type=Standard nports=1 fifo=yes
%serial 0x02F8-0x02FF 3 - unit=1 type=Standard nports=1 fifo=yes
%console - - - unit=vga type=0 num=12 scoansi=1 scroll=50
%adapter 0x5800-0x58FF 5 - type=alad ha=0 bus=0 id=7 fts=sto
%floppy 0x03F2-0x03F7 6 2 unit=0 type=135ds18
%adapter 0x01F0-0x01F7 14 - type=IDE ctlr=primary dvr=wd
%adapter - 5 - type=hpnraid unit=0
%hptx0 0x1800-0x181F 16 - type=HP 10/100 00:30:6e:02:10:35
%OlyTune - - - Pacific CodeWorks, Inc. System TuneUp
%OlyTune - - - Olympus TuneUp Release 6.0.0N
%eeE0 0x5C00-0x5C1F 17 - type=EE PRO/100+ 00:d0:b7:72:1a:8d
%cd-rom - - - type=IDE ctlr=pri cfg=mst dvr=Srom->wd
%tape - - - type=S ha=0 id=2 lun=0 bus=0 ht=alad
%disk - - - type=S ha=0 id=0 lun=0 bus=0 ht=hpnraid
%Sdsk - - - cyls=8850 hds=255 secs=63 fts=sdb
%cpu - 255 - unit=2 family=6 type=gt PentIII
%cpuid - - - unit=2 vend=GenuineIntel tfms=0:6:10:4(3)
%fpu - - - unit=2 type=80387-compatible
%cpu - 255 - unit=3 family=6 type=gt PentIII
%cpuid - - - unit=3 vend=GenuineIntel tfms=0:6:10:4(3)
%fpu - - - unit=3 type=80387-compatible
%cpu - 255 - unit=4 family=6 type=gt PentIII
%cpuid - - - unit=4 vend=GenuineIntel tfms=0:6:10:4(3)
%fpu - - - unit=4 type=80387-compatible

<snip>
Hewlett-Packard Network Drivers (ver 1.4.10.3)
SCO OpenServer Enterprise System (ver 5.0.6j)
SCO SendMail (ver 8.11.0)
SCO Symmetrical Multiprocessing (ver 1.1.1Ga)
RS506A: Release Supplement for SCO OpenServer Release 5.0.6 (ver rs506a)
<snip>

Bela Lubkin

Dec 19, 2002, 5:28:15 PM

to sco...@xenitec.on.ca

Terry Paterson wrote:

> I have a server in the field which is occasionally panicking with a
> k_trap (..00E) message.
>
> We are trying to have the hardware checked out - running diagnostics
> on the RAM, etc.
>
> However, whilst looking around I spotted the following in the output
> from the hw command. Can anyone tell me what it means? And does anyone
> have any thoughts as to what might be causing the panics? There
> doesn't seem to be any obvious pattern - sometimes there is nobody
> logged in, sometimes 100-200 users. It has an Intel PRO100/B NIC,
> which I seem to remember has had quite a few driver updates, so I
> might check that one out too.

My car doesn't work, do you have any ideas on why? Not enough
information.

Panics often (not always) have an informational message before the
k_trap message; you should include that. More importantly, there is
always a way to get information about _where_ in the kernel the panic
happened. If a dump is written to disk, you can get kernel stack
tracebacks with any of `sysdump`, `scodb` or `crash`. If not, you can
link scodb into the kernel and get an interactive traceback during a
panic.

> Adapter BIOS ROM
> Address: 0xc8000 - 0xc87ff
> Size: 2Kb
> CheckSum: 0x00 (As expected)
>
> Adapter BIOS ROM
> Address: 0xc8800 - 0xcc7ff
> Size: 16Kb
> CheckSum: 0x02 * * Invalid * *
>
> System BIOS ROM
> Address: 0xe0000 - 0xfffff
> Size: 128Kb
> BIOS Date: 05/17/01
> BIOS Category: IBM PC/XT-286
> BIOS ID: PhoenixBIOS 4.06.43 RK

OpenServer makes very little use of BIOS. `hw` presents BIOS
information so that you'll know more about your machine, not because it
is terribly important to OSR5. For instance, the "BIOS ID" above tells
you whose motherboard BIOS you have, which sometimes helps trace down
bugs (especially boot bugs).

There is a standard for BIOS ROMs which includes a method of
checksumming them to verify their integrity. Whatever adapter ROM that
is, its checksum is bad -- the ROM is either a defective unit, or the
vendor did not follow the protocol for proper checksumming. I don't
know how common it is for BIOS ROMs to ignore the checksumming protocol
(I've never noticed one like that before, but I wasn't looking).
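
For readers unfamiliar with the convention: as far as I know, the usual PC option ROM layout starts with the signature bytes 0x55 0xAA, the third byte gives the image length in 512-byte blocks, and all bytes of the image must add up to zero modulo 256. The following is a minimal Python sketch under that assumption. The function names and the sample image are hypothetical, and this is not a description of how `hw` itself does the check.

```python
def rom_checksum(image: bytes) -> int:
    """Return the 8-bit sum of an option ROM image (0 means valid)."""
    if len(image) < 3 or image[0] != 0x55 or image[1] != 0xAA:
        raise ValueError("missing 0x55 0xAA option ROM signature")
    length = image[2] * 512          # third byte: size in 512-byte blocks
    if length == 0 or length > len(image):
        raise ValueError("declared length does not fit the image")
    return sum(image[:length]) & 0xFF


def make_sample_rom(blocks: int = 4) -> bytearray:
    """Build a hypothetical ROM whose last byte balances the checksum."""
    rom = bytearray(blocks * 512)
    rom[0], rom[1], rom[2] = 0x55, 0xAA, blocks
    rom[3:11] = b"DEMO ROM"          # made-up payload, 8 bytes
    rom[-1] = (-sum(rom)) & 0xFF     # make the total sum 0 modulo 256
    return rom


if __name__ == "__main__":
    rom = make_sample_rom()
    print(hex(rom_checksum(rom)))    # intact image
    rom[100] ^= 0x02                 # corrupt a single byte
    print(hex(rom_checksum(rom)))    # nonzero sum: would be flagged invalid
```

A single altered byte is enough to produce a nonzero result, and so is a vendor who never filled in the balancing byte. That is why a bad checksum on its own cannot distinguish a defective ROM from a non-conforming one.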

If you run `hw -v -r rom`, it will search each ROM for text strings.
This will help you figure out which device's ROM has the bad checksum.
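
The same kind of text-string search can be approximated offline on a saved ROM image, much like the `strings` utility. This is only a sketch: the function name, the minimum length of 6 and the sample bytes are assumptions made for illustration, not details of `hw`.

```python
import re


def printable_strings(image: bytes, min_len: int = 6) -> list:
    """Return runs of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(image)]


if __name__ == "__main__":
    # Hypothetical ROM fragment: signature, length byte, padding, banner text.
    sample = b"\x55\xaa\x04\x00\x00Example Vendor BIOS\x00\x01ab\x00"
    for text in printable_strings(sample):
        print(text)
```

Vendor banners and copyright notices are usually the longest printable runs in a ROM, so they tend to identify the adapter quickly.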

But it's a red herring. The BIOS checksum is just an aside, like if my
mechanic told me "yeah, it won't start, and I'm working on that; by the
way, did you notice that the left rear running light is out?". Not the
cause of the problem.

Go get us some good information about the panic.

>Bela<

Terry Paterson

Dec 20, 2002, 8:14:01 AM

Bela Lubkin <be...@caldera.com> wrote in message news:<2002121914...@mammoth.ca.caldera.com>...

Apologies for not including this previously:

Unexpected trap in kernel mode:
cr0 0x8001003B cr2 0x00000040 cr3 0x00002000 tlb 0x00000000
ss 0x0000C9D0 uesp 0x00000000 efl 0x000C0283 ipl 0x00000000
cs 0x00000158 eip 0xF00E11D9 err 0x00000000 trap 0x0000000E
eax 0x00000000 ecx 0x00000020 edx 0xF0843C00 ebx 0xF0876000
esp 0xE000007C ebp 0xE0000DC0 esi 0x00000001 edi 0xF0885500
ds 0x00000160 es 0x00000160 fs 0x00000000 gs 0x00000000
cpu 0x00000001

PANIC: k_trap - Kernel mode trap type 0x0000000E
Cannot dump 196493 pages to dumpdev hd (1/41) : space for only 131072 pages
Dump not completed
Warning: aac system coming down.

Bela Lubkin

Dec 20, 2002, 9:12:13 AM

to sco...@xenitec.on.ca

Terry Paterson wrote:

> > > I have a server in the field which is occasionally panicking with a
> > > k_trap (..00E) message.
> > >
> > > We are trying to have the hardware checked out - running diagnostics
> > > on the RAM, etc.

> Unexpected trap in kernel mode:

> cr0 0x8001003B cr2 0x00000040 cr3 0x00002000 tlb 0x00000000
> ss 0x0000C9D0 uesp 0x00000000 efl 0x000C0283 ipl 0x00000000
> cs 0x00000158 eip 0xF00E11D9 err 0x00000000 trap 0x0000000E
> eax 0x00000000 ecx 0x00000020 edx 0xF0843C00 ebx 0xF0876000
> esp 0xE000007C ebp 0xE0000DC0 esi 0x00000001 edi 0xF0885500
> ds 0x00000160 es 0x00000160 fs 0x00000000 gs 0x00000000
> cpu 0x00000001
>
> PANIC: k_trap - Kernel mode trap type 0x0000000E
> Cannot dump 196493 pages to dumpdev hd (1/41) : space for only 131072 pages
> Dump not completed
> Warning: aac system coming down.

Well, that's a little bit of info. One bit is that you have 768MB of
RAM but only 512MB of swap. That isn't the cause of the panic, but it
prevents a panic dump from being written to /dev/swap, and that makes
further debugging difficult.
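
The 768MB and 512MB figures follow from the page counts in the dump message. A short Python sketch of the arithmetic, assuming the standard 4 KB i386 page size (the message itself does not state the page size):

```python
PAGE_SIZE = 4096  # bytes; assumption: standard i386 4 KB pages


def pages_to_mib(pages: int) -> float:
    """Convert a page count to mebibytes."""
    return pages * PAGE_SIZE / (1024 * 1024)


ram_pages = 196493    # "Cannot dump 196493 pages"
dump_pages = 131072   # "space for only 131072 pages"

print(f"memory to dump: {pages_to_mib(ram_pages):.1f} MiB")
print(f"dump space:     {pages_to_mib(dump_pages):.1f} MiB")
print(f"shortfall:      {pages_to_mib(ram_pages - dump_pages):.1f} MiB")
```

The dump device holds exactly 512 MiB, while the memory image is a little under 768 MiB, so the dump cannot complete.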

You got a trap E at kernel code address 0xF00E11D9 (eip), due to a
reference to address 0x00000040 (cr2). Now run:

# echo ts F00E11D9 | crash
# echo dis F00E11D9-20 20 | crash

and post the results.
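
For reference, trap 0xE on x86 is the page-fault exception, cr2 holds the address whose access faulted, and the low three bits of the error code describe the access. The sketch below decodes those bits for the values in the register dump. The function name is hypothetical and only the three architecturally defined low bits are examined.

```python
def decode_page_fault(err: int, cr2: int) -> str:
    """Describe an x86 page fault from its error code and cr2 value."""
    cause = "protection violation" if err & 0x1 else "page not present"
    access = "write" if err & 0x2 else "read"
    mode = "user" if err & 0x4 else "kernel"
    return f"{mode}-mode {access} of 0x{cr2:08X}: {cause}"


# Values from the panic: err 0x00000000, cr2 0x00000040
print(decode_page_fault(err=0x00000000, cr2=0x00000040))
```

A kernel-mode read of an unmapped address this close to zero is the typical signature of dereferencing a NULL structure pointer at a small field offset.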

>Bela<

Terry Paterson

Dec 23, 2002, 4:40:37 AM

Bela Lubkin <be...@caldera.com> wrote in message news:<2002122006...@mammoth.ca.caldera.com>...

# echo ts F00E11D9 | crash

dumpfile = /dev/mem, namelist = /unix, outfile = stdout
ikntrput + 0x12d

# echo dis F00E11D9-20 20 | crash

dumpfile = /dev/mem, namelist = /unix, outfile = stdout
ikntrput+0x10d incl %ebp
ikntrput+0x10e cld
ikntrput+0x10f movl 0x4(%eax),%edx
ikntrput+0x112 orl $0x8,%edx
ikntrput+0x118 movl %edx,0x4(%eax)
ikntrput+0x11b pushl %eax
ikntrput+0x11c call 0x0c266b <0xf01a3838> [wakeup]
ikntrput+0x121 addl $0x8,%esp
ikntrput+0x124 jmp 0x017a <0xf00e134f> [ikntrput+0x2a3]
ikntrput+0x129 xchgl %eax,%eax
ikntrput+0x12a movl 0xf8(%ebp),%eax
ikntrput+0x12d data16
ikntrput+0x12e movw 0x40(%eax),%ax
ikntrput+0x131 data16
ikntrput+0x132 testw $0x100,%ax
ikntrput+0x135 je 0x0145 <0xf00e132c> [ikntrput+0x280]
ikntrput+0x13b movl $0x1,0xec(%ebp)
ikntrput+0x142 testl %esi,%esi
ikntrput+0x144 je 0x3d <0xf00e122f> [ikntrput+0x183]
ikntrput+0x146 cmpl $0x1,%esi

iknt? In-kernel network terminal? Something to do with telnet
sessions? Pseudo-ttys?
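
The listing can be cross-checked against the register dump with a little address arithmetic. This Python sketch uses only values from the panic message and the disassembly. The 5-byte instruction lengths are an assumption inferred from the gaps between consecutive offsets in the listing.

```python
EIP = 0xF00E11D9   # faulting instruction address, from the register dump
OFFSET = 0x12D     # crash reported "ikntrput + 0x12d"
EAX = 0x00000000   # eax at the time of the trap
CR2 = 0x00000040   # address whose access faulted

base = EIP - OFFSET  # start of ikntrput in this kernel


def rel_target(insn_offset: int, insn_len: int, displacement: int) -> int:
    """Target of a relative call/jmp: next instruction address + displacement."""
    return base + insn_offset + insn_len + displacement


print(hex(base))
print(hex(rel_target(0x11C, 5, 0x0C266B)))  # call ... [wakeup]
print(hex(rel_target(0x124, 5, 0x17A)))     # jmp  ... [ikntrput+0x2a3]
print(hex(EAX + 0x40), hex(CR2))            # movw 0x40(%eax),%ax
```

The computed branch targets match the addresses crash printed, and the effective address of `movw 0x40(%eax),%ax` with eax equal to zero is exactly the cr2 value. In other words, the pointer loaded from 0xf8(%ebp) by the preceding instruction was NULL.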

Bela Lubkin

Dec 23, 2002, 6:18:52 AM

to sco...@xenitec.on.ca

Terry Paterson wrote:

> > You got a trap E at kernel code address 0xF00E11D9 (eip), due to a
> > reference to address 0x00000040 (cr2). Now run:

> # echo ts F00E11D9 | crash

iknt does character I/O on behalf of rlogind and/or telnetd, avoiding
the need to wake up a user process for every character of I/O. The
panic you're seeing has been seen before; I will see if a fix is
available.

>Bela<