In <9f01d198-d807-4ff8...@googlegroups.com>, on 07/26/22
> OK... trap file uploaded....
Thanks. We got lucky. The trap is at the same location
TRAP SCREEN INFORMATION
OS/2 Kernel Revision 14.203_SMP
Exception in module: SOFFICE
TRAP 000e ERRCD=0000 ERACC=**** ERLIM=******** CPU=01
EAX=f36f3914 EBX=f9c46996 ECX=f36f3914 EDX=00000000 ESI=00000000
EDI=f36f3914 EBP=000052b8 FLG=00010286
CS:EIP=0168:00000000 CSACC=c09b CSLIM=ffffffff
SS:ESP=1530:00005244 SSACC=1097 SSLIM=0000449f
DS=0160 DSACC=c093 DSLIM=ffffffff CR0=8001003b
ES=0160 ESACC=c093 ESLIM=ffffffff CR2=00000000
FS=0000 FSACC=**** FSLIM=********
GS=0000 GSACC=**** GSLIM=********
No Symbols Found
but it triggered on CPU 1, so dumping the ring0 stack we get
Location Address Symbol Description
-------- ------- ------ -----------
%f8a9ad2c 0108:8020 PDD Data Segment for ___HLP$
%f8a9ad32 1000:36bc _tkPanicMsgLen + 104
%f8a9ad38 %fff0f667 _FormatRegisters
%f8a9ad4c %fff0f667 _FormatRegisters
%f8a9ad52 0100:ea60 PDD Code Segment for ___HLP$
%f8a9ad58 %fff0e3f7 KernelFaultEntry + 183
%f8a9ada0 %00010285 SIEGE _array_pop + 1
%f8a9adf2 0400:0000 _PSDStack
%f8a9ae1c %f6105bf0 SOCKETS recvit + 19a
%f8a9ae3e %0001e564 SIEGE _load_conf + e0b
%f8a9ae48 %f6103440 SOCKETS soo_winsock_select + a49
%f8a9ae4e %000103ca SIEGE _array_print + 156
%f8a9aea0 %f6105950 SOCKETS recvmsg + 87
%f8a9aece %000103ca SIEGE _array_print + 156
%f8a9aee8 %f610ff40 SOCKETS GenIOCtrl32 + dc
%f8a9af00 %fff115cd _MPR0SubSysEnter + d
%f8a9af08 %f610fa27 SOCKETS KEECallgateEntry + 42
%f8a9af22 0b20:fe38 PDD Code Segment for UNICODE$
Stack Frame Address f8a9af44 reached
There is no IPI, which simplifies analysis.
Looking at the last call on the stack, we find
# %lnc recvit + 19a
Looking for call instruction related to recvit + 19a
SOCKETS soreceive:
%f610c951 55 push ebp
ln %f610c951
%f610c951 SOCKETS soreceive
This tells me I need to figure out where and how soreceive got lost.
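The "last call on the stack" step can be repeated against a saved listing with a rough Python sketch like the one below. It is not part of the debugger or pmdf. The regex, the name most_recent_entry, and the choice to filter on a single module name are assumptions made for illustration. It relies only on the x86 stack growing downward, so the lowest stack location holds the most recently pushed value.

```python
import re

# Matches entries such as "%f8a9ae1c %f6105bf0 SOCKETS recvit + 19a".
# Entries without an upper-case module name (kernel symbols, sel:off
# entries) are skipped on purpose.
ENTRY = re.compile(
    r"^%(?P<loc>[0-9a-f]{8})\s+%(?P<addr>[0-9a-f]{8})\s+"
    r"(?P<module>[A-Z0-9$]+)\s+(?P<symbol>\w+)\s+\+\s+(?P<off>[0-9a-f]+)$"
)

def most_recent_entry(listing, module):
    """Return (stack_location, symbol, offset) for the entry of `module`
    at the lowest stack location, or None if the module never appears."""
    best = None
    for line in listing.splitlines():
        m = ENTRY.match(line.strip())
        if m is None or m.group("module") != module:
            continue
        entry = (int(m.group("loc"), 16), m.group("symbol"),
                 int(m.group("off"), 16))
        if best is None or entry[0] < best[0]:
            best = entry
    return best

# A few lines copied from the ring0 stack dump above.
STACK = """\
%f8a9ad58 %fff0e3f7 KernelFaultEntry + 183
%f8a9ada0 %00010285 SIEGE _array_pop + 1
%f8a9ae1c %f6105bf0 SOCKETS recvit + 19a
%f8a9aea0 %f6105950 SOCKETS recvmsg + 87
%f8a9af08 %f610fa27 SOCKETS KEECallgateEntry + 42
"""

loc, symbol, off = most_recent_entry(STACK, "SOCKETS")
print(f"%{loc:08x} {symbol} + {off:x}")
```

A symbol match on a stack word does not prove it is a return address, so the result is only a starting point for the %lnc check shown above.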
BTW, I thought I had explained the SOFFICE reference, but I don't see
anything in the mail archives. The trap address is reported to the kernel as
168:00000000 because something "jumped to zero." When the kernel attempts
to convert this to a module name, it determines that 168 is a FLAT
selector, so it searches the 32-bit objects of the loaded modules for one
that includes offset 00000000. Since every .exe that contains 32-bit
code includes an object that contains offset 00000000, it reports the first
one it finds. The output is not technically correct, but no one noticed
this when Scott added the code. Other than looking odd, the output is not a
problem in practice.
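The first-match behavior described above can be modeled with a small Python sketch. This is not the kernel's code or its data structures. The Module and Obj32 classes, the function name, and the object sizes are all hypothetical, chosen only to show why a lookup on offset 00000000 returns whichever module happens to be searched first.

```python
from dataclasses import dataclass, field

@dataclass
class Obj32:
    start: int   # first offset covered by the object (hypothetical model)
    size: int    # object length in bytes (made-up values below)

@dataclass
class Module:
    name: str
    objects: list = field(default_factory=list)

def first_module_containing(modules, offset):
    """Return the name of the first module with a 32-bit object that
    covers `offset`, or None. Search order alone decides ties."""
    for mod in modules:
        for obj in mod.objects:
            if obj.start <= offset < obj.start + obj.size:
                return mod.name
    return None

# Two executables that, per the explanation above, each have an object
# containing offset 00000000. Sizes are invented for the example.
LOADED = [
    Module("SOFFICE", [Obj32(0x00000000, 0x10000)]),
    Module("SIEGE",   [Obj32(0x00000000, 0x20000)]),
]

print(first_module_containing(LOADED, 0x00000000))
```

With SOFFICE ahead of SIEGE in the search order, the lookup names SOFFICE even though SIEGE is the process that trapped. Reversing the list flips the answer, which shows that the reported name carries no information for a jump to zero.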
The .p# command reports the correct module:
# .p#
 Slot  Pid  Ppid Csid Ord  Sta Pri  pTSD     pPTDA    pTCB     Disp SG Name
*009a# 0059 0054 0059 0014 run 0300 f8a9a000 fe3c83f8 fe380a80 0d88 14 SIEGE
and the r command p field shows us the CPU the thread was running on
# r
eax=00000050 ebx=00004f7b ecx=00000000 edx=0520ec64 esi=0520ed64 edi=0520ec90
eip=1e130027 esp=0520ec50 ebp=0520ec78 iopl=0 -- -- -- nv up ei ng nz na po cy
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=00000000 cr3=00211000 p=00
005b:1e130027 c3 retd
The p values are off by one, so this is really CPU 1. This is a pmdf nit.
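When comparing the r output against the trap screen in a script, the off-by-one can be applied mechanically. The Python sketch below is only an illustration. The function name and the regex are hypothetical, the p field is assumed to be two hex digits, and the +1 correction simply encodes the statement above that the reported value is one lower than the real CPU number.

```python
import re

# "\bp=" avoids matching the tails of eip=, esp= and ebp=.
P_FIELD = re.compile(r"\bp=([0-9a-f]{2})\b")

def cpu_from_r_output(text, correction=1):
    """Return the corrected CPU number from an `r` register dump,
    or None if no p= field is present."""
    m = P_FIELD.search(text)
    if m is None:
        return None
    return int(m.group(1), 16) + correction

# Last lines of the register dump shown above.
R_OUTPUT = (
    "cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 "
    "cr2=00000000 cr3=00211000 p=00 005b:1e130027 c3 retd"
)

print(cpu_from_r_output(R_OUTPUT))
```

For this dump the corrected value agrees with the CPU=01 field on the trap screen.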