Re: ICAT 'XCPT_BAD_ACCESS' - what does it mean?


Lars Erdmann

Dec 27, 2017, 7:53:29 AM
I would not worry about it too much.
ICAT is not a very stable tool, which might also be due to uncoordinated
changes in the debug kernel.
Unfortunately, "XCPT_BAD_ACCESS" is not listed as an exception in the
Control Programming guide, so it is not exactly clear what it means.
It's possible that code or data needs to be paged in.
In that case, you could also try the ".i" KDB instruction in the
passthru window. Look at the mixed source/assembly output and pass a
linear address to ".i" somewhere into the code you intend to run.
".i" will always load full 4k pages aligned to 4k addresses.

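To illustrate the 4k alignment, here is a minimal sketch (not KDB's own code; `page_base` and `pages_spanned` are names made up for this example) that works out which page a linear address falls into and how many pages a byte range touches:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as stated in the message above. */
#define PAGE_SIZE_BYTES 0x1000u

/* Start address of the 4 KiB page that contains the linear address. */
uint32_t page_base(uint32_t linear)
{
    return linear & ~(PAGE_SIZE_BYTES - 1u);
}

/* Number of pages touched by the byte range [linear, linear + length).
   Returns 0 for an empty range. Address wrap-around is not handled. */
uint32_t pages_spanned(uint32_t linear, uint32_t length)
{
    uint32_t first, last;

    if (length == 0u)
        return 0u;
    first = page_base(linear);
    last  = page_base(linear + length - 1u);
    return (last - first) / PAGE_SIZE_BYTES + 1u;
}
```

A range that crosses a page boundary would need ".i" once per page it touches.
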
Lars


On 27.12.17 11.30, Andi B. wrote:
> I'm looking for a problem in xwlan widget init code and while remotely
> debugging with ICAT I came across XCPT_BAD_ACCESS. This is with a strcpy
> operation which I do not see any coding error. strcpy works as expected,
> pointers seem to be valid and code works as (I) expect. Anyway ICAT pops
> up this exception. ICAT let me run the exception handler and all is
> fine. strcpy operation fills the memory as expected.
>
> Now I ask myself if this is really a problem with the code or this is
> the way it works (and should work, uncommited memory?). I slowly get the
> feeling I'm chasing the wrong track.
>
> Call stack shows _validate_ptr, _get_stack_trace in module memport,
> _add_item 1a3, _int_uheap_verify in module rdbg and _chk_if_heap in
> module dbgstr. Picture attached but in case attaching does not
> work...
>
> Can someone confirm this is well expected and nothing I should take care
> any further? Otherwise if this is a real problem in the code I'll show
> the details of course.
>
> Regards, Andreas
>
>


Lars Erdmann

Dec 27, 2017, 7:55:18 AM
... or you need to use ".i" to page in the data that is accessed by the
strcpy routine.

Lars

Steven Levine

Dec 27, 2017, 3:23:21 PM
On Wed, 27 Dec 2017 10:30:38 UTC, "Andi B." <and...@gmx.net> wrote:

Hi Andi,

> I came across XCPT_BAD_ACCESS. This is with a strcpy operation which I do not see any
> coding error. strcpy works as expected, pointers seem to be valid and code works as (I)
> expect. Anyway ICAT pops up this exception. ICAT let me run the exception handler and all
> is fine. strcpy operation fills the memory as expected.

There may be no error here. The reference to XCPT_BAD_ACCESS is in
gam5lde.msg:

Message: PMD0210
XCPT_BAD_ACCESS

I suspect that the developers got lazy when they created the .msg
file. Since you mention uncommitted memory, I am going to guess that
the actual exception was

#define XCPT_ACCESS_VIOLATION 0xC0000005

with ExceptionInfo[ 0 ] set to one of:

/* ExceptionInfo[ 0 ] - Access Code: XCPT_READ_ACCESS
                                     XCPT_WRITE_ACCESS
                                     XCPT_SPACE_ACCESS
                                     XCPT_LIMIT_ACCESS
                                     XCPT_UNKNOWN_ACCESS */

To verify this you need to look at the content of the
ExceptionReportRecord.
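
A minimal sketch of that check follows. It assumes the access-code values below match the toolkit's bsexcpt.h; they are written from memory and should be verified against the header. `access_code_name` is a hypothetical helper, not an OS/2 API.

```c
#include <stdint.h>

/* Exception number as quoted above. */
#define XCPT_ACCESS_VIOLATION 0xC0000005u

/* Assumed access-code values (check them against bsexcpt.h). */
#define XCPT_UNKNOWN_ACCESS 0x00000000u
#define XCPT_READ_ACCESS    0x00000001u
#define XCPT_WRITE_ACCESS   0x00000002u
#define XCPT_SPACE_ACCESS   0x00000008u
#define XCPT_LIMIT_ACCESS   0x00000010u

/* Maps ExceptionNum plus ExceptionInfo[0] to a readable name. */
const char *access_code_name(uint32_t exception_num, uint32_t info0)
{
    if (exception_num != XCPT_ACCESS_VIOLATION)
        return "not an access violation";

    switch (info0) {
    case XCPT_UNKNOWN_ACCESS: return "XCPT_UNKNOWN_ACCESS";
    case XCPT_READ_ACCESS:    return "XCPT_READ_ACCESS";
    case XCPT_WRITE_ACCESS:   return "XCPT_WRITE_ACCESS";
    case XCPT_SPACE_ACCESS:   return "XCPT_SPACE_ACCESS";
    case XCPT_LIMIT_ACCESS:   return "XCPT_LIMIT_ACCESS";
    default:                  return "unrecognized access code";
    }
}
```

In a real handler the two arguments would come from the ExceptionNum and ExceptionInfo[0] fields of the report record passed to the handler.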

> Now I ask myself if this is really a problem with the code or this is the way it works
> (and should work, uncommited memory?). I slowly get the feeling I'm chasing the wrong track.

It could be you are chasing a red herring. The labels shown in that
stack trace are all part of the VAC 3.65 runtime, unless I am
misinterpreting something.

What is the widget doing when this exception occurs? What is the
strcpy destination? An uncommitted stack guard page? Is your process
running when the exception occurs?

Some of this may just be how the kernel debugger works. It appears you
have the debugger configured to capture/report ring3 exceptions.

Steven


--
---------------------------------------------------------------------
Steven Levine <ste...@earthlink.bogus.net>
DIY/Warp/BlueLion etc. www.scoug.com www.arcanoae.com www.warpcave.com
---------------------------------------------------------------------

Andi B.

Dec 29, 2017, 4:46:27 AM
Hi,
thanks both of you.

Steven Levine wrote:
> On Wed, 27 Dec 2017 10:30:38 UTC, "Andi B." <and...@gmx.net> wrote:
>
> Hi Andi,
>
>> I came across XCPT_BAD_ACCESS. This is with a strcpy operation which I do not see any
>> coding error. strcpy works as expected, pointers seem to be valid and code works as (I)
>> expect. Anyway ICAT pops up this exception. ICAT let me run the exception handler and all
>> is fine. strcpy operation fills the memory as expected.
>
> There may be no error here. The reference to XCPT_BAD_ACCESS is in
> gam5lde.msg:
>
> Message: PMD0210
> XCPT_BAD_ACCESS
>
> I suspect that the developers got lazy when they created the .msg
> file. Since you mention uncommitted memory, I am going to guess that
> the actual exception was
>
> #define XCPT_ACCESS_VIOLATION 0xC0000005
>
> with ExceptionInfo[ 0 ] set to one of:
>
> /* ExceptionInfo[ 0 ] - Access Code: XCPT_READ_ACCESS
> XCPT_WRITE_ACCESS
> XCPT_SPACE_ACCESS
> XCPT_LIMIT_ACCESS
> XCPT_UNKNOWN_ACCESS */
>
> To verify this you need to look at the content of the
> ExceptionReportRecord.

I have to learn about that.

>
>> Now I ask myself if this is really a problem with the code or this is the way it works
>> (and should work, uncommited memory?). I slowly get the feeling I'm chasing the wrong track.
>
> It could be you are chasing a red herring. The labels shown in that
> stack trace are all part of the VAC 3.65 runtime, unless I am
> misinterpreting something.

Yes. It is all in the runtime and I cannot find a way back to the caller. Based on my very
limited knowledge in that area, it seems to me the stack is trashed and so ICAT cannot
find the way back to the calling code in my app. But as I said, I am guessing. I know that
I know nothing.

>
> What is the widget doing when this exception occurs? What is the
> strcpy destination? An uncommitted stack guard page? Is your process
> running when the exception occurs?

I can narrow down the problem to code like this -

typedef struct _DIM {
    HMODULE hmod;
    ULONG   ulModuleId;
    CHAR    szModuleBaseName[32];
    ULONG   ulDriverCount;
    PFNIDS  pfnids;
    <SNIP>
} DIM, *PDIM;

static PDIM padim = NULL;

padim = malloc( ulDimTableSize); // actually about 215 bytes in my case
if (!padim)
{
    rc = ERROR_NOT_ENOUGH_MEMORY;
    break;
}
memset( padim, 0xAA, ulDimTableSize);
strcpy( padim->szModuleBaseName, "TestStringAB_TEST");

The strcpy triggers the exception in ICAT.

I do not see why this should trash the stack, so my assumption above is probably wrong.
Moreover, this is old code in xwlan/wlanstat and it has worked for many years. The good
thing is that, while I was there, I found out that http://trac.netlabs.org/wpstk does not
compile with VAC anymore and needs attention too. So no problem finding new tasks.

>
> Some of this is just how the kernel debugger works? It appears you
> have the debugger configured to capture/report ring3 exceptions.

I've 'set CAT_KDB_INIT="vsf *"'. At first sight I did not even find any reference to
vsf and vc except on your page, and not much info in the ICAT files. If I ever found the
time to learn more about these basics of debugging....

Maybe I should add exceptq to wlanstat and let your trap tool decode what's going wrong,
rather than playing endless hours with ICAT and trying to decode it myself. Moreover, the
original problem seems to be unrelated to what I'm looking at here anyway. Maybe you want
to have a look at http://trac.netlabs.org/xwlan/ticket/46, which is the reason why I
started to play with ICAT.

Andi

Steven Levine

Dec 30, 2017, 1:05:16 PM
On Fri, 29 Dec 2017 09:46:30 UTC, "Andi B." <and...@gmx.net> wrote:

Hi Andi,

> > To verify this you need to look at the content of the
> > ExceptionReportRecord.

> Have to learn about.

The structures are well documented, with the exception of the FP
specific data. A pointer to the Exception Report Record is passed to
the handler. If you dump the data as dwords, it's readable if you
work out the field offsets.
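
As a sketch of working out those offsets: on 32-bit OS/2 each field of the report record occupies one dword, so, assuming the documented EXCEPTIONREPORTRECORD layout (ExceptionNum, fHandlerFlags, NestedExceptionReportRecord, ExceptionAddress, cParameters, ExceptionInfo[]), a dword dump can be indexed as below. The `ERR_*` names and `report_param` are made up for this example.

```c
#include <stdint.h>

/* Dword index of each field in a dumped exception report record.
   Assumes the documented 32-bit layout; names are illustrative only. */
#define ERR_EXCEPTION_NUM  0u  /* ExceptionNum, e.g. 0xC0000005 */
#define ERR_HANDLER_FLAGS  1u  /* fHandlerFlags */
#define ERR_NESTED_RECORD  2u  /* NestedExceptionReportRecord */
#define ERR_EXCEPTION_ADDR 3u  /* ExceptionAddress */
#define ERR_PARAM_COUNT    4u  /* cParameters */
#define ERR_FIRST_PARAM    5u  /* ExceptionInfo[0] */
#define ERR_MAX_PARAMS     4u  /* EXCEPTION_MAXIMUM_PARAMETERS */

/* Returns ExceptionInfo[index] from the dump, or 0 when cParameters
   says that parameter is not present. */
uint32_t report_param(const uint32_t *dump, uint32_t index)
{
    uint32_t count = dump[ERR_PARAM_COUNT];

    if (count > ERR_MAX_PARAMS)
        count = ERR_MAX_PARAMS;
    return (index < count) ? dump[ERR_FIRST_PARAM + index] : 0u;
}
```

For an access violation, `report_param(dump, 0)` would be the access code and `report_param(dump, 1)` the faulting address, if the documented meaning of ExceptionInfo applies.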

> Yes. All in the runtime and I do not find a way back to the caller.

Most likely because some of the code is not using standard stack
frames. It's also possible the stack is corrupted. What you need to
do in this case is dump the stack as dwords and walk the stack by
hand.
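
The manual walk can be sketched in C on a copy of the dumped dwords. This assumes the conventional frame layout (saved EBP at [EBP], return address at [EBP+4]); code that does not set up standard frames breaks the chain, which is exactly the case described above. `walk_frames` and its parameters are hypothetical names for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Follows EBP-linked frames through a dumped stack.
   dump[i] holds the dword at address base + 4*i.
   Stores up to max_out return addresses in ret_out and returns
   how many were found before the chain ended or looked invalid. */
size_t walk_frames(const uint32_t *dump, size_t count, uint32_t base,
                   uint32_t ebp, uint32_t *ret_out, size_t max_out)
{
    size_t found = 0;

    while (found < max_out) {
        size_t idx;
        uint32_t saved_ebp;

        /* The frame pointer must be dword aligned and inside the dump. */
        if (ebp < base || ((ebp - base) & 3u) != 0u)
            break;
        idx = (size_t)(ebp - base) / 4u;
        if (idx + 1u >= count)
            break;

        saved_ebp = dump[idx];             /* caller's EBP       */
        ret_out[found++] = dump[idx + 1u]; /* return address     */

        /* Callers live at higher addresses; anything else means the
           chain has ended or the stack is corrupt. */
        if (saved_ebp <= ebp)
            break;
        ebp = saved_ebp;
    }
    return found;
}
```

Each return address found this way can then be looked up against the symbol files by hand.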

> I can narrow down the problem to code like this -
> typedef struct _DIM {
> HMODULE hmod;
> ULONG ulModuleId;
> CHAR szModuleBaseName[32];
> ULONG ulDriverCount;
> PFNIDS pfnids;
> <SNIP>
> } DIM, *PDIM;
>
> static PDIM padim = NULL;
>
> padim = malloc( ulDimTableSize); // actually about 215 bytes in my case
> if (!padim)
> {
> rc = ERROR_NOT_ENOUGH_MEMORY;
> break;
> }
> memset( padim, 0xAA, ulDimTableSize);
> strcpy( padim->szModuleBaseName, "TestStringAB_TEST");
>
> The strcpy triggers the exception in ICAT.

Can I assume that this is the code near src\lib\drvapi\drvaccess.c:86?

FWIW, I've implemented xwlan fixes in the past so I am somewhat
familiar with the code.

The buffer size is defined by:

ulDimTableSize = ulModuleCount * sizeof( DIM);

Did you check ulModuleCount? If WtkLoadModules returns 0 modules, the
memset will succeed, but the strcpy will trap.
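
A guarded allocation along these lines would rule that case out. This is only a sketch: `DIM_SKETCH` is a cut-down stand-in for the real DIM structure (the snipped fields are not known here) and `alloc_dim_table` is a hypothetical helper, not code from xwlan.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for DIM; only fields shown in the thread. */
typedef struct {
    unsigned long hmod;
    unsigned long ulModuleId;
    char          szModuleBaseName[32];
    unsigned long ulDriverCount;
} DIM_SKETCH;

/* Allocates a zero-filled table of 'count' entries.
   Returns NULL when count is 0, when count * sizeof(DIM_SKETCH)
   would overflow, or when memory is exhausted. */
DIM_SKETCH *alloc_dim_table(size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof(DIM_SKETCH))
        return NULL;
    return calloc(count, sizeof(DIM_SKETCH));
}
```

With the zero-count case rejected up front, a later write to the first entry's szModuleBaseName can no longer land in a zero-sized block.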

If that's not it, I would switch icat to assembly mode and step through
the strcpy code. When you get to the movs instruction, look at ESI
and EDI, the source and destination addresses respectively. ECX will
be the copy count.

> I do not see why this should trash the stack so my above assumption is probably wrong.

This looks much more like a heap issue, than a stack issue.

> The good thing is
> - while being there I found out that http://trac.netlabs.org/wpstk does no compile with
> VAC anymore and needs attention too.
> So no problem finding new tasks.

:-)

> I've 'set CAT_KDB_INIT="vsf *"'. At the first sight I did not even find any reference to
> vsf and vc except on your page.

The V command is pretty much fully documented in the OS/2 Debugging
Handbook.

>And not much info in ICAT files.

I would not expect the ICAT docs to cover this in much detail, since
it is covered elsewhere. icatfaq.html does show how to use SET
CAT_KDB_INIT to do what is typically done in kdb.ini

What you want to use is:

CAT_KDB_INIT="vsf *;vce"

to let the kernel handle page faults normally.

>If I ever would find the
> time to learn more about these basics in debugging....

Necessity is the mother of invention, as they say. :-)

> Maybe I should add exceptq to wlanstat an let your trap tool decode what's going wrong
> then playing endless hours with ICAT and trying to decode myself.

It's likely to be better in the long run, especially if an issue comes
up on someone else's system.

>Moreover the starting
> problem seems to be unrelated to what I'm looking here anyway.

Yes, I tend to agree. If I knew EDI at the time of the trap, I would
probably know for sure. Since the code continues normally after
the exception, this is likely. You can see the registers if you do

r

in the PassThru window. This will also tell you exactly what the
kernel debugger thinks the trap is.

>Maybe you want to have a
> look at - http://trac.netlabs.org/xwlan/ticket/46 which is the reason why I'm started to
> play ICAT.

This one is definitely stack corruption and exceptq will
help because if you have symbols installed, it will give you a name
for the EIP address.

Andi B.

Dec 31, 2017, 6:02:46 AM
Steven Levine wrote:
> Can I assume that this is the code near src\lib\drvapi\drvaccess.c:86?

Yes.

>
> FWIW, I've implemented xwlan fixes in the past so I am somewhat
> familiar with the code.
>
> The buffer size is defined by:
>
> ulDimTableSize = ulModuleCount * sizeof( DIM);
>
> Did you check ulModuleCount? If WtkLoadModules returns 0 modules, the
> memset will succeed, but the strcpy will trap.

Yes. Moreover, I added the strcpy above myself just to make sure. You may have noticed my
comment about the 215 bytes that malloc successfully allocated a few lines above in the
code I posted here (slightly changed from drvaccess). malloc allocates successfully,
memset fills the block correctly, and my added strcpy line triggers the exception, but
when I let the exception handler run, strcpy works as expected.

My code now is -

TraceAB("ulDimTableSize=%d\n", ulDimTableSize);
_interrupt(3); // ICAT stops here as expected
padim = malloc( ulDimTableSize); // ulDimTableSize is 216
if (!padim) // padim is 0x00494130 = valid
{
    rc = ERROR_NOT_ENOUGH_MEMORY;
    break;
}
memset( padim, 0xAA, ulDimTableSize); // padim including the string region is filled correctly
TraceAB("padim=0x%08X\n", padim); // additional trace messages written to file and com1
TraceAB("padim->szModuleBaseName=0x%08X\n", padim->szModuleBaseName);
strcpy( padim->szModuleBaseName, "TestStringAB_TEST"); // <--- this triggers exception
TraceAB("padim=0x%08X\n", padim);

I've uploaded the PassThru window content to
https://www.pic-upload.de/view-34567254/icat_xwlan_trap.png.html as the newsgroup does
not allow attachments. This is from when I tried to 'Step over' the strcpy line, then
chose 'Examine....' in the exception dialog and read the PassThru window. The register
monitor and call stack windows say the same.

The TraceAB function logs the printf-style message to a file and in parallel sends it out
on com1. So while running ICAT I see the TraceAB messages in pmdf (or zoc) at the same
time on the host. This is just to make sure it is really this particular strcpy that
triggers the problem (with or without running ICAT).

For completeness here what pmdf says when running the above code (to rule out ICAT)
including my TraceAB messages -

wlanDriverAccessInitialize
Symbols linked (genmac)
Symbols linked (genprism)
TrcMsgV len=48 (XWLAN: 0: Loading Driver Modules, count 2 )
WtkLoadModules done
ulDimTableSize=216
eax=000b0a6b ebx=00000000 ecx=00485020 edx=000003f8 esi=00000000 edi=00000000
eip=0003fe91 esp=000f7300 ebp=000f756c iopl=0 -- -- -- up ei pl nz ac pe nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=414c5758 cr3=00225000 p=00
005b:0003fe91 cc int 3
##g
padim=0x00494130
padim->szModuleBaseName=0x00494138
Trap 13 (0DH) - General Protection Fault 0000
eax=61767264 ebx=61767264 ecx=61767264 edx=00000008 esi=00000000 edi=00000008
eip=0006427c esp=000f71d0 ebp=000f71ec iopl=0 rf -- -- nv up ei pl nz na po nc
cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=414c5758 cr3=00225000 p=00
005b:0006427c 8a19 mov bl,byte ptr [ecx] ds:61767264=invalid
##k
005b:000642ef 01010101 01010101 01010101 01010101 _get_stack_trace + 5f
005b:0005ff0e 00480000 004970e4 000f7244 000665c4 _int_uheap_verify + 6de
005b:0005fc1d 00480000 00000000 00081f3c 000001a9 _int_uheap_verify + 3ed
005b:00061fbb 00480000 00081f3c 000001a9 00000001 _chk_if_heap + bb
005b:0005a537 00494138 000f760c 00081f3c 000001a9 _debug_strcpy + 47
005b:73656363 00632e73 6d75645f 6e6f4370 7463656e
##

>
> If that's not it, I would switch icat to assembly mode and step though
> the strcpy code.

I've done that before and went down _chk_if_heap(dbgstr) / _int_uheap_verify(rdbg) /
_add_item 1a3(rdbg) / _get_stack_trace(memport) / _validate_ptr(memport). I then decided
this all is more a 'debug kernel' or debugging (ICAT - pmdf) problem/behavior than a real
application problem.

> When you get to the movs instruction, look at ESI
> and EDI, the source and destination addresses respectively. ECX will
> be the copy count.

IIRC these all worked fine (memory display proves the string is copied correctly) but
afterwards the _validate_ptr thinks there is something wrong (while I think it isn't).

We can go down this road again if you like. But I think we need realtime IRC chat in
parallel.

>
>> I do not see why this should trash the stack so my above assumption is probably wrong.
>
> This looks much more like a heap issue, than a stack issue.

ok.

>
> The good thing is
>> - while being there I found out that http://trac.netlabs.org/wpstk does no compile with
>> VAC anymore and needs attention too.
>> So no problem finding new tasks.
>
> :-)
>
>> I've 'set CAT_KDB_INIT="vsf *"'. At the first sight I did not even find any reference to
>> vsf and vc except on your page.
>
> The V command is pretty much fully documented in the OS/2 Debugging
> Handbook.
>
>>And not much info in ICAT files.
>
> I would not expect the ICAT docs to cover this in much detail, since
> it is covered elsewhere. icatfaq.html does show how to use SET
> CAT_KDB_INIT to do what is typically done in kdb.ini
>
> What you want to use is:
>
> CAT_KDB_INIT="vsf *;vce"
>
> to let the kernel handle page faults normally.

I found this on your page and set it that way. Although I still didn't read the debugging
handbook and don't really understand what the v* command does. But I still hope I can live
without knowing the deeper details ;-). Idebug has a list box with the various exceptions
to be selected. Something I didn't find in ICAT.

>
>>If I ever would find the
>> time to learn more about these basics in debugging....
>
> Necessity is the mother of invention, as they say. :-)
>
>> Maybe I should add exceptq to wlanstat an let your trap tool decode what's going wrong
>> then playing endless hours with ICAT and trying to decode myself.
>
> It's likely to be better in the long run, especially if an issue comes
> up on someone else's system.

Done although not yet tested. Problem is what I see here only happens with the debug
kernel. And with that I never can run wlanstat to the point where exceptq does its thing.
Running the same wlanstat app (without the int3) on the retail kernel runs without problems.

>
>>Moreover the starting
>> problem seems to be unrelated to what I'm looking here anyway.
>
> Yes, I tend to agree. If I knew EDI at the time of the trap, I would
> probably know for sure. Since the code is continues normally after
> the exception this is likely. You can see the registers if you do
>
> r
>
> in the PassThru window. This will also tell you exactly what the
> kernel debugger thinks the trap is.
>
>>Maybe you want to have a
>> look at - http://trac.netlabs.org/xwlan/ticket/46 which is the reason why I'm started to
>> play ICAT.
>
> This one is definityly definitely stack corruption and exceptq will
> help because if you have symbols installed, it will give you a name
> for the EIP address.

To my eyes this looks very similar to what I see here.

Andreas

>
> Steven
>

Lars Erdmann

Dec 31, 2017, 3:30:33 PM
Hi Andi,

I think you are right about functions not creating a proper stack frame.
I had these issues with ICAT as well.

Lars

Paul Ratcliffe

Jan 3, 2018, 3:01:01 PM
On Sun, 31 Dec 2017 12:02:50 +0100, Andi B. <and...@gmx.net> wrote:

> padim=0x00494130
> padim->szModuleBaseName=0x00494138
> Trap 13 (0DH) - General Protection Fault 0000
> eax=61767264 ebx=61767264 ecx=61767264 edx=00000008 esi=00000000 edi=00000008
> eip=0006427c esp=000f71d0 ebp=000f71ec iopl=0 rf -- -- nv up ei pl nz na po nc
> cs=005b ss=0053 ds=0053 es=0053 fs=150b gs=0000 cr2=414c5758 cr3=00225000 p=00
> 005b:0006427c 8a19 mov bl,byte ptr [ecx] ds:61767264=invalid
> ##k
> 005b:000642ef 01010101 01010101 01010101 01010101 _get_stack_trace + 5f
> 005b:0005ff0e 00480000 004970e4 000f7244 000665c4 _int_uheap_verify + 6de
> 005b:0005fc1d 00480000 00000000 00081f3c 000001a9 _int_uheap_verify + 3ed
> 005b:00061fbb 00480000 00081f3c 000001a9 00000001 _chk_if_heap + bb
> 005b:0005a537 00494138 000f760c 00081f3c 000001a9 _debug_strcpy + 47
> 005b:73656363 00632e73 6d75645f 6e6f4370 7463656e

That value in ECX is ASCII from somewhere ("drva").
You need to work out how it got there from the preceding code and then why.
I expect there's a buffer overrun somewhere corrupting something, but it can be
a nightmare to find, even with a decent debugger.
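
For reference, a small sketch of how a register value can be read as text: x86 is little-endian, so the lowest byte of the dword is the first character in memory. `dword_to_ascii` is a made-up helper name.

```c
#include <stdint.h>

/* Writes the four bytes of a little-endian dword into out as text,
   lowest address first. Non-printable bytes are shown as '.'.
   out must have room for 5 characters including the terminator. */
void dword_to_ascii(uint32_t value, char out[5])
{
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)((value >> (8 * i)) & 0xFFu);
        out[i] = (c >= 0x20 && c < 0x7F) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Applied to 0x61767264 this gives "drva". The first two dwords on the last line of the stack dump, 73656363 and 00632e73, decode the same way to "cces" and "s.c" followed by a zero byte, so these values may all be pieces of the source file name drvaccess.c mentioned earlier in the thread.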

FWIW, using strcpy() is almost bound to end up with problems like this at
some point. You really shouldn't be using it any more.
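
One bounded alternative, sketched under the assumption that the compiler's runtime provides C99 snprintf (older runtimes may not); `copy_bounded` is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/* Copies src into dst, which has room for dst_size bytes including
   the terminator. Returns 0 on success and -1 if the arguments are
   invalid or src did not fit; on truncation dst still holds a
   terminated, shortened copy. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int n;

    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    n = snprintf(dst, dst_size, "%s", src);
    return (n < 0 || (size_t)n >= dst_size) ? -1 : 0;
}
```

Note that the test string in this thread (17 characters plus terminator) fits the 32-byte szModuleBaseName field, so a bounded copy would not by itself explain the trap shown above; it only removes one class of overrun.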