Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Debugging invalid handles (in a critical section) with application verifier and windbg

1,040 views

Skip to first unread message

James S

unread,

Feb 27, 2008, 11:23:30 AM2/27/08

Hi,

I know in theory this should not be a hard exercise to at least
start. Turn on the handle and lock checks for the process, launch it
with windbg, catch the resulting exception or app verifier generated
break and go from there. Well the first problem is that this is only
reproducible on a customer system on the other side of the world that
has no external connectivity at all. The second is that using app
verifier on the process results in the problem being no longer
reproducible. I am instructing them to use the GUI tool to turn on
the verifier (version 3.4.0158) and windbg 6.8.4.0.

Here are the interesting bits when caught without app verifier (this
is easily reproducible on the customer site and I have a number of
virtually identical dumps):

eax=027cfc18 ebx=00000000 ecx=027cfc60 edx=7c90eb94 esi=01ec3eec
edi=00000000
eip=7c964ed1 esp=027cfc18 ebp=027cfc68 iopl=0 nv up ei pl zr
na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
efl=00000246
ntdll!RtlRaiseStatus+0x26:
7c964ed1 c9 leave

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 7c964ed1 (ntdll!RtlRaiseStatus+0x00000026)
ExceptionCode: c0000008 (Invalid handle)
ExceptionFlags: 00000001
NumberParameters: 0
Thread tried to close a handle that was invalid or illegal to close

ERROR_CODE: (NTSTATUS) 0xc0000008 - An invalid HANDLE was specified.

NTGLOBALFLAG: 0

APPLICATION_VERIFIER_FLAGS: 0

CRITICAL_SECTION: 01ec3eec -- (!cs -s 01ec3eec)

BLOCKING_THREAD: 00001388

DERIVED_WAIT_CHAIN:

Dl Eid Cid WaitType
-- --- ------- --------------------------
11 1e60.1388 Critical Section

WAIT_CHAIN_COMMAND: ~11s;k;;

DEFAULT_BUCKET_ID: BusyHang

PRIMARY_PROBLEM_CLASS: BusyHang

LAST_CONTROL_TRANSFER: from 7c94243c to 7c964ed1

FAULTING_THREAD: 0000000b

STACK_TEXT:
027cfc68 7c94243c c0000008 01ec3ee8 00000000 ntdll!RtlRaiseStatus+0x26
027cfce8 7c90104b 00ec3eec 0040b5be 01ec3eec ntdll!
RtlpWaitForCriticalSection+0x204
027cfcf0 0040b5be 01ec3eec 0043e5f8 01ec2b60 ntdll!
RtlEnterCriticalSection+0x46
027cfd08 0041ec2e 00000000 00000000 027cfd28 HDXDataSourceManager!
CDataObjectProxy::OnDataSourceValueChange+0x50
<snip>
normal expected call stack
<snip>
027cffec 00000000 00fbf55c 01def270 00000000 kernel32!BaseThreadStart
+0x37

BUGCHECK_STR: APPLICATION_FAULT_BusyHang_STATUS_INVALID_HANDLE

STACK_COMMAND: ~11s ; kb

--------

0:011> !cs -s 01ec3eec
-----------------------------------------
Critical section = 0x01ec3eec (+0x1EC3EEC)
DebugInfo = 0x01ec3ea0
LOCKED
LockCount = 0x20003
OwningThread = 0x00000000
RecursionCount = 0x10C0110
LockSemaphore = 0x360188
SpinCount = 0x00020003
ntdll!RtlpStackTraceDataBase is NULL. Probably the stack traces are
not enabled.

Each object uses a single CComAutoCriticalSection to protect its
members, so they *should* all be correctly created, initialized and
destroyed only once. We can not reproduce any errors using app
verifier locally.

However, as stated above the customer problem can no longer be
reproduced as soon as the verifier is switched on. To reduce the
likelihood of timing being the cause the reproduction was one with
only locks and handles checks enabled (and then with only one of each
enabled) but to no avail.

In desperation I removed the call to lock and unlock that this call
stack points to and this time was able to catch something in the
debugger with all the basic checks (including heaps - aka full page
heap) enabled. The problem is that as far as I can tell this is just
a normal exception (eventually thrown in the verifier code itself) and
not a verifier stop or an exception generated by the verifier.

eax=68646a18 ebx=68646a00 ecx=68646a18 edx=7ffaf000 esi=0043c4c0
edi=00000000
eip=00395d22 esp=1df0f8c8 ebp=1df0f8d0 iopl=0 nv up ei pl nz
ac po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000
efl=00010212
vfbasics!AVrfpVerifyInitializedCriticalSection+0x12:
00395d22 8338ff cmp dword ptr [eax],0FFFFFFFFh ds:
0023:68646a18=????????

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 00395d22 (vfbasics!
AVrfpVerifyInitializedCriticalSection+0x00000012)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 68646a18
Attempt to read from address 68646a18

DEFAULT_BUCKET_ID: APPLICATION_FAULT

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at "0x%08lx"
referenced memory at "0x%08lx". The memory could not be "%s".

READ_ADDRESS: 68646a18

BUGCHECK_STR: ACCESS_VIOLATION

NTGLOBALFLAG: 2000100

APPLICATION_VERIFIER_FLAGS: 80000181

LAST_CONTROL_TRANSFER: from 00396415 to 00395d22

STACK_TEXT:
1df0f8d0 00396415 68646a18 00000001 00000000 vfbasics!
AVrfpVerifyInitializedCriticalSection+0x12 [e:\verifier\source\base
\avrf\avrf30\providers\basics\critsect.c @ 831]
1df0f8fc 00397449 68646a18 1df0f930 00412be1 vfbasics!
AVrfpRtlEnterCriticalSection1+0x45 [e:\verifier\source\base\avrf
\avrf30\providers\basics\critsect.c @ 1480]
1df0f908 00412be1 68646a18 0043b280 00000000 vfbasics!
AVrfpRtlEnterCriticalSection+0x39 [e:\verifier\source\base\avrf
\avrf30\providers\basics\critsect.c @ 3189]
1df0f930 0040b5e9 68646a18 43aa2868 00000000 HDXDataSourceManager!
CDataSourceSite::OnDataSourceValueChange+0x7f
1df0f958 0041ec3e 0041eb34 00000000 00000000 HDXDataSourceManager!
CDataObjectProxy::OnDataSourceValueChange+0x7b
<snip>
normal expected call stack
<snip>
1df0ffb4 7c80b50b 1d4aafe0 7c9268ad 0039bb0b vfbasics!
AVrfpStandardThreadFunction+0x6f [e:\verifier\source\base\avrf
\avrf30\providers\basics\thread.c @ 652]
1df0ffec 00000000 003953e0 1d4aafe0 00000000 kernel32!BaseThreadStart
+0x37

FAULTING_THREAD: 00002218

STACK_COMMAND: ~11s; .ecxr ; kb

Continuing caused an app verifier stop, but the stop code was 13 as if
this was a buffer overrun on the heap. Th address could not be found
in any heap. If 68646a18 is a garbage critical section address, why
didn't the handle or lock check pick it up?

thanks,
James.

Dan Mihai [MSFT]

unread,

Feb 28, 2008, 11:52:34 AM2/28/08

The first step is to figure out the AVrfpVerifyInitializedCriticalSection
access violation you mentioned below. It AV-ed trying to address the address
@eax. EAX was 68646a18. That is the parameter specified by
HDXDataSourceManager!CDataSourceSite::OnDataSourceValueChange when it called
EnterCriticalSection. So the question is: why is OnDataSourceValueChange
using a bogus critical section address?

If that critical section was allocated in a heap memory block, maybe
"!avrf -hp -a 68646a18" will be able to show that the heap block has been
freed already.

Otherwise look harder at the source code and at the debugger and try to
figure out how the app ended up with that bad pointer.

Dan

--
This posting is provided "AS IS" with no warranties, and confers no rights.

"James S" <James...@gmail.com> wrote in message
news:19a05fd5-bcac-4f2f...@s12g2000prg.googlegroups.com...

0 new messages