I have been running NT since the 3.1 days and I have never had a
serious crash like I experienced today. I need some help on this one.
Today I upgraded from NT 3.51 WS sp5 to NT 4 Server sp1. On a Pentium
100 with 32M RAM (Dell Optiplex XMT 5100). Everything worked fine -
installed Website 1.1f web server and the FTP Publisher. Then I went
home and logged (via standard tcp/ip) into the machine for an FTP
transfer. That is when the system crashed completely - Ping on the
machine resulted in a time out.
When I got back to the office the system had a blue screen and was
completely locked and required a power off to reset. And Netscape -
which was running on the server when the system crashed - was wiped
off the machine and had to be reinstalled.
The blue screen displayed the following message:
============
*** STOP: 0x0000000A ( ... SOME MORE NUMBERS HERE)
IRQL_NOT_LESS_OR_EQUAL *** Address 80116bf4 has base at
80100000 - ntoskrnl.exe
CPUID some more stuff
Beginning dump of physical memory ... more text
============
What is the story here? What causes this kind of crash and dump?
Will it happen frequently? I have NEVER experienced a dump or blue
screen before in 3 years of production NT work.
Thanks. Bob.
-----------------------------
Robert Goldschmidt, Ph.D.
Sr. Computer Systems Analyst - Webmaster
National Institutes of Health
Office of the Director
Email: bob_gol...@nih.gov
Ah, welcome to NT 4.0. This is the exact same problem I wrote about a
few days ago and it happens to me at least 11 or 12 times a day. It has
happened at least 6 times since I sat down to work this evening around
7.
If anyone finds a solution to this serious NT bug PLEASE write it here!
This is _extremely_ annoying and is really not acceptable performance
for NT.
Thanks.
Chris
Regards
Andrew Sword, MVP
Dumpexam reads a memory dump file, executes debugger commands on it, and
writes the output in a text file, called Memory.txt, by default. The same
debugger commands are executed on each memory dump file.
A full interpretation of the output requires knowledge of Windows NT kernel
processes and the ability to read assembly language; however, there are some
guidelines you can follow to get an idea of what the output means. This
section first describes each part of the memory dump file output, giving
sample output and a description. Then several common traps are discussed,
along with guidelines on which sections of the Memory.txt file can help you
determine what caused the kernel STOP error.
Because the primary purpose of the dumpexam utility is to create a text file
to send to support personnel, the descriptions in this section do not provide
complete details of the contents of the Memory.txt file.
The following sections of the Memory.txt file each occur once, as they
include information that applies to the whole system. These sections are
listed in the order in which they appear in Memory.txt.
Windows NT Crash Dump Analysis
The first section of output is Windows NT Crash Dump Analysis, which looks
like the following:
****************************************************************
**
** Windows NT Crash Dump Analysis
**
****************************************************************
*
Filename . . . . . . .c:\temp\dumps\mac.dmp
Signature. . . . . . .PAGE
ValidDump. . . . . . .DUMP
MajorVersion . . . . .free system
MinorVersion . . . . .1057
DirectoryTableBase . .0x0006f005
PfnDataBase. . . . . .0x83fce000
PsLoadedModuleList . .0x800ee5c0
PsActiveProcessHead. .0x800ee590
MachineImageType . . .alpha
NumberProcessors . . .2
BugCheckCode . . . . .0x0000002e
BugCheckParameter1 . .0x00000000
BugCheckParameter2 . .0x00000000
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
ExceptionCode. . . . .0x80000003
ExceptionFlags . . . .0x00000001
ExceptionAddress . . .0x800bc140
Most of the information here is useful only for determining whether the
memory dump file is corrupted. The following items are most important,
especially if you did not record any information from the blue screen
generated when the computer trapped:
Parameter Meaning
BugCheckCode This code lists the number of the stop that occurred. The
stop code can be used by support personnel to determine what trap occurred.
For information on bug check codes, see Chapter 4, “Message Reference,” in
Windows NT Messages. Descriptions of the STOP code message start on page 441
in chapter 4 and are in numerical order. In the preceding example, the code
was 0x0000002e, which is a DATA_BUS_ERROR.
BugCheckParameters These are the four parameters that are normally
included with each STOP code. The description of the STOP code in Windows NT
Messages includes the meaning of the parameters for some of the kernel STOP
Errors.
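As a rough illustration of pulling the bug check fields out of this section, here is a minimal Python sketch. The helper names (parse_header, describe_stop) are made up for this example and are not part of dumpexam. The name table holds only the four STOP codes named in this text, and the line format is assumed to match the sample above.

```python
import re

# STOP code names taken from this text; the table is deliberately incomplete.
STOP_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x0000002E: "DATA_BUS_ERROR",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
}

# Assumed line shape: a field name, a run of dots and spaces, then the value.
HEADER_LINE = re.compile(r"^\s*(\w+)[\s.]+(.*\S)\s*$")

def parse_header(text):
    """Return {field name: value string} for the Crash Dump Analysis section."""
    fields = {}
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

def describe_stop(fields):
    """Return (code, name, [four parameters]) from the parsed fields."""
    code = int(fields["BugCheckCode"], 16)
    params = [int(fields["BugCheckParameter%d" % i], 16) for i in range(1, 5)]
    return code, STOP_NAMES.get(code, "unknown"), params

sample = """\
Signature. . . . . . .PAGE
NumberProcessors . . .2
BugCheckCode . . . . .0x0000002e
BugCheckParameter1 . .0x00000000
BugCheckParameter2 . .0x00000000
BugCheckParameter3 . .0x00000000
BugCheckParameter4 . .0x00000000
"""
code, name, params = describe_stop(parse_header(sample))
print("STOP 0x%08X %s" % (code, name), params)
```

Banner lines made of asterisks do not match the pattern and are skipped, so the whole section can be passed in unchanged.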
Symbol File Load Log
This section of the Memory.txt file includes any errors that were generated
when the symbols were loaded. If no errors were generated, this section will
be blank.
!drivers
The !drivers command is a debug command that you use to list information on
all the device drivers loaded on the system. The information for the device
drivers looks like this:
****************************************************************
** !drivers
****************************************************************
Loaded System Driver Summary
Base      Code Size        Data Size        Driver Name   Creation Time
80080000  f76c0 (989 kb)   1f100 (124 kb)   ntoskrnl.exe  Fri May 26 15:13:00 1995
80400000  d980  ( 54 kb)   4040  ( 16 kb)   hal.dll       Tue May 16 16:50:34 1995
80654000  3f00  ( 15 kb)   1060  (  4 kb)   ncrc810.sys   Fri May 05 20:07:04 1995
8065a000  a460  ( 41 kb)   1e80  (  7 kb)   SCSIPORT.SYS  Fri May 05 20:08:05 1995
The following information can be determined from the above output:
Parameter Meaning
Base The starting address of the device driver code, in hexadecimal. When
the code that causes a trap falls between the base address for a driver and
the base address for the next driver in the list, then that driver is
frequently the cause of the fault. For instance, the base for Ncrc810.sys is
0x80654000. Any address between that and 0x8065a000 belongs to this driver.
Code Size The size in kilobytes of the driver code, in both hexadecimal
and decimal.
Data Size The amount of space in kilobytes allocated to the driver for
data, in both hexadecimal and decimal.
Driver Name The driver filename.
Creation Time The link date of the driver. Do not confuse this with the
file date of the driver, which can be set by external utilities. The link
date is set by the compiler when a driver or executable file is compiled. It
should be close to the file date, but it will not always be the same.
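The base address rule described above can be sketched in a few lines of Python. This is only an illustration: find_driver is a hypothetical helper, and the table is copied by hand from the four sample rows rather than read from a real Memory.txt.

```python
from bisect import bisect_right

# (base address, driver name) pairs copied from the sample !drivers output.
DRIVERS = [
    (0x80080000, "ntoskrnl.exe"),
    (0x80400000, "hal.dll"),
    (0x80654000, "ncrc810.sys"),
    (0x8065A000, "SCSIPORT.SYS"),
]

def find_driver(address, drivers=DRIVERS):
    """Return the driver with the highest base that is not above address.

    Returns None when the address is below every base. The last driver in
    the list has no following base to bound its range, so a match there
    deserves a second look against its code size.
    """
    ordered = sorted(drivers)
    bases = [base for base, _ in ordered]
    index = bisect_right(bases, address) - 1
    if index < 0:
        return None
    return ordered[index][1]

print(find_driver(0x80655000))
```

With the sample table, any address from 0x80654000 up to (but not including) 0x8065a000 resolves to ncrc810.sys, which matches the example given for Base.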
!locks
The !locks command is a debugger command that displays all locks held on
resources by threads. A lock can be shared or exclusive; an exclusive lock
means no other threads can access that resource. This information is useful when a deadlock
occurs on a system, because a deadlock is caused when one nonexecuting thread
holds an exclusive lock on a resource needed by an executing thread.
****************************************************************
** !locks -p -v -d
****************************************************************
*
**** DUMP OF ALL RESOURCE OBJECTS ****
KD: Scanning for held locks.................
Resource @ 0xffb6ed14 Shared 2 owning threads
Threads: ffb3bb70-01
0012fb50: Unable to read ThreadCount for resource
Resource @ 0xffb6ecdc Shared 2 owning threads
Threads: ffb3bb70-02
0012fb50: Unable to read ThreadCount for resource
!memusage
The !memusage command gives a short description of the current memory use of
the system. Then it gives a much longer listing of the memory usage summary.
The output looks something like this:
****************************************************************
** !memusage
****************************************************************
*
loading PFN database...................................................
Zeroed: 405 ( 3240 kb)
Free: 0 ( 0 kb)
Standby: 3242 ( 25936 kb)
Modified: 135 ( 1080 kb)
ModifiedNoWrite: 0 ( 0 kb)
Active/Valid: 4410 ( 35280 kb)
Transition: 0 ( 0 kb)
Unknown: 0 ( 0 kb)
TOTAL: 8192 ( 65536 kb)
Usage Summary in KiloBytes (Kb):
Control Valid Standby Dirty Shared Locked PageTables name
80975548 0 56 0 0 0 0 mapped_file(oemnxpip.inf)
80975248 0 16 0 0 0 0 mapped_file(oemnxpnb.inf)
8096aa68 0 160 0 0 0 0 mapped_file(SFMATALK.SY_)
80974f48 0 104 0 0 0 0 mapped_file(oemnxpsm.inf)
809758e8 0 96 0 0 0 0 mapped_file(utility.inf)
This section provides information for some memory leak issues, but it is more
useful to refer to the !vm section for memory information for most common
kernel STOP errors.
!vm
The !vm command lists the system’s virtual memory usage. The output of !vm
looks like this:
****************************************************************
** !vm
****************************************************************
*
*** Virtual Memory Usage ***
Physical Memory: 32784 (131136 Kb)
Available Pages: 27435 (109740 Kb)
Modified Pages: 33 ( 132 Kb)
NonPagedPool Usage: 461 ( 1844 Kb)
PagedPool 0 Usage: 1519 ( 6076 Kb)
PagedPool 1 Usage: 125 ( 500 Kb)
PagedPool 2 Usage: 149 ( 596 Kb)
PagedPool Usage: 1793 ( 7172 Kb)
Shared Commit: 173 ( 692 Kb)
Process Commit: 254 ( 1016 Kb)
PagedPool Commit: 1793 ( 7172 Kb)
Driver Commit: 321 ( 1284 Kb)
Committed pages: 4261 ( 17044 Kb)
Commit limit: 80792 (323168 Kb)
All memory usage is listed in pages and in kilobytes. The most useful
information in the !vm section for diagnosing problems is:
Parameter Meaning
Physical Memory The total physical memory in the system.
Available Pages The number of pages of memory available on the system, both
virtual and physical. If this is low, it might indicate a problem with a
process allocating too much virtual memory.
NonPagedPool Usage The number of pages allocated to the nonpaged pool.
The nonpaged pool is memory that cannot be swapped out to the pagefile, so it
must always occupy physical memory. This number should rarely be larger than
10% of the total physical memory. If it is larger, this is usually an
indication that there is a memory leak somewhere in the system.
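The 10% guideline is easy to check by hand, but a small Python sketch makes the arithmetic explicit. The function names are invented for this example. The 4 KB page size is inferred from the sample !vm output (32784 pages shown as 131136 Kb); the !memusage sample earlier works out to 8 KB pages, so the page size is left as a parameter.

```python
def pages_to_kb(pages, page_size_kb=4):
    """Convert a page count to kilobytes for an assumed page size."""
    return pages * page_size_kb

def nonpaged_pool_fraction(nonpaged_pages, physical_pages):
    """Fraction of physical memory occupied by the nonpaged pool."""
    return nonpaged_pages / physical_pages

def nonpaged_pool_suspicious(nonpaged_pages, physical_pages, threshold=0.10):
    """Apply the rule of thumb from the text: above 10% suggests a leak."""
    return nonpaged_pool_fraction(nonpaged_pages, physical_pages) > threshold

# Values from the sample !vm output.
physical_pages = 32784
nonpaged_pages = 461

print(pages_to_kb(physical_pages), "Kb physical")
print("nonpaged pool: %.1f%% of physical memory"
      % (100 * nonpaged_pool_fraction(nonpaged_pages, physical_pages)))
print("suspicious:", nonpaged_pool_suspicious(nonpaged_pages, physical_pages))
```

For the sample system the nonpaged pool is well under the threshold, so the output gives no hint of a leak.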
!errlog
The debugger sometimes keeps track of kernel errors logged by the system when
a problem occurs. The !errlog section contains a dump of this log. In most
cases, the error log is empty. If it is not empty, you can sometimes use it
to determine the component or process that caused the blue screen.
!irpzone full
An I/O Request Packet (IRP) is a data structure used by device drivers
and other kernel mode modules to communicate information to each other. The
!irpzone full command displays a list of all the pending IRPs on the system.
The following information is displayed in this section:
****************************************************************
** !irpzone full
****************************************************************
*
Small Irp list
Irp is from zone and active with 1 stacks 1 is current
No Mdl System buffer = fb564000 Thread fb5688a0: Irp stack trace.
cmd flg cl Device File Completion-Context
> d 0 1 fb56a030 fb56cd48 00000000-00000000 pending \FileSystem\MacSrv
Args: 00001000 00000000 00121020 00000000
Large Irp list
Irp is from zone and active with 4 stacks 5 is current
No Mdl Thread fb4b6860: Irp is completed. Pending has been returned
cmd flg cl Device File Completion-Context
0 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
0 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
0 0 0 00000000 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
d 0 0 fb5e3020 00000000 f8a8c711-fb48df10
\FileSystem\Ntfs SrvCompleteRfcbClose
Args: 00000000 00000000 00000000 00000000
Each entry lists information about a different IRP and points to the driver
that currently owns the IRP. This information can be useful when the trap
analysis (which occurs later in the Memory.txt file) points to a problem with
a corrupted or bad IRP. The IRP listing usually contains several entries in
both the small and large IRP lists.
!process 0 0
This command lists all processes and their headers. The process header list
will contain entries like the following:
****************************************************************
** !process 0 0
****************************************************************
*
**** NT ACTIVE PROCESS DUMP ****
PROCESS fb667a00 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00030000 ObjectTable: e1000f88 TableSize: 112.
Image: System
PROCESS fb5edde0 Cid: 0018 Peb: 7ffdf000 ParentCid: 0002
DirBase: 01587000 ObjectTable: e11d59a8 TableSize: 48.
Image: SMSS.EXE
The important information in the !process 0 0 section is:
Parameter Meaning
Process address The 8-character hexadecimal number after the word PROCESS is
the address of the process object, which the debugger uses to identify the
process. The process ID itself is the Cid value. For the first process in the
example, the address is fb667a00.
Image The name of the module that owns the process. In the above example,
the first process is owned by System, the second by Smss.exe.
!process 0 7
This command also lists process information. But instead of just listing the
process header, the !process 0 7 command lists all information about the
process, including all threads owned by each process. This is a very long
listing because each system has a large number of processes and each process
has one or more threads. In addition, if the stack from a thread is resident
in kernel memory (as opposed to swapped to the page file), it is listed after
the thread information. Most process and thread listings look like the
following:
****************************************************************
** !process 0 7
****************************************************************
*
**** NT ACTIVE PROCESS DUMP ****
PROCESS fb667a00 Cid: 0002 Peb: 00000000 ParentCid: 0000
DirBase: 00030000 ObjectTable: e1000f88 TableSize: 112.
Image: System
VadRoot fb666388 Clone 0 Private 4. Modified 9850. Locked 0.
FB667BBC MutantState Signalled OwningThread 0
Token e10008f0
ElapsedTime 15:06:36.0338
UserTime 0:00:00.0000
KernelTime 0:00:54.0818
QuotaPoolUsage[PagedPool] 1480
Working Set Sizes (now,min,max) (3, 50, 345)
PeakWorkingSetSize 118
VirtualSize 1 Mb
PeakVirtualSize 1 Mb
PageFaultCount 992
MemoryPriority BACKGROUND
BasePriority 8
CommitCharge 8
THREAD fb667780 Cid 2.1 Teb: 00000000 Win32Thread: 80144900 WAIT:
(WrFreePage) KernelMode Non-Alertable
80144fc0 SynchronizationEvent
Not impersonating
Owning Process fb667a00
WaitTime (seconds) 32278
Context Switch Count 787
UserTime 0:00:00.0000
KernelTime 0:00:21.0821
Start Address Phase1Initialization (0x801aab44)
Initial Sp fb26f000 Current Sp fb26ed00
Priority 0 BasePriority 0 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
fb26ed18 80118efc c0502000 804044b0 00000000 KiSwapThread+0xb5
fb26ed3c 801289d9 80144fc0 00000008 00000000 KeWaitForSingleObject+0x1c2
The following entries in the process information can be important:
Parameter Meaning
UserTime Lists the amount of time the process has been running in user
mode. If the value for UserTime is exceptionally high, it might identify a
process that is taking up all the resources and starving the system.
KernelTime Lists the amount of time the process has been running in
kernel mode. If the value for KernelTime is exceptionally high, it might
identify a process that is taking up all the resources and starving the
system.
Working Set Size Lists the current, minimum, and maximum working set
size for the process, in pages. An exceptionally large working set size can
also be a sign of a process that is leaking memory or using too many system
resources.
QuotaPoolUsage Entries List the paged and nonpaged pool used by the process.
On a system with a memory leak, looking for excessive nonpaged pool usage on
all the processes can tell you which process has the memory leak.
In addition to the process list information, the thread information also
contains a list of the resources on which the thread has locks. This
information is listed right after the thread header. In this example, the
thread has a lock on one resource, a SynchronizationEvent with an address of
80144fc0. By comparing this address to the list of locks shown in the !locks
section, you can determine which threads have exclusive locks on resources.
Processor-Specific Information in Memory.txt
The following sections in the Memory.txt file occur once for each processor
on the system. In a four-processor system, these sections will be repeated
for processors 0 through 3. In addition, some traps, such as STOP 0x0000001E,
generate a few extra sections.
Register Dump for Processor #x
A dump of the state of all registers at the time of the trap is included in
this section. For an x86-based system, it appears as follows:
****************************************************************
** Register Dump For Processor #0
****************************************************************
*
eax=ffdff13c ebx=00000000 ecx=00000000 edx=fb5a7db4 esi=00000d31 edi=00000d31
eip=8013b446 esp=f88b6de4 ebp=f88b6df8 iopl=0 nv up di pl nz na pe nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00000286
cr0=8001003b cr2=00000d31 cr3=00030000 dr0=00000000 dr1=00000000 dr2=00000000
dr3=00000000 dr6=ffff0ff0 dr7=00000400 cr4=00000000
gdtr=80036000 gdtl=03ff idtr=80036400 idtl=07ff tr=0028 ldtr=0000
For a RISC-based system, the register dump varies from processor type to
processor type. The following example is from a DEC Alpha system:
v0=80006000 t0=00000000 t1=00000000 t2=800ef538
t3=00000008 t4=00000000 t5=800ec440 t6=00000000
t7=00000000 s0=c53f2000 s1=00000002 s2=00000001
s3=00000000 s4=00000001 s5=0018da83 fp=fc90f940
a0=00000002 a1=c53f2000 a2=c53f2000 a3=00000000
a4=00000000 a5=00000002 t8=800ed580 t9=80a4752c
t10=c53f2000 t11=80a4752c ra=8009b0bc t12=80a61ecc
at=a0000000 gp=800ed430 sp=fc90f890 zero=00000000
pcr=0000000008000000 softfpcr=0000000000000000 fir=800bf2fc
psr=0000000a
mode=0 ie=1 irql=2
In general, the register dump is valuable only if you are skilled in reading
assembly language on the system you are debugging.
Stack Trace for Processor x
The next section includes a trace of the stack for that processor. The stack
trace is important because it tells you what functions were called. You can
use it to trace back from a trap to determine why it happened. Included right
after each stack trace is a section of disassembled code from the area in
memory around the last instruction in the stack. This information also looks
different, depending on platform.
The first example is an excerpt from an x86-based computer on which a STOP
0x0000000A occurred:
****************************************************************
** Stack Trace
****************************************************************
*
ChildEBP RetAddr Args to Child
f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])
f88b6df8 fb4a71a0 fb4a6028 f89805b0 fb55ea88 NTSend+0x142
8013B430: 8B 4D 64 mov ecx,dword ptr [ebp+64h]
8013B433: 83 E1 02 and ecx,2
8013B436: D1 E9 shr ecx,1
8013B438: 8B 75 68 mov esi,dword ptr [ebp+68h]
8013B43B: 56 push esi
8013B43C: 51 push ecx
8013B43D: 50 push eax
8013B43E: 57 push edi
8013B43F: 6A 0A push 0Ah
8013B441: E8 00 C6 FD FF call KiTrap0E+24Eh
--->8013B446: F7 45 70 00 00 02 test dword ptr [ebp+70h],offset
KiTrap0E+255h
00
8013B44D: 74 0D je KiTrap0E+268h
8013B44F: 83 3D EC 05 14 80 cmp dword ptr [KiTrap0E+25Dh],0
00
8013B456: 0F 85 29 FE FF FF jne KiTrap0E+264h
8013B45C: 83 3D 38 49 14 80 cmp dword ptr [KiTrap0E+26Ah],0
00
8013B463: 0F 85 1C FE FF FF jne KiTrap0E+271h
8013B469: 83 3D C0 4D 14 80 cmp dword ptr [KiTrap0E+277h],0
00
8013B470: 0F 85 0F FE FF FF jne KiTrap0E+27Eh
8013B476: B8 FF 00 00 00 mov eax,offset KiTrap0E+283h
8013B47B: EB AC jmp KiTrap0E+235h
8013B47D: A1 52 F0 DF FF mov eax,[KiTrap0E+28Ah]
8013B482: C6 05 52 F0 DF FF mov byte ptr [KiTrap0E+290h],0
The arrow (--->) indicates the line in the assembly code at which the system
trap occurred.
The most important information here is the stack trace at the top. This tells
you in which part of the code the system trapped. Each line of a stack trace
represents one function call (a stack frame), with the first line being the
most recent call. The following information is included in each line of an
x86 stack trace:
Parameter Meaning
ChildEBP The base pointer. This is an address on the stack.
RetAddr The return address. This is the address that the processor returns to
when it finishes executing the current function. This address lies inside
the function named on the next line of the stack.
Args to Child The first three arguments passed to the function when it was
called. These are usually pointers, but can also be other values.
Function name and offset The final piece of information is a function
name and an offset into that function that identifies the location, in code,
whose address was pushed on the stack.
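To make the four fields concrete, here is a hedged Python sketch that splits one x86 stack trace line into them. Frame and parse_frame are names invented for this example, and the pattern assumes the whole frame is on one line, as in the x86 sample above. Frames that dumpexam wraps onto a second line would need to be rejoined first.

```python
import re
from typing import NamedTuple, Optional, Tuple

class Frame(NamedTuple):
    child_ebp: int              # base pointer (an address on the stack)
    ret_addr: int               # return address
    args: Tuple[int, int, int]  # first three arguments to the function
    function: str               # symbol name
    offset: int                 # offset into that function

# Five 8-digit hex fields followed by symbol+0xoffset (assumed layout).
FRAME = re.compile(
    r"^\s*([0-9a-f]{8})\s+([0-9a-f]{8})"
    r"\s+([0-9a-f]{8})\s+([0-9a-f]{8})\s+([0-9a-f]{8})"
    r"\s+(\S+?)\+0x([0-9a-f]+)",
    re.IGNORECASE,
)

def parse_frame(line: str) -> Optional[Frame]:
    """Parse one stack trace line, or return None if it is not a frame."""
    match = FRAME.match(line)
    if match is None:
        return None
    ebp, ret, a1, a2, a3, function, offset = match.groups()
    return Frame(
        int(ebp, 16),
        int(ret, 16),
        (int(a1, 16), int(a2, 16), int(a3, 16)),
        function,
        int(offset, 16),
    )

line = "f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])"
frame = parse_frame(line)
print(frame.function, hex(frame.offset), hex(frame.ret_addr))
```

The column header line does not match the pattern, so it can be fed through the same function and simply yields None.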
The next example is from a DEC Alpha system that experienced STOP 0x0000002E:
Callee-SP Arguments to Callee Call Site
fc8e4f90 80403e08 : 80ae1060 00000000 00000000 00000000 KeBugCheckEx+0x58
fc8e5290 800c3ce8 : 80ae1060 00000000 00000000 00000000 HalMachineCheck+0x198
fc8e52d0 800c33b8 : 80ae1060 00000000 00000000 00000000 KiMachineCheck+0x28
fc8e52e0 800c1c20 : 80ae1060 00000000 00000000 00000000
KiDispatchException+0x68
fc8e55e0 800c1bcc : 80ae1060 00000000 00000000 00000000
KiExceptionDispatch+0x50
fc8e5680 80409d4c : 80ae1060 00000000 00000000 00000000
KiGeneralException+0x4
fc8e5880 f7361344 : 80ae1060 00000000 00000000 00000000
READ_REGISTER_UCHAR+0x6c
fc8e5880 f71313c4 : 80ae1060 00000000 00000000 00000000
AtalkReceiveIndication+0x654
fc8e5930 f71361a4 : 80ae1060 00000000 00000000 00000000
EthFilterDprIndicateReceive+0x234
fc8e5990 f713218c : 80ae1060 00000000 00000000 00000000
MiniportSendLoopback+0xb14
fc8e5a30 f71308d8 : 80ae1060 00000000 00000000 00000000
MiniportSyncSend+0x20c
fc8e5a70 f73628c0 : 80ae1060 00000000 00000000 00000000 NdisMSend+0x158
800BC12C: B21DF170 stl a0,KeBugCheckEx+80x4(gp)
800BC130: 0000001C call_pal rdpcr
800BC134: A0000CA0 ldl v0,KeBugCheckEx+80x4(v0)
800BC138: 22000060 lda a0,KeBugCheckEx+80x5(v0)
800BC13C: D3406778 bsr ra,RtlCaptureContext
--->800BC140: 0000001C call_pal rdpcr
800BC144: A0000CA0 ldl v0,KeBugCheckEx+t0x5(v0)
800BC148: 22000060 lda a0,KeBugCheckEx+t0x6(v0)
800BC14C: D34006DC bsr ra,KiSaveProcessorControlState
800BC150: 0000001C call_pal rdpcr
800BC154: 45299801 xor s0,76,t0
800BC158: 221E00D0 lda a0,KeBugCheckEx+o0x7(sp)
800BC15C: A0000CA0 ldl v0,KeBugCheckEx+o0x7(v0)
800BC160: 223F0230 mov KeBugCheckEx+E0x78,a1
800BC164: 22400060 lda a2,KeBugCheckEx+o0x7(v0)
800BC168: D340803D bsr ra,OtsMove
800BC16C: 47EB0402 mov s2,t1
In an Alpha stack trace, the Callee-SP parameter serves the same purpose as
the ChildEBP parameter in the x86 stack. The number right after the Callee-SP
is the return address, and the next four numbers are the arguments that were
pushed on the stack. The values for these are usually 0 because a RISC-based
system uses special registers and does not pass arguments on the stack.
!process
A !process command without any parameters lists information on the process
currently running on the active processor. Its output looks exactly the same
as the output in the !process 0 7 section, except that it is only for one
process, and no thread information is listed.
!thread
A !thread command without any parameters behaves exactly as a !process
command without any parameters, and lists the thread that is currently
running. The thread output looks exactly the same as the output in the
!process 0 7 section.
Note There are three very similar versions of the same information so it is
easier to find which thread(s) are currently executing. A !process 0 7
command lists all process and thread information, which results in 10–15
pages of data just for the process and thread output. Picking out the process
or thread that is currently running from this long list can be difficult.
Dump Analysis Heuristics for Bugcode
This section appears in a dump for the processor that actually caused the
trap only. This section includes information specific to the STOP code and
can be very important. The exact information presented in this section varies
for different STOP codes, but it lists the address at which the STOP occurred
and any more information that is available.
This is an example from STOP 0x0000000A:
****************************************************************
** Dump Analysis Heuristics for Bugcode IRQL_NOT_LESS_OR_EQUAL
****************************************************************
*
Invalid Address Referenced: 0x00000020
IRQL: 2
Access Type: Write
Code Address: 0xfa6325a5
This example is from a STOP 0x0000001E:
****************************************************************
** Dump Analysis Heuristics for Bugcode KMODE_EXCEPTION_NOT_HANDLED
****************************************************************
*
Exception Code: 0xc0000005
Address of Exception: 0x801704a7
Parameter #0: 0x00000001
Parameter #1: 0x00000001
Common STOP Codes
By looking through the Memory.txt output of common STOP codes, you can
sometimes identify the module or driver that caused the problem. Given this
information, you might be able to determine whether a service pack or update
to Windows NT will fix the problem. In many cases, you will still need to
contact support personnel, but looking at the Memory.txt output gives you an
idea about what is wrong.
STOP 0x0000000A IRQL_NOT_LESS_OR_EQUAL
STOP 0x0000000A indicates that a kernel mode process or driver attempted to
access a memory address that it did not have permission to access. The most
common cause of this error is a bad or corrupted pointer to an incorrect
location in memory. A pointer is a variable used by a program to refer to a
block of memory. If the variable has an incorrect value in it, then the
program tries to access memory that it should not be using.
When this occurs in a user-mode application, it generates an access
violation.
When it occurs in kernel mode, it generates a STOP 0x0000000A message. This
trap can be caused by either hardware or software. Contact support personnel
to determine the exact cause.
To determine the general cause of a STOP 0x0000000A message, look at the
Stack Trace for Processor X section of the Memory.txt file. If you have a
multiprocessor system, check the output for all processors and look for a
stack trace that has a line similar to the following at the top of the stack:
ChildEBP RetAddr Args to Child
f88b6e00 f89805b0 fb55ea88 fb55e988 fb55ea88 KiTrap0E+0x252 (FPO: [0,0,0])
This is the processor on which the trap occurred. After the stack trace
section, additional information on the trap appears in the Dump Analysis
Heuristics section. To determine the module that caused the trap, look at the
line on the stack trace occurring immediately after the line in the preceding
example. This line is usually the line of code that caused the trap. From
this information, you can identify the module in which the trap occurred. For
example, the top lines of the stack trace can read:
ChildEBP RetAddr Args to Child
fa679758 fa6325a5 fcdb0b58 fccd3770 02611e6c KiTrap0E+0x252
fa6797e0 fa63ae8e fcc37528 fa67992e fccd3770 FindNameOrQuery+0x141
fa679838 fa6444a5 fa679854 fa6a33d0 fa6798d0 NbtConnect+0x3ae
fa679860 fa630393 fccd3770 fcdb2e08 fa679900 NTConnect+0x2b
The first line of the stack trace contains the reference to KiTrap0E and the
second line contains FindNameOrQuery+0x141, which means that the processor
trap occurred in the function FindNameOrQuery.
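The "line after KiTrap0E" rule can be sketched as a short Python function. This is an illustration only: faulting_frame is a hypothetical helper, it assumes the symbol is the sixth whitespace-separated field on each frame line, and it ignores any module prefix such as NT! when comparing names.

```python
def faulting_frame(stack_lines, trap_handler="KiTrap0E"):
    """Return the symbol on the frame after the trap handler, or None."""
    symbols = []
    for line in stack_lines:
        tokens = line.split()
        # Skip the column header and anything that is not a frame line.
        if len(tokens) >= 6 and "+0x" in tokens[5]:
            symbols.append(tokens[5])
    for index, symbol in enumerate(symbols[:-1]):
        # Drop the +0x offset and any "module!" prefix before comparing.
        name = symbol.split("+")[0].split("!")[-1]
        if name == trap_handler:
            return symbols[index + 1]
    return None

trace = """\
ChildEBP RetAddr Args to Child
fa679758 fa6325a5 fcdb0b58 fccd3770 02611e6c KiTrap0E+0x252
fa6797e0 fa63ae8e fcc37528 fa67992e fccd3770 FindNameOrQuery+0x141
fa679838 fa6444a5 fa679854 fa6a33d0 fa6798d0 NbtConnect+0x3ae
fa679860 fa630393 fccd3770 fcdb2e08 fa679900 NTConnect+0x2b
""".splitlines()

print(faulting_frame(trace))
```

On a multiprocessor dump, the same function can be run over each processor's stack trace; a result of None means that processor's stack has no trap handler frame.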
STOP 0x0000001E KMODE_EXCEPTION_NOT_HANDLED
STOP 0x0000001E can also be caused by either hardware or software. It is
caused by hardware more often than a STOP 0x0000000A is, but can be caused by
software.
When looking at dumpexam output from STOP 0x0000001E, you see two stack trace
listings for the processor on which the STOP occurred. The first listing is
the stack after the trap occurred, which shows only the kernel calls made to
handle the trap and does not include any information about what code caused
the trap.
The second listing shows the stack just before the trap occurred. This is the
listing you use for your analysis. The register dump for the processor is
also duplicated, with the first dump showing the status of the registers
after the trap and the second showing the state of the registers when the
trap occurred. These two sets of information are separated by a section that
looks like the following:
****************************************************************
** !exr fca49c20
****************************************************************
*
Exception Record @ FCA49C20:
ExceptionCode: c0000005
ExceptionFlags: 00000000
Chained Record: 00000000
ExceptionAddress: 801704a7
NumberParameters: 00000002
Parameter[0]: 00000001
Parameter[1]: 00000001
This section includes the following information:
Parameter Meaning
ExceptionCode A status code that identifies what type of exception
occurred. In this case, the code is c0000005, which indicates an access
violation. To find out what a particular status code means, contact support
personnel.
ExceptionAddress The address of the instruction that caused the STOP.
The first stack trace from STOP 0x0000001E, the one that does not provide any
useful information, looks like the following:
ChildEBP RetAddr Args to Child
fca49968 8013387e fca49990 801367ab fca49998
PspUnhandledExceptionInSystemThread+0x18 (FPO: [0,0,0])
fca49970 801367ab fca49998 00000000 fca49998 PspSystemThreadStartup+0x4a
(FPO: [0,0,0])
fca49f7c 8013e452 fca54bae 00000001 00000000 _except_handler3+0x47
00000000 00000000 00000000 00000000 00000000 KiThreadStartup+0x16
To determine where the trap occurred, ignore this stack and look at the
second listing, after the !exr entry. The first line in this listing
indicates the location in code that caused the trap.
With STOP 0x0000001E, it is also useful to compare the exception address
listed in the !exr section to the list of device drivers in the !drivers
section of the Memory.txt file. If the trap was caused by a specific driver,
this address falls into the address range in the drivers list. If this is the
case, it can indicate a problem either with the device that the driver
controls or with the driver itself. Here is an example:
FramePtr RetAddr Param1 Param2 Param3 Function Name
fa1bcda4 8010e244 fcff3940 00000000 00000220 NT!PsReturnPoolQuota+0xe
fa1bcdd4 80117085 fcbee668 fcddf648 fcbff020 NT!ExFreePool+0x16c
fa1bce24 8011c60b fcddf648 fa1bce58 fa1bce54 NT!IopCompleteRequest+0xbd
fa1bce5c 8013de15 00000000 00000000 00000000 NT!KiDeliverApc+0x83
fa1bce7c 8011a1ce 00000000 00000000 80179a01 NT!@KiSwapThread@0+0x15d
fa1bcea0 80179b3f fcc4bf60 00000006 80179a01 NT!KeWaitForSingleObject+0x1c2
fa1bcef0 80139b09 00000114 00000001 00000000 NT!NtWaitForSingleObject+0xaf
fa1bcef0 77f893eb 00000114 00000001 00000000 NT!KiSystemService+0xa9
00000000 00000000 00000000 00000000 00000000 NTDLL!ZwWaitForSingleObject+0xb
STOP 0x0000007F UNEXPECTED_KERNEL_MODE_TRAP
STOP 0x0000007F usually occurs in the processor itself and almost always
indicates a hardware fault. There are several kinds of STOP 0x0000007F, which
you can determine by the first parameter of the STOP code, found in the
Windows NT Crash Dump Analysis section at the beginning of the Memory.txt
file.
The following are common kernel mode traps:
First Parameter   Meaning
0x00000000        Divide by zero error
0x00000004        Arithmetic overflow
0x00000006        Invalid opcode
0x00000008        Double fault
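For scripted triage, the table can be expressed as a small lookup. The Python sketch below covers only the four values listed in the table, and describe_trap is a name invented for this example. It takes the first parameter as the text string shown in the Crash Dump Analysis section.

```python
# First-parameter values and meanings exactly as listed in the table.
TRAP_MEANINGS = {
    0x00000000: "Divide by zero error",
    0x00000004: "Arithmetic overflow",
    0x00000006: "Invalid opcode",
    0x00000008: "Double fault",
}

def describe_trap(parameter_text):
    """Map a BugCheckParameter1 string such as '0x00000008' to its meaning."""
    value = int(parameter_text, 16)
    return TRAP_MEANINGS.get(value, "not in the table above")

print(describe_trap("0x00000008"))
```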
A divide by zero error is caused when a DIV instruction is executed and the
divisor is 0. This can be caused by problems which need to be investigated
further, such as memory corruption, hardware problems, or software failures.
Here’s an example of a divide by zero error:
ChildEBP RetAddr Args to Child
8019d778 8013cdcc fe483688 00000000 00000000 NT!_KiSystemFatalException+0xe (FPO: [0,0] TrapFrame @ 8019d778)
8019d7e8 fbb053be 0001440d 000004a9 000004a9 NT!_RtlEnlargedUnsignedDivide+0xc (FPO: [4,0,0])
8019d80c 8010f613 0001440d 000004a9 fe482bd0 bhnt!_BhStationQueryTimeout+0x44 (FPO: [4,0,1])
8019d820 fb910aa6 fe50a000 fe44255a fe44254c NT!_KeSetTimer+0x8f
8019d85c fb9409b3 fe4820c8 fe44255a fe44254c NDIS!_EthFilterDprIndicateReceive+0x111
8019d894 fb94044a fe482b98 fe483688 ffdff401 netflx!NetFlexProcessEthRcv+0x85
8019d8ac fb910ba1 fe482aa8 fb910b30 00000001 netflx!_NetFlexHandleInterrupt+0x4a
8019d8c4 80137c06 fe482bac fe482b98 00000000 NDIS!_NdisMDpc+0x71 (FPO: [EBP 0xfb910b30] [4,0,4])
fb910b30 18247c8b 8b34778b 4e8d106f d015ff30 NT!_KiIdleLoop+0x5a
kd> !trap 8019d778
eax=0001440d ebx=00000003 ecx=8019d81c edx=000004a9 esi=fe4820c8 edi=fe46a188
eip=8013cdcc esp=8019d7ec ebp=8019d820 iopl=0 nv up ei pl zr na po nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010246
ErrCode = 00000000
8013cdcc f774240c div dword ptr [esp+0xc]
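One detail worth knowing when reading a trap frame like this: the x86 DIV instruction raises the same divide error not only for a zero divisor but also when the quotient does not fit in 32 bits. The Python sketch below models that rule. The eax and edx values are taken from the !trap output; the divisor of 0x4a9 is an assumption read from the third argument shown for RtlEnlargedUnsignedDivide, since the dump does not show the contents of [esp+0xc] directly. The names div32 and DivideError are hypothetical.

```python
class DivideError(Exception):
    """Stands in for the processor's divide error trap (trap 0)."""


def div32(edx: int, eax: int, divisor: int):
    """Model x86 'div r/m32': divide the 64-bit value EDX:EAX by divisor.

    Returns (quotient, remainder), or raises DivideError in the two
    cases where the processor would trap.
    """
    if divisor == 0:
        raise DivideError("divisor is zero")
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise DivideError("quotient does not fit in 32 bits")
    return quotient, remainder


if __name__ == "__main__":
    try:
        # eax and edx from the trap frame; divisor 0x4a9 is an assumption.
        div32(edx=0x000004A9, eax=0x0001440D, divisor=0x000004A9)
    except DivideError as err:
        print("divide error:", err)
```

Under that assumption the high half of the dividend equals the divisor, so the quotient would need more than 32 bits and the trap fires even though the divisor is not zero.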
An arithmetic overflow error occurs when the result of a multiplication
operation is larger than a 32-bit integer. This error can be caused by a
software failure, but it is also frequently a hardware problem.
An invalid opcode error occurs when the processor attempts to execute an
instruction that is not defined. This error is almost always caused by
hardware memory corruption. If you receive this error, run memory diagnostics
on your regular memory and both L1 and L2 cache memory.
A double fault trap occurs when two kernel-mode traps occur simultaneously
and the processor is unable to handle them. This trap is almost always caused
by hardware failure.
If a particular trap can be caused by either software or hardware, more
analysis is required to determine which is the cause. If you suspect a
hardware problem, try the following hardware troubleshooting steps:
1. Run diagnostic software to test the RAM in the computer. Replace any
RAM reported to be bad. Also, make sure that all the RAM in the computer is
the same speed.
2. Try removing or swapping controllers, cards, or other peripherals.
3. Try a different motherboard in the computer.
> > Today I upgraded from NT 3.51 WS sp5 to NT 4 Server sp1. On a Pentium
> > 100 with 32M RAM (Dell Optiplex XMT 5100). Everything worked fine -
> > installed Website 1.1f web server and the FTP Publisher. Then I went
> > home and logged (via standard tcp/ip) into the machine for an FTP
> > transfer. That is when the system crashed completely - Ping on the
> > machine resulted in a time out.
...
> If anyone finds a solution to this serious NT bug PLEASE write it here!
> This is _extremely_ annoying and is really not acceptable performance
> for NT.
Known bug... B^( Service Pack 2 or AFD hotfix take care of that.
--
With best wishes,
-Boris Molodyi,
Marco Consulting Group
I get a VERY similar dump screen as soon as I get an access to the
web service on my NT 4.0/SP1 WS.
--
.--. .--.
| | | | .-. .--------------------------------------.
| | | |/ / | Rickard Borgmäster |
.-^ | .--. | < | rbo...@rbk.sollentuna.se |
( o | ( () ) | |\ \ | http://www.rbk.sollentuna.se/~rbo_95 |
`-----' `--' `--' `--' `--------------------------------------'
Rickard Borgmäster <rbo...@rbk.sollentuna.se> wrote in article
<01bbefa8$c0d7c150$cd5434c2@viper>...
=========================================
We are experiencing persistent "blue screen" crashes on our system which
is:
Dual Pentium Pro 200
128 MB RAM (4 x 32 MB EDO SIMMs)
9 GB wide SCSI hard drive
NT 4.0 Server (with SP1 installed)
The consistent crash message is:
The computer has rebooted from a bugcheck. The bugcheck was: 0x0000000a
(0x0000015a, 0x0000001c, 0x00000000, 0x80117f3b). Microsoft Windows NT
[v15.1381].
The system runs 24 hours a day, using a RAS dialup to our Internet
Service Provider (PPP). The system operates an SMTP/POP3 suite (IMail)
and two Netscape Enterprise 2.01 servers (ports 80 and 443). The
crashes "seem" to be associated with accesses to our web servers, but
that may be a coincidence. There are no Web server error log messages
that point to the problem.
When I have been able to observe the crashes, there is just a blue
screen message with no associated Dr. Watson messages. I have saved one
full memory dump if that would help.
Exactly the same software has been running without a hitch on our older
machine (486DX4/100, 32 MB RAM, NT 4.0 Server with SP1). The new server will
run for 24 hours without a crash, as long as the Web servers aren't
connected.
Any ideas what's causing this? Is the consistent bugcheck message of
specific significance?
HELP!!!!
Ken Bass
VP/CIO
Bass & Company
http://www.bassandco.com
Kevin -
See attached. Is this the same problem, since we are experiencing it
using RAS?
Ken
This is the same error that we saw when we upgraded to NT 4 from 3.51.
We ARE using a DigiBoard.... Believe it or not... All we did was
change the modem drivers installed on the DigiBoard ports and that fixed
the problem. Since we actually still wanted to use the older ones,
we tried changing them back and the blue screen of death appeared with the
same error as above again. So, my recommendation: try changing the modem
types if you are using a DigiBoard.
The crashes started happening after I installed SP2.
IRQL_NOT_LESS_OR_EQUAL etc. I won't swear by it but it looks as if it
happens sometimes when I try to access the floppy drive.
Any clues? Also, would you folks please send me a copy by e-mail as I may
be out of town for a while?
_____
Vipul
vi...@xcaliber.com
Ken Bass <K...@bassandco.com> wrote in article
<32C050...@bassandco.com>...
> Kevin Behr wrote:
> >
> > you aren't by chance using a digi board or ras are you?
> >
> > Rickard Borgmäster <rbo...@rbk.sollentuna.se> wrote in article
> > <01bbefa8$c0d7c150$cd5434c2@viper>...
> > > Robert Goldschmidt <bob_gol...@nih.gov> wrote in article
> > > <MPG.d15f137a...@msnews.microsoft.com>...
> > > > Hi All:
> > > >
Mine is the same. I was reading a floppy then BOOM BSOD. This is the
second time and I just installed SP2.
--
Richard Beyea
SoftStuf Software
- deleted -
NL <leong...@ccmail.ahlstrom.com> wrote in article
<01bbfba9$7e13c410$13d57f8d@masingaporep1>...
I have a different situation but it causes a similar effect:
- CRASH
My NT4 Server, HP LS 5/133 64MB, crashes every time the server clock
reaches 9.23 AM. It gets very slow and does not allow me to close or start
any application or even press Ctrl-Alt-Del.
Can anyone imagine what kind of process or agent or whatever causes this?
Thanks in advance,
Francisco