Hello,
Below is a program that does the following:
1. Launches a given executable in suspended form, so that the primary
thread of the process sleeps.
2. Creates a remote thread in the target process which
calls "ExitThread".
3. Resumes the primary thread of the process.
==========================================================================
#include <windows.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    STARTUPINFO Si;
    PROCESS_INFORMATION Pi;
    BOOL rc;
    DWORD ThreadId;
    HANDLE hThread;

    if (argc!=2) {
        printf("Usage: %s <Exename>\n", argv[0]);
        return 0;
    }
    memset(&Si, 0, sizeof(Si));
    memset(&Pi, 0, sizeof(Pi));
    Si.cb=sizeof(Si);
    /* 1. Start the target with its primary thread suspended. */
    rc=CreateProcess(NULL,
                     argv[1],
                     NULL,
                     NULL,
                     FALSE,
                     CREATE_SUSPENDED,
                     NULL,
                     NULL,
                     &Si,
                     &Pi);
    if (rc==FALSE) {
        printf("CreateProcess failed, rc=%lu\n", GetLastError());
        return 0;
    }
    /* 2. Create a remote thread whose start routine is ExitThread. */
    hThread=CreateRemoteThread(Pi.hProcess,
                               NULL,
                               0,
                               (LPTHREAD_START_ROUTINE)
                               (GetProcAddress
                                (GetModuleHandle("KERNEL32.DLL"),
                                 "ExitThread")),
                               0,
                               0,
                               &ThreadId);
    if (hThread==NULL) {
        printf("Unable to create remote thread\n");
        TerminateProcess(Pi.hProcess, 0);
        goto Exit;
    }
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    /* 3. Let the primary thread run and wait for the process to end. */
    ResumeThread(Pi.hThread);
    WaitForSingleObject(Pi.hProcess, INFINITE);
Exit:
    CloseHandle(Pi.hThread);
    CloseHandle(Pi.hProcess);
    return 0;
}
==========================================================================
Now this program works fine under Windows NT 4.0. However, under Windows
2000 it fails for some applications, e.g.:
If you start notepad.exe or regedt32.exe, it works fine.
If you start regedit.exe, regedit.exe starts but quits immediately.
If you start taskmgr.exe, the task manager crashes with an access
violation error.
AFAIK, loader initialization, loading of implicitly linked DLLs, etc.
happen in the first thread of the process. In the above case,
the DLL initializations happen in the secondary thread. It seems
that Windows 2000 does not like this.
Can anybody on the list throw some light on this behaviour?
Thanks.
-Prasad
Why do you want to call ExitThread()? It kills the remote thread itself,
but that thread is nothing but the ExitThread call.
It seems a very strange thread suicide to me :)
Zolee
<pda...@my-deja.com> wrote in message news:9000hh$dt6$1...@nnrp1.deja.com...
He's probably trying a minimal-impact test just to lay the groundwork
for a more daring "exploit" ;-)
<pda...@my-deja.com> wrote in message
news:9000hh$dt6$1...@nnrp1.deja.com...
Note: Once I had a problem and I had to put a little "sophisticated"
Sleep(...) before the CreateRemoteThread :))
Zoltan
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <process.h>

int main(int argc, char **argv)
{
    STARTUPINFO Si;
    PROCESS_INFORMATION Pi;
    BOOL rc;
    DWORD ThreadId;
    HANDLE hThread;

    if (argc!=2) {
        printf("Usage: %s <Exename>\n", argv[0]);
        return 0;
    }
    memset(&Si, 0, sizeof(Si));
    memset(&Pi, 0, sizeof(Pi));
    Si.cb=sizeof(Si);
    /* No CREATE_SUSPENDED here: the process starts running at once. */
    rc=CreateProcess(NULL,
                     argv[1],
                     NULL,
                     NULL,
                     FALSE,
                     0,
                     NULL,
                     NULL,
                     &Si,
                     &Pi);
    if (rc==FALSE) {
        printf("CreateProcess failed, rc=%lu\n", GetLastError());
        return 0;
    }
    hThread=CreateRemoteThread(Pi.hProcess,
                               NULL,
                               0,
                               (LPTHREAD_START_ROUTINE)
                               (GetProcAddress(GetModuleHandle("KERNEL32.DLL"),"ExitThread")),
                               0,
                               0,
                               &ThreadId);
    if (hThread==NULL) {
        printf("Unable to create remote thread\n");
        TerminateProcess(Pi.hProcess, 0);
        goto Exit;
    }
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    WaitForSingleObject(Pi.hProcess, INFINITE);
Exit:
    CloseHandle(Pi.hThread);
    CloseHandle(Pi.hProcess);
    return 0;
}
<pda...@my-deja.com> wrote in message news:9000hh$dt6$1...@nnrp1.deja.com...
>This is working! The problem was the suspended process.
>There was a long discussion about this problem :)
Thank you, Zoltan, you saved me some typing.
Slava
"Zoltan Csizmadia" <zoltan_c...@yahoo.com> wrote in message
news:OyL5N#YWAHA.263@cppssbbsa03...
> This is working! The problem was the suspended process.
I know that removing CREATE_SUSPENDED solves this problem. However, I
don't want this kind of solution. In my real application, I want to
create a remote thread in the target process which hooks some API
calls, and I don't want to miss any call. I want to trap the calls right
from the entry point of the executable. Adding a Sleep is not going to
solve the problem perfectly.
-Prasad
In article <OyL5N#YWAHA.263@cppssbbsa03>,
Prasad,
create process with DEBUG_PROCESS flag and wait until the loader
finishes.
Slava
Slava,
>
> create process with DEBUG_PROCESS flag and wait until the loader
> finishes.
Once I attach as a debugger, I can't detach. Also, I am hooking API
calls on a system-wide basis. I can't attach as a debugger to multiple
processes, right?
>
> Slava
>
The interesting thing is: the original code which I posted works
perfectly fine on Windows NT 4.0. I would call it a Windows 2000 bug,
or am I missing something here?
The only difference the code snippet (as against normal CreateProcess)
makes is: the initialization code (DLL_PROCESS_ATTACH) for implicitly
linked DLLs executes in the context of secondary thread which would
normally execute in the context of primary thread.
You can, but you need a thread for each debuggee.
>The interesting thing is: the original code which I posted works
>perfectly fine on Windows NT 4.0. I would call it a Windows 2000 bug,
>or am I missing something here?
Well, some people do think it *is* a bug, but I don't think so.
CreateRemoteThread() is a part of Win32 and is therefore guaranteed to
work only with Win32 processes, and until the loader has initialized
the application and it has connected to CSRSS it is not Win32. For all
practical reasons it is a bug, but because the whole load/init process
is not documented, you're supposed to expect all kinds of weird
things.
>The only difference the code snippet (as against normal CreateProcess)
>makes is: the initialization code (DLL_PROCESS_ATTACH) for implicitly
>linked DLLs executes in the context of secondary thread which would
>normally execute in the context of primary thread.
That makes a lot of difference to CSRSS because it expects the first
thread to connect first. It may potentially make a lot of difference
to the initialization code.
Slava
Try this:
1. Create the process suspended.
2. Query the context of the initial thread (using the thread handle returned
by CreateProcess).
3. Call VirtualAllocEx to allocate a chunk of memory in the target process.
4. Use WriteProcessMemory() to copy some **carefully** written machine code
to memory allocated in the target process.
5. Modify the initial thread's context so that EIP points to the machine
code you just copied over.
6. Resume the initial thread.
The code written to the target process should do something like this:
1. Preserve registers.
2. Do its "thing" (for example, load a DLL).
3. Restore registers.
4. Jump to the EIP from the original thread context (before it was modified
in step 5 above).
It may sound wacky, but I've tried it and it works great.
KM
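For readers who want to see what the "carefully written machine code" from step 4 might look like, here is a minimal sketch that only assembles the bytes of such a stub for 32-bit x86. It is not KM's actual code. The helper names (put_le32, build_stub) and the three address parameters are hypothetical, and the sketch assumes the "thing" to do is a single stdcall call taking one pointer argument (for example LoadLibraryA with the address of a DLL path already written into the target).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the stub emitted by build_stub(). */
enum { STUB_SIZE = 22 };

/* Store a 32-bit value in little-endian order, as x86 expects. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Emit the stub into out[0..STUB_SIZE-1] and return its length.
 * All three addresses are values valid in the TARGET process:
 *   arg_addr     - the single argument (e.g. address of a DLL path)
 *   func_addr    - the stdcall function to call (e.g. LoadLibraryA)
 *   original_eip - EIP read from the thread context before redirecting it
 */
static size_t build_stub(uint8_t *out, uint32_t arg_addr,
                         uint32_t func_addr, uint32_t original_eip)
{
    size_t n = 0;

    out[n++] = 0x60;                                   /* pushad: preserve registers */
    out[n++] = 0x9C;                                   /* pushfd: preserve flags     */
    out[n++] = 0x68; put_le32(out + n, arg_addr);      n += 4; /* push arg_addr     */
    out[n++] = 0xB8; put_le32(out + n, func_addr);     n += 4; /* mov eax, func_addr */
    out[n++] = 0xFF; out[n++] = 0xD0;                  /* call eax (callee pops arg) */
    out[n++] = 0x9D;                                   /* popfd: restore flags       */
    out[n++] = 0x61;                                   /* popad: restore registers   */
    out[n++] = 0x68; put_le32(out + n, original_eip);  n += 4; /* push original_eip */
    out[n++] = 0xC3;                                   /* ret: continue at old EIP   */

    assert(n == STUB_SIZE);
    return n;
}
```

The resulting buffer is what would be copied with WriteProcessMemory() into the block obtained from VirtualAllocEx(). Ending with push/ret rather than a relative jmp keeps the stub position-independent, so the bytes do not depend on where the block ends up.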
You are right. I have already implemented this kind of solution.
However, I was looking for a cleaner way of doing this.
I have hooked the CreateProcess call and I am using this hook to propagate
the hooks to the newly started process.
In this case, synchronization becomes tricky if the caller initially
asked for the process to be created suspended, because I want to put
the primary thread back to sleep after propagating my hooks. So
effectively, I want to resume the primary thread only up to the point where
it reaches its entry point. I achieved the synchronization using a pair of
events, however it's not perfect.
Also, note that when the thread is created, EAX register actually
contains the real entry point of the thread and the EIP register holds
the address of BaseThreadStartThunk.
Anyway thanks for the answer.
-Prasad
In article <POGV5.21392$II2.2...@newsread2.prod.itd.earthlink.net>,
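The kernel queues a user mode APC (that runs ntdll!LdrInitializeThunk) to
every newly created thread. If the loader has already been initialized (as
determined by fields in the PEB), then this routine just returns.
So, application coding errors excepted, there should be no problem in calling
CreateRemoteThread to create a thread into a process that was created with
CREATE_SUSPENDED and which has not been resumed.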
The problem observed by Prasad (and others including EliCZ) is due to the
tracking of resource usage by win32k.sys. Consider the following program:
#include <windows.h>
#include <stdio.h>

volatile ATOM a;

LRESULT CALLBACK WindowProc(HWND hwnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    if (message == WM_DESTROY) PostQuitMessage(0);
    return DefWindowProc(hwnd, message, wParam, lParam);
}

ULONG WINAPI ThreadProc(PVOID h)
{
    WaitForSingleObject(HANDLE(h), INFINITE);
    HWND hwnd = CreateWindow(PSTR(a), "", WS_OVERLAPPEDWINDOW,
                             CW_USEDEFAULT, CW_USEDEFAULT, 10, 10, 0, 0, GetModuleHandle(0), 0);
    printf("hwnd = %lx, gle = %ld, atom = %lx\n", hwnd, GetLastError(), a);
    if (hwnd != 0) {
        ShowWindow(hwnd, SW_SHOW);
        MSG msg;
        while (GetMessage(&msg, 0, 0, 0) != 0) DispatchMessage(&msg);
    }
    return 0;
}

int main()
{
    WNDCLASSEX wndclassex = {sizeof wndclassex};
    wndclassex.lpfnWndProc = WindowProc;
    wndclassex.hInstance = GetModuleHandle(0);
    wndclassex.lpszClassName = "Test";
    a = RegisterClassEx(&wndclassex);
    HANDLE h;
    DuplicateHandle(GetCurrentProcess(), GetCurrentThread(), GetCurrentProcess(), &h,
                    0, FALSE, DUPLICATE_SAME_ACCESS);
    ULONG tid;
    CreateThread(0, 0, ThreadProc, h, 0, &tid);
    // ThreadProc(0);
    ExitThread(0);
    return 0;
}
Just before CreateThread is called the win32k PROCESSINFO structure appears
thus:
---PPROCESSINFO @ 0xE2604EA8 for process 43c(yz.exe):
ppiNext @0x00000000
rpwinsta @0xFE51C348
hwinsta 0x0000003c
amwinsta 0x000f037f
ptiMainThread @0x00000000
cThreads 0x00000001
rpdeskStartup @0xFE398408
hdeskStartup 0x00000040
pclsPrivateList @0xA0346C20
pclsPublicList @0xA0345CF8
flags W32PF_CLASSESREGISTERED | W32PF_CONSOLEAPPLICATION |
W32PF_FORCEOFFFEEDBACK | W32PF_INITIALIZED | W32PF_IOWINSTA | W32PF_PROCESSCONNECTED |
W32PF_READSCREENACCESSGRANTED | W32PF_STARTGLASS | W32PF_THREADCONNECTED
dwHotkey 0x00000000
pWowProcessInfo @0x00000000
luidSession 0x00000000:0x0000c217
dwX,dwY (0x0,0x0)
dwXSize,dwYSize (0x0,0x0)
dwFlags 0x00000001
wShowWindow 0x0001
pCursorCache 0x00000000
dwLpkEntryPoints 0
Desktop views:
pdesk = fe398408, ulClientDelta = 9feb0000
--------
with the following window classes registered for the process:
Classes for process e2604ea8:
Private class PCLS @ 0xa0346c20 (Test)
Private class PCLS @ 0xa0345f90 (DDEMLUnicodeServer)
Private class PCLS @ 0xa0345f08 (DDEMLAnsiServer)
Private class PCLS @ 0xa0345e78 (DDEMLUnicodeClient)
Private class PCLS @ 0xa0345df0 (DDEMLAnsiClient)
Private class PCLS @ 0xa0345d70 (DDEMLMom)
Public class PCLS @ 0xa0345cf8 (Static)
Public class PCLS @ 0xa0345c88 (IME)
Public class PCLS @ 0xa0345c08 (MDIClient)
Public class PCLS @ 0xa0345b90 (ListBox)
Public class PCLS @ 0xa0345b18 (Edit)
Public class PCLS @ 0xa0345aa8 (#32770)
Public class PCLS @ 0xa0345a28 (ComboLBox)
Public class PCLS @ 0xa03459a8 (ComboBox)
Public class PCLS @ 0xa0345930 (Button)
Public class PCLS @ 0xa03458b8 (Message)
Public class PCLS @ 0xa0345858 (DDEMLEvent)
Public class PCLS @ 0xa03457f8 (#32772)
Public class PCLS @ 0xa0345798 (#32774)
Public class PCLS @ 0xa0344b48 (ScrollBar)
Public class PCLS @ 0xa0344ad8 (#32768)
Public class PCLS @ 0xa0344a68 (#32771)
Public class PCLS @ 0xa03449f8 (#32769)
(i.e. some standard classes and the class Test created by the program)
Just after WaitForSingleObject returns the win32k PROCESSINFO structure
appears thus:
---PPROCESSINFO @ 0xE2604EA8 for process 43c(yz.exe):
ppiNext @0x00000000
rpwinsta @0xFE51C348
hwinsta 0x0000003c
amwinsta 0x000f037f
ptiMainThread @0x00000000
cThreads 0x00000000
rpdeskStartup @0xFE398408
hdeskStartup 0x00000040
pclsPrivateList @0x00000000
pclsPublicList @0x00000000
flags W32PF_CONSOLEAPPLICATION | W32PF_FORCEOFFFEEDBACK | W32PF_INITIALIZED |
W32PF_IOWINSTA | W32PF_PROCESSCONNECTED | W32PF_READSCREENACCESSGRANTED | W32PF_STARTGLASS |
W32PF_THREADCONNECTED
dwHotkey 0x00000000
pWowProcessInfo @0x00000000
luidSession 0x00000000:0x0000c217
dwX,dwY (0x0,0x0)
dwXSize,dwYSize (0x0,0x0)
dwFlags 0x00000001
wShowWindow 0x0001
pCursorCache 0x00000000
dwLpkEntryPoints 0
Desktop views:
pdesk = fe398408, ulClientDelta = 9feb0000
--------
with the following window classes registered for the process:
Classes for process e2604ea8:
(i.e. none - they have all been unregistered)
Just after CreateWindow returns the win32k PROCESSINFO structure appears
thus:
---PPROCESSINFO @ 0xE2604EA8 for process 43c(yz.exe):
ppiNext @0x00000000
rpwinsta @0xFE51C348
hwinsta 0x0000003c
amwinsta 0x000f037f
ptiMainThread @0x00000000
cThreads 0x00000001
rpdeskStartup @0xFE398408
hdeskStartup 0x00000040
pclsPrivateList @0xA0345B38
pclsPublicList @0xA03458A0
flags W32PF_CLASSESREGISTERED | W32PF_CONSOLEAPPLICATION |
W32PF_FORCEOFFFEEDBACK | W32PF_INITIALIZED | W32PF_IOWINSTA | W32PF_PROCESSCONNECTED |
W32PF_READSCREENACCESSGRANTED | W32PF_STARTGLASS | W32PF_THREADCONNECTED
dwHotkey 0x00000000
pWowProcessInfo @0x00000000
luidSession 0x00000000:0x0000c217
dwX,dwY (0x0,0x0)
dwXSize,dwYSize (0x0,0x0)
dwFlags 0x00000001
wShowWindow 0x0001
pCursorCache 0x00000000
dwLpkEntryPoints 0
Desktop views:
pdesk = fe398408, ulClientDelta = 9feb0000
--------
with the following window classes registered for the process:
Classes for process e2604ea8:
Private class PCLS @ 0xa0345b38 (DDEMLUnicodeServer)
Private class PCLS @ 0xa0345ab0 (DDEMLAnsiServer)
Private class PCLS @ 0xa0345a20 (DDEMLUnicodeClient)
Private class PCLS @ 0xa0345998 (DDEMLAnsiClient)
Private class PCLS @ 0xa0345918 (DDEMLMom)
Public class PCLS @ 0xa03458a0 (Static)
Public class PCLS @ 0xa0345830 (IME)
Public class PCLS @ 0xa03457b0 (MDIClient)
Public class PCLS @ 0xa0346fa0 (ListBox)
Public class PCLS @ 0xa0346f28 (Edit)
Public class PCLS @ 0xa0346eb8 (#32770)
Public class PCLS @ 0xa0346e38 (ComboLBox)
Public class PCLS @ 0xa0346db8 (ComboBox)
Public class PCLS @ 0xa0346d40 (Button)
Public class PCLS @ 0xa0346ce0 (Message)
Public class PCLS @ 0xa0346c80 (DDEMLEvent)
Public class PCLS @ 0xa0346c20 (#32772)
Public class PCLS @ 0xa0344b60 (#32774)
Public class PCLS @ 0xa0344b00 (ScrollBar)
Public class PCLS @ 0xa0344aa0 (#32768)
Public class PCLS @ 0xa0344a40 (#32771)
Public class PCLS @ 0xa03449e0 (#32769)
(i.e. some standard classes but not the class Test created by the program)
The particular problems observed by Prasad are due to window classes being
registered by DllMain routines in response to DLL_PROCESS_ATTACH which are
then discarded by win32k when the count of active win32k client threads falls
to zero. Code that expects the classes to exist often fails in unpredictable
ways.
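The bookkeeping described above can be pictured with a toy model. This is only an illustration of the behaviour reported in the dumps, not win32k code: the structure, the function names and the single "class alive" flag are all made up for the sketch, which assumes nothing more than "private classes are discarded when the client thread count reaches zero".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-process state shown in the dumps. */
struct proc_model {
    int  client_threads;    /* plays the role of cThreads */
    bool test_class_alive;  /* is the privately registered class still there? */
};

/* A thread becomes a win32k client. */
static void thread_connect(struct proc_model *p)
{
    p->client_threads++;
}

/* A connected thread registers the private class. */
static void register_class(struct proc_model *p)
{
    assert(p->client_threads > 0);
    p->test_class_alive = true;
}

/* A client thread exits; the last one out takes the classes with it. */
static void thread_exit(struct proc_model *p)
{
    assert(p->client_threads > 0);
    if (--p->client_threads == 0)
        p->test_class_alive = false;
}

/*
 * Replay the sequence from the test program: the registering thread exits
 * before its successor connects. With keeper set, one extra thread stays
 * connected for the whole run.
 */
static bool class_survives(bool keeper)
{
    struct proc_model p = {0, false};

    if (keeper)
        thread_connect(&p);
    thread_connect(&p);   /* initial thread */
    register_class(&p);
    thread_exit(&p);      /* initial thread exits */
    thread_connect(&p);   /* new thread connects later */
    return p.test_class_alive;
}
```

In this model class_survives(false) reports the class as gone, matching the third dump, while class_survives(true) keeps it, because the count never reaches zero.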
There are a number of possible workarounds - here are two that come to mind:
1) First call CreateRemoteThread to create a thread that just waits on its
own thread object. This thread will ensure that the count of win32k clients
never falls to zero. Alternatively the thread could just Sleep for some
suitable period of time or could attempt to determine when the initial
(suspended) thread is resumed and has called some win32k function. Further
calls to CreateRemoteThread could now be used to load additional libraries,
patch code, etc.
2) Divert the initial thread to first call the routine which would have been
executed by CreateRemoteThread before it executes the entry point of the
executable. Here is an example of the code that I use:
#include <windows.h>

// x86 (32-bit) only. Makes the suspended thread call function(data) first
// and then continue at its original EIP with all registers restored.
VOID InsertCall(HANDLE process, HANDLE thread, PAPCFUNC function, ULONG data)
{
    CONTEXT context = {CONTEXT_FULL};

    GetThreadContext(thread, &context);

    ULONG stack[14];

    stack[13] = context.EFlags;   // popped by iretd
    stack[12] = context.SegCs;    // popped by iretd
    stack[11] = context.Eip;      // popped by iretd: the original EIP
    stack[10] = context.Eax;      // stack[3]..stack[10] are popped by popad
    stack[9] = context.Ecx;
    stack[8] = context.Edx;
    stack[7] = context.Ebx;
    stack[6] = context.Esp - 12;  // ESP slot, discarded by popad
    stack[5] = context.Ebp;
    stack[4] = context.Esi;
    stack[3] = context.Edi;
    stack[2] = 0xCF6158;          // code bytes 58 61 CF: pop eax; popad; iretd
    stack[1] = data;              // the single argument of function
    stack[0] = context.Esp - 48;  // return address = address of stack[2]

    context.Esp -= sizeof stack;
    context.Eip = ULONG(function);
    SetThreadContext(thread, &context);
    WriteProcessMemory(process, PVOID(context.Esp), stack, sizeof stack, 0);
}

int main(int argc, char *argv[])
{
    PROCESS_INFORMATION pi;
    STARTUPINFO si = {sizeof si};

    if (argc != 2) return 0;
    CreateProcess(0, argv[1], 0, 0, FALSE, CREATE_SUSPENDED, 0, 0, &si, &pi);
    InsertCall(pi.hProcess, pi.hThread, Sleep, 2000);
    ResumeThread(pi.hThread);
    // WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}
Gary
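The offsets and the constant 0xCF6158 in InsertCall are easier to follow when the frame is built in isolation. The following is a portable sketch of the same arithmetic, not a replacement for the code above: struct regs32 and build_frame are hypothetical names standing in for the x86 CONTEXT fields and for the body of InsertCall. Read as little-endian bytes, the constant is 58 61 CF, i.e. pop eax; popad; iretd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the x86 CONTEXT fields that InsertCall reads. */
struct regs32 {
    uint32_t edi, esi, ebx, edx, ecx, eax, ebp;
    uint32_t eip, seg_cs, eflags, esp;
};

enum { FRAME_WORDS = 14 };

/* Fill frame[] the way InsertCall fills stack[] and return the new ESP. */
static uint32_t build_frame(const struct regs32 *r, uint32_t data,
                            uint32_t frame[FRAME_WORDS])
{
    uint32_t new_esp = r->esp - FRAME_WORDS * 4u;

    frame[0]  = r->esp - 48;  /* return address of the called function */
    frame[1]  = data;         /* its single stdcall argument */
    frame[2]  = 0x00CF6158u;  /* pop eax; popad; iretd (executed in place) */
    frame[3]  = r->edi;       /* popad order: EDI, ESI, EBP, (ESP), */
    frame[4]  = r->esi;       /*              EBX, EDX, ECX, EAX    */
    frame[5]  = r->ebp;
    frame[6]  = r->esp - 12;  /* ESP slot, which popad discards */
    frame[7]  = r->ebx;
    frame[8]  = r->edx;
    frame[9]  = r->ecx;
    frame[10] = r->eax;
    frame[11] = r->eip;       /* iretd order: EIP, CS, EFLAGS */
    frame[12] = r->seg_cs;
    frame[13] = r->eflags;

    /* The function must return into the code word stored at frame[2]. */
    assert(frame[0] == new_esp + 2 * 4u);
    return new_esp;
}
```

After the called function returns with "ret 4", ESP points at frame[2]; pop eax consumes the code word itself, popad restores the general registers, and iretd restores EIP, CS and EFLAGS, leaving ESP at its original value. Since the three instructions execute from the stack, the technique presumably relies on the stack being executable.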
>You are right. I have already implemented this kind of solution.
>However, I was looking for a cleaner way of doing this.
There is in fact a cleaner way:
#define _WIN32_WINNT 0x0400
#include <windows.h>

int main(int argc, char **argv)
{
    STARTUPINFO si = {sizeof(si)};
    PROCESS_INFORMATION pi;

    CreateProcess(
        0,
        argv[1],
        0,
        0,
        0,
        CREATE_SUSPENDED,
        0,
        0,
        &si,
        &pi
    );
    QueueUserAPC(
        (PAPCFUNC)&ExitProcess,
        pi.hThread,
        0
    );
    ResumeThread(pi.hThread);
    return 0;
}
Slava
>The kernel queues a user mode APC (that runs ntdll!LdrInitializeThunk) to
>every newly created thread. If the loader has already been initialized (as
>determined by fields in the PEB), then this routine just returns.
... after calling all of the DLL init routines with DLL_THREAD_ATTACH,
right?
>So, application coding errors excepted, there should be no problem in calling
>CreateRemoteThread to create a thread into a process that was created with
>CREATE_SUSPENDED and which has not been resumed.
[...]
>The problem observed by Prasad (and others including EliCZ) is due to the
>tracking of resource usage by win32k.sys. Consider the following program:
>(i.e. some standard classes but not the class Test created by the program)
>
>The particular problems observed by Prasad are due to window classes being
>registered by DllMain routines in response to DLL_PROCESS_ATTACH which are
>then discarded by win32k when the count of active win32k client threads falls
>to zero. Code that expects the classes to exist often fails in unpredictable
>ways.
... which suggests "don't do that", at least to me. I think that the orderly
DLL init sequence is an integral part of Win32 which should not be
modified in any way. It may be a pain in the neck to preserve it and
still be able to do what people usually do with CRT() (CreateRemoteThread),
but there are ways to deal with that: your method #2 and QueueUserAPC(). I
even think these methods are better than CRT(), because they are much less
intrusive.
Slava
>2. Query the context of the initial thread (using the thread handle returned
>by CreateProcess).
>3. Call VirtualAllocEx to allocate a chunk of memory in the target process.
>4. Use WriteProcessMemory() to copy some **carefully** written machine code
>to memory allocated in the target process.
>5. Modify the initial thread's context so that EIP points to the machine
>code you just copied over.
>6. Resume the initial thread.
The only problem with that is your code will be executed before the
LdrInitializeThunk() APC so you should be very careful about what you
can call. In particular, I would not call LoadLibrary()... But I think
this is what you mean by **carefully** written.
Slava
Matt Pietrek published some related code in Microsoft Systems Journal, March
2000
Project: DelayLoadProfile
Maybe you can make use of it?
"Slava M. Usov" <stripit...@usa.net> wrote in message
news:vtnf2tg8bcfv3r4iv...@4ax.com...
>
> On Fri, 01 Dec 2000 05:37:51 GMT, "keithmo"
> <kei...@earthlink.SPAM.FREE.net> wrote:
>
> >2. Query the context of the initial thread (using the thread handle returned
> >by CreateProcess).
> >3. Call VirtualAllocEx to allocate a chunk of memory in the target process.
> >4. Use WriteProcessMemory() to copy some **carefully** written machine code
> >to memory allocated in the target process.
> >5. Modify the initial thread's context so that EIP points to the machine
> >code you just copied over.
> >6. Resume the initial thread.
>
You can, indeed, call LoadLibrary(). ~3 years ago, I wrote a tool that uses
this technique to inject a DLL into a newly created process. It worked
great. (The LdrInitializeThunk() APC is invoked *before* the real entrypoint
is called.)
KM
>You can, indeed, call LoadLibrary(). ~3 years ago, I wrote a tool that uses
>this technique to inject a DLL into a newly created process. It worked
>great. (The LdrInitializeThunk() APC is invoked *before* the real entrypoint
>is called.)
This is not true, at least anymore. Now the first thing that gets
executed in ring-3 of a new process is ntdll.dll!KiUserApcDispatcher()
which passes control to ntdll.dll!LdrInitializeThunk(), which loads
all of the DLLs, calls ntdll.dll!DbgBreakPoint() and then calls all of
the DLL init routines, then returns to
ntdll.dll!KiUserApcDispatcher(), which then calls NtContinue(); the
context with which this NtContinue() is invoked is pushed on stack
before ring-3 execution starts, and EIP points to
kernel32.dll!BaseProcessStartThunk(), with the Win32 execution address
in EAX [this is why, BTW, all of Win32 processes *must* link to
kernel32.dll, otherwise it is not loaded and passing execution to
kernel32.dll!BaseProcessStartThunk() results in an AV].
kernel32.dll!BaseProcessStartThunk(), in turn, establishes a SEH
frame, calls NtSetInformationThread() with
ThreadQuerySetWin32StartAddress and the address originally passed in
EAX, which is the start address specified in the image of the
executable.
If you create the thread suspended, its context will be just that,
ntdll.dll!LdrInitializeThunk() APC pending with the "real" context
pushed on stack. If you modify this context
>
>KM
>
[..]
>If you create the thread suspended, its context will be just that,
>ntdll.dll!LdrInitializeThunk() APC pending with the "real" context
>pushed on stack. If you modify this context
Sorry, hit the wrong key and away it went. So, if this context is
modified, the code will run before ntdll.dll!LdrInitializeThunk().
The fact that the context is pre-LdrInitializeThunk() is demonstrated
by the code referenced in my message in thread "debug registers
problem". If you build that proggie, you can set a breakpoint on
ntdll.dll!LdrInitializeThunk() and it *will* fire.
Slava
Your "debug register" code launches the child process with the DEBUG_PROCESS
flag set. This may change the dynamics a bit.
LdrInitializeThunk() is the first user-mode code run in the newly created
process. It is never invoked directly; rather, it is an APC queued on the
initial thread created for the process. As the initial thread makes its
first transition from kernel- to user-mode, all queued APCs are invoked
sequentially. After the APCs are run, the entrypoint (the one you see in EIP
in the thread context) is invoked. For a Win32 app, EIP points to a routine
in KERNEL32.DLL. You may also note that EAX contains the actual "image"
entrypoint (typically, the C runtime initialization code that eventually
calls either main() or WinMain()).
KM
Whoops, I sent this in response to your "Sorry, hit the wrong key..."
message without first reading the partial message. My bad.
You are correct in that the first bit of user-mode code to execute in a
process is LdrInitializeThunk() which, as you pointed out, loads DLLs,
invokes init routines, and generates the first breakpoint (for processes
that are being debugged). However, since the initial thread is created
suspended, the APC queue is not run down (and therefore LdrInitializeThunk()
is not invoked and the context to which you refer does not get pushed onto
the stack) until the thread is resumed.
Your description of the gory details of user-mode APC dispatching and
process initialization is very accurate, but none of it occurs until the
initial thread is resumed.
KM
PS: The DLL injection app that I wrote using this technique works great
under NT4 and Win2K.
Hello Slava,
The program included in my original message that demonstrates the
problem does not depend on any vagaries of the DLL initialization
process. Well into the main() routine it creates a new thread and
exits the initial thread. Since this program does not break any
written rules of Win32 programming (of which I am aware), I would
say that the aggressive reclaiming of resources (the root of the
problem) is a bug in win32k.sys.
Gary
Hello Slava,
Have you tested this code? On my system it always crashes. The problem is that user-mode APCs don't
execute serially; they are dispatched serially, but because of kernel-mode to user-mode transition
issues they may actually run concurrently.
Gary
>The program included in my original message that demonstrates the
>problem does not depend on any vagaries of the DLL initialization
>process. Well into the main() routine it creates a new thread and
>exits the initial thread. Since this program does not break any
>written rules of Win32 programming (of which I am aware), I would
>say that the aggressive reclaiming of resources (the root of the
>problem) is a bug in win32k.sys.
Gary,
even if what causes the problem is a bug in win32k.sys, that does not
necessarily mean that CRT() into an un-initialized process is legal
Win32. There are no written rules, of which I am aware, one should or
may follow in such circumstances.
Slava
>Hello Slava,
>
>Have you tested this code?
Of course. Runs on a variety of different machines.
>On my system it always crashes. The problem is that user-mode APCs don't
>execute serially; they are dispatched serially, but because of kernel-mode to user-mode transition
>issues they may actually run concurrently.
In what way? Another thread is created for concurrent execution? Or
the running APC is pre-empted and another one is dispatched? Either
way violates the written rules.
Slava
>You are correct in that the first bit of user-mode code to execute in a
>process is LdrInitializeThunk() which, as you pointed out, loads DLLs,
>invokes init routines, and generates the first breakpoint (for processes
>that are being debugged). However, since the initial thread is created
>suspended, the APC queue is not run down (and therefore LdrInitializeThunk()
>is not invoked and the context to which you refer does not get pushed onto
>the stack) until the thread is resumed.
There seems to be a difference between the behaviors of
GetThreadContext() when the target process is created without
DEBUG_PROCESS and with DEBUG_PROCESS. In the latter case, the context
returned is in fact the context with the APC about to run, in the
former case it is the "non-APC" context. You're correct, the dynamics
is changed.
>PS: The DLL injection app that I wrote using this technique works great
>under NT4 and Win2K.
Yes, it should, unless you create the process with DEBUG_PROCESS.
Sorry for my confusion.
Slava
>On Mon, 4 Dec 2000 10:28:47 +0100, "Gary Nebbett"
><gary.n...@syngenta.com> wrote:
>
>
>>Hello Slava,
>>
>>Have you tested this code?
>
>Of course. Runs on a variety of different machines.
>
>>On my system it always crashes.
Correction. I did indeed test the code on a few SMP machines, and then
on non-SMP machines, but less thoroughly. On SMP machines it executed
perfectly, but it turns out there is a problem on uniprocessor machines. If I
understand correctly what is going on, on a uniprocessor the KM thread
initialization code does not get a chance to run and the
LdrInitializeThunk() APC is not posted. So "my" APC becomes the first
APC that is run, and that crashes the thread silently. That, and the
fact that I chose ExitProcess() for the APC routine, gave the impression
that everything worked properly; the problem can be easily
demonstrated if ExitProcess() is replaced with Sleep(). It is
important *not* to break execution between CreateProcess() and
QueueUserAPC() (QUAPC() for short), otherwise the problem goes away.
The problem may be remedied by putting Sleep(0); just before QUAPC(),
but that is, unfortunately, not 100% reliable.
Another solution that comes to mind would be to inject code into the
target that calls NtQueueApcThread() on the same thread with some
functional APC routine, and call NtQueueApcThread() on the target
thread with APC routine set to the injected code. But that's becoming
quite messy and approaches the "usual" SetThreadContext() business.
It is interesting to note that we have another instance of perfectly
legitimate Win32 code which misbehaves when invoked before
process/Win32 initialization happens. That is certainly not a bug in
win32k.sys that crashes the thread, it is namely the Win32
uninitializedness.
>>The problem is that user-mode APCs don't
>>execute serially; they are dispatched serially, but because of kernel-mode to user-mode transition
>>issues they may actually run concurrently.
I'm interpreting this again: "concurrently" means that the APC I
thought would have been the second in fact was the first. "Kernel-mode
to user-mode transition" means execution of KM _thread_initialization_
code and _first_time_ transition to UM. Correct?
Thanks for pointing out the problem, Gary. It gave me a valuable
insight.
Slava
Could you elaborate on the "concurrently" part? I sheepishly confess never
having queued an APC, and I'm wondering about the thread context in which
the APCs run.
Thanks,
Will
Hello Will,
"concurrently" was definitely the wrong word. What I had noticed
was that the execution of an APC could be preempted by another
APC. In my test case, the APC routine called (amongst other
routines) the Win32 function Beep; I did not know at the time
that Beep internally calls SleepEx with an Alertable argument of
TRUE. So if an APC function enters an alertable wait then other
queued APCs can run before the first APC is finished.
Gary
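The nesting Gary describes can be reproduced in a small single-threaded model. This is a sketch of the ordering only and uses no Windows API: the queue, the trace buffer and every function name are invented for the illustration, and alertable_wait() merely stands in for a call such as SleepEx with its Alertable argument set to TRUE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*apc_fn)(void);

enum { MAX_APCS = 8, TRACE_SIZE = 64 };

static apc_fn pending[MAX_APCS];  /* FIFO of queued APC routines */
static size_t head, tail;
static char   trace[TRACE_SIZE];  /* order in which events happened */

/* Append one event name to the trace. */
static void note(const char *s)
{
    strncat(trace, s, TRACE_SIZE - strlen(trace) - 1);
}

/* Model of queueing a user-mode APC to the thread. */
static void queue_apc(apc_fn fn)
{
    assert(tail < MAX_APCS);
    pending[tail++] = fn;
}

/* Model of an alertable wait: deliver every pending APC, then return. */
static void alertable_wait(void)
{
    while (head < tail) {
        apc_fn fn = pending[head++];
        fn();
    }
}

static void apc_second(void)
{
    note("[second]");
}

/* The first APC enters an alertable wait before it has finished. */
static void apc_first(void)
{
    note("[first-begin]");
    alertable_wait();
    note("[first-end]");
}

/* Queue both routines, then let the "thread" become alertable once. */
static const char *run_model(void)
{
    head = tail = 0;
    trace[0] = '\0';
    queue_apc(apc_first);
    queue_apc(apc_second);
    alertable_wait();
    return trace;
}
```

In the model the second routine runs in the middle of the first one, which is the effect that was first described as "concurrently": nothing runs in parallel, the first APC is simply interrupted at the point where it becomes alertable.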