LdrInitializeThunk Causes Process to Die

Matt Taylor

Jul 23, 2004, 12:32:38 PM

I am creating a process in the suspended state and trying to inject a
thread. The problem is that LdrInitializeThunk calls NtRaiseHardError with
STATUS_DLL_INIT_FAILED from the newly created thread. Is there any way to
bypass LdrInitializeThunk on a usermode thread? All I need to do is call
LdrLoadDll in the new process.

Alternatively, is there a way to cause the initial thread in a process to
execute its LdrInitializeThunk APC and then become suspended before
executing BaseProcessStart? I suspect that once this initial APC has
executed I will be able to successfully create threads in the process.

TIA,
-Matt

Slava M. Usov

Jul 23, 2004, 1:17:17 PM

"Matt Taylor" <mta...@blackhole.com> wrote in message
news:G4bMc.39762$KP6.2...@twister.tampabay.rr.com...

> I am creating a process in the suspended state and trying to inject a
> thread. The problem is that LdrInitializeThunk calls NtRaiseHardError with
> STATUS_DLL_INIT_FAILED from the newly created thread. Is there any way to
> bypass LdrInitializeThunk on a usermode thread? All I need to do is call
> LdrLoadDll in the new process.

Do not do it. Do not inject anything into a process before its entry point
is called.

> Alternatively, is there a way to cause the initial thread in a process to
> execute its LdrInitializeThunk APC and then become suspended before
> executing BaseProcessStart? I suspect that once this initial APC has
> executed I will be able to successfully create threads in the process.

It could be done by a double-APC technique, described in this forum a few
times. But the remark above applies just the same. Do not do it. It is a lot
simpler to patch the entry point in the image header than do all these
unholy things.

S

Ivan Brugiolo [MSFT]

Jul 23, 2004, 1:31:20 PM

How can you get a remote thread to execute LdrLoadDll if the loader
(and the process, for that matter) is not initialized yet?
The behavior you are seeing is expected, and what you are trying
to accomplish is way past the unsupported area.

Technically speaking, you can inject as many threads as you like
into a suspended process, as long as they are aware of the global state
of the process when they execute.

If you have grasped the concept that a thread in a process is an APC
that continues a well-established context instead of returning,
you should have your mechanism to execute arbitrary code.
Replace the to-be-continued context with something suitable for your
purpose.

However, I really hope that your project will never get to any commercial
stage, since this is NOT going to work across implementations, build
versions, service packs, etc.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Matt Taylor

Jul 23, 2004, 1:58:53 PM

"Slava M. Usov" <stripit...@gmx.net> wrote in message
news:uw0bjkNc...@TK2MSFTNGP11.phx.gbl...

> "Matt Taylor" <mta...@blackhole.com> wrote in message
> news:G4bMc.39762$KP6.2...@twister.tampabay.rr.com...
> > I am creating a process in the suspended state and trying to inject a
> > thread. The problem is that LdrInitializeThunk calls NtRaiseHardError
> > with STATUS_DLL_INIT_FAILED from the newly created thread. Is there any
> > way to bypass LdrInitializeThunk on a usermode thread? All I need to do
> > is call LdrLoadDll in the new process.
>
> Do not do it. Do not inject anything into a process before its entry point
> is called.

Between BaseProcessStart and the entry point there is nothing of interest.
One must be careful not to throw an exception, but it really doesn't make a
difference either way. One of the things my DLL does is hook
KiUserExceptionDispatcher. If something really bad happens then the
exception handler in BaseProcessStart wouldn't get called anyway.

I am already injecting with SetThreadContext, but I want to use
CreateRemoteThread because my injection stub has to leak memory with
SetThreadContext.

> > Alternatively, is there a way to cause the initial thread in a process
> > to execute its LdrInitializeThunk APC and then become suspended before
> > executing BaseProcessStart? I suspect that once this initial APC has
> > executed I will be able to successfully create threads in the process.
>
> It could be done by a double-APC technique, described in this forum a few
> times. But the remark above applies just the same. Do not do it. It is a
> lot simpler to patch the entry point in the image header than do all these
> unholy things.

What is the double-APC technique? Google does not find anything under that
name.

-Matt

Matt Taylor

Jul 23, 2004, 2:22:16 PM

"Ivan Brugiolo [MSFT]" <ivan...@online.microsoft.com> wrote in message
news:u$q4arNcE...@TK2MSFTNGP10.phx.gbl...

> How can you get a remote thread to execute LdrLoadDll if the loader
> (and the process, for that matter) is not initialized yet?
> The behavior you are seeing is expected, and what you are trying
> to accomplish is way past the unsupported area.

That is true, but I had already planned to write my own loader. The ideal
would be to get LdrInitializeThunk to execute without executing
BaseProcessStart. If I can't do that but instead am able to execute code
before LdrInitializeThunk executes, then I think I can accomplish the same.

> Technically speaking, you can inject as many threads as you like
> into a suspended process, as long as they are aware of the global state
> of the process when they execute.

No. LdrInitializeThunk is queued to the thread before any of my code
executes, and that causes the process to terminate itself. Despite the call
to NtRaiseHardError, my thread executes correctly. However, the initial
thread in the process promptly terminates after I resume it.

> If you have grasped the concept that a thread in a process is an APC
> that continues a well-established context instead of returning,
> you should have your mechanism to execute arbitrary code.
> Replace the to-be-continued context with something suitable for your
> purpose.

I already do this. To elaborate, the code I am working on is a debugger, and
the code I am injecting is a stub to load a DLL with various routines to
assist the debugger. Right now I support two separate injection methods:
SetThreadContext and CreateRemoteThread. The former is required for init due
to the fact that LdrInitializeThunk in the second thread causes the process
to terminate. The latter is required to attach to processes since all
threads may be stuck in the kernel.

For various reasons I would prefer to support only the CreateRemoteThread
path. The foremost is probably that CreateRemoteThread makes the injection
code cleaner (i.e. I can fail gracefully) and is far less prone to failure.

> However, I really hope that your project will never get to any commercial
> stage, since this is NOT going to work across implementations, build
> versions, service packs, etc.

Could you elaborate? What am I relying on that can break? AFAIK both the
SetThreadContext and CreateRemoteThread injection methods are widely used
already.

-Matt

Ivan Brugiolo [MSFT]

Jul 23, 2004, 4:47:52 PM

You can create as many remote threads as you like.
As I said before, they will run in a hostile environment (nothing is
initialized), and they will race (and possibly synchronize) for process
initialization and all the intermediate steps that process initialization
implies. I think that you are seeing the synchronization over process
initialization.
However, by simply using RtlQueryProcessDebugInformation,
you can inject a thread that can do the process initialization even
if it's not the thread created by CreateProcessInternalW.
Of course, since this is all internal,
you might not be able to leverage any of this for your work.
I'm just saying that it's possible to have a NON-main
thread do the process initialization.

There are many debuggers that do not require any support code
in the target process. With the module-load event for ntdll.dll,
you can even debug the very same LdrpProcessInitialize.
Exceptions are dispatched to the debugger by the kernel-mode
exception handling code well before they reach
KiUserExceptionDispatcher.

I think you are writing a stealth debugger,
so that it cannot be detected by applications
via traditional means (PEB->BeingDebugged, presence of a DebugPort in
EPROCESS).
If you are just interested in monitoring exceptions, you can use
vectored exception handlers.
If you are interested in monitoring exceptions and diverting execution
(for example, to avoid popular code scrambling and anti-debugging
techniques),
then you are facing quite an interesting and challenging task,
and I wish you the best of luck. Technically speaking, I think that it would
be quite hard even with full access to the loader code.
I hope that the purpose of this effort is not malicious.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Matt Taylor

Jul 24, 2004, 3:58:41 AM

"Ivan Brugiolo [MSFT]" <ivan...@online.microsoft.com> wrote in message

news:upXZRZPc...@TK2MSFTNGP10.phx.gbl...
[...]

> I think that you are seeing the synchronization over process
> initialization.
> However, by simply using RtlQueryProcessDebugInformation,
> you can inject a thread that can do the process initialization even
> if it's not the thread created by CreateProcessInternalW.
> Of course, since this is all internal,
> you might not be able to leverage any of this for your work.
> I'm just saying that it's possible to have a NON-main
> thread do the process initialization.

Can you give me any leads on RtlQueryProcessDebugInformation? There is
scanty documentation on how to use it (in particular a description of the
buffer parameter). It does not look useful to me, but I was curious to see
if/how it avoids the initialization problem.

Interestingly, I stumbled across this while Googling for
RtlQueryProcessDebugInformation:
http://www.anticracking.sk/EliCZ/bugs/WinBugs.htm

The third bug listed is a much more detailed analysis of the one that I
described. I observed that it was kernel32.dll failing with
STATUS_DLL_INIT_FAILED on Windows Server 2003 -- exactly what he documents.

I believe I have found a hackish workaround, however. I now queue an APC to
the thread that executes NtSuspendThread on itself. This re-suspends the
thread after LdrpInitialize finishes executing. Unfortunately I know of no
way to synchronize on the APC, so I have to leak memory. That's not
absolutely terrible, but I would prefer a better method if one exists...

> There are many debuggers that do not require any support code
> in the target process. With the module-load event for ntdll.dll,
> you can even debug the very same LdrpProcessInitialize.
> Exceptions are dispatched to the debugger by the kernel-mode
> exception handling code well before they reach
> KiUserExceptionDispatcher.
>
> I think you are writing a stealth debugger,
> so that it cannot be detected by applications
> via traditional means (PEB->BeingDebugged, presence of a DebugPort in
> EPROCESS).

While it can stealthily debug, that is not the commercial application. The
commercial application is being able to provide ICE-like capabilities in a
software-based debugger. This sort of functionality is not possible through
Windows debugging. How many Windows debuggers can give you an instruction
backtrace, and how many can step backwards through a program? Providing such
functionality does necessitate stealth; that is unimportant to the problem
of injection, though.

> If you are just interested in monitoring exceptions, you can use
> vectored exception handlers.
> If you are interested in monitoring exceptions and diverting execution
> (for example, to avoid popular code scrambling and anti-debugging
> techniques),
> then you are facing quite an interesting and challenging task,
> and I wish you the best of luck. Technically speaking, I think that it would
> be quite hard even with full access to the loader code.
> I hope that the purpose of this effort is not malicious.

[...]

Actually the exception handling code was written and has been working for
almost a year. In a nutshell KiUserExceptionDispatcher is hooked and the
debugger decides whether to pass the exception on to the app or to handle it
itself.

The general idea of circumventing LdrInitializeThunk/LdrpInitialize is still
interesting to me. My debugger DLL communicates with the host debugger via a
pipe, and I have experienced problems where the pipe thread deadlocks. To
work around that, I created a separate thread to process command requests
and put a watchdog timer on it. However, if a thread deadlocks while the
target is halted and the PEB lock is held, then LdrpInitialize blocks and
effectively deadlocks the process. As I recall, the DLL callouts are the
problem there.

-Matt

Slava M. Usov

Jul 24, 2004, 11:43:24 AM

"Matt Taylor" <mta...@blackhole.com> wrote in message

news:xlcMc.39773$KP6.2...@twister.tampabay.rr.com...

> Between BaseProcessStart and the entry point there is nothing of interest.

We're not talking about BaseProcessStart. We're talking about
LdrInitializeThunk.

> What is the double-APC technique? Google does not find anything under that
> name.

Look for APC and CreateRemoteThread. That should get you going. The idea is
that you queue an APC that queues another that gets the real job done. The
second APC will run in a safe environment with everything initialized, at
least until MS decides to do process/thread initialization differently.

But now that I see what you're trying to do, that will not help you much.
You want to be there before the DLLs are initialized. Why do you need to be
in the target's address space in the first place?

S

Ivan Brugiolo [MSFT]

Jul 24, 2004, 12:36:29 PM

> [RtlQueryProcessDebugInformation]

> It does not look useful to me, but I was curious to see
> if/how it avoids the initialization problem.

It does not avoid the problem. Either it does the initialization
itself or it waits for another thread to do it.
The point here is that the code executed is aware of the
global state of ntdll in the early initialization stages.

> This re-suspends the
> thread after LdrpInitialize finishes executing. Unfortunately I know of no
> way to synchronize on the APC, so I have to leak memory. That's not
> absolutely terrible, but I would prefer a better method if one exists...

This is a terribly short-legged method that will not get you far, as Slava
pointed out as well.

> How many Windows debuggers can give you an instruction
> backtrace, and how many can step backwards through a program? Providing
> such functionality does necessitate stealth; that is unimportant to the
> problem of injection, though.

Modern CPUs (P4 and upward) can store execution history in non-pageable
memory pointed to by some MSRs. I think it's normally known as the branch
trace buffer.
For other CPUs, there is not much a user-mode debugger can do.
Even stepping backwards is always an unsafe operation, since
you have to transact the global state of the application,
not just the executing code path.

> The general idea of circumventing LdrInitializeThunk/LdrpInitialize is
> still interesting to me. My debugger DLL communicates with the host
> debugger via a pipe, and I have experienced problems where the pipe thread
> deadlocks. To work around that, I created a separate thread to process
> command requests and put a watchdog timer on it. However, if a thread
> deadlocks while the target is halted and the PEB lock is held, then
> LdrpInitialize blocks and effectively deadlocks the process. As I recall,
> the DLL callouts are the problem there.

This is the whole reason why the debugging infrastructure in Windows
is implemented in the kernel, and the debugger is a separate executable.
There is no easy way to get that code right.
I'm not sure this will work for anything besides
void main() { printf("hello world!"); };
as your own experiments point out.
Good luck in your attempts.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Matt Taylor

Jul 27, 2004, 2:06:29 PM

"Slava M. Usov" <stripit...@gmx.net> wrote in message

news:#UX2xUZc...@tk2msftngp13.phx.gbl...

> "Matt Taylor" <mta...@blackhole.com> wrote in message
> news:xlcMc.39773$KP6.2...@twister.tampabay.rr.com...
>
> > Between BaseProcessStart and the entry point there is nothing of
> > interest.
>
> We're not talking about BaseProcessStart. We're talking about
> LdrInitializeThunk.

I don't really care about LdrInitializeThunk. I'm just trying to find some
way to work around it, and I've probably been a bit confusing because since
my original post I have found some misconceptions that I had about this
function and initialization in general.

> > What is the double-APC technique? Google does not find anything under
> > that name.
>
> Look for APC and CreateRemoteThread. That should get you going. The idea
> is that you queue an APC that queues another that gets the real job done.
> The second APC will run in a safe environment with everything initialized,
> at least until MS decides to do process/thread initialization differently.

The second APC is only necessary for Win9x, right? Aren't APCs always called
in-order on NT systems?

> But now that I see what you're trying to do, that will not help you much.
> You want to be there before the DLLs are initialized. Why do you need to
> be in the target's address space in the first place?

I want to be in the process's address space as early as possible, but it
doesn't have to be before ntdll et al. initialize themselves. It just needs
to be before BaseProcessStart begins executing. The problem is that I have
no way to allow the initial thread to execute the LdrInitializeThunk APC
without causing BaseProcessStart to execute.

The project I am working on is similar in function to Valgrind
(http://valgrind.kde.org/), except that I am (obviously) targeting Windows.
That is why I need to be in the target process's address space.

-Matt

Slava M. Usov

Jul 27, 2004, 4:27:07 PM

"Matt Taylor" <mta...@blackhole.com> wrote in message news:FQwNc.349

[...]

> The second APC is only necessary for Win9x, right?

Nope, NT.

> Aren't APCs always called in-order on NT systems?

Yes.

> I want to be in the process's address space as early as possible, but it
> doesn't have to be before ntdll et al. initialize themselves. It just
> needs to be before BaseProcessStart begins executing. The problem is that
> I have no way to allow the initial thread to execute the
> LdrInitializeThunk APC without causing BaseProcessStart to execute.

BaseProcessStart, if memory serves, executes as soon as the
LdrInitializeThunk APC returns. That is done by NtContinue(). So set a
breakpoint at NtContinue(). A few years ago I posted a sample in this
newsgroup that used the IA32 debug registers, and it used this very technique.

Or you could modify the EIP in the context record, which is stored on the
stack and which is used by NtContinue() -- have a look at the stack contents
when LdrInitializeThunk is just about to execute, and you will see what I mean.
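
To make the context-record idea concrete, here is a toy model in portable C.
It is not Windows code: every name in it is hypothetical, and a plain function
pointer stands in for the EIP field of the saved context. It only illustrates
that whoever rewrites the saved context before the "continue" step decides
what runs once initialization has finished.

```c
#include <assert.h>
#include <string.h>

/* Toy model only. Every name here is hypothetical; none is a Windows API. */

typedef struct Trace {
    char events[8];   /* one letter per routine that ran, NUL-terminated */
    size_t count;
} Trace;

typedef void (*Code)(Trace *);

/* Stands in for the context record that sits on the new thread's stack. */
typedef struct ContextModel {
    Code eip;         /* where the thread continues after the init APC */
} ContextModel;

void note(Trace *tr, char c) {
    assert(tr->count + 1 < sizeof tr->events);
    tr->events[tr->count++] = c;
    tr->events[tr->count] = '\0';
}

/* Stands in for BaseProcessStart. */
void process_start_model(Trace *tr) { note(tr, 'B'); }

/* Injected stub: runs first, then hands control to the original start. */
void stub_model(Trace *tr) {
    note(tr, 'S');
    process_start_model(tr);
}

/* Stands in for the init APC followed by the continue step: the thread
   does not return, it resumes at whatever eip the saved context holds. */
void init_then_continue_model(const ContextModel *ctx, Trace *tr) {
    note(tr, 'L');    /* modelled loader initialization */
    ctx->eip(tr);     /* modelled "continue" over the saved context */
}

/* Runs the model; a nonzero patch_eip rewrites the saved eip first. */
const char *run_model(Trace *tr, int patch_eip) {
    ContextModel ctx = { process_start_model };
    memset(tr, 0, sizeof *tr);
    if (patch_eip)
        ctx.eip = stub_model;
    init_then_continue_model(&ctx, tr);
    return tr->events;
}
```

In this model run_model(&tr, 0) produces the trace "LB" and run_model(&tr, 1)
produces "LSB": the stub runs after initialization and before the start
routine.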

S

Matt Taylor

Jul 28, 2004, 7:56:18 PM

"Slava M. Usov" <stripit...@gmx.net> wrote in message

news:ee01YhBd...@tk2msftngp13.phx.gbl...

> "Matt Taylor" <mta...@blackhole.com> wrote in message news:FQwNc.349
>
> [...]
>
> > The second APC is only necessary for Win9x, right?
>
> Nope, NT.
>
> > Aren't APCs always called in-order on NT systems?
>
> Yes.

[...]

Hmm, I don't follow, then; why is the second APC necessary on NT?
LdrInitializeThunk is queued to the thread from NtCreateThread, so it would
execute first. Next my APC would execute. When my APC completed and
returned, then BaseProcessStart would execute.

I have a working solution now, but I am still interested just for personal
knowledge. Many thanks to both of you for all of your insights and
suggestions.

-Matt

Slava M. Usov

Jul 28, 2004, 8:30:36 PM

"Matt Taylor" <mta...@blackhole.com> wrote in message

news:C2XNc.5880$wM....@twister.tampabay.rr.com...

> Hmm, I don't follow, then; why is the second APC necessary on NT?
> LdrInitializeThunk is queued to the thread from NtCreateThread, so it
> would execute first. Next my APC would execute.

Nope. When a process is created suspended and you queue an APC, your APC
will go first. Again, I suggest that you find the old discussions in this
newsgroup.
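
A toy model of that ordering, in portable C, may help. It is not Windows code
and none of the names are real APIs. It assumes only what has been said in
this thread: APCs are delivered in the order queued, and an APC queued to a
process created suspended ends up ahead of the initialization APC.

```c
#include <assert.h>
#include <string.h>

/* Toy model only. Every name here is hypothetical; none is a Windows API. */

#define MAX_APCS 8
#define MAX_LOG  8

typedef struct Thread Thread;
typedef void (*ApcRoutine)(Thread *);

struct Thread {
    ApcRoutine queue[MAX_APCS]; /* pending APCs, delivered first-in first-out */
    size_t head, tail;
    int process_initialized;    /* set by the modelled loader initialization */
    char log[MAX_LOG + 1];      /* one letter per delivered APC */
    size_t log_len;
};

void queue_apc(Thread *t, ApcRoutine fn) {
    assert(t->tail < MAX_APCS);
    t->queue[t->tail++] = fn;
}

void record(Thread *t, char c) {
    assert(t->log_len < MAX_LOG);
    t->log[t->log_len++] = c;
    t->log[t->log_len] = '\0';
}

/* Stands in for the LdrInitializeThunk APC. */
void loader_init_apc(Thread *t) {
    t->process_initialized = 1;
    record(t, 'L');
}

/* The real job: logs 'W' if the loader already ran, 'w' if it did not. */
void work_apc(Thread *t) {
    record(t, t->process_initialized ? 'W' : 'w');
}

/* First APC of the pair: only re-queues the real job at the tail. */
void trampoline_apc(Thread *t) {
    record(t, 'T');
    queue_apc(t, work_apc);
}

/* Models a suspended new process: the injected APC sits ahead of the
   loader initialization APC (the assumption stated above). */
const char *simulate(Thread *t, ApcRoutine injected) {
    memset(t, 0, sizeof *t);
    queue_apc(t, injected);
    queue_apc(t, loader_init_apc);
    while (t->head < t->tail) {
        ApcRoutine fn = t->queue[t->head++];
        fn(t);
    }
    return t->log;
}
```

With a single APC, simulate(&t, work_apc) logs "wL": the job ran before
initialization. With the trampoline, simulate(&t, trampoline_apc) logs "TLW":
the re-queued job lands behind the initialization APC.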

S

Matt Taylor

Jul 29, 2004, 1:09:40 AM

"Slava M. Usov" <stripit...@gmx.net> wrote in message

news:u$XBJOQdE...@TK2MSFTNGP12.phx.gbl...

I see. Now I'm encountering the same problem as you did: the APC actually
executes prior to LdrInitializeThunk. I was casually single-stepping through
LdrInitializeThunk earlier tonight because of that. I don't know how it
worked since I still have a large number of Win32 and RTL dependencies. I
suppose I got lucky.

It's a shame that there is no thread creation flag to prevent DLL attach
messages, but I suppose it would be hard to make a convincing argument given
that such a feature would be horribly abused and result in a lot of
headaches.

-Matt

Ivan Brugiolo [MSFT]

Jul 29, 2004, 2:45:28 AM

A module can declare that it does not need THREAD_ATTACH/DETACH
notifications, but not the other way around (a thread cannot declare that
it does not need them).

Once you are clear on all the details of the APC
that calls NtContinue over a pre-established context,
you should rethink your design given your goals.

If you want to create some debugging helper infrastructure
that tracks heap allocations, lock issues, and threading issues,
you should start from a debugger, and from the existing
OS features in this area: PageHeap, Handle-Verifier, Lock-Verifier.

--
This posting is provided "AS IS" with no warranties, and confers no rights.
Use of any included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Matt Taylor

Jul 29, 2004, 4:56:44 AM

"Ivan Brugiolo [MSFT]" <ivan...@online.microsoft.com> wrote in message

news:OKJmleT...@TK2MSFTNGP12.phx.gbl...

> A module can declare that it does not need THREAD_ATTACH/DETACH
> notifications, but not the other way around (a thread cannot declare that
> it does not need them).
>
> Once you are clear on all the details of the APC
> that calls NtContinue over a pre-established context,
> you should rethink your design given your goals.

I am already familiar with APCs. I've got stubs that hook
KiUserApcDispatcher, KiUserExceptionDispatcher, NtContinue, and something
else which is called mostly by win32k.sys, so I've forgotten its name now.
(KiUserCallback? Something like that...) These are all essentially the same,
and I have no difficulty running MSIE or .NET apps -- both of which throw
numerous exceptions. I also have emulation of XP SP2's NX feature working on
my Pentium-4.

> If you want to create some debugging helper infrastructure
> that tracks heap allocations, lock issues, and threading issues,
> you should start from a debugger, and from the existing
> OS features in this area: PageHeap, Handle-Verifier, Lock-Verifier.

My debugger is a fundamentally different architecture. A number of the
features that have been implemented and almost all of the features planned
cannot be done out-of-proc. It would be too slow. In some cases, 1 million
times slower is an understatement. I am familiar with the dangers of
operating in-proc, but I really have no choice. I've made sure that my code
does not require the PEB lock except on process attach/detach.

-Matt

Slava M. Usov

Jul 29, 2004, 11:07:10 AM

"Matt Taylor" <mta...@blackhole.com> wrote in message

news:oE%Nc.15360$DZ.9...@twister.tampabay.rr.com...

> I see. Now I'm encountering the same problem as you did: the APC actually
> executes prior to LdrInitializeThunk. I was casually single-stepping
> through LdrInitializeThunk earlier tonight because of that. I don't know
> how it worked since I still have a large number of Win32 and RTL
> dependencies. I suppose I got lucky.

Yep. Most of win32 does lazy init, and that helps. But I would not rely on
it.

S