
Deadlock in RAND_poll's Heap32First call


sandeep kiran p

Feb 23, 2012, 8:11:23 AM
Hi,

OpenSSL Version: 0.9.8o
OS : Windows Server 2008 R2 SP1

I am seeing a deadlock in a Windows application between two threads: one calls Heap32First from OpenSSL's RAND_poll, and the other allocates memory from the heap.

Here is the relevant stack trace from both the threads involved in deadlock.

Thread 523
----------------
ntdll!ZwWaitForSingleObject+a
ntdll!RtlpWaitOnCriticalSection+e8
ntdll!RtlEnterCriticalSection+d1
ntdll!RtlpAllocateHeap+18a6
ntdll!RtlAllocateHeap+16c
ntdll!RtlpAllocateUserBlock+145
ntdll!RtlpLowFragHeapAllocFromContext+4e7
ntdll!RtlAllocateHeap+e4
ntdll!RtlInitializeCriticalSectionEx+d2
ntdll!RtlpActivateLowFragmentationHeap+181
ntdll!RtlpPerformHeapMaintenance+27
ntdll!RtlpAllocateHeap+1819
ntdll!RtlAllocateHeap+16c


Thread 454
-----------------
ntdll!NtWaitForSingleObject+0xa
ntdll!RtlpWaitOnCriticalSection+0xe8
ntdll!RtlEnterCriticalSection+0xd1
ntdll!RtlLockHeap+0x3b
ntdll!RtlpQueryExtendedHeapInformation+0xf4
ntdll!RtlQueryHeapInformation+0x3c
ntdll!RtlQueryProcessHeapInformation+0x3ad
ntdll!RtlQueryProcessDebugInformation+0x3b0
kernel32!Heap32First+0x71

WinDBG reports that threads 523 and 454 each hold a lock and are waiting for the other's lock, resulting in a deadlock.

On searching, I have found a couple of instances where such an issue has been reported with Heap32Next on Windows 7, but nothing that helps me solve the problem. Most of the references I found conclude that this could be due to a bug in the heap traversal APIs. If someone has faced a similar problem, can you guide me to possible workarounds by which I can avoid the deadlock? Can I remove the heap traversal routines and find some other sources of entropy?

Thanks for your help.

Regards
Sandeep





Jakob Bohm

Feb 23, 2012, 11:38:25 AM
From the evidence given, I would *almost* certainly characterize
this as a deadlock bug in ntdll.dll, the deepest, most trusted
user mode component of Windows!

Specifically, nothing should allow regular user code such as
OpenSSL to hold onto NT internal critical sections while not
running inside NTDLL, and NTDLL should be designed not to
deadlock against itself.

There is one other possibility though:

The OpenSSL code in rand_win.c holds on to a "snapshot" lock
on some of the heap data while walking it. It may be doing
this in a way not permitted by the rules that are presumed
by the deadlock avoidance design of the speed critical heap
locking code.

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. http://www.wisemo.com
Transformervej 29, 2730 Herlev, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

______________________________________________________________________
OpenSSL Project http://www.openssl.org
User Support Mailing List openss...@openssl.org
Automated List Manager majo...@openssl.org

sandeep kiran p

Feb 24, 2012, 8:14:31 AM
You mentioned that OpenSSL is holding a "snapshot" lock in rand_win.c. I couldn't find anything like that in that file. Can you specifically point me to the code that you are referring to? I would also like to get an opinion on possible workarounds that I can enforce to avoid the deadlock. 

1. Can I remove the heap traversal routines Heap32First and Heap32Next? Will it badly affect the PRNG output later on?

2. Can I replace Heap32First and Heap32Next calls with any other sources of entropy? What if I make a call to CryptGenRandom again in place of the heap traversal routines?

3. Any other possible ways out?

Thanks,
Sandeep

Jeffrey Walton

Feb 24, 2012, 4:30:20 PM
On Fri, Feb 24, 2012 at 4:08 PM, Jakob Bohm <jb-op...@wisemo.com> wrote:
> On 2/24/2012 2:14 PM, sandeep kiran p wrote:
>>
>> You mentioned that OpenSSL is holding a "snapshot" lock in rand_win.c. I
>> couldn't find anything like that in that file. Can you specifically point me
>> to the code that you are referring to? I would also like to get an opinion
>> on possible workarounds that I can enforce to avoid the deadlock.
>>
> In OpenSSL 1.0.0 it is line 486 which says
>
>         module_next && (handle = snap(TH32CS_SNAPALL,0))
>
> where snap is a pointer to KERNEL32.CreateToolhelp32Snapshot()

I've found that creating too many Toolhelp snapshots too frequently
causes problems (in a different problem domain). How frequently is
OpenSSL doing it? Just once during module startup? From where is it
being called (it can't be DllMain)?

If the heap walk occurs once after DllMain, there should not be any
problems (in theory).

>> 1. Can I remove the heap traversal routines Heap32First and Heap32Next?
>> Will it badly affect the PRNG output later on?
>
> It depends how good the other sources of random numbers are,
> more below.
>>
>>
>> 2. Can I replace Heap32First and Heap32Next calls with any other sources
>> of entropy? What if I make a call to CryptGenRandom again in place of the
>> heap traversal routines?
>
> Calling CryptGenRandom() twice isn't going to help much.
>
> If CryptGenRandom() is as good as it is "supposed to" be,
> the other entropy sources are not really needed.  But if
> CryptGenRandom() is somehow broken or untrustworthy,
> calling it a million times wouldn't help.

"Cryptanalysis of the Random Number Generator of the Windows Operating
System," eprint.iacr.org/2007/419.pdf

>
> [SNIP]
>

Also of interest might be "Analysis of the Linux Random Number
Generator," eprint.iacr.org/2006/086.pdf.

Jeff
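
For reference, question 2 above (drawing bytes from CryptGenRandom in place of the heap walk) could be sketched roughly as follows. This is a minimal sketch, not code from rand_win.c: the function name `os_random_bytes` is hypothetical, the provider type is an assumption, and the non-Windows branch is only a stub so the file compiles elsewhere. As Jakob notes, a second call adds nothing if CryptGenRandom itself is untrustworthy.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#include <wincrypt.h>
#endif

/* Hypothetical helper: fill buf with len bytes from the Windows
 * CryptoAPI generator. Returns 1 on success, 0 on failure, on bad
 * arguments, or on non-Windows builds (stub only). */
int os_random_bytes(unsigned char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return 0;
#ifdef _WIN32
    {
        HCRYPTPROV prov = 0;
        int ok = 0;

        /* CRYPT_VERIFYCONTEXT: no persistent key container is needed */
        if (CryptAcquireContext(&prov, NULL, NULL, PROV_RSA_FULL,
                                CRYPT_VERIFYCONTEXT)) {
            ok = CryptGenRandom(prov, (DWORD)len, buf) ? 1 : 0;
            CryptReleaseContext(prov, 0);
        }
        return ok;
    }
#else
    return 0;
#endif
}
```

A caller would then hand the buffer to OpenSSL's RAND_add() with whatever entropy estimate it considers defensible. On Windows the sketch needs to be linked against advapi32.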

sandeep kiran p

Feb 25, 2012, 9:30:25 AM
MSDN says

"To enumerate the heap or module states for all processes, specify TH32CS_SNAPALL and set th32ProcessID to zero."

So it presumably does the heap and module walk for all processes and not only for the current process.

Do you think CreateToolhelp32Snapshot's lock on the read-only snapshot could be a possible culprit?

I am now thinking about removing the calls to Heap32First and Heap32Next in rand_win.c and looking for alternate sources of entropy.

Thanks for your help.

Regards
Sandeep

On Sat, Feb 25, 2012 at 2:38 AM, Jakob Bohm <jb-op...@wisemo.com> wrote:
> On 2/24/2012 2:14 PM, sandeep kiran p wrote:
>> You mentioned that OpenSSL is holding a "snapshot" lock in rand_win.c. I couldn't find anything like that in that file. Can you specifically point me to the code that you are referring to? I would also like to get an opinion on possible workarounds that I can enforce to avoid the deadlock.
>
> In OpenSSL 1.0.0 it is line 486 which says
>
>         module_next && (handle = snap(TH32CS_SNAPALL,0))
>
> where snap is a pointer to KERNEL32.CreateToolhelp32Snapshot()
>
>> 1. Can I remove the heap traversal routines Heap32First and Heap32Next? Will it badly affect the PRNG output later on?
>
> It depends how good the other sources of random numbers are,
> more below.
>
>> 2. Can I replace Heap32First and Heap32Next calls with any other sources of entropy? What if I make a call to CryptGenRandom again in place of the heap traversal routines?
>
> Calling CryptGenRandom() twice isn't going to help much.
>
> If CryptGenRandom() is as good as it is "supposed to" be,
> the other entropy sources are not really needed.  But if
> CryptGenRandom() is somehow broken or untrustworthy,
> calling it a million times wouldn't help.
>
> Anyway, I have my doubts about the value of using the local
> heap walking functions as a source of entropy, as they
> reflect only the state of your own process.  Pretending that
> the address and size of each malloc()-ed memory block in
> your process contributes 3 to 5 bytes of additional entropy
> (which is what the comments say) is wildly optimistic and
> quite unrealistic.
>
> In a long-running web browser or a similarly long running
> web server, the net total of the memory layout effects of
> thousands of semi-chaotic previous network requests and
> user actions might contribute a total of 10 to 50 bits of
> entropy.  But in a typical freshly started process, the
> layout is going to be pretty deterministic (if the OS
> uses address layout randomization, it probably does so
> based on entropy sources already incorporated into its
> standard random source, i.e. CryptGenRandom() on Windows).
>
>> 3. Any other possible ways out?
>>
>> Thanks,
>> Sandeep
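
To put numbers on the gap Jakob describes, the arithmetic can be sketched as below. The 3 to 5 bytes per block and the 10 to 50 bit total come from his message; the block count of 1000 used in the note after the code is a made-up figure for illustration, and both function names are hypothetical.

```c
/* Entropy (in bits) credited if every walked heap block is counted
 * at a fixed number of bytes, as the comments cited above assume. */
unsigned long credited_bits(unsigned long blocks, unsigned bytes_per_block)
{
    return blocks * bytes_per_block * 8UL;
}

/* How many times larger the credited figure is than an assumed
 * realistic total (integer division, rounded down). Returns 0 if
 * realistic_bits is 0 to avoid dividing by zero. */
unsigned long overcount_factor(unsigned long credited,
                               unsigned long realistic_bits)
{
    return realistic_bits ? credited / realistic_bits : 0;
}
```

With 1000 hypothetical blocks at 5 bytes each, credited_bits(1000, 5) is 40000 bits. Against the upper estimate of 50 bits, overcount_factor(40000, 50) is 800, so the credited figure would overstate the realistic one by a factor of several hundred even in the favourable long-running case.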


Jakob Bohm

Feb 26, 2012, 2:06:41 PM
On 2/25/2012 3:30 PM, sandeep kiran p wrote:
> MSDN says
>
> " To enumerate the heap or module states for all processes, specify
> TH32CS_SNAPALL and set /th32ProcessID/ to zero. "
>
> So it presumably does the heap and module walk for all processes and
> not only for the current process.
>
Aha! Missed that detail in this hard-to-read code. I had
enough trouble untangling the crazy run-on lines and the
unconventional naming of function pointers very differently
than the pointed-to functions, not to mention the lack of
comments clarifying why it doesn't check for lack of a
pointer to the snapshot close function (there is a reason,
several pages further down in the code, but still no comment).

> Do you think *CreateToolhelp32Snapshot's* lock on the read-only
> snapshot could be a possible culprit?

That was the guess, but just a guess, hard to know without
spending several days reverse engineering that particular
version of the heap code in ntdll.
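
For readers unfamiliar with the Toolhelp heap API discussed in this thread, the call sequence looks roughly like the sketch below. It is an illustration only and not the code in rand_win.c: the function name `walk_own_heaps` and the block limit are hypothetical, the snapshot here is narrowed to the current process's heap list rather than TH32CS_SNAPALL, and the non-Windows branch is a stub. Nothing in this thread establishes that narrowing or bounding the walk avoids the reported deadlock, since thread 454's stack ends inside Heap32First itself.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#include <tlhelp32.h>
#endif

/* Hypothetical helper: visit at most max_blocks heap blocks of the
 * current process and return how many were seen. Returns 0 on
 * failure and on non-Windows builds (stub only). */
unsigned long walk_own_heaps(unsigned long max_blocks)
{
    unsigned long seen = 0;
#ifdef _WIN32
    HANDLE snap;
    HEAPLIST32 hl;
    HEAPENTRY32 he;

    if (max_blocks == 0)
        return 0;

    /* Heap list of this process only (assumption: narrower than
     * the TH32CS_SNAPALL call quoted earlier in the thread) */
    snap = CreateToolhelp32Snapshot(TH32CS_SNAPHEAPLIST,
                                    GetCurrentProcessId());
    if (snap == INVALID_HANDLE_VALUE)
        return 0;

    hl.dwSize = sizeof(hl);
    if (Heap32ListFirst(snap, &hl)) {
        do {
            he.dwSize = sizeof(he);
            /* Heap32First is where the reported stack trace ends */
            if (Heap32First(&he, hl.th32ProcessID, hl.th32HeapID)) {
                do {
                    seen++;
                    he.dwSize = sizeof(he);
                } while (seen < max_blocks && Heap32Next(&he));
            }
            hl.dwSize = sizeof(hl);
        } while (seen < max_blocks && Heap32ListNext(snap, &hl));
    }
    CloseHandle(snap);
#else
    (void)max_blocks;
#endif
    return seen;
}
```

A caller that wanted to keep the heap walk but limit its duration might call walk_own_heaps(80); removing the walk altogether, as Sandeep proposes, is the only option discussed here that sidesteps Heap32First entirely.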