
Hang/Deadlock issue with ReaderThreads in jldap

krishna_r
Sep 27, 2009, 11:28:26 PM
Hi folks,

We hit a deadlock in jldap (CVS tag Toct_ndk_2006). It appears to
happen when a connection is broken and the threads execute in a
particular order. I did some digging in the code and can reproduce it
consistently using breakpoints.

The stack traces for the relevant threads are below:

Our application thread:

3XMTHREADINFO "http-13003-Processor72" (TID:0x32218E00, sys_thread_t:0x33206BD8, state:CW, native ID:0x00287059) prio=5
4XESTACKTRACE at java/lang/Object.wait(Native Method)
4XESTACKTRACE at java/lang/Object.wait(Object.java:199(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.acquireWriteSemaphore(Connection.java:285(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.writeMessage(Connection.java:757(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.writeMessage(Connection.java:726(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Message.sendMessage(Message.java:101(Compiled Code))
4XESTACKTRACE at com/novell/ldap/MessageAgent.sendMessage(MessageAgent.java:286(Compiled Code))
4XESTACKTRACE at com/novell/ldap/LDAPConnection.sendRequestToServer(LDAPConnection.java:3731(Compiled Code))
4XESTACKTRACE at com/novell/ldap/LDAPConnection.bind(LDAPConnection.java:1532(Compiled Code))
4XESTACKTRACE at com/novell/ldap/LDAPConnection.bind(LDAPConnection.java:1399(Compiled Code))
4XESTACKTRACE at com/novell/ldap/LDAPConnection.bind(LDAPConnection.java:1361(Compiled Code))
... <our application code>


Two Connection$ReaderThread instances exist for a single connection;
their stack traces are below:

Thread-1:

3XMTHREADINFO "Thread-16488" (TID:0x32E18100, sys_thread_t:0x32D28F80, state:CW, native ID:0x003731A9) prio=5
4XESTACKTRACE at java/lang/Object.wait(Native Method)
4XESTACKTRACE at java/lang/Object.wait(Object.java:231(Compiled Code))
4XESTACKTRACE at java/lang/Thread.join(Thread.java:671(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.shutdown(Connection.java:986(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.access$1300(Connection.java:54(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection$ReaderThread.run(Connection.java:1405(Compiled Code))
4XESTACKTRACE at java/lang/Thread.run(Thread.java:803(Compiled Code))


Thread-2:

3XMTHREADINFO "Thread-16489" (TID:0x334CA600, sys_thread_t:0x33206940, state:CW, native ID:0x002F30A1) prio=5
4XESTACKTRACE at java/lang/Object.wait(Native Method)
4XESTACKTRACE at java/lang/Object.wait(Object.java:199(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.acquireWriteSemaphore(Connection.java:285(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.writeMessage(Connection.java:757(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Message.abandon(Message.java:506(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.shutdown(Connection.java:945(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection.access$1300(Connection.java:54(Compiled Code))
4XESTACKTRACE at com/novell/ldap/Connection$ReaderThread.run(Connection.java:1405(Compiled Code))
4XESTACKTRACE at java/lang/Thread.run(Thread.java:803(Compiled Code))


My analysis:
1. There should be only one ReaderThread per connection, but I see two
threads as above. (The application opens only one LDAPConnection, and
the thread IDs show that the two threads were spawned almost
consecutively, which rules out a leftover thread from a previous
connection.)
2. If a second ReaderThread is spawned, the first one may (wrongly)
try to join the other, as seen in Connection.shutdown().
3. In LDAPConnection.bind(), there is a connect() call in case the
socket input/output streams are null (the same happens in
Connection.writeMessage()).
4. So it is likely that the connection was broken and a reconnect
spawned another ReaderThread without the original one being stopped.
Our code calls LDAPConnection.connect() first and then
LDAPConnection.bind(), so there is no other reason for a new
connection to start.

The AD server we connect to had been restarted around the time the
deadlock was triggered, which agrees with my assumption. So I emulated
a broken network connection, set breakpoints in the jldap code, and
reproduced the deadlock consistently.

Essentially, the root cause of this deadlock is the two ReaderThreads.
Is this a known issue, and is a fix available? I checked the latest
versions in CVS but couldn't find one.

Any thoughts/comments are appreciated.

Thanks,
Krishna R

krishna_r
Sep 28, 2009, 1:09:11 AM
I reread my post and found many details missing, so I am adding them
here. Please let me know if anything is unclear.

1. I checked out the HEAD source and reproduced the issue there, so I
assume the bug still exists.

2. Steps to reproduce:

User/Application Code:
import com.novell.ldap.LDAPConnection;

String host = "...";
int port = 389;
String principal = "uid=admin, ...";
String credential = "admin";
LDAPConnection connection = new LDAPConnection();
connection.connect(host, port);   // opens the socket and starts the ReaderThread
connection.bind(LDAPConnection.LDAP_V3, principal,
        credential.getBytes("UTF-8"));

Launched in Eclipse under the debugger with the following breakpoints:
1. LDAPConnection.java, line 1523 in HEAD (in bind(), the
isConnected() check)
2. LDAPConnection.java, line 1533 (acquireSemaphore)
3. Connection.java, line 956 (in shutdown(), acquireSemaphore())
4. Connection.java, line 993 (in shutdown(), before join())

a. Run the application code in the debugger. The debugger suspends
the user thread at breakpoint #1 (bind).
b. Using TCPView (on Windows), close the TCP connection from the
domain to the LDAP server. The daemon thread launched by jldap
(thread-1) suspends at breakpoint #3.
c. Let the user thread continue; it suspends at breakpoint #2. By this
time jldap has launched a second daemon thread (say thread-2).
d. Let thread-1 continue; it blocks in Thread.join().
e. Let all the threads (and any other suspended threads) continue;
thread-2 and the user thread block waiting to acquire the write
semaphore.

Now the three threads are deadlocked.
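The cycle in steps a-e can be modelled in plain Java. This is a minimal sketch, not jldap code: WriteSemaphore, ReaderDeadlockSketch and deadlocks() are hypothetical names, and it assumes (as breakpoints #3 and #4 and the stack traces suggest) that thread-1 takes the write semaphore in shutdown() before calling join() on the second reader. Thread-1 holds the semaphore while joining thread-2; thread-2 and the application thread both wait for that same semaphore, so none of them can proceed.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the three-thread deadlock; names are illustrative, not jldap's.
class ReaderDeadlockSketch {

    /** Simplified write semaphore built on wait/notifyAll (not jldap's implementation). */
    static final class WriteSemaphore {
        private boolean held;

        synchronized void acquire() throws InterruptedException {
            while (held) {
                wait();
            }
            held = true;
        }

        synchronized void release() {
            held = false;
            notifyAll();
        }
    }

    /** Starts the three threads and reports whether all are still stuck after waitMillis. */
    static boolean deadlocks(long waitMillis) {
        WriteSemaphore sem = new WriteSemaphore();
        CountDownLatch semHeld = new CountDownLatch(1);

        // "Acquire, then release" once reader-1 already holds the semaphore.
        Runnable needsSemaphore = () -> {
            try {
                semHeld.await();
                sem.acquire();   // blocks for as long as reader-1 holds the semaphore
                sem.release();
            } catch (InterruptedException ignored) {
                // interrupted during cleanup
            }
        };
        Thread reader2 = new Thread(needsSemaphore, "reader-2"); // shutdown() -> abandon() -> write
        Thread app = new Thread(needsSemaphore, "app-bind");     // bind() -> writeMessage()

        // reader-1: shutdown() takes the semaphore, then joins the *other* reader.
        Thread reader1 = new Thread(() -> {
            try {
                sem.acquire();
                semHeld.countDown();
                reader2.join();  // never returns: reader-2 is waiting for the semaphore
                sem.release();
            } catch (InterruptedException ignored) {
                // interrupted during cleanup
            }
        }, "reader-1");

        Thread[] all = {reader1, reader2, app};
        for (Thread t : all) {
            t.setDaemon(true);  // do not keep the JVM alive if cleanup fails
            t.start();
        }
        try {
            reader1.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        boolean stuck = reader1.isAlive() && reader2.isAlive() && app.isAlive();
        for (Thread t : all) {
            t.interrupt();  // break the cycle so the demo can finish cleanly
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(200));
    }
}
```

Running main() prints "deadlocked: true". In this model, breaking either edge (not holding the semaphore across join(), or never having a second reader to join) removes the cycle.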


Thanks,
Krishna R

krishna_r
Sep 28, 2009, 7:40:29 AM

I looked at the Connection class further; my thoughts are below:

1. Connection.connect() calls waitForReader(null) to wait for any
existing ReaderThread to shut down. But, as I said before, a new
ReaderThread is still launched while the older one is running.

2. Debugging through waitForReader(), I found that it reaches the
condition (thread == deadReader and thread == null) and exits. This
does not match the intent of the method, which should exit only after
the reader has completed.

My guess is that the condition should be (reader == deadReader and
thread == null). I modified the logic accordingly, and the deadlock no
longer happens. Still, I'm not convinced that this is the right fix or
that it won't break anything.
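The difference between the two conditions can be seen in a small model. This is a hypothetical sketch, not jldap's Connection: ReaderState, waitOriginal(), waitProposed() and shutdownWaitsForReader() are illustrative names, and it assumes, as in the diff, that an exiting reader records itself in deadReader and that a shutdown request is waitForReader(null). With the original check, a shutdown request returns immediately whenever deadReader is still null; with the proposed check it returns only after the registered reader has left its trace.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the shutdown check in waitForReader(); field names follow the
// posted diff, but the class, methods and timings are illustrative, not jldap code.
class WaitForReaderSketch {

    static final class ReaderState {
        private Thread reader;      // reader thread currently registered on the connection
        private Thread deadReader;  // "trace of the deceased" left by an exiting reader

        synchronized void register(Thread t) {
            reader = t;
        }

        // Called by the reader thread as its final action.
        synchronized void readerExited() {
            deadReader = reader;
            notifyAll();
        }

        // Original check: a shutdown request (thread == null) returns as soon as
        // deadReader is also null, although the registered reader may still be running.
        synchronized void waitOriginal(Thread thread) throws InterruptedException {
            while (reader != thread) {
                if (thread == deadReader) {
                    if (thread == null) {
                        return;
                    }
                    throw new IllegalStateException("reader already terminated");
                }
                wait(5);
            }
        }

        // Proposed check: a shutdown request returns only once reader == deadReader.
        synchronized void waitProposed(Thread thread) throws InterruptedException {
            while (reader != thread) {
                if (thread == null) {
                    if (reader == deadReader) {
                        return;
                    }
                } else if (thread == deadReader) {
                    throw new IllegalStateException("reader already terminated");
                }
                wait(5);
            }
        }
    }

    /** True if the shutdown wait (thread == null) returned only after the reader finished. */
    static boolean shutdownWaitsForReader(boolean proposed) {
        ReaderState state = new ReaderState();
        AtomicBoolean finished = new AtomicBoolean(false);
        Thread reader = new Thread(() -> {
            try {
                Thread.sleep(200);  // stand-in for the reader's remaining shutdown work
            } catch (InterruptedException ignored) {
                // fall through and exit
            }
            finished.set(true);
            state.readerExited();
        });
        state.register(reader);
        reader.start();
        try {
            if (proposed) {
                state.waitProposed(null);
            } else {
                state.waitOriginal(null);
            }
            boolean result = finished.get();
            reader.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("original waits: " + shutdownWaitsForReader(false));
        System.out.println("proposed waits: " + shutdownWaitsForReader(true));
    }
}
```

With the 200 ms sleep standing in for the reader's remaining work, main() is expected to report false for the original check and true for the proposed one. The model does not settle whether the real ReaderThread also clears reader on exit.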

I have pasted the changed code below (against Connection.java revision
1.87). Could somebody review it and pass on their comments?

Thanks,
Krishna R

>diff Connection.java.orig Connection.java.fix
363,367c363,368
< /*
< * The reader thread may start and immediately terminate.
< * To prevent the waitForReader from waiting forever
< * for the dead to rise, we leave traces of the deceased.
< * If the thread is already gone, we throw an exception.
---
>
> /**
> * Logic:
> * 1. If shutdown request, wait until reader == deadReader
> * 2. If start request, wait until reader == thread (checked in outer while loop)
> * Or if reader == deadReader, in which case throw an exception
369,382c370,397
< if( thread == deadReader) {
< if (thread == null) /* then we wanted a shutdown */
< return;
< if( Debug.LDAP_DEBUG) {
< Debug.trace( Debug.messages, name +
< "reader already terminated, throw exception");
< }
< IOException lex = deadReaderException;
< deadReaderException = null;
< deadReader = null;
< // Reader thread terminated
< throw new LDAPException(
< ExceptionMessages.CONNECTION_READER,
< LDAPException.CONNECT_ERROR, null, lex);
---
> if( thread == null) {
> /*
> * Since this is a reader thread shutdown request,
> * return since the reader thread is dead.
> */
> if (reader == deadReader) /* then we wanted a shutdown */
> return;
> }
> else {
> /*
> * The reader thread may start and immediately terminate.
> * To prevent the waitForReader from waiting forever
> * for the dead to rise, we leave traces of the deceased.
> * If the thread is already gone, we throw an exception.
> */
> if( thread == deadReader) {
> if( Debug.LDAP_DEBUG) {
> Debug.trace( Debug.messages, name +
> "reader already terminated, throw exception");
> }
> IOException lex = deadReaderException;
> deadReaderException = null;
> deadReader = null;
> // Reader thread terminated
> throw new LDAPException(
> ExceptionMessages.CONNECTION_READER,
> LDAPException.CONNECT_ERROR, null, lex);
> }
