Preventing stuck thread errors

sger...@gmail.com

Jun 7, 2011, 7:10:26 AM
to xad...@googlegroups.com
The configured timeouts for XADisk's threads should be on par with the global thread timeout of your container.

For example, in WebLogic, if you regularly get errors like the one at the end of this message, you need to either increase your server's stuck thread timeout (StuckThreadMaxTime) or decrease the corresponding intervals in XADisk's RAR configuration.

####<Jun 7, 2011 10:51:12 AM BST> <Error> <WebLogicServer> <uv419.eudra.org> <gwfilehandler_2> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <> <1307440272516> <BEA-000337> <[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "700" seconds working on the request "weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@5e517550", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:

sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
org.xadisk.filesystem.workers.EventWorker.run(EventWorker.java:40)
weblogic.connector.security.layer.WorkImpl.runIt(WorkImpl.java:108)
weblogic.connector.security.layer.WorkImpl.run(WorkImpl.java:44)
weblogic.connector.work.WorkRequest.run(WorkRequest.java:95)
weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:201)
weblogic.work.ExecuteThread.run(ExecuteThread.java:173)

>
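
For readers wondering where the server-level timeout lives, here is a minimal sketch of the relevant fragment of the domain's config.xml. The server name gwfilehandler_2 is taken from the log above; the value of 1200 seconds is purely illustrative, and the exact element placement is an assumption that should be checked against your WebLogic version. The same setting can normally be changed from the Administration Console instead of editing the file by hand.

```xml
<!-- Fragment of the domain's config.xml (sketch only; other server elements omitted) -->
<server>
  <name>gwfilehandler_2</name>
  <!-- Seconds a thread may work on a single request before it is reported as [STUCK].
       1200 is an illustrative value, not a recommendation. -->
  <stuck-thread-max-time>1200</stuck-thread-max-time>
</server>
```

In the log above the limit was 600 seconds and the thread had been busy for 700, so any value chosen here would need to exceed the longest interval an XADisk worker thread is expected to wait.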

Nitin Verma

Jun 7, 2011, 1:04:30 PM
to XADisk
Hi Stelios,

Thanks for starting this thread.

WebLogic uses the term "stuck threads" for all kinds of threads, even
for JCA worker threads. However, the JCA specification explicitly
supports long-running worker threads. I have created a separate thread
explaining why worker threads created by JCA Resource Adapters are
justified and commonplace:
http://groups.google.com/group/xadisk/t/844d9d29e15bc9cd?hl=en


Now, coming back to the workarounds. I have heard from someone that if
we don't want to change the global setting (the option you have
suggested), a separate work manager can be used and configured to
ignore stuck threads. I do not have a WebLogic license to try that
out. It would be great if you could also elaborate on that as an
alternative for WebLogic users.

I found a relevant thread about WebLogic "stuck threads":
http://stackoverflow.com/questions/2709410/weblogic-stuck-thread-protection


You also mentioned: "decrease the corresponding intervals of XADisk's
RAR configuration." Are you referring to the ra.xml configuration
properties like "deadLockDetectorInterval"?

Thanks,
Nitin

sger...@gmail.com

Jun 9, 2011, 4:51:12 AM
to xad...@googlegroups.com
Yes, I am referring to these intervals.
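
As an illustration of what such an interval looks like in XADisk's ra.xml, here is a minimal sketch. The property name deadLockDetectorInterval comes from this thread; the property type, the unit (assumed here to be seconds) and the value 30 are assumptions for illustration only, so check them against the ra.xml shipped with your XADisk version.

```xml
<!-- Fragment of XADisk's ra.xml (sketch only; type, unit and value are assumptions) -->
<config-property>
  <config-property-name>deadLockDetectorInterval</config-property-name>
  <config-property-type>java.lang.Integer</config-property-type>
  <!-- Keep this well below the container's stuck thread limit
       (600 seconds in the log from the first post). -->
  <config-property-value>30</config-property-value>
</config-property>
```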
 
I will look into the work manager option, because a solution to this is really necessary.
Even though there is no problem in the end (the container cleans its own thread pool of "stuck" threads and XADisk continues working), the logs quickly fill up with exceptions, and server instances are marked with Warning in the AdminConsole.

This could make the server admins of any organization nervous. :-)
 
S.

sger...@gmail.com

Jun 9, 2011, 11:10:39 AM
to xad...@googlegroups.com
So, after some investigation, the solution to the "stuck thread" problem is indeed to create a custom WorkManager for XADisk.

Building on the example weblogic-ra.xml given in this thread https://groups.google.com/forum/#!topic/xadisk/9nmA-_GODGw, you modify it as follows:
 
<?xml version="1.0" encoding="UTF-8" ?>
<weblogic-connector...>
...
<enable-global-access-to-classes>true</enable-global-access-to-classes>
<work-manager>
   <name>XADisk-WorkManager</name>
   <response-time-request-class>
      <name>xadisk_response_time</name>
      <goal-ms>2000</goal-ms>
   </response-time-request-class>
   <min-threads-constraint>
      <name>xadisk_min_threads</name>
      <count>3</count>
   </min-threads-constraint>
   <ignore-stuck-threads>true</ignore-stuck-threads>
</work-manager>
<outbound-resource-adapter>
...
</weblogic-connector>
The names can be whatever you want.
The really important part is the ignore-stuck-threads setting, but since you are defining a WorkManager anyway, you might as well take advantage of its other options.
The full reference of options for the work-manager element can be found here: http://download.oracle.com/docs/cd/E13222_01/wls/docs90/resadapter/weblogic_ra_xml.html#1070460 (where, strangely, the stuck thread property is missing).
 

Nitin Verma

Jun 9, 2011, 12:06:20 PM
to XADisk
Thanks Stelios.

Could you please clarify: are the solutions proposed in the first post
also valid, so that they can be used as alternatives? Or would you
strongly recommend the work-manager solution?

Thanks Again...
Nitin

sger...@gmail.com

Jun 14, 2011, 6:56:24 AM
to xad...@googlegroups.com
No, the WorkManager solution IS the solution for Weblogic.
 
Unfortunately, from a brief inspection of the XADisk code, not all thread-related timeouts are configurable, so there might still be some "stuck thread" messages, even after increasing the domain-level timeouts.
In addition, since this is a domain-level change, it affects other parts of the system, such as HTTP handling, which may be unacceptable.

So, ultimately, the right way is to define a WorkManager.