All threads not showing in full thread dump

579 views
Skip to first unread message

Brad Wood

unread,
Sep 6, 2013, 11:20:28 PM9/6/13
to fusion...@googlegroups.com
I don't think the "Stack Trace All' function is actually giving me all the threads on the JVM.  Case and point; see the full stack trace here: 


Thread http-bio-8888-exec-266 claims it's being blocked by thread http-bio-8888-exec-219, yet the latter doesn't show anywhere in the trace.

Threads http-bio-8888-exec-261http-bio-8888-exec-21, and http-bio-8888-exec-256, claim they're being blocked by thread http-bio-8888-exec-242 but it doesn't show anywhere in the trace either.

Can someone confirm if Stack Trace All is supposed to return all threads.  I need it to in order to track down blocking threads in my JVM.

Thanks!

~Brad

David Stockton

unread,
Sep 9, 2013, 11:16:32 AM9/9/13
to fusion...@googlegroups.com
Hello Brad,

Thank you for your email. The "gist" you sent appears to only list (some) of the http-bio-* threads. I think the stack trace is from the Requests->Stack Trace All feature. This doesn't give you a stack trace of all threads, only of WebRequest threads. If you click Resources->Threads->Stack Trace All, this will give you a list of all threads.

However, having said that there is a very small possibility that the blocking thread could have completed before all the stack traces were gathered. There is typically only a very very small chance of this happening. It would be interesting to know if you're commonly seeing this issue. Not wanting to rest with this, we are planning an improvement in the 5.0.8 point release which will minimize further the possibility of this happening.

Just to give you a bit of background as to why we can't/don't force an atomic (i.e. static) stack trace of all threads - this is because it would require a JVM halt which would be poor for performance.

I hope this helps explain and hopefully you're not seeing this occur too often (and less so in the forthcoming 5.0.8). Again, if you are reproducing this several times; please contact us with further details at sup...@fusion-reactor.com

Best regards,
David Stockton
Fusion Team


--
You received this message because you are subscribed to the Google Groups "FusionReactor" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fusionreacto...@googlegroups.com.
To post to this group, send email to fusion...@googlegroups.com.
Visit this group at http://groups.google.com/group/fusionreactor.
For more options, visit https://groups.google.com/groups/opt_out.

Brad Wood

unread,
Sep 9, 2013, 1:17:18 PM9/9/13
to fusionreactor
 If you click Resources->Threads->Stack Trace All, this will give you a list of all threads.

Ahh, I think that's the missing piece I didn't know about.  I will use the Stack Trace All under Resources next time.

> However, having said that there is a very small possibility that the blocking thread could have completed before all the stack traces were gathered.

That is possible.  I collected that stack trace during some server load testing where I I pushing the server until requests started piling up.  As you can see- most of the threads in that trace are all blocked, so most of the threads weren't moving very fast.

> stack trace of all threads - this is because it would require a JVM halt

I completely understand.

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 

Brad Wood

unread,
Sep 20, 2013, 12:00:28 AM9/20/13
to fusion...@googlegroups.com
One final follow up on this thread in reference to the stack trace obtained via "Resources->Threads->Stack Trace All".

Will those stack traces be guaranteed to be a snapshot of the entire JVM at the exact same time, or will some time have passed between the threads as they are captured?  In a previous E-mail, you said you "can't/don't force an atomic (i.e. static) stack trace of all threads" but I wanted to clarify if that only applied to the "Requests->Stack Trace All" or all methods including "Resources->Threads->Stack Trace All".

In my current scenario I'm trying to iron out some Java blocking on synchronized methods during load tests and to track the blocking hierarchy of threads, I need to make sure they're all take from the same slice of time.  Also in this specific instance, I don't care about the performance since it's just a load test and an accurate reproduction of my threads is more important so I can track exactly where each thread is blocking the other.

For reference, here's my thread in the Railo forums where I'm using this stack trace:

Thanks!

~Brad

David Stockton

unread,
Sep 20, 2013, 6:48:13 AM9/20/13
to fusion...@googlegroups.com
Hello Brad,

Thanks for your questions. Yesterday we released v5.0.8 which includes a relevant improvement:
FR3796 Improvement On JVM 6 platforms, FR should obtain thread, monitor and ownable synchronizer information
Stack traces are "more likely" to be atomic and also contain more information. We use the same method to get the stack trace as many of the JVM vendors own tooling so this is a very reliable method. I realise this may not sound good but we're open about the fact it's not guaranteed atomic whereas other monitor vendors simply say nothing and use similar methods.

I believe CTRL-BREAK in a command window for the JVM generates an atomic trace if you're just using this to grab one-time data as part of a test. This could help you if you find your high-thread counts are still causing non-atomic dumps on FR v5.0.8 in your setup.

Best regards,
David Stockton
Fusion Team

Brad Wood

unread,
Sep 20, 2013, 1:42:00 PM9/20/13
to fusionreactor
Thanks for info.  I'll try the new version, but unfortunately, I may need to abandon FR for stack traces and start getting them from the command line.  I'm analyzing a trace right now and there's 44 missing threads that are supposedly blocking other threads.  I would have swore that I used the method that is supposed to get all threads, but now I'm starting to question myself.  I'll have to wait until I can another stack to be sure.

I wrote a small CFML script to parse a stack, build a thread graph, and analyze it for blocking chains and circular dependencies.  Surely thing kind of thing already exists out there in the Java world-- perhaps with a GUI.  Any suggestions?

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 



David Stockton

unread,
Sep 26, 2013, 4:17:07 AM9/26/13
to fusion...@googlegroups.com
Hello Brad,

Regarding this new trace and the 44 missing threads, could you point one out to us please?
If I look at the truncated gist - https://gist.github.com/bdw429s/6531016 - it's of course not complete. However, if you look at the raw (un-truncated) version - https://gist.github.com/bdw429s/6531016/raw/5b994f9be6471c17d408966624d7032eae9d58e8/gistfile1.txt - it appears to be complete. Could it just be that you perhaps accidentally were analyzing the truncated version?

I will point out that we noticed thread "http-bio-8888-exec-109" (Id=524) appears to have had its name truncated/corrupted (to "`9") but it's ID is still available. I'm not sure if that's a copy & paste issue or something else but the linked list of blocked threads appears to be valid.

If you don't think that's the case, it'd be a big help if you could point out what's missing because none of the tools we use or manual analysis are showing anything missing. 

Many thanks for your continued help/information,
David Stockton
Fusion Team

Brad Wood

unread,
Sep 26, 2013, 4:19:25 PM9/26/13
to fusionreactor
Hi Dave, thanks for the follow up.  Sorry I let the thread languish but the environment I had been testing went live and I hadn't been able to run any additional tests since my last post.  However, Railo released patch 004 that addressed several performance items yesterday and I just finished setting up a very similar environment that needed load testing so I spent several hours in front of Jmeter and FR last night hammering a ContentBox install on Railo.

I made sure that every stack trace I retrieved used the "Resources > Threads > Stack Trace All" method and not the "Request > Activity Stack Trace All" option.  (Perhaps the latter could be renamed to "Stack Trace Running" or something?)  I analyzed over a dozen stacks and I don't think I found a single one with a missing thread, so that's good.  

At this point I'm reasonably assured that I'm getting all threads present and accounted for.  I even went back and reanalyzed the FULL Gist after fixing thread 109 and it shows as all there so I'm not sure what was going on there. 

So the final remaining issue is whether or not my stack trace is an exact slice of time or not.  Of course, you've already addressed that this is not a guarantee the way stack traces are implemented.  It takes about 2-3 seconds for the server to reply when retrieving a trace and with the server processing ~200 request per second at an average time of ~500ms per request, that means 400-600 requests complete in the time the stack trace is obtained and none of the threads that were running at the beginning of the thread stack retrieval are likely to still be running by the time it's done.  

That being said, I'm not sure if that makes the "blocked by" bits quite as useful.  For instance, in analyzing an average trace, I'll see several instances of what appears to be circular references where two or more threads appear to be blocking each other, yet deadlocks are not resulting so I think it's highly possible that there aren't actually circular references at any one point in time, but the threads are captured over a period of a few seconds and so it appears that way.  Does that seem fair?  

Atomicity aside, I'm not nearly as worried about finding out what threads are the source of blocking now.  Almost all instances have turned out to be synchronized Java methods in Railo and Hibernate and that's been enough to get me where I need to go.

For reference, here's one of the latest full thread dumps I captured:

You should find 103 blocked threads and 32 running for a total of 135 which roughly matches my number of Jmeter threads.  An example of an apparent circular references would be:

http-bio-8888-exec-108 > http-bio-8888-exec-123 > http-bio-8888-exec-87 > http-bio-8888-exec-27 > http-bio-8888-exec-43 > http-bio-8888-exec-108

The full trace doesn't quite pan out though since thread 54 is blocked by 108 on the synchronized method railo.runtime.tag.TagHandlerPool.reuse(), yet 108 is busy in Hibernate code with reference to reuse() in its stack so it shouldn't still be holding any monitor entries for it.

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 



Brad Wood

unread,
Sep 26, 2013, 4:36:24 PM9/26/13
to fusionreactor
Ok, I'm starting a new post for my observations about 5.0.8 since my last message was getting long and this is sort of a different thing.  All the tests I ran in my last message I also ran against FR 5.0.8.  Stack traces seemed to take about the same amount of time to return.  One thing that I have noticed (same with both versions) is that there is no visual indicator on the screen that you are waiting on the server.  This is even more difficult if the request has begun returning, but taking a while to transfer due to the size of the response.  When hitting the "refresh" button on an existing trace, I'm not always sure if I'm looking at the new one yet or the old one still.  I'd file that under minor UI enhancements to consider.

The traces I get with 5.0.8 are different on the same server experiencing the same load as with 5.0.6.  The GIST I posted in my last message shows 103/135 of the requests blocked by a specific thread.  The new stack tells you that the thread is blocked and on what resource, but not by whom.

Example:

Thread from 5.0.6 trace:
"http-bio-8888-exec-10" Id=94 BLOCKED on org.hibernate.cache.UpdateTimestampsCache@dc9280 owned by "http-bio-8888-exec-126" Id=249
at org.hibernate.cache.UpdateTimestampsCache.invalidate(UpdateTimestampsCache.java:77)
-  blocked on org.hibernate.cache.UpdateTimestampsCache@dc9280
at org.hibernate.engine.ActionQueue$AfterTransactionCompletionProcessQueue.afterTransactionCompletion(ActionQueue.java:593)
etc...

Thread from 5.0.8 trace:
"http-bio-8888-exec-129" daemon id=234 prio=5 BLOCKED
   java.lang.Thread.State: BLOCKED
        at org.hibernate.cache.UpdateTimestampsCache.invalidate(UpdateTimestampsCache.java:77)
        - blocked on org.hibernate.cache.UpdateTimestampsCache@414ba50f
        at org.hibernate.engine.ActionQueue$AfterTransactionCompletionProcessQueue.afterTransactionCompletion(ActionQueue.java:593)
        snip...
       at java.lang.Thread.run(Thread.java:724)

   Locked ownable synchronizers: 
        - java.util.concurrent.ThreadPoolExecutor$Worker@7173113f

This makes it so I can't trace the hierarchy of what requests are blocking other requests.  Is there a way to get back the information of the thread owning the shared resource that is blocking?

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 



Darren Pywell

unread,
Sep 26, 2013, 6:19:47 PM9/26/13
to fusion...@googlegroups.com
Hi Brad,

You should be able to search the stack trace for the thread that is locking the object org.hibernate.cache.UpdateTimestampsCache@414ba50f

It should look something like the following on the locking thread:
- locked org.hibernate.cache.UpdateTimestampsCache@414ba50f

The owned by field from on the blocked thread trace should be there in our stack trace. I will follow up to see what's going on.

On another note, you asked about a tool to analyze the thread traces. I usually use IntelliJ IDEA to do this. They have a free community edition of the software that includes stack analysis.

Cheers,
Darren

Brad Wood

unread,
Sep 26, 2013, 10:45:59 PM9/26/13
to fusionreactor
Ahh, I see the "- locked" lines now.  If you can get the blocking thread put back into the top of the trace, that would be great.  I can easily parse that all out in a single regex, but trying to parse for the "- locked" bit will probably prove too much effort.  
I'll check out IntelliJ IDEA too.

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 



Darren Pywell

unread,
Sep 27, 2013, 9:13:20 AM9/27/13
to fusion...@googlegroups.com
Cool. The -locked lines give you the same result. I'll see that we have the owned by in for the next update. I've also noticed that IntelliJ have an issue with thread id= instead of thread Id= when parsing. I've logged a bug with IntelliJ but we may also put the Id= back in to make this work with them. If you parse any 5.0.8 stack traces try replacing the id= with Id= to get better results.

Cheers,
Darren

Charlie Arehart

unread,
Oct 2, 2013, 11:15:46 AM10/2/13
to fusion...@googlegroups.com

Brad, while you await a possible tweak to FR to help make that info more readily identifiable, note that Ben Nadel has also started a project/tool (open source) to make it easier to analyze FR crash protection alert emails, of which 90% is a thread dump (stack traces of all threads):

https://github.com/bennadel/Fusion-Reactor-Alert

He has a live online version available at : http://bennadel.github.io/Fusion-Reactor-Alert/examples/javascript/index.htm

(And more info on the project http://www.bennadel.com/blog/2525-HTML-And-JavaScript-Viewer-For-Fusion-Reactor-Alert-Emails.htm and http://www.bennadel.com/blog/2535-ColdFusion-Insights-For-My-Online-Fusion-Reactor-Alert-Viewer.htm)

While its focus is indeed on the alert emails, it could certainly be made to do analysis (like you want) from just an FR stack trace, since again that’s what makes up 90% of an FR CP alert. As an open source project, anyone can contribute, of course. Or you may want to put the idea out to him (or on github) to consider as a possible improvement.

/charlie

Brad Wood

unread,
Oct 2, 2013, 6:46:47 PM10/2/13
to fusionreactor
Thanks for the Interesting link Charlie.  Unfortunately, Ben's project seems to error out on my stack trace.  Perhaps the E-mails are different in FR5.  

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp 

ColdBox Platform: http://www.coldbox.org 



--

David Stockton

unread,
Oct 3, 2013, 6:03:31 AM10/3/13
to fusion...@googlegroups.com
Hi,

The FRv5 emails now contain more information so the format has changed a little. Thanks for reminding us of the tool Charlie - hopefully we'll see the community pitch in together and update it to work with new stacks for those unfamiliar with other tools (like IntelliJ's).

Best regards,
David Stockton
Fusion Team
Reply all
Reply to author
Forward
0 new messages