Stuck Threads - Curious if these requests activated Crash Protection.

298 views
Skip to first unread message

Aaron Greenlee

unread,
Feb 2, 2012, 8:59:35 PM2/2/12
to fusion...@googlegroups.com
Hello Fusion Reactor Community!

I've deployed a redesign for a site and I see a few threads getting stuck at the stacktrace below. I am curious, are these threads that have been killed by FR Crash Protection? I've highlighted a line below that leads me to wonder that.

Thanks.

Aaron

java.net.SocketOutputStream.socketWrite0(SocketOutputStream.java:???)[Native Method]
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
java.net.SocketOutputStream.write(SocketOutputStream.java:136)
jrun.servlet.jrpp.JrppOutputStream$SpillStream.write(JrppOutputStream.java:182)
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
jrun.servlet.jrpp.JrppOutputStream.write(JrppOutputStream.java:70)
jrun.servlet.JRunResponse$OutputStreamDelegate.write(JRunResponse.java:858)
com.intergral.fusionreactor.b.b.g(TrackedStream.java:199)
com.intergral.fusionreactor.b.b.j(TrackedStream.java:229)
com.intergral.fusionreactor.b.b.write(TrackedStream.java:78)
com.intergral.fusionreactor.filter.b.d.write(SoftKillResponseStream.java:80)
coldfusion.runtime.CachedBufferedOutputStream.flushBuffer(CachedBufferedOutputStream.java:86)
coldfusion.runtime.CachedBufferedOutputStream.flush(CachedBufferedOutputStream.java:129)
sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
java.io.PrintWriter.flush(PrintWriter.java:276)
coldfusion.runtime.NeoJspWriter.flush(NeoJspWriter.java:316)
coldfusion.runtime.NeoPageContext.flushOutput(NeoPageContext.java:1981)
coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:42)
coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
coldfusion.CfmServlet.service(CfmServlet.java:200)
coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
com.intergral.fusionreactor.filter.FusionReactorFilter.b(FusionReactorFilter.java:376)
com.intergral.fusionreactor.filter.FusionReactorFilter.c(FusionReactorFilter.java:254)
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionReactorFilter.java:164)
jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
jrun.servlet.FilterChain.service(FilterChain.java:101)
jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

charlie arehart

unread,
Feb 2, 2012, 9:57:51 PM2/2/12
to fusion...@googlegroups.com

Aaron, since it’s overnight for the FR support folks in Germany, I’ll take a shot in case this may be urgent for you, or if you may want to try something and add more before they arrive in their morning.

I would suspect the more important point is that the requests are in the state of output buffering, which may merely be that they are sending a large amount of content to the browser. If you look at the request details (for the request next time it’s running, rather than stack trace it), what do you see for the “bytes sent” on the first/”main” tab? If it’s millions of bytes, this could explain it. The CF request thread is tied up until the output is received by the client. (It could be HTML being generated, or it could be the result of sending a large CFCONTENT perhaps.)

FR will show many sites having lots of “long-running requests”, which are not “slow” in terms of CF processing at all, but are instead just slow to send the results to the browser—but the CF request thread “pays that price”, and if too many of them are tied up, then everyone suffers and it appears that CF is “slow” or “hanging”. I see it all the time in helping troubleshoot.

Then, related to this, there have been some instances where the FR engineers found that requests were showing in FR as being in that output buffering state for millions of seconds—clearly longer than it should take to output even a huge file. They discovered that some browsers (older IE browsers) would fail to give the acknowledgement of having received the data, so the request would be hung in this state. They (the FR engineers) were able to come up with a work-around for this bug, in the form of a tweak you can enable in an FR config file. More info at:

http://www.fusion-reactor.com/support/kb/FRS-274.cfm

Let us know if either helps.

 

/charlie

 

--
You received this message because you are subscribed to the Google Groups "FusionReactor" group.
To view this discussion on the web visit https://groups.google.com/d/msg/fusionreactor/-/kEGFzqsbOsgJ.
To post to this group, send email to fusion...@googlegroups.com.
To unsubscribe from this group, send email to fusionreacto...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/fusionreactor?hl=en.

David Stockton

unread,
Feb 3, 2012, 5:08:21 AM2/3/12
to fusion...@googlegroups.com
Hi Aaron,

Further to Charlie's response, I'd also chime in that even if this were caught by crash protection, it wouldn't have been killed due to a limitation of Java - it can't kill threads that are stuck in "Native Method"s. This isn't an FR limitation but rather a Java one - http://www.fusion-reactor.com/support/kb/FRS-149.cfm

As a tip and to answer your specific question, from this stack-trace alone I can't see if it was caught be the crash protection feature. There are several ways to check if it was...
  • Is the request tagged with "Killing" in the running request view (only possible if the request is still running)
  • Check the crash protection log (you can see the most recent entries right in the FR interface under the CrashProtection -> CP Log or you can look at the actual log file on disk FusionReactor/instance/<instance name>/log ; or if you're on FR4 and the logs have been rotated & archived FusionReactor/instance/<instance name>/archive)
  • If you have CrashProtection notification emails enabled you would have received an email alert that it had caught the request.
Best regards,
David Stockton
Fusion Support Team

Aaron Greenlee

unread,
Feb 4, 2012, 12:41:07 AM2/4/12
to fusion...@googlegroups.com
Thank you Charlie and David. 

I'll take some time to review your suggestions and report back. 

I greatly appreciate your help.

Aaron

Aaron Greenlee

unread,
Feb 4, 2012, 11:58:36 AM2/4/12
to fusion...@googlegroups.com
It appears Charlie's advice pointing me to 'http://www.fusion-reactor.com/support/kb/FRS-274.cfm' may be the winner here. Will continue to monitor and report back if something changes.

Thank you!

Aaron Greenlee

Aaron Greenlee

unread,
Feb 4, 2012, 3:41:32 PM2/4/12
to fusion...@googlegroups.com
Looks like the issue is still persistent.

@David, I do not see the requests as being flagged by crash protection

@Charlie,

I do think you are on to something. I did apply the Fusion Reactor fix FRS-149, and, I saw an immediate improvement in performance. However, after a few hours we are still having issues. When I inspect a long running request, I do see "[Note: Data is still streaming]" next to the Stream Closed remark under Stream Metrics and frequently some bytes already sent. This may be unrelated, but, on some completed requests I see two (2) flushes even though the bytes sent can be as little as 1,762. If I understand FRS-149 correctly, I expected to only see a single flush since the buffer is 64k. 

Thanks,

Aaron




charlie arehart

unread,
Feb 4, 2012, 6:30:07 PM2/4/12
to fusion...@googlegroups.com

Aaron has reached out to me to do a consulting session together. He or I will report back what is discovered for him.

 

/charlie

 

From: fusion...@googlegroups.com [mailto:fusion...@googlegroups.com] On Behalf Of Aaron Greenlee


Sent: Saturday, February 04, 2012 3:42 PM
To: fusion...@googlegroups.com

--

You received this message because you are subscribed to the Google Groups "FusionReactor" group.

To view this discussion on the web visit https://groups.google.com/d/msg/fusionreactor/-/EBnV0czoNEEJ.

BobbyH

unread,
Jan 23, 2013, 3:29:09 PM1/23/13
to fusion...@googlegroups.com
Any update?

Charlie Arehart

unread,
Jan 23, 2013, 5:18:23 PM1/23/13
to fusion...@googlegroups.com

Bobby, this is from 11 months ago. I’m afraid I don’t remember the details at this point to say if whether I did offer a follow-up, or Aaron did, or what may have been said.

But let’s make this more useful for and more specific to you: what about the discussion has caught your attention? I’m assuming you perhaps did some googling that took you to this page. What problem do you have? And as for whether what was discussed there applies, what version of FR do you have? What updates to it have you applied? (that frs-149, if indeed you need it?)

/charlie

BobbyH

unread,
Jan 24, 2013, 8:56:18 AM1/24/13
to fusion...@googlegroups.com
That's OK, Charlie. I noticed how old it was after I asked for the update. I was seeing long running threads also hanging on the socketOutputStream. Your comments were enough to lead me to the bytes sent. I saw 7,8 and 9MB requests (via Bytes Sent in the request details) which led to the discovery of a massive amount of data being crammed into a drop down. It didn't help that the customer was pulling it all down at 23KBytes per second either...

I was, more or less, wondering if that was what the issue turned out to be for Aaron as well.

Thanks for the follow up.
Reply all
Reply to author
Forward
0 new messages